
GPR not optimizing #12444

Open
klinnell opened this issue Oct 23, 2018 · 4 comments

Comments

@klinnell

klinnell commented Oct 23, 2018

Description

When the kernel is set to None, GaussianProcessRegressor (from gpr.py) does NOT optimize the hyperparameters, even though the documentation says it should.

Steps/Code to Reproduce

The code below should produce a GPR interpolation, but if you change the y values you will notice that the kernel does not change. This is because the optimization step is never performed in gpr.py.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

x = np.random.rand(10, 2)
y = np.exp(np.arange(0, 10))
y2 = np.exp(np.arange(-10, 0))

GP = GaussianProcessRegressor(n_restarts_optimizer=20)
GP.fit(x, y)
print(GP.kernel_)  # fitted kernel for the first set of targets
GP.fit(x, y2)
print(GP.kernel_)  # identical output, despite very different targets

What I did to fix this:
The commented code is what WAS there (lines 173-175 of gpr.py) and the uncommented code is what I replaced it with. Declaring the bounds as "fixed" overrides the ability to optimize.

        # if self.kernel is None:  # Use an RBF kernel as default
        #     self.kernel_ = C(1.0, constant_value_bounds="fixed") \
        #         * RBF(1.0,)
        if self.kernel is None:  # Use an RBF kernel as default
            self.kernel_ = C(1.0) \
                * RBF(1.0)
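To see why the "fixed" bounds suppress optimization: a kernel's `theta` property exposes only its non-fixed (log-transformed) hyperparameters, so a fully fixed kernel leaves the optimizer nothing to tune. A minimal check of the two variants above (my own illustration, not code from gpr.py):

```python
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Current default: both hyperparameters fixed -> theta is empty.
fixed = C(1.0, constant_value_bounds="fixed") * RBF(1.0, length_scale_bounds="fixed")
# Proposed default: bounds left open -> two tunable hyperparameters.
free = C(1.0) * RBF(1.0)

print(len(fixed.theta))  # 0 -> nothing for the optimizer to adjust
print(len(free.theta))   # 2 -> constant value and length scale get optimized
```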
@jnothman
Member

jnothman commented Oct 28, 2018 via email

@ogrisel
Copy link
Member

ogrisel commented Oct 2, 2020

I think the docs are a bit ambiguous, but I don't think we have any motivation to change the behaviour. Please feel free to submit a pull request clarifying the documentation.

I think having a self-tuning kernel by default is a nice feature and matches the "spirit" of the docstring. It might be considered a bug that it does not do so by default.

How many tests in the existing test suite would fail if we changed that?

@klinnell
Author

klinnell commented Oct 2, 2020 via email

@hkennyv
Contributor

hkennyv commented Oct 2, 2020

@ogrisel I agree, having the kernel adapt was the behavior I expected when using this. I removed the "fixed" bounds from the default kernel and ran the test suite; no tests break and the doc examples pass (unless there is another affected module I'm not testing):

(venv) khuynh@kmac:github/scikit-learn ‹clarify-gaussian-process-regressor-docs*›$ pytest sklearn/gaussian_process 
==================================================================== test session starts ====================================================================
platform darwin -- Python 3.8.4, pytest-6.1.0, py-1.9.0, pluggy-0.13.1
rootdir: /Users/khuynh/me/develop/github/scikit-learn, configfile: setup.cfg
collected 422 items                                                                                                                                         

sklearn/gaussian_process/_gpc.py .                                                                                                                    [  0%]
sklearn/gaussian_process/_gpr.py .                                                                                                                    [  0%]
sklearn/gaussian_process/kernels.py .............                                                                                                     [  3%]
sklearn/gaussian_process/tests/test_gpc.py ....................................                                                                       [ 12%]
sklearn/gaussian_process/tests/test_gpr.py ..........................................................................................                 [ 33%]
sklearn/gaussian_process/tests/test_kernels.py ...................................................................................................... [ 57%]
..................................................................................................................................................... [ 92%]
..............................                                                                                                                        [100%]

============================================================= 422 passed, 72 warnings in 10.37s =============================================================
(venv) khuynh@kmac:github/scikit-learn ‹clarify-gaussian-process-regressor-docs*›$ pytest doc/modules/gaussian_process.rst 
==================================================================== test session starts ====================================================================
platform darwin -- Python 3.8.4, pytest-6.1.0, py-1.9.0, pluggy-0.13.1
rootdir: /Users/khuynh/me/develop/github/scikit-learn, configfile: setup.cfg
collected 1 item                                                                                                                                            

doc/modules/gaussian_process.rst .                                                                                                                    [100%]

However, I did notice that this test would probably have to change if we decide to change the default kernel behavior:

def test_no_fit_default_predict():
    # Test that GPR predictions without fit does not break by default.
    default_kernel = (C(1.0, constant_value_bounds="fixed") *
                      RBF(1.0, length_scale_bounds="fixed"))
    gpr1 = GaussianProcessRegressor()
    _, y_std1 = gpr1.predict(X, return_std=True)
    _, y_cov1 = gpr1.predict(X, return_cov=True)

    gpr2 = GaussianProcessRegressor(kernel=default_kernel)
    _, y_std2 = gpr2.predict(X, return_std=True)
    _, y_cov2 = gpr2.predict(X, return_cov=True)

    assert_array_almost_equal(y_std1, y_std2)
    assert_array_almost_equal(y_cov1, y_cov2)

P.S. sorry for bumping such an old issue @klinnell, I just opened PR #18518 to try to address this. It seems you're not the only one who expected a self-adapting kernel!

edit: it looks like there are quite a few tests that do fail in sklearn/tests/test_common.py:

(venv) khuynh@kmac:github/scikit-learn ‹clarify-gaussian-process-regressor-docs*›$ pytest sklearn/tests/test_common.py -k GaussianProcessRegressor --no-summary
==================================================================== test session starts ====================================================================
platform darwin -- Python 3.8.4, pytest-6.1.0, py-1.9.0, pluggy-0.13.1
rootdir: /Users/khuynh/me/develop/github/scikit-learn, configfile: setup.cfg
collected 6792 items / 6751 deselected / 41 selected

sklearn/tests/test_common.py .FF....FFFFFFFFFFFFFF.FFFFF..FFFFF..FFFFF                                                                                [100%]

================================================ 31 failed, 10 passed, 6751 deselected, 15 warnings in 6.68s ================================================

Most of them seem related to GaussianProcessRegressor.kernel_ not being set until after .fit is run.
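That failure mode is easy to see directly: `kernel_` is only created inside `fit`, so any estimator check that inspects it on an unfitted instance fails. A minimal illustration (my own sketch, not one of the failing tests):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

gpr = GaussianProcessRegressor()
print(hasattr(gpr, "kernel_"))  # False: kernel_ does not exist before fit

gpr.fit(np.random.rand(5, 1), np.random.rand(5))
print(hasattr(gpr, "kernel_"))  # True: fit sets the fitted kernel
```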
