
FIX compute y_std properly with multi-target in GPR #20761

Merged
merged 24 commits into from Oct 20, 2021

Conversation

patrickctrf
Contributor

Reference Issues/PRs

Fixes #17394. Fixes #18065.

What does this implement/fix? Explain your changes.

GaussianProcessRegressor could not return y_std
from predict(X, return_std=True) when n_targets was greater than 1.
This happened because line 415 in file
"sklearn/gaussian_process/_gpr.py" computed
y_var * self._y_train_std ** 2 with plain elementwise multiplication (a1 * a2).

This fails when self._y_train_std has more than one entry
(i.e. when n_targets > 1), so the multiplication is now implemented
with np.outer, which performs the conventional scalar-array
multiplication for EACH output feature
(self._y_train_std holds one normalization scale per output
feature).

After this fix, each output feature will receive its respective
normalized variance when return_std == True.

A simple example of how to train the GPR and reproduce the error:

#=====breaking-code-before-fix============================================

import numpy as np

X_train = np.random.rand(11, 10)
y_train = np.random.rand(11, 6)  # 6 target features == ERROR

X_test = np.random.rand(4, 10)

import sklearn.gaussian_process as gp

kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))

model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=True)

model.fit(X_train, y_train)
params = model.kernel_.get_params()

y_pred, std = model.predict(X_test, return_std=True)

#=====end-of-breaking-code-before-fix======================================


#=====well-running-code-before-fix============================================

import numpy as np

X_train = np.random.rand(11, 1)
y_train = np.random.rand(11, 1)  # 1 target feature == FINE

X_test = np.random.rand(4, 1)

import sklearn.gaussian_process as gp

kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))

model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=True)

model.fit(X_train, y_train)
params = model.kernel_.get_params()

y_pred, std = model.predict(X_test, return_std=True)

#=====end-of-well-running-code-before-fix======================================

Line 415 in file "sklearn/gaussian_process/_gpr.py":

# before
y_var = y_var * self._y_train_std ** 2

# after
y_var = np.outer(y_var, self._y_train_std ** 2)
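To see the shape problem concretely, here is a small numpy-only sketch (dimensions chosen purely for illustration): elementwise multiplication of the (n_samples,) variance vector by the (n_targets,) scale vector fails to broadcast, while np.outer produces one scaled variance column per target.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_targets = 4, 6

y_var = rng.rand(n_samples)        # per-sample predictive variance
y_train_std = rng.rand(n_targets)  # one normalization scale per target

# Elementwise multiply fails: shapes (4,) and (6,) do not broadcast.
try:
    _ = y_var * y_train_std ** 2
except ValueError as exc:
    print("broadcast error:", exc)

# np.outer scales every sample's variance by every target's std**2,
# giving one variance column per target.
y_var_fixed = np.outer(y_var, y_train_std ** 2)
print(y_var_fixed.shape)  # (4, 6)
```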

@glemaitre
Member

Tests are failing, you need to:

  • reshape to (n_samples,) instead of (n_samples, 1) for a single target (this is the reason why the tests fail)
  • add a non-regression test
  • add an entry in whats new

@glemaitre glemaitre changed the title fix: GaussianProcessRegressor fails to compute y_std when n_targets > 1 FIX compute y_std properly with multi-target in GPR Aug 17, 2021
@patrickctrf
Contributor Author

Tests are failing, you need to:

  • reshape to (n_samples,) instead of (n_samples, 1) for a single target (this is the reason why the tests fail)
  • add a non-regression test
  • add an entry in whats new

Thanks for your review! What is a non-regression test? I'm only editing GPR, not GPC (the classifier).
Also, where is the "what's new" section? I couldn't find it, sorry. This is my first contribution.

@glemaitre
Member

We are still missing a non-regression test and an entry to the what's new.

@glemaitre
Member

You can read the following regarding regression testing.

Here, the idea is to write a test function that checks that code which was previously failing now gives the proper result. Basically, it corresponds to isolating the code from your first comment (breaking-code-before-fix) and adding some asserts. This function should be implemented in test_gpr.py.

Please add an entry to the change log at doc/whats_new/v1.0.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
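For reference, an entry following the surrounding format would look something like the sketch below (the wording is illustrative only; the final text was settled later in review):

```rst
:mod:`sklearn.gaussian_process`
...............................

- |Fix| :class:`gaussian_process.GaussianProcessRegressor` now computes
  `y_std` properly with multi-target regression when `normalize_y=True`.
  :pr:`20761` by :user:`patrickctrf`.
```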

@glemaitre glemaitre added this to REVIEWED AND WAITING FOR CHANGES in Guillaume's pet Sep 6, 2021
@patrickctrf
Contributor Author

I am getting a linting error after adding the entry in whats_new and the non-regression test.
Can you tell me where the error is happening? I've tried different linting inspections without success.

@glemaitre
Member

You can see the failure there. You can also click through the different "Details" links, which land on this page.

The issue is that you need to run black on the file so that no style diff is detected by the CI. The following command will do the job:

black -t py37 sklearn/gaussian_process/tests/test_gpr.py

Then commit the changes. A handy alternative to doing this manually is to install pre-commit, as mentioned in the contributing guide (item 9).

Check if GPR can compute y_std in predict() method when normalize_y==True
in multi-target regression.
"""
X_train = np.random.rand((11, 10))
Member

Your test is failing as well: cf. there

Here it would be best to define a random state and then you can just use randn:

rng = np.random.RandomState(42)

n_samples, n_features, n_targets = 12, 10, 6

X_train = rng.randn(n_samples, n_features)
y_train = rng.randn(n_samples, n_targets)
X_test = rng.randn(n_samples, n_features)

Comment on lines 668 to 670
# Generic kernel
kernel = kernels.ConstantKernel(1.0, (1e-1, 1e3))
kernel *= kernels.RBF(10.0, (1e-3, 1e3))
Member

Suggested change
# Generic kernel
kernel = kernels.ConstantKernel(1.0, (1e-1, 1e3))
kernel *= kernels.RBF(10.0, (1e-3, 1e3))
# Generic kernel
kernel = (
    kernels.ConstantKernel(1.0, (1e-1, 1e3))
    * kernels.RBF(10.0, (1e-3, 1e3))
)

in multi-target regression.
"""
X_train = np.random.rand((11, 10))
# 6 target features -> multi-target
Member

no need for this comment. Variable names and the code should be as much as possible self-explanatory.

kernel = kernels.ConstantKernel(1.0, (1e-1, 1e3))
kernel *= kernels.RBF(10.0, (1e-3, 1e3))

# normalize_y == True
Member

remove this comment

alpha=0.1,
normalize_y=True)
model.fit(X_train, y_train)
y_pred, std = model.predict(X_test, return_std=True)
Member

Suggested change
y_pred, std = model.predict(X_test, return_std=True)
y_pred, y_std = model.predict(X_test, return_std=True)

Member

It is true that the previous code was failing already. But we could make a couple of assertions regarding the shape of the returned values.

assert y_pred.shape == (n_samples, n_targets)

and something similar for y_std

Comment on lines 658 to 662
"""
Regression test for issues #17394 and #18065.
Check if GPR can compute y_std in predict() method when normalize_y==True
in multi-target regression.
"""
Member

Suggested change
"""
Regression test for issues #17394 and #18065.
Check if GPR can compute y_std in predict() method when normalize_y==True
in multi-target regression.
"""
"""Check that `y_std` is properly computed when `normalize_y=True`.
Non-regression test for:
https://github.com/scikit-learn/scikit-learn/issues/17394
https://github.com/scikit-learn/scikit-learn/issues/18065
"""

:mod:`sklearn.gaussian_process`
.........................

- |Fix| Compute 'y_std' properly with multi-target in
Member

Suggested change
- |Fix| Compute 'y_std' properly with multi-target in
- |Fix| Compute `y_std` properly with multi-target in

Comment on lines 826 to 827
:mod:`sklearn.gaussian_process`
.........................
Member

Suggested change
:mod:`sklearn.gaussian_process`
.........................
:mod:`sklearn.gaussian_process`
...............................

Member

We will let this entry here for the moment but we will have to move it in a section 1.0.1 most probably.

@patrickctrf
Contributor Author

Thanks for your support, @glemaitre. Just one more question: when I ran black, other functions in test_gpr.py were reformatted. Is that ok, or should I change them back to their original state?

@glemaitre
Member

Is it ok or should I change it back to its original state?

I think it is fine, because the same command will be run on the CI and should therefore lead to the same changes.

@glemaitre
Member

I relaunched the failing build, but it is most certainly only a timeout, so everything should be fine otherwise.

@patrickctrf
Contributor Author

Do I need to do anything else?

@glemaitre
Member

everything looks fine

@glemaitre
Member

Tagging this PR for 1.0.1 to include it in the next bug fix release

@glemaitre glemaitre added this to the 1.0.1 milestone Sep 16, 2021
@patrickctrf
Contributor Author

Ok. It is asking for changes in the what's new section. Should I change it?

Member

@jjerphan left a comment

Thank you @patrickctrf!

LGTM with a few modifications for the change log and to follow conventions.

sklearn/gaussian_process/tests/test_gpr.py
kernel=kernel,
n_restarts_optimizer=n_restarts_optimizer,
random_state=0,
kernel=kernel, n_restarts_optimizer=n_restarts_optimizer, random_state=0
Member

This is unrelated to this PR motives and also comes with no semantic changes.
Has black changed it? If it is not the case, could you revert this change, please?

Contributor Author

Yes, black changed it.

Member

Can you try to revert those changes? Normally it should be possible to keep the original lines.

Contributor Author

Sure! Done

Comment on lines 386 to 388
theta_opt, func_min = (
initial_theta,
obj_func(initial_theta, eval_gradient=False),
Member

Same comment as above.

Contributor Author

Black changed it too.

Member

Identically, can you try to revert those changes? Normally it should be possible to keep the original lines.

Contributor Author

Sure! Done

doc/whats_new/v1.0.rst
sklearn/gaussian_process/_gpr.py
patrickctrf and others added 4 commits October 9, 2021 01:28
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@glemaitre glemaitre self-assigned this Oct 14, 2021
Member

@jeremiedbb left a comment

thanks @patrickctrf. The same issue appears in the if return_cov branch: y_cov has shape (n_samples, n_samples) but should have shape (n_samples, n_samples, n_targets). Would you mind fixing that too? np.einsum("ij,k->ijk", y_var, self._y_train_std ** 2) followed by np.squeeze will do it.
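A numpy-only sketch of that return_cov scaling (dimensions invented for illustration): the einsum scales the 2-D covariance by each target's variance, stacking one matrix per target along the last axis, and np.squeeze drops that axis in the single-target case.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_targets = 4, 6

y_cov = rng.rand(n_samples, n_samples)  # unscaled predictive covariance
y_train_std = rng.rand(n_targets)       # one normalization scale per target

# Scale the covariance by each target's variance: result is one
# (n_samples, n_samples) matrix per target, stacked on the last axis.
y_cov_scaled = np.einsum("ij,k->ijk", y_cov, y_train_std ** 2)
print(y_cov_scaled.shape)  # (4, 4, 6)

# For a single target, np.squeeze restores the original 2-D shape.
single = np.einsum("ij,k->ijk", y_cov, np.ones(1))
print(np.squeeze(single).shape)  # (4, 4)
```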

sklearn/gaussian_process/_gpr.py
@jeremiedbb jeremiedbb self-requested a review October 20, 2021 15:22
Member

@jeremiedbb left a comment

LGTM. Thanks @patrickctrf !

@jeremiedbb jeremiedbb merged commit 9b210ae into scikit-learn:main Oct 20, 2021
@patrickctrf
Contributor Author

Thanks for your support, guys!

@glemaitre glemaitre mentioned this pull request Oct 23, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 23, 2021
@glemaitre glemaitre moved this from REVIEWED AND WAITING FOR CHANGES to MERGED in Guillaume's pet Oct 26, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021