`RidgeCV` `cv_values_` are for preprocessed data: centered and scaled by sample weights. #13998

jeromedockes · 2019-05-31T14:26:59Z

When store_cv_values=True, RidgeCV stores in its cv_values_ attribute either the
leave-one-out squared errors (when scoring=None) or the leave-one-out
predictions (when the user provides a scoring).
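For reference, here is a minimal NumPy sketch of what "leave-one-out squared errors" and "leave-one-out predictions" mean for ridge. It assumes no intercept and no sample weights, and it uses the standard closed-form shortcut through the hat-matrix diagonal instead of refitting n times. The function names are hypothetical. This is not scikit-learn's `_RidgeGCV` code; the brute-force refit is included only to check the shortcut.

```python
import numpy as np


def ridge_loo(X, y, alpha):
    """Closed-form leave-one-out for ridge without intercept (sketch)."""
    n_features = X.shape[1]
    # Hat matrix H = X (X^T X + alpha I)^{-1} X^T
    H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T)
    # LOO residual of sample i: (y_i - yhat_i) / (1 - H_ii)
    loo_residuals = (y - H @ y) / (1.0 - np.diag(H))
    loo_predictions = y - loo_residuals
    # First output ~ the scoring=None case, second ~ the custom-scoring case
    return loo_residuals ** 2, loo_predictions


def ridge_loo_brute_force(X, y, alpha):
    """Refit once per left-out sample and predict that sample."""
    n_samples, n_features = X.shape
    preds = np.empty(n_samples)
    for i in range(n_samples):
        keep = np.arange(n_samples) != i
        Xi, yi = X[keep], y[keep]
        coef = np.linalg.solve(Xi.T @ Xi + alpha * np.eye(n_features), Xi.T @ yi)
        preds[i] = X[i] @ coef
    return preds


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=6)
sq_err, loo_pred = ridge_loo(X, y, alpha=10.0)
assert np.allclose(loo_pred, ridge_loo_brute_force(X, y, alpha=10.0))
```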

However, when scoring is not None, the stored predictions are those for the
preprocessed data, i.e. multiplied by the square roots of the sample weights and
with the (weighted) mean of y subtracted:

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

x, y = make_regression(n_samples=6, n_features=2, random_state=0)
squared_error = RidgeCV(
    store_cv_values=True, alphas=[10.]).fit(x, y).cv_values_.ravel()
custom_scoring = RidgeCV(
    store_cv_values=True, scoring='neg_mean_squared_error',
    alphas=[10.]).fit(x, y)
# to get the actual predictions we need to add the y mean
custom = (y - (custom_scoring.cv_values_.ravel() + y.mean()))**2
assert np.allclose(squared_error, custom)

sw = np.arange(6) + 1
squared_error = RidgeCV(store_cv_values=True, alphas=[10.]).fit(
    x, y, sample_weight=sw).cv_values_.ravel()
custom_scoring = RidgeCV(
    store_cv_values=True, scoring='neg_mean_squared_error',
    alphas=[10.]).fit(x, y, sample_weight=sw)
# to get the actual predictions we need to rescale by inverse square root
# sample weights and add the y mean
custom = sw * (y
               - (custom_scoring.cv_values_.ravel() / np.sqrt(sw)
                  + np.average(y, weights=sw)))**2
assert np.allclose(squared_error, custom)
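The post-processing in the snippet above inverts the preprocessing. Suppose cv_values_ holds `sqrt(sw) * (pred - y_offset)`, where `y_offset` is the weighted mean of y. Then `pred = cv_values_ / sqrt(sw) + y_offset`. The sketch below checks that mapping with a hypothetical helper, `to_original_space`. It uses random stand-in predictions rather than actual RidgeCV output. It also shows why the scoring=None values equal `sw * (y - pred) ** 2`.

```python
import numpy as np


def to_original_space(cv_values, y, sample_weight=None):
    """Map predictions stored in the preprocessed space back to the scale
    of y (hypothetical helper mirroring the manual post-processing)."""
    if sample_weight is None:
        sample_weight = np.ones_like(y, dtype=float)
    sw = np.asarray(sample_weight, dtype=float)
    y_offset = np.average(y, weights=sw)
    # Undo the sqrt(sw) row scaling, then add back the (weighted) mean of y
    return cv_values / np.sqrt(sw) + y_offset


rng = np.random.default_rng(0)
y = rng.normal(size=6)
sw = np.arange(6) + 1.0
pred = y + rng.normal(scale=0.3, size=6)  # stand-in predictions, not RidgeCV output
y_offset = np.average(y, weights=sw)

# Preprocessed values: centered, then scaled by sqrt(sw)
pred_c = np.sqrt(sw) * (pred - y_offset)
y_c = np.sqrt(sw) * (y - y_offset)

assert np.allclose(to_original_space(pred_c, y, sw), pred)
# Squared errors in the preprocessed space equal sample-weighted errors
assert np.allclose((y_c - pred_c) ** 2, sw * (y - pred) ** 2)
```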

I think it would be easier for users to get the predictions directly in the
original space, without having to post-process cv_values_ themselves.

Should we rescale the cv values and add the intercept during fit?
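One reading of this proposal is that the conversion back to the original space happens at the end of fit, so that users never see preprocessed values. The toy function below illustrates that idea; its name is hypothetical and it uses NumPy only. It preprocesses once by centering with weighted means and rescaling rows by `sqrt(sample_weight)`. It then computes leave-one-out values in that space and converts the predictions before returning them. The intercept handling is deliberately simplified (the offsets are computed once on all samples). The sketch shows where the conversion would go; it does not reproduce RidgeCV exactly.

```python
import numpy as np


def ridge_loo_original_space(X, y, alpha, sample_weight=None):
    """Toy weighted ridge leave-one-out that returns predictions in the
    original space of y (hypothetical helper, not scikit-learn code)."""
    n_samples, n_features = X.shape
    if sample_weight is None:
        sample_weight = np.ones(n_samples)
    sw = np.asarray(sample_weight, dtype=float)
    sqrt_sw = np.sqrt(sw)

    # Preprocess once: center with weighted means, scale rows by sqrt(sw)
    X_offset = np.average(X, axis=0, weights=sw)
    y_offset = np.average(y, weights=sw)
    Xc = sqrt_sw[:, None] * (X - X_offset)
    yc = sqrt_sw * (y - y_offset)

    # Closed-form leave-one-out in the preprocessed space
    H = Xc @ np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T)
    loo_residuals_c = (yc - H @ yc) / (1.0 - np.diag(H))
    loo_pred_c = yc - loo_residuals_c

    # Proposed step: undo the sqrt(sw) scaling and add the offset back
    loo_pred = loo_pred_c / sqrt_sw + y_offset
    # Squared errors in the preprocessed space are sample-weighted errors
    loo_sq_errors = loo_residuals_c ** 2
    return loo_sq_errors, loo_pred


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = X @ np.array([1.5, -2.0]) + 3.0 + rng.normal(scale=0.1, size=6)
sw = np.arange(6) + 1.0
errors, preds = ridge_loo_original_space(X, y, alpha=10.0, sample_weight=sw)
# No manual post-processing: errors and predictions agree directly
assert np.allclose(errors, sw * (y - preds) ** 2)
```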


glemaitre · 2019-11-15T15:14:15Z

ping @agramfort

agramfort · 2019-11-17T16:11:18Z

Yes, cv_values_ should be put in native space and fixed when using a custom scoring.

jeromedockes mentioned this issue Oct 12, 2019

[MRG] Does not store all cv values nor all dual coef in _RidgeGCV fit #15183 (Closed)

thomasjpfan added the Needs Decision label Oct 26, 2019

qinhanmin2014 linked a pull request Nov 18, 2019 that will close this issue

[MRG] FIX and ENH in _RidgeGCV #15648 (Open)

glemaitre linked a pull request Dec 10, 2019 that will close this issue

FIX provide predictions in the original space in RidgeCV #15854 (Open)

bmreiniger mentioned this issue Sep 9, 2020

RidgeCV cv_values_ documentation when scoring is not None #18364 (Closed)

cmarmo added the module:linear_model label Feb 27, 2022

glemaitre mentioned this issue May 18, 2022

Scorer in sklearn.linear_model.RidgeCV #23377 (Closed)
