RidgeCV cv_values_ are for preprocessed data: centered and scaled by sample weights. #13998

Open
jeromedockes opened this issue May 31, 2019 · 2 comments · May be fixed by #15648 or #15854

Comments

@jeromedockes
Contributor

When store_cv_values=True, RidgeCV stores in its cv_values_ attribute either
the leave-one-out squared errors (when scoring=None) or the leave-one-out
predictions (when scoring is provided by the user).

However, when scoring is not None, the stored predictions are those for the
preprocessed data, i.e. rescaled by the square roots of the sample weights and
with the (sample-weighted) mean of y removed:

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

x, y = make_regression(n_samples=6, n_features=2, random_state=0)
squared_error = RidgeCV(
    store_cv_values=True, alphas=[10.]).fit(x, y).cv_values_.ravel()
custom_scoring = RidgeCV(
    store_cv_values=True, scoring='neg_mean_squared_error',
    alphas=[10.]).fit(x, y)
# to get the actual predictions we need to add the y mean
custom = (y - (custom_scoring.cv_values_.ravel() + y.mean()))**2
assert np.allclose(squared_error, custom)

sw = np.arange(6) + 1
squared_error = RidgeCV(store_cv_values=True, alphas=[10.]).fit(
    x, y, sample_weight=sw).cv_values_.ravel()
custom_scoring = RidgeCV(
    store_cv_values=True, scoring='neg_mean_squared_error',
    alphas=[10.]).fit(x, y, sample_weight=sw)
# to get the actual predictions we need to rescale by inverse square root
# sample weights and add the y mean
custom = sw * (y
               - (custom_scoring.cv_values_.ravel() / np.sqrt(sw)
                  + np.average(y, weights=sw)))**2
assert np.allclose(squared_error, custom)
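
The relation that the two assertions above rely on can be written out explicitly. This is only a restatement of what the snippet checks, not a statement about the internals of RidgeCV. The symbols are introduced here for illustration: $c_i$ is the stored entry of cv_values_ for sample $i$ when scoring is given, $e_i$ is the stored entry when scoring=None, $w_i$ is the sample weight (all ones when no weights are passed), $\bar{y}_w$ is the weighted mean of y, and $\hat{y}_i$ is the leave-one-out prediction in the original space.

```latex
\begin{align}
  \bar{y}_w &= \frac{\sum_i w_i \, y_i}{\sum_i w_i} \\
  c_i &= \sqrt{w_i} \, \bigl( \hat{y}_i - \bar{y}_w \bigr) \\
  \hat{y}_i &= \frac{c_i}{\sqrt{w_i}} + \bar{y}_w \\
  e_i &= w_i \, \bigl( y_i - \hat{y}_i \bigr)^2
\end{align}
```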

I think that for a user, it would be easier to get the predictions directly in
the original space, without having to post-process cv_values_.

Should we rescale the cv values and add the intercept during fit?
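
Until that is done during fit, the post-processing from the example can be wrapped in a small helper. This is a minimal sketch, not part of scikit-learn: the function name `loo_predictions_in_original_space` is hypothetical, and it assumes the behaviour reported in this issue (a single target, fit_intercept=True, predictions stored after centering y and scaling by the square roots of the sample weights). Versions of scikit-learn that already return original-space values would not need it.

```python
import numpy as np


def loo_predictions_in_original_space(cv_values, y, sample_weight=None):
    """Hypothetical helper: undo the preprocessing described in this issue.

    cv_values: array of shape (n_samples,) or (n_samples, n_alphas), as
        stored in cv_values_ when a custom scoring is passed.
    y: 1-D target array of shape (n_samples,).
    sample_weight: optional 1-D array of shape (n_samples,).
    """
    cv_values = np.asarray(cv_values, dtype=float)
    y = np.asarray(y, dtype=float)
    if sample_weight is None:
        # No weights is treated as unit weights: no rescaling, plain mean.
        sample_weight = np.ones_like(y)
    sample_weight = np.asarray(sample_weight, dtype=float)

    # One scale factor per sample, broadcast over any trailing alpha axis.
    scale = np.sqrt(sample_weight).reshape(
        (-1,) + (1,) * (cv_values.ndim - 1))

    # Divide out sqrt(sample_weight), then add back the weighted mean of y.
    return cv_values / scale + np.average(y, weights=sample_weight)
```

With the objects from the example above, `loo_predictions_in_original_space(custom_scoring.cv_values_, y, sw)` would give one column of leave-one-out predictions per alpha.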

@glemaitre
Member

ping @agramfort

@agramfort
Member

Yes, cv_values_ should be returned in the native space, and this should be fixed for the case where a custom scoring is used.
