Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX provide predictions in the original space in RidgeCV #15854

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

glemaitre
Copy link
Member

Reference Issues/PRs

closes #13998

What does this implement/fix? Explain your changes.

The predictions returned by RidgeCV are not in the original space but in the normalized space (mean removed and divided by the std. dev.). One has to always denormalize them to be able to compute a score. This PR fixes this issue.

Any other comments?

@glemaitre glemaitre changed the title FIX provide predictions in the original space in RidgeCV [MRG] FIX provide predictions in the original space in RidgeCV Dec 10, 2019
@glemaitre
Copy link
Member Author

@glemaitre
Copy link
Member Author

@qinhanmin2014 This PR still does not focus on correcting the errors reported. I will address this thing in the next PR, changing the tests accordingly.

@qinhanmin2014
Copy link
Member

Perhaps it's better tp correct the wrong scores in this PR?

@glemaitre
Copy link
Member Author

Perhaps it's better to correct the wrong scores in this PR?

I would prefer to go step by step. By I promise that I will do the PR :)

@glemaitre
Copy link
Member Author

To be more convincing, the 2 bugs are not related so we could have to bug fixes :)

Base automatically changed from master to main January 22, 2021 10:51
self.cv_values_[:, i] = predictions.ravel()
# compute the predictions in the original space
self.cv_values_[:, i] = (
predictions.ravel() / np.tile(sqrt_sw, n_y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may not be a very meaningful input but I don't think anything forbids some sample weights to be 0

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

x, y = make_regression()
sw = np.ones(len(x))
sw[0] = 0.0
r = RidgeCV(store_cv_values=True, scoring="neg_mean_squared_error").fit(x, y, sample_weight=sw)
print(r.cv_values_.mean(axis=0)) # nan

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you do in this case. You cannot recover predictions in the original space, isn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you suggest to set these values to y_offset since this is the best we can do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I guess we cannot recover the predictions but at least with the change you made we avoid the division by 0 warning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as rescaling multiplies by the square root of sample weights the prediction would indeed be the intercept for those samples

@glemaitre glemaitre changed the title [MRG] FIX provide predictions in the original space in RidgeCV FIX provide predictions in the original space in RidgeCV Nov 7, 2023
Copy link

github-actions bot commented Nov 7, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a04c0d6. Link to the linter CI: here

Copy link
Contributor

@jeromedockes jeromedockes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! now the next step is to fix the scores when we use the default scoring with sample weights

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RidgeCV cv_values_ are for preprocessed data: centered and scaled by sample weights.
3 participants