-
-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX provide predictions in the original space in RidgeCV #15854
base: main
Are you sure you want to change the base?
Conversation
@qinhanmin2014 This PR still does not focus on correcting the errors reported. I will address this thing in the next PR, changing the tests accordingly. |
Perhaps it's better tp correct the wrong scores in this PR? |
I would prefer to go step by step. By I promise that I will do the PR :) |
To be more convincing, the 2 bugs are not related so we could have to bug fixes :) |
sklearn/linear_model/_ridge.py
Outdated
self.cv_values_[:, i] = predictions.ravel() | ||
# compute the predictions in the original space | ||
self.cv_values_[:, i] = ( | ||
predictions.ravel() / np.tile(sqrt_sw, n_y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may not be a very meaningful input but I don't think anything forbids some sample weights to be 0
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression
x, y = make_regression()
sw = np.ones(len(x))
sw[0] = 0.0
r = RidgeCV(store_cv_values=True, scoring="neg_mean_squared_error").fit(x, y, sample_weight=sw)
print(r.cv_values_.mean(axis=0)) # nan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you do in this case. You cannot recover predictions in the original space, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you suggest to set these values to y_offset
since this is the best we can do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I guess we cannot recover the predictions but at least with the change you made we avoid the division by 0 warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as rescaling multiplies by the square root of sample weights the prediction would indeed be the intercept for those samples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! now the next step is to fix the scores when we use the default scoring with sample weights
Reference Issues/PRs
closes #13998
What does this implement/fix? Explain your changes.
The predictions returned by
RidgeCV
are not in the original space but in the normalized space (mean removed and divided by the std. dev.). One has to always denormalize them to be able to compute a score. This PR fixes this issue.Any other comments?