FIX scoring != None for RidgeCV should use unscaled y for evaluation #29842
Merged
Cross-linking #16298 as it might be related.
So I added a test that was failing on […]. I'm going to open another PR to not overload this PR.
thomasjpfan reviewed Sep 18, 2024
I added a new parametrization to check that we support multioutput properly.
lorentzenchr approved these changes Sep 18, 2024
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
thomasjpfan approved these changes Sep 18, 2024
@glemaitre Could you fix the typos in the whatsnew entry?
lorentzenchr pushed a commit that referenced this pull request Sep 18, 2024
kbharat1210 pushed a commit to kbharat1210/scikit-learn that referenced this pull request Sep 25, 2024
…scikit-learn#29842) Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
kbharat1210 pushed a commit to kbharat1210/scikit-learn that referenced this pull request Sep 25, 2024
closes #13998
closes #15648
While discussing with @jeromedockes, we recalled having observed something odd in the `RidgeCV` code. I looked a bit closer and opened this PR to highlight the potential problem.
In `RidgeCV`, when `sample_weight` is provided, we scale the data by `sqrt(sample_weight)`:
scikit-learn/sklearn/linear_model/_ridge.py
Lines 2133 to 2136 in 35164b3
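The referenced lines are not reproduced here. As a rough illustration only (this is not the scikit-learn implementation), row-wise rescaling by `sqrt(sample_weight)` can be sketched in plain Python. The helper name `rescale_rows` and the toy data are made up for this example, and lists are used instead of NumPy arrays to keep it dependency-free.

```python
import math


def rescale_rows(X, y, sample_weight):
    """Hypothetical helper: scale each row of X and each target by sqrt(weight)."""
    sqrt_sw = [math.sqrt(w) for w in sample_weight]
    X_scaled = [[s * x_ij for x_ij in row] for s, row in zip(sqrt_sw, X)]
    y_scaled = [s * y_i for s, y_i in zip(sqrt_sw, y)]
    return X_scaled, y_scaled, sqrt_sw


# Toy data, chosen so that the square roots are exact.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 2.0, 3.0]
sample_weight = [1.0, 4.0, 9.0]

X_scaled, y_scaled, sqrt_sw = rescale_rows(X, y, sample_weight)
```

After this step, `y_scaled` no longer lives in the units of the original target, which is the root of the issue discussed in this PR.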
The idea is that the mean squared error can be expressed as:
scikit-learn/sklearn/linear_model/_base.py
Lines 212 to 223 in 35164b3
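The referenced lines are not reproduced here either. As a sketch, the standard identity behind this kind of rescaling can be written as follows. The notation ($s_i$ for the sample weights, $S = \mathrm{diag}(s)$, $w$ for the coefficients, tildes for rescaled quantities) is my own and is not taken from the source.

```latex
\sum_{i} s_i \left( y_i - x_i^\top w \right)^2
  = \sum_{i} \left( \sqrt{s_i}\, y_i - \sqrt{s_i}\, x_i^\top w \right)^2
  = \lVert \tilde{y} - \tilde{X} w \rVert_2^2,
\qquad
\tilde{y} = \sqrt{S}\, y, \quad \tilde{X} = \sqrt{S}\, X .
```

The identity holds for the squared error specifically, because the weight can be pulled inside the square as $\sqrt{s_i}$. It does not carry over to an arbitrary metric.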
Those "centered" data are used to optimize the ridge loss. Later in the code, we want to compute a score that can be an arbitrary metric via a scorer.
scikit-learn/sklearn/linear_model/_ridge.py
Lines 2158 to 2169 in 35164b3
The problem here is that `predictions` is computed efficiently, as described in the GCV paper. But these predictions are in the "scaled" space, and it seems incorrect to compute an arbitrary metric in this space. Instead, we should unscale these predictions and the scaled true targets to compute the metric in the original space.
This is what this PR intends to do. I did not add any non-regression test (I assume that using the MedAE should lead to some failures) because I wanted to be sure that what I'm saying is correct.
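To make the concern concrete, here is a minimal sketch with toy numbers (not taken from scikit-learn or its test suite) showing that a metric such as the median absolute error gives a different value in the scaled space than in the original space. It assumes that unscaling amounts to dividing by `sqrt(sample_weight)`, which requires strictly positive weights; the local `median_absolute_error` function is a simplified stand-in written for this example.

```python
import math
from statistics import median


def median_absolute_error(y_true, y_pred):
    """Simplified stand-in for a MedAE metric (no sample weights, single output)."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy targets and predictions in the original space: every error is 0.5.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.5, 3.5]
sample_weight = [1.0, 4.0, 9.0]
sqrt_sw = [math.sqrt(w) for w in sample_weight]

# What a scorer would see if it were handed the rescaled quantities.
y_true_scaled = [s * t for s, t in zip(sqrt_sw, y_true)]
y_pred_scaled = [s * p for s, p in zip(sqrt_sw, y_pred)]
score_scaled = median_absolute_error(y_true_scaled, y_pred_scaled)

# Unscale both before scoring (assumes all weights are strictly positive).
y_true_unscaled = [v / s for v, s in zip(y_true_scaled, sqrt_sw)]
y_pred_unscaled = [v / s for v, s in zip(y_pred_scaled, sqrt_sw)]
score_original = median_absolute_error(y_true_unscaled, y_pred_unscaled)

print(score_scaled, score_original)
```

In this toy case the scaled-space errors are 0.5, 1.0 and 1.5, so the two scores disagree even though every prediction is off by the same amount in the original space.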
@jeromedockes @ogrisel @lorentzenchr Does the above description make sense to you?
Edit: It seems that it relates to #13998 and #15648
Probably, I should check the tests that were written in #15648