[BUG] _BaseRidgeCV doesn't pass scoring if cv is not None / RidgeClassifierCV shouldn't use continuous setting in inner CV #3302
In retrospect, I think supporting the `cv` option in `RidgeCV` and `RidgeClassifierCV` was a mistake. It makes the code more complicated and it leads to inconsistencies (e.g., `sample_weight` is not handled properly). So I'd personally vote for focusing on doing only generalized CV (efficient LOO) well. This will allow us to remove the dependency on `GridSearchCV` and simplify the code. Users who want to use arbitrary CV objects should use `GridSearchCV(Ridge(), cv=...)` or `GridSearchCV(RidgeClassifier(), cv=...)` instead. Supporting the `cv` option would make sense if we did something a bit smarter than just delegating to `GridSearchCV`, though (e.g., using warm starts to speed up conjugate gradient, but I'm not sure it would make much of a difference).
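A minimal sketch of the suggested replacement: instead of relying on `RidgeCV(cv=...)`, wrap `Ridge` in `GridSearchCV` directly. This is written against the current scikit-learn API (`sklearn.model_selection`; at the time of this thread the class lived in `sklearn.grid_search`), and the data here is synthetic, just for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(100)

# Select alpha with an arbitrary CV scheme, bypassing RidgeCV's cv option.
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
best_alpha = search.best_params_["alpha"]
```

Any CV splitter (e.g. `KFold`, `GroupKFold`) can be passed as `cv`, which is exactly the flexibility the `cv` option of `RidgeCV` was providing.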
Some of the Ridge solvers definitely don't benefit from having a dedicated […]. However, in the dense setting, multiple penalties can share the same […]. Additionally, if […], I have been meaning to make these changes in any case. I am just wondering whether once again I am in a niche use case or this is useful for others.
This would indeed be nice. See also #582.
Sounds interesting, could you elaborate? To clarify my point, I just don't like wrapping `GridSearchCV` within `RidgeCV`. I think the value of `RidgeCV` is to do parameter selection more cleverly than `GridSearchCV` can. I'd be +1 to remove the wrapping of `GridSearchCV` and implement #582 instead. We can add a new `solver` constructor option to let the user choose between LOO, #582 and any other technique. We may also need to deprecate the `gcv_mode` option.
Same gut feeling.
On Fri, Jun 20, 2014 at 9:30 AM, Mathieu Blondel notifications@github.com wrote: […]

I totally agree that wrapping […]
On Fri, Jun 20, 2014 at 9:37 AM, Gael Varoquaux notifications@github.com wrote: […]
Exactly. So the solvers could be: […]

And an `"auto"` mode to make the best possible choice automatically. The `loo-*` solvers would raise an exception if a CV object is passed.
OK, that pretty clearly delineates the refactoring project I had in mind, which I can work on mid-July.
@eickenberg Shall we close this PR?
Yes. I'll post a gist with […]
I could not create a failing test for the failure to pass `scoring` through to the `GridSearchCV` which is involved, since these objects are not persisted in the `_BaseRidgeCV`.

Passing the `scoring` to the `GridSearchCV` within `_BaseRidgeCV` creates a new problem, since the internal CV of this object seems to work exclusively in the continuous setting, thus making e.g. `f1_score` raise an exception due to a discrepancy between the types of `y_true` and `y_pred`.

While there may exist consistency proofs for accuracy score w.r.t. continuous Ridge, I am unsure whether this is the case for `f1_score` (?). So the solution would be to have `RidgeClassifierCV` use an inner CV evaluated in a classifier setting, not a regression approximation to this setting. This looks like a slightly bigger change than I thought.

The diff I am pushing shows the situation where `RidgeClassifierCV` breaks due to `scoring='f1'` once it is passed through to the `GridSearchCV` within `_BaseRidgeCV`.

I have some ideas for a general refactoring of this code, but will first collect those in a gist for discussion.
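A hedged sketch of the workaround discussed above: doing the model selection for a ridge classifier through `GridSearchCV` directly, so a classification metric like `'f1'` is evaluated on discrete predicted labels (from `RidgeClassifier.predict`) rather than on continuous decision values. Written against the current scikit-learn API, with a synthetic dataset for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The inner CV scores RidgeClassifier.predict output (class labels),
# so f1 receives y_true and y_pred of matching discrete types.
search = GridSearchCV(
    RidgeClassifier(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    scoring="f1",
    cv=5,
)
search.fit(X, y)
```

This sidesteps the issue in the report: the scorer never sees the continuous regression targets used internally by the ridge solver, only the thresholded class predictions.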