
[BUG] _BaseRidgeCV doesn't pass scoring if cv is not None / RidgeClassifierCV shouldn't use continuous setting in inner CV #3302

Closed


@eickenberg
Contributor

I could not write a failing test for the failure to pass scoring through to the internal GridSearchCV, since that object is not persisted on _BaseRidgeCV.

Passing the scoring through to the GridSearchCV within _BaseRidgeCV creates a new problem: the internal CV of this object seems to work exclusively in the continuous setting, making e.g. f1_score raise an exception due to the discrepancy between the y_true and y_pred types.

While there may exist consistency proofs for the accuracy score with respect to continuous Ridge, I am unsure whether this is the case for f1_score (?). So the solution would be to have RidgeClassifierCV use an inner CV evaluated in a classifier setting, not a regression approximation to this setting. This looks like a slightly bigger change than I thought.
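
To illustrate the type discrepancy, here is a minimal sketch (the toy values and the threshold at zero are my own):

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 1, 1, 0])
# Continuous decision values, as a regression-style inner CV produces them
y_decision = np.array([-0.3, 0.7, 0.2, -0.1])

# f1_score(y_true, y_decision) raises a ValueError, because y_true is
# binary while y_pred is continuous. Thresholding first restores the
# classification setting:
print(f1_score(y_true, (y_decision > 0).astype(int)))  # 1.0
```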

The diff I am pushing shows the situation where RidgeClassifierCV breaks due to scoring='f1' once it is passed through to the GridSearchCV within _BaseRidgeCV.

I have some ideas for a general refactoring of this code, but will first collect those in a gist for discussion.

@mblondel
Member

In retrospect, I think supporting the cv option in RidgeCV and RidgeClassifierCV was a mistake. It makes the code more complicated and it leads to inconsistencies (e.g., sample_weight is not handled properly). So I'd personally vote for focusing on doing only generalized CV (efficient LOO) well. This will allow us to remove the dependency on GridSearchCV and simplify the code. Users who want to use arbitrary CV objects should use GridSearchCV(Ridge(), cv=...) or GridSearchCV(RidgeClassifier(), cv=...) instead. Supporting the cv option would make sense if we did something a bit smarter than just delegating to GridSearchCV though (e.g., using warm-start to speed up conjugate gradient - but I'm not sure it would make much of a difference).
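
For reference, a sketch of the suggested pattern (using the modern sklearn.model_selection import path; at the time of this discussion GridSearchCV lived in sklearn.grid_search):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X, y = rng.randn(60, 10), rng.randn(60)

# Arbitrary CV over the penalty grid, instead of RidgeCV(cv=...)
search = GridSearchCV(Ridge(), {"alpha": np.logspace(-3, 3, 7)}, cv=5)
search.fit(X, y)
print(search.best_params_)
```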

@eickenberg
Contributor Author

Some of the Ridge solvers definitely don't benefit from having a dedicated RidgeCV object - as you say, warm-starting CG may not make that much of a difference.

However, in the dense setting, multiple penalties can share the same svd or eigh decomposition, which is impossible if you only wrap with GridSearchCV; this sharing leads to a very efficient parameter selection loop.

Additionally, if n_features >> n_samples, then for parameter selection one can bypass forming the weight vector and go straight to predictions.
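
As a rough illustration of the shared-decomposition point (the function name and SVD-based layout are mine, not a proposed API): one SVD of X serves the whole penalty grid, since w(alpha) = V diag(s / (s**2 + alpha)) U.T y.

```python
import numpy as np

def ridge_coefs_svd(X, y, alphas):
    # One SVD of X serves every penalty in the grid
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    UTy = U.T @ y
    # Each extra alpha costs only a diagonal rescaling and one product
    return np.array([Vt.T @ ((s / (s ** 2 + alpha)) * UTy) for alpha in alphas])
```

Sweeping tens of penalties then costs little more than a single fit.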

I have been meaning to make these changes in any case. I am just wondering whether I am once again in a niche use case or whether this is useful to others.

@mblondel
Member

However, in the dense setting, multiple penalties can share the same svd or eigh decomposition, which is impossible if you only wrap with GridSearchCV; this sharing leads to a very efficient parameter selection loop.

This would indeed be nice. See also #582.

Additionally, if n_features >> n_samples, then for parameter selection one can bypass forming the weight vector and go straight to predictions.

Sounds interesting, could you elaborate?

To clarify my point, I just don't like wrapping GridSearchCV within RidgeCV. I think the value of RidgeCV is to do parameter selection more cleverly than what GridSearchCV can. I'd be +1 to remove wrapping GridSearchCV and implement #582 instead. We can add a new solver constructor option to let the user choose between LOO, #582 and any other technique. We may also need to deprecate the gcv_mode option.

@GaelVaroquaux
Member

So I'd personally vote for focusing on doing only generalized CV (efficient LOO) well.

Same gut feeling.

@eickenberg
Contributor Author


However, in the dense setting, multiple penalties can share the same svd or eigh decomposition, which is impossible if you only wrap with GridSearchCV; this sharing leads to a very efficient parameter selection loop.

This would indeed be nice. See also #582.

Oh! This is exactly the lines along which I was thinking. Write a ridge_path function which shares the decomposition between penalties and multiple targets (my local implementation works like this).

Additionally, if n_features >> n_samples, then for parameter selection one can bypass forming the weight vector and go straight to predictions.

Sounds interesting, could you elaborate?

In the n_features >> n_samples case, the matrix R establishing the weights (w = R.dot(y)) is of shape X.T.shape. Predictions are made by pre-multiplying by X. If, by associativity, one first evaluates (X.dot(R)).dot(y) instead of X.dot(w) == X.dot(R.dot(y)), then one is much better off, since X.dot(R) is only of shape (n_samples_test, n_samples_train) and the huge weight vector never has to be formed.

A ridge_path provided with an X_test as well as X_train could easily take advantage of this.
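
A minimal sketch of that trick (names and sizes are mine; R comes from the dual identity w = X_train.T @ inv(X_train @ X_train.T + alpha * I) @ y):

```python
import numpy as np

rng = np.random.RandomState(0)
n_train, n_test, n_features = 100, 20, 20000
X_train = rng.randn(n_train, n_features)
X_test = rng.randn(n_test, n_features)
y = rng.randn(n_train)
alpha = 1.0

# R = X_train.T @ G, with G the inverse of the (small) Gram matrix
G = np.linalg.inv(X_train @ X_train.T + alpha * np.eye(n_train))

w = X_train.T @ (G @ y)   # the huge (n_features,) weight vector
pred_naive = X_test @ w

# By associativity, X_test @ R is only (n_test, n_train), so the
# weight vector never has to be formed:
pred_fast = (X_test @ X_train.T @ G) @ y

assert np.allclose(pred_naive, pred_fast)
```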

To clarify my point, I just don't like wrapping GridSearchCV within RidgeCV. I think the value of RidgeCV is to do parameter selection more cleverly than what GridSearchCV can. I'd be +1 to remove wrapping GridSearchCV and implement #582 instead. We can add a new solver constructor option to let the user choose between LOO, #582 and any other technique. We may also need to deprecate the gcv_mode option.

I totally agree that wrapping GridSearchCV is not extremely useful, since the user can do it in any case if no other option exists. Wrapping GridSearchCV creates the illusion that the object is doing something smart, although it isn't :)



@eickenberg
Contributor Author


So I'd personally vote for focusing on doing only generalized CV (efficient LOO) well.

Same gut feeling.

A ridge_path could also compute leave-one-out values/errors (LOOV/E) very efficiently.
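
For instance, a sketch of exact LOO errors from one SVD, using the standard hat-matrix identity e_i = (y_i - y_hat_i) / (1 - H_ii) (the function name is mine):

```python
import numpy as np

def ridge_loo_errors(X, y, alpha):
    # H = X @ inv(X.T X + alpha I) @ X.T = U @ diag(s**2/(s**2+alpha)) @ U.T
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    d = s ** 2 / (s ** 2 + alpha)
    y_hat = U @ (d * (U.T @ y))                # in-sample predictions
    H_diag = np.einsum("ij,j,ij->i", U, d, U)  # diagonal of the hat matrix
    return (y - y_hat) / (1 - H_diag)
```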



@mblondel
Member

Wrapping GridSearchCV creates the illusion that the object is doing something smart, although it isn't

Exactly.

So the solvers could be:

  • loo-eigen
  • loo-svd
  • svd
  • eigen
  • cg-warm-start

And an "auto" mode to make the best possible choice automatically. The loo-* solvers would raise an exception if a CV object is passed.

@eickenberg
Contributor Author

OK, that pretty clearly delineates the refactoring project I had in mind, which I can work on in mid-July.

@mblondel
Member

@eickenberg Shall we close this PR?

@eickenberg
Contributor Author

Yes. I'll post a gist with ridge_path as soon as possible.
