WIP Auc grid search #1014

Closed
wants to merge 3 commits into
from

Projects

None yet

5 participants

@amueller
Member

This enables grid search with AUC score.

What do you think about doing it like this?

cc @ogrisel

@amueller
Member
amueller commented Sep 1, 2012

This should work now robustly in the two class case. still lacking tests and errors if called with multiclass, I think

@ogrisel
Member
ogrisel commented Sep 1, 2012

Maybe we could define a new decorator for score functions:

def requires_thresholds(f):
    f.requires_thresholds = True
    return f

@requires_thresholds
def auc_score(y_true, y_thresholds):
    [...]

@requires_thresholds
def averaged_precision_score(y_true, y_thresholds):
    [...]

And then instead of testing for auc_score and average_precision_score explicitly we could check for getattr(clf, 'requires_thresholds', False). That would make it possible to grid search for custom score implementations that require thresholds instead of categorical predictions.

@ogrisel
Member
ogrisel commented Sep 1, 2012

@GaelVaroquaux you might want to have a look at this PR :)

@amueller
Member
amueller commented Sep 1, 2012

@ogrisel that seems like a workable approach. My solution seemed hacky but I didn't see much need to go beyond that for the moment. Yours is nicer of course.

@GaelVaroquaux
Member

Guys, just looking at this PR. IMHO, this is the kind of problem that I
would tackle by subclassing the estimator to have it's 'score' method
compute an AUC. Keeping this approach in mind is just so much more
versatile and avoids having to do all kinds of hacks in the codebase.

@ogrisel
Member
ogrisel commented Sep 4, 2012

ROC AUC is such a common evaluation criterion (to compare results with benchmarks in the literature or to compete on challenges for instance) that I think the default scikit-learn API should make it possible to compute those scores for the default estimators without having to subclass them.

@amueller
Member
amueller commented Sep 4, 2012

+1 There should definitely be a way for the user to do this.

@GaelVaroquaux
Member

Right, but I think that it should be solved at the estimator level, not
at the 'grid_search' level.

Maybe we need a new Mixin, and that mixing should define a new estimator
parameter 'score_type' that would switch the 'score' method from the
zero-one loss to other scores?

@ogrisel
Member
ogrisel commented Sep 4, 2012

I think it would still be good to have a declarative description of the expectations of the scoring functions: it would be interesting to declare:

  • can be used for binary classification
  • can be used for multiclass classification
  • can be used for multilabel classification
  • can be used for single target regression
  • can be used for multiple targets regression
  • accepts prediction thresholds
  • accepts categorical predictions
  • ... maybe more?

This sorts of declarations could be useful both for computing automated documentation summaries and for building model selection and evaluation utilities (both as part of the scikit-learn project or by third party software).

@amueller
Member
amueller commented Sep 4, 2012

@GaelVaroquaux Ok it seems I didn't understand you approach before.
Now I understand and agree :)
I also agree with Olivier that we should have a better, self-contained description on what our objects / functions are capable. I feel that is a mostly unrelated issue, though.

I'll think about @GaelVaroquaux suggestion and might come up with a PR at some point.

@GaelVaroquaux
Member

@GaelVaroquaux Ok it seems I didn't understand you approach before.
Now I understand and agree :)

No, I actually changed my mind based on your feedback :). Sorry, I was
writing emails on my mobile phone, and thus was probably too terse.

I also agree with Olivier that we should have a better, self-contained
description on what our objects / functions are capable. I feel that is a
mostly unrelated issue, though.

I agree. I'd like to postpone this for later, as it is a challenging
design issue (if you come around in Paris, we could hack on it together
:P ).

G

@amueller
Member
amueller commented Sep 4, 2012

I wish it was that easy ;)
Well once I won the kaggle competition I will be rich and have unlimited time....
Only it is not looking so good at the moment :-/

@GaelVaroquaux
Member

Well once I won the kaggle competition I will be rich and have unlimited
time....
Only it is not looking so good at the moment :-/

Tell us if we can do something about it. If you just want time to focus
on other things, I must say that it is perfectly understandable, and I
have the same problem.

@amueller
Member
amueller commented Sep 4, 2012

I'm mostly busy with my internship as MSRC atm - or rather should be.
Once I'm back, I'm sure I'll find the time to come to Paris.

@GaelVaroquaux
Member

I'm mostly busy with my internship as MSRC atm - or rather should be.
Once I'm back, I'm sure I'll find the time to come to Paris.

Don't worry: I am ridiculously busy too: I hardly have a day free on my
agenda till mid October :(

@kyleabeauchamp
Contributor

So wouldn't another idea be to rework the interface between grid search and the score functions such that:

  1. Score functions accept y_true, x, and a model as input, rather than y_true and y_hat
  2. The score function manipulates the model object to calculate whatever it needs (e.g. y_hat or p_hat)

There are some issues with this approach:

A. The added complexity of the score function needing to directly interact with a model object.

I think we can address (A) by using inheritance or having a few helper functions. The helper_function could be something like:

def score_by_probabilities(y_true, x, model,score_function):
    #... We would add lots of error checking here to make sure that the score-function is compatible with the model
    p_hat = model.predict_proba(x)
    return score_function(y_true, phat)

I suppose a similar approach would be to leave score functions are they are now, but create something like a GridScoreFunction() that does what I discussed above and is assembled from the raw score function. Then the input to GridSearch isn't the raw score function, but the GridScoreFunction object.

@amueller
Member

If i understand your (first) proposal, it basically is the same as the proposal above, only that the score function is now a separate function that gets the model, instead of a method, right?

@kyleabeauchamp
Contributor

Yes, I think you're right.

I wonder if there's an even cleaner solution that involves create a new type of class to pass around (e.g. Scorer). IMHO, we'd like a general solution that doesn't require us to keep track of all the various combinations of possibilities (single / multi-class, y_hat versus p_hat, models lacking p_hat support).

@ogrisel
Member
ogrisel commented Sep 11, 2012

@kyleabeauchamp I agree on this other hand we don't want the user to have to build nested objects constructs to be able to call basic operations such as computing a score for an estimator. Hence I like the ability of have declarative decorators.

Alternatively we could turn the scoring utilities into classes with explicit method names to either take a categorical predictions, thresholds or predictive models as different inputs to different methods depending on the score capabilities.

@amueller
Member

I liked @GaelVaroquaux's idea and I'll try it out when I get back home in about two weeks.

@VirgileFritsch
Member

If I may (after reading this discussion): +1 for @GaelVaroquaux 's proposal.

@amueller
Member

Just to make sure, @GaelVaroquaux's suggestion (as I understand it) means that the __init__ of all classifiers must call the __init__ of the ClassifierMixin.
I'm for that, just want everybody to be on the same page.

@amueller
Member

I reread the discussion above and now I noticed that @GaelVaroquaux proposed to add a new mixin. I'd rather add the score to the ClassifierMixin as I said in my last post.

@ogrisel
Member
ogrisel commented Sep 18, 2012

I think it would be easier to make alternative design proposals with WIP pull request without documentation and just a single test to demonstrate concrete usage of the API.

@amueller
Member

That is definitely good, only some work ;) I'll give it a shot.
I'm a bit concerned about multiple inheritance atm, though... but maybe I'm getting ahead of myself again...

@GaelVaroquaux
Member

I reread the discussion above and now I noticed that [1]@GaelVaroquaux
proposed to add a new mixin. I'd rather add the score to the
ClassifierMixin as I said in my last post.

I am afraid that AUC requires a decision_function or predict_proba, and
thus that not every classifier will be able to offer AUC. Am I wrong?

@amueller
Member

I thought all classifiers either had predict_proba or decision_function but that might not be true.

@amueller
Member
amueller commented Feb 3, 2013

Closed via #1381.

@amueller amueller closed this Feb 3, 2013
@amueller amueller deleted the amueller:auc_grid_search branch Feb 3, 2013
@amueller amueller restored the amueller:auc_grid_search branch Feb 3, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment