WIP AUC grid search #1014

Closed
amueller wants to merge 3 commits into master from amueller:auc_grid_search

Conversation

5 participants
@amueller
Member

amueller commented Aug 11, 2012

This enables grid search with the AUC score.

What do you think about doing it like this?

cc @ogrisel

@amueller
Member

amueller commented Sep 1, 2012

This should now work robustly in the two-class case. It still lacks tests, and it should error if called with multiclass data, I think.

@ogrisel
Member

ogrisel commented Sep 1, 2012

Maybe we could define a new decorator for score functions:

def requires_thresholds(f):
    f.requires_thresholds = True
    return f

@requires_thresholds
def auc_score(y_true, y_thresholds):
    [...]

@requires_thresholds
def average_precision_score(y_true, y_thresholds):
    [...]

And then, instead of testing for auc_score and average_precision_score explicitly, we could check for getattr(score_func, 'requires_thresholds', False). That would make it possible to grid search with custom score implementations that require thresholds instead of categorical predictions.
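
On the grid search side, this flag could be consumed with something like the following minimal sketch (the helper name _compute_score and the decision_function / predict_proba fallback order are illustrative assumptions, not part of the proposal):

def _compute_score(estimator, X, y_true, score_func):
    if getattr(score_func, 'requires_thresholds', False):
        # Prefer a continuous decision value; fall back to probabilities.
        if hasattr(estimator, 'decision_function'):
            y_thresholds = estimator.decision_function(X)
        else:
            # Probability of the positive class in the two-class case.
            y_thresholds = estimator.predict_proba(X)[:, 1]
        return score_func(y_true, y_thresholds)
    # Default: categorical predictions.
    return score_func(y_true, estimator.predict(X))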

@ogrisel
Member

ogrisel commented Sep 1, 2012

@GaelVaroquaux you might want to have a look at this PR :)

@amueller
Member

amueller commented Sep 1, 2012

@ogrisel That seems like a workable approach. My solution seemed hacky, but I didn't see much need to go beyond that for the moment. Yours is nicer, of course.

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 4, 2012

Guys, just looking at this PR. IMHO, this is the kind of problem that I would tackle by subclassing the estimator to have its 'score' method compute an AUC. Keeping this approach in mind is just so much more versatile and avoids having to do all kinds of hacks in the codebase.
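
For concreteness, a minimal sketch of that subclassing pattern, assuming a two-class problem and the auc_score function mentioned above (renamed roc_auc_score in later scikit-learn releases); the subclass name is illustrative:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc_score

class AUCLogisticRegression(LogisticRegression):
    # Override score() so grid search optimizes AUC instead of accuracy.
    def score(self, X, y):
        return auc_score(y, self.decision_function(X))

An instance of such a subclass can then be passed to grid search unchanged, since GridSearchCV falls back to the estimator's own score method when no score function is given.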

@ogrisel
Member

ogrisel commented Sep 4, 2012

ROC AUC is such a common evaluation criterion (to compare results with benchmarks in the literature or to compete on challenges for instance) that I think the default scikit-learn API should make it possible to compute those scores for the default estimators without having to subclass them.

@amueller
Member

amueller commented Sep 4, 2012

+1 There should definitely be a way for the user to do this.

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 4, 2012

Right, but I think that it should be solved at the estimator level, not at the 'grid_search' level.

Maybe we need a new Mixin, and that mixin should define a new estimator parameter 'score_type' that would switch the 'score' method from the zero-one loss to other scores?
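
A rough sketch of what such a mixin could look like (the mixin name, the 'score_type' values, and the metric helpers used here are assumptions):

from sklearn.metrics import auc_score, zero_one_score

class ScoreTypeMixin(object):
    # Hypothetical mixin: 'score_type' selects the metric used by score().
    def __init__(self, score_type='zero_one'):
        self.score_type = score_type

    def score(self, X, y):
        if self.score_type == 'auc':
            # Needs a decision_function (or predict_proba) on the estimator.
            return auc_score(y, self.decision_function(X))
        return zero_one_score(y, self.predict(X))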

@ogrisel
Member

ogrisel commented Sep 4, 2012

I think it would still be good to have a declarative description of the expectations of the scoring functions; it would be interesting to declare:

  • can be used for binary classification
  • can be used for multiclass classification
  • can be used for multilabel classification
  • can be used for single-target regression
  • can be used for multiple-target regression
  • accepts prediction thresholds
  • accepts categorical predictions
  • ... maybe more?

This sort of declaration could be useful both for computing automated documentation summaries and for building model selection and evaluation utilities (both as part of the scikit-learn project and in third-party software).
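
One possible shape for such declarations, extending the decorator idea above (the score_capabilities name and the flag names are assumptions):

def score_capabilities(**caps):
    # Attach declarative capability flags to a score function.
    def decorate(f):
        f.capabilities = caps
        return f
    return decorate

@score_capabilities(binary=True, multiclass=False,
                    accepts_thresholds=True, accepts_categorical=False)
def auc_score(y_true, y_thresholds):
    pass  # metric implementation elided

Model selection utilities (or a documentation generator) could then inspect getattr(score_func, 'capabilities', {}) instead of special-casing individual functions.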

@amueller
Member

amueller commented Sep 4, 2012

@GaelVaroquaux OK, it seems I didn't understand your approach before. Now I understand and agree :)
I also agree with Olivier that we should have a better, self-contained description of what our objects / functions are capable of. I feel that is a mostly unrelated issue, though.

I'll think about @GaelVaroquaux's suggestion and might come up with a PR at some point.

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 4, 2012

> @GaelVaroquaux OK, it seems I didn't understand your approach before. Now I understand and agree :)

No, I actually changed my mind based on your feedback :). Sorry, I was writing emails on my mobile phone, and thus was probably too terse.

> I also agree with Olivier that we should have a better, self-contained description of what our objects / functions are capable of. I feel that is a mostly unrelated issue, though.

I agree. I'd like to postpone this for later, as it is a challenging design issue (if you come around in Paris, we could hack on it together :P).

G

@amueller
Member

amueller commented Sep 4, 2012

I wish it were that easy ;)
Well, once I've won the Kaggle competition I'll be rich and have unlimited time...
Only it's not looking so good at the moment :-/

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 4, 2012

> Well, once I've won the Kaggle competition I'll be rich and have unlimited time...
> Only it's not looking so good at the moment :-/

Tell us if we can do something about it. If you just want time to focus on other things, I must say that it is perfectly understandable, and I have the same problem.

@amueller
Member

amueller commented Sep 4, 2012

I'm mostly busy with my internship at MSRC atm - or rather, should be.
Once I'm back, I'm sure I'll find the time to come to Paris.

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 4, 2012

> I'm mostly busy with my internship at MSRC atm - or rather, should be.
> Once I'm back, I'm sure I'll find the time to come to Paris.

Don't worry: I am ridiculously busy too; I hardly have a day free on my agenda till mid-October :(

@kyleabeauchamp
Contributor

kyleabeauchamp commented Sep 11, 2012

So wouldn't another idea be to rework the interface between grid search and the score functions such that:

  1. Score functions accept y_true, x, and a model as input, rather than y_true and y_hat
  2. The score function manipulates the model object to calculate whatever it needs (e.g. y_hat or p_hat)

There are some issues with this approach:

A. The added complexity of the score function needing to directly interact with a model object.

I think we can address (A) by using inheritance or by having a few helper functions. The helper function could be something like:

def score_by_probabilities(y_true, x, model, score_function):
    # We would add lots of error checking here to make sure that
    # the score function is compatible with the model.
    p_hat = model.predict_proba(x)
    return score_function(y_true, p_hat)

I suppose a similar approach would be to leave score functions as they are now, but create something like a GridScoreFunction() that does what I discussed above and is assembled from the raw score function. Then the input to GridSearch isn't the raw score function, but the GridScoreFunction object, as sketched below.
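
A rough sketch of that wrapper idea (the GridScoreFunction name comes from the comment above; everything else is an assumption):

class GridScoreFunction(object):
    # Bundle a raw score function with the way its inputs are
    # obtained from a fitted model.
    def __init__(self, score_function, needs_proba=False):
        self.score_function = score_function
        self.needs_proba = needs_proba

    def __call__(self, model, x, y_true):
        if self.needs_proba:
            y_input = model.predict_proba(x)
        else:
            y_input = model.predict(x)
        return self.score_function(y_true, y_input)

Grid search would then call grid_score(model, x, y_true) on such an object instead of score_function(y_true, y_hat).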

@amueller
Member

amueller commented Sep 11, 2012

If I understand your (first) proposal, it is basically the same as the proposal above, only that the score function is now a separate function that gets the model, instead of a method, right?

@kyleabeauchamp
Contributor

kyleabeauchamp commented Sep 11, 2012

Yes, I think you're right.

I wonder if there's an even cleaner solution that involves creating a new type of class to pass around (e.g. Scorer). IMHO, we'd like a general solution that doesn't require us to keep track of all the various combinations of possibilities (single / multiclass, y_hat versus p_hat, models lacking p_hat support).

@ogrisel
Member

ogrisel commented Sep 11, 2012

@kyleabeauchamp I agree; on the other hand, we don't want the user to have to build nested object constructs to be able to call basic operations such as computing a score for an estimator. Hence I like the idea of having declarative decorators.

Alternatively, we could turn the scoring utilities into classes with explicit method names, taking either categorical predictions, thresholds, or predictive models as different inputs to different methods depending on the score's capabilities.
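
A sketch of that class-based alternative, reusing the auc_score function from above (the class and method names are illustrative only):

from sklearn.metrics import auc_score

class AUCScorer(object):
    # One explicit method per kind of input this score can accept.
    def score_from_thresholds(self, y_true, y_thresholds):
        return auc_score(y_true, y_thresholds)

    def score_from_model(self, model, X, y_true):
        # Delegates to the threshold-based method via decision_function.
        return self.score_from_thresholds(y_true, model.decision_function(X))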

@amueller
Member

amueller commented Sep 12, 2012

I liked @GaelVaroquaux's idea and I'll try it out when I get back home in about two weeks.

@VirgileFritsch
Member

VirgileFritsch commented Sep 13, 2012

If I may (after reading this discussion): +1 for @GaelVaroquaux's proposal.

@amueller
Member

amueller commented Sep 17, 2012

Just to make sure, @GaelVaroquaux's suggestion (as I understand it) means that the __init__ of all classifiers must call the __init__ of the ClassifierMixin.
I'm for that, just want everybody to be on the same page.

@amueller
Member

amueller commented Sep 17, 2012

I reread the discussion above and now I noticed that @GaelVaroquaux proposed to add a new mixin. I'd rather add the score to the ClassifierMixin as I said in my last post.

@ogrisel
Member

ogrisel commented Sep 18, 2012

I think it would be easier to compare alternative design proposals via WIP pull requests, without documentation and with just a single test to demonstrate concrete usage of the API.

@amueller
Member

amueller commented Sep 18, 2012

That is definitely good, only some work ;) I'll give it a shot.
I'm a bit concerned about multiple inheritance atm, though... but maybe I'm getting ahead of myself again...

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 19, 2012

> I reread the discussion above and now I noticed that @GaelVaroquaux proposed to add a new mixin. I'd rather add the score to the ClassifierMixin as I said in my last post.

I am afraid that AUC requires a decision_function or predict_proba, and thus that not every classifier will be able to offer AUC. Am I wrong?

@amueller
Member

amueller commented Sep 20, 2012

I thought all classifiers had either predict_proba or decision_function, but that might not be true.

@amueller
Member

amueller commented Feb 3, 2013

Closed via #1381.

@amueller amueller closed this Feb 3, 2013

@amueller amueller deleted the amueller:auc_grid_search branch Feb 3, 2013

@amueller amueller restored the amueller:auc_grid_search branch Feb 3, 2013

@amueller amueller deleted the amueller:auc_grid_search branch May 19, 2017
