
Scoring functions don't know classes_ #6231

Open
pkch opened this issue Jan 26, 2016 · 27 comments

@pkch

pkch commented Jan 26, 2016

Moving the discussion with @amueller from pydata/patsy#77 (comment).

Proposing to:

  • add an optional 'labels' argument to log_loss
  • add an argument to make_scorer to enable passing labels argument to the custom loss function

Related to this - perhaps also allow make_scorer's needs_threshold=True to be used for non-binary classification problems. I'm not sure why it's limited to binary; it's totally meaningful for multi-class situations, where the metric might use the decision function to evaluate based on the rank order or something (for example, SVC doesn't have a cheap predict_proba, but has a cheap decision_function, so this would allow it to be used with rank-based metrics). From make_scorer's perspective it just seems like a completely arbitrary restriction, because it shouldn't care about what the custom loss function does with the decision_function output.
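
To make the first point concrete, here is a minimal sketch of the mismatch (assuming a CV fold whose test labels don't cover all the classes the classifier was trained on; the labels argument in the last call is the kind of addition being proposed, and did land in later releases):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.RandomState(0)
X_train = rng.normal(size=(30, 3))
y_train = np.repeat([0, 1, 2], 10)        # three classes seen during training
X_test = rng.normal(size=(10, 3))
y_test = np.array([0, 1] * 5)             # class 2 happens to be absent from this fold

clf = LogisticRegression().fit(X_train, y_train)
proba = clf.predict_proba(X_test)         # shape (10, 3), one column per entry of clf.classes_

# Without a labels argument, log_loss infers only two classes from y_test and
# cannot line them up with the three probability columns:
# log_loss(y_test, proba)                 # raises ValueError
# With the proposed argument, the scorer could simply forward clf.classes_:
log_loss(y_test, proba, labels=clf.classes_)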

@jnothman
Member

Responding to that second dot point only, make_scorer will automatically pass on any additional kwargs.

@pkch
Author

pkch commented Jan 28, 2016

@jnothman - I don't see how it would work. I meant that when the classifier finishes training, its classes_ attribute should be passed to the custom loss function. This would help avoid calling LabelEncoder inside the custom loss function (which is required to match the unlabeled np.ndarray returned by predict_proba with the labels in the true y).

make_scorer can only pass additional kwargs to the custom loss function if these arguments are known at the time make_scorer is called. But make_scorer is called to create a scorer, which is to be used on arbitrary future datasets, with arbitrary future classifiers. The class labels won't be known until the classifier is trained on a particular dataset.
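
For illustration, a sketch of the workaround described above: my_log_loss is a made-up metric that has to re-derive the label/column correspondence itself, and the make_scorer call uses the needs_proba spelling from the scikit-learn versions discussed in this thread:

import numpy as np
from sklearn.metrics import make_scorer

def my_log_loss(y_true, y_proba, labels):
    # Hypothetical metric: map each true label to its column in y_proba,
    # assuming the columns follow the order of `labels`.
    label_to_col = {label: i for i, label in enumerate(labels)}
    cols = np.array([label_to_col[label] for label in y_true])
    p = np.clip(y_proba[np.arange(len(y_true)), cols], 1e-15, 1 - 1e-15)
    return -np.mean(np.log(p))

# make_scorer happily forwards fixed kwargs...
scorer = make_scorer(my_log_loss, greater_is_better=False, needs_proba=True,
                     labels=[0, 1, 2])
# ...but only values known when make_scorer is called; the classes_ of the
# estimator fitted on a particular fold cannot be supplied here.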

On another note, if sklearn were to return a DataFrame instead of an np.ndarray this would be unnecessary. But I doubt that's in the plans?

@amueller
Member

amueller commented Oct 7, 2016

+1 on adding labels to all metrics.

And I agree this is an issue with the current design of the scorers. I think the classifiers should actually pass classes_ to the scorer. There are even more issues with this if it happens in cross-validation, for example, as different folds might have different classes, so we might want to tell all classifiers what the classes are (but that proposal was rejected a while ago because it really clutters the API).

I'm fine with a fix at the scorer level, though. The scorers could actually inspect the classes_ attribute and act accordingly. This is somewhat of a big API change, but I'd be happy to see tests and PRs.
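
A minimal sketch of what such a scorer-level fix could look like (a hypothetical wrapper, not scikit-learn's actual scorer classes):

import numpy as np
from sklearn.metrics import log_loss

class ClassesAwareScorer:
    # Hypothetical scorer: looks up the fitted estimator's classes_ and
    # forwards it as `labels`, so the user never has to.
    def __init__(self, score_func, greater_is_better=False):
        self.score_func = score_func
        self.sign = 1 if greater_is_better else -1

    def __call__(self, estimator, X, y_true):
        y_proba = estimator.predict_proba(X)
        score = self.score_func(y_true, y_proba, labels=estimator.classes_)
        return self.sign * score

neg_log_loss = ClassesAwareScorer(log_loss)
# usable anywhere a scorer callable is accepted, e.g.
# cross_val_score(clf, X, y, scoring=neg_log_loss)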

This relates closely to how we compute the R^2, by the way, but that is harder to fix. The R^2 should use the training set mean, not the test set mean, but that's unknown to the scorer. Getting classes_ into the call from the scorer is easy, but the R^2 computation is hard, as the model doesn't actually store the training set mean :-(
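
For reference, the distinction being made about R^2 (a sketch; the mean term is the only thing that changes):

import numpy as np

def r2_with_mean(y_true, y_pred, y_mean):
    # R^2 = 1 - SS_res / SS_tot, where SS_tot is taken around `y_mean`
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_mean) ** 2)
    return 1.0 - ss_res / ss_tot

# r2_score today effectively uses the test fold's own mean:
#   r2_with_mean(y_test, y_pred, y_test.mean())
# the argument above is that it should use the training set's mean:
#   r2_with_mean(y_test, y_pred, y_train.mean())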

Ideally I'd fix both issues in one go, but I'm not sure how to do that.

@amueller changed the title from "Scoring functions" to "Scoring functions don't know classes_" Oct 7, 2016
@amueller added this to the 0.19 milestone Oct 7, 2016
@amueller
Member

amueller commented Oct 7, 2016

I feel this is pretty important, as currently results can be pretty wrong if not all classes are present in a test set.

On second thought, I'm not sure what the right way to handle this is, though. Should we warn or error if macro-average is requested and not all classes are present?
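
To illustrate the macro-average concern (real f1_score behaviour; the numbers only matter for this toy example):

import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 0, 1, 1])   # class 2 absent from this test fold
y_pred = np.array([0, 1, 1, 1])

# averages over the two classes that happen to appear:
f1_score(y_true, y_pred, average='macro')
# averages over all three known classes; the absent one contributes 0:
f1_score(y_true, y_pred, labels=[0, 1, 2], average='macro')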

@jnothman
Member

jnothman commented Oct 8, 2016

I'd be interested in seeing scorers access classes_ to pass that on to capable metrics...

@jnothman
Member

Re my comment at #9144 that scorer objects should look at the set of classes before CV, we should also consider how this might work with nested CV...

@amueller
Member

Hm, so we make labels a required parameter (with a deprecation)?

And do we want to store the labels in the scorer, actually? We could also just make it a required argument of __call__ and always provide it in GridSearchCV, cross_val_score, etc.

In general this problem also happens for the score method of GridSearchCV, where we don't have labels... but we stored them in fit... but those were the training labels. I guess that's the best we can do?

@amueller
Member

If we don't pass the labels along with X and y, then doing it for the built-in scorers and the make_scorer ones would be quite different.

@jnothman
Member

jnothman commented Jun 20, 2017 via email

@jmschrei
Member

The same issue arises in the models that support partial_fit, if a batch of data happens not to contain a certain label. Allowing the user to specify the classes more generally might be worth pursuing. Potentially an optional parameter?

@amueller
Member

@jmschrei how so? The first call to partial_fit is required to specify all the classes.
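
For reference, the existing partial_fit contract being referred to, sketched with SGDClassifier (other incremental estimators follow the same pattern):

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
all_classes = np.array([0, 1, 2])

clf = SGDClassifier()
X1, y1 = rng.normal(size=(10, 3)), np.array([0, 1] * 5)   # class 2 missing in this batch
clf.partial_fit(X1, y1, classes=all_classes)               # classes declared on the first call
X2, y2 = rng.normal(size=(10, 3)), np.repeat(2, 10)
clf.partial_fit(X2, y2)                                     # later batches need not repeat them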

@amueller
Member

@jnothman so we add an optional constructor parameter to all classifiers? We said "no" to that a while ago, but I guess we can change that. I actually agree with your assessment.
I'm happy to do that (or rather, have @aarshayj do it ;). But I would like buy-in from @ogrisel, @agramfort and @GaelVaroquaux if possible before embarking on this.

@jnothman
Member

jnothman commented Jun 28, 2017 via email

@GaelVaroquaux
Member

GaelVaroquaux commented Jun 28, 2017 via email

@jnothman
Member

jnothman commented Jun 28, 2017 via email

@ogrisel
Member

ogrisel commented Jun 29, 2017

Explicit is better than implicit. I guess this applies in this case. +1 for a PR that adds classes as an optional hyperparam to all classifiers and a common test that looks like this:

import numpy as np

rng = np.random.RandomState(42)
X = rng.normal(size=(50, 5))
y = rng.randint(2, 4, size=50)  # 2 or 3

all_classes = np.array([1, 2, 3, 42])

for clf_type in all_classifier_types:
    clf = clf_type(classes=all_classes).fit(X, y)
    assert np.array_equal(clf.classes_, all_classes)

    if hasattr(clf, 'decision_function'):
        assert clf.decision_function(X).shape == (50, 4)
    if hasattr(clf, 'predict_proba'):
        assert clf.predict_proba(X).shape == (50, 4)
    ...

@amueller
Member

See #9290 for a draft of the API.

@adrinjalali
Member

Is adding classes to all classifier inits still on the table? If yes, I could work on it.

@jnothman
Member

jnothman commented Sep 1, 2018 via email

@amueller
Member

amueller commented Sep 1, 2018

There's actually a pretty good step towards this in #9585. There's also a PR that adds classes everywhere, #9532, but that's indeed hairy.

@adrinjalali
Member

Moving this to 2.0; this needs somebody to pay attention to it, though.

@adrinjalali modified the milestones: 1.0, 2.0 Aug 22, 2021
@nelsonpeace1

Hi all,
I am currently reading Advances in Financial Machine Learning, and this bug is discussed there. I cannot find any links to the '2.0' milestone mentioned above. I wanted to ask the community: has someone smarter than me come up with a solution to this yet?

Context:

  1. Scoring functions do not know classes_, as a consequence of sklearn’s reliance on numpy arrays rather than pandas series: https://github.com/scikit-learn/scikit-learn/issues/6231
  2. cross_val_score will give different results because it passes weights to the fit method, but not to the log_loss method: https://github.com/scikit-learn/scikit-learn/issues/9144

@adrinjalali
Member

@glemaitre as for the scorers, don't we support labels in some cases now? I guess the scorer can now use classes_ to pass as labels?

@glemaitre
Member

If the function supports labels then we can pass it when calling make_scorer. However, we are not consistent regarding the labels parameter across the different scoring functions.
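
For example, with a scoring function that already takes labels this works today, but only with a fixed, up-front list (a sketch; depending on the scikit-learn version the probability flag is spelled needs_proba=True or response_method='predict_proba'):

from sklearn.metrics import log_loss, make_scorer

scorer = make_scorer(log_loss, greater_is_better=False, needs_proba=True,
                     labels=[0, 1, 2])
# e.g. cross_val_score(clf, X, y, scoring=scorer)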

@adrinjalali
Member

So I guess a next step here is to add labels to all scoring functions where it is relevant?

@amueller
Member

Hm that would fix the issue in cross-validation with different classes in different test sets, but if there are different classes in the training set, it's still tricky, right?

@nelsonpeaceAB

Hi folks
Sorry to chase; I just wanted to keep this issue active. Let me know if I can assist. It seems like an important one to me (correct me if I am wrong).

Nels
