
Function to get scorers for task #12385

Open
jnothman opened this issue Oct 15, 2018 · 9 comments · May be fixed by #17889
Labels
API · Moderate (Anything that requires some knowledge of conventions and best practices) · New Feature

Comments

@jnothman
Member

I would like to see a utility which would construct a set of applicable scorers for a particular task, returning a Mapping from string to callable scorer. It will be hard to design the API of this right the first time. [Maybe this should be initially developed outside this project and contributed to scikit-learn-contrib, but I think it reduces risk of mis-specifying scorers, so it's of benefit to this project.]

The user will be able to select a subset of the scorers, either with a dict comprehension or with some specialised methods or function parameters. Initially it wouldn't be efficient to run all these scorers, but hopefully we can do something to fix #10802 :|.

Let's take for instance a binary classification task. The function get_applicable_scorers(y, pos_label='yes') for binary y might produce something like:

{
    'accuracy': make_scorer(accuracy_score),
    'balanced_accuracy': make_scorer(balanced_accuracy_score),
    'matthews_corrcoef': make_scorer(matthews_corrcoef),
    'cohens_kappa': make_scorer(cohen_kappa_score),
    'precision': make_scorer(precision_score, pos_label='yes'),
    'recall': make_scorer(recall_score, pos_label='yes'),
    'f1': make_scorer(f1_score, pos_label='yes'),
    'f0.5': make_scorer(fbeta_score, pos_label='yes', beta=0.5),
    'f2': make_scorer(fbeta_score, pos_label='yes', beta=2),
    'specificity': ...,
    'miss_rate': ...,
    ...
    'roc_auc': make_scorer(roc_auc_score, needs_threshold=True),
    'average_precision': make_scorer(average_precision_score, needs_threshold=True),
    'neg_log_loss': make_scorer(log_loss, needs_proba=True, greater_is_better=False),
    'neg_brier_score_loss': make_scorer(brier_score_loss, needs_proba=True, greater_is_better=False),
}
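
As a usage sketch (get_applicable_scorers is hypothetical, and estimator, X and y are assumed to be defined), the subsetting mentioned above could be a plain dict comprehension fed to cross_validate:

from sklearn.model_selection import cross_validate

scorers = get_applicable_scorers(y, pos_label='yes')  # hypothetical helper
# Keep only a few threshold-free classification scorers.
selected = {name: scorer for name, scorer in scorers.items()
            if name in {'accuracy', 'precision', 'recall', 'f1'}}
# cross_validate accepts a mapping of names to scorer callables for `scoring`.
results = cross_validate(estimator, X, y, scoring=selected, cv=5)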

Doing the same for multiclass classification would pass labels as appropriate, and would optionally get per-class binary metrics, as well as overall multiclass metrics (a rough sketch of the per-class idea is below).
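
For the per-class part, one possible sketch (not the proposed API; the class labels here are purely illustrative) is to build one scorer per class by restricting labels to that class:

from sklearn.metrics import make_scorer, f1_score, recall_score

classes = ['cat', 'dog', 'bird']
per_class = {}
for c in classes:
    # With `labels=[c]`, the macro average reduces to that single class's score.
    per_class[f'recall_{c}'] = make_scorer(recall_score, labels=[c], average='macro')
    per_class[f'f1_{c}'] = make_scorer(f1_score, labels=[c], average='macro')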

I'm not sure how sample_weight fits in here, but ha! we still don't support weighted scoring in cross validation (#1574), so let's not worry about that.

@jnothman added the New Feature, Moderate, API, and help wanted labels on Oct 15, 2018
@whiletruelearn
Contributor

@jnothman Trying to understand. Can this be implemented by maintaining a separate dict of scorers, as in

SCORERS = dict(explained_variance=explained_variance_scorer,
               r2=r2_scorer,
               max_error=max_error_scorer,
               neg_median_absolute_error=neg_median_absolute_error_scorer,
               neg_mean_absolute_error=neg_mean_absolute_error_scorer,
               neg_mean_squared_error=neg_mean_squared_error_scorer,
               neg_mean_squared_log_error=neg_mean_squared_log_error_scorer,
               accuracy=accuracy_scorer,
               roc_auc=roc_auc_scorer,
               balanced_accuracy=balanced_accuracy_scorer,
               average_precision=average_precision_scorer,
               neg_log_loss=neg_log_loss_scorer,
               brier_score_loss=brier_score_loss_scorer,
               # Cluster metrics that use supervised evaluation
               adjusted_rand_score=adjusted_rand_scorer,
               homogeneity_score=homogeneity_scorer,
               completeness_score=completeness_scorer,
               v_measure_score=v_measure_scorer,
               mutual_info_score=mutual_info_scorer,
               adjusted_mutual_info_score=adjusted_mutual_info_scorer,
               normalized_mutual_info_score=normalized_mutual_info_scorer,
               fowlkes_mallows_score=fowlkes_mallows_scorer)

for different tasks such as classification, regression, clustering, etc.?

@jnothman
Member Author

No, while related to that:

  1. this needs to be a function of other parameters, such as a single or multiple classes of interest.
  2. this needs to exclude inappropriate scorers, such as those for regression or clustering tasks when performing classification.

@baluyotraf
Contributor

baluyotraf commented Jan 21, 2019

Not sure if I really get what you want, but can't we do a class wrapper for the scorers?

For example, a scorer wrapper class would have something like

class Scorer:

    # Class-level metadata describing what the scorer applies to and needs.
    is_binary_scorer = True
    needs_probability = True

    def __init__(self, *args, **kwargs):
        pass

    def compute(self, y_true, y_pred):
        # Return the metric value for the given targets and predictions.
        pass

Then to take binary scorers you can do a list comprehension.

params = {
    'weights': 10,
    'something': 'pass'
}
binary_scorers = [s(**params) for s in ALL_SCORERS if s.is_binary_scorer]

If a scorer needs predict_proba when computing multiple metrics, you can also skip it with a warning if the model used doesn't provide it. Just my 2 cents.

You can also do something like this for the multi-metric case

for s in scorers:
    if s.needs_probability:
        if proba is None:
            try:
                proba = estimator.predict_proba(X_test)
            except AttributeError:
                # The estimator cannot produce probabilities; skip this scorer.
                continue
        score = s.compute(y_true, proba)
    else:
        if y_pred is None:
            # Cache the hard predictions for reuse by later scorers.
            y_pred = estimator.predict(X_test)
        score = s.compute(y_true, y_pred)

Oops, it seems something like this was already posted in the referenced issue.

@jnothman
Member Author

This is not about computing various metrics efficiently, but about helping users find metrics appropriate for a task, and making sure they do not get caught in some traps. For example, the scorers available do not return any per-class metrics in the case of multiclass, multilabel, etc. They also make some sometimes-poor assumptions, including that:

  • y_true and y_pred contain all relevant classes. This makes a big difference for macro-averaged metrics (and for any metric that changes its behaviour depending on whether its input is binary or multiclass). It can be overridden in metrics by passing a labels parameter (see the sketch after this list).
  • in a binary problem, the highest-valued class label is the one you are interested in.
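
To make the first point concrete, a minimal illustration (toy labels only) of how declaring an absent class via labels changes a macro-averaged metric:

from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1]
y_pred = [0, 0, 1, 0]

# Only the classes present in the data are averaged here.
print(f1_score(y_true, y_pred, average='macro'))
# Declaring class 2 via `labels` makes it contribute a zero to the average,
# giving a noticeably lower score.
print(f1_score(y_true, y_pred, labels=[0, 1, 2], average='macro', zero_division=0))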

@baluyotraf
Contributor

If that's the case, won't subclasses or a separate metrics module do fine? If I'm working on a classification problem and sklearn.metrics.classification exists, it's fairly intuitive that I should use the scorers in there, right? Similar to sklearn.metrics.cluster. Classes might also carry other scoring information that could help users select the metrics they want to use.

@jnothman
Member Author

It still doesn't satisfy the need for specifying labels, nor does it readily create scorers for search.

@chkoar
Contributor

chkoar commented Apr 4, 2022

The function get_applicable_scorers(y, pos_label='yes') for binary y might produce something like:

I suppose that most likely you will also need the model.
Having the model itself, you would be able to exclude/include scorers.

@jnothman
Member Author

jnothman commented Apr 5, 2022

Yes, you're right, the estimator might be useful to determine predict_proba or return_std support.
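
A minimal sketch of that idea, using plain hasattr checks (the helper name and the way scorer requirements are identified here are assumptions, not existing scikit-learn API):

def filter_scorers_for_estimator(scorers, estimator):
    """Drop scorers whose requirements the estimator cannot satisfy."""
    needs_proba = {'neg_log_loss', 'neg_brier_score_loss'}
    needs_threshold = {'roc_auc', 'average_precision'}
    supported = {}
    for name, scorer in scorers.items():
        if name in needs_proba and not hasattr(estimator, 'predict_proba'):
            continue
        if name in needs_threshold and not (
                hasattr(estimator, 'decision_function')
                or hasattr(estimator, 'predict_proba')):
            continue
        supported[name] = scorer
    return supported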

@DCoupry

DCoupry commented May 2, 2024

Bit stale, but still: the sklearn.utils.discovery module could be the best place for a function all_scorers([task_filter]) to live in.
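
A rough sketch of what that could look like, modeled on sklearn.utils.discovery.all_estimators; all_scorers, task_filter and the task registry below are hypothetical, while get_scorer and get_scorer_names are existing scikit-learn API:

from sklearn.metrics import get_scorer, get_scorer_names

# Hypothetical task metadata; scikit-learn does not currently tag scorers by task,
# so a real implementation would need such a registry.
_TASK_OF_SCORER = {
    'accuracy': 'classification',
    'roc_auc': 'classification',
    'r2': 'regression',
    'neg_mean_squared_error': 'regression',
    'adjusted_rand_score': 'clustering',
}

def all_scorers(task_filter=None):
    """Return (name, scorer) pairs, optionally restricted to one task."""
    names = get_scorer_names()
    if task_filter is not None:
        names = [n for n in names if _TASK_OF_SCORER.get(n) == task_filter]
    return [(name, get_scorer(name)) for name in names]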
