DOC Separate predefined scorer names from the ones requiring make_scorer #28750

Merged: 9 commits, Apr 11, 2024
doc/modules/model_evaluation.rst (54 changes: 30 additions & 24 deletions)
@@ -102,12 +102,8 @@ Scoring Function
'neg_mean_poisson_deviance' :func:`metrics.mean_poisson_deviance`
'neg_mean_gamma_deviance' :func:`metrics.mean_gamma_deviance`
'neg_mean_absolute_percentage_error' :func:`metrics.mean_absolute_percentage_error`
'd2_absolute_error_score' :func:`metrics.d2_absolute_error_score`
'd2_pinball_score' :func:`metrics.d2_pinball_score`
'd2_tweedie_score' :func:`metrics.d2_tweedie_score`
==================================== ============================================== ==================================


Usage examples:

Member:

One last thing before merging: I think we should move the example below to just above the table you added, because it does not rely on make_scorer. However, we could add a similar example to show that we can pass a scorer object obtained via make_scorer.

In this case, we have a usage example for each configuration.

Contributor Author:

We had some back and forth already about the exact position of the table. I feel that if it is below the example, and just above the "Defining your scoring strategy from metric functions" section, then it might as well be within that section. Perhaps @ogrisel can chime in too?

And if I'm understanding correctly what you mean by a "similar example to show that we can pass a scorer object obtained via make_scorer", that example is already in section 3.3.1.2 too?

Member:

So the example right now is:

Usage examples:

    >>> from sklearn import svm, datasets
    >>> from sklearn.model_selection import cross_val_score
    >>> X, y = datasets.load_iris(return_X_y=True)
    >>> clf = svm.SVC(random_state=0)
    >>> cross_val_score(clf, X, y, cv=5, scoring='recall_macro')
    array([0.96..., 0.96..., 0.96..., 0.93..., 1.        ])

It feels strange that it comes right after the table, which mentions that you should use make_scorer.

Would it not be more friendly to have:

====
Table showing string metric
====


    >>> from sklearn import svm, datasets
    >>> from sklearn.model_selection import cross_val_score
    >>> X, y = datasets.load_iris(return_X_y=True)
    >>> clf = svm.SVC(random_state=0)
    >>> cross_val_score(clf, X, y, cv=5, scoring='recall_macro')
    array([0.96..., 0.96..., 0.96..., 0.93..., 1.        ])

Narration about `make_scorer` only metric

===
Table for `make_scorer` metric
===

Usage example:

    >>> from sklearn import svm, datasets
    >>> from sklearn.metrics import fbeta_score, make_scorer
    >>> from sklearn.model_selection import cross_val_score
    >>> X, y = datasets.load_iris(return_X_y=True)
    >>> clf = svm.SVC(random_state=0)
    >>> scorer = make_scorer(fbeta_score, beta=2)
    >>> cross_val_score(clf, X, y, cv=5, scoring=scorer)
    array([...])

It might be slightly redundant with the section below, but this is just a short usage example.

But indeed @ogrisel maybe had another opinion, so let's see what he thinks.

Contributor Author:

Agree that this would be a more logical way to structure it, and yes, also agree on the redundancy since the next section talks about make_scorer in more detail and gives that exact same example.

Member:

Alright, let's move the second table back to the beginning of the "3.3.1.2. Defining your scoring strategy from metric functions" section. We can start the section with a short sentence such as the one you wrote:

The following metrics functions are not implemented as named scorers. They cannot be passed to the scoring parameters; instead their callable needs to be passed to make_scorer together with the value of the user-settable parameters

(without a link to the section, since we are already in it).

Then you can simplify the paragraph that starts with "Many metrics are not given names to ..." to remove redundancies, but keep introducing the usage example with the snippet that shows how to combine make_scorer with user-settable parameters for a grid search.

Then, after the example snippet, finally insert the paragraph with the two bullet points:

[screenshot: the paragraph with the two bullet points about the ``_score`` vs. ``_error``/``_loss`` naming convention]

This is not as important as the rest, so I would move it to the end of the section, but it is still useful information about the naming conventions of the metric functions and how to set the greater_is_better keyword parameter in a consistent way.

Member:

I like this proposal.


>>> from sklearn import svm, datasets
@@ -130,27 +126,25 @@ Usage examples:
Defining your scoring strategy from metric functions
-----------------------------------------------------

The module :mod:`sklearn.metrics` also exposes a set of simple functions
measuring a prediction error given ground truth and prediction:

- functions ending with ``_score`` return a value to
maximize, the higher the better.

- functions ending with ``_error`` or ``_loss`` return a
value to minimize, the lower the better. When converting
into a scorer object using :func:`make_scorer`, set
the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the
parameter description below).

Metrics available for various machine learning tasks are detailed in sections
below.

Many metrics are not given names to be used as ``scoring`` values,
The following metrics functions are not implemented as named scorers,
sometimes because they require additional parameters, such as
:func:`fbeta_score`. In such cases, you need to generate an appropriate
scoring object. The simplest way to generate a callable object for scoring
is by using :func:`make_scorer`. That function converts metrics
into callables that can be used for model evaluation.
:func:`fbeta_score`. They cannot be passed to the ``scoring``
parameters; instead their callable needs to be passed to
:func:`make_scorer` together with the value of the user-settable
parameters.

===================================== ========= ==============================================
Function Parameter Example usage
===================================== ========= ==============================================
**Classification**
:func:`metrics.fbeta_score` ``beta`` ``make_scorer(fbeta_score, beta=2)``

**Regression**
:func:`metrics.mean_tweedie_deviance` ``power`` ``make_scorer(mean_tweedie_deviance, power=1.5)``
:func:`metrics.mean_pinball_loss` ``alpha`` ``make_scorer(mean_pinball_loss, alpha=0.95)``
:func:`metrics.d2_tweedie_score` ``power`` ``make_scorer(d2_tweedie_score, power=1.5)``
:func:`metrics.d2_pinball_score` ``alpha`` ``make_scorer(d2_pinball_score, alpha=0.95)``

Member:

I think we are missing d2_absolute_error_score here. But this PR made me wonder whether we should actually accept it as a named scorer. Thoughts on this, @glemaitre, @ogrisel?

Of course, that would be done in another PR. As this one already looks in good shape.

Member:

Yep, this would be for another PR. I'm not sure that there is a meaningful default for all use cases, and it might be better to make sure that a user chooses the parameter each time.

@ogrisel will have better insights on these metrics indeed.

Member:

There was a discussion in: #28750 (comment)

I think it was removed because you do not have to provide a parameter to make_scorer. I also first thought that d2_absolute_error_score got forgotten. Maybe that is a sign we should add a sentence (in a future PR?) to the table to say "there are more metrics, this is just a subset"?

Member:

The thing is that d2_absolute_error_score is already equivalent to make_scorer(d2_pinball_score, alpha=0.5). Accepting it as a named scorer (a string to be passed to scoring) is redundant but could be a common enough practice to be worth the shortcut. Indeed, its presence in model_evaluation.rst L105 makes me think it was meant to be the case, but the OP from #22118 forgot to add it to the _SCORERS dict.
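
For reference, a minimal sketch of that equivalence at the metric level (the toy arrays below are made up for illustration, not taken from the PR):

    >>> import numpy as np
    >>> from sklearn.metrics import d2_absolute_error_score, d2_pinball_score
    >>> y_true = np.array([3.0, -0.5, 2.0, 7.0])
    >>> y_pred = np.array([2.5, 0.0, 2.0, 8.0])
    >>> # with alpha=0.5 the pinball loss is half the absolute error, and the
    >>> # factor cancels in the D^2 ratio, so the two scores coincide
    >>> assert np.isclose(d2_absolute_error_score(y_true, y_pred),
    ...                   d2_pinball_score(y_true, y_pred, alpha=0.5))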

Member:

Let's do this in a dedicated follow-up PR (and document it as part of the first table for named scorers that do not require setting parameters).

===================================== ========= ==============================================

One typical use case is to wrap an existing metric function from the library
with non-default values for its parameters, such as the ``beta`` parameter for
@@ -163,6 +157,18 @@ the :func:`fbeta_score` function::
>>> grid = GridSearchCV(LinearSVC(dual="auto"), param_grid={'C': [1, 10]},
... scoring=ftwo_scorer, cv=5)

The module :mod:`sklearn.metrics` also exposes a set of simple functions
measuring a prediction error given ground truth and prediction:

- functions ending with ``_score`` return a value to
maximize, the higher the better.

- functions ending with ``_error``, ``_loss``, or ``_deviance`` return a
value to minimize, the lower the better. When converting
into a scorer object using :func:`make_scorer`, set
the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the
parameter description below).
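
For instance, a minimal illustrative sketch (not part of the diff; the choice of
``mean_absolute_error`` and the name ``neg_mae_scorer`` are arbitrary)::

    >>> from sklearn.metrics import make_scorer, mean_absolute_error
    >>> # mean_absolute_error ends in ``_error`` and returns a value to
    >>> # minimize, so the scorer flips its sign via greater_is_better=False
    >>> neg_mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
    >>> # the resulting object can then be passed as ``scoring=neg_mae_scorer``
    >>> # to cross-validation or grid-search utilities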


|details-start|
**Custom scorer objects**