DOC Expand Brier score, fix docstring #18051

Merged 7 commits on Aug 13, 2020
27 changes: 24 additions & 3 deletions doc/modules/model_evaluation.rst
@@ -1505,9 +1505,9 @@ for binary classes. Quoting Wikipedia:
This function returns a score of the mean square difference between the actual
outcome and the predicted probability of the possible outcome. The actual
outcome has to be 1 or 0 (true or false), while the predicted probability of
-the actual outcome can be a value between 0 and 1.
+the actual outcome can be a value between 0 and 1 [Brier1950]_.

-The brier score loss is also between 0 to 1 and the lower the score (the mean
+The Brier score loss is also between 0 and 1, and the lower the score (the mean
square difference is smaller), the more accurate the prediction is. It can be
thought of as a measure of the "calibration" of a set of probabilistic
predictions.
@@ -1536,6 +1536,16 @@ Here is a small example of usage of this function::
>>> brier_score_loss(y_true, y_prob > 0.5)
0.0
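
As a concrete check of the definition above, here is a minimal sketch (the
inputs are invented for illustration; only ``brier_score_loss`` itself comes
from scikit-learn) showing that the score is just the mean squared difference
computed by hand::

    >>> import numpy as np
    >>> from sklearn.metrics import brier_score_loss
    >>> y_true = np.array([0, 1, 1, 0])          # actual outcomes
    >>> y_prob = np.array([0.1, 0.9, 0.8, 0.3])  # predicted P(y == 1)
    >>> np.mean((y_prob - y_true) ** 2)          # mean squared difference, by hand
    0.037...
    >>> brier_score_loss(y_true, y_prob)
    0.037...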

+The Brier score can be used to assess how well a classifier is calibrated.
+However, a lower Brier score does not always mean better calibration. This is
+because the Brier score can be decomposed as the sum of calibration loss and
+refinement loss [Bella2012]_. Calibration loss is defined as the mean squared
+deviation from empirical probabilities derived from the slope of ROC segments.
+Refinement loss can be defined as the expected optimal loss as measured by the
+area under the optimal cost curve. Refinement loss can change independently
+of calibration loss; thus a lower Brier score does not necessarily mean a
+better-calibrated model. "Only when refinement loss remains the same does a
+lower Brier score always mean better calibration" [Bella2012]_, [Flach2008]_.
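
A toy illustration of this caveat (the data and probabilities below are
invented for this sketch): predictor ``B`` attains a much lower Brier score
than predictor ``A``, yet ``A`` is the one that is perfectly calibrated, since
it predicts the true base rate of 0.5 everywhere, while ``B``'s predictions of
0.25 and 0.75 correspond to empirical frequencies of 0 and 1::

    >>> import numpy as np
    >>> from sklearn.metrics import brier_score_loss
    >>> y_true = np.array([0, 1, 0, 1])
    >>> prob_a = np.array([0.5, 0.5, 0.5, 0.5])      # calibrated, unrefined
    >>> prob_b = np.array([0.25, 0.75, 0.25, 0.75])  # refined, underconfident
    >>> brier_score_loss(y_true, prob_a)
    0.25
    >>> brier_score_loss(y_true, prob_b)
    0.0625

Here ``B``'s advantage comes entirely from lower refinement loss; its
calibration is worse, which is exactly why a lower Brier score alone cannot
establish better calibration.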

.. topic:: Example:

@@ -1545,10 +1555,21 @@ Here is a small example of usage of this function::

.. topic:: References:

-.. * G. Brier, `Verification of forecasts expressed in terms of probability
+.. [Brier1950] G. Brier, `Verification of forecasts expressed in terms of
+   probability
<ftp://ftp.library.noaa.gov/docs.lib/htdocs/rescue/mwr/078/mwr-078-01-0001.pdf>`_,
Monthly weather review 78.1 (1950)

+.. [Bella2012] Bella, Ferri, Hernández-Orallo, and Ramírez-Quintana
+   `"Calibration of Machine Learning Models"
+   <http://dmip.webs.upv.es/papers/BFHRHandbook2010.pdf>`_
+   in Khosrow-Pour, M. "Machine learning: concepts, methodologies, tools
+   and applications." Hershey, PA: Information Science Reference (2012).

+.. [Flach2008] Flach, Peter, and Edson Matsubara. `"On classification, ranking,
+   and probability estimation." <https://drops.dagstuhl.de/opus/volltexte/2008/1382/>`_
+   Dagstuhl Seminar Proceedings. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2008).

.. _multilabel_ranking_metrics:

Multilabel ranking metrics
17 changes: 9 additions & 8 deletions sklearn/metrics/_classification.py
@@ -2382,23 +2382,24 @@ def brier_score_loss(y_true, y_prob, *, sample_weight=None, pos_label=None):
"""Compute the Brier score.

The smaller the Brier score, the better, hence the naming with "loss".
-Across all items in a set N predictions, the Brier score measures the
-mean squared difference between (1) the predicted probability assigned
-to the possible outcomes for item i, and (2) the actual outcome.
-Therefore, the lower the Brier score is for a set of predictions, the
-better the predictions are calibrated. Note that the Brier score always
+The Brier score measures the mean squared difference between the predicted
+probability and the actual outcome. The Brier score always
takes on a value between zero and one, since this is the largest
possible difference between a predicted probability (which must be
between zero and one) and the actual outcome (which can take on values
-of only 0 and 1). The Brier loss is composed of refinement loss and
+of only 0 and 1). It can be decomposed as the sum of refinement loss and
calibration loss.

The Brier score is appropriate for binary and categorical outcomes that
can be structured as true or false, but is inappropriate for ordinal
variables which can take on three or more values (this is because the
Brier score assumes that all possible outcomes are equivalently
"distant" from one another). Which label is considered to be the positive
-label is controlled via the parameter pos_label, which defaults to 1.
-Read more in the :ref:`User Guide <calibration>`.
+label is controlled via the parameter `pos_label`, which defaults to
+the greater label unless `y_true` is all 0 or all -1, in which case
+`pos_label` defaults to 1.
+
+Read more in the :ref:`User Guide <brier_score_loss>`.

Parameters
----------
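
To illustrate the ``pos_label`` behaviour described in the updated docstring,
a short sketch (the string labels and probabilities are invented for
illustration)::

    >>> from sklearn.metrics import brier_score_loss
    >>> y_true = ["spam", "ham", "ham", "spam"]
    >>> y_prob = [0.1, 0.9, 0.8, 0.3]  # predicted probability of "ham"
    >>> brier_score_loss(y_true, y_prob, pos_label="ham")
    0.037...

Left at its default, ``pos_label`` would be the greater of the two labels
("spam"), so it is passed explicitly here because ``y_prob`` holds
probabilities for "ham".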