
Improve documentation: consistent scoring functions #10584

Open · lorentzenchr opened this issue Feb 4, 2018 · 3 comments

@lorentzenchr (Member) commented Feb 4, 2018

Abstract

Improve the documentation of section 3.3. Model Evaluation: give advice to use strictly proper scoring functions.

Explanation

The documentation of scikit-learn is amazing and a strong argument for its usage (and fame). Still, one can get a bit lost choosing the right scoring function or metric among the many options for model evaluation. There are some influential papers that advocate the use of strictly proper scoring functions:

  1. Gneiting, T. (2011). Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106(494), 746–762. (alt. link)

  2. Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359–378. (alt. link)

For classification and regression, most of the time one is interested in the mean functional of the distribution of the target variable y. The scoring function used to compare the predictive power of different models (e.g. when choosing the regularization strength via cross-validation) should then be strictly consistent for the mean functional.
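
For illustration, here is a minimal sketch (dataset and hyperparameter grid are made up) of what this advice looks like in practice: the regularization strength is chosen by cross-validation scored with the strictly proper log loss instead of accuracy.

```python
# Hypothetical example: select the regularization strength C with a
# strictly proper scoring rule (the log loss) rather than accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

grid = GridSearchCV(
    LogisticRegression(solver="lbfgs"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_log_loss",  # strictly consistent for the mean of y
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```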

Examples

For binary classification, knowing the mean of y is equivalent to knowing the whole distribution of y. The Brier (squared error) and the logistic (log) loss are strictly consistent for the mean. Accuracy and the 0-1 loss are only consistent, not strictly consistent (they are strictly consistent for the mode, which is less informative than the mean for classification). ROC, precision, recall etc. are not consistent for the mean, as far as I know.
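
To make the difference concrete, here is a toy example (numbers made up): two probability forecasters predict exactly the same labels after thresholding at 0.5, so accuracy cannot tell them apart, but the strictly proper Brier and log losses can.

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, log_loss

y_true = np.array([1, 0, 1, 1, 0])
p_sharp = np.array([0.9, 0.1, 0.8, 0.9, 0.2])  # confident, well-placed probabilities
p_blunt = np.array([0.6, 0.4, 0.6, 0.6, 0.4])  # barely on the right side of 0.5

for name, p in [("sharp", p_sharp), ("blunt", p_blunt)]:
    acc = accuracy_score(y_true, (p >= 0.5).astype(int))
    print(name, acc, brier_score_loss(y_true, p), log_loss(y_true, p))
# accuracy is 1.0 for both forecasters; Brier and log loss are
# strictly better for the "sharp" one
```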

For regression, Bregman functions (eq. (18) of 1.) are strictly consistent for the mean. For targets y on the whole real line, the squared error is one example. For positive-valued targets y, the squared error (b=2), the Gamma deviance (b=0) and the Poisson deviance (b=1) are examples, see eq. (20) of 1.
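
A small sketch of these three members of the family, written as the standard GLM unit deviances for positive targets (the b labels follow eq. (20) of reference 1 as quoted above; the formulas are the usual textbook unit deviances). All three are minimized in expectation by the true mean, but they weight relative versus absolute errors differently.

```python
import numpy as np

def squared_error(y, mu):     # b = 2
    return (y - mu) ** 2

def poisson_deviance(y, mu):  # b = 1, standard Poisson unit deviance
    return 2 * (y * np.log(y / mu) - y + mu)

def gamma_deviance(y, mu):    # b = 0, standard Gamma unit deviance
    return 2 * (np.log(mu / y) + y / mu - 1)

y = np.array([1.0, 2.0, 10.0])   # positive-valued targets (made up)
mu = np.array([1.5, 2.0, 8.0])   # model predictions of the mean (made up)
for d in (squared_error, poisson_deviance, gamma_deviance):
    print(d.__name__, d(y, mu).mean())
```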

Maybe one should check the many metrics already provided by scikit-learn for whether they are (strictly) consistent for a certain functional and whether some of them are equivalent to one another.

Disclaimer

I do not know the authors of the cited papers and I am not pushing my own research; this is just my opinion on how to improve this fantastic library 😏

@jnothman (Member) commented Feb 4, 2018 via email

@jnothman (Member) commented Feb 4, 2018 via email

@lorentzenchr (Member, Author) commented
@jnothman I updated the above text slightly to make clear that it is always about the mean of the target variable y. A scoring function should then be strictly consistent for the mean, i.e. for the mean functional in general.

I would go for the first option and write a section "Which metric should I use?". I can make a proposal via a PR, but it may take some time.
