
Improve documentation: consistent scoring functions #10584

Open · lorentzenchr opened this issue Feb 4, 2018 · 3 comments

@lorentzenchr (Member) commented Feb 4, 2018

Abstract

Improve the documentation of section 3.3. Model Evaluation: give advice to use strictly proper scoring functions.

Explanation

The documentation of scikit-learn is amazing and a strong argument for its usage (and fame). Still, one can get a bit lost choosing the right scoring function or metric among the many options for model evaluation. There are some influential papers that advocate the use of strictly proper scoring functions:

  1. Gneiting, T. (2011). Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106(494), 746–762. (alt. link)

  2. Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359–378. (alt. link)

For classification and regression, most of the time one is interested in the mean functional of the distribution of the target variable y. The scoring function used to compare the predictive power of different models (e.g. when choosing the regularization strength via cross-validation) should then be strictly consistent for the mean functional.
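
For illustration, here is a minimal sketch (dataset and hyperparameter grid are made up) of what this advice looks like in practice: the regularization strength is chosen by cross-validation scored with the strictly proper log loss instead of accuracy.

```python
# Hypothetical example: select the regularization strength C with a
# strictly proper scoring rule (the log loss) rather than accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

grid = GridSearchCV(
    LogisticRegression(solver="lbfgs"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_log_loss",  # strictly consistent for the mean of y
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```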

Examples

For binary classification, knowing the mean of y is equivalent to knowing the whole distribution of y. The Brier (squared error) and the logistic (log) loss are strictly consistent for the mean. Accuracy and the 0-1 loss are only consistent, not strictly consistent (they are strictly consistent for the mode, which is less informative than the mean for classification). ROC, precision, recall etc. are not consistent for the mean, as far as I know.
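
To make the difference concrete, here is a toy example (numbers made up): two probability forecasters predict exactly the same labels after thresholding at 0.5, so accuracy cannot tell them apart, but the strictly proper Brier and log losses can.

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, log_loss

y_true = np.array([1, 0, 1, 1, 0])
p_sharp = np.array([0.9, 0.1, 0.8, 0.9, 0.2])  # confident, well-placed probabilities
p_blunt = np.array([0.6, 0.4, 0.6, 0.6, 0.4])  # barely on the right side of 0.5

for name, p in [("sharp", p_sharp), ("blunt", p_blunt)]:
    acc = accuracy_score(y_true, (p >= 0.5).astype(int))
    print(name, acc, brier_score_loss(y_true, p), log_loss(y_true, p))
# accuracy is 1.0 for both forecasters; Brier and log loss are
# strictly better for the "sharp" one
```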

For regression, Bregman functions (eq. (18) of 1.) are strictly consistent for the mean. For targets y on the whole real line, the squared error is one example. For positive-valued targets y, the squared error (b=2), the Gamma deviance (b=0) and the Poisson deviance (b=1) are examples, see eq. (20) of 1.
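
A small sketch of these three members of the family, written as the standard GLM unit deviances for positive targets (the b labels follow eq. (20) of reference 1 as quoted above; the formulas are the usual textbook unit deviances). All three are minimized in expectation by the true mean, but they weight relative versus absolute errors differently.

```python
import numpy as np

def squared_error(y, mu):     # b = 2
    return (y - mu) ** 2

def poisson_deviance(y, mu):  # b = 1, standard Poisson unit deviance
    return 2 * (y * np.log(y / mu) - y + mu)

def gamma_deviance(y, mu):    # b = 0, standard Gamma unit deviance
    return 2 * (np.log(mu / y) + y / mu - 1)

y = np.array([1.0, 2.0, 10.0])   # positive-valued targets (made up)
mu = np.array([1.5, 2.0, 8.0])   # model predictions of the mean (made up)
for d in (squared_error, poisson_deviance, gamma_deviance):
    print(d.__name__, d(y, mu).mean())
```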

Maybe one should check the many metrics already provided by scikit-learn for whether they are (strictly) consistent for a certain functional and whether some of them are equivalent to one another.

Disclaimer

I do not know the authors of the cited papers and I am not pushing my own research; this is just my opinion on how to improve this fantastic library 😏

@jnothman (Member) commented Feb 4, 2018 via email

@jnothman (Member) commented Feb 4, 2018 via email

@lorentzenchr (Member, Author) commented
@jnothman I updated the above text slightly to make clear that it is always about the mean of the target variable y. A scoring function should then be strictly consistent for the mean, i.e. for the mean functional in general.

I would go for the first option and write a section "Which metric should I use?". I can make a proposal via a PR, but it may take some time.
