
[BUG]: Binary classification #109

Closed
prashanthharshangi opened this issue Nov 18, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@prashanthharshangi

Describe the bug
AssertionError raised for binary classification problems with y in [0, 1]

Additional context
https://github.com/scikit-learn-contrib/MAPIE/blob/master/mapie/classification.py#L513

assert type_of_target(y) in ["binary", "multiclass"]
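The check above relies on scikit-learn's `type_of_target`, which inspects the label vector and returns a string describing the problem type. A minimal sketch of what that helper reports for binary versus multi-class labels (this illustrates the mechanism of the check, not MAPIE's exact control flow):

```python
from sklearn.utils.multiclass import type_of_target

# type_of_target looks at the unique values in y and classifies the target.
print(type_of_target([0, 1, 0, 1]))  # "binary"
print(type_of_target([0, 1, 2, 1]))  # "multiclass"
```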

@prashanthharshangi prashanthharshangi added the bug Something isn't working label Nov 18, 2021
@gmartinonQM
Contributor

Hi @prashanthharshangi, this is not a bug: MAPIE only handles multi-class classification for now, because the very notion of a prediction set does not make sense in the binary case. For the binary case, look into calibration instead: https://scikit-learn.org/stable/modules/calibration.html
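For readers following the calibration suggestion, here is a minimal sketch using scikit-learn's `CalibratedClassifierCV` (standard sklearn API, not part of MAPIE; the base classifier and parameters are arbitrary choices for illustration):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Wrap a base classifier; sigmoid (Platt) scaling is fit on held-out folds
# so that predicted probabilities better match observed frequencies.
calibrated = CalibratedClassifierCV(LogisticRegression(), method="sigmoid", cv=5)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X)[:, 1]  # calibrated risk scores in [0, 1]
```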

@e-pet

e-pet commented Apr 29, 2022

Hi @gmartinonQM! I am slightly confused by your response. :-) What I would be interested in, and what I would imagine @prashanthharshangi might also have been looking for, are confidence intervals for the risk scores returned by a binary classifier. I.e., the classifier might return a risk score of .76, but there is also uncertainty associated with that risk score. Many people are interested in uncertainty quantification for such risk scores (see, e.g., here and here), and this is an issue that is fully separate from calibration.

Could mapie also be used for that problem? (Binary) risk score estimation is basically single-variable regression, so I would imagine it should be possible, at least in theory?
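To make the question above concrete: this is not MAPIE and carries no conformal guarantee, but a naive bootstrap sketch of "an interval around the risk score" looks like the following (dataset, classifier, and resample count are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=0)
x_new = X[:1]  # a point whose risk score we want an interval for

rng = np.random.default_rng(0)
boot_scores = []
for _ in range(200):
    idx = rng.integers(0, len(X), len(X))  # resample the training set with replacement
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    boot_scores.append(clf.predict_proba(x_new)[0, 1])

# Naive percentile interval for the risk score at x_new.
lo, hi = np.percentile(boot_scores, [2.5, 97.5])
```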

@gmartinonQM
Contributor

gmartinonQM commented Apr 29, 2022

Hi @e-pet, in this paper: https://arxiv.org/pdf/2006.10564.pdf (section 2.2), you can see that there is an impossibility theorem for getting mathematical guarantees on confidence intervals in the binary case. This is why calibration is the most relevant notion to begin with for binary classification.

In Theorem 4 of the same paper, however, you can indeed go further and get a confidence interval around a calibrated score, provided you discretize it. There is thus a trade-off between bin size and the statistical significance of your interval. This is something we have not prioritized yet, but it could indeed be useful in MAPIE in the future: calibrating + discretizing + getting confidence intervals on discretized, calibrated scores.
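The "discretize, then attach an interval per bin" idea above can be sketched with plain NumPy. This is only an illustration of the bin-size trade-off using a normal-approximation interval, not the construction from the paper (function name and parameters are hypothetical):

```python
import numpy as np

def binned_score_intervals(scores, outcomes, n_bins=10, z=1.96):
    """Discretize scores into equal-width bins and compute a normal-approximation
    confidence interval for the empirical event frequency in each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
    results = []
    for b in range(n_bins):
        mask = bins == b
        n = mask.sum()
        if n == 0:
            continue  # empty bin: no interval possible
        p = outcomes[mask].mean()
        # Wider bins collect more samples, so the interval tightens --
        # at the cost of a coarser (less informative) score.
        half = z * np.sqrt(p * (1 - p) / n)
        results.append((edges[b], edges[b + 1], p, max(0.0, p - half), min(1.0, p + half)))
    return results

# Simulated perfectly calibrated scores: P(outcome=1 | score) == score.
rng = np.random.default_rng(0)
scores = rng.uniform(size=2000)
outcomes = (rng.uniform(size=2000) < scores).astype(float)
intervals = binned_score_intervals(scores, outcomes)
```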

@e-pet

e-pet commented May 14, 2022

Thanks for the pointer @gmartinonQM; really interesting paper! I'll be honest, though: I am having trouble intuitively understanding why exactly probability regression should be inherently "harder" than any other kind of regression. Can you help me understand this / gain an intuition?
