
[BUG]: Binary classification #109

Closed
prashanthharshangi opened this issue Nov 18, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@prashanthharshangi

Describe the bug
AssertionError raised for binary classification problems with y in [0, 1]

Additional context
https://github.com/scikit-learn-contrib/MAPIE/blob/master/mapie/classification.py#L513

assert type_of_target(y) in ["binary", "multiclass"]
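The check above relies on scikit-learn's `type_of_target`, which inspects the label vector and returns a string describing the problem type. A minimal sketch of what that helper reports for binary versus multi-class labels (this illustrates the mechanism of the check, not MAPIE's exact control flow):

```python
from sklearn.utils.multiclass import type_of_target

# type_of_target looks at the unique values in y and classifies the target.
print(type_of_target([0, 1, 0, 1]))  # "binary"
print(type_of_target([0, 1, 2, 1]))  # "multiclass"
```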

@prashanthharshangi prashanthharshangi added the bug Something isn't working label Nov 18, 2021
@gmartinonQM
Contributor

Hi @prashanthharshangi, this is not a bug: MAPIE only handles multi-class classification for now, because the very notion of a prediction set does not make sense in the binary case. For the binary case, look into calibration instead: https://scikit-learn.org/stable/modules/calibration.html
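For readers following the calibration suggestion, here is a minimal sketch using scikit-learn's `CalibratedClassifierCV` (standard sklearn API, not part of MAPIE; the base classifier and parameters are arbitrary choices for illustration):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Wrap a base classifier; sigmoid (Platt) scaling is fit on held-out folds
# so that predicted probabilities better match observed frequencies.
calibrated = CalibratedClassifierCV(LogisticRegression(), method="sigmoid", cv=5)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X)[:, 1]  # calibrated risk scores in [0, 1]
```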

@e-pet

e-pet commented Apr 29, 2022

Hi @gmartinonQM! I am slightly confused by your response. :-) What I would be interested in, and what I would imagine @prashanthharshangi might also have been looking for, are confidence intervals for the risk scores returned by a binary classifier. I.e., the classifier might return a risk score of .76, but there is also uncertainty associated with that risk score. Many people are interested in uncertainty quantification for such risk scores (see, e.g., here and here), and this is an issue that is fully separate from calibration.

Could mapie also be used for that problem? (Binary) risk score estimation is basically single-variable regression, so I would imagine it should be possible, at least in theory?
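To make the question above concrete: this is not MAPIE and carries no conformal guarantee, but a naive bootstrap sketch of "an interval around the risk score" looks like the following (dataset, classifier, and resample count are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=0)
x_new = X[:1]  # a point whose risk score we want an interval for

rng = np.random.default_rng(0)
boot_scores = []
for _ in range(200):
    idx = rng.integers(0, len(X), len(X))  # resample the training set with replacement
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    boot_scores.append(clf.predict_proba(x_new)[0, 1])

# Naive percentile interval for the risk score at x_new.
lo, hi = np.percentile(boot_scores, [2.5, 97.5])
```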

@gmartinonQM
Contributor

gmartinonQM commented Apr 29, 2022

Hi @e-pet, in this paper: https://arxiv.org/pdf/2006.10564.pdf (section 2.2), you can see that there is an impossibility theorem for getting mathematical guarantees on confidence intervals in the binary case. This is why calibration is the most relevant notion to begin with for binary classification.

In Theorem 4 of the same paper, however, you can indeed go further and get a confidence interval around a calibrated score, provided you discretize it. There is thus a trade-off between bin size and the statistical significance of your interval. This is something we have not prioritized yet, but it could indeed be useful in MAPIE in the future: calibrating + discretizing + getting confidence intervals on discretized, calibrated scores.
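The "discretize, then attach an interval per bin" idea above can be sketched with plain NumPy. This is only an illustration of the bin-size trade-off using a normal-approximation interval, not the construction from the paper (function name and parameters are hypothetical):

```python
import numpy as np

def binned_score_intervals(scores, outcomes, n_bins=10, z=1.96):
    """Discretize scores into equal-width bins and compute a normal-approximation
    confidence interval for the empirical event frequency in each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
    results = []
    for b in range(n_bins):
        mask = bins == b
        n = mask.sum()
        if n == 0:
            continue  # empty bin: no interval possible
        p = outcomes[mask].mean()
        # Wider bins collect more samples, so the interval tightens --
        # at the cost of a coarser (less informative) score.
        half = z * np.sqrt(p * (1 - p) / n)
        results.append((edges[b], edges[b + 1], p, max(0.0, p - half), min(1.0, p + half)))
    return results

# Simulated perfectly calibrated scores: P(outcome=1 | score) == score.
rng = np.random.default_rng(0)
scores = rng.uniform(size=2000)
outcomes = (rng.uniform(size=2000) < scores).astype(float)
intervals = binned_score_intervals(scores, outcomes)
```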

@e-pet

e-pet commented May 14, 2022

Thanks for the pointer @gmartinonQM; really interesting paper! I'll be honest, though: I am having trouble intuitively understanding why exactly probability regression should be inherently "harder" than any other kind of regression. Can you help me understand this / gain an intuition?
