FIX f1_score with zero_division=1 uses directly confusion matrix statistic #27577
Conversation
@glemaitre Thank you for handling this.
I'll ping you when this is ready to be reviewed @OmarManzoor :)
@OmarManzoor this is fine on my end. I think that we can move forward with a review.
Thanks @glemaitre. This looks good! The codecov patch check is failing, though. Do we need to look into that?
I checked and this is a false positive. I even added new cases, and I don't see any new lines that are not covered. I assume this is some kind of relative coverage.
But why is code coverage being calculated for the tests? Codecov is telling us that there was only a partial hit on one of the lines.
The weird thing is that it should not be linked to the patch, because we do at least as well as before. I would also find it strange if we did not cover all combinations, but I will have a look.
@glemaitre @OmarManzoor FYI this bug was not limited to binary classification cases, e.g. this:

```python
from sklearn.metrics import classification_report

# Macro avg f-1 should be 1/3 (the "neutral" class has 100% precision & recall):
print(classification_report(
    y_true=[0, 2], y_pred=[2, 0],
    labels=[0, 1, 2], target_names=["negative", "neutral", "positive"],
    zero_division=1.0
))
```

gave this (erroneous) report on version 1.3.2:

[Broken output]

however, the same code gave the correct answer when I ran it on your updated version (db791f7):

[Fixed output]
so it might be worth changing the title of this PR to reflect that; I encountered this bug in my own work and almost opened a new issue for it.
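As a sanity check, the expected per-class F1 values for this example can be derived directly from the confusion-matrix counts, F1 = 2·TP / (2·TP + FP + FN), with zero_division filling the 0/0 case. A minimal sketch (illustrative only, not scikit-learn's internal implementation):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix([0, 2], [2, 0], labels=[0, 1, 2])
tp = np.diag(cm)                 # [0, 0, 0]
fp = cm.sum(axis=0) - tp         # [1, 0, 1]
fn = cm.sum(axis=1) - tp         # [1, 0, 1]
denom = 2 * tp + fp + fn         # [2, 0, 2]; 0 only for the "neutral" class
f1 = np.where(denom == 0, 1.0, 2 * tp / np.where(denom == 0, 1, denom))
print(f1, f1.mean())             # [0. 1. 0.] 0.3333333333333333
```

Only the "neutral" class has an all-zero denominator (it appears in neither y_true nor y_pred), so it is the only class where zero_division=1.0 should apply; the other two classes have a genuine F1 of 0, giving a macro average of 1/3.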
Thanks. I will also change the entry in the changelog.
Looks reasonable to me.
Maybe the coverage goes down because we removed lines that were covered and kept lines that weren't covered?
This PR also appears to fix a similar issue that was occurring when `zero_division=np.nan`. Here's the bad behavior:

```python
>>> import numpy as np
>>> from sklearn.metrics import classification_report
>>> print(classification_report([0, 0, 1, 1, 2, 2], [0, 0, 0, 3, 3, 1], labels=list(range(5)), zero_division=np.nan))
              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       0.00      0.00       nan         2
           2        nan      0.00       nan         2
           3       0.00       nan       nan         0
           4        nan       nan       nan         0

   micro avg       0.33      0.33      0.33         6
   macro avg       0.22      0.33      0.80         6
weighted avg       0.33      0.33      0.80         6
```

and here's the fixed behavior as of commit db791f7:

```python
>>> print(classification_report([0, 0, 1, 1, 2, 2], [0, 0, 0, 3, 3, 1], labels=list(range(5)), zero_division=np.nan))
              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       0.00      0.00      0.00         2
           2        nan      0.00      0.00         2
           3       0.00       nan      0.00         0
           4        nan       nan       nan         0

   micro avg       0.33      0.33      0.33         6
   macro avg       0.22      0.33      0.20         6
weighted avg       0.33      0.33      0.27         6
```
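If it helps, the fixed macro and weighted f1 rows above are consistent with simply ignoring the nan-scored class when averaging. A sketch assuming nan-aware averaging, in the spirit of np.nanmean (not the library's actual code path):

```python
import numpy as np

# Per-class f1 from the fixed report; class 4 is nan because it appears in
# neither y_true nor y_pred, so zero_division=np.nan applies to it.
f1 = np.array([0.80, 0.00, 0.00, 0.00, np.nan])
support = np.array([2, 2, 2, 0, 0])

print(np.nanmean(f1))                               # macro avg: 0.8 / 4 = 0.2
mask = ~np.isnan(f1)
print(np.average(f1[mask], weights=support[mask]))  # weighted avg: 1.6 / 6 ≈ 0.27
```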
Looking at the codecov report, this is a false positive.
As a regression, should this be backported into a patch release for 1.3, @glemaitre?
We are going to release 1.4 this week or next week since we did an RC at the end of last year.
@jnothman I agree a patch to 1.3 would be justified. I also don't think I've seen anybody in the discussions on this repo point out that this is a particularly bad kind of error, in that it can be subtly wrong and go undetected, especially with highly-multiclass classification, e.g.:

```python
>>> import sklearn.metrics
>>> sklearn.__version__
'1.3.0'
>>> sklearn.metrics.f1_score(y_true=list(range(104)), y_pred=list(range(100)) + [101, 102, 103, 104], average='macro', zero_division=1.0)
0.9809523809523809
```

vs

```python
>>> import sklearn.metrics
>>> sklearn.__version__
'1.2.2'
>>> sklearn.metrics.f1_score(y_true=list(range(104)), y_pred=list(range(100)) + [101, 102, 103, 104], average='macro', zero_division=1.0)
0.9523809523809523
```
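For what it's worth, the inflation in 1.3.0 can be reproduced by hand (a back-of-the-envelope check, not library code): the union of labels has 105 entries, classes 0–99 each score F1 = 1, and classes 101–103 each have precision = recall = 0, so the old 2PR/(P+R) formulation hit 0/0 and substituted zero_division=1.0 for them:

```python
# Hand computation of the two macro F1 values above. There are 105 labels
# in the union of y_true (0..103) and y_pred ({0..99} | {101..104}).
correct = 100 / 105               # classes 0-99 have F1 = 1, everything else 0
inflated = (100 + 3 * 1.0) / 105  # classes 101-103 wrongly receive zero_division=1.0
print(correct)   # 0.9523809523809523
print(inflated)  # 0.9809523809523809
```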
I don't see the benefit of backporting this in a dedicated bug fix release for 1.3 while we are releasing 1.4. One will have to upgrade anyway, and upgrading from 1.3 to 1.4 does not introduce any backward-incompatible changes. In addition, upgrading to 1.4 will bring other features and bug fixes.
FIX f1_score with zero_division=1 uses directly confusion matrix statistic (scikit-learn#27577)
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Tim Head <betatim@gmail.com>
Fixes #26965
Fixes #27189
Fixes #27165
I am opening this PR because I was not able to fix the remaining issue in #27165.
Fix the behaviour of `zero_division` in `f1_score` by using the formulation based on confusion matrix statistics instead of precision and recall, where a division by zero might already have happened.
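To make the change concrete, here is a minimal sketch of the two formulations (illustrative pseudocode with my own function names, not the actual scikit-learn internals): the old path derived F1 from precision and recall into which zero_division had already been substituted, while the new path works directly on the confusion-matrix counts and only applies zero_division when the class occurs in neither y_true nor y_pred:

```python
def f1_from_pr(tp, fp, fn, zero_division=1.0):
    # Old-style: compute precision/recall first; zero_division leaks in early.
    p = zero_division if tp + fp == 0 else tp / (tp + fp)
    r = zero_division if tp + fn == 0 else tp / (tp + fn)
    return zero_division if p + r == 0 else 2 * p * r / (p + r)

def f1_from_counts(tp, fp, fn, zero_division=1.0):
    # Fixed-style: zero_division only when 2*TP + FP + FN == 0,
    # i.e. the class never occurs at all.
    denom = 2 * tp + fp + fn
    return zero_division if denom == 0 else 2 * tp / denom

# A class that occurs but is never predicted correctly (TP=0, FP=1, FN=1):
print(f1_from_pr(0, 1, 1))      # 1.0 -- wrong: P = R = 0 triggers zero_division
print(f1_from_counts(0, 1, 1))  # 0.0 -- correct
```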