
FIX f1_score with zero_division=1 uses directly confusion matrix statistic #27577

Merged
merged 19 commits into scikit-learn:main on Dec 11, 2023

Conversation

glemaitre
Member

@glemaitre glemaitre commented Oct 12, 2023

Fixes #26965
Fixes #27189
Fixes #27165

I'm opening this PR because I was not able to fix the remaining issue in #27165.

Fix the behaviour of zero_division in f1_score by using the formulation based on confusion-matrix statistics instead of precision/recall, where a division by zero might already have happened.
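
For context, here is a minimal sketch (my own illustration, not scikit-learn's actual code) of why the count-based formulation matters: F1 can be written as 2*TP / (2*TP + FP + FN), which involves a single division, whereas going through precision and recall involves up to three divisions, each of which can hit 0/0 and have zero_division substituted before F1 is ever formed.

# Minimal sketch (not scikit-learn's implementation) of the two formulations.
# tp, fp, fn are per-class confusion-matrix counts.
def f1_from_precision_recall(tp, fp, fn, zero_division=1.0):
    # Precision/recall route: three places where a 0/0 can occur and
    # zero_division gets substituted before F1 is computed.
    precision = tp / (tp + fp) if (tp + fp) else zero_division
    recall = tp / (tp + fn) if (tp + fn) else zero_division
    denom = precision + recall
    return 2 * precision * recall / denom if denom else zero_division

def f1_from_counts(tp, fp, fn, zero_division=1.0):
    # Confusion-matrix route: F1 = 2*TP / (2*TP + FP + FN), a single division
    # that is only undefined when the class has no samples at all.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else zero_division

# One false positive and one false negative, no true positives:
# precision = recall = 0, so the precision/recall route hits 0/0 in the final
# division and zero_division=1 wrongly turns F1 into 1.0; the count-based
# formula correctly returns 0.0.
print(f1_from_precision_recall(tp=0, fp=1, fn=1))  # 1.0
print(f1_from_counts(tp=0, fp=1, fn=1))            # 0.0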

@github-actions

github-actions bot commented Oct 12, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 6c77c9f.

@OmarManzoor
Contributor

@glemaitre Thank you for handling this.

@glemaitre
Member Author

I'll ping you when it's ready to be reviewed, @OmarManzoor :)

@glemaitre
Member Author

@OmarManzoor this is fine on my end. I think we can move forward with a review.
Since the F-score can now be defined even when the precision or recall are not, the warning message to show is simpler: we only need to raise an independent message for each metric.
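
To make that concrete, a small hand-worked example (my own, not taken from the PR): for a class with TP = 0, FP = 2 and FN = 0, the recall is 0/0 and undefined, but 2*TP + FP + FN = 2, so the F-score is well defined and equals 0; only the recall warning needs to be raised in that case.

# Hand-worked case: recall is undefined (0/0) but F1 is not.
tp, fp, fn = 0, 2, 0
# recall = tp / (tp + fn) -> 0/0, so zero_division would apply to recall,
# while F1 = 2*tp / (2*tp + fp + fn) = 0/2 = 0.0 needs no substitution.
print(2 * tp / (2 * tp + fp + fn))  # 0.0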

Contributor

@OmarManzoor OmarManzoor left a comment


Thanks @glemaitre. This looks good! The codecov patch check is failing, though. Do we need to check that?

@glemaitre
Copy link
Member Author

> Do we need to check that?

I checked and this is a false positive. I even added new cases, and I don't see any new lines that are not covered. I assume this is some kind of relative coverage.

@glevv
Contributor

glevv commented Nov 1, 2023

But why is codecov calculating coverage for the tests? Maybe there should be an omit line in the .coveragerc?

Codecov says that there was only a partial hit on the warn_for if. My guess is that not all combinations of average and metric were checked.

@glemaitre
Member Author

> Codecov says that there was only a partial hit on the warn_for if. My guess is that not all combinations of average and metric were checked.

The weird thing is that it should not be linked to the patch, because we cover at least as much as before. I would also find it strange that we don't have all combinations, but I will have a look.

@boyleconnor
Contributor

@glemaitre @OmarManzoor FYI this bug was not limited to binary classification cases, e.g. this:

from sklearn.metrics import classification_report

# Macro avg f-1 should be 1/3 (the "neutral" class has 100% precision & recall):
print(classification_report(
    y_true=[0, 2], y_pred=[2, 0],
    labels=[0, 1, 2], target_names=["negative", "neutral", "positive"],
    zero_division=1.0
))

gave this (erroneous) report (on version 1.3.2):

Broken output
              precision    recall  f1-score   support

    negative       0.00      0.00      1.00       1.0
    positive       0.00      0.00      1.00       1.0

    accuracy                           1.00       2.0
   macro avg       0.00      0.00      1.00       2.0
weighted avg       0.00      0.00      1.00       2.0

however, the same code gave the correct answer when I ran it on your updated branch (db791f7):

Fixed output
              precision    recall  f1-score   support

    negative       0.00      0.00      0.00       1.0
     neutral       1.00      1.00      1.00       0.0
    positive       0.00      0.00      0.00       1.0

   micro avg       0.00      0.00      0.00       2.0
   macro avg       0.33      0.33      0.33       2.0
weighted avg       0.00      0.00      0.00       2.0

so it might be worth changing the title of this PR to reflect that; I encountered this bug in my own work and almost opened a new issue for it.

@glemaitre
Member Author

Thanks. I will also change the entry in the changelog.

@glemaitre glemaitre changed the title FIX f1_score with zero_division=1 on binary classes FIX f1_score with zero_division=1 uses directly confusion matrix statistic Dec 8, 2023
@glemaitre glemaitre added this to the 1.4 milestone Dec 8, 2023
Review comment on doc/whats_new/v1.4.rst (outdated; resolved)
Member

@betatim betatim left a comment


Looks reasonable to me.

Maybe the coverage goes down because we removed lines that were covered and kept lines that weren't covered?

@boyleconnor
Contributor

This PR also appears to fix a similar issue that was occurring when zero_division=np.nan.

Here's the bad behavior on 1.3.2:

>>> import numpy as np
>>> from sklearn.metrics import classification_report
>>> print(classification_report([0, 0, 1, 1, 2, 2], [0, 0, 0, 3, 3, 1], labels=list(range(5)), zero_division=np.nan))
              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       0.00      0.00       nan         2
           2        nan      0.00       nan         2
           3       0.00       nan       nan         0
           4        nan       nan       nan         0

   micro avg       0.33      0.33      0.33         6
   macro avg       0.22      0.33      0.80         6
weighted avg       0.33      0.33      0.80         6

and here's the fixed behavior as of commit db791f7:

>>> print(classification_report([0, 0, 1, 1, 2, 2], [0, 0, 0, 3, 3, 1], labels=list(range(5)), zero_division=np.nan))
              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       0.00      0.00      0.00         2
           2        nan      0.00      0.00         2
           3       0.00       nan      0.00         0
           4        nan       nan       nan         0

   micro avg       0.33      0.33      0.33         6
   macro avg       0.22      0.33      0.20         6
weighted avg       0.33      0.33      0.27         6

glemaitre and others added 2 commits December 11, 2023 12:08
@glemaitre
Member Author

Looking at the codecov report, this is a false positive.

@glemaitre glemaitre merged commit 3b06962 into scikit-learn:main Dec 11, 2023
26 of 27 checks passed
@jnothman
Member

As this is a regression, should it be backported into a patch release for 1.3, @glemaitre?

@glemaitre
Member Author

We are going to release 1.4 this week or next week since we did an RC at the end of last year.

@boyleconnor
Contributor

@jnothman I agree a patch to 1.3 would be justified.

I also don't think I've seen anybody in the discussions on this repo point out that this is a particularly bad kind of error, in that it can be subtly wrong and go undetected, especially with highly multiclass classification, e.g.:

>>> sklearn.__version__
'1.3.0'
>>> sklearn.metrics.f1_score(y_true=list(range(104)), y_pred=list(range(100)) + [101, 102, 103, 104], average='macro', zero_division=1.0)
0.9809523809523809

vs

>>> sklearn.__version__
'1.2.2'
>>> sklearn.metrics.f1_score(y_true=list(range(104)), y_pred=list(range(100)) + [101, 102, 103, 104], average='macro', zero_division=1.0)
0.9523809523809523

@glemaitre
Member Author

> @jnothman I agree a patch to 1.3 would be justified.

I don't see the benefit of backporting this into a dedicated bug-fix release of 1.3 while we are releasing 1.4. One will have to upgrade anyway, and upgrading from 1.3 to 1.4 does not introduce any backward-incompatible changes. In addition, upgrading to 1.4 will bring other features and bug fixes.

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
…istic (scikit-learn#27577)

Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Tim Head <betatim@gmail.com>

Successfully merging this pull request may close these issues.

F1 score not calculated properly
Wrong behaviour when calculating f1_score with zero_division=1
6 participants