Jaccard UndefinedMetricWarning gives bad advice #17826

Closed
crypdick opened this issue Jul 3, 2020 · 4 comments · Fixed by #17866

crypdick commented Jul 3, 2020

Describe the bug

If you call jaccard_score on data with no true positive labels and no predicted positive labels, it emits the following warning:

sklearn/metrics/_classification.py:1221: UndefinedMetricWarning:
  
  Jaccard is ill-defined and being set to 0.0 in labels with no true or predicted samples. Use `zero_division` parameter to control this behavior.

I tried to follow the warning's advice by passing zero_division=1, but then got this error:

 TypeError: jaccard_score() got an unexpected keyword argument 'zero_division'

Given the first warning, I expect jaccard to have a zero_division kwarg.
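
To make the undefined case concrete, here is a minimal pure-Python sketch of the binary Jaccard index, TP / (TP + FP + FN). The function name `jaccard_binary` and its `zero_division` argument are hypothetical and only illustrate the behavior the warning seems to promise; this is not scikit-learn's implementation.

```python
def jaccard_binary(y_true, y_pred, pos_label=1, zero_division=0.0):
    """Binary Jaccard index; hypothetical helper, not scikit-learn code."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    denominator = tp + fp + fn
    if denominator == 0:
        # No true or predicted positives: the ratio is 0/0, so the
        # caller-supplied fallback is returned instead.
        return float(zero_division)
    return tp / denominator


# All-negative input: the score is undefined, so the fallback decides.
print(jaccard_binary([0, 0, 0], [0, 0, 0]))
print(jaccard_binary([0, 0, 0], [0, 0, 0], zero_division=1))
```

With all-negative inputs the denominator is zero, which is exactly the situation that triggers the UndefinedMetricWarning quoted above.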

Versions

import sklearn; sklearn.show_versions()                                                       

System:
    python: 3.7.4 (default, Aug 13 2019, 20:35:49)  [GCC 7.3.0]
executable: /home/richard/src/anaconda3/bin/python
   machine: Linux-5.4.0-37-generic-x86_64-with-debian-bullseye-sid

Python dependencies:
          pip: 20.1.1
   setuptools: 41.4.0
      sklearn: 0.23.1
        numpy: 1.17.2
        scipy: 1.5.0
       Cython: 0.29.13
       pandas: 0.25.1
   matplotlib: 3.1.1
       joblib: 0.13.2
threadpoolctl: 2.1.0

Built with OpenMP: True

crypdick added the Bug: triage label (Reported bugs that are not confirmed) Jul 3, 2020

crypdick commented Jul 3, 2020

Suggested patch:

  Index: ml/lib/python3.7/site-packages/sklearn/metrics/_classification.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- ml/lib/python3.7/site-packages/sklearn/metrics/_classification.py	(date 1593790417022)
+++ ml/lib/python3.7/site-packages/sklearn/metrics/_classification.py	(date 1593790417022)
@@ -680,7 +680,7 @@
 
 
 def jaccard_score(y_true, y_pred, labels=None, pos_label=1,
-                  average='binary', sample_weight=None):
+                  average='binary', sample_weight=None, zero_division='warn'):
     """Jaccard similarity coefficient score
 
     The Jaccard index [1], or Jaccard similarity coefficient, defined as
@@ -803,7 +803,7 @@
         denominator = np.array([denominator.sum()])
 
     jaccard = _prf_divide(numerator, denominator, 'jaccard',
-                          'true or predicted', average, ('jaccard',),)
+                          'true or predicted', average, ('jaccard',), zero_division=zero_division)
     if average is None:
         return jaccard
     if average == 'weighted':
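
The patch forwards `zero_division` to the shared division helper. As a rough illustration of the semantics that implies, the sketch below divides per-label counts and substitutes a fallback wherever the denominator is zero, warning only when the caller left the default `'warn'`. The name `divide_with_fallback`, the accepted values, and the warning text are assumptions made for this example; the sketch does not reproduce scikit-learn's private `_prf_divide`.

```python
import warnings


def divide_with_fallback(numerators, denominators, zero_division="warn"):
    """Element-wise division with a fallback for zero denominators.

    Hypothetical stand-in for illustration only.
    """
    if zero_division not in ("warn", 0, 1):
        raise ValueError("zero_division must be 'warn', 0 or 1")
    # 'warn' behaves like 0 but also tells the user the value was undefined.
    fallback = 0.0 if zero_division == "warn" else float(zero_division)
    results = []
    undefined = False
    for num, den in zip(numerators, denominators):
        if den == 0:
            undefined = True
            results.append(fallback)
        else:
            results.append(num / den)
    if undefined and zero_division == "warn":
        warnings.warn(
            "Score is ill-defined and being set to 0.0 for some labels.",
            UserWarning,
        )
    return results


# An explicit value silences the warning and sets the undefined entries.
print(divide_with_fallback([1, 0], [2, 0], zero_division=1))
```

Under this design, passing an explicit 0 or 1 both chooses the value and suppresses the warning, which is what the warning message leads users to expect.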

jnothman commented Jul 4, 2020 via email

adrinjalali added the Bug label and removed the Bug: triage label (Reported bugs that are not confirmed) Jul 5, 2020

josephwillard commented

Is there any objection if I work on this?

crypdick commented Jul 8, 2020

@josephwillard please do! I am too busy at work to write unit tests anytime soon. I just created a PR with a partial fix.
