ENH: zero_division parameter for classification… #14900
Conversation
thanks for opening the PR. Sorry I'm not able to review immediately.
…into prec_rec_fscore_zero_division
# Conflicts:
#	sklearn/metrics/classification.py
#	sklearn/metrics/tests/test_classification.py

lost some stuff after merging, need a review
Oh yes, forgot that.
- Changed whats_new to 0.22
- F-score only warns if both prec and rec are ill-defined
- new private method to simplify _prf_divide
Hi @jnothman, it's just my second PR to sklearn, so I'm still learning :) I'm having a problem with git: it says I have 9 files changed, but I actually changed only 3. It's as if it's comparing against a master from some days ago. For example, this commit: … is in master, but its changes appear in the diff. Can you guide me to fix this?
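A common cause of extra files in the diff is that the PR branch was cut from (or last merged with) an old master. A hedged sketch of the usual fix, assuming the fork has the main scikit-learn repo configured as the `upstream` remote and the branch is named `prec_rec_fscore_zero_division` (the name taken from the merge message above):

```shell
# one-time setup, only if the upstream remote is missing
git remote add upstream https://github.com/scikit-learn/scikit-learn.git

# bring the latest master into the PR branch
git fetch upstream
git checkout prec_rec_fscore_zero_division
git merge upstream/master   # resolve any conflicts, then commit
git push origin prec_rec_fscore_zero_division
```

After the merge, GitHub's "Files changed" tab compares against the updated base, and unrelated files should drop out of the diff.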
sklearn/metrics/classification.py
Outdated
@@ -892,6 +903,12 @@ def f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary',
    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.

    zero_division : string or int, default="warn"
        Sets the behavior when there is a zero division. If set to
        ("warn"|0)/1, returns 0/1 when both precision and recall are zero
I don't think this notation is easy enough to read. How about 'Sets the value to return when blah blah. If "warn" (default), this acts like 0 but also raises a warning.'
wrote something similar, please check new version
sklearn/metrics/classification.py
Outdated
@@ -1062,7 +1092,12 @@ def _prf_divide(numerator, denominator, metric, modifier, average, warn_for):
        return result

    # remove infs
    result[mask] = 0.0
    result[mask] = float(zero_division == 1)
this is obfuscated. I'd rather 0.0 if zero_division in ('warn', 0) else 1
done in new version
fbeta = my_assert(*tmp, y_true, y_pred, beta=beta,
                  average=average, zero_division=zero_division)

zero_division = float(zero_division == 1)
obfuscated
simplified in the new version with two separated tests
assert_array_almost_equal(r, [0, 0, 0], 2)
assert_array_almost_equal(f, [0, 0, 0], 2)
func = precision_recall_fscore_support
my_assert = (assert_warns if zero_division == "warn"
if you must do something like this, use functools.partial to capture the arguments too.
But I think tests must be very readable code, as the reader needs to be absolutely certain of their correctness to be confident that they in turn imply the correctness of the code.
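For illustration, the `functools.partial` idea could look like the sketch below. The `assert_warns` / `assert_no_warnings` helpers here are simplified stand-ins for sklearn's testing utilities, and `pick_assert` is a hypothetical name:

```python
import warnings
from functools import partial


def assert_warns(warning_cls, func, *args, **kwargs):
    # simplified stand-in: call func, assert the warning fired, return result
    with warnings.catch_warnings(record=True) as records:
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    assert any(issubclass(r.category, warning_cls) for r in records)
    return result


def assert_no_warnings(func, *args, **kwargs):
    with warnings.catch_warnings(record=True) as records:
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    assert not records
    return result


def pick_assert(zero_division):
    # Instead of my_assert = (assert_warns if ... else assert_no_warnings)
    # followed by my_assert(*tmp, ...), capture the extra argument up front
    # so both branches share one call signature.
    if zero_division == "warn":
        return partial(assert_warns, UserWarning)
    return assert_no_warnings
```

With `partial`, the warning class no longer has to travel through the test body as a stray `*tmp` argument, which is the readability concern raised above.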
simplified in the new version with two separated tests
fbeta = my_assert(*tmp, y_true, y_pred, beta=beta,
                  average=None, zero_division=zero_division)

zero_division = float(zero_division == 1)
This is obfuscated. I'd rather a clear, separate test checking the behaviour of zero_division, than a tiny, unexplicit piece in a larger test.
simplified in the new version with two separated tests
- better docstrings
- more explicit use of zero_division value
<https://visualstudio.microsoft.com/de/downloads/>`_.
<https://visualstudio.microsoft.com/downloads/>`_.

.. warning::
You've done something strange in trying to merge in changes from master. Please try to merge in the latest master again
merged master into my branch again. Now only the 3 files appear
# Conflicts:
#	doc/whats_new/v0.22.rst
@jnothman any more comments?
Thanks for the ping.
I don't think we currently test the return value (i.e. zero_division=1) except in the case that all the labels (true and pred) are negative... we don't seem to test zero_division=1 in the zero-sample_weight case either (though it is a pretty weird case).
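The missing cases could be covered by tests along these lines. A hedged sketch with hypothetical test names, assuming a scikit-learn version that already has the `zero_division` parameter (0.22+, i.e. this PR's target):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score


def test_zero_division_one_returns_one():
    # precision is ill-defined: no positive predictions
    assert precision_score([1, 1], [0, 0], zero_division=1) == 1.0
    # recall is ill-defined: no positive labels
    assert recall_score([0, 0], [1, 1], zero_division=1) == 1.0


def test_zero_division_with_zero_sample_weight():
    # with all-zero sample weights every count is zero, so the metric
    # is ill-defined and should return the requested value
    y = [1, 0, 1]
    assert precision_score(y, y, sample_weight=np.zeros(3),
                           zero_division=1) == 1.0
```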
sklearn/metrics/classification.py
Outdated
@@ -2065,7 +2176,8 @@ def log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None,
    y_true : array-like or label indicator matrix
        Ground truth (correct) labels for n_samples samples.

    y_pred : array-like of float, shape = (n_samples, n_classes) or (n_samples,)
    y_pred : array-like of float, shape = (n_samples, n_classes) or
        (n_samples,)
Please leave this as it was. Going over the line length is the best we can do really to render correctly in pydoc and Sphinx
@@ -1875,7 +2030,7 @@ def test_hinge_loss_multiclass_with_missing_labels():
    np.clip(dummy_losses, 0, None, out=dummy_losses)
    dummy_hinge_loss = np.mean(dummy_losses)
    assert (hinge_loss(y_true, pred_decision, labels=labels) ==
            dummy_hinge_loss)
            dummy_hinge_loss)
Please do not change unrelated things. It makes your contribution harder to review and may introduce merge conflicts to other pull requests.
if I don't change this I get a flake8 warning:
sklearn/metrics/tests/test_classification.py:1988:18: E127 continuation line over-indented for visual indent
Yes, I know that this is bad PEP8... we've considered black, but not clearly decided in its favour
                    weights="linear"), 0.9412, decimal=4)
assert_almost_equal(cohen_kappa_score(y1, y2,
                    weights="quadratic"), 0.9541, decimal=4)
assert_almost_equal(
Please do not change unrelated things. It makes your contribution harder to review and may introduce merge conflicts to other pull requests.
sorry for that. These formatting things are so annoying; have you considered black? It's really handy
assert_almost_equal(fbeta, 0)


def test_precision_recall_f1_no_labels_average_none():
@pytest.mark.parametrize('zero_division', [0, 1])
I don't think this is an exemplary use-case for parametrize given that you then need to handle the warnings case separately!
Given your previous comment ("This is obfuscated. I'd rather a clear, separate test checking the behaviour of zero_division, than a tiny, unexplicit piece in a larger test."), I decided to separate this into two tests. I think it is a lot more readable. Otherwise there are ifs, or the use of functools.partial. I can go back to the previous version, but honestly, if we want readability I think this is better (maybe with better names)
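The two-test split being discussed could look roughly like this. Test names and data are hypothetical; `y_pred` contains no positive predictions, so precision is ill-defined (0/0) while recall is well-defined (0/2):

```python
import pytest
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_recall_fscore_support

Y_TRUE, Y_PRED = [1, 1, 0], [0, 0, 0]


@pytest.mark.parametrize("zero_division", [0, 1])
def test_prf_zero_division_value(zero_division):
    # with an explicit value, no warning is raised and precision takes it
    p, r, f, _ = precision_recall_fscore_support(
        Y_TRUE, Y_PRED, average="binary", zero_division=zero_division)
    assert p == zero_division
    assert r == 0.0


def test_prf_zero_division_warn():
    # the default emits UndefinedMetricWarning and acts like 0
    with pytest.warns(UndefinedMetricWarning):
        p, _, _, _ = precision_recall_fscore_support(
            Y_TRUE, Y_PRED, average="binary", zero_division="warn")
    assert p == 0.0
```

Each test is self-explanatory on its own, which is the readability the review asks for; the cost is the duplicated setup that the parametrize-plus-branching version avoided.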
assert_array_almost_equal(fbeta, [0, 0, 0], 2)


def test_prf_warnings():
@pytest.mark.parametrize('zero_division', ["warn"])
not sure how this helps
Ahh yes, the pytest.warns context manager might help with making the no-warning case fit a similar structure to the yes-warning case, using pytest.warns(None). But okay.
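A sketch of that unification idea. Note that `pytest.warns(None)` was later deprecated (it errors in pytest 7+), so this hypothetical test picks the context manager explicitly instead:

```python
import pytest
from contextlib import nullcontext
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_score


@pytest.mark.parametrize("zero_division, expected",
                         [("warn", 0.0), (0, 0.0), (1, 1.0)])
def test_precision_zero_division(zero_division, expected):
    # one structure for both the warning and the no-warning cases:
    # y_pred has no positives, so precision is ill-defined (0 / 0)
    ctx = (pytest.warns(UndefinedMetricWarning) if zero_division == "warn"
           else nullcontext())
    with ctx:
        score = precision_score([1, 1], [0, 0], zero_division=zero_division)
    assert score == expected
```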
updated the table in the description, it was wrong :( prec will be ZD if all predictions are negative
- added tests for YTN or YPN to check prec/rec with zero_division value
- cleaner tests
Added zero_division to a test where prec and rec both have their peculiar cases. I don't understand the last comment, do you mean passing the …?
The last comment was about a special case in the code where sample_weight=zeros
Thank you. This is looking good!
Let's see what others think about this, including the parameter name which I think is still up for debate.
Thanks!
Would on_zero_division be a better name?
IMHO, it's as readable as … All the rest of the changes you proposed have been applied
When one sets … the logic is "If there is zero division, then do something (warn, or set 0 or 1)". My concern is that just "zero_division" does not capture the "if…" part of the statement. Maybe …
I prefer on_zero_division to if_zero_division...
For me just … Here: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html the …
I suppose I can reiterate that I'm fine with zero_division
can we have a 4th opinion on this or just take a decision on this?
I am also fine with …
Thank you @marctorrellas !
See issue #14876
What does this implement/fix? Explain your changes.
zero_division parameter for precision, recall, and friends
Any other comments?
3 possible values: "warn" (the default), 0, and 1.
Just to clarify:
- prec will be ZD if all predictions are negative
- rec will be ZD if all labels are negative
- f will be ZD if everything is negative
Note that if ZD = "warn" this means 0 + warning
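The table above can be illustrated concretely. A sketch assuming a scikit-learn release that includes this PR (0.22+):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# prec is ZD when all predictions are negative (0 / 0)
y_true, y_pred = [1, 0, 1], [0, 0, 0]
precision_score(y_true, y_pred, zero_division=1)   # -> 1.0
recall_score(y_true, y_pred, zero_division=1)      # -> 0.0 (well-defined: 0/2)

# rec is ZD when all labels are negative
y_true, y_pred = [0, 0, 0], [1, 0, 1]
recall_score(y_true, y_pred, zero_division=1)      # -> 1.0

# f is ZD only when everything is negative (both prec and rec ill-defined)
y_true, y_pred = [0, 0, 0], [0, 0, 0]
f1_score(y_true, y_pred, zero_division=1)          # -> 1.0

# zero_division="warn" acts like 0 but also raises UndefinedMetricWarning
```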