
Bug Related to Calculation of Binary Metrics #349

Merged: 32 commits merged into pytorch:master on Dec 13, 2018

Conversation

@anmolsjoshi (Contributor) commented Nov 29, 2018

Fixes #348

Description:
The bug in binary Precision/Recall maps the binary case into 2 classes and then averages the metric over both. This is an incorrect way of calculating binary precision and recall; the problem should be treated as a single positive class only.

I have included the following in the code:

  • Created _check_shape to process and check the shapes of y and y_pred
  • Created _check_type to determine the type of problem - binary or multiclass - based on y and y_pred; it also raises an error if the problem type changes during training. The type is decided on the first update and then checked on each subsequent update (a rough sketch of this logic is shown after this list).
  • Calculates binary precision using a threshold function, torch.round by default
  • Includes a check that the output is binary, e.g. torch.equal(y, y ** 2)
  • Only uses torch.round as the default if the problem is binary
  • Appropriate checks for threshold_function
  • Added better tests - improved binary tests, incorrect threshold function, incorrect y, changing type between updates
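
For illustration, a minimal sketch of the type check and default thresholding described above (the exact conditions and names here are assumptions, not necessarily the PR's code):

import torch

def _check_type(y, y_pred):
    # Treat the problem as multiclass when y_pred carries an extra class
    # dimension, and as binary when y contains only 0s and 1s.
    if y.ndimension() + 1 == y_pred.ndimension():
        return "multiclass"
    elif y.ndimension() == y_pred.ndimension() and torch.equal(y, y ** 2):
        return "binary"
    raise ValueError("y must be binary (0/1) or y_pred must have a class dimension")

# Default binarization of probabilities in the binary case
y = torch.tensor([1.0, 0.0, 1.0, 1.0])
y_pred = torch.tensor([0.8, 0.3, 0.4, 0.9])
if _check_type(y, y_pred) == "binary":
    y_pred = torch.round(y_pred)  # threshold_function defaults to torch.round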

Check list:

  • New tests are added (if a new feature is modified)
  • New doc strings: text and/or example code are in RST format
  • Documentation is updated (if required)

…alculation type remains the same during training, calculate binary precision using a threshold function vs categorical.
@anmolsjoshi (Contributor, Author):

@vfdev-5 please see this PR. If we stick with this method, it'll be easier to incorporate multilabel as well.

@vfdev-5 (Collaborator) commented Nov 29, 2018

@anmolsjoshi thanks for the PR!
Refactoring the huge update into _check_shape and _check_type definitely makes sense. I have the impression that _check_shape and _check_type are the same for Precision and Recall? Maybe even Accuracy could benefit from one of them?

@anmolsjoshi (Contributor, Author) commented Nov 29, 2018

@vfdev-5 no problem!

Refactoring the huge update into _check_shape and _check_type definitely makes sense. I have the impression that _check_shape and _check_type are the same for Precision and Recall?

You are correct, they are the same for Precision and Recall. I took a cue from how sklearn calculates precision and recall.

Maybe even Accuracy could benefit from one of them?

Definitely! I can work on that later tonight.

@vfdev-5 (Collaborator) commented Nov 29, 2018

Could you please post the sklearn reference here?

Maybe we could create a common class for Precision and Recall so we don't have to copy these methods?

@vfdev-5 (Collaborator) commented Nov 29, 2018

Tell me when you are done with the code, and I'll update the tests in the same way as for Accuracy.

@anmolsjoshi (Contributor, Author):

@vfdev-5 not sure what went wrong with the pytorch-nightly-cpu 2.7

@anmolsjoshi (Contributor, Author):

Here is the sklearn reference

@vfdev-5 (Collaborator) commented Nov 29, 2018

@anmolsjoshi it seems like the pytorch nightly build is broken for 2.7

@anmolsjoshi (Contributor, Author) commented Nov 29, 2018

Maybe we could create a common class for Precision and Recall so we don't have to copy these methods?

That's an interesting idea. So a base class with common _check_type and _check_shape? It looks like we can make compute the same too - change actual and all_positives to denominator.
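
For illustration, a rough sketch of such a shared base class with a common compute, assuming hypothetical _true_positives and _denominator accumulators (all names here are illustrative, not the PR's code):

import torch

class _BasePrecisionRecall:
    # Hypothetical shared base: subclasses accumulate self._true_positives and
    # self._denominator (predicted positives for Precision, actual positives
    # for Recall) in their update() methods.

    def __init__(self, average=False, eps=1e-15):
        self._average = average
        self._eps = eps  # guards against division by zero in this sketch
        self._true_positives = torch.zeros(1)
        self._denominator = torch.zeros(1)

    def compute(self):
        result = self._true_positives / (self._denominator + self._eps)
        return result.mean().item() if self._average else result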

@anmolsjoshi (Contributor, Author):

@vfdev-5 I believe that I'm done with all the tests and code. Feel free to incorporate this into Accuracy.

@anmolsjoshi it seems like the pytorch nightly build is broken for 2.7

What are the next steps?

@vfdev-5 (Collaborator) commented Nov 29, 2018

Let's ignore this failing test for the moment; if this is a pytorch problem, it will be solved soon. We can ask to be sure.

Next step is to refactor classes to avoid code copying.

I'll improve tests.

@anmolsjoshi (Contributor, Author) commented Nov 29, 2018

@vfdev-5 OK, I'll start refactoring the classes. Did you want me to work on Accuracy as well? That's entirely fine, I just don't want us both duplicating effort.

@vfdev-5 (Collaborator) commented Nov 29, 2018

Did you want me to work on Accuracy as well?

@anmolsjoshi yes please :)

…port, precision/recall calculation into PrecisionRecallSupport.

precision.py and recall.py - use PrecisionRecallSupport
@anmolsjoshi (Contributor, Author) commented Nov 30, 2018

@vfdev-5 I refactored the code, let me know if this works for you!

Still working on Accuracy.

Do I need to write tests for the refactored code in _classification_support.py? All of its lines are exercised by test_precision and test_recall. Never mind - I just checked the code coverage of _classification_support.py in one of the passing tests.

_classification_support.py

  • ClassificationSupport - this will be used in Accuracy, Precision, Recall - contains type check and shape check
  • PrecisionRecallSupport - this will be used to calculate Precision/Recall - contains common update, reset, compute

precision.py

  • Precision - uses PrecisionRecallSupport as a base class with self._precision_vs_recall set to True

recall.py

  • Recall - uses PrecisionRecallSupport as a base class with self._precision_vs_recall set to False.

@vfdev-5 (Collaborator) commented Nov 30, 2018

@anmolsjoshi I think we can create these classes directly in the accuracy.py and precision.py files. No need for _classification_support.py.
ClassificationSupport -> ~ _BaseClassification, leave it abstract (do not define update, we should not instantiate this class).
PrecisionRecallSupport -> ~ _(Base)PrecisionRecall(Support)
I'm not a fan of _precision_vs_recall as true/false. Maybe a function to override would be better?
Initialization may be specific to Accuracy, Precision, Recall; IMO it should not be done in ClassificationSupport?

@anmolsjoshi (Contributor, Author) commented Nov 30, 2018

@vfdev-5

ClassificationSupport -> ~ _BaseClassification, leave it abstract (do not define update, we should not instantiate this class).

Agreed, I have updated it. I also kept _BasePrecisionRecallSupport abstract by leaving the update function undefined.

I'm not a fan of _precision_vs_recall as true/false. Maybe a function to override would be better?

Could you explain that further?

The calculation is entirely the same; the only difference is which positives to count, i.e. the positives of y_pred (for precision) or of y (for recall). precision_vs_recall might not be the cleanest way of doing this.

For now, I created two functions in _BasePrecisionRecallSupport called _calculate_correct and _sum_positives. Precision and Recall use both in the update function as shown below:

class Precision(_BasePrecisionRecallSupport):

    ...

    def update(self, output):
        # correct = element-wise matches between thresholded y_pred and y
        correct, y_pred, y = self._calculate_correct(output)
        # precision denominator: number of predicted positives per class
        all_positives = y_pred.sum(dim=0)
        self._sum_positives(correct, all_positives)

This way there is no clunky if else statement like before.
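
For comparison, the Recall counterpart would differ only in the denominator, counting actual positives from y instead of predicted positives from y_pred (a sketch, not necessarily the exact PR code):

class Recall(_BasePrecisionRecallSupport):

    ...

    def update(self, output):
        correct, y_pred, y = self._calculate_correct(output)
        # recall denominator: number of actual positives per class
        actual_positives = y.sum(dim=0)
        self._sum_positives(correct, actual_positives)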

Initialization may be specific to Accuracy, Precision, Recall; IMO it should not be done in ClassificationSupport

Referring to this comment, it might be best to stick with threshold_function for the binary calculation. I think the initialization of _BaseClassification should stay the same; the only thing added for Precision and Recall is the average argument, which is introduced in _BasePrecisionRecallSupport (a child of _BaseClassification).

I think we can create these classes directly in the accuracy.py and precision.py files. No need for _classification_support.py.

I'll move _BaseClassification to accuracy.py, and _BasePrecisionRecallSupport to precision.py, and will call _BasePrecisionRecallSupport in recall.py.

Suggestion: let's create one file called precision_recall.py which includes _BasePrecisionRecallSupport, Precision and Recall, and change __init__.py appropriately. My latest commit reflects this; we can obviously revert to the previous setup.

Thoughts?

@anmolsjoshi (Contributor, Author) commented Nov 30, 2018

@vfdev-5 Here is a summary of the most up to date commit:

  • accuracy.py - contains _BaseClassification (still abstract) with improved binary shape check (last few commits had incorrect logic)

  • recall.py - deleted

  • precision.py - renamed to precision_recall.py

  • precision_recall.py - contains _BasePrecisionRecallSupport (child of _BaseClassification and still abstract), Precision and Recall. Precision and Recall have distinct update functions.

  • __init__.py - changed for the updated Precision and Recall calls

  • tests_accuracy.py - contains your modified tests from the Multilabel PR and includes tests similar to test_precision/test_recall, covering threshold_function and _check_type.

Please note that binary accuracy is now calculated using torch.round; I think it is the preferred method due to inconsistent results between PyTorch versions. Discussed in this comment.

anmolsjoshi changed the title from Precisionbug to Bug Related to Calculation of Binary Metrics on Nov 30, 2018
@anmolsjoshi (Contributor, Author):

@vfdev-5 all tests are now passing. I updated precision and recall so that they have their own unique update functions. Let me know if you have any other issues!

@vfdev-5 (Collaborator) commented Dec 9, 2018

What are your thoughts on adding thresholded_output_transform to ignite._utils.py, so that users can access it?

For the moment, I'm not sure that this is necessary... Let's see.

@anmolsjoshi (Contributor, Author):

@vfdev-5 the original binary_accuracy had torch.round implemented within update, see here.

Is there a chance we might inconvenience people by having them use output_transform? I guess if the plan is to cut a release after this is merged, we can point this out.
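
For context, asking users to binarize inside output_transform would look roughly like this (a sketch; thresholded_output_transform is not an existing helper at this point, and the metric is assumed to accept an output_transform argument):

import torch
from ignite.metrics import Precision

def thresholded_output_transform(output):
    # binarize probabilities at 0.5 before the metric sees them
    y_pred, y = output
    return torch.round(y_pred), y

precision = Precision(output_transform=thresholded_output_transform)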

@vfdev-5 (Collaborator) commented Dec 10, 2018

@anmolsjoshi this is a good point! Actually, I just noticed that the docs directly stated that y_pred should be between 0 and 1. I think we lost this when we merged the binary and categorical accuracies.

@anmolsjoshi (Contributor, Author):

Yes, that's probably my bad from when I sent in PR #275; I missed including that docstring.

How do you think we should proceed? Still include binary to categorical mapping? Or introduce this change in the release notes?

@vfdev-5 (Collaborator) commented Dec 10, 2018

@anmolsjoshi no problem with that. We'll take care of it this time :) I'm readjusting the tests now.

How do you think we should proceed? Still include binary to categorical mapping? Or introduce this change in the release notes?

Thanks for asking! So, as we discussed in #348, we need to fix the bug and cut the 0.1.2 release, which should be compatible with 0.1.1. Thus, we should include torch.round in the binary case and update the docs to say that y_pred is between 0 and 1 (in the binary case).

In another PR we can remove torch.round and propose that users apply binarization inside output_transform.

@vfdev-5 (Collaborator) commented Dec 10, 2018

I'll commit some modifications to Accuracy and let you review them.

y_pred, y = output

if y.ndimension() + 1 == y_pred.ndimension():
    if y_pred.shape[1] == 2:
@vfdev-5 (Collaborator) commented on this code:

@anmolsjoshi could you please remind me why you separate this case and raise a warning?

@anmolsjoshi (Contributor, Author):

@vfdev-5 if num_classes=2, it is a binary case that is being fed as a categorical case. I think it'll be helpful to warn the user that only the precision of the positive class is calculated in this case, because for binary problems we shouldn't average the precision of classes 0 and 1.

@vfdev-5 (Collaborator):

But we can also have a multiclass case with 2 classes, which should be computed as an N-class case too.

@anmolsjoshi (Contributor, Author):

True, I'll remove the binary_multiclass entirely and treat it as multiclass.

@vfdev-5 (Collaborator):

I'll take care of it in the new modifications.
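
To make the ambiguity discussed in this thread concrete, here is a hypothetical example of the two input conventions (shapes only, not code from the PR):

import torch

# Binary convention: one probability per sample, y_pred has shape (N,)
y_binary = torch.tensor([1, 0, 1, 1])
y_pred_binary = torch.tensor([0.9, 0.2, 0.4, 0.7])

# Categorical convention with two classes: y_pred has shape (N, 2)
y_categorical = torch.tensor([1, 0, 1, 1])
y_pred_categorical = torch.tensor([[0.1, 0.9],
                                   [0.8, 0.2],
                                   [0.6, 0.4],
                                   [0.3, 0.7]])

# After this change, the (N, 2) form is treated as a regular 2-class multiclass
# problem rather than being special-cased as binary.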

@vfdev-5 (Collaborator) commented Dec 10, 2018

@anmolsjoshi I have not yet finished the code and tests for precision. It seems the binary case with shape (N, L) is not handled well. If you would like to continue, feel free (I won't touch it for at least 7-8 hours :)

@anmolsjoshi (Contributor, Author):

@vfdev-5 thanks, this looks fantastic! I'll work on it

@anmolsjoshi (Contributor, Author) commented Dec 11, 2018

@vfdev-5 all tests are passing, let me know what you think!

Currently, sklearn's precision and recall emit warnings in the tests for cases where the number of predicted or actual positives is 0 for a specific class. If we want to test average, it might be best to ignore these warnings. I added some code to catch them.
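
A minimal sketch of catching those sklearn warnings in a test, assuming the tests compare against sklearn.metrics.precision_score (the actual test code may differ):

import warnings
from sklearn.metrics import precision_score

def reference_precision(y_true, y_pred):
    # suppress sklearn's UndefinedMetricWarning when a class has zero
    # predicted (or actual) positives in the batch
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return precision_score(y_true, y_pred, average="macro")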

@vfdev-5 (Collaborator) commented Dec 11, 2018

@anmolsjoshi thanks a lot! This is starting to look much better than the previous version :)
I'll check out the new version and run some local tests (I had a doubt about some combinations).

@vfdev-5 (Collaborator) commented Dec 11, 2018

@anmolsjoshi I updated the doc.

@alykhantejani @jasonkriss any comments or a review, please?

@jasonkriss (Contributor) left a comment:

LGTM! My only minor nit is that maybe we shouldn't have the _ for the _Base* classes as we are importing them from other files.

@anmolsjoshi (Contributor, Author) commented Dec 13, 2018

@jasonkriss thanks for the review! My thinking behind the _ prefix for the _Base* classes was that we don't want users using these base classes directly; they should only access the metrics.

This is done over here in ignite as well.

@vfdev-5 thoughts?

@vfdev-5 (Collaborator) commented Dec 13, 2018

@jasonkriss as @anmolsjoshi says, the idea is to explicitly indicate that these classes are private (and abstract) and shouldn't be used by users. Another solution could be to put them into a _basesomething.py file and import them from there as BaseSomething, but IMO having a private file in metrics looks strange...

I propose keeping it as it is.

@jasonkriss (Contributor):

@vfdev-5 @anmolsjoshi 👍 I'm fine keeping it as is.

vfdev-5 merged commit 8558a8f into pytorch:master on Dec 13, 2018
@vfdev-5 (Collaborator) commented Dec 13, 2018

@anmolsjoshi thank you for the PR

@anmolsjoshi (Contributor, Author):

@vfdev-5 thanks for all your help!

Is it ok to send in a PR for the thresholded_output_transform as discussed in this PR?

Once that is merged, we can incorporate all of this into the Multilabel metrics PR #333.

Thoughts?

@vfdev-5 (Collaborator) commented Dec 13, 2018

@anmolsjoshi let's work directly in #333

@anmolsjoshi (Contributor, Author):

@vfdev-5 sounds good

vfdev-5 added the 0.1.2 label on Dec 14, 2018
anmolsjoshi deleted the precisionbug branch on January 8, 2019