
Bug Related to Calculation of Binary Metrics #349

Merged: 32 commits merged into pytorch:master on Dec 13, 2018

Conversation

@anmolsjoshi (Contributor) commented Nov 29, 2018

Fixes #348

Description:
The bug in binary Precision/Recall maps the binary case into 2 classes and then averages the metric over both. This is an incorrect way of calculating binary precision and recall; the problem should be treated as a single positive class only.

I have included the following in the code:

  • Created _check_shape to process and check the shapes of y and y_pred
  • Created _check_type to determine the type of problem - binary or multiclass - based on y and y_pred; it also raises an error if the problem type changes during training. The type is decided on the first update and then checked on each subsequent update (a rough sketch of this logic is shown after this list).
  • Calculates binary precision using a threshold function, torch.round by default
  • Includes a check that the output is binary, e.g. torch.equal(y, y ** 2)
  • Only uses torch.round as the default if the problem is binary
  • Appropriate checks for threshold_function
  • Added better tests - improved binary tests, incorrect threshold function, incorrect y, changing type between updates
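
For illustration, a minimal sketch of the type check and default thresholding described above (the exact conditions and names here are assumptions, not necessarily the PR's code):

import torch

def _check_type(y, y_pred):
    # Treat the problem as multiclass when y_pred carries an extra class
    # dimension, and as binary when y contains only 0s and 1s.
    if y.ndimension() + 1 == y_pred.ndimension():
        return "multiclass"
    elif y.ndimension() == y_pred.ndimension() and torch.equal(y, y ** 2):
        return "binary"
    raise ValueError("y must be binary (0/1) or y_pred must have a class dimension")

# Default binarization of probabilities in the binary case
y = torch.tensor([1.0, 0.0, 1.0, 1.0])
y_pred = torch.tensor([0.8, 0.3, 0.4, 0.9])
if _check_type(y, y_pred) == "binary":
    y_pred = torch.round(y_pred)  # threshold_function defaults to torch.round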

Check list:

  • New tests are added (if a new feature is modified)
  • New doc strings: text and/or example code are in RST format
  • Documentation is updated (if required)

…alculation type remains the same during training, calculate binary precision using a threshold function vs categorical.
@anmolsjoshi (Contributor, Author):

@vfdev-5 please see this PR. If we stick with this method, it'll be easier to incorporate multilabel as well.

@vfdev-5 (Collaborator) commented Nov 29, 2018

@anmolsjoshi thanks for the PR!
Refactoring the huge update into _check_shape and _check_type definitely makes sense. I have the impression that _check_shape and _check_type are the same for Precision and Recall? Maybe even Accuracy could benefit from one of them?

@anmolsjoshi (Contributor, Author) commented Nov 29, 2018

@vfdev-5 no problem!

Refactoring the huge update into _check_shape and _check_type definitely makes sense. I have the impression that _check_shape and _check_type are the same for Precision and Recall?

You are correct, they are the same for Precision and Recall. I took a cue from how sklearn calculates precision and recall.

Maybe even Accuracy could benefit from one of them?

Definitely! I can work on that later tonight.

@vfdev-5 (Collaborator) commented Nov 29, 2018

Could you please post the sklearn reference here?

Maybe we could create a common class for Precision and Recall so we don't have to copy these methods?

@vfdev-5 (Collaborator) commented Nov 29, 2018

Tell me when you are done with the code, and I'll update the tests in the same way as for Accuracy.

@anmolsjoshi (Contributor, Author):

@vfdev-5 not sure what went wrong with the pytorch-nightly-cpu 2.7

@anmolsjoshi (Contributor, Author):

Here is the sklearn reference

@vfdev-5 (Collaborator) commented Nov 29, 2018

@anmolsjoshi it seems like the pytorch nightly build is broken for 2.7

@anmolsjoshi (Contributor, Author) commented Nov 29, 2018

Maybe we could create a common class for Precision and Recall so we don't have to copy these methods?

That's an interesting idea. So a base class with common _check_type and _check_shape? It looks like we can make compute the same too - change actual and all_positives to denominator.
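
For illustration, a rough sketch of such a shared base class with a common compute, assuming hypothetical _true_positives and _denominator accumulators (all names here are illustrative, not the PR's code):

import torch

class _BasePrecisionRecall:
    # Hypothetical shared base: subclasses accumulate self._true_positives and
    # self._denominator (predicted positives for Precision, actual positives
    # for Recall) in their update() methods.

    def __init__(self, average=False, eps=1e-15):
        self._average = average
        self._eps = eps  # guards against division by zero in this sketch
        self._true_positives = torch.zeros(1)
        self._denominator = torch.zeros(1)

    def compute(self):
        result = self._true_positives / (self._denominator + self._eps)
        return result.mean().item() if self._average else result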

@anmolsjoshi (Contributor, Author):

@vfdev-5 I believe that I'm done with all the tests and code. Feel free to incorporate this into Accuracy.

@anmolsjoshi it seems like the pytorch nightly build is broken for 2.7

What are the next steps?

@vfdev-5 (Collaborator) commented Nov 29, 2018

Let's ignore this failing test for the moment; if this is a pytorch problem, it will be solved soon. We can ask to be sure.

Next step is to refactor classes to avoid code copying.

I'll improve tests.

@anmolsjoshi (Contributor, Author) commented Nov 29, 2018

@vfdev-5 OK, I'll start refactoring the classes. Did you want me to work on Accuracy as well? That's entirely fine, I just don't want us both duplicating effort.

@vfdev-5 (Collaborator) commented Nov 29, 2018

Did you want me to work on Accuracy as well?

@anmolsjoshi yes please :)

…port, precision/recall calculation into PrecisionRecallSupport.

precision.py and recall.py - use PrecisionRecallSupport
@anmolsjoshi (Contributor, Author) commented Nov 30, 2018

@vfdev-5 I refactored the code, let me know if this works for you!

Still working on Accuracy.

Do I need to write tests for the refactored code in _classification_support.py? All of its lines are exercised by test_precision and test_recall. Never mind - I just checked the code coverage of _classification_support.py in one of the passing tests.

_classification_support.py

  • ClassificationSupport - this will be used in Accuracy, Precision, Recall - contains type check and shape check
  • PrecisionRecallSupport - this will be used to calculate Precision/Recall - contains common update, reset, compute

precision.py

  • Precision - uses PrecisionRecallSupport as a base class with self._precision_vs_recall set to True

recall.py

  • Recall - uses PrecisionRecallSupport as a base class with self._precision_vs_recall set to False.

@vfdev-5 (Collaborator) commented Nov 30, 2018

@anmolsjoshi I think we can create these classes directly in the accuracy.py and precision.py files. No need for _classification_support.py.
ClassificationSupport -> ~ _BaseClassification, leave it abstract (do not define update, we should not instantiate this class).
PrecisionRecallSupport -> ~ _(Base)PrecisionRecall(Support)
I'm not a fan of _precision_vs_recall as true/false. Maybe a function to override would be better?
Initialization may be specific to Accuracy, Precision, Recall; IMO it should not be done in ClassificationSupport?

@anmolsjoshi (Contributor, Author) commented Nov 30, 2018

@vfdev-5

ClassificationSupport -> ~ _BaseClassification, leave it abstract (do not define update, we should not instantiate this class).

Agreed, I have updated it. I also kept _BasePrecisionRecallSupport abstract by leaving the update function undefined.

I'm not a fan of _precision_vs_recall as true/false. Maybe a function to override would be better?

Could you explain that further?

The calculation is entirely the same; the only difference is which positives to count, i.e. the positives of y_pred (for precision) or of y (for recall). precision_vs_recall might not be the cleanest way of doing this.

For now, I created two functions in _BasePrecisionRecallSupport called _calculate_correct and _sum_positives. Precision and Recall use both in the update function as shown below:

class Precision(_BasePrecisionRecallSupport):

    ...

    def update(self, output):
        # correct = element-wise matches between thresholded y_pred and y
        correct, y_pred, y = self._calculate_correct(output)
        # precision denominator: number of predicted positives per class
        all_positives = y_pred.sum(dim=0)
        self._sum_positives(correct, all_positives)

This way there is no clunky if else statement like before.
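
For comparison, the Recall counterpart would differ only in the denominator, counting actual positives from y instead of predicted positives from y_pred (a sketch, not necessarily the exact PR code):

class Recall(_BasePrecisionRecallSupport):

    ...

    def update(self, output):
        correct, y_pred, y = self._calculate_correct(output)
        # recall denominator: number of actual positives per class
        actual_positives = y.sum(dim=0)
        self._sum_positives(correct, actual_positives)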

Initialization may be specific to Accuracy, Precision, Recall; IMO it should not be done in ClassificationSupport

Referring to this comment, it might be best to stick with threshold_function for the binary calculation. I think the initialization of _BaseClassification should stay the same; the only thing added for Precision and Recall is the average argument, which is introduced in _BasePrecisionRecallSupport (a child of _BaseClassification).

I think we can create these classes directly in the accuracy.py and precision.py files. No need for _classification_support.py.

I'll move _BaseClassification to accuracy.py, and _BasePrecisionRecallSupport to precision.py, and will call _BasePrecisionRecallSupport in recall.py.

Suggestion: let's create one file called precision_recall.py which includes _BasePrecisionRecallSupport, Precision and Recall, and change __init__.py appropriately. My latest commit reflects this; we can obviously revert to the previous setup.

Thoughts?

@anmolsjoshi (Contributor, Author) commented Nov 30, 2018

@vfdev-5 Here is a summary of the most up to date commit:

  • accuracy.py - contains _BaseClassification (still abstract) with improved binary shape check (last few commits had incorrect logic)

  • recall.py - deleted

  • precision.py - renamed to precision_recall.py

  • precision_recall.py - contains _BasePrecisionRecallSupport (child of _BaseClassification and still abstract), Precision and Recall. Precision and Recall have distinct update functions.

  • __init__.py - changed for the updated Precision and Recall calls

  • tests_accuracy.py - contains your modified tests from the Multilabel PR and includes tests similar to test_precision/test_recall, covering threshold_function and _check_type.

Please note that binary accuracy is now calculated using torch.round; I think it is the preferred method due to inconsistent results between PyTorch versions. Discussed in this comment.

anmolsjoshi changed the title from Precisionbug to Bug Related to Calculation of Binary Metrics on Nov 30, 2018
@anmolsjoshi (Contributor, Author):

@vfdev-5 all tests are now passing. I updated precision and recall so that they have their own unique update functions. Let me know if you have any other issues!

@vfdev-5 (Collaborator) commented Dec 9, 2018

What are your thoughts on adding thresholded_output_transform to ignite._utils.py, so that users can access it?

For the moment, I'm not sure that this is necessary... Let's see.

@anmolsjoshi (Contributor, Author):

@vfdev-5 the original binary_accuracy had torch.round implemented within update, see here.

Is there a chance we might inconvenience people by having them use output_transform? I guess if the plan is to cut a release after this is merged, we can point this out.
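
For context, asking users to binarize inside output_transform would look roughly like this (a sketch; thresholded_output_transform is not an existing helper at this point, and the metric is assumed to accept an output_transform argument):

import torch
from ignite.metrics import Precision

def thresholded_output_transform(output):
    # binarize probabilities at 0.5 before the metric sees them
    y_pred, y = output
    return torch.round(y_pred), y

precision = Precision(output_transform=thresholded_output_transform)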

@vfdev-5 (Collaborator) commented Dec 10, 2018

@anmolsjoshi this is a good point! Actually, I just noticed that the docs directly stated that y_pred should be between 0 and 1. I think we lost this when we merged the binary and categorical accuracies.

@anmolsjoshi (Contributor, Author):

Yes, that's probably my bad from when I sent in PR #275; I missed including that docstring.

How do you think we should proceed? Still include binary to categorical mapping? Or introduce this change in the release notes?

@vfdev-5 (Collaborator) commented Dec 10, 2018

@anmolsjoshi no problem with that. We'll take care of it this time :) I'm readjusting the tests now.

How do you think we should proceed? Still include binary to categorical mapping? Or introduce this change in the release notes?

Thanks for asking! So, as we discussed in #348, we need to fix the bug and cut the 0.1.2 release, which should be compatible with 0.1.1. Thus, we should include torch.round in the binary case and update the docs to say that y_pred is between 0 and 1 (in the binary case).

In another PR we can remove torch.round and propose that users apply binarization inside output_transform.

@vfdev-5 (Collaborator) commented Dec 10, 2018

I'll commit some modifications to Accuracy and let you review them.

y_pred, y = output

if y.ndimension() + 1 == y_pred.ndimension():
    if y_pred.shape[1] == 2:
@vfdev-5 (Collaborator) commented on this code:

@anmolsjoshi could you please remind me why you separate this case and raise a warning?

@anmolsjoshi (Contributor, Author):

@vfdev-5 if num_classes=2, it is a binary case that is being fed as a categorical case. I think it'll be helpful to warn the user that only the precision of the positive class is calculated in this case, because for binary problems we shouldn't average the precision of classes 0 and 1.

@vfdev-5 (Collaborator):

But we can also have a multiclass case with 2 classes, which should be computed as an N-class case too.

@anmolsjoshi (Contributor, Author):

True, I'll remove the binary_multiclass entirely and treat it as multiclass.

@vfdev-5 (Collaborator):

I'll take care of it in the new modifications.
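
To make the ambiguity discussed in this thread concrete, here is a hypothetical example of the two input conventions (shapes only, not code from the PR):

import torch

# Binary convention: one probability per sample, y_pred has shape (N,)
y_binary = torch.tensor([1, 0, 1, 1])
y_pred_binary = torch.tensor([0.9, 0.2, 0.4, 0.7])

# Categorical convention with two classes: y_pred has shape (N, 2)
y_categorical = torch.tensor([1, 0, 1, 1])
y_pred_categorical = torch.tensor([[0.1, 0.9],
                                   [0.8, 0.2],
                                   [0.6, 0.4],
                                   [0.3, 0.7]])

# After this change, the (N, 2) form is treated as a regular 2-class multiclass
# problem rather than being special-cased as binary.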

@vfdev-5 (Collaborator) commented Dec 10, 2018

@anmolsjoshi I have not yet finished the code and tests for precision. It seems the binary case with shape (N, L) is not handled well. If you would like to continue, feel free (I won't touch it for at least 7-8 hours :)

@anmolsjoshi (Contributor, Author):

@vfdev-5 thanks, this looks fantastic! I'll work on it

@anmolsjoshi (Contributor, Author) commented Dec 11, 2018

@vfdev-5 all tests are passing, let me know what you think!

Currently, sklearn's precision and recall emit warnings in the tests for cases where the number of predicted or actual positives is 0 for a specific class. If we want to test average, it might be best to ignore these warnings. I added some code to catch them.
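
A minimal sketch of catching those sklearn warnings in a test, assuming the tests compare against sklearn.metrics.precision_score (the actual test code may differ):

import warnings
from sklearn.metrics import precision_score

def reference_precision(y_true, y_pred):
    # suppress sklearn's UndefinedMetricWarning when a class has zero
    # predicted (or actual) positives in the batch
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return precision_score(y_true, y_pred, average="macro")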

@vfdev-5 (Collaborator) commented Dec 11, 2018

@anmolsjoshi thanks a lot! This is starting to look much better than the previous version :)
I'll check out the new version and run some local tests (I had a doubt about some combinations).

@vfdev-5 (Collaborator) commented Dec 11, 2018

@anmolsjoshi I updated the doc.

@alykhantejani @jasonkriss any comments or a review, please?

@jasonkriss (Contributor) left a comment:

LGTM! My only minor nit is that maybe we shouldn't have the _ for the _Base* classes as we are importing them from other files.

@anmolsjoshi (Contributor, Author) commented Dec 13, 2018

@jasonkriss thanks for the review! My thinking behind the _ prefix for the _Base* classes was that we don't want users using these base classes directly; they should only access the metrics.

This is done over here in ignite as well.

@vfdev-5 thoughts?

@vfdev-5 (Collaborator) commented Dec 13, 2018

@jasonkriss as @anmolsjoshi says, the idea is to explicitly indicate that these classes are private (and abstract) and shouldn't be used by users. Another solution could be to put them into a _basesomething.py file and import them from there as BaseSomething, but IMO having a private file in metrics looks strange...

I propose keeping it as it is.

@jasonkriss (Contributor):

@vfdev-5 @anmolsjoshi 👍 I'm fine keeping it as is.

vfdev-5 merged commit 8558a8f into pytorch:master on Dec 13, 2018
@vfdev-5 (Collaborator) commented Dec 13, 2018

@anmolsjoshi thank you for the PR

@anmolsjoshi (Contributor, Author):

@vfdev-5 thanks for all your help!

Is it ok to send in a PR for the thresholded_output_transform as discussed in this PR?

Once that is merged, we can incorporate all of this into the Multilabel metrics PR #333.

Thoughts?

@vfdev-5 (Collaborator) commented Dec 13, 2018

@anmolsjoshi let's work directly in #333

@anmolsjoshi (Contributor, Author):

@vfdev-5 sounds good

vfdev-5 added the 0.1.2 label on Dec 14, 2018
anmolsjoshi deleted the precisionbug branch on January 8, 2019