Fixes EMNIST split and label issues #2673
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #2673      +/-   ##
==========================================
- Coverage   73.38%   72.40%    -0.98%
==========================================
  Files          99       95        -4
  Lines        8791     8247      -544
  Branches     1389     1310       -79
==========================================
- Hits         6451     5971      -480
+ Misses       1915     1860       -55
+ Partials      425      416        -9
```

Continue to review full report at Codecov.
Thanks for the PR! I have one comment, let me know what you think.
torchvision/datasets/mnist.py (outdated)

```python
if self.target_transform is None and self.split == 'letters':
    self.target_transform = lambda x: x - 1
```
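For reference, a minimal sketch (assuming the raw EMNIST letters labels run 1..26, as in the dataset files) of what this outdated transform does to the targets:

```python
import string

# Raw EMNIST 'letters' labels are 1..26 ('a'..'z'); the outdated fix above
# shifted them down by one so they index a 0-based class list.
target_transform = lambda x: x - 1

raw_label = 1                                               # 'a' in the raw data
print(target_transform(raw_label))                          # 0
print(string.ascii_lowercase[target_transform(raw_label)])  # 'a'
```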
To me it might be preferable to change the letters case of classes_split_dict if it's not compatible with the current annotations. Implicitly changing the labels in the target_transform might be misleading to users, and would make previously trained models no longer valid.
I did think of that initially while fixing this, but the issue is that the class-to-letter mapping is still offset by 1. The only other solution would be to add a dummy letter at position 0 to fix the class offset. For example, label 1 is 'A' while loading the dataset, but classes_split_dict would return 'b'. I added this fix since a transform is a cleaner approach, but if adding a dummy letter to the list is preferred, I'm happy to change it to that too.
"add a dummy letter to position 0"

I think this would be the preferred approach. Maybe add something like __unused__?
Using N/A is fine with me.
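A minimal sketch of the agreed-upon alternative, assuming 'N/A' as the placeholder name (the exact string is whatever the PR settles on):

```python
import string

# Hypothetical sketch: prepend a placeholder class at index 0 so the raw
# EMNIST 'letters' label 1 maps to 'a' without rewriting the targets.
letters_classes = ['N/A'] + list(string.ascii_lowercase)

assert letters_classes[1] == 'a'    # raw label 1 -> 'a'
assert letters_classes[26] == 'z'   # raw label 26 -> 'z'
```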
```python
# Merged Classes assumes Same structure for both uppercase and lowercase version
_merged_classes = set(['C', 'I', 'J', 'K', 'L', 'M', 'O', 'P', 'S', 'U', 'V', 'W', 'X', 'Y', 'Z'])
_all_classes = set(list(string.digits + string.ascii_letters))
_merged_classes = {'c', 'i', 'j', 'k', 'l', 'm', 'o', 'p', 's', 'u', 'v', 'w', 'x', 'y', 'z'}
```
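For context, a rough sketch of how a merged class list could be derived from these two sets; the exact expression used in torchvision's classes_split_dict may differ:

```python
import string

# 62 classes in total: 10 digits plus 52 upper- and lowercase letters.
_all_classes = set(string.digits + string.ascii_letters)
# The 15 lowercase letters the EMNIST paper folds into their uppercase forms.
_merged_classes = {'c', 'i', 'j', 'k', 'l', 'm', 'o', 'p', 's', 'u', 'v', 'w', 'x', 'y', 'z'}

# Removing the merged lowercase letters leaves the 47 classes of the
# 'bymerge' and 'balanced' splits.
merged_split_classes = sorted(_all_classes - _merged_classes)
assert len(merged_split_classes) == 47
```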
One remark concerning cosmetics, not actual functionality: the EMNIST paper describes removing the capital letters. We could actually write the classes in a way that achieves this, but then the labels would look weird, something like A, B, c, D, .... So I think @vballoli's solution is fine.
Yeah, I think this is fine as is, but thanks for pointing it out.
* Add float support to ColorJitter
* Fix byclass EMNIST
* Fix bymerge, balance, letters EMNIST
* Fix whitespace indent
* Revert unrelated file changes
* Revert unrelated file changes
* Removing unnecessary type conversions.
* Removing the transform and adding dummy class instead.

Co-authored-by: Vasileios Vryniotis <vvryniotis@fb.com>
Fixes #2630