Fixes EMNIST split and label issues #2673
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #2673      +/-   ##
==========================================
- Coverage   73.38%   72.40%    -0.98%
==========================================
  Files          99       95        -4
  Lines        8791     8247      -544
  Branches     1389     1310       -79
==========================================
- Hits         6451     5971      -480
+ Misses       1915     1860       -55
+ Partials      425      416        -9
```

Continue to review full report at Codecov.
Thanks for the PR! I have one comment, let me know what you think.
torchvision/datasets/mnist.py (outdated)

```python
if self.target_transform is None and self.split == 'letters':
    self.target_transform = lambda x: x - 1
```
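For reference, a minimal sketch (assuming the raw EMNIST letters labels run 1..26, as in the dataset files) of what this outdated transform does to the targets:

```python
import string

# Raw EMNIST 'letters' labels are 1..26 ('a'..'z'); the outdated fix above
# shifted them down by one so they index a 0-based class list.
target_transform = lambda x: x - 1

raw_label = 1                                               # 'a' in the raw data
print(target_transform(raw_label))                          # 0
print(string.ascii_lowercase[target_transform(raw_label)])  # 'a'
```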
To me it might be preferable to change the letters case of classes_split_dict if it's not compatible with the current annotations. Implicitly changing the labels in the target_transform might be misleading to users, and would make previously trained models no longer valid.
I did think of that initially while fixing this, but the issue is that the class-to-letter mapping is still offset by 1. The only other solution would be to add a dummy letter at position 0 to fix the class offset. For example, label 1 is 'A' while loading the dataset, but classes_split_dict would return 'b'. I added this fix since a transform is a cleaner approach, but if adding a dummy letter to the list is preferred, I'm happy to change it to that too.
"add a dummy letter to position 0"

I think this would be the preferred approach. Maybe add something like __unused__?
Using N/A is fine with me.
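A minimal sketch of the agreed-upon alternative, assuming 'N/A' as the placeholder name (the exact string is whatever the PR settles on):

```python
import string

# Hypothetical sketch: prepend a placeholder class at index 0 so the raw
# EMNIST 'letters' label 1 maps to 'a' without rewriting the targets.
letters_classes = ['N/A'] + list(string.ascii_lowercase)

assert letters_classes[1] == 'a'    # raw label 1 -> 'a'
assert letters_classes[26] == 'z'   # raw label 26 -> 'z'
```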
```python
# Merged Classes assumes Same structure for both uppercase and lowercase version
_merged_classes = set(['C', 'I', 'J', 'K', 'L', 'M', 'O', 'P', 'S', 'U', 'V', 'W', 'X', 'Y', 'Z'])
_all_classes = set(list(string.digits + string.ascii_letters))
_merged_classes = {'c', 'i', 'j', 'k', 'l', 'm', 'o', 'p', 's', 'u', 'v', 'w', 'x', 'y', 'z'}
```
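For context, a rough sketch of how a merged class list could be derived from these two sets; the exact expression used in torchvision's classes_split_dict may differ:

```python
import string

# 62 classes in total: 10 digits plus 52 upper- and lowercase letters.
_all_classes = set(string.digits + string.ascii_letters)
# The 15 lowercase letters the EMNIST paper folds into their uppercase forms.
_merged_classes = {'c', 'i', 'j', 'k', 'l', 'm', 'o', 'p', 's', 'u', 'v', 'w', 'x', 'y', 'z'}

# Removing the merged lowercase letters leaves the 47 classes of the
# 'bymerge' and 'balanced' splits.
merged_split_classes = sorted(_all_classes - _merged_classes)
assert len(merged_split_classes) == 47
```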
One remark concerning cosmetics, not actual functionality: the EMNIST paper describes removing the capital letters. We could actually write the classes in a way that achieves this, but then the labels would look weird, something like A, B, c, D, .... So I think @vballoli's solution is fine.
Yeah, I think this is fine as is, but thanks for pointing it out.
* Add float support to ColorJitter
* Fix byclass EMNIST
* Fix bymerge, balance, letters EMNIST
* Fix whitespace indent
* Revert unrelated file changes
* Revert unrelated file changes
* Removing unnecessary type conversions.
* Removing the transform and adding dummy class instead.

Co-authored-by: Vasileios Vryniotis <vvryniotis@fb.com>
Fixes #2630