Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for the QMNIST dataset #995

Merged
merged 9 commits into from Jun 12, 2019
Merged

Added support for the QMNIST dataset #995

merged 9 commits into from Jun 12, 2019

Conversation

leonbottou
Copy link
Contributor

QMNIST https://github.com/facebookresearch/qmnist

Patch

The patch has two commits.

  • First commit adds a function read_sn3_pascalvincent_tensor that reads all variants of the mnist file format. Can read compressed files. Can accept a file object instead of a filename. Redefine functions read_label_file and read_image_file using this generic function.
  • The second commit adds class QMNIST. Usage is the same as the qmnist.py loader provided with the QMNIST distribution. An interesting aspect is the argument 'what' that specifies which subset to access: 'train', 'test', 'test10k', 'test60k', 'nist'. When argument what is absent, the 'train' argument is consulted.

Remarks.

The way I handle compressed files in read_sn3_pascalvincent_tensor does not square well with the new utility function download_and_extract_archive. My take is that this function should only deal with archives (zip, tar, tgz) but not compressed files (gz, xz). In the end, the QMNIST class uses download_url instead.

Instead the QMNIST class uses download_url. My take is that the

@leonbottou
Copy link
Contributor Author

Note: With python-2.7, there is no lzma module by default.
One has to use https://anaconda.org/conda-forge/backports.lzma.
I do not know how to get Travis to do this for me...

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR Leon!

I've made a few minor comments. I believe we can fix CI by moving the import of lzma inside the function call.

Let me know if you want me to address the comments that I've made.

torchvision/datasets/mnist.py Outdated Show resolved Hide resolved
torchvision/datasets/mnist.py Outdated Show resolved Hide resolved
torchvision/datasets/mnist.py Show resolved Hide resolved
torchvision/datasets/mnist.py Outdated Show resolved Hide resolved
torchvision/datasets/mnist.py Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Jun 10, 2019

Codecov Report

Merging #995 into master will decrease coverage by 0.14%.
The diff coverage is 38.63%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #995      +/-   ##
==========================================
- Coverage   62.69%   62.54%   -0.15%     
==========================================
  Files          65       65              
  Lines        5080     5148      +68     
  Branches      761      772      +11     
==========================================
+ Hits         3185     3220      +35     
- Misses       1678     1711      +33     
  Partials      217      217
Impacted Files Coverage Δ
torchvision/datasets/mnist.py 50.71% <38.63%> (-12.53%) ⬇️
torchvision/datasets/utils.py 58.51% <0%> (-2.97%) ⬇️
torchvision/models/mobilenet.py 89.7% <0%> (-2.61%) ⬇️
torchvision/datasets/celeba.py 17.24% <0%> (+1.14%) ⬆️
torchvision/models/densenet.py 86.79% <0%> (+1.56%) ⬆️
torchvision/models/segmentation/segmentation.py 67.44% <0%> (+25.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 793613b...5c70191. Read the comment docs.

@fmassa fmassa merged commit ac2e995 into pytorch:master Jun 12, 2019
Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR Leon!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants