
Conversation

@pmeier pmeier (Collaborator) commented Jun 12, 2019

This adds tests for the CIFAR10 and CIFAR100 datasets, as mentioned in #1006.

@fmassa fmassa (Member) left a comment

Looks great, thanks a lot for the awesome PR!

Just waiting for CI to finish and then this is good to merge

@pmeier pmeier (Collaborator, Author) commented Jun 12, 2019

It seems that the pickle.load() call within the dataset does not work under Python 2. I don't have an environment set up to test Python 2. @fmassa Could you check this?

@fmassa fmassa (Member) commented Jun 12, 2019

@pmeier the pickle file seems to be Python 3-only.

I see two options:

  • re-save and re-upload the pickle with pickle protocol 2 (https://docs.python.org/3/library/pickle.html#data-stream-format); a sketch of this follows below the list
  • add the logic to generate dummy files in the test code, like in:

    @contextlib.contextmanager
    def get_mnist_data(num_images, cls_name, **kwargs):
        def _encode(v):
            return torch.tensor(v, dtype=torch.int32).numpy().tobytes()[::-1]

        def _make_image_file(filename, num_images):
            img = torch.randint(0, 255, size=(28 * 28 * num_images,), dtype=torch.uint8)
            with open(filename, "wb") as f:
                f.write(_encode(2051))  # magic header
                f.write(_encode(num_images))
                f.write(_encode(28))
                f.write(_encode(28))
                f.write(img.numpy().tobytes())

        def _make_label_file(filename, num_images):
            labels = torch.randint(0, 10, size=(num_images,), dtype=torch.uint8)
            with open(filename, "wb") as f:
                f.write(_encode(2049))  # magic header
                f.write(_encode(num_images))
                f.write(labels.numpy().tobytes())

        tmp_dir = tempfile.mkdtemp(**kwargs)
        raw_dir = os.path.join(tmp_dir, cls_name, "raw")
        os.makedirs(raw_dir)
        _make_image_file(os.path.join(raw_dir, "train-images-idx3-ubyte"), num_images)
        _make_label_file(os.path.join(raw_dir, "train-labels-idx1-ubyte"), num_images)
        _make_image_file(os.path.join(raw_dir, "t10k-images-idx3-ubyte"), num_images)
        _make_label_file(os.path.join(raw_dir, "t10k-labels-idx1-ubyte"), num_images)
        try:
            yield tmp_dir
        finally:
            shutil.rmtree(tmp_dir)
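
For option 1, a minimal sketch of the re-save step, assuming a hypothetical file name for the dummy pickle, could look like this:

    import pickle

    # Hypothetical file name for the dummy data used by the CIFAR tests.
    dummy_file = "cifar10_dummy.pkl"

    # Load the file under Python 3 ...
    with open(dummy_file, "rb") as f:
        data = pickle.load(f)

    # ... and write it back with protocol 2, which Python 2 can read as well.
    with open(dummy_file, "wb") as f:
        pickle.dump(data, f, protocol=2)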

Given that you generated the dummy file yourself, I think it makes sense to go for option 2; that way, the data generation logic also stays in the code.

Thoughts?

@codecov-io codecov-io commented Jun 12, 2019

Codecov Report

Merging #1010 into master will increase coverage by 0.12%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1010      +/-   ##
==========================================
+ Coverage   63.11%   63.24%   +0.12%     
==========================================
  Files          65       65              
  Lines        5073     5148      +75     
  Branches      757      772      +15     
==========================================
+ Hits         3202     3256      +54     
- Misses       1656     1669      +13     
- Partials      215      223       +8
Impacted Files Coverage Δ
torchvision/datasets/mnist.py 50.71% <0%> (-12.53%) ⬇️
torchvision/transforms/transforms.py 81.89% <0%> (-0.65%) ⬇️
torchvision/datasets/cifar.py 78.65% <0%> (+40.44%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2f64dd9...1216308.

@pmeier pmeier (Collaborator, Author) commented Jun 12, 2019

Should be fixed now. As a general thought: Do we want to do the fake data generation on-the-fly for all datasets? If yes, I think we should move the generation to a separate module to avoid cluttering the actual tests.

@fmassa fmassa (Member) left a comment

Thanks!

@fmassa fmassa (Member) commented Jun 12, 2019

Do we want to do the fake data generation on-the-fly for all datasets? If yes, I think we should move the generation to a separate module to avoid cluttering the actual tests.

Yes, I think we should be doing it on-the-fly, and I agree that they should eventually be moved to a separate file.
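
For reference, here is a sketch of what such an on-the-fly generator could look like for CIFAR10, analogous to get_mnist_data above. The helper name and the number of images are made up, and torchvision's md5 integrity checks would still have to be handled separately in the test; this is not the implementation that was merged:

    import contextlib
    import os
    import pickle
    import shutil
    import tempfile

    import numpy as np

    @contextlib.contextmanager
    def get_cifar10_data(num_images_per_file=2):
        # Recreate the directory layout of the extracted CIFAR-10 python archive
        # with random images and labels in a temporary directory.
        tmp_dir = tempfile.mkdtemp()
        batch_dir = os.path.join(tmp_dir, "cifar-10-batches-py")
        os.makedirs(batch_dir)

        def _make_batch_file(filename, num_images):
            batch = {
                "data": np.random.randint(
                    0, 256, size=(num_images, 32 * 32 * 3), dtype=np.uint8
                ),
                "labels": np.random.randint(0, 10, size=(num_images,)).tolist(),
            }
            with open(filename, "wb") as f:
                # protocol 2 keeps the files readable from Python 2 as well
                pickle.dump(batch, f, protocol=2)

        for name in ["data_batch_{}".format(i) for i in range(1, 6)] + ["test_batch"]:
            _make_batch_file(os.path.join(batch_dir, name), num_images_per_file)

        meta = {"label_names": ["class_{}".format(i) for i in range(10)]}
        with open(os.path.join(batch_dir, "batches.meta"), "wb") as f:
            pickle.dump(meta, f, protocol=2)

        try:
            yield tmp_dir
        finally:
            shutil.rmtree(tmp_dir)

A test could then instantiate the dataset inside the with block (e.g. CIFAR10(root=tmp_dir, train=True), assuming the integrity check is skipped or mocked).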

@fmassa fmassa merged commit 67bfb96 into pytorch:master Jun 12, 2019