
VOCDetection dataset md5 has changed, no longer loads correctly #2588

Closed
teddykoker opened this issue Aug 14, 2020 · 5 comments

teddykoker commented Aug 14, 2020

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

>>> from torchvision.datasets import VOCDetection
>>> ds = VOCDetection(".", download=True)
RuntimeError: File not found or corrupted.

Can be fixed by correcting the md5sum:

from torchvision.datasets import VOCDetection
from torchvision.datasets.voc import DATASET_YEAR_DICT
DATASET_YEAR_DICT['2012']['md5'] = "af08689459fe018b209e8d8fb5d4eb2e"
ds = VOCDetection(".", download=True)
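
For anyone unsure which version of the tar they ended up with, hashing the archive and comparing against the two checksums is a quick check (a minimal sketch using only the standard library; the tar path below assumes root="." as above):

import hashlib

# path to the downloaded archive, assuming root="." as in the snippet above
tar_path = "./VOCtrainval_11-May-2012.tar"

md5 = hashlib.md5()
with open(tar_path, "rb") as f:
    # hash the archive in 1 MiB chunks to avoid loading it all into memory
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        md5.update(chunk)

print(md5.hexdigest())
# 6cd6e144f989b92b3379bac3b3de84fd -> original tar (the md5 torchvision expects)
# af08689459fe018b209e8d8fb5d4eb2e -> tar currently being served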

Expected behavior

Dataset should download and load without error.

Environment

PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: None

OS: Mac OSX 10.15.5 (x86_64)
GCC version: Could not collect
Clang version: 11.0.3 (clang-1103.0.32.62)
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[conda] torch 1.6.0 pypi_0 pypi
[conda] torchvision 0.7.0 pypi_0 pypi

Additional context

fmassa (Member) commented Aug 20, 2020

Hi,

Thanks for opening this issue (and sending a PR)!

I tried to download the VOCDetection dataset just now, but couldn't get it to complete (it kept timing out), so I haven't yet been able to verify that the hash has changed.

Also, I wonder what happens for users who have already downloaded the old tar file: will changing the md5 suddenly break things for them?

teddykoker (Author) commented Aug 21, 2020

Presumably, this change would render the old dataset "corrupt." I believe @vfdev-5 reached out to the organizers to see why the change was made.

Ideally, I think the best course of action would be to restore the old tar on the organizers' website, or perhaps to self-host the dataset. Allowing datasets to change could set a bad precedent, as people depend on them for reproducible benchmarks (even though in this case it appears to be a small modification to one image).

Edit: the organizers' website (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) seems to be completely down; not sure what's going on.

vfdev-5 (Collaborator) commented Aug 21, 2020

Also, I wonder what happens for users who have already downloaded the old tar file: will changing the md5 suddenly break things for them?

@fmassa in my case I had the dataset unarchived before the corruption (but no tar file), and I could still create an instance of the VOCSegmentation dataset.

import torchvision

from torchvision.datasets import VOCSegmentation
from torchvision.datasets.voc import DATASET_YEAR_DICT

train_ds = VOCSegmentation("/home/data0/")
print(len(train_ds))
print(DATASET_YEAR_DICT["2012"])
> 1464
> {'url': 'http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar', 'filename': 'VOCtrainval_11-May-2012.tar', 'md5': '6cd6e144f989b92b3379bac3b3de84fd', 'base_dir': 'VOCdevkit/VOC2012'}
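
If I'm reading the VOC dataset code correctly, the md5 is only checked on the download path; with download=False the constructor only verifies that the extracted VOCdevkit/VOC2012 directory exists, which would explain why an already-unarchived copy keeps working. For anyone who still has a tar lying around, it can be checked explicitly against either checksum (a minimal sketch; the tar path below is hypothetical):

from torchvision.datasets.utils import check_integrity

# hypothetical path to a previously downloaded archive
tar_path = "/home/data0/VOCtrainval_11-May-2012.tar"

# True if the file matches the original checksum torchvision expects
print(check_integrity(tar_path, md5="6cd6e144f989b92b3379bac3b3de84fd"))
# True if it matches the modified tar reported above
print(check_integrity(tar_path, md5="af08689459fe018b209e8d8fb5d4eb2e"))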

vfdev-5 (Collaborator) commented Aug 21, 2020

Looks like the correct tar with md5 6cd6e144f989b92b3379bac3b3de84fd is back!
@teddykoker could you please recheck on your side?
If everything is OK, I think we can close the issue and the PR. In any case, @teddykoker, thanks for pointing that out!

teddykoker (Author) commented

Just verified; the correct tar is back :). Closing the issue and PR. Thanks for the help, @vfdev-5 and @fmassa!
