
VOCDetection dataset md5 has changed, no longer loads correctly #2588

Closed
teddykoker opened this issue Aug 14, 2020 · 5 comments

teddykoker commented Aug 14, 2020

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

>>> from torchvision.datasets import VOCDetection
>>> ds = VOCDetection(".", download=True)
RuntimeError: File not found or corrupted.

Can be fixed by correcting the md5sum:

from torchvision.datasets import VOCDetection
from torchvision.datasets.voc import DATASET_YEAR_DICT
DATASET_YEAR_DICT['2012']['md5'] = "af08689459fe018b209e8d8fb5d4eb2e"
ds = VOCDetection(".", download=True)
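
For anyone unsure which version of the tar they ended up with, hashing the archive and comparing against the two checksums is a quick check (a minimal sketch using only the standard library; the tar path below assumes root="." as above):

import hashlib

# path to the downloaded archive, assuming root="." as in the snippet above
tar_path = "./VOCtrainval_11-May-2012.tar"

md5 = hashlib.md5()
with open(tar_path, "rb") as f:
    # hash the archive in 1 MiB chunks to avoid loading it all into memory
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        md5.update(chunk)

print(md5.hexdigest())
# 6cd6e144f989b92b3379bac3b3de84fd -> original tar (the md5 torchvision expects)
# af08689459fe018b209e8d8fb5d4eb2e -> tar currently being served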

Expected behavior

Dataset should download and load without error.

Environment

PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: None

OS: Mac OSX 10.15.5 (x86_64)
GCC version: Could not collect
Clang version: 11.0.3 (clang-1103.0.32.62)
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[conda] torch 1.6.0 pypi_0 pypi
[conda] torchvision 0.7.0 pypi_0 pypi

Additional context

fmassa (Member) commented Aug 20, 2020

Hi,

Thanks for opening this issue (and sending a PR)!

I tried to download the VOCDetection dataset just now, but couldn't get it to complete (it kept timing out), so I haven't yet been able to verify that the hash has changed.

Also, I wonder what happens for users who have already downloaded the old tar file: will changing the md5 suddenly break things for them?

teddykoker (Author) commented Aug 21, 2020

Presumably, this change would render the old dataset "corrupt." I believe @vfdev-5 reached out to the organizers to see why the change was made.

Ideally, I think the best course of action would be to restore the old tar on the organizers' website, or perhaps to self-host the dataset. Allowing datasets to change could set a bad precedent, as people depend on them for reproducible benchmarks (even though in this case it appears to be a small modification to one image).

Edit: the organizers' website (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) seems to be completely down; not sure what's going on.

vfdev-5 (Collaborator) commented Aug 21, 2020

Also, I wonder what happens for users who have already downloaded the old tar file: will changing the md5 suddenly break things for them?

@fmassa in my case I had the dataset unarchived before the corruption (but no tar file), and I could still create an instance of the VOCSegmentation dataset.

import torchvision

from torchvision.datasets import VOCSegmentation
from torchvision.datasets.voc import DATASET_YEAR_DICT

train_ds = VOCSegmentation("/home/data0/")
print(len(train_ds))
print(DATASET_YEAR_DICT["2012"])
> 1464
> {'url': 'http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar', 'filename': 'VOCtrainval_11-May-2012.tar', 'md5': '6cd6e144f989b92b3379bac3b3de84fd', 'base_dir': 'VOCdevkit/VOC2012'}
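
If I'm reading the VOC dataset code correctly, the md5 is only checked on the download path; with download=False the constructor only verifies that the extracted VOCdevkit/VOC2012 directory exists, which would explain why an already-unarchived copy keeps working. For anyone who still has a tar lying around, it can be checked explicitly against either checksum (a minimal sketch; the tar path below is hypothetical):

from torchvision.datasets.utils import check_integrity

# hypothetical path to a previously downloaded archive
tar_path = "/home/data0/VOCtrainval_11-May-2012.tar"

# True if the file matches the original checksum torchvision expects
print(check_integrity(tar_path, md5="6cd6e144f989b92b3379bac3b3de84fd"))
# True if it matches the modified tar reported above
print(check_integrity(tar_path, md5="af08689459fe018b209e8d8fb5d4eb2e"))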

vfdev-5 (Collaborator) commented Aug 21, 2020

Looks like the correct tar with md5 6cd6e144f989b92b3379bac3b3de84fd is back!
@teddykoker could you please recheck on your side?
If everything is OK, I think we can close the issue and the PR. In any case, @teddykoker, thanks for pointing that out!

teddykoker (Author) commented

Just verified; the correct tar is back :). Closing the issue and PR. Thanks for the help, @vfdev-5 and @fmassa!
