Skip to content

MNIST dataset no longer downloads (repeat of old fixed March 2020 problem) #3497

@kesterlester

Description

@kesterlester

🐛 Bug

This seems to be a recurrence of an issue spotted in #1938 which was fixed back in March 2020 and then closed, but has now reappeared. There are a number of people in #1938 reporting that the issue appeared somewhere in the last 12 hours.

To Reproduce

Steps to reproduce the behavior:

  1. Open a fresh google colab
  2. Try something like the following:
import torchvision
from torchvision import datasets, transforms
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
trainset = datasets.MNIST('PATH_TO_STORE_TRAINSET', download=True, train=True, transform=transform)

Results in:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to PATH_TO_STORE_TRAINSET/MNIST/raw/train-images-idx3-ubyte.gz
0/? [00:00<?, ?it/s]
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-16-492e382ce34e> in <module>()
      4                               transforms.Normalize((0.5,), (0.5,)),
      5                               ])
----> 6 trainset = datasets.MNIST('PATH_TO_STORE_TRAINSET', download=True, train=True, transform=transform)

11 frames
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/mnist.py in __init__(self, root, train, transform, target_transform, download)
     77 
     78         if download:
---> 79             self.download()
     80 
     81         if not self._check_exists():

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/mnist.py in download(self)
    144         for url, md5 in self.resources:
    145             filename = url.rpartition('/')[2]
--> 146             download_and_extract_archive(url, download_root=self.raw_folder, filename=filename, md5=md5)
    147 
    148         # process and save as torch files

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/utils.py in download_and_extract_archive(url, download_root, extract_root, filename, md5, remove_finished)
    254         filename = os.path.basename(url)
    255 
--> 256     download_url(url, download_root, filename, md5)
    257 
    258     archive = os.path.join(download_root, filename)

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/utils.py in download_url(url, root, filename, md5)
     82                 )
     83             else:
---> 84                 raise e
     85         # check integrity of downloaded file
     86         if not check_integrity(fpath, md5):

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/utils.py in download_url(url, root, filename, md5)
     70             urllib.request.urlretrieve(
     71                 url, fpath,
---> 72                 reporthook=gen_bar_updater()
     73             )
     74         except (urllib.error.URLError, IOError) as e:  # type: ignore[attr-defined]

/usr/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    245     url_type, path = splittype(url)
    246 
--> 247     with contextlib.closing(urlopen(url, data)) as fp:
    248         headers = fp.info()
    249 

/usr/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

/usr/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/usr/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

/usr/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

/usr/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

/usr/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

Expected behavior

The dataset should just be loaded (as indeed it was this morning).

Environment

Collecting environment information...
PyTorch version: 1.7.1+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0

Python version: 3.7 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 11.0.221
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.7.1+cu101
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.3.1
[pip3] torchvision==0.8.2+cu101
[conda] Could not collect```



cc @pmeier

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions