Description
🚀 Feature
A mirror of the CelebA dataset on S3, to support reliable downloading of the dataset.
Motivation
Downloading the CelebA dataset through TorchVision is currently unreliable because Google Drive, where the dataset owners host it, frequently returns a quota-exceeded error. A similar issue with MNIST was resolved by setting up a mirror on S3. As I type this, that mirror is the active download source for MNIST, since Yann LeCun's website is currently down.
Pitch
I would like the TorchVision developers to contact the CelebA authors and request permission to host the dataset on S3. Reliable access will further increase the dataset's usage and popularity, so I don't see why they would object. The dataset can then be hosted in the same S3 bucket as MNIST, and the source code for torchvision.datasets.CelebA can be modified to fall back to this mirror if the primary download source fails to deliver the dataset intact.
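To make the proposed fallback concrete, here is a minimal sketch of the idea. It is not torchvision's actual download code. The function name `fetch_with_fallback`, the `primary://` and `mirror://` URLs, and the use of an MD5 checksum as the integrity test are all assumptions made for illustration. The `fetch` argument stands in for a real HTTP client, so the example runs offline.

```python
import hashlib
from typing import Callable, Sequence, Tuple


def fetch_with_fallback(
    urls: Sequence[str],
    expected_md5: str,
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try each source in order and return the first payload whose MD5 matches.

    Hypothetical helper: `fetch` is any callable that maps a URL to bytes.
    """
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except Exception as exc:  # e.g. a quota-exceeded or connection error
            errors.append(f"{url}: {exc}")
            continue
        # Treat a checksum mismatch like a failed download and move on.
        if hashlib.md5(data).hexdigest() == expected_md5:
            return url, data
        errors.append(f"{url}: checksum mismatch")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))


# Offline demo with placeholder URLs and a fake fetcher.
payload = b"fake archive bytes"
expected = hashlib.md5(payload).hexdigest()


def fake_fetch(url: str) -> bytes:
    # Simulate the primary host refusing the download.
    if url.startswith("primary://"):
        raise ConnectionError("quota exceeded")
    return payload


source, data = fetch_with_fallback(
    ["primary://celeba.zip", "mirror://celeba.zip"], expected, fake_fetch
)
print(source)
```

Ordering the sources with the primary first keeps the dataset owners' host as the default, and the mirror is only contacted when the primary errors out or returns a file that does not match the expected checksum.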
Alternatives
The alternative is for prospective torchvision users to wait an indefinite amount of time before they can download the dataset to their local machines, which hurts developer productivity.
Additional context
I also opened a bug ticket for this issue.
cc @pmeier