
A mirror of the CelebA dataset on S3, similar to the one for MNIST #3709

@lucaslingle

🚀 Feature

A mirror of the CelebA dataset on S3, to support reliable downloading of the dataset.

Motivation

Downloading the CelebA dataset through TorchVision is currently unreliable because of a common quota-exceeded error from Google Drive, where the dataset owners host the files. To resolve a similar issue with MNIST, a mirror was set up on S3. As I type this, that mirror is the active download source for MNIST, as Yann LeCun's website is currently down.

Pitch

I would like the TorchVision developers to contact the CelebA authors, and request permission to host the dataset on S3. Reliable access to this dataset will increase its usage and popularity further, so I don't see why they would object. The dataset can then be hosted in the same S3 bucket as MNIST, and the source code for torchvision.datasets.CelebA can be modified to point to this mirror if the primary download source fails to deliver the dataset in pristine condition.
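
To make the fallback idea concrete, here is a rough sketch of what "try the primary source, then the mirror, and only accept a file in pristine condition" could look like. This is not torchvision's actual code: `download_with_fallback` and `_fetch` are hypothetical names, the URLs in the usage note are placeholders, and I am assuming a known MD5 checksum is available for the archive.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable


def _fetch(url: str, timeout: float = 30.0) -> bytes:
    """Read the whole response into memory (fine for a sketch, not for multi-GB archives)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(
    urls: Iterable[str],
    expected_md5: str,
    fetch: Callable[[str], bytes] = _fetch,
) -> bytes:
    """Try each URL in order and return the first payload whose MD5 matches.

    Hypothetical helper: a source is skipped if the request fails
    (e.g. quota exceeded, host down) or if the checksum does not match.
    """
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except Exception as exc:
            errors.append(f"{url}: {exc}")
            continue
        if hashlib.md5(data).hexdigest() == expected_md5:
            return data
        errors.append(f"{url}: checksum mismatch")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))
```

With something like this, the dataset class would pass the Google Drive link first and the S3 mirror second, e.g. `download_with_fallback([primary_url, mirror_url], expected_md5)`, where both URLs and the checksum are whatever the maintainers settle on.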

Alternatives

The only alternative is for prospective torchvision users to wait an indefinite amount of time before they can download the dataset to their local machines, which slows down development.

Additional context

I also opened a bug ticket for this issue.

cc @pmeier
