Unable to load CelebA dataset: "File is not a zip file" error.

## 🐛 Bug

I am unable to download the CelebFaces Attributes (CelebA) dataset using torchvision. The download attempt fails with the error "File is not a zip file."

I have encountered this error every time I've tried to download the file, over the course of the last 24 hours. 

## To Reproduce

Steps to reproduce the behavior:

1. Import torchvision.
2. Call `torchvision.datasets.CelebA(root='somewhere', split='train', download=True)`.

A code sample is provided below:

```python
import torch as tc
import torchvision as tv

def get_dataloaders(batch_size):
    transform = tv.transforms.Compose([
        tv.transforms.CenterCrop(108),
        tv.transforms.Resize(64),
        tv.transforms.ToTensor(),
        tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # scales pixels to [-1, 1].
    ])
    train_data = tv.datasets.CelebA(root='data', split='train', download=True, transform=transform)
    test_data = tv.datasets.CelebA(root='data', split='test', download=True, transform=transform)

    train_dataloader = tc.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = tc.utils.data.DataLoader(test_data, batch_size=batch_size, shuffle=True)

    return train_dataloader, test_dataloader
```

The full stack trace is below:

```
/Users/lucaslingle/opt/miniconda3/envs/pytorch181/bin/python /Users/lucaslingle/PycharmProjects/pytorch_dcgan/main.py
3112it [00:00, 2727831.57it/s]
26721026it [00:01, 23197510.91it/s]
3424458it [00:00, 12672950.45it/s]
6082035it [00:00, 17442168.31it/s]
12156055it [00:00, 21823850.99it/s]
2836386it [00:00, 11141828.55it/s]
Traceback (most recent call last):
  File "/Users/lucaslingle/PycharmProjects/pytorch_dcgan/main.py", line 35, in <module>
    dataloader, _ = get_dataloaders(batch_size=batch_size)
  File "/Users/lucaslingle/PycharmProjects/pytorch_dcgan/dataloaders.py", line 11, in get_dataloaders
    train_data = tv.datasets.CelebA(root='data', split='train', download=True, transform=transform)
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/site-packages/torchvision/datasets/celeba.py", line 77, in __init__
    self.download()
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/site-packages/torchvision/datasets/celeba.py", line 131, in download
    with zipfile.ZipFile(os.path.join(self.root, self.base_folder, "img_align_celeba.zip"), "r") as f:
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
```

## Expected behavior

The dataset downloads and extracts without error.

## Environment

 - PyTorch / torchvision Version: 1.8.1 / 0.9.1
 - OS: macOS Catalina
 - How you installed PyTorch / torchvision: conda and pip, respectively.
 - Build command you used (if compiling from source): N/A
 - Python version: 3.9.2
 - CUDA/cuDNN version: N/A
 - GPU models and configuration: N/A
 

## Additional context

This is [a known issue](https://github.com/pytorch/vision/issues/2262), caused by a Google Drive quota limit: once the quota is exceeded, Google Drive returns an error page, and that page is saved in place of the zip file. The torchvision developers closed another issue, saying they would wait for someone else to open a ticket complaining before looking for a robust fix. Anyway, here I am. This is a complaint.
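To check whether a failed download hit this quota problem, one option is to inspect the file that was saved. The snippet below is a minimal diagnostic sketch, not part of torchvision: `diagnose_archive` is a hypothetical helper, the path assumes `root='data'` together with the `celeba` base folder and the `img_align_celeba.zip` file name seen in the stack trace, and the check assumes the quota error page is HTML.

```python
import zipfile
from pathlib import Path


def diagnose_archive(path):
    """Classify a downloaded file as 'missing', 'zip', 'html', or 'unknown'.

    Hypothetical helper for illustration; it is not part of torchvision.
    """
    path = Path(path)
    if not path.is_file():
        return "missing"
    if zipfile.is_zipfile(path):
        return "zip"
    # Peek at the first bytes. Assumption: a quota error page is HTML text,
    # so it starts with a doctype or <html> tag rather than zip binary data.
    with path.open("rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Assumed location: root='data' plus the 'celeba' base folder.
    archive = Path("data") / "celeba" / "img_align_celeba.zip"
    print(archive, "->", diagnose_archive(archive))
```

If the result is `html`, the saved file is most likely the error page rather than the dataset. Deleting it before retrying avoids repeating the same `BadZipFile` failure on a stale file.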

**My idea of a robust fix would be for the torchvision developers to get permission from the dataset owners to mirror the data on S3. This was already done by the torchvision developers for MNIST.** Please consider getting permission from the CelebA dataset owners to mirror the data on S3, as was done for MNIST. This would resolve the problem.

For completeness, I have also opened [a feature request ticket](https://github.com/pytorch/vision/issues/3709) to address this issue. 


cc @pmeier
