Description
🐛 Bug
I am unable to download the CelebFace Attributes (CelebA) dataset using TorchVision. The error I get when attempting to download is "File is not a zip file."
I have encountered this error every time I've tried to download the file, over the course of the last 24 hours.
To Reproduce
Steps to reproduce the behavior:
- Import torchvision
- Call torchvision.datasets.CelebA(root='somewhere', split='train', download=True)
A code sample is provided below:
import torch as tc
import torchvision as tv


def get_dataloaders(batch_size):
    transform = tv.transforms.Compose([
        tv.transforms.CenterCrop(108),
        tv.transforms.Resize(64),
        tv.transforms.ToTensor(),
        tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # scales pixels to [-1, 1].
    ])

    train_data = tv.datasets.CelebA(root='data', split='train', download=True, transform=transform)
    test_data = tv.datasets.CelebA(root='data', split='test', download=True, transform=transform)

    train_dataloader = tc.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = tc.utils.data.DataLoader(test_data, batch_size=batch_size, shuffle=True)

    return train_dataloader, test_dataloader
Here, my stacktrace is provided as well:
/Users/lucaslingle/opt/miniconda3/envs/pytorch181/bin/python /Users/lucaslingle/PycharmProjects/pytorch_dcgan/main.py
3112it [00:00, 2727831.57it/s]
26721026it [00:01, 23197510.91it/s]
3424458it [00:00, 12672950.45it/s]
6082035it [00:00, 17442168.31it/s]
12156055it [00:00, 21823850.99it/s]
2836386it [00:00, 11141828.55it/s]
Traceback (most recent call last):
  File "/Users/lucaslingle/PycharmProjects/pytorch_dcgan/main.py", line 35, in <module>
    dataloader, _ = get_dataloaders(batch_size=batch_size)
  File "/Users/lucaslingle/PycharmProjects/pytorch_dcgan/dataloaders.py", line 11, in get_dataloaders
    train_data = tv.datasets.CelebA(root='data', split='train', download=True, transform=transform)
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/site-packages/torchvision/datasets/celeba.py", line 77, in __init__
    self.download()
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/site-packages/torchvision/datasets/celeba.py", line 131, in download
    with zipfile.ZipFile(os.path.join(self.root, self.base_folder, "img_align_celeba.zip"), "r") as f:
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Expected behavior
The download just works.
Environment
- PyTorch / torchvision Version: 1.8.1 / 0.9.1
- OS: macOS Catalina
- How you installed PyTorch / torchvision: conda and pip, respectively.
- Build command you used (if compiling from source): N/A
- Python version: 3.9.2
- CUDA/cuDNN version: N/A
- GPU models and configuration: N/A
Additional context
This is a known issue caused by a Google Drive quota limit: once the quota is exceeded, Google Drive returns an error page, and that page is saved in place of the zip file. The TorchVision developers closed another issue, saying they would wait for someone else to open a ticket complaining before looking for a robust fix. Anyways, here I am. This is a complaint.
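For anyone who wants to confirm that this is what happened on their machine, here is a minimal sketch that checks whether a downloaded file is really a zip archive or an HTML page saved under a .zip name. The function name describe_download and the 512-byte peek are my own choices for illustration, not anything from torchvision; only zipfile.is_zipfile comes from the standard library.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Classify a file that was expected to be a zip archive.

    Hypothetical helper for illustration; not part of torchvision.
    Returns one of: "missing", "zip", "html", "unknown".
    """
    path = Path(path)
    if not path.exists():
        return "missing"
    # is_zipfile looks for the zip end-of-central-directory record.
    if zipfile.is_zipfile(str(path)):
        return "zip"
    # Peek at the first bytes: an HTML error page normally starts with a tag.
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

With the code sample above, the file to check is img_align_celeba.zip inside the dataset folder under root='data' (I believe the subfolder is named celeba, but that is an assumption on my part). A result of "html" would be consistent with the quota error page having been written to disk instead of the archive.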
My idea of a robust fix would be for the TorchVision developers to get permission from the CelebA dataset owners to mirror the data on S3, as was already done for MNIST. This will resolve the problem.
For completeness, I have also opened a feature request ticket to address this issue.
cc @pmeier