Unable to load CelebA dataset: "File is not a zip file" error. #3708

@lucaslingle

🐛 Bug

I am unable to download the CelebFaces Attributes (CelebA) dataset using TorchVision. The download fails with the error "File is not a zip file".

I have encountered this error every time I've tried to download the file, over the course of the last 24 hours.

To Reproduce

Steps to reproduce the behavior:

  1. Import torchvision
  2. Call torchvision.datasets.CelebA(root='somewhere', split='train', download=True)

A code sample is provided below:

import torch as tc
import torchvision as tv

def get_dataloaders(batch_size):
    transform = tv.transforms.Compose([
        tv.transforms.CenterCrop(108),
        tv.transforms.Resize(64),
        tv.transforms.ToTensor(),
        tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # scales pixels to [-1, 1].
    ])
    train_data = tv.datasets.CelebA(root='data', split='train', download=True, transform=transform)
    test_data = tv.datasets.CelebA(root='data', split='test', download=True, transform=transform)

    train_dataloader = tc.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = tc.utils.data.DataLoader(test_data, batch_size=batch_size, shuffle=True)

    return train_dataloader, test_dataloader

The full stack trace:

/Users/lucaslingle/opt/miniconda3/envs/pytorch181/bin/python /Users/lucaslingle/PycharmProjects/pytorch_dcgan/main.py
3112it [00:00, 2727831.57it/s]
26721026it [00:01, 23197510.91it/s]
3424458it [00:00, 12672950.45it/s]
6082035it [00:00, 17442168.31it/s]
12156055it [00:00, 21823850.99it/s]
2836386it [00:00, 11141828.55it/s]
Traceback (most recent call last):
  File "/Users/lucaslingle/PycharmProjects/pytorch_dcgan/main.py", line 35, in <module>
    dataloader, _ = get_dataloaders(batch_size=batch_size)
  File "/Users/lucaslingle/PycharmProjects/pytorch_dcgan/dataloaders.py", line 11, in get_dataloaders
    train_data = tv.datasets.CelebA(root='data', split='train', download=True, transform=transform)
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/site-packages/torchvision/datasets/celeba.py", line 77, in __init__
    self.download()
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/site-packages/torchvision/datasets/celeba.py", line 131, in download
    with zipfile.ZipFile(os.path.join(self.root, self.base_folder, "img_align_celeba.zip"), "r") as f:
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/Users/lucaslingle/opt/miniconda3/envs/pytorch181/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Expected behavior

The download just works.

Environment

  • PyTorch / torchvision Version: 1.8.1 / 0.9.1
  • OS: macOS Catalina
  • How you installed PyTorch / torchvision: conda and pip, respectively.
  • Build command you used (if compiling from source): N/A
  • Python version: 3.9.2
  • CUDA/cuDNN version: N/A
  • GPU models and configuration: N/A

Additional context

This is a known issue caused by a Google Drive quota limit: once the quota is exceeded, Google Drive returns an error page, which gets saved in place of the zip file. The TorchVision developers closed another issue about this, saying they would wait for someone else to open a ticket complaining before looking for a robust fix. So here I am. This is a complaint.
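To confirm that a failed download really is an error page rather than a truncated archive, something like the following can be used. This is a minimal sketch using only the standard library; the helper name `describe_download` is made up for illustration and is not part of torchvision.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Classify a downloaded file as 'zip', 'html', or 'unknown'."""
    path = Path(path)
    if zipfile.is_zipfile(path):
        return "zip"
    # Read only the first bytes; the real archive is large.
    with path.open("rb") as f:
        head = f.read(512).lstrip().lower()
    # An error page saved under a .zip name typically starts with an HTML tag.
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

With root='data' as in the code sample above, the file to check would be img_align_celeba.zip under the dataset's base folder inside data (the exact folder name depends on torchvision's `base_folder`). A result of "html" would match the quota explanation.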

My idea of a robust fix would be for the TorchVision developers to get permission from the CelebA dataset owners to mirror the data on S3, as was already done for MNIST. This would resolve the problem.
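If a mirror existed, the download step could try it first and fall back to Google Drive, rejecting any response that is not a real zip. The following is only a sketch of that idea under stated assumptions: `fetch_zip_with_fallback` is a hypothetical helper, not torchvision's download code, and no CelebA mirror URL is known to exist.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_zip_with_fallback(urls, dest):
    """Try each URL in order; keep the first download that is a real zip.

    Returns the URL that succeeded, or raises RuntimeError if none did.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    for url in urls:
        try:
            with urllib.request.urlopen(url) as resp, tmp.open("wb") as out:
                shutil.copyfileobj(resp, out)
        except OSError:
            # Network or HTTP failure (URLError is an OSError subclass).
            continue
        if zipfile.is_zipfile(tmp):
            tmp.replace(dest)
            return url
        # Not a zip (for example an HTML error page): try the next source.
    tmp.unlink(missing_ok=True)
    raise RuntimeError(f"no source returned a valid zip for {dest.name}")
```

The caller would pass the list of sources in order of preference, for example a mirror URL followed by the existing Google Drive link. Validating before renaming means a quota error page never ends up at the final path, so a later run does not trip over it.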

For completeness, I have also opened a feature request ticket to address this issue.

cc @pmeier
