Loading CelebA dataset in Google Colab causes runtime to run out of memory. #3208

@Kinyugo

Description

Loading the CelebA dataset using torchvision.datasets.CelebA uses up all the memory of a Google Colab runtime, causing a crash.

I am loading the CelebA dataset in Google Colab. During the process, memory consumption rises until it reaches the runtime's maximum allocation (12 GB), at which point the Colab runtime crashes.

To Reproduce

from torchvision import datasets, transforms

# Root directory for the dataset
data_root = 'data/celeba'
# Spatial size of training images; images are resized to this size.
image_size = 64

celeba_data = datasets.CelebA(data_root,
                              download=True,
                              transform=transforms.Compose([
                                  transforms.Resize(image_size),
                                  transforms.CenterCrop(image_size),
                                  transforms.ToTensor(),
                                  transforms.Normalize(mean=[0.5, 0.5, 0.5],
                                                       std=[0.5, 0.5, 0.5])
                              ]))

The notebook can be found here.

Expected behavior

I expected the script above to load the dataset and apply the transformations without excessive memory usage.

Environment

  • PyTorch version: 1.7.1+cu101

  • Is debug build: False

  • CUDA used to build PyTorch: 10.1

  • ROCM used to build PyTorch: N/A

  • OS: Ubuntu 18.04.5 LTS (x86_64)

  • GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

  • Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)

  • CMake version: version 3.12.0

  • Python version: 3.6 (64-bit runtime)

  • Is CUDA available: True

  • CUDA runtime version: 10.1.243

  • GPU models and configuration: GPU 0: Tesla T4

  • Nvidia driver version: 418.67

  • cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

  • HIP runtime version: N/A

  • MIOpen runtime version: N/A

Versions of relevant libraries:

  • [pip3] numpy==1.19.4
  • [pip3] torch==1.7.1+cu101
  • [pip3] torchaudio==0.7.2
  • [pip3] torchsummary==1.5.1
  • [pip3] torchtext==0.3.1
  • [pip3] torchvision==0.8.2+cu101
  • [conda] Could not collect

Additional context

I have also tried separating the download step from the data-loading step to see if that solves the memory problem, i.e.:

# Download the data first
datasets.CelebA(data_root, download=True)
celeba_dataset = datasets.CelebA(data_root, ...)

The result is the same: the runtime crashes.

cc @pmeier
