
DataLoader with collate_fn that returns tensors in GPU memory raises warnings when deleted #98002

Open
jakelevi1996 opened this issue Mar 30, 2023 · 7 comments
Labels
module: dataloader Related to torch.utils.data.DataLoader and Sampler triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@jakelevi1996

jakelevi1996 commented Mar 30, 2023

🐛 Describe the bug

I defined a DataLoader with a collate_fn that returns tensors in GPU memory, together with num_workers=1 and prefetch_factor=2, so that as I iterate through the DataLoader the batches it returns are already in GPU memory. When the DataLoader is deleted, a lot of warnings are raised from CUDAGuardImpl.h. For example:

import torch
import torchvision

def collate_gpu(batch):
    x, t = torch.utils.data.default_collate(batch)
    return x.to(device=0), t.to(device=0)

train_dataset = torchvision.datasets.MNIST(
    './data',
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=100,
    num_workers=1,
    prefetch_factor=2,
    persistent_workers=True,
    collate_fn=collate_gpu,
)

if __name__ == "__main__":
    x, t = next(iter(train_loader))
    print("About to call `del train_loader`...")
    del train_loader
    print("Finished `del train_loader`")

Console output:

About to call `del train_loader`...
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: driver shutting down (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
Finished `del train_loader`

In reality I don't call del train_loader; instead I initialise train_loader inside a function, and when the function exits the result is the same (roughly as in the sketch below). (Weirdly, if I don't call del train_loader and train_loader is not defined inside a function, then there are no warning messages at all.)
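A minimal sketch of that function-scoped case (the function name get_first_batch is just for illustration; it reuses the same train_dataset and collate_gpu as above):

def get_first_batch():
    train_loader = torch.utils.data.DataLoader(
        dataset=train_dataset,
        batch_size=100,
        num_workers=1,
        prefetch_factor=2,
        persistent_workers=True,
        collate_fn=collate_gpu,
    )
    # train_loader goes out of scope when the function returns, which
    # triggers the same CUDAGuardImpl.h warnings as `del train_loader`
    return next(iter(train_loader))

x, t = get_first_batch()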

PS: Am I being silly? I would have assumed that pre-fetching data into GPU memory with a DataLoader, rather than waiting in the main process for data to be copied to the GPU, would be a very common use case, but I can't seem to find many posts on this topic at all (one example is this thread, but it's 4 years old and the error message is different).

Versions

Versions/OS:
Python 3.7.6
Cuda 11.7
PyTorch 1.13.0+cu117
Windows 10

(sorry I don't fancy running a >600 line Python script downloaded from the internet, regardless of the author)

cc @ssnl @VitalyFedyunin @ejguan @NivekT @dzhulgakov

@pat749

This comment was marked as off-topic.

@jakelevi1996
Author

Hi @pat749, thanks for the suggestion, but I still get all the same warnings on my system if I call torch.cuda.empty_cache() after del train_loader (and also if I call torch.cuda.empty_cache() before del train_loader).

@pat749

This comment was marked as off-topic.

@jakelevi1996
Author

Hi @pat749, I think the link you posted was to a different notebook, but I tried implementing this code in a new Colab notebook and didn't get any warnings, even without calling torch.cuda.empty_cache(). It seems these warnings must be system-dependent, because I don't observe them on Colab, but they definitely do appear on my local PC.

@pat749

This comment was marked as off-topic.

@bdhirsh bdhirsh added module: dataloader Related to torch.utils.data.DataLoader and Sampler triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Mar 31, 2023
@ejguan
Contributor

ejguan commented Apr 17, 2023

First of all, it's better to keep tensors on the CPU within worker processes; otherwise, each worker process will create a separate CUDA context. You can enable pin_memory=True for the DataLoader to place batches in pinned memory, which reduces the time spent moving data from CPU to GPU.

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=100,
    num_workers=1,
    prefetch_factor=2,
    persistent_workers=True,
    collate_fn=collate_gpu,
    pin_memory=True,
)

for x, t in train_loader:
    x.to(device)
    t.to(device)
    ...

@crzdg

crzdg commented Mar 30, 2024

First of all, it's better to keep tensors on the CPU within worker processes; otherwise, each worker process will create a separate CUDA context. You can enable pin_memory=True for the DataLoader to place batches in pinned memory, which reduces the time spent moving data from CPU to GPU.

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=100,
    num_workers=1,
    prefetch_factor=2,
    persistent_workers=True,
    collate_fn=collate_gpu,
    pin_memory=True,
)

for x, t in train_loader:
    x.to(device)
    t.to(device)
    ...

Thanks for the suggestion of keeping batch preparation in CPU memory and moving tensors to the GPU right before actual usage.

However, I think the code example does have some flaws.

  1. The collate_fn=collate_gpu is not needed anymore in that case, right? With the collate_gpu from the OP the tensors are already moved to the GPU, so the .to() calls in the loop are superfluous.
  2. The .to() call returns a new tensor on the device but does not move the original tensor in place, so the loop should rather be:
for x, t in train_loader:
    x = x.to(device)
    t = t.to(device)
    ...
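Putting the two points together, a minimal sketch of the suggested setup (collation stays on the CPU, pin_memory=True, and the batch is moved to the GPU inside the loop; the device variable and non_blocking=True are my assumptions here, not from the earlier comments):

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=100,
    num_workers=1,
    prefetch_factor=2,
    persistent_workers=True,
    pin_memory=True,  # tensors stay on the CPU in the workers; no custom collate_fn needed
)

device = torch.device("cuda:0")
for x, t in train_loader:
    # .to() returns a new tensor, so assign the result back;
    # non_blocking=True is only effective because the batch is in pinned memory
    x = x.to(device, non_blocking=True)
    t = t.to(device, non_blocking=True)
    ...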
