
DataLoader with collate_fn that returns tensors in GPU memory raises warnings when deleted #98002

Open
jakelevi1996 opened this issue Mar 30, 2023 · 7 comments
Labels
module: dataloader Related to torch.utils.data.DataLoader and Sampler triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@jakelevi1996

jakelevi1996 commented Mar 30, 2023

🐛 Describe the bug

I defined a DataLoader with a collate_fn that returns tensors in GPU memory, together with num_workers=1 and prefetch_factor=2, so that as I iterate through the DataLoader the batches it returns are already in GPU memory. When the DataLoader is deleted, a lot of warnings are raised from CUDAGuardImpl.h. For example:

import torch
import torchvision

def collate_gpu(batch):
    x, t = torch.utils.data.default_collate(batch)
    return x.to(device=0), t.to(device=0)

train_dataset = torchvision.datasets.MNIST(
    './data',
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=100,
    num_workers=1,
    prefetch_factor=2,
    persistent_workers=True,
    collate_fn=collate_gpu,
)

if __name__ == "__main__":
    x, t = next(iter(train_loader))
    print("About to call `del train_loader`...")
    del train_loader
    print("Finished `del train_loader`")

Console output:

About to call `del train_loader`...
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: driver shutting down (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
Finished `del train_loader`

In reality I don't call del train_loader; instead I initialise train_loader inside a function, and when the function exits the result is the same (roughly as in the sketch below). (Weirdly, if I don't call del train_loader and train_loader is not defined inside a function, then there are no warning messages at all.)
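A minimal sketch of that function-scoped case (the function name get_first_batch is just for illustration; it reuses the same train_dataset and collate_gpu as above):

def get_first_batch():
    train_loader = torch.utils.data.DataLoader(
        dataset=train_dataset,
        batch_size=100,
        num_workers=1,
        prefetch_factor=2,
        persistent_workers=True,
        collate_fn=collate_gpu,
    )
    # train_loader goes out of scope when the function returns, which
    # triggers the same CUDAGuardImpl.h warnings as `del train_loader`
    return next(iter(train_loader))

x, t = get_first_batch()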

PS: Am I being silly? I would have assumed that pre-fetching data into GPU memory with a DataLoader, rather than waiting in the main process for data to be copied to the GPU, would be a very common use case, but I can't seem to find many posts on this topic at all (one example is this thread, but it's 4 years old and the error message is different).

Versions

Versions/OS:
Python 3.7.6
Cuda 11.7
PyTorch 1.13.0+cu117
Windows 10

(sorry I don't fancy running a >600 line Python script downloaded from the internet, regardless of the author)

cc @ssnl @VitalyFedyunin @ejguan @NivekT @dzhulgakov

@pat749

This comment was marked as off-topic.

@jakelevi1996
Author

Hi @pat749, thanks for the suggestion, but I still get all the same warnings on my system if I call torch.cuda.empty_cache() after del train_loader (and also if I call torch.cuda.empty_cache() before del train_loader).

@pat749

This comment was marked as off-topic.

@jakelevi1996
Author

Hi @pat749, I think the link you posted was to a different notebook, but I tried implementing this code in a new Colab notebook and didn't get any warnings, even without calling torch.cuda.empty_cache(). It seems these warnings must be system-dependent, because I don't observe them on Colab, but they definitely do appear on my local PC.

@pat749

This comment was marked as off-topic.

@bdhirsh bdhirsh added module: dataloader Related to torch.utils.data.DataLoader and Sampler triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Mar 31, 2023
@ejguan
Contributor

ejguan commented Apr 17, 2023

First of all, it's better to keep tensors on the CPU within worker processes; otherwise, each worker process will create a separate CUDA context. You can enable pin_memory=True for the DataLoader to place batches in pinned memory, which reduces the time spent moving data from CPU to GPU.

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=100,
    num_workers=1,
    prefetch_factor=2,
    persistent_workers=True,
    collate_fn=collate_gpu,
    pin_memory=True,
)

for x, t in train_loader:
    x.to(device)
    t.to(device)
    ...

@crzdg

crzdg commented Mar 30, 2024

First of all, it's better to keep tensors on the CPU within worker processes; otherwise, each worker process will create a separate CUDA context. You can enable pin_memory=True for the DataLoader to place batches in pinned memory, which reduces the time spent moving data from CPU to GPU.

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=100,
    num_workers=1,
    prefetch_factor=2,
    persistent_workers=True,
    collate_fn=collate_gpu,
    pin_memory=True,
)

for x, t in train_loader:
    x.to(device)
    t.to(device)
    ...

Thanks for the suggestion of keeping batch preparation in CPU memory and moving tensors to the GPU right before actual usage.

However, I think the code example does have some flaws.

  1. The collate_fn=collate_gpu is not needed anymore in that case, right? With the collate_gpu from the OP the tensors are already moved to the GPU, so the .to() calls in the loop are superfluous.
  2. The .to() call returns a new tensor on the device but does not move the original tensor in place, so the loop should rather be:
for x, t in train_loader:
    x = x.to(device)
    t = t.to(device)
    ...
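Putting the two points together, a minimal sketch of the suggested setup (collation stays on the CPU, pin_memory=True, and the batch is moved to the GPU inside the loop; the device variable and non_blocking=True are my assumptions here, not from the earlier comments):

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=100,
    num_workers=1,
    prefetch_factor=2,
    persistent_workers=True,
    pin_memory=True,  # tensors stay on the CPU in the workers; no custom collate_fn needed
)

device = torch.device("cuda:0")
for x, t in train_loader:
    # .to() returns a new tensor, so assign the result back;
    # non_blocking=True is only effective because the batch is in pinned memory
    x = x.to(device, non_blocking=True)
    t = t.to(device, non_blocking=True)
    ...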
