
CudaHostAlloc takes a lot of time during training #124456

Open
niuliling123 opened this issue Apr 19, 2024 · 4 comments
Labels: module: cuda · module: CUDACachingAllocator · module: dataloader · triaged

Comments

@niuliling123 commented Apr 19, 2024

🐛 Describe the bug

Setting pin_memory=True in the DataLoader causes cudaHostAlloc to take a lot of time during training.
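For context, a minimal sketch of the setup being described (the dataset, tensor shapes, and the profiling wrapper are illustrative assumptions, not taken from the report):

```python
import torch
from torch.profiler import profile, ProfilerActivity
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data; the real workload's shapes and sizes are not given in the report.
dataset = TensorDataset(torch.randn(1024, 3, 224, 224),
                        torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=16, num_workers=4, pin_memory=True)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for images, labels in loader:
        # With pin_memory=True the collated batch is copied into page-locked host
        # memory (backed by cudaHostAlloc) before the async host-to-device copy.
        images = images.cuda(non_blocking=True)
        labels = labels.cuda(non_blocking=True)

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=20))
```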

Versions


PyTorch 1.13
NumPy 1.21
Python 3.8

cc @ssnl @VitalyFedyunin @ejguan @dzhulgakov @ptrblck

@xw285cornell (Contributor) commented:

This CUDA API is expected to be very slow. There is a CUDA host caching allocator that creates a memory pool to avoid calling this API repeatedly. Can you make sure you run the data loader multiple times to warm up the cache? It should stop showing up after a while.
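A minimal sketch of that warm-up suggestion, assuming `loader` is the pin_memory=True DataLoader from the report (as sketched above): a few untimed iterations let the host caching allocator build up its pool of pinned blocks, so later iterations can reuse them instead of calling cudaHostAlloc.

```python
import itertools
import torch

# Warm-up: run a few batches so the CUDA host (pinned-memory) caching allocator can
# populate its pool; subsequent epochs should reuse these blocks instead of hitting
# cudaHostAlloc again, provided the batch shapes stay the same.
for images, labels in itertools.islice(loader, 10):
    images = images.cuda(non_blocking=True)
    labels = labels.cuda(non_blocking=True)
torch.cuda.synchronize()

# ...start the timed/profiled training loop here, reusing the same `loader`.
```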

@niuliling123 (Author) commented:

I have already used a data cache to speed things up; that is very common in my model.
What is the size of pin_memory? During training I saw that there was still a large amount of GPU memory remaining.
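Worth separating the two kinds of memory here: pin_memory pins page-locked host RAM, while GPU memory is tracked by the device caching allocator, so free GPU memory does not indicate how large the pinned host pool has grown. A standalone sketch (not from this thread) illustrating the distinction:

```python
import torch

t = torch.empty(16, 3, 224, 224)        # ordinary pageable host memory
p = t.pin_memory()                       # page-locked host copy (allocated via cudaHostAlloc)
print(p.is_pinned())                     # True
print(p.numel() * p.element_size())      # bytes of host RAM pinned for this one tensor

# GPU memory is accounted for separately by the device caching allocator:
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```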

@bdhirsh added the module: dataloader, module: cuda, triaged, and module: CUDACachingAllocator labels on Apr 21, 2024
@gokulavasan commented:
@niuliling123 What is the approximate size of the batch (received by the dataloader from the worker) after collation?

@niuliling123
Copy link
Author

batch_size is 16
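A batch size of 16 gives the number of samples; the question above is about the byte size of the collated batch, which is what gets copied into pinned memory. A rough sketch for estimating it (again assuming `loader` is the pin_memory=True DataLoader):

```python
import torch

def batch_nbytes(batch):
    """Rough byte count of a collated batch (tensor, or nested list/tuple/dict of tensors)."""
    if torch.is_tensor(batch):
        return batch.numel() * batch.element_size()
    if isinstance(batch, (list, tuple)):
        return sum(batch_nbytes(b) for b in batch)
    if isinstance(batch, dict):
        return sum(batch_nbytes(v) for v in batch.values())
    return 0

first_batch = next(iter(loader))
print(f"~{batch_nbytes(first_batch) / 1e6:.1f} MB per collated batch")
```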
