
CudaHostAlloc takes a lot of time during training #124456

Open
niuliling123 opened this issue Apr 19, 2024 · 4 comments
Labels: module: cuda · module: CUDACachingAllocator · module: dataloader · triaged

Comments

@niuliling123 commented Apr 19, 2024

🐛 Describe the bug

Setting pin_memory=True in the DataLoader causes cudaHostAlloc to take a lot of time during training.
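For context, a minimal sketch of the setup being described (the dataset, tensor shapes, and the profiling wrapper are illustrative assumptions, not taken from the report):

```python
import torch
from torch.profiler import profile, ProfilerActivity
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data; the real workload's shapes and sizes are not given in the report.
dataset = TensorDataset(torch.randn(1024, 3, 224, 224),
                        torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=16, num_workers=4, pin_memory=True)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for images, labels in loader:
        # With pin_memory=True the collated batch is copied into page-locked host
        # memory (backed by cudaHostAlloc) before the async host-to-device copy.
        images = images.cuda(non_blocking=True)
        labels = labels.cuda(non_blocking=True)

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=20))
```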

Versions


PyTorch 1.13
NumPy 1.21
Python 3.8

cc @ssnl @VitalyFedyunin @ejguan @dzhulgakov @ptrblck

@xw285cornell (Contributor) commented:

This CUDA API is expected to be very slow. There is a CUDA host caching allocator that creates a memory pool to avoid calling this API repeatedly. Can you make sure you run the data loader multiple times to warm up the cache? It should stop showing up after a while.
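A minimal sketch of that warm-up suggestion, assuming `loader` is the pin_memory=True DataLoader from the report (as sketched above): a few untimed iterations let the host caching allocator build up its pool of pinned blocks, so later iterations can reuse them instead of calling cudaHostAlloc.

```python
import itertools
import torch

# Warm-up: run a few batches so the CUDA host (pinned-memory) caching allocator can
# populate its pool; subsequent epochs should reuse these blocks instead of hitting
# cudaHostAlloc again, provided the batch shapes stay the same.
for images, labels in itertools.islice(loader, 10):
    images = images.cuda(non_blocking=True)
    labels = labels.cuda(non_blocking=True)
torch.cuda.synchronize()

# ...start the timed/profiled training loop here, reusing the same `loader`.
```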

@niuliling123 (Author) commented:

I have already used a data cache to speed things up; that is very common in my model.
What is the size of pin_memory? During training I saw that there was still a large amount of GPU memory remaining.
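Worth separating the two kinds of memory here: pin_memory pins page-locked host RAM, while GPU memory is tracked by the device caching allocator, so free GPU memory does not indicate how large the pinned host pool has grown. A standalone sketch (not from this thread) illustrating the distinction:

```python
import torch

t = torch.empty(16, 3, 224, 224)        # ordinary pageable host memory
p = t.pin_memory()                       # page-locked host copy (allocated via cudaHostAlloc)
print(p.is_pinned())                     # True
print(p.numel() * p.element_size())      # bytes of host RAM pinned for this one tensor

# GPU memory is accounted for separately by the device caching allocator:
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```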

@bdhirsh added the module: dataloader, module: cuda, triaged, and module: CUDACachingAllocator labels on Apr 21, 2024
@gokulavasan commented:
@niuliling123 What is the approximate size of the batch (received by the dataloader from the worker) after collation?

@niuliling123
Copy link
Author

batch_size is 16
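A batch size of 16 gives the number of samples; the question above is about the byte size of the collated batch, which is what gets copied into pinned memory. A rough sketch for estimating it (again assuming `loader` is the pin_memory=True DataLoader):

```python
import torch

def batch_nbytes(batch):
    """Rough byte count of a collated batch (tensor, or nested list/tuple/dict of tensors)."""
    if torch.is_tensor(batch):
        return batch.numel() * batch.element_size()
    if isinstance(batch, (list, tuple)):
        return sum(batch_nbytes(b) for b in batch)
    if isinstance(batch, dict):
        return sum(batch_nbytes(v) for v in batch.values())
    return 0

first_batch = next(iter(loader))
print(f"~{batch_nbytes(first_batch) / 1e6:.1f} MB per collated batch")
```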
