-
Yes, that is something that has been bothering me as well.
-
Phenomenon
I observed that GPU utilization averages around 20% during training.
Experience tells me the bottleneck is most likely the data transfer from CPU to GPU.
The server has 40 cores and could theoretically reach 4000% CPU utilization, but it only sits at 139%. I think that's the problem.
Methods
The simplest way that comes to mind is to add multiprocessing in dataset.py, or to modify the dataset so that it is compatible with torch.utils.data.DataLoader.
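To make the second option concrete: "compatible with torch.utils.data.DataLoader" basically means exposing the data as a map-style dataset. A minimal sketch (class name, file handling and preprocessing are placeholders, not the actual dataset.py):

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class FormulaDataset(Dataset):
    """Map-style dataset: DataLoader workers call __getitem__ in parallel."""

    def __init__(self, image_paths, labels):
        self.image_paths = image_paths
        self.labels = labels

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # placeholder per-item preprocessing; the real dataset.py does more
        im = Image.open(self.image_paths[idx]).convert('L')
        im = torch.from_numpy(np.asarray(im, dtype=np.float32) / 255.).unsqueeze(0)  # 1xHxW
        return im, self.labels[idx]
```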
Trouble
I chose the first method, using multiprocessing, but what I got is a dataloader that is slower than the serial one. Here is the implementation code.
I noticed the process hangs if the image tensor is bigger than 1x32x300 in my laptop WSL environment, but 1x32x256 executes smoothly.
On the server, the results are as below.
serial execution: 14.14 it/s
Pool(20) execution: 92.97 s/it (→_→)
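The core pattern of the Pool attempt is roughly the following sketch (not the exact code; load_sample stands in for the per-item preprocessing). Every result has to be pickled in the worker and sent back to the parent process, so for cheap per-item work the IPC cost can easily dominate, which might be why Pool(20) ends up slower than serial:

```python
import numpy as np
from multiprocessing import Pool

def load_sample(path):
    """Placeholder for the real per-item preprocessing in dataset.py."""
    # pretend we decode and pad an image into a 1x32x300 float array
    return np.zeros((1, 32, 300), dtype=np.float32)

def load_batch_parallel(paths, workers=20):
    # each returned array is pickled in the worker and unpickled here,
    # so the parent spends most of its time on IPC instead of real work
    with Pool(workers) as pool:
        return pool.map(load_sample, paths)

if __name__ == '__main__':
    batch = load_batch_parallel([f'sample_{i}.png' for i in range(32)])
    print(len(batch))
```

I'm not sure about the WSL hang, but if the real code returns torch tensors rather than numpy arrays, they are passed between processes through shared memory, and WSL's small default /dev/shm might be related to the hang above a certain tensor size.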
What do you think about it? @lukas-blecher I plan to implement the second method and hand the multi-CPU problem over to torch.utils.data.DataLoader.
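A rough sketch of what that could look like, assuming the dataset is reworked into a map-style Dataset like the one above (batch size and worker count are just placeholders):

```python
from torch.utils.data import DataLoader

# dataset is assumed to be a map-style Dataset (e.g. the FormulaDataset sketch above)
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    num_workers=20, pin_memory=True)

for images, labels in loader:
    images = images.cuda(non_blocking=True)  # workers prefetch the next batches meanwhile
    # forward / backward pass goes here
```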