-
Yes, that is something that has been bothering me as well.
-
Phenomenon
I observed that GPU utilization averages around 20% during training.
Experience tells me the bottleneck is most likely the data transfer from CPU to GPU.
The server has 40 cores and could theoretically reach 4000% CPU utilization, but it only sits at 139%. I think that's the problem.
Methods
The simplest way that comes to mind is to add multiprocessing in dataset.py, or to modify the dataset so that it is compatible with torch.utils.data.DataLoader.
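To make the second option concrete: "compatible with torch.utils.data.DataLoader" basically means exposing the data as a map-style dataset. A minimal sketch (class name, file handling and preprocessing are placeholders, not the actual dataset.py):

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class FormulaDataset(Dataset):
    """Map-style dataset: DataLoader workers call __getitem__ in parallel."""

    def __init__(self, image_paths, labels):
        self.image_paths = image_paths
        self.labels = labels

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # placeholder per-item preprocessing; the real dataset.py does more
        im = Image.open(self.image_paths[idx]).convert('L')
        im = torch.from_numpy(np.asarray(im, dtype=np.float32) / 255.).unsqueeze(0)  # 1xHxW
        return im, self.labels[idx]
```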
Trouble
I chose the first method, using multiprocessing, but what I got is a dataloader that is slower than the serial one. Here is the implementation code.
I noticed the process hangs if the image tensor is bigger than 1x32x300 in my laptop WSL environment, but 1x32x256 executes smoothly.
On the server, the results are as below.
serial execution: 14.14 it/s
Pool(20) execution: 92.97 s/it (→_→)
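The core pattern of the Pool attempt is roughly the following sketch (not the exact code; load_sample stands in for the per-item preprocessing). Every result has to be pickled in the worker and sent back to the parent process, so for cheap per-item work the IPC cost can easily dominate, which might be why Pool(20) ends up slower than serial:

```python
import numpy as np
from multiprocessing import Pool

def load_sample(path):
    """Placeholder for the real per-item preprocessing in dataset.py."""
    # pretend we decode and pad an image into a 1x32x300 float array
    return np.zeros((1, 32, 300), dtype=np.float32)

def load_batch_parallel(paths, workers=20):
    # each returned array is pickled in the worker and unpickled here,
    # so the parent spends most of its time on IPC instead of real work
    with Pool(workers) as pool:
        return pool.map(load_sample, paths)

if __name__ == '__main__':
    batch = load_batch_parallel([f'sample_{i}.png' for i in range(32)])
    print(len(batch))
```

I'm not sure about the WSL hang, but if the real code returns torch tensors rather than numpy arrays, they are passed between processes through shared memory, and WSL's small default /dev/shm might be related to the hang above a certain tensor size.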
What do you think about it? @lukas-blecher I plan to implement the second method and hand the multi-CPU problem over to torch.utils.data.DataLoader.
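A rough sketch of what that could look like, assuming the dataset is reworked into a map-style Dataset like the one above (batch size and worker count are just placeholders):

```python
from torch.utils.data import DataLoader

# dataset is assumed to be a map-style Dataset (e.g. the FormulaDataset sketch above)
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    num_workers=20, pin_memory=True)

for images, labels in loader:
    images = images.cuda(non_blocking=True)  # workers prefetch the next batches meanwhile
    # forward / backward pass goes here
```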