
Why is mp.spawn slower than torch.distributed.launch for multi-GPU training? #47587

Description

@WZMIAOMIAO

❓ Questions and Help

Dear PyTorch team:
I have been reading the documentation you provide on distributed training. I tried starting training with both mp.spawn and torch.distributed.launch, and found that mp.spawn is slower than torch.distributed.launch, mainly during data loading at the start of each epoch.
For example, with torch.distributed.launch an epoch takes only 8 seconds to train, while with mp.spawn it takes 17 seconds, the first 9 of which are spent waiting (GPU utilization is 0%).
I also found that when using torch.distributed.launch I can see multiple processes with ps -ef | grep train_multi, but when I use mp.spawn I can see only one process.
I don't know whether I am using it incorrectly and would appreciate your advice. Looking forward to your reply.
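For reference, here is a minimal sketch of the two launch styles I am comparing. The train function, master address/port, and the launch command in the comments are illustrative placeholders, not the actual script from the repository linked below:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def train(rank, world_size):
    # Both launch styles end up running the same per-GPU training function;
    # only the way the processes are created differs.
    dist.init_process_group("nccl", init_method="env://",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    # ... build the model and the DataLoader with a DistributedSampler, then train ...
    dist.destroy_process_group()


def main_spawn():
    # Style 1: mp.spawn -- a single parent process spawns one child per GPU,
    # so `ps -ef | grep <script name>` matches only the parent.
    world_size = torch.cuda.device_count()
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder values
    os.environ.setdefault("MASTER_PORT", "29500")
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)


def main_launch():
    # Style 2: torch.distributed.launch -- the launcher starts one independent
    # Python process per GPU, e.g.
    #   python -m torch.distributed.launch --nproc_per_node=8 this_script.py
    # and passes each one a --local_rank argument, so every worker shows up
    # separately in `ps -ef`.
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()
    train(args.local_rank, int(os.environ["WORLD_SIZE"]))
```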

Environment:
OS: CentOS 7
Python: 3.6
PyTorch: 1.7 (GPU)
CUDA: 10.1
GPU: Tesla V100

Code used:
https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_classification/train_multi_GPU

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd

Labels

module: multiprocessing (Related to torch.multiprocessing), oncall: distributed (Add this issue/PR to distributed oncall triage queue), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
