
How can I use multi-GPU to train UNMT #23

Closed
hpsun1109 opened this issue Mar 2, 2019 · 6 comments

Comments

@hpsun1109

I added --local_rank, but it raises an error.

SLURM job: False
Traceback (most recent call last):
  File "train.py", line 322, in <module>
    main(params)
  File "train.py", line 198, in main
    init_distributed_mode(params)
  File "XLM/src/slurm.py", line 110, in init_distributed_mode
    params.global_rank = int(os.environ['RANK'])
  File "/usr/lib/python3.5/os.py", line 725, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'

@hpsun1109
Author

Another question: in the UNMT model, is there only one encoder and one decoder? Thanks.

@glample
Contributor

glample commented Mar 2, 2019

You should not handle the --local_rank yourself. You can use the following command to train with multi-GPU: https://github.com/facebookresearch/XLM#how-can-i-run-experiments-on-multiple-gpus

export NGPU=8; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py ARGUMENTS
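The KeyError above is the symptom of launching train.py directly: RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are environment variables that torch.distributed.launch exports for each worker process, alongside a --local_rank argument it appends to the command line. A minimal sketch of what a launched worker sees (illustrative only, not XLM's actual init code):

import argparse
import os

import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # appended by the launcher
args = parser.parse_args()

# these variables are exported by torch.distributed.launch for every worker;
# reading them after a plain `python train.py` raises KeyError: 'RANK'
global_rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl", init_method="env://")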

And no, there are 2 separate models for UNMT, one encoder and one decoder, but they are initialized with the same weights (apart from the parameters of the source attention in the decoder that remain randomly initialized).
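For the second point, a hypothetical sketch of that initialization (init_decoder_from_encoder is an illustrative helper, not XLM's actual reload code): copy every decoder parameter that has a same-named, same-shaped counterpart in the pretrained encoder, and leave the rest, such as the source/cross-attention weights, randomly initialized.

def init_decoder_from_encoder(decoder, encoder):
    # hypothetical helper, not part of XLM
    enc_state = encoder.state_dict()
    dec_state = decoder.state_dict()
    for name, tensor in dec_state.items():
        if name in enc_state and enc_state[name].shape == tensor.shape:
            tensor.copy_(enc_state[name])
        # parameters with no encoder counterpart (source attention, etc.)
        # keep their random initialization
    decoder.load_state_dict(dec_state)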

@BinWone

BinWone commented Jul 4, 2019

You should not handle the --local_rank yourself. You can use the following command to train with multi-GPU: https://github.com/facebookresearch/XLM#how-can-i-run-experiments-on-multiple-gpus

export NGPU=8; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py ARGUMENTS

And no, there are 2 separate models for UNMT, one encoder and one decoder, but they are initialized with the same weights (apart from the parameters of the source attention in the decoder that remain randomly initialized).

I am using multi-GPU to pre-train the model with export NGPU=8; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py ARGUMENTS, but it just runs the same job on 8 GPUs: the training time is the same as training on 1 GPU, so it doesn't speed up the pre-training process.
How should I set the params so that multi-GPU actually speeds up training?

@glample
Contributor

glample commented Jul 4, 2019

What do you mean by the training time is the same? Is the perplexity the same at the end of a few epochs? Or do you look at the number of words per second? The number of words per second in the log is given per GPU, so this will be the same. But the loss / perplexity should decrease much faster.
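As an illustration with made-up numbers: if the log reports about 6k words/sec and you run on 8 GPUs, the effective throughput is roughly 8 × 6k = 48k words/sec, so one pass over the data takes about 1/8 of the wall-clock time, even though each GPU's own words-per-second counter looks unchanged.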

@BinWone

BinWone commented Jul 4, 2019

What do you mean by the training time is the same? Is the perplexity the same at the end of a few epochs? Or do you look at the number of words per second? The number of words per second in the log is given per GPU, so this will be the same. But the loss / perplexity should decrease much faster.

Yes, I made a mistake. You are right, multi-GPU training gets better valid ppl and acc.
Pretraining on 1 GPU:
[screenshot of training log on 1 GPU]
On 4 GPUs:
[screenshot of training log on 4 GPUs]

@glample
Contributor

glample commented Jul 4, 2019

Looks good :)
