
How can I use multi-GPU to train UNMT #23

Closed
hpsun1109 opened this issue Mar 2, 2019 · 6 comments

Comments

@hpsun1109

I added --local_rank, but it raises an error.

SLURM job: False
Traceback (most recent call last):
  File "train.py", line 322, in <module>
    main(params)
  File "train.py", line 198, in main
    init_distributed_mode(params)
  File "XLM/src/slurm.py", line 110, in init_distributed_mode
    params.global_rank = int(os.environ['RANK'])
  File "/usr/lib/python3.5/os.py", line 725, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'

@hpsun1109
Author

Another question: in the UNMT model, is there only one encoder and one decoder? Thanks.

@glample
Contributor

glample commented Mar 2, 2019

You should not handle the --local_rank yourself. You can use the following command to train with multi-GPU: https://github.com/facebookresearch/XLM#how-can-i-run-experiments-on-multiple-gpus

export NGPU=8; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py ARGUMENTS
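The KeyError above is the symptom of launching train.py directly: RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are environment variables that torch.distributed.launch exports for each worker process, alongside a --local_rank argument it appends to the command line. A minimal sketch of what a launched worker sees (illustrative only, not XLM's actual init code):

import argparse
import os

import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # appended by the launcher
args = parser.parse_args()

# these variables are exported by torch.distributed.launch for every worker;
# reading them after a plain `python train.py` raises KeyError: 'RANK'
global_rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl", init_method="env://")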

And no, there are 2 separate models for UNMT, one encoder and one decoder, but they are initialized with the same weights (apart from the parameters of the source attention in the decoder that remain randomly initialized).
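For the second point, a hypothetical sketch of that initialization (init_decoder_from_encoder is an illustrative helper, not XLM's actual reload code): copy every decoder parameter that has a same-named, same-shaped counterpart in the pretrained encoder, and leave the rest, such as the source/cross-attention weights, randomly initialized.

def init_decoder_from_encoder(decoder, encoder):
    # hypothetical helper, not part of XLM
    enc_state = encoder.state_dict()
    dec_state = decoder.state_dict()
    for name, tensor in dec_state.items():
        if name in enc_state and enc_state[name].shape == tensor.shape:
            tensor.copy_(enc_state[name])
        # parameters with no encoder counterpart (source attention, etc.)
        # keep their random initialization
    decoder.load_state_dict(dec_state)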

@BinWone

BinWone commented Jul 4, 2019

You should not handle the --local_rank yourself. You can use the following command to train with multi-GPU: https://github.com/facebookresearch/XLM#how-can-i-run-experiments-on-multiple-gpus

export NGPU=8; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py ARGUMENTS

And no, there are 2 separate models for UNMT, one encoder and one decoder, but they are initialized with the same weights (apart from the parameters of the source attention in the decoder that remain randomly initialized).

I am using multi-GPU to pre-train the model with export NGPU=8; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py ARGUMENTS, but it just runs the same job on 8 GPUs: the training time is the same as training on 1 GPU, so it doesn't speed up the pre-training process.
How should I set the params so that multi-GPU actually speeds up training?

@glample
Contributor

glample commented Jul 4, 2019

What do you mean by the training time is the same? Is the perplexity the same at the end of a few epochs? Or do you look at the number of words per second? The number of words per second in the log is given per GPU, so this will be the same. But the loss / perplexity should decrease much faster.
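As an illustration with made-up numbers: if the log reports about 6k words/sec and you run on 8 GPUs, the effective throughput is roughly 8 × 6k = 48k words/sec, so one pass over the data takes about 1/8 of the wall-clock time, even though each GPU's own words-per-second counter looks unchanged.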

@BinWone

BinWone commented Jul 4, 2019

What do you mean by the training time is the same? Is the perplexity the same at the end of a few epochs? Or do you look at the number of words per second? The number of words per second in the log is given per GPU, so this will be the same. But the loss / perplexity should decrease much faster.

Yes, I made a mistake. You are right, multi-GPU training gets better valid ppl and acc.
Pretraining on 1 GPU:
[screenshot of training log on 1 GPU]
On 4 GPUs:
[screenshot of training log on 4 GPUs]

@glample
Contributor

glample commented Jul 4, 2019

Looks good :)
