Hyperparameters for replicating supervised MT ro -> en result. #37

sarthakgarg · 2019-03-14T17:24:56Z

Hi,

I am trying to replicate the supervised MT ro -> en baseline of 28.4 mentioned in the paper. I was hoping that you could give me some idea about the hyperparameters for that.
Specifically, can you tell me the number of BPE operations, the learning rate and learning rate schedule, the dropout and attention dropout values, the embedding size of the network, the batch size, and the number of GPUs used during training?

Thanks!

glample · 2019-03-16T11:26:13Z

Hi, we used the parameters from the README (https://github.com/facebookresearch/XLM#train-on-unsupervised-mt-from-a-pretrained-model), with 60000 BPE operations and 8 GPUs.

sarthakgarg · 2019-03-18T18:05:32Z

Thanks for that! I figured out the issue: I was training on only the Europarl v8 parallel corpus, not the full WMT16 dataset (Europarl v8 + SETIMES2).
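For anyone hitting the same gap, here is a minimal sketch of merging several parallel corpora into one training set while checking that each source/target pair is line-aligned. The function name and any file names passed to it are hypothetical; this is not a script from the XLM repository, and it assumes each corpus is stored as two plain-text files with one sentence per line.

```python
def concat_parallel(corpora, out_src, out_tgt):
    """Concatenate parallel corpora into one source file and one target file.

    corpora: list of (source_path, target_path) tuples (hypothetical paths).
    Raises ValueError if a pair does not have the same number of lines.
    Returns the total number of sentence pairs written.
    """
    total = 0
    with open(out_src, "w", encoding="utf-8") as fs, \
         open(out_tgt, "w", encoding="utf-8") as ft:
        for src_path, tgt_path in corpora:
            # Read both sides fully so misalignment is caught before writing.
            with open(src_path, encoding="utf-8") as f:
                src = [line.rstrip("\n") for line in f]
            with open(tgt_path, encoding="utf-8") as f:
                tgt = [line.rstrip("\n") for line in f]
            if len(src) != len(tgt):
                raise ValueError(
                    f"{src_path} has {len(src)} lines but "
                    f"{tgt_path} has {len(tgt)}"
                )
            for s, t in zip(src, tgt):
                fs.write(s + "\n")
                ft.write(t + "\n")
            total += len(src)
    return total
```

Calling it with the Europarl v8 pair followed by the SETIMES2 pair would produce a single ro/en training set; the returned count is a quick way to confirm that both corpora actually went in before BPE is applied.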

sarthakgarg closed this as completed Mar 18, 2019
