This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Hyperparameters for replicating supervised MT ro -> en result. #37

Closed
sarthakgarg opened this issue Mar 14, 2019 · 2 comments

@sarthakgarg commented Mar 14, 2019

Hi,

I am trying to replicate the supervised MT ro -> en baseline of 28.4 mentioned in the paper, and I was hoping you could give me some idea of the hyperparameters used. Specifically, could you tell me the number of BPE operations, the learning rate and learning rate schedule, the dropout and attention dropout values, the embedding size of the network, and the batch size and number of GPUs used during training?

Thanks!

@glample (Contributor)

glample commented Mar 16, 2019

Hi, we used the parameters from the README (https://github.com/facebookresearch/XLM#train-on-unsupervised-mt-from-a-pretrained-model), with 60,000 BPE operations and 8 GPUs.
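For reference, learning a joint BPE model with 60,000 merge operations can be sketched with fastBPE (the BPE tool the XLM codebase uses). This is a hedged sketch, not the exact preprocessing script from the repository; the file names (`train.ro`, `train.en`) are placeholders for your tokenized WMT16 training data:

```shell
# Learn 60,000 joint BPE codes over both sides of the parallel data
# (placeholder file names; assumes fastBPE is compiled as ./fast)
./fast learnbpe 60000 train.ro train.en > codes

# Apply the learned codes to each side
./fast applybpe train.ro.60000 train.ro codes
./fast applybpe train.en.60000 train.en codes

# Extract the BPE vocabulary for use during training
./fast getvocab train.ro.60000 train.en.60000 > vocab
```

The same `codes` file must then be applied to the dev and test sets so that train and evaluation data share one segmentation.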

@sarthakgarg (Author)

Thanks for that! I figured out the issue: I was only using the Europarl v8 parallel corpus, not the full WMT16 dataset (Europarl v8 + SETIMES2).
