
How to reproduce the result on WMT14 En-De #202

Closed
ustctf-zz opened this issue Jun 28, 2018 · 10 comments

ustctf-zz commented Jun 28, 2018

Hi,

Thank you for providing such an impressive toolkit!

To replicate the WMT14 En-De translation result, I followed the instructions here, but after running on 8 M40 GPUs for 5.5 days, the test-set BLEU (<27) does not match the one reported in the paper, or even the original T2T paper (28.4). Could you tell me what is wrong on my side? Here is the training script:

model=transformer
PROBLEM=WMT14_ENDE
SETTING=transformer_vaswani_wmt_en_de_big

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py ${REMOTE_DATA_PATH}/wmt14_en_de_joined_dict \
  --arch $SETTING --share-all-embeddings \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
  --lr 0.001 --min-lr 1e-09 --update-freq 16 \
  --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 4096 --no-progress-bar --save-dir ${REMOTE_MODEL_PATH}/$model/$PROBLEM/$SETTING

(I do not use --fp16, and I slightly enlarged the per-GPU batch size from 3584 to 4096 tokens.)
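For reference, the effective batch size implied by these flags works out roughly as below (a sketch, not an exact count: fairseq fills batches by token count, so the real number per update varies):

GPUS=8           # CUDA_VISIBLE_DEVICES=0..7
MAX_TOKENS=4096  # --max-tokens (per GPU)
UPDATE_FREQ=16   # --update-freq (gradient accumulation steps)
echo $((GPUS * MAX_TOKENS * UPDATE_FREQ))  # 524288 tokens per update,
                                           # i.e. about 128 GPUs x 4096 tokens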

Here is the test script:

python generate.py ${REMOTE_DATA_PATH}/wmt14_en_de_joined_dict --path ${REMOTE_MODEL_PATH}/${model}/${PROBLEM}/${SETTING}/checkpoint_best.pt --batch-size 128 --beam 4 --lenpen 0.6 --quiet --remove-bpe --no-progress-bar

It outputs (after training for 5.5 days): Generate test with beam=4: BLEU4 = 26.66, 57.9/32.3/20.4/13.2 (BP=1.000, ratio=1.013, syslen=66179, reflen=65346)

BTW, it seems the dataset generated by prepare-wmt14en2de.sh has fewer than 4M training pairs, rather than the expected ~4.5M. Could that be the reason?
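A quick way to check the corpus size is something like the following (a sketch; the directory and the train.en/train.de file names are assumptions about the usual prepare-wmt14en2de.sh output, adjust as needed):

TEXT=wmt14_en_de   # assumed output directory of prepare-wmt14en2de.sh
wc -l $TEXT/train.en $TEXT/train.de   # both should report the same count; ~4.5M lines expected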

Thanks a lot.

myleott commented Jun 28, 2018

Yes, you are right. Originally I used the Google dataset [1], but I was hoping to reproduce the results with our own script, since it's not clear how the Google version was preprocessed.

I'm working on an updated preprocessing script that should better match the Google version (~4.5M pairs). I'll post it here and update the README shortly.

[1] https://github.com/tensorflow/tensor2tensor/blob/6a7ef7f79f56fdcb1b16ae76d7e61cb09033dc4f/tensor2tensor/data_generators/translate_ende.py#L60-L61

myleott commented Jun 28, 2018

Please try this dataset: #203

I just ran it on 128 GPUs and now get the same results as (actually slightly better than) the paper.

@ustctf-zz (Author)

Thanks @myleott!

I'm training on the new dataset (with 8 GPUs) and will report back with the latest results.

@ustctf-zz (Author)

Hi @myleott, after running on 8 M40 GPUs for about 5 days, I obtained a BLEU of 28.77 on WMT14 En-De. Thanks again for the code and help!

BTW, do you plan to provide a detailed config/command to reproduce the result on WMT14 En-Fr? Thanks!

myleott commented Jul 11, 2018

For En-Fr you can use the transformer_vaswani_wmt_en_fr_big architecture. It's nearly identical to the En-De architecture except that we use a smaller dropout value: https://github.com/pytorch/fairseq/blob/f26b6affdaf67d271e0d39f4c4c8384c4e8160d9/fairseq/models/transformer.py#L467-L470

I used the standard fairseq En-Fr dataset with 40k BPE tokens, available here: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh. For preprocessing, make sure to add the --joined-dictionary flag.
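Roughly, the En-Fr preprocessing and training commands could look like the sketch below; the paths, output directories and save dir are placeholders rather than an official recipe, and the only substantive changes from the En-De command above are the architecture and the smaller dropout:

TEXT=wmt14_en_fr   # assumed output directory of prepare-wmt14en2fr.sh
python preprocess.py --source-lang en --target-lang fr \
  --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
  --destdir data-bin/wmt14_en_fr_joined_dict --joined-dictionary

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py data-bin/wmt14_en_fr_joined_dict \
  --arch transformer_vaswani_wmt_en_fr_big --share-all-embeddings \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
  --lr 0.001 --min-lr 1e-09 --update-freq 16 \
  --dropout 0.1 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 4096 --no-progress-bar --save-dir checkpoints/wmt14_en_fr_big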

myleott closed this as completed Jul 11, 2018
@ustctf-zz (Author)

Thanks!

@wangqiangneu

> Hi @myleott, after running on 8 M40 GPUs for about 5 days, I obtained a BLEU of 28.77 on WMT14 En-De. Thanks again for the code and help!
>
> BTW, do you plan to provide a detailed config/command to reproduce the result on WMT14 En-Fr? Thanks!

Hi @myleott @ustctf, if I use the newly processed WMT14 En-De data provided by Google, should I also do some post-processing (like get_ende_bleu.sh in tensor2tensor) to get a good BLEU?
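For context, the kind of post-processing get_ende_bleu.sh does is roughly the following (a sketch with placeholder file names, not the actual tensor2tensor script): tokenize both sides with Moses, rewrite hyphenated compounds into the ##AT##-##AT## form, then score with multi-bleu.perl.

MOSES=~/mosesdecoder          # placeholder path to a Moses checkout
REF=newstest2014.de           # placeholder: detokenized reference
HYP=system_output.de          # placeholder: detokenized, de-BPE'd system output

# Tokenize hypothesis and reference.
perl $MOSES/scripts/tokenizer/tokenizer.perl -l de < $HYP > hyp.tok
perl $MOSES/scripts/tokenizer/tokenizer.perl -l de < $REF > ref.tok

# Split hyphenated compounds ("foo-bar" -> "foo ##AT##-##AT## bar") on both sides.
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < hyp.tok > hyp.tok.atat
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < ref.tok > ref.tok.atat

# Score with multi-bleu.
perl $MOSES/scripts/generic/multi-bleu.perl ref.tok.atat < hyp.tok.atat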

@kalyangvs (Contributor)

Hi @ustctf, can you provide the BLEU score you got for En-Fr using this script: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh?
If you also trained the base transformer, please provide those scores as well. Thanks.

@ustctf-zz (Author)

@gvskalyan Sorry, I have no records of that. Maybe you can ask the maintainers for official help.

@kalyangvs (Contributor)

> @gvskalyan Sorry, I have no records of that. Maybe you can ask the maintainers for official help.

Yeah, thank you.
