Will you release the distillation dataset of wmt-en-de? #3
Hello there. I tried to reproduce the en-de results in the paper, but I could only get about 22.6 BLEU. Could you share some details about your setup? Which dataset did you use, and what were the other hyperparameters? Any information would be very helpful. Thx!!
Hi, all hyperparameters are the same as in the paper and the provided script. The dataset is https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8 Thank you!
Thank you very much. It turns out I couldn't reproduce the result because of a preprocessing problem: I lowercased all my data, which led to too many distinct representations in the corpus. When I use your data directly, it works! Thanks again!
Thanks for your interest. Please use the code at https://github.com/pytorch/fairseq/tree/master/examples/translation with this command:

```shell
python train.py your-data-bin --arch transformer --share-all-embeddings \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr 5e-4 --warmup-init-lr 1e-7 --min-lr 1e-9 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --max-tokens 8192 --dropout 0.3 \
    --encoder-layers 6 --encoder-embed-dim 1024 \
    --decoder-layers 6 --decoder-embed-dim 1024 \
    --max-update 300000 --update-freq 2 --fp16 \
    --max-source-positions 10000 --max-target-positions 10000 \
    --save-dir checkpoints
```
Hello, Liu! Thanks for giving me the test dataset last month. I am now also at the stage of training the model from scratch, and I have run into the same problem you did.
Hi, I have successfully trained wmt-en-de from scratch. I used a distillation dataset produced by a strong Transformer-big model (~29.3 BLEU, https://github.com/pytorch/fairseq/blob/master/examples/scaling_nmt/README.md#pre-trained-models), which reproduces a final BLEU score >27.2.
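For anyone else reproducing this: sequence-level knowledge distillation amounts to decoding every training source sentence with the teacher and training the student on the teacher's outputs instead of the original references. A minimal sketch of the pairing step, where `teacher_translate` is a hypothetical stand-in for decoding with the Transformer-big teacher (in practice you would run fairseq generation with that checkpoint):

```python
def build_distillation_corpus(sources, teacher_translate):
    """Pair each source sentence with the teacher's translation.

    The resulting (source, teacher_output) pairs replace the original
    references when training the non-autoregressive student.
    """
    return [(src, teacher_translate(src)) for src in sources]

# Toy stand-in for beam-search decoding with the teacher model;
# a real teacher returns a translation, not an uppercased copy.
def toy_teacher(src):
    return src.upper()

corpus = build_distillation_corpus(["ein beispiel", "noch ein satz"], toy_teacher)
```

The student then sees a simpler, more deterministic target distribution than the raw references, which is what makes distilled data so much easier for non-autoregressive models to fit.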
Hi @SunbowLiu Do you have any advice? |
The only way might be training from scratch. |
Hi, when I used the checkpoint_best.pt provided in the readme with the inference script

```shell
python generate_cmlm.py ${output_dir}/data-bin --path ${model_dir}/checkpoint_best.pt \
    --task translation_self --remove-bpe --max-sentences 20 \
    --decoding-iterations 10 --decoding-strategy mask_predict
```

I could only get a BLEU of 20.90. What is the problem? Are there any other hyperparameters I need to modify in the inference script? I also see "average the 5 best checkpoints to create the final model" in the paper. Is the checkpoint_best.pt provided in the link that final model? If not, how do I average the best checkpoints? Do we forward all 5 models and average their prediction distributions? Thank you!
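On the averaging question: the usual fairseq practice is not to ensemble the forward passes but to average the saved parameter tensors into a single model (fairseq ships `scripts/average_checkpoints.py` for this). A dependency-free sketch of the idea, using plain dicts of floats in place of real state dicts:

```python
def average_checkpoints(state_dicts):
    """Average matching parameters across checkpoints element-wise.

    Real checkpoints hold tensors keyed by parameter name; plain floats
    keep this sketch runnable without torch.
    """
    n = len(state_dicts)
    keys = state_dicts[0].keys()
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}

ckpts = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
avg = average_checkpoints(ckpts)  # {"w": 2.0, "b": 1.0}
```

The averaged model decodes at the cost of a single forward pass, unlike a true ensemble.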
Hi,
I have successfully reproduced the 27.03 BLEU score (N=10, l=5) and 1.2 times speedup (N=10, l=2) using your pre-trained wmt-en-de model.
I want to train the model from scratch, but the performance heavily depends on the distillation dataset you used (with raw data, I can only reach ~24 BLEU), so it would be much better if you could provide this dataset.
Thank you!
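For context on N and l above: mask-predict decodes all target positions in parallel, then for N iterations re-masks and re-predicts the lowest-confidence tokens, masking roughly T·(N−t)/N positions at iteration t; l is the number of candidate target lengths decoded in parallel. A toy sketch of the re-masking loop, with `predict` as a hypothetical stand-in for the CMLM decoder's forward pass:

```python
def mask_predict(length, predict, iterations=10):
    """Iteratively refine a fully masked target of the given length.

    `predict` maps a partially masked token list to (tokens, confidences);
    here it stands in for a CMLM forward pass over the target.
    """
    tokens = ["<mask>"] * length
    tokens, conf = predict(tokens)  # first pass: predict every position
    for t in range(1, iterations):
        n_mask = int(length * (iterations - t) / iterations)
        if n_mask == 0:
            break
        # re-mask the n_mask lowest-confidence positions and re-predict them
        worst = sorted(range(length), key=lambda i: conf[i])[:n_mask]
        for i in worst:
            tokens[i] = "<mask>"
        tokens, conf = predict(tokens)
    return tokens

# Toy decoder: fill every masked slot with a position-indexed placeholder.
def toy_predict(tokens):
    out = [tok if tok != "<mask>" else f"w{i}" for i, tok in enumerate(tokens)]
    return out, [1.0] * len(out)

result = mask_predict(4, toy_predict)
```

With a fixed iteration count N, the decoding cost is constant in output length, which is where the speedup over left-to-right beam search comes from.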