google/wmt19-paraphrased-references

Additional reference translations for the English-to-German WMT test sets newstest2018, newstest2019 and newstest2020.

The contents of this repository are not an official Google product.

[Additional References]

The sentences in this repository are alternative reference translations for the WMT newstest20XX English-German test sets, produced through human translation or human paraphrasing. Automatic metrics such as BLEU have been shown to correlate better with human judgement when computed against these references than against the standard references. For details on data collection and on how paraphrased references can improve the automatic evaluation of machine translation, see our paper below. If you use this data in your research, please consider citing the paper. The repository currently contains additional references for newstest2018, newstest2019 and newstest2020:

  1. newstest2018 WMT.p A paraphrased as-much-as-possible version of the original WMT reference.

  2. newstest2019 AR An additional high quality reference translation.

  3. newstest2019 AR.p A paraphrased as-much-as-possible version of AR.

  4. newstest2019 WMT.p A paraphrased as-much-as-possible version of the original WMT reference.

  5. newstest2019 HQ(R) A combined reference from the original reference translation and AR. Per sentence, humans picked one of the two reference translations.

  6. newstest2019 HQ(P) A combined reference from WMT.p and AR.p. Per sentence, humans picked one of the two reference translations.

  7. newstest2019 HQ(all) A combined reference from WMT, AR, WMT.p, AR.p. Per sentence, humans picked one of the four reference translations.

  8. newstest2020 WMT.p A paraphrased as-much-as-possible version of the original WMT reference.
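The combined references (HQ(R), HQ(P), HQ(all)) are described above as per-sentence selections among several candidate references. A minimal Python sketch of that selection step follows. It is only an illustration: the function name `combine_references`, the `picks` list and the toy sentences are hypothetical and are not taken from this repository, which does not state how the human choices were recorded.

```python
from typing import List, Sequence


def combine_references(candidates: Sequence[Sequence[str]],
                       picks: Sequence[int]) -> List[str]:
    """Build one combined reference from several sentence-aligned references.

    candidates[k][i] is sentence i of reference k.
    picks[i] is the index k of the reference chosen for sentence i
    (hypothetical encoding of the human choice).
    """
    n_sentences = len(picks)
    for ref in candidates:
        if len(ref) != n_sentences:
            raise ValueError("every reference needs one sentence per pick")
    return [candidates[k][i] for i, k in enumerate(picks)]


# Toy data invented for illustration (not from the released files).
wmt = ["Das ist ein Test.", "Guten Morgen."]
ar = ["Dies ist ein Test.", "Schönen guten Morgen."]
picks = [1, 0]  # sentence 0 from AR, sentence 1 from WMT

combined = combine_references([wmt, ar], picks)
```

The same function covers the four-way case by passing four sentence-aligned lists as `candidates`.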

[Research Paper]

BLEU might be Guilty but References are not Innocent. Markus Freitag, David Grangier, Isaac Caswell - EMNLP 2020.

@inproceedings{freitag-bleu-paraphrase-references-2020,
title={BLEU might be Guilty but References are not Innocent},
author={Markus Freitag and David Grangier and Isaac Caswell},
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
year={2020},
month={nov}
}

Human-Paraphrased References Improve Neural Machine Translation. Markus Freitag, George Foster, David Grangier, Colin Cherry - WMT 2020.
