
nguyenlab/SMT-LowRec


SMT-LowRec

This repository accompanies the following paper:

Enhancing Statistical Machine Translation For Low-Resource Languages Using Semantic Similarity

The repository includes:

  1. Corpora
  • Bilingual corpora: training, tuning, and test sets for the language pairs Japanese-Vietnamese, Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese.
  2. Sentence alignment
  • A Java implementation of the sentence alignment method of [Moore, 2002].
  • An extension of word alignment using word similarity computed with word2vec.
  3. Pivot translation
  • A Java implementation of the pivot translation approach of [Wu and Wang, 2007].
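
To illustrate the idea behind length-based sentence alignment in [Moore, 2002], here is a minimal sketch of a dynamic-programming aligner that scores a 1-1 sentence pair with a Poisson model of target length given source length, and lets sentences stay unaligned (1-0 and 0-1 beads) at a fixed cost. This is an assumption-laden simplification, not the repository's code: the class name LengthAligner, the bead priors, and the length ratio are hypothetical, and Moore's full method also uses 2-1/1-2 beads and a second pass with an IBM Model 1 lexical model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a simplified length-based sentence aligner in the
// spirit of the length-only first pass of Moore (2002). It is NOT the
// repository's implementation.
class LengthAligner {

    // Assumed bead priors; Moore (2002) estimates such values from data.
    static final double P_ONE_ONE = 0.94;
    static final double P_SKIP = 0.03; // used for each of the 1-0 and 0-1 beads

    // Natural log of the Poisson pmf: k*ln(mean) - mean - ln(k!).
    static double logPoisson(int k, double mean) {
        double logFact = 0.0;
        for (int i = 2; i <= k; i++) {
            logFact += Math.log(i);
        }
        return k * Math.log(mean) - mean - logFact;
    }

    // Aligns sentences given their lengths in words. 'ratio' is the assumed
    // expected number of target words per source word. Returns the 1-1 pairs
    // as {sourceIndex, targetIndex} (0-based); unaligned sentences are omitted.
    static List<int[]> align(int[] src, int[] tgt, double ratio) {
        int n = src.length, m = tgt.length;
        double[][] cost = new double[n + 1][m + 1]; // negative log-probabilities
        int[][] back = new int[n + 1][m + 1];       // 0 = 1-1, 1 = 1-0, 2 = 0-1
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (i > 0 && j > 0) {
                    // 1-1 bead: target length ~ Poisson(ratio * source length).
                    double mean = Math.max(ratio * src[i - 1], 1e-3);
                    double c = cost[i - 1][j - 1] - Math.log(P_ONE_ONE)
                            - logPoisson(tgt[j - 1], mean);
                    if (c < cost[i][j]) { cost[i][j] = c; back[i][j] = 0; }
                }
                if (i > 0) { // 1-0 bead: source sentence left unaligned
                    double c = cost[i - 1][j] - Math.log(P_SKIP);
                    if (c < cost[i][j]) { cost[i][j] = c; back[i][j] = 1; }
                }
                if (j > 0) { // 0-1 bead: target sentence left unaligned
                    double c = cost[i][j - 1] - Math.log(P_SKIP);
                    if (c < cost[i][j]) { cost[i][j] = c; back[i][j] = 2; }
                }
            }
        }
        // Trace back the cheapest path and collect the 1-1 beads.
        List<int[]> pairs = new ArrayList<>();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            int b = back[i][j];
            if (b == 0) { pairs.add(new int[] {i - 1, j - 1}); i--; j--; }
            else if (b == 1) { i--; }
            else { j--; }
        }
        Collections.reverse(pairs);
        return pairs;
    }
}
```

For example, with source lengths {10, 20, 10}, target lengths {10, 10}, and a ratio of 1.0, the cheapest path pairs the first and last sentences and leaves the long middle source sentence unaligned, because forcing it into a 1-1 bead with a 10-word sentence is less probable under the Poisson length model.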
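
One possible way to extend word alignment with word2vec similarity is sketched below, assuming pre-trained source-side word2vec vectors and a word-translation table as input: a source word with no entry in the table borrows the translations of its most similar known word when their cosine similarity reaches a threshold, with the probabilities scaled by that similarity. The class name SimilarityExtension, the threshold, and the scaling rule are hypothetical choices for illustration; the repository's actual extension may work differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: extend a word-translation table using word2vec
// similarity. A source word missing from the table borrows the translations
// of its most similar known word (cosine >= threshold), scaled by that
// similarity. One possible design, not necessarily the repository's method.
class SimilarityExtension {

    // Cosine similarity of two dense vectors; 0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            na += a[k] * a[k];
            nb += b[k] * b[k];
        }
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // table:   source word -> (target word -> translation probability)
    // vectors: source-side word2vec embeddings, assumed to be pre-trained
    static Map<String, Map<String, Double>> extend(
            Map<String, Map<String, Double>> table,
            Map<String, double[]> vectors,
            double threshold) {
        // Start from a copy of the original table.
        Map<String, Map<String, Double>> out = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> e : table.entrySet()) {
            out.put(e.getKey(), new HashMap<>(e.getValue()));
        }
        for (Map.Entry<String, double[]> v : vectors.entrySet()) {
            String word = v.getKey();
            if (table.containsKey(word)) {
                continue; // already has translations
            }
            // Find the most similar word that already has translations.
            String best = null;
            double bestSim = threshold;
            for (String known : table.keySet()) {
                double[] kv = vectors.get(known);
                if (kv == null) {
                    continue; // no embedding for this known word
                }
                double sim = cosine(v.getValue(), kv);
                if (sim >= bestSim) {
                    bestSim = sim;
                    best = known;
                }
            }
            if (best != null) {
                Map<String, Double> borrowed = new HashMap<>();
                for (Map.Entry<String, Double> t : table.get(best).entrySet()) {
                    borrowed.put(t.getKey(), t.getValue() * bestSim);
                }
                out.put(word, borrowed);
            }
        }
        return out;
    }
}
```

With toy data where "big" translates to Vietnamese "to" with probability 0.8, a vector for "large" close to that of "big" gives "large" the same translation with a slightly lower probability, while an unrelated word such as "cat" stays out of the table.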
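
The pivot approach of [Wu and Wang, 2007] builds a source-target phrase table from a source-pivot table and a pivot-target table by summing over shared pivot phrases, phi(t | s) = sum over p of phi(t | p) * phi(p | s). The sketch below computes only this phrase translation probability from two in-memory tables; lexical weights, pruning, and reading real phrase-table files are left out, and the class name PivotTriangulation is hypothetical rather than taken from the repository.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of phrase-table triangulation through a pivot language:
//   phi(t | s) = sum over pivot phrases p of phi(t | p) * phi(p | s)
// Only the phrase translation probability is computed here; lexical
// weights and table pruning are left out.
class PivotTriangulation {

    // srcToPivot: source phrase -> (pivot phrase -> phi(p | s))
    // pivotToTgt: pivot phrase  -> (target phrase -> phi(t | p))
    // returns:    source phrase -> (target phrase -> phi(t | s))
    static Map<String, Map<String, Double>> triangulate(
            Map<String, Map<String, Double>> srcToPivot,
            Map<String, Map<String, Double>> pivotToTgt) {
        Map<String, Map<String, Double>> result = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> s : srcToPivot.entrySet()) {
            Map<String, Double> row = new HashMap<>();
            for (Map.Entry<String, Double> p : s.getValue().entrySet()) {
                Map<String, Double> targets = pivotToTgt.get(p.getKey());
                if (targets == null) {
                    continue; // pivot phrase not covered by the second table
                }
                for (Map.Entry<String, Double> t : targets.entrySet()) {
                    // Accumulate phi(t | p) * phi(p | s) over shared pivot phrases.
                    row.merge(t.getKey(), t.getValue() * p.getValue(), Double::sum);
                }
            }
            if (!row.isEmpty()) {
                result.put(s.getKey(), row);
            }
        }
        return result;
    }
}
```

For instance, if source phrase s1 maps to pivot phrases p1 (0.6) and p2 (0.4), and p1 maps to t1 and t2 (0.5 each) while p2 maps only to t1 (1.0), the triangulated entries are phi(t1 | s1) = 0.6 * 0.5 + 0.4 * 1.0 = 0.7 and phi(t2 | s1) = 0.3. Source phrases whose pivot phrases never appear in the second table produce no entries.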

References

[1] Moore, Robert C. "Fast and accurate sentence alignment of bilingual corpora." Conference of the Association for Machine Translation in the Americas. Springer Berlin Heidelberg, 2002.

[2] Wu, Hua, and Haifeng Wang. "Pivot language approach for phrase-based statistical machine translation." Machine Translation 21.3 (2007): 165-181.
