SMT-LowRec

This repository accompanies the following paper:

Enhancing Statistical Machine Translation For Low-Resource Languages Using Semantic Similarity

The repository includes:

  1. Corpora
  • Bilingual corpora: training, tuning, and test sets for the language pairs Japanese-Vietnamese, Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese.
  2. Sentence alignment
  • The Java implementation of [Moore, 2002] for sentence alignment.
  • An extension of word alignment by word similarity, using word2vec.
  3. Pivot translation
  • The Java implementation of [Wu and Wang, 2007].
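
For readers new to pivot translation, the core idea of triangulation in [Wu and Wang, 2007] is to estimate a source-target phrase translation probability by summing over shared pivot phrases: p(t|s) = Σ_p p(t|p) · p(p|s). The Python sketch below illustrates only that step. It is not the repository's Java code; the function name `triangulate`, the dictionary layout, the toy phrase pairs, and the choice of English as the pivot are all assumptions made for illustration. Lexical weights, which the paper also handles, are left out.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Combine two phrase tables through a pivot language (hypothetical helper).

    src_pivot: {(source_phrase, pivot_phrase): p(pivot | source)}
    pivot_tgt: {pivot_phrase: {target_phrase: p(target | pivot)}}
    Returns    {(source_phrase, target_phrase): p(target | source)}
    """
    src_tgt = defaultdict(float)
    for (src, pivot), p_pivot_given_src in src_pivot.items():
        # Pivot phrases missing from the second table contribute nothing.
        for tgt, p_tgt_given_pivot in pivot_tgt.get(pivot, {}).items():
            # Marginalize over the pivot: sum the products for each (src, tgt).
            src_tgt[(src, tgt)] += p_pivot_given_src * p_tgt_given_pivot
    return dict(src_tgt)


# Toy Malay -> English -> Vietnamese entries (made-up probabilities).
malay_english = {("rumah", "house"): 0.7, ("rumah", "home"): 0.3}
english_vietnamese = {
    "house": {"nhà": 0.9, "căn nhà": 0.1},
    "home": {"nhà": 0.6, "quê": 0.4},
}

table = triangulate(malay_english, english_vietnamese)
for (src, tgt), prob in sorted(table.items(), key=lambda kv: -kv[1]):
    print(f"{src} -> {tgt}: {prob:.2f}")
```

A real phrase table is usually too large for nested dictionaries in memory, so a practical implementation would typically stream sorted tables from disk and prune low-probability pairs.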

References

[1] Moore, Robert C. "Fast and accurate sentence alignment of bilingual corpora." Conference of the Association for Machine Translation in the Americas. Springer Berlin Heidelberg, 2002.

[2] Wu, Hua, and Haifeng Wang. "Pivot language approach for phrase-based statistical machine translation." Machine Translation 21.3 (2007): 165-181.