This repository is for the following paper:
Enhancing Statistical Machine Translation For Low-Resource Languages Using Semantic Similarity
The repository includes:
- Bilingual corpora: training, tuning, and test sets for four language pairs: Japanese-Vietnamese, Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese.
- Sentence alignment
  - The Java implementation of [Moore, 2002] for sentence alignment.
- Extending word alignment by word similarity using word2vec
- Pivot translation
  - The Java implementation of [Wu and Wang, 2007].
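
The pivot approach of [Wu and Wang, 2007] derives a source-target phrase table from a source-pivot table and a pivot-target table by summing over shared pivot phrases: φ(t|s) = Σ_p φ(p|s) · φ(t|p). The sketch below illustrates only that triangulation step and is not the code shipped in this repository. The class name `PivotTriangulation`, the method `triangulate`, the nested-map table layout, and the toy romanized entries are all hypothetical, chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class PivotTriangulation {

    /**
     * Triangulates two phrase tables through a pivot language.
     * srcToPivot.get(s).get(p) holds phi(p|s); pivotToTgt.get(p).get(t) holds phi(t|p).
     * The result holds phi(t|s) = sum over p of phi(p|s) * phi(t|p).
     * (Hypothetical in-memory layout; real phrase tables are usually read from files.)
     */
    public static Map<String, Map<String, Double>> triangulate(
            Map<String, Map<String, Double>> srcToPivot,
            Map<String, Map<String, Double>> pivotToTgt) {
        Map<String, Map<String, Double>> result = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> src : srcToPivot.entrySet()) {
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Double> pivot : src.getValue().entrySet()) {
                Map<String, Double> targets = pivotToTgt.get(pivot.getKey());
                if (targets == null) {
                    continue; // this pivot phrase has no translation into the target language
                }
                for (Map.Entry<String, Double> tgt : targets.entrySet()) {
                    // Accumulate the contribution of this pivot phrase.
                    scores.merge(tgt.getKey(), pivot.getValue() * tgt.getValue(), Double::sum);
                }
            }
            if (!scores.isEmpty()) {
                result.put(src.getKey(), scores);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy source-pivot table (made-up probabilities).
        Map<String, Map<String, Double>> srcToPivot = new HashMap<>();
        Map<String, Double> neko = new HashMap<>();
        neko.put("cat", 0.8);
        neko.put("kitty", 0.2);
        srcToPivot.put("neko", neko);

        // Toy pivot-target table (made-up probabilities).
        Map<String, Map<String, Double>> pivotToTgt = new HashMap<>();
        Map<String, Double> cat = new HashMap<>();
        cat.put("meo", 0.9);
        pivotToTgt.put("cat", cat);
        Map<String, Double> kitty = new HashMap<>();
        kitty.put("meo", 0.5);
        kitty.put("meo con", 0.5);
        pivotToTgt.put("kitty", kitty);

        System.out.println(triangulate(srcToPivot, pivotToTgt));
    }
}
```

Source phrases whose pivot translations never appear in the pivot-target table are dropped from the result, which is one reason triangulated tables tend to be smaller than either input table.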

[Moore, 2002] Moore, Robert C. "Fast and accurate sentence alignment of bilingual corpora." Conference of the Association for Machine Translation in the Americas. Springer Berlin Heidelberg, 2002.

[Wu and Wang, 2007] Wu, Hua, and Haifeng Wang. "Pivot language approach for phrase-based statistical machine translation." Machine Translation 21.3 (2007): 165-181.