

This repository accompanies the following paper:

Enhancing Statistical Machine Translation For Low-Resource Languages Using Semantic Similarity

The repository includes:

  1. Corpora
  • Bilingual corpora (training, tuning, and test sets) for the language pairs Japanese-Vietnamese, Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese.
  2. Sentence alignment
  • A Java implementation of the sentence alignment method of [Moore, 2002].
  • An extension of word alignment that uses word similarity computed with word2vec.
  3. Pivot translation
  • A Java implementation of the pivot translation method of [Wu and Wang, 2007].
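Moore's aligner starts with a length-only pass: the length of a target sentence (in words) is modeled as a Poisson variable whose mean is the source length times the corpus-wide ratio of target to source words, and a dynamic program picks the best sequence of beads. The sketch below covers only that first pass. It is not the code in this repository: Moore's later IBM Model 1 lexical pass is omitted, and the bead priors and the fixed score for unmatched sentences (`LOG_UNMATCHED`) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified length-based sentence aligner (sketch of a Moore-style first pass). */
final class LengthAligner {

    // Bead shapes as {source sentences, target sentences}.
    private static final int[][] BEADS = {{1, 1}, {1, 0}, {0, 1}, {2, 1}, {1, 2}};
    // Illustrative bead priors (assumption, not values taken from the repository).
    private static final double[] LOG_PRIOR = {
        Math.log(0.90), Math.log(0.01), Math.log(0.01), Math.log(0.04), Math.log(0.04)};
    // Hypothetical fixed log-score for a sentence that has no counterpart.
    private static final double LOG_UNMATCHED = Math.log(1e-3);

    /** log P(k) for a Poisson distribution with mean lambda. */
    static double logPoisson(int k, double lambda) {
        double logFactorial = 0.0;
        for (int i = 2; i <= k; i++) logFactorial += Math.log(i);
        return k * Math.log(lambda) - lambda - logFactorial;
    }

    /** Aligns sentences given their lengths in words; each bead is {srcStart, srcCount, tgtStart, tgtCount}. */
    static List<int[]> align(int[] src, int[] tgt) {
        int n = src.length, m = tgt.length;
        // Expected number of target words per source word, estimated from the whole text.
        double r = (double) Arrays.stream(tgt).sum() / Math.max(1, Arrays.stream(src).sum());
        double[][] best = new double[n + 1][m + 1];
        int[][] back = new int[n + 1][m + 1];
        for (double[] row : best) Arrays.fill(row, Double.NEGATIVE_INFINITY);
        best[0][0] = 0.0;
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                for (int b = 0; b < BEADS.length; b++) {
                    int di = BEADS[b][0], dj = BEADS[b][1];
                    if (i < di || j < dj) continue;
                    double prev = best[i - di][j - dj];
                    if (prev == Double.NEGATIVE_INFINITY) continue;
                    int ls = 0, lt = 0;
                    for (int k = i - di; k < i; k++) ls += src[k];
                    for (int k = j - dj; k < j; k++) lt += tgt[k];
                    // Target length ~ Poisson(r * source length) when both sides are non-empty.
                    double lengthScore = (di == 0 || dj == 0)
                        ? LOG_UNMATCHED
                        : logPoisson(lt, Math.max(r * ls, 1e-6));
                    double score = prev + LOG_PRIOR[b] + lengthScore;
                    if (score > best[i][j]) {
                        best[i][j] = score;
                        back[i][j] = b;
                    }
                }
            }
        }
        // Walk the back-pointers from the end to recover the bead sequence.
        List<int[]> beads = new ArrayList<>();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            int di = BEADS[back[i][j]][0], dj = BEADS[back[i][j]][1];
            i -= di;
            j -= dj;
            beads.add(new int[] {i, di, j, dj});
        }
        Collections.reverse(beads);
        return beads;
    }

    public static void main(String[] args) {
        // Prints [0, 1, 0, 1], [1, 1, 1, 1], [2, 1, 2, 1] on separate lines.
        for (int[] bead : align(new int[] {10, 20, 15}, new int[] {11, 19, 16})) {
            System.out.println(Arrays.toString(bead));
        }
    }
}
```

In Moore's method, the high-scoring 1-1 beads from this pass are what the lexical model is then trained on, so a pass like this only needs to be reliable on easy, length-consistent pairs.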
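The README does not say how word2vec similarity is used to extend word alignment. The sketch below shows only the standard ingredient, cosine similarity between word vectors, plus one hypothetical extension rule: a source word with no alignment links inherits the target words linked to sufficiently similar source words. The rule, the `threshold` parameter, and the toy 2-dimensional vectors are assumptions for illustration; loading real word2vec vectors is out of scope.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: cosine similarity over word vectors used to extend a word-alignment dictionary. */
final class WordSimilarity {

    /** Cosine similarity of two equal-length vectors; 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Words other than `word` whose cosine similarity to `word` is at least `threshold`. */
    static List<String> similarWords(String word, Map<String, double[]> vectors, double threshold) {
        List<String> result = new ArrayList<>();
        double[] v = vectors.get(word);
        if (v == null) return result;
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            if (!e.getKey().equals(word) && cosine(v, e.getValue()) >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Hypothetical rule: an unlinked source word inherits the links of its similar source words. */
    static Map<String, Set<String>> extend(
            Map<String, Set<String>> links, Map<String, double[]> vectors, double threshold) {
        Map<String, Set<String>> extended = new HashMap<>();
        links.forEach((src, tgts) -> extended.put(src, new TreeSet<>(tgts)));
        for (String word : vectors.keySet()) {
            if (links.containsKey(word)) continue; // observed links are kept unchanged
            for (String neighbour : similarWords(word, vectors, threshold)) {
                Set<String> inherited = links.get(neighbour);
                if (inherited != null) {
                    extended.computeIfAbsent(word, k -> new TreeSet<>()).addAll(inherited);
                }
            }
        }
        return extended;
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = new HashMap<>();
        vectors.put("rumah", new double[] {1.0, 0.0});    // toy vectors, not real word2vec output
        vectors.put("kediaman", new double[] {0.9, 0.1});
        vectors.put("kucing", new double[] {0.0, 1.0});
        Map<String, Set<String>> links = new HashMap<>();
        links.put("rumah", Set.of("nh\u00e0"));
        System.out.println(extend(links, vectors, 0.8));
    }
}
```

With these toy vectors, "kediaman" is close to "rumah" and picks up its Vietnamese link, while "kucing" stays unlinked.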
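The core of the pivot approach of Wu and Wang (2007) is triangulation: a source-target phrase probability is obtained by summing over pivot phrases, φ(t|s) = Σ_p φ(t|p) φ(p|s). The sketch below computes only that phrase probability; the lexical weights that the method also induces are omitted. It assumes English as the pivot language (the README does not name one), and the Indonesian, English, and Vietnamese entries are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of phrase-table triangulation through a pivot language (phrase probabilities only). */
final class PivotTriangulation {

    /**
     * phi(t|s) = sum over pivot phrases p of phi(t|p) * phi(p|s).
     * Each table maps a conditioning phrase to its translations and their probabilities.
     */
    static Map<String, Map<String, Double>> triangulate(
            Map<String, Map<String, Double>> srcToPivot,
            Map<String, Map<String, Double>> pivotToTgt) {
        Map<String, Map<String, Double>> srcToTgt = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> s : srcToPivot.entrySet()) {
            Map<String, Double> row = new HashMap<>();
            for (Map.Entry<String, Double> p : s.getValue().entrySet()) {
                Map<String, Double> targets = pivotToTgt.get(p.getKey());
                if (targets == null) continue; // pivot phrase missing from the second table
                for (Map.Entry<String, Double> t : targets.entrySet()) {
                    // Accumulate phi(t|p) * phi(p|s) across all shared pivot phrases.
                    row.merge(t.getKey(), t.getValue() * p.getValue(), Double::sum);
                }
            }
            if (!row.isEmpty()) srcToTgt.put(s.getKey(), row);
        }
        return srcToTgt;
    }

    public static void main(String[] args) {
        // Toy Indonesian-English and English-Vietnamese entries (invented for illustration).
        Map<String, Map<String, Double>> idToEn = new HashMap<>();
        idToEn.put("rumah", Map.of("house", 0.7, "home", 0.3));
        Map<String, Map<String, Double>> enToVi = new HashMap<>();
        enToVi.put("house", Map.of("nh\u00e0", 0.9, "c\u0103n nh\u00e0", 0.1));
        enToVi.put("home", Map.of("nh\u00e0", 0.6, "qu\u00ea h\u01b0\u01a1ng", 0.4));
        System.out.println(triangulate(idToEn, enToVi).get("rumah"));
    }
}
```

For "rumah", the Vietnamese "nhà" is reachable through both pivot phrases and receives 0.7 × 0.9 + 0.3 × 0.6 = 0.81. When every pivot phrase is found in the second table, each triangulated row still sums to one.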


[1] Moore, Robert C. "Fast and accurate sentence alignment of bilingual corpora." Conference of the Association for Machine Translation in the Americas. Springer Berlin Heidelberg, 2002.

[2] Wu, Hua, and Haifeng Wang. "Pivot language approach for phrase-based statistical machine translation." Machine Translation 21.3 (2007): 165-181.