SMT-LowRec

This repository is for the following paper:

Enhancing Statistical Machine Translation For Low-Resource Languages Using Semantic Similarity

The repository includes:

  1. Corpora
  • Bilingual corpora: training, tuning, and test sets for four language pairs: Japanese-Vietnamese, Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese.
  2. Sentence alignment
  • A Java implementation of [Moore, 2002] for sentence alignment.
  • An extension of word alignment that uses word2vec-based word similarity.
  3. Pivot translation
  • A Java implementation of [Wu and Wang, 2007].
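To illustrate the idea behind pivot translation, the sketch below combines a source-to-pivot phrase table with a pivot-to-target phrase table by summing over shared pivot phrases: p(t | s) = Σ_p p(p | s) · p(t | p). This is a minimal illustration of the general triangulation idea, not the code in this repository. The class name `PivotTriangulation`, the method `triangulate`, the nested-map representation of a phrase table, and the placeholder phrases and probabilities are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of phrase-table triangulation through a pivot language (hypothetical class). */
public class PivotTriangulation {

    /**
     * Combines source->pivot and pivot->target phrase probabilities:
     * p(t | s) = sum over pivot phrases p of p(p | s) * p(t | p).
     * Each table maps a phrase to its candidate translations and their probabilities.
     */
    public static Map<String, Map<String, Double>> triangulate(
            Map<String, Map<String, Double>> sourceToPivot,
            Map<String, Map<String, Double>> pivotToTarget) {
        Map<String, Map<String, Double>> sourceToTarget = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> src : sourceToPivot.entrySet()) {
            Map<String, Double> targets = new HashMap<>();
            for (Map.Entry<String, Double> piv : src.getValue().entrySet()) {
                Map<String, Double> pivotTargets = pivotToTarget.get(piv.getKey());
                if (pivotTargets == null) {
                    continue; // this pivot phrase has no entry in the pivot->target table
                }
                for (Map.Entry<String, Double> tgt : pivotTargets.entrySet()) {
                    // Accumulate the contribution of this pivot phrase to p(t | s).
                    targets.merge(tgt.getKey(), piv.getValue() * tgt.getValue(), Double::sum);
                }
            }
            if (!targets.isEmpty()) {
                sourceToTarget.put(src.getKey(), targets);
            }
        }
        return sourceToTarget;
    }

    public static void main(String[] args) {
        // Toy tables with placeholder phrases; the probabilities are made up for illustration.
        Map<String, Map<String, Double>> sourceToPivot = new HashMap<>();
        Map<String, Double> srcA = new HashMap<>();
        srcA.put("piv_x", 0.6);
        srcA.put("piv_y", 0.4);
        sourceToPivot.put("src_a", srcA);

        Map<String, Map<String, Double>> pivotToTarget = new HashMap<>();
        Map<String, Double> pivX = new HashMap<>();
        pivX.put("tgt_1", 0.5);
        pivX.put("tgt_2", 0.5);
        pivotToTarget.put("piv_x", pivX);
        Map<String, Double> pivY = new HashMap<>();
        pivY.put("tgt_1", 1.0);
        pivotToTarget.put("piv_y", pivY);

        System.out.println(triangulate(sourceToPivot, pivotToTarget));
    }
}
```

In the toy tables, `src_a` reaches `tgt_1` through both pivot phrases (0.6 · 0.5 + 0.4 · 1.0, about 0.7) and `tgt_2` through `piv_x` only (0.6 · 0.5, about 0.3). Pivot phrases missing from the second table are skipped, so a source phrase with no shared pivot gets no entry in the result.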

References

[1] Moore, Robert C. "Fast and accurate sentence alignment of bilingual corpora." Conference of the Association for Machine Translation in the Americas. Springer Berlin Heidelberg, 2002.

[2] Wu, Hua, and Haifeng Wang. "Pivot language approach for phrase-based statistical machine translation." Machine Translation 21.3 (2007): 165-181.
