Attention based English-Bodo Neural Machine Translation

Introduction
Dataset
Training
Testing
Translating a sentence

Introduction

Despite its potential, no prior research has been done on English-Bodo (Eng-Brx) Neural Machine Translation. According to the 2011 Census of India, Bodo has 14,57,547 native speakers and 14,82,929 speakers in total. During the initial stage of this work we searched for an English-Bodo parallel corpus and, to our surprise, found only one resource: the Indian Language Technology Proliferation and Deployment Centre.

Dataset

Tourism corpus: English-Bodo parallel corpus of Tourism domain (20901 sentences) provided by the TDIL-DC
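
Because a parallel corpus of this kind pairs source and target sentences by line number, it can help to confirm that the two sides are still aligned before training. The following is a minimal sketch of such a check. The file names `nmt_data/train.en` and `nmt_data/train.brx` are assumptions for illustration and are not taken from the repository, and the check is not part of the cleaning procedure described in the paper.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def check_parallel(src_lines, tgt_lines):
    """Report basic alignment statistics for a line-aligned parallel corpus."""
    # 1-based line numbers where either side is empty after stripping whitespace.
    empty = [
        i
        for i, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1)
        if not src.strip() or not tgt.strip()
    ]
    return {
        "src_count": len(src_lines),
        "tgt_count": len(tgt_lines),
        "aligned": len(src_lines) == len(tgt_lines),
        "empty_pairs": empty,
    }


# Hypothetical usage (paths are assumptions, not taken from the repository):
# report = check_parallel(read_lines("nmt_data/train.en"),
#                         read_lines("nmt_data/train.brx"))
# print(report)
```

If `aligned` is false or `empty_pairs` is non-empty, the affected lines are worth inspecting before the data is passed to the trainer.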

The detailed cleaning and preprocessing steps are described in the paper.

All experiments were performed using the TensorFlow NMT framework by Thang Luong, Eugene Brevdo, and Rui Zhao.

Training

The training process is similar to that of TensorFlow NMT. To make the hyper-parameters and execution easier to manage, we wrap it in a shell script, start.sh. Edit that file to change the hyper-parameters, then run it:

bash start.sh

or

chmod +x start.sh
./start.sh

The trained models are saved in the models/ directory.
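
For readers unfamiliar with the underlying framework, the sketch below shows how hyper-parameters of the kind set in start.sh map onto command-line flags of the TensorFlow NMT trainer (`python -m nmt.nmt`). The flag names come from the TensorFlow NMT tutorial. Every value, along with the `vocab`, `train`, and `tst2012` file prefixes, is an illustrative assumption and not the configuration used in start.sh or in this work.

```python
def build_train_command(hparams):
    """Turn a dict of hyper-parameters into an argument list for nmt.nmt."""
    cmd = ["python", "-m", "nmt.nmt"]
    for name, value in hparams.items():
        cmd.append(f"--{name}={value}")
    return cmd


# Illustrative values only; the real settings live in start.sh.
hparams = {
    "attention": "scaled_luong",      # assumed attention variant
    "src": "en",
    "tgt": "brx",
    "vocab_prefix": "nmt_data/vocab",  # assumed file prefix
    "train_prefix": "nmt_data/train",  # assumed file prefix
    "dev_prefix": "nmt_data/tst2012",  # assumed file prefix
    "test_prefix": "nmt_data/tst2013",
    "out_dir": "models",
    "num_train_steps": 12000,
    "num_layers": 2,
    "num_units": 128,
    "dropout": 0.2,
    "metrics": "bleu",
}

command = build_train_command(hparams)
print(" ".join(command))
```

Keeping the settings in one place, whether a dict as here or shell variables in start.sh, makes it easier to record exactly which configuration produced a given model in models/.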

Testing

To test the trained model on the test set, run out.sh.

Translate the 2090 English test sentences into Bodo

bash out.sh

or

chmod +x out.sh
./out.sh

View the translated sentences

gedit output.brx

Terminal editors such as nano do not render Bodo characters properly, so it is better to view the file in gedit or leafpad.

Calculate BLEU score

perl multi-bleu.perl nmt_data/tst2013.brx < output.brx
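
To make the reported number easier to interpret, here is a simplified sketch of a corpus-level BLEU computation: clipped n-gram precision up to 4-grams, their geometric mean, and a brevity penalty, for a single reference and whitespace-tokenized text. It is meant for illustration only and is not a replacement for multi-bleu.perl. The function name `corpus_bleu` is our own, and scores from the two may not agree exactly.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis."""
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    ref_len = hyp_len = 0

    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        ref_len += len(ref_tokens)
        hyp_len += len(hyp_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            totals[n - 1] += sum(hyp_counts.values())
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in hyp_counts.items()
            )

    # Any order with zero matches drives the geometric mean to zero.
    if min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter than the reference overall.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


# Hypothetical usage with the files named above:
# refs = open("nmt_data/tst2013.brx", encoding="utf-8").read().split("\n")
# hyps = open("output.brx", encoding="utf-8").read().split("\n")
# print(corpus_bleu(refs, hyps))
```

Because the counts are pooled over the whole test set before the precisions are computed, the result is not the average of per-sentence scores.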

Translating a sentence

Enter the English sentence you want to translate in the test.en file.
Change the model path in translate.sh.
Generate the translation [Eng->Brx]:

bash translate.sh

or

chmod +x translate.sh
./translate.sh

View the translated Bodo sentence

gedit out.brx
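
The steps above can also be driven from a script. The sketch below is a minimal, hypothetical wrapper: it assumes that translate.sh reads test.en and writes out.brx in the current directory, that both files are UTF-8, and that the model path in translate.sh has already been set. The function names are our own.

```python
import subprocess


def write_source(sentence, path="test.en"):
    """Write one English sentence to the input file, one sentence per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(sentence.strip() + "\n")


def read_translation(path="out.brx"):
    """Return the translated lines, decoded as UTF-8 (an assumption)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def translate(sentence):
    """Write the input, run translate.sh, and return the output lines."""
    write_source(sentence)
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(["bash", "translate.sh"], check=True)
    return read_translation()


# Hypothetical usage, run from the repository root:
# print(translate("Where is the nearest railway station ?"))
```

Printing the result in a terminal may show the same rendering problems noted for nano, so opening out.brx in gedit or leafpad remains the safer way to read it.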
