Character-Level Neural Machine Translation

This is an implementation of the models described in the paper "A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation".


Most of the scripts are written in pure Theano. The preprocessing pipeline has the following dependencies:
Python libraries: NLTK
Subword-NMT

This code is based on the dl4mt library.

Be sure to include the path to this library in your PYTHONPATH.
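For example, if you cloned the repository into your home directory, adding it to PYTHONPATH could look like the following ("$HOME/dl4mt-cdec" is a hypothetical clone location; adjust it to wherever you checked out this repository):

```shell
# Add the repository checkout to PYTHONPATH so the training and
# translation scripts can import the shared library code.
# "$HOME/dl4mt-cdec" is a placeholder path; adjust to your clone location.
export PYTHONPATH="$HOME/dl4mt-cdec:$PYTHONPATH"
echo "$PYTHONPATH"
```

To make this permanent, add the export line to your shell profile (e.g. ~/.bashrc).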

We recommend using the latest version of Theano.
However, if you want to reproduce the results exactly, please use the following version of Theano.
commit hash: fdfbab37146ee475b3fd17d8d104fb09bf3a8d5c

Preparing Text Corpora:

The original text corpora can be downloaded from
Once the download is finished, use the '' script in the 'preprocess' directory to preprocess the text files. For the character-level decoders, preprocessing is not strictly necessary; however, in order to compare the results with subword-level decoders and other word-level approaches, we apply the same preprocessing to all of the target corpora. Finally, use '' for the character case and '' for the subword case to build the vocabulary.
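As a hedged illustration of what character-level vocabulary building involves (this is not the repository's dictionary script, whose name is elided above), you can preview the character frequencies of a corpus file from the command line:

```shell
# Illustration only: count the frequency of every character in a corpus.
# A real dictionary script would assign integer ids to these characters.
printf 'hello world\n' > corpus.txt
grep -o . corpus.txt | sort | uniq -c | sort -rn
```

The most frequent characters appear first; the vocabulary-building scripts map such characters (and subword units, in the subword case) to integer indices used by the models.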