Transliteration-related data files and models.
- Arabic-English transliteration dataset mined from Wikipedia.
- Trained transliteration models for Arabic-English, English-Japanese and English-IPA.
Getting the code for loading and training models
To clone the repository:
git clone git@github.com:mihaelacr-google/clstm.git
How to use the trained models
clstmfilter applies an existing trained model to your data. To build the binary, run:
scons -j 4
For example, if you have a list of Arabic words which you want to transliterate to English, you can run the following commands in your shell:
set -a
load="ar2en.clstm"
./clstmfilter your_data.txt
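clstm's command-line tools read their settings (such as load here, and lr or with_gt below) from environment variables; set -a tells the shell to export every variable assigned afterwards so that clstmfilter can see it. Because set -a stays on for the rest of the session, you may prefer to scope the variable to a single command instead. A minimal bash sketch of that alternative; the helper name transliterate is hypothetical and not part of the repository:

```shell
# Hypothetical helper (not part of the repository): run clstmfilter with a
# given model file on a given input file.
# The load=... prefix places the variable in clstmfilter's environment only,
# so the calling shell is left unchanged and `set -a` is not needed.
transliterate() {
  local model="$1" input="$2"
  if [ ! -f "$input" ]; then
    echo "input file not found: $input" >&2
    return 1
  fi
  load="$model" ./clstmfilter "$input"
}
```

For example, transliterate ar2en.clstm your_data.txt runs the same command as above without leaving load exported in your shell.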
How to train your own models
If you want to train a new model with your data, you can use the clstmfiltertrain binary.
To build the binary, run:
scons -j 4
To train the model:
set -a
lr=0.1
./clstmfiltertrain your_train_data.txt your_eval_data.txt
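clstmfiltertrain takes a training file and an evaluation file. If your data is in a single file, one way to produce the two is a random split. The sketch below assumes one example per line and GNU coreutils (for shuf); the helper name split_train_eval and the default 10% evaluation share are illustrative choices, not something the repository prescribes.

```shell
# Hypothetical helper: shuffle a one-example-per-line file and split it into
# a training file and an evaluation file (default: 10% held out for eval).
# Assumes GNU coreutils `shuf`.
split_train_eval() {
  local data="$1" train="$2" eval_file="$3" eval_pct="${4:-10}"
  local shuffled total n_eval
  shuffled="$(mktemp)"
  shuf "$data" > "$shuffled"
  total=$(wc -l < "$shuffled")
  n_eval=$(( total * eval_pct / 100 ))
  # The first n_eval shuffled lines go to eval, the remaining lines to train.
  head -n "$n_eval" "$shuffled" > "$eval_file"
  tail -n +"$(( n_eval + 1 ))" "$shuffled" > "$train"
  rm -f "$shuffled"
}
```

For example, split_train_eval all_data.txt your_train_data.txt your_eval_data.txt 10 holds out 10% of the lines. Shuffling first keeps the evaluation set from being, say, the alphabetically last block of a sorted word list.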
Reproduce our results
If you want to reproduce the results from our paper, you can run:
scons -j 4
set -a
load=ar2en.clstm
with_gt=1
./clstmfilter ar2en-test.txt
This loads the one-layer Arabic-to-English model and reports its character error rate and word error rate on the test data. You can similarly load the two-layer model and compute the same metrics for it.