GitHub - google/transliteration: Transliteration data and models

Transliteration-related data files and models.

Contains:

Arabic-English transliteration dataset mined from Wikipedia.
Trained transliteration models for Arabic-English, English-Japanese and English-IPA.

The models

The transliteration models provided are recurrent neural networks trained with a CTC loss. For a detailed description of the models, see the paper.

Getting the code for loading and training models

To use the models provided in this repository, you need the clstm library, as provided in the branch here.

To clone the repository:

git clone git@github.com:mihaelacr-google/clstm.git

How to use the trained models

The clstmfilter binary applies an existing trained model to transliterate your data.

To build the binary, use the command below. For more on how to install clstm read this. You can read more about scons here.

scons -j 4

For example, if you have a list of Arabic words which you want to transliterate to English, you can run the following commands in your shell:

set -a
load="ar2en.clstm"
./clstmfilter your_data.txt
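
The commands above rely on `set -a`, which marks every variable assigned afterwards for export, so that `load` reaches the child process through its environment. The sketch below illustrates only that shell behaviour, using `sh -c` as a stand-in for the binary; that clstmfilter reads `load` from its environment is an assumption based on the commands above.

```shell
# Without set -a, a plain assignment stays local to the current shell.
unset load
load="ar2en.clstm"
sh -c 'echo "child sees: ${load:-<unset>}"'   # prints: child sees: <unset>

# With set -a, every later assignment is exported automatically.
set -a
load="ar2en.clstm"
set +a                                        # stop auto-exporting further variables
sh -c 'echo "child sees: ${load:-<unset>}"'   # prints: child sees: ar2en.clstm
```

If you prefer not to change the export behaviour of your whole shell session, standard shell syntax also allows setting the variable for a single command, for example `load=ar2en.clstm ./clstmfilter your_data.txt`.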

How to train your own models

If you want to train a new model with your data, you can use the clstmfiltertrain binary.

To build the binary, run:

scons -j 4

To train the model:

set -a
lr=0.1
./clstmfiltertrain your_train_data.txt your_eval_data.txt

Reproduce our results

If you want to reproduce the results from our paper, you can run:

scons -j 4
set -a 
load=ar2en.clstm
with_gt=1
./clstmfilter ar2en-test.txt

This loads the one-layer Arabic-to-English model and reports its character error rate and word error rate on the test data. You can similarly load the two-layer model and compute the same metrics for it.
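
For readers unfamiliar with the metric, here is a minimal sketch of a character error rate computation. It assumes the common definition (Levenshtein edit distance between reference and hypothesis, divided by the reference length); the computation inside clstmfilter may differ in its details. The function name `cer` and the example strings are hypothetical and are not taken from the dataset.

```shell
# cer REF HYP: prints edit_distance(REF, HYP) / length(REF) with three decimals.
# Note: some awk implementations count bytes rather than characters, so use a
# Unicode-aware awk (and a UTF-8 locale) for non-ASCII text such as Arabic.
cer() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = length(ref); m = length(hyp)
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    for (j = 0; j <= m; j++) prev[j] = j
    for (i = 1; i <= n; i++) {
      cur[0] = i
      for (j = 1; j <= m; j++) {
        cost = (substr(ref, i, 1) == substr(hyp, j, 1)) ? 0 : 1
        best = prev[j - 1] + cost                          # match or substitution
        if (prev[j] + 1 < best) best = prev[j] + 1         # deletion from ref
        if (cur[j - 1] + 1 < best) best = cur[j - 1] + 1   # insertion into ref
        cur[j] = best
      }
      for (j = 0; j <= m; j++) prev[j] = cur[j]
    }
    printf "%.3f\n", (n > 0) ? prev[m] / n : 0
  }'
}

cer "mohammed" "muhamad"   # 3 edits over 8 reference characters, prints 0.375
```

The word error rate follows the same recipe when the units are whole words instead of characters.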

