pip install detransliterator
from detransliterator import Detransliterator
detransliterator = Detransliterator('latin2nqo_001.38')
for latin in ["maari", "mààri", "magari", "màgàri", "makari", "màkàri"]:
    nqo = detransliterator.detransliterate(latin, beam_size=5)
    assert nqo == "ߡߊ߰ߙߌ"
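For more than a handful of words, the call above can be wrapped in a small helper. The following is a minimal sketch, not part of the package: `detransliterate_all` is a hypothetical name, and it assumes only that the model object exposes `detransliterate(text, beam_size=...)` as in the snippet above.

```python
from typing import Iterable, List


def detransliterate_all(model, latin_words: Iterable[str], beam_size: int = 5) -> List[str]:
    """Detransliterate each non-blank entry with the given model.

    `model` is any object exposing `detransliterate(text, beam_size=...)`,
    for example `Detransliterator('latin2nqo_001.38')`. This helper is
    hypothetical and not shipped with the package.
    """
    results = []
    for latin in latin_words:
        latin = latin.strip()
        if not latin:
            # Skip blank entries instead of sending them to the model.
            continue
        results.append(model.detransliterate(latin, beam_size=beam_size))
    return results


# Usage, assuming the package and model are available:
#   from detransliterator import Detransliterator
#   model = Detransliterator('latin2nqo_001.38')
#   print(detransliterate_all(model, ["maari", "makari"]))
```

Passing the model in as an argument keeps the helper independent of how the model is loaded, so it can be exercised with a stand-in object.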
# detransliteration tool
python -m detransliterator.tool --help
# csv/tsv extraction tool
python -m detransliterator.csv_tool --help
example: detransliterate a stream
echo "musa dunbuya" | python -m detransliterator.tool --model-name latin2nqo_001.35
example: extract a column from a csv file
cat test_csv_no_header.csv \
| python -m detransliterator.csv_tool \
extract-column --column-ix 1 \
> tmp_col1_1.txt
example: extract a column from a tsv file
cat test_tsv_with_header.tsv \
| python -m detransliterator.csv_tool \
extract-column --column-ix 1 --skip-lines 1 --csv-formatting-params delimiter tab \
> tmp_col1_2.txt
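When the data is already loaded in Python, a similar column extraction can be sketched with the standard `csv` module. This is an approximation and not the implementation of `csv_tool`: the function name `extract_column` is hypothetical, and the sketch assumes that the column index is zero-based and that skipped lines are counted as parsed rows, neither of which is stated here.

```python
import csv
import io
from typing import Iterable, Iterator


def extract_column(
    lines: Iterable[str],
    column_ix: int,
    skip_rows: int = 0,
    delimiter: str = ",",
) -> Iterator[str]:
    """Yield one column from delimited text.

    Assumptions (not confirmed by the tool's documentation): `column_ix`
    is zero-based and `skip_rows` counts parsed rows, e.g. a header row.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    for row_number, row in enumerate(reader):
        if row_number < skip_rows:
            continue
        # Ignore rows that are too short to contain the requested column.
        if column_ix < len(row):
            yield row[column_ix]


# Example with an in-memory TSV that has a header row.
tsv_text = "id\tlatin\n1\tmusa dunbuya\n2\tmaari\n"
column = list(extract_column(io.StringIO(tsv_text), 1, skip_rows=1, delimiter="\t"))
print(column)
```

Using `csv.reader` rather than splitting on the delimiter keeps quoted fields that contain the delimiter intact.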
example: detransliterate a column from a csv file
cat test_csv_with_header.csv \
| python -m detransliterator.csv_tool extract-column --column-ix 1 --skip-lines 1 \
| python -m detransliterator.tool --model-name latin2nqo_001.35 \
> tmp_col_1_detransliterated_1.nqo
example: detransliterate a column from a tsv file
cat test_tsv_no_header.tsv \
| python -m detransliterator.csv_tool extract-column --column-ix 1 \
--csv-formatting-params delimiter tab \
| python -m detransliterator.tool --model-name latin2nqo_001.35 \
> tmp_col_1_detransliterated_2.nqo
example: use a particular GPU
echo "musa dunbuya" | CUDA_VISIBLE_DEVICES="1" python -m detransliterator.tool
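The same GPU selection can be made when launching the tool from Python. The sketch below assumes that the tool reads UTF-8 text on standard input and writes UTF-8 to standard output, as the piped examples suggest; `gpu_env` and `run_tool` are hypothetical helper names, not part of the package.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def gpu_env(gpu_index: int, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_tool(text: str, gpu_index: int) -> str:
    """Pipe `text` through `python -m detransliterator.tool` on one GPU.

    Assumes the package is installed in the current interpreter and that
    the tool reads stdin and writes stdout as UTF-8.
    """
    completed = subprocess.run(
        [sys.executable, "-m", "detransliterator.tool"],
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,
        env=gpu_env(gpu_index),
    )
    return completed.stdout


# Usage, assuming the package is installed and GPU 1 exists:
#   print(run_tool("musa dunbuya\n", gpu_index=1))
```

Setting the variable in the child process environment ensures it reaches the Python process that actually loads the model.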
Model | Source Script | Target Script | #Parameters | Validation BLEU | Test BLEU |
---|---|---|---|---|---|
latin2nqo_001.35 | latin | nqo | 2 520 576 | 75.56 | 74.14 |
latin2nqo_001.38 | latin | nqo | 3 909 120 | 78.51 | 77.06 |
variant | example latin | detransliterated nqo |
---|---|---|
maninka | maari | ߡߊ߰ߙߌ |
maninka tonal | mààri | ߡߊ߰ߙߌ |
bambara-ga | magari | ߡߊ߰ߙߌ |
bambara-ga tonal | màgàri | ߡߊ߰ߙߌ |
bambara-ka | makari | ߡߊ߰ߙߌ |
bambara-ka tonal | màkàri | ߡߊ߰ߙߌ |
`detransliterator` is distributed under the terms of the MIT license.
If you use this software in your work, please cite it using the following metadata:
@software{Doumbouya_Detransliterator_2022,
author = {Doumbouya, Moussa},
month = {7},
title = {{Detransliterator}},
url = {https://github.com/mdoumbouya/detransliterator},
version = {0.0.2},
year = {2022}
}