Translate Japanese Names to / from English
This software uses OpenNMT-py, the PyTorch port of OpenNMT, an open-source (MIT-licensed) neural machine translation system.
It is also available as a free online service (up to 500 translations per month) from Japanese-name.app or from the NamSor API.
Install OpenNMT-py from pip:
pip install OpenNMT-py
The names in data/parallel-japanese-corpus are stored one per line. Each line holds a first name (^fn) or a last name (^ln), with characters separated by spaces and a closing $, as follows:
^ln f u n a k o s h i $
^fn s a b u r o $
[...]
^ln 船 越 $
^fn 三 朗 $
[...]
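The line format above can be produced and parsed with a few lines of Python. This is a minimal sketch rather than the project's own tooling: the helper names `encode_name` and `decode_name` are hypothetical, and lowercasing the input is an assumption based on the lowercase samples shown.

```python
from typing import Tuple


def encode_name(name: str, kind: str) -> str:
    """Turn a name into the space-separated corpus format.

    kind is "fn" (first name) or "ln" (last name).
    Lowercasing is an assumption taken from the sample lines.
    """
    if kind not in ("fn", "ln"):
        raise ValueError("kind must be 'fn' or 'ln'")
    # One token per character, dropping any whitespace inside the name.
    chars = [c for c in name.strip().lower() if not c.isspace()]
    return " ".join([f"^{kind}", *chars, "$"])


def decode_name(line: str) -> Tuple[str, str]:
    """Inverse of encode_name: return (kind, name) for a corpus line."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] not in ("^fn", "^ln") or tokens[-1] != "$":
        raise ValueError(f"not a corpus line: {line!r}")
    return tokens[0][1:], "".join(tokens[1:-1])
```

For example, `encode_name("Funakoshi", "ln")` gives the first sample line, and `decode_name` recovers the plain name from a model output.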
Prepare the Japanese-to-English data:
onmt_preprocess -train_src data/parallel-japanese-corpus/names-jp-train.txt -train_tgt data/parallel-japanese-corpus/names-en-train.txt -valid_src data/parallel-japanese-corpus/names-jp-val.txt -valid_tgt data/parallel-japanese-corpus/names-en-val.txt -save_data data/onmt-model/jp_en_data
This will output three files in data/onmt-model: jp_en_data.train.0.pt, jp_en_data.valid.0.pt and jp_en_data.vocab.pt
Prepare the English-to-Japanese data:
onmt_preprocess -train_src data/parallel-japanese-corpus/names-en-train.txt -train_tgt data/parallel-japanese-corpus/names-jp-train.txt -valid_src data/parallel-japanese-corpus/names-en-val.txt -valid_tgt data/parallel-japanese-corpus/names-jp-val.txt -save_data data/onmt-model/en_jp_data
This will output three files in data/onmt-model: en_jp_data.train.0.pt, en_jp_data.valid.0.pt and en_jp_data.vocab.pt
Train the Japanese-to-English machine translation model:
onmt_train -data data/onmt-model/jp_en_data -save_model data/onmt-model/jp_en_model -world_size 1 -gpu_ranks 0
This will write model checkpoints to data/onmt-model. The one used in the rest of this document is jp_en_model_step_100000.pt
Train the English-to-Japanese machine translation model:
onmt_train -data data/onmt-model/en_jp_data -save_model data/onmt-model/en_jp_model -world_size 1 -gpu_ranks 0
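This will write model checkpoints to data/onmt-model. The one used in the rest of this document is en_jp_model_step_100000.pt
Test the Japanese-to-English machine translation model, writing the top candidates for each name (-n_best sets how many, 3 here):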
onmt_translate -model data/onmt-model/jp_en_model_step_100000.pt -src data/parallel-japanese-corpus/names-jp-test.txt -output data/test/names-en-test-out.txt -replace_unk -n_best 3
Test the English-to-Japanese machine translation model, writing the top candidates for each name (-n_best sets how many, 3 here):
onmt_translate -model data/onmt-model/en_jp_model_step_100000.pt -src data/parallel-japanese-corpus/names-en-test.txt -output data/test/names-jp-test-out.txt -replace_unk -n_best 3
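We use the test outputs to calculate accuracy. Match 1 is the share of names whose first candidate is correct, Match 2 the share where the first or the second candidate is correct, and Match N the share where any of the first N candidates is correct. Computing Match N requires running onmt_translate with -n_best of at least N.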
Translation direction | Match 1 | Match 2 | Match 3 | Match 4 | Match 5 |
---|---|---|---|---|---|
English To Japanese | 57% | 70% | 76% | 79% | 82% |
Japanese To English | 87% | 92% | 94% | 96% | 97% |
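The Match N figures can be computed along the following lines. This is a sketch, not the script that produced the table: the function names are hypothetical, and it assumes the output file contains exactly `n_best` consecutive candidate lines for each source line, in the same order as the reference file.

```python
from typing import List, Sequence


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file into a list of stripped lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle]


def match_at_n(references: Sequence[str],
               predictions: Sequence[str],
               n_best: int) -> List[float]:
    """Return [Match 1, ..., Match n_best] as fractions between 0 and 1.

    Assumes predictions holds n_best consecutive lines per reference.
    """
    if len(predictions) != len(references) * n_best:
        raise ValueError("expected n_best prediction lines per reference")
    hits = [0] * n_best
    for i, ref in enumerate(references):
        candidates = [p.strip() for p in predictions[i * n_best:(i + 1) * n_best]]
        ref = ref.strip()
        if ref in candidates:
            rank = candidates.index(ref)
            # A hit at rank r counts for Match r+1 and every larger N.
            for k in range(rank, n_best):
                hits[k] += 1
    total = len(references)
    return [h / total for h in hits] if total else [0.0] * n_best
```

For the Japanese-to-English test, `references` would come from `read_lines("data/parallel-japanese-corpus/names-en-test.txt")` and `predictions` from `read_lines("data/test/names-en-test-out.txt")`, with `n_best` equal to the value passed to onmt_translate.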
Install flask from pip:
pip install flask
To run the ONMT server, copy jp_en_model_step_100000.pt and en_jp_model_step_100000.pt into the /available_models/ directory, then run:
onmt_server
Send the following GET request to check that the server is running:
curl -i -X GET \
http://localhost:5000/translator/health
which should return
{
"status": "ok"
}
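The same health check can be done from a script, for instance to wait for the server before sending names. The sketch below uses only the Python standard library. The function names are hypothetical, and the base URL is taken from the curl example above.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True if the response body is JSON with status "ok"."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def server_is_healthy(base_url: str = "http://localhost:5000/translator",
                      timeout: float = 5.0) -> bool:
    """Query <base_url>/health and report whether the server answered ok."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:
        # Connection refused, timeout, HTTP error and similar failures.
        return False
```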
Use model ID=100 to translate to English and ID=101 to translate to Japanese. Models are configured in /available_models/conf.json.
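For orientation, a conf.json for the OpenNMT-py server generally has a `models_root` and a list of `models`, each with an `id`, a `model` file name and translation options under `opt`. The sketch below generates such a file. It is an illustration under assumptions, not the configuration shipped with this project: the option values (`gpu`, `beam_size`, `n_best`, `timeout`) are guesses, and the ID-to-file mapping simply follows the sentence above, so check it against the actual conf.json.

```python
import json
from typing import Dict


def build_server_config(models: Dict[int, str],
                        models_root: str = "./available_models") -> dict:
    """Build a conf.json-style dict; all option values are assumptions."""
    return {
        "models_root": models_root,
        "models": [
            {
                "id": model_id,
                "model": filename,
                "timeout": 600,          # idle seconds before on_timeout applies
                "on_timeout": "to_cpu",
                "load": True,            # load the model when the server starts
                "opt": {"gpu": 0, "beam_size": 5, "n_best": 5},
            }
            for model_id, filename in sorted(models.items())
        ],
    }


if __name__ == "__main__":
    config = build_server_config({
        100: "jp_en_model_step_100000.pt",  # assumed: ID 100 translates to English
        101: "en_jp_model_step_100000.pt",  # assumed: ID 101 translates to Japanese
    })
    print(json.dumps(config, indent=4))
```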
To translate a name, send a POST request:
curl -i -X POST -H "Content-Type: application/json" \
-d '[{"src": "^ln f u n a k o s h i $", "id": 100}]' \
http://localhost:5000/translator/translate
which should return
[
[
{
"n_best": 5,
"pred_score": -0.23048973083496094,
"src": "^ln f u n a k o s h i $",
"tgt": "^ln 船 越 $"
}
],
[
{
"n_best": 5,
"pred_score": -1.6027336120605469,
"src": "^ln f u n a k o s h i $",
"tgt": "^ln 舩 越 $"
}
],
[
{
"n_best": 5,
"pred_score": -5.745663642883301,
"src": "^ln f u n a k o s h i $",
"tgt": "^ln 舟 越 $"
}
],
[
{
"n_best": 5,
"pred_score": -8.610189437866211,
"src": "^ln f u n a k o s h i $",
"tgt": "^ln 二 越 $"
}
],
[
{
"n_best": 5,
"pred_score": -8.685261726379395,
"src": "^ln f u n a k o s h i $",
"tgt": "^ln 布 越 $"
}
]
]
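The request and response shown above can also be handled from Python. The following is a minimal sketch using only the standard library. The function names are hypothetical, the URL is the one from the curl example, and treating a higher `pred_score` as better is an assumption based on the ordering of the sample response.

```python
import json
import urllib.request
from typing import List, Tuple

TRANSLATE_URL = "http://localhost:5000/translator/translate"


def build_translate_request(src: str, model_id: int,
                            url: str = TRANSLATE_URL) -> urllib.request.Request:
    """Build the POST request for one encoded name and one model ID."""
    payload = json.dumps([{"src": src, "id": model_id}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(src: str, model_id: int, url: str = TRANSLATE_URL,
              timeout: float = 30.0) -> list:
    """Send the request and return the decoded JSON response."""
    request = build_translate_request(src, model_id, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def rank_candidates(response: list) -> List[Tuple[str, float]]:
    """Flatten the nested response into (name, score) pairs, best first.

    The leading ^fn/^ln token and the trailing $ are removed from each name.
    """
    flat = [item for group in response for item in group]
    flat.sort(key=lambda item: item["pred_score"], reverse=True)
    return [("".join(item["tgt"].split()[1:-1]), item["pred_score"])
            for item in flat]
```

With the server running, `rank_candidates(translate(src, model_id))` returns the candidates as plain strings with their scores, where `src` is an encoded line such as the one in the curl example and `model_id` is one of the IDs from conf.json.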