morpheval
Evaluation of the morphological quality of machine translation outputs. The automatically generated English test suite should be translated into Czech or Latvian; the output is then analyzed and provides three types of information:
- Adequacy: has the morphological information been correctly conveyed from the source?
- Fluency: is local agreement respected?
- Consistency: how confident is the system in its predictions?
More details can be found in the publication listed below.
Requirements
- Python 3
- Download the test suite (version 1) and put it in the main directory.
- Morphodita version 1.3 for Czech, as well as the tagging model and dictionary
- LU MII Tagger for Latvian
- Moses tokenizer
- Download our Latvian dictionary and put it in utils/
How To
Use your MT system to translate the source file morph_test_suite_limsi.en (untokenized).
The next steps assume that the outputs are tokenized with the Moses tokenizer (use English tokenization for Latvian).
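If a Python workflow is more convenient than the Moses tokenizer.perl script, the sacremoses package (a third-party port of the Moses tokenizer scripts, not part of this repository) can produce the tokenized input instead. This is only a sketch under that assumption; the raw input file name translations.raw.cs is illustrative:

```python
# Sketch: tokenize raw MT output with sacremoses (a Python port of the Moses
# tokenizer scripts; an assumption, not the tool referenced by this README).
from sacremoses import MosesTokenizer

tokenizer = MosesTokenizer(lang="en")  # English rules, also used for Latvian

# "translations.raw.cs" is a hypothetical name for the untokenized MT output;
# the tokenized result is written to the file name used by the commands below.
with open("translations.raw.cs", encoding="utf-8") as fin, \
     open("morph_test_suite_limsi.translated.cs", "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(tokenizer.tokenize(line.strip(), return_str=True) + "\n")
```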
Czech
- Adequacy and fluency:
  - Run analysis (the sed/tr pipeline converts the tokenized output to vertical format; see the sketch after this list):
    cat morph_test_suite_limsi.translated.cs | sed 's/$/\n/' | tr ' ' '\n' | morphodita/src/run_morpho_analyze czech-morfflex-131112-raw_lemmas.dict 1 --input=vertical --output=vertical > morph_test_suite_limsi.translated.cs.ambig
  - Run evaluation:
    python3 evaluate_morph_pairs_cs.py -i morph_test_suite_limsi.translated.cs.ambig -n morph_test_suite_limsi.en.info
- Consistency:
  - Run tagging:
    cat morph_test_suite_limsi.translated.cs | sed 's/$/\n/' | tr ' ' '\n' | morphodita/src/run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy --input=vertical --output=vertical > morph_test_suite_limsi.translated.cs.disambig
  - Run evaluation:
    python3 evaluate_consistency_cs.py -i morph_test_suite_limsi.translated.cs.disambig -n morph_test_suite_limsi.en.info
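The sed/tr pipeline in the analysis and tagging commands only reshapes the tokenized, one-sentence-per-line translation into the vertical format Morphodita expects: one token per line, with a blank line after each sentence. A minimal Python sketch of the same conversion (the script name to_vertical.py is hypothetical):

```python
# to_vertical.py (hypothetical helper): mirror the sed/tr pipeline above by
# turning space-tokenized, one-sentence-per-line text into vertical format.
import sys

for sentence in sys.stdin:
    for token in sentence.split():
        print(token)   # one token per line
    print()            # blank line marks the sentence boundary
```

Its output can be piped into run_morpho_analyze or run_tagger exactly as in the commands above.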
Latvian
- Set the path to the LU MII Tagger in tags_lv.sh and run tagging (outputs morph_test_suite_limsi.translated.lv.tag):
  ./tags_lv.sh morph_test_suite_limsi.translated.lv
- Adequacy and fluency:
  - Generate ambiguities (analysis) with the dictionary:
    python3 make_ambig_lv.py -w morph_test_suite_limsi.translated.lv -t morph_test_suite_limsi.translated.lv.tag > morph_test_suite_limsi.translated.lv.tag.ambig
  - Run evaluation:
    python3 evaluate_morph_pairs_lv.py -i morph_test_suite_limsi.translated.lv.tag.ambig -n morph_test_suite_limsi.en.info
- Consistency:
  - Format output:
    python3 make_disambig_vert.py -w morph_test_suite_limsi.translated.lv -t morph_test_suite_limsi.translated.lv.tag > morph_test_suite_limsi.translated.lv.tag.disambig
  - Run evaluation:
    python3 evaluate_consistency_lv.py -i morph_test_suite_limsi.translated.lv.tag.disambig -n morph_test_suite_limsi.en.info
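For convenience, the Latvian steps above can be chained in a small driver script. This is only a sketch: the file name run_latvian_eval.py is hypothetical, and the commands are exactly the ones listed in this section.

```python
# run_latvian_eval.py (hypothetical driver): run the Latvian pipeline steps
# from this README in sequence. Adjust paths to your setup.
import subprocess

hyp = "morph_test_suite_limsi.translated.lv"
info = "morph_test_suite_limsi.en.info"

steps = [
    f"./tags_lv.sh {hyp}",
    f"python3 make_ambig_lv.py -w {hyp} -t {hyp}.tag > {hyp}.tag.ambig",
    f"python3 evaluate_morph_pairs_lv.py -i {hyp}.tag.ambig -n {info}",
    f"python3 make_disambig_vert.py -w {hyp} -t {hyp}.tag > {hyp}.tag.disambig",
    f"python3 evaluate_consistency_lv.py -i {hyp}.tag.disambig -n {info}",
]

for cmd in steps:
    # shell=True so the output redirection (>) works as in the README commands
    subprocess.run(cmd, shell=True, check=True)
```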
Publication
Franck Burlot and François Yvon, Evaluating the morphological competence of machine translation systems. In Proceedings of the Second Conference on Machine Translation (WMT’17). Association for Computational Linguistics, Copenhagen, Denmark, 2017.