MarMoT is a generic Conditional Random Field (CRF) framework as well as a state-of-the-art morphological tagger.
To get the latest binary release of MarMoT, please visit MarMoT's CIS home page. Pretrained models can be found here.
https://github.com/muelletm/cistern/tree/master/marmot
The most common use of MarMoT is to annotate words with their morphological properties. Given a file text.txt in a one-word-per-line format:
Murmeltiere
sind
im
Hochgebirge
zu
Hause
.
The following command:
java -cp marmot.jar marmot.morph.cmd.Annotator \
     --model-file de.marmot \
     --test-file form-index=0,text.txt \
     --pred-file text.out.txt
will produce a file in a (truncated) CoNLL09 format:
0 Murmeltiere _ _ _ NN _ case=nom|number=pl|gender=masc
1 sind _ _ _ VAFIN _ number=pl|person=3|tense=pres|mood=ind
2 im _ _ _ APPRART _ case=dat|number=sg|gender=neut
3 Hochgebirge _ _ _ NN _ case=dat|number=sg|gender=neut
4 zu _ _ _ APPR _ _
5 Hause _ _ _ NN _ case=dat|number=sg|gender=neut
6 . _ _ _ $. _ _
The actual tags will depend on the annotation of the treebank that was used to train the MarMoT model. The tags here are in the STTS format used by TIGER.
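For downstream processing, the prediction file can be read back into structured records. The following is a minimal Python sketch, not part of MarMoT itself. The names TaggedToken, parse_morph, parse_line and parse_output are hypothetical helpers. The column positions (index in column 1, form in column 2, tag in column 6, morphological features in column 8) are read off the example output above. Splitting on arbitrary whitespace and treating a blank line as a sentence boundary are assumptions.

```python
from typing import Dict, List, NamedTuple


class TaggedToken(NamedTuple):
    """One line of the truncated CoNLL09 output (hypothetical helper type)."""
    index: int
    form: str
    pos: str
    morph: Dict[str, str]


def parse_morph(field: str) -> Dict[str, str]:
    """Turn 'case=nom|number=pl' into {'case': 'nom', 'number': 'pl'}.

    A bare '_' means that no morphological features were predicted.
    Items without '=' are kept as keys with an empty value.
    """
    if field == "_":
        return {}
    features: Dict[str, str] = {}
    for item in field.split("|"):
        key, _, value = item.partition("=")
        features[key] = value
    return features


def parse_line(line: str) -> TaggedToken:
    """Parse one token line; columns are assumed to be whitespace-separated."""
    cols = line.split()
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(cols)}: {line!r}")
    return TaggedToken(int(cols[0]), cols[1], cols[5], parse_morph(cols[7]))


def parse_output(text: str) -> List[List[TaggedToken]]:
    """Group token lines into sentences (assumption: blank line ends a sentence)."""
    sentences: List[List[TaggedToken]] = []
    current: List[TaggedToken] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(parse_line(line))
    if current:
        sentences.append(current)
    return sentences


# Two lines copied from the example output above.
SAMPLE = (
    "0 Murmeltiere _ _ _ NN _ case=nom|number=pl|gender=masc\n"
    "1 sind _ _ _ VAFIN _ number=pl|person=3|tense=pres|mood=ind\n"
)

if __name__ == "__main__":
    for sentence in parse_output(SAMPLE):
        for token in sentence:
            print(token.form, token.pos, token.morph)
```

To process a real prediction file, the contents of text.out.txt can be passed to parse_output in place of SAMPLE. Because the feature names depend on the treebank used for training, the sketch keeps them as plain strings rather than validating them.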
- Training new models
- Integrating the output of a Morphological Analyzer
- Predictions for the SPMRL data sets
If you use MarMoT in your research and would like to acknowledge it, please refer to the following paper.
@InProceedings{mueller2013,
  author    = {M\"uller, Thomas and Schmid, Helmut and Sch\"utze, Hinrich},
  title     = {Efficient Higher-Order CRFs for Morphological Tagging},
  booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
  year      = {2013},
}