http://cistern.cis.lmu.de/marmot/marmot.png

Introduction

MarMoT is a generic Conditional Random Field (CRF) framework as well as a state-of-the-art morphological tagger.

Download

To get the latest binary release of MarMoT, please visit MarMoT's CIS home page. Pretrained models can be found here.

Source Code

https://github.com/muelletm/cistern/tree/master/marmot

Quickstart

The most common use of MarMoT is to annotate words with their morphological properties. Given a file text.txt in a one-word-per-line format:

Murmeltiere
sind
im
Hochgebirge
zu
Hause
.

The following command:

java -cp marmot.jar marmot.morph.cmd.Annotator \
    --model-file de.marmot \
    --test-file form-index=0,text.txt \
    --pred-file text.out.txt

will produce a file in a (truncated) CoNLL09 format:

0       Murmeltiere     _       _       _       NN      _       case=nom|number=pl|gender=masc
1       sind            _       _       _       VAFIN   _       number=pl|person=3|tense=pres|mood=ind
2       im              _       _       _       APPRART _       case=dat|number=sg|gender=neut
3       Hochgebirge     _       _       _       NN      _       case=dat|number=sg|gender=neut
4       zu              _       _       _       APPR    _       _
5       Hause           _       _       _       NN      _       case=dat|number=sg|gender=neut
6       .               _       _       _       $.      _       _

The actual tags depend on the annotation of the treebank that was used to train the MarMoT model. The part-of-speech tags shown here follow the STTS tagset used by the TIGER treebank.
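To post-process the predictions, the output file can be read back into a simple data structure. The following Python sketch is not part of MarMoT; it assumes the column layout seen in the sample above (column 0 is the token index, column 1 the word form, column 5 the part-of-speech tag, column 7 the morphological features joined by `|`) and that columns are separated by whitespace. The names `Token`, `parse_line` and `parse_output` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    index: int
    form: str
    pos: str
    morph: Dict[str, str]


def parse_line(line: str) -> Token:
    """Parse one prediction line (column positions assumed from the sample output)."""
    cols = line.split()
    feats = cols[7]
    morph: Dict[str, str] = {}
    if feats != "_":  # "_" marks an empty feature column
        for pair in feats.split("|"):
            key, _, value = pair.partition("=")
            morph[key] = value
    return Token(int(cols[0]), cols[1], cols[5], morph)


def parse_output(text: str) -> List[Token]:
    """Parse all non-empty lines of a prediction file."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = (
        "0 Murmeltiere _ _ _ NN _ case=nom|number=pl|gender=masc\n"
        "4 zu _ _ _ APPR _ _\n"
    )
    for token in parse_output(sample):
        print(token.form, token.pos, token.morph)
```

For a real run, the string passed to `parse_output` would be the contents of text.out.txt. If a model emits a different column layout, the indices in `parse_line` need to be adjusted.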

Further Reading

Projects that use MarMoT

References

If you use MarMoT in your research and would like to acknowledge it, please cite the following paper:

@InProceedings{mueller2013,
  author    = {M\"uller, Thomas and Schmid, Helmut and Sch\"utze, Hinrich},
  title     = {Efficient Higher-Order CRFs for Morphological Tagging},
  booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
  year      = {2013},
}