This directory includes a sample application written to demonstrate the
Python binding of the maxent toolkit.

It implements a simple Maximum Entropy part-of-speech tagger. The
tagger is designed for English.  However, it is easy to extend the
code to handle other languages (such as Chinese).

The tagger is documented in the manual of the toolkit. Please note that the
code here was written a long time ago and is not actively maintained.
Therefore, it may not work with the latest release of the C++ core.

For the impatient:
The training data is a plain text file with one sentence per line. A sentence
looks like:
They/PRP will/MD remain/VB on/IN a/DT lower-priority/JJ list/NN that/WDT includes/VBZ 17/CD other/JJ countries/NNS ./.
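
The word/TAG format above can be read with a few lines of Python. The
following is only a minimal sketch and is not taken from the tagger's source:
the function name parse_tagged_sentence is hypothetical, and it assumes that
tokens are separated by whitespace and that the tag follows the last "/" in
each token.

```python
def parse_tagged_sentence(line):
    """Split one 'word/TAG word/TAG ...' line into (word, tag) pairs.

    Assumption: tokens are whitespace-separated and the tag is whatever
    follows the LAST slash, so a word that itself contains '/' stays intact.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError("malformed token: %r" % token)
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    sample = "They/PRP will/MD remain/VB on/IN a/DT lower-priority/JJ list/NN ./."
    for word, tag in parse_tagged_sentence(sample):
        print(word, tag)
```

Splitting on the last slash rather than the first keeps the final "./."
token well-formed (word ".", tag ".") and tolerates words that contain a
slash themselves.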

To train a tagging model called tagger.model with 100 iterations:

$ ./ tagger.model -f --iters 100

This will produce a file called "tagger.model".

To tag new sentences (in, one sentence per line) with a previously
trained model:

$ ./ -m tagger.model

The result will be printed to stdout.

A pre-trained model (trained on sections 00-18 of the WSJ corpus) named
"tagger.model" can be obtained from the homepage of the toolkit.
