A sequential tagger built around the pycrfsuite wrapper for CRFSuite.

CRFSuiteTagger

CRFSuiteTagger is a sequence tagger based on the pycrfsuite python wrapper for CRFSuite. It is built for chunking, NER, and other BIO (also referred to as IOB) based text annotation tasks.
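For readers new to BIO annotation, here is a minimal sketch of how a tag sequence maps to labelled spans. It is illustrative only: `bio_to_chunks` is a hypothetical helper written for this README, not a function of CRFSuiteTagger or bioeval.

```python
def bio_to_chunks(tags):
    """Collect (type, start, end) spans from a BIO tag sequence; end is exclusive."""
    chunks = []
    start, ctype = None, None
    for i, tag in enumerate(tags):
        # The open chunk ends on "O", on any "B-" tag, or on an "I-" tag of another type.
        closes = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != ctype)
        )
        if closes:
            if ctype is not None:
                chunks.append((ctype, start, i))
            start, ctype = None, None
        # Any non-"O" tag opens a chunk when none is currently open.
        if tag != "O" and ctype is None:
            start, ctype = i, tag[2:]
    if ctype is not None:
        chunks.append((ctype, start, len(tags)))
    return chunks


tokens = ["He", "reckons", "the", "current", "account", "deficit"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
chunks = bio_to_chunks(tags)
# chunks == [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 6)]
```

A `B-` tag opens a chunk, `I-` tags continue it, and `O` marks tokens outside any chunk, so the last four tokens above form a single noun phrase.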

Why would you need this?

CRFSuiteTagger ships with a wide selection of common features and makes it easy to integrate additional ones. Features are selected with a simple string-based feature template. New features can be added by writing feature-generating functions (see crfsuitetagger.ftex) and passing them to the CRFSuiteTagger constructor.
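The idea of a string template driving a set of feature-generating functions can be sketched as follows. This is not the actual CRFSuiteTagger template syntax or API: the `name:offset` format, the `extract` function, and the feature function signatures are all assumptions made for illustration. See crfsuitetagger.ftex for the real ones.

```python
# Hypothetical feature functions: each takes the token list and a position.
def word(tokens, i):
    return tokens[i].lower()


def suffix3(tokens, i):
    return tokens[i][-3:]


def is_title(tokens, i):
    return str(tokens[i].istitle())


# Built-in features available to every template (names are made up here).
BASE_FUNCS = {"word": word, "suf3": suffix3}


def extract(tokens, template, extra_funcs=None):
    """Build one feature dict per token from a 'name:offset ...' template."""
    funcs = dict(BASE_FUNCS)
    funcs.update(extra_funcs or {})  # user-supplied functions extend the built-ins

    # Parse the template once, e.g. "word:-1" -> ("word", -1).
    specs = []
    for entry in template.split():
        name, offset = entry.split(":")
        specs.append((name, int(offset)))

    rows = []
    for i in range(len(tokens)):
        feats = {}
        for name, offset in specs:
            j = i + offset
            if 0 <= j < len(tokens):  # skip offsets that fall outside the sentence
                feats["%s[%d]" % (name, offset)] = funcs[name](tokens, j)
        rows.append(feats)
    return rows


rows = extract(
    ["Paris", "is", "nice"],
    "word:0 word:-1 title:0",
    extra_funcs={"title": is_title},
)
# rows[1] == {"word[0]": "is", "word[-1]": "paris", "title[0]": "False"}
```

Keeping the template as a plain string means a feature set can be changed in a config file without touching code, while the function table leaves room for custom features.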

Installation

You should be able to install CRFSuiteTagger like any other Python package:

python setup.py install

Dependencies

You will need the following Python packages and one of my other libraries:

  • pycrfsuite - Python wrapper for CRFSuite
  • numpy - you should have it
  • bioeval - my library for evaluating BIO-style annotation, which replaces the Perl script from CoNLL-2000
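A quick way to check that the dependencies are visible to your interpreter is sketched below. It assumes each package is importable under the name listed above; this is an assumption for bioeval in particular, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names, taken from the dependency list.
missing = missing_modules(["pycrfsuite", "numpy", "bioeval"])
if missing:
    print("Missing dependencies: " + ", ".join(missing))
else:
    print("All dependencies found.")
```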

TODO

  • command line interface
  • migrate data structure to pandas
  • more examples

See Also

If you are interested in other sequence taggers, you might want to look at: