
# TreeTagger wrapper #

This module implements a wrapper around TreeTagger that lets you use KAF or NAF files as input and output. The wrapper supports English, Dutch, German, Spanish, Italian and French, although it is very easy to add new languages.
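One way to picture why adding a language is easy: TreeTagger needs one parameter (model) file per language, so a wrapper can select the model from the language declared in the input document. The sketch below is hypothetical and not this wrapper's code; it assumes the language is given in the root element's `xml:lang` attribute, and the names `PARAMETER_FILES` and `parameter_file_for`, as well as the `.par` file names, are illustrative assumptions that depend on your TreeTagger installation.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the document language code to a TreeTagger
# parameter file; real file names depend on the installed models.
PARAMETER_FILES = {
    "en": "english.par",
    "nl": "dutch.par",
    "de": "german.par",
    "es": "spanish.par",
    "it": "italian.par",
    "fr": "french.par",
}

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parameter_file_for(document: str) -> str:
    """Return the parameter file for the language declared on the root element."""
    root = ET.fromstring(document)
    lang = root.get(XML_LANG)
    if lang is None:
        raise ValueError("root element has no xml:lang attribute")
    try:
        return PARAMETER_FILES[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang}") from None


# Supporting another language would be one more entry (hypothetical file name):
# PARAMETER_FILES["pt"] = "portuguese.par"

print(parameter_file_for('<KAF xml:lang="en"><text/></KAF>'))  # english.par
```

With this design, an unsupported language fails early with a clear error instead of silently tagging text with the wrong model.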


This wrapper has only two dependencies:

  1. The KafNafParserPy library, which parses and modifies KAF and NAF files.
  2. TreeTagger itself, which must be installed and available on your machine.

A single installation script performs the whole installation for you: it first installs the KafNafParserPy library and then TreeTagger with all its models.

## Quick installation ##

These are the only steps you need to run from the command line to install this TreeTagger wrapper:

```shell
cd your_local_path
git clone
cd treetagger_kaf_naf
```

These steps clone this repository and install all the required dependencies. If an error occurs, inspect the installation script and run its commands one by one.

### If you already have TreeTagger installed ###

In this case you only need to run the part of the installation script that clones KafNafParserPy, and then specify where TreeTagger is installed. You can do this in two ways:

  1. Edit the file lib/ and set the variable `TREE_TAGGER_PATH` to the root path of your TreeTagger installation.
  2. Set the environment variable `TREE_TAGGER_PATH` to that same local path.
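The two options above amount to looking up one setting in two places. The following is a minimal sketch of how such a lookup could work, not the wrapper's actual code: the function name `resolve_tree_tagger_path`, the `DEFAULT_TREE_TAGGER_PATH` placeholder, and the precedence (environment variable before the value in the file) are all assumptions.

```python
import os
from typing import Optional

# Hypothetical stand-in for the TREE_TAGGER_PATH variable set in the
# lib/ file; None means it has not been configured there.
DEFAULT_TREE_TAGGER_PATH: Optional[str] = None


def resolve_tree_tagger_path(default: Optional[str] = DEFAULT_TREE_TAGGER_PATH) -> str:
    """Return the TreeTagger root directory, preferring the environment."""
    # Assumed precedence: the TREE_TAGGER_PATH environment variable wins,
    # otherwise fall back to the value configured in the file.
    path = os.environ.get("TREE_TAGGER_PATH") or default
    if not path:
        raise RuntimeError(
            "TreeTagger not found: set TREE_TAGGER_PATH in the environment "
            "or in the wrapper's configuration file"
        )
    # Fail early if the setting does not point to an existing directory.
    if not os.path.isdir(path):
        raise RuntimeError(f"TREE_TAGGER_PATH is not a directory: {path}")
    return path
```

Checking that the directory exists up front gives a clearer error than a later failure when TreeTagger itself is launched.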


The input must be a valid KAF/NAF file that has already been processed by a tokeniser and therefore contains a correct text layer. Once everything is installed, you can try one of the example files in the examples subfolder by running:

```shell
cat examples/input.en.kaf | python > my_output.en.kaf
```

This processes the file examples/input.en.kaf and stores the result in my_output.en.kaf, which should be identical (except for the timestamps) to examples/output.en.kaf. The examples folder also contains example files for the other languages.
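The example run above can also be checked programmatically. The following is a minimal sketch using only Python's standard library, not part of this wrapper: the hypothetical `text_layer` confirms that an input file has the tokenised text layer the wrapper requires, and the hypothetical `same_ignoring_timestamps` compares your output with the reference file while ignoring timestamps. It assumes KAF tokens are `<wf>` elements with a `wid` attribute (NAF uses `id`) directly under `<text>`, and that timestamps only appear as attributes of `<lp>` header elements.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Attributes assumed to hold run-specific times on <lp> header elements.
TIMESTAMP_ATTRS = {"timestamp", "beginTimestamp", "endTimestamp"}


def text_layer(document: str) -> List[Tuple[str, str]]:
    """Return (token id, word form) pairs from the <text> layer."""
    root = ET.fromstring(document)
    tokens = [
        (wf.get("wid") or wf.get("id") or "", wf.text or "")
        for wf in root.findall("./text/wf")
    ]
    if not tokens:
        raise ValueError("no <wf> tokens in a <text> layer: run a tokeniser first")
    return tokens


def _canonical(element: ET.Element) -> tuple:
    """Comparable form of an element: tag, attributes, stripped text, children."""
    attrs = dict(element.attrib)
    if element.tag == "lp":
        for name in TIMESTAMP_ATTRS:
            attrs.pop(name, None)
    return (
        element.tag,
        sorted(attrs.items()),
        (element.text or "").strip(),
        [_canonical(child) for child in element],
    )


def same_ignoring_timestamps(produced: str, expected: str) -> bool:
    """True if two KAF/NAF documents differ at most in <lp> timestamps."""
    return _canonical(ET.fromstring(produced)) == _canonical(ET.fromstring(expected))
```

To use it, read my_output.en.kaf and examples/output.en.kaf into strings and pass them to `same_ignoring_timestamps`. Whitespace between elements is ignored, so differences in indentation alone do not count as a mismatch.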



Software distributed under GPLv3; see the LICENSE file for details.