This module implements a wrapper around TreeTagger that lets you work with KAF or NAF files as input and output. The wrapper currently supports English, Dutch, German, Spanish, Italian and French, and it is easy to add new languages.
This wrapper has only two dependencies:
- KafNafParserPy library which allows to parse and modify KAF or NAF files
- TreeTagger itself, which needs to be installed and available on your machine.
One script, `install_dependencies.sh`, performs the whole installation for you. It first installs the KafNafParserPy library and then TreeTagger with all its models.
##Quick installation##
These are the only steps you need to run from the command line to get this treetagger-wrapper installed:

    cd your_local_path
    git clone https://github.com/rubenIzquierdo/treetagger_kaf_naf
    cd treetagger_kaf_naf
    bash install_dependencies.sh

These commands clone this repository and install all the required dependencies. If an error occurs, inspect the installation script and run its commands one by one.
###If you already have TreeTagger installed###
In this case you only need to run the part of the installation script that clones KafNafParserPy, and then specify where your TreeTagger is installed. You can do this in either of two ways:
- Edit the file `lib/__init__.py` and set the variable `TREE_TAGGER_PATH` to point to the root path of your TreeTagger installation
- Set the environment variable `TREE_TAGGER_PATH`, pointing again to the local path of TreeTagger
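
The two options above can be pictured with a minimal sketch of how a path lookup might work. This is not the wrapper's actual code: the function name, the default value and the rule that the environment variable wins over the value in the file are all assumptions made for illustration.

```python
import os

# Hypothetical default; in the wrapper this value would be the one
# you set in lib/__init__.py.
DEFAULT_TREE_TAGGER_PATH = "/usr/local/treetagger"


def resolve_tree_tagger_path(environ=None, default=DEFAULT_TREE_TAGGER_PATH):
    """Return the TreeTagger root path.

    Assumption: the TREE_TAGGER_PATH environment variable, when set and
    non-empty, takes precedence over the default from the file.
    """
    if environ is None:
        environ = os.environ
    # An empty variable is treated the same as an unset one.
    return environ.get("TREE_TAGGER_PATH") or default


# Example: look the path up and warn if the directory does not exist.
path = resolve_tree_tagger_path()
if not os.path.isdir(path):
    print("TreeTagger directory not found:", path)
```

Checking that the resolved directory exists before running the tagger gives a clearer message than a failure deep inside the tagging step.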
The input must be a valid KAF/NAF file that has already been processed by a tokeniser and contains a correct text layer.
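
If you are unsure whether a file meets this requirement, a quick check along the following lines may help. It is only a sketch using the Python standard library, not KafNafParserPy, and it assumes the usual KAF/NAF layout: a `KAF` or `NAF` root element with a `text` child that holds `wf` token elements. The function name `has_text_layer` is made up for this example.

```python
import xml.etree.ElementTree as ET


def has_text_layer(xml_string):
    """Return True if the document has a text layer with at least one token.

    Assumed layout: root element <KAF> or <NAF>, containing <text>,
    which in turn contains <wf> elements (one per token).
    """
    root = ET.fromstring(xml_string)
    if root.tag not in ("KAF", "NAF"):
        return False
    text_layer = root.find("text")
    return text_layer is not None and text_layer.find("wf") is not None


# A tiny hand-written document, for illustration only.
SAMPLE = (
    '<KAF xml:lang="en">'
    '<text><wf wid="w1" sent="1">Hello</wf></text>'
    '</KAF>'
)
print(has_text_layer(SAMPLE))
```

The check only looks at structure. It does not verify that the tokenisation itself is correct.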
Once the wrapper is installed, you can try it on one of the example files in the `examples` subfolder by running:
$ cat examples/input.en.kaf | python treetagger.py > my_output.en.kaf
This will process the file `examples/input.en.kaf` and store the result in the file `my_output.en.kaf`, which should be the same as the file `examples/output.en.kaf` (with the exception of the time stamps). You will find example files for the rest of the languages in the same subfolder.
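
Because the time stamps differ between runs, a plain `diff` of your output against the reference file will report differences. The sketch below shows one way to compare two files while ignoring them. It assumes the time stamps are stored in XML attributes named `timestamp`, `beginTimestamp` or `endTimestamp`; adjust the list if your files use other names. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed names of the attributes that hold time stamps.
TIMESTAMP_ATTRIBUTES = ("timestamp", "beginTimestamp", "endTimestamp")


def normalised(xml_string, ignored=TIMESTAMP_ATTRIBUTES):
    """Serialise the document again with the ignored attributes removed."""
    root = ET.fromstring(xml_string)
    for element in root.iter():
        for name in ignored:
            element.attrib.pop(name, None)
    return ET.tostring(root)


def same_except_timestamps(first_xml, second_xml):
    """Compare two documents after dropping the time stamp attributes."""
    return normalised(first_xml) == normalised(second_xml)
```

To compare the files from the example, read `my_output.en.kaf` and `examples/output.en.kaf` into strings and pass both to `same_except_timestamps`. The comparison is still sensitive to whitespace and attribute order, so it is a quick sanity check rather than a full XML equivalence test.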
- Ruben Izquierdo Bevia
- Vrije University of Amsterdam
Software distributed under GPL.v3, see LICENSE file for details.