
Step by Step, TreeTagger Installation

rzanoli edited this page Feb 24, 2015 · 36 revisions

TreeTagger is a tool for annotating text with part-of-speech and lemma information. It is essential for other linguistic processing components (e.g., MaltParser) and is also needed to run some EDA configurations. The EOP cannot ship this tool because it has its own license, which is not compatible with the EOP license.

TreeTagger can be installed automatically during the EOP installation by the install.sh script. However, if you would like to use the EOP as a library from your own code, you have to install TreeTagger manually as described below.

If you install TreeTagger manually, the first thing to do is to read the license agreement and make sure you agree with it: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/Tagger-Licence

The actual installation is almost fully automated by a script. (The script will make you read the license agreement and won't proceed unless you agree with it.) Installing TreeTagger requires these 3 steps:

  1. Installing ant
  2. Downloading the build.xml file
  3. Installing TreeTagger

Installing ant

Apache Ant is a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other. The main known usage of Ant is the build of Java applications. We use Ant to install TreeTagger; Ant 1.8.x or later is required.

There are two ways of installing Ant in Ubuntu:

Please note that Ant 1.9.4 has a bug in handling FTP URLs and cannot handle the URLs that are required for building TreeTagger. Only 1.9.4 has this bug; if your Ant version is 1.9.4, please download another version from the Ant homepage (e.g. 1.9.3 or 1.9.5).
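The version requirements above can be checked before starting. The following is a minimal sketch, not part of the EOP scripts: the function name `check_ant_version` is made up for illustration, and the parsing assumes the usual banner printed by `ant -version`, of the form `Apache Ant(TM) version X.Y.Z compiled on ...`.

```shell
#!/bin/sh
# Hypothetical helper: decide whether an Ant version banner is usable
# for the TreeTagger build (1.8.x or later, but not 1.9.4).
# Usage: check_ant_version "$(ant -version)"
check_ant_version() {
    # Extract the dotted number that follows the word "version".
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if [ -z "$ver" ]; then
        echo "could not parse Ant version" >&2
        return 2
    fi
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    if [ "$ver" = "1.9.4" ]; then
        echo "Ant $ver has the FTP URL bug; use e.g. 1.9.3 or 1.9.5" >&2
        return 1
    fi
    if [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 8 ]; }; then
        echo "Ant $ver is too old; 1.8.x or later is required" >&2
        return 1
    fi
    echo "Ant $ver is OK"
}
```

The function returns 0 for a usable version, 1 for a version that is too old or is 1.9.4, and 2 if the banner cannot be parsed.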

Downloading the build.xml file

Download the [[build.xml | http://hlt-services4.fbk.eu:8080/artifactory/repo/eu/excitementproject/eop-resources/EOP{version}/resources/build.xml]] file. It is a script provided by DKPro (thanks, DKPro developers!) and it will be used in the next step to install TreeTagger.

Installing TreeTagger

The procedure for completing the TreeTagger installation differs depending on whether you use the EOP via API, via CLI, or via Eclipse, as described below.

via API: Adding TreeTagger into your own project

  • Move into the home directory of your project (myProject in this example), e.g.

    > cd ~/programs/myProject/
    
  • Create the new directory treetagger, i.e.

    > mkdir -p src/scripts/treetagger/
    
  • Go into the new directory, i.e.

    > cd src/scripts/treetagger/
    
  • Copy the build.xml file into the current directory, e.g.

    > cp /home/user_name/Downloads/build.xml ./
    
  • Run the installation script by calling the Ant build tool, i.e.

    > ant local-maven 
    

    This command will download the binary and models, wrap them as Maven modules, and install them in your local Maven repository (i.e. ~/.m2/). The TreeTagger installation will take some time (about 1 minute). If it succeeds, it will output “BUILD SUCCESSFUL”.

  • Add the following TreeTagger dependencies into the pom.xml file of your project:

    <!-- TreeTagger related dependencies -->
          <dependency>
                  <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
                  <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-bin</artifactId>
                  <version>20130228.0</version>
          </dependency>
          <dependency>
                  <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
                  <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-de</artifactId>
                  <version>20121207.0</version>
          </dependency>
          <dependency>
                  <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
                  <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-en</artifactId>
                  <version>20111109.0</version>
          </dependency>
          <dependency>
                  <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
                  <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-it</artifactId>
                  <version>20101115.0</version>
          </dependency>
    <!-- end of TreeTagger related dependencies -->
    

    where 20130228.0 is the version of the treetagger-bin artifact that has been installed. To find out which version has been installed on your machine, look at the artifact in your local Maven repository in the .m2 directory, i.e.

    > ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-bin/
    

    20130228.0

    **20130228.0** is the version of the artifact that has actually been installed, and this is the version that has to be given in the dependency information. **20101115.0** is instead the version of the Italian model **treetagger-model-it**. To see that:

    > ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-model-it/
    20101115.0

    The same check should be done for the other two models: **treetagger-model-de** and **treetagger-model-en**.
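The version check above can be done for all four artifacts at once. The following is a minimal sketch, not part of the EOP or DKPro tooling: the function name `list_treetagger_versions` is made up for illustration, and it assumes the default local repository layout under ~/.m2/repository used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print "<artifact> <version>" for each TreeTagger
# artifact found in a local Maven repository.
# The optional argument overrides the repository root
# (default: ~/.m2/repository).
list_treetagger_versions() {
    repo=${1:-"$HOME/.m2/repository"}
    base="$repo/de/tudarmstadt/ukp/dkpro/core"
    for name in treetagger-bin treetagger-model-de treetagger-model-en treetagger-model-it; do
        dir="$base/de.tudarmstadt.ukp.dkpro.core.$name"
        if [ -d "$dir" ]; then
            # Each installed version is a subdirectory of the artifact directory.
            for v in "$dir"/*/; do
                if [ -d "$v" ]; then
                    echo "$name $(basename "$v")"
                fi
            done
        else
            echo "$name (not installed)"
        fi
    done
}
```

Calling `list_treetagger_versions` with no argument inspects your own ~/.m2 directory; the versions it prints are the ones to put in the `<version>` elements of the dependencies.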


#### via CLI/Eclipse: Enabling TreeTagger in the EOP project

* Move into the TreeTagger home directory in EOP, e.g.

    > cd ~/Excitement-Open-Platform-{version}/lap/src/scripts/treetagger/


* Copy the build.xml file into the current directory, e.g.

    > cp /home/user_name/Downloads/build.xml ./


* Run the installation script by calling the Ant build tool, i.e.

    > ant local-maven

    This command will download the binary and models, wrap them as Maven modules, and install them in your local Maven repository (i.e. ~/.m2/). The TreeTagger installation will take some time (about 1 minute). If it succeeds, it will output “BUILD SUCCESSFUL”.

* With your favourite text editor (e.g. emacs), open the pom.xml file in the directory ~/Excitement-Open-Platform-{version}/lap/ and uncomment the TreeTagger dependencies; in the file they are preceded by the comment: <!-- TreeTagger related dependencies -->

* Check that the TreeTagger dependency versions (e.g. **20130228.0**) given in the pom file correspond to the artifacts installed in your local Maven repository in the .m2 directory. To do that, follow the same procedure described for using the EOP via API, e.g.

    > ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-bin/
    20130228.0

    **20101115.0** is instead the version of the Italian model **treetagger-model-it**. To see that:

    > ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-model-it/
    20101115.0

    The same check should be done for the other two models: **treetagger-model-de** and **treetagger-model-en**.

* Update the project:

    * via CLI

        > cd ~/Excitement-Open-Platform-{version}
        > mvn package assembly:assembly

    * via Eclipse

        eop --> Maven --> Update Project
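To compare the versions in the pom file against the local repository without checking each artifact by hand, something like the following could be used. It is only a sketch: `check_pom_treetagger` is a hypothetical name, the pom is scanned line by line with awk rather than parsed as XML, and it assumes each `<artifactId>` and `<version>` element sits on its own line, as in the dependency snippet for the API case. Dependencies that are still commented out are not skipped.

```shell
#!/bin/sh
# Hypothetical helper: for each TreeTagger dependency declared in a pom.xml,
# report whether that exact version exists in the local Maven repository.
# Usage: check_pom_treetagger pom.xml [repo_root]
check_pom_treetagger() {
    pom=$1
    repo=${2:-"$HOME/.m2/repository"}
    base="$repo/de/tudarmstadt/ukp/dkpro/core"
    status=0
    # Pair each treetagger <artifactId> with the next <version> line.
    deps=$(awk '
        /<artifactId>/ { art = $0; gsub(/.*<artifactId>|<\/artifactId>.*/, "", art) }
        /<version>/ && art ~ /treetagger/ {
            ver = $0; gsub(/.*<version>|<\/version>.*/, "", ver)
            print art, ver
            art = ""
        }
    ' "$pom") || return 2
    while read -r art ver; do
        [ -n "$art" ] || continue
        if [ -d "$base/$art/$ver" ]; then
            echo "OK $art $ver"
        else
            echo "MISSING $art $ver"
            status=1
        fi
    done <<EOF
$deps
EOF
    return "$status"
}
```

A non-zero return value means at least one declared version was not found, in which case the `<version>` in the pom should be changed to the one actually present under ~/.m2.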

#### If an MD5 checksum error stops the Ant build

TreeTagger binaries and models are often updated to newer versions that are stored at the same URL, so the MD5 checksum fails in such cases. The current Ant script was last updated in November 2014.

In such a case, do one of the following:

1. Update the MD5 sum in the Ant script: check that the URL is correct and, if you trust the updated binary, update the MD5 sum to match the actual file on the website.

2. Notify the responsible person and get an updated version: contact noh@cl.uni-heidelberg.de (Tae-Gil Noh) for an updated Ant script.
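For the first option, the new MD5 sum can be computed from a manually downloaded copy of the file. The sketch below is an illustration only: the function names `file_md5` and `md5_matches` are hypothetical, and how the expected value is stored in build.xml is not covered here, so look it up in your copy of the script.

```shell
#!/bin/sh
# Hypothetical helper: print the MD5 hex digest of a file, using whichever
# of md5sum (GNU coreutils) or md5 (BSD/macOS) is available.
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | cut -d ' ' -f 1
    elif command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        echo "no md5 tool found" >&2
        return 1
    fi
}

# Compare a file against the MD5 value recorded in the Ant script.
# Usage: md5_matches <file> <expected-md5>
md5_matches() {
    actual=$(file_md5 "$1") || return 2
    [ "$actual" = "$2" ]
}
```

If `md5_matches` fails for a file you downloaded from the URL in the Ant script and you trust that file, the digest printed by `file_md5` is the value to put into the script.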