Twitter annotation for xLiMe project (public version)

Text Annotation Service

Luis Rei @lmrei


Annotator for text documents.

A tornado-powered web service with a zmq-powered backend.



Released under the MIT license. External tools are under their own licenses and are not distributed here.

Project, Evaluation and Corpora

Twitter Annotator was built for the xLiMe project.

The sentiment classifier was built and trained for SYMPHONY.

For Project evaluation metrics see the file.

List of Corpora:

* The xLiMe Twitter Corpus
* Ritter Twitter Corpus
* SemEval 2014 Task 9
* TASS 2015


    git clone
    cd twitter_annotator
    pip install -r requirements.txt
    chmod +x
    chmod +x
    chmod +x

External Dependencies

NOTE: you need to set up the models in the same directory (or change the settings).


python -c "import nltk; nltk.download('stopwords')"


NER requires Java 8 and Stanford NER as well as models for each language. See the tree section below. Unzip models and tool to their respective directories.
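Before unpacking the tools, it can help to confirm which Java is on the PATH. The following is a minimal sketch using only the Python standard library; the function names are made up for this example and are not part of the annotator. It only reports the major version and makes no claim about whether Java versions other than 8 work with the Stanford tools.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the major version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_281").
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

Calling `installed_java_major()` returns `8` for a Java 8 installation and `None` when no `java` executable is found.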


POS requires Java 8 and Stanford POS Tagger as well as models for each language. See the tree section below.

python -c "import nltk; nltk.download('universal_tagset')"

Unzip models and tool to their respective directories.


The models for sentiment are available for download here

To extract:

    tar -jxf senti_model.tar.bz2


Twitter Annotator (main service)

Running the Service


./ --help

This prints the available options. For example:

./ --port 1984 --n_jobs 10

To terminate:

kill -s INT <pid>
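SIGINT is the same signal that CTRL-C delivers, so a Python process can catch it and shut down cleanly. How this service handles the signal internally is not shown here; the following is only a generic sketch of the pattern using the standard `signal` module, and the `stop` event is a hypothetical name chosen for the example.

```python
import os
import signal
import threading

# Hypothetical flag that the main loop checks between units of work.
stop = threading.Event()


def handle_sigint(signum, frame):
    # Only record the request; the main loop performs the actual shutdown.
    stop.set()


# Install the handler (this must be done from the main thread).
signal.signal(signal.SIGINT, handle_sigint)

# Simulate `kill -s INT <pid>` by signalling our own process.
os.kill(os.getpid(), signal.SIGINT)

# Main loop: keep working until a stop has been requested.
while not stop.is_set():
    stop.wait(0.1)
```

Keeping the handler this small leaves the cleanup (closing sockets, stopping workers) to ordinary code outside the signal handler.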

Test Client

This is a very basic client. It can serve as an example of how to write an annotator client, or be used to test whether the service is working.

Pass the port number where the annotator service is running:

./ [PORT]

Press CTRL-C to quit the test client.

Text Classifier

Running as a pipe:

chmod +x
cat test.txt | ./ --model models/model_file --preprocess > result.txt

Here test.txt is line-delimited text, with one document per line.
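Pipe mode follows the usual Unix filter pattern: read one text per line from standard input and write one result per line to standard output. A minimal sketch of that pattern is shown here. `classify_line` is a hypothetical keyword-based stand-in, not the project's classifier, and the labels are illustrative.

```python
import io


def classify_line(text):
    # Hypothetical placeholder: a real filter would call the trained model here.
    return "positive" if "good" in text.lower() else "negative"


def run_filter(instream, outstream):
    """Read line-delimited text and write one label per input line."""
    for line in instream:
        text = line.rstrip("\n")
        if not text:
            # Emit an empty line so output stays aligned with input.
            outstream.write("\n")
            continue
        outstream.write(classify_line(text) + "\n")


# Demonstration with in-memory streams; as a real pipe one would call
# run_filter(sys.stdin, sys.stdout) instead.
demo_in = io.StringIO("Good morning\n\nthis is bad\n")
demo_out = io.StringIO()
run_filter(demo_in, demo_out)
```

Writing exactly one output line per input line means line N of result.txt always corresponds to line N of test.txt.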

Running as a zmq service

Using Library

clf = sgd.load('model_file')
sgd.classify(clf, text, preprocess=True)


To train and test, files should be headerless TSV files with:

col[0] = tokenized text
col[1] = class value
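As an illustration of that layout, here is a minimal sketch that writes and then reads such a file using only the Python standard library. The file name, the example rows, and the label values are invented for the example; the project's own training code may read the files differently.

```python
import os
import tempfile

# Hypothetical examples: (tokenized text, class value).
rows = [
    ("i love this phone !", "positive"),
    ("worst service ever .", "negative"),
]

path = os.path.join(tempfile.mkdtemp(), "train.tsv")

# Write a headerless TSV: column 0 is the tokenized text, column 1 the class.
with open(path, "w", encoding="utf-8") as f:
    for text, label in rows:
        f.write(f"{text}\t{label}\n")

# Read it back into parallel lists of texts and labels.
texts, labels = [], []
with open(path, encoding="utf-8") as f:
    for line in f:
        text, label = line.rstrip("\n").split("\t")
        texts.append(text)
        labels.append(label)
```

The text column must not itself contain tab characters, otherwise the split into two columns fails.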