Text Annotation Service
Luis Rei email@example.com @lmrei
Annotator for text documents.
A tornado-powered web service with a zmq-powered backend.
For the license, see LICENSE.md (MIT). External tools are under their own licenses (obviously) and are not distributed here.
Project, Evaluation and Corpora
Twitter Annotator was built for the xLiMe project.
The sentiment classifier was built and trained for SYMPHONY.
For project evaluation metrics, see the EVALUATION.md file.
List of Corpora:
* [xLiMe Twitter Corpus](https://github.com/lrei/xlime_twitter_corpus)
* [Ritter Twitter Corpus](https://github.com/aritter/twitter_nlp)
* [SemEval 2014 Task 9](http://alt.qcri.org/semeval2014/task9/)
* [TASS 2015](http://www.sngularmeaning.team/TASS2015/tass2015.php)
    git clone firstname.lastname@example.org:lrei/twitter_annotator.git
    cd twitter_annotator
    pip install -r requirements.txt
    chmod +x annotator.py
    chmod +x annotatorservice.py
    chmod +x sgd.py

NOTE: you need to set up the models in the same directory (or change the settings).
python -c "import nltk; nltk.download('stopwords')"
NER requires Java 8 and Stanford NER as well as models for each language. See the tree section below. Unzip models and tool to their respective directories.
POS requires Java 8 and Stanford POS Tagger as well as models for each language. See the tree section below.
python -c "import nltk; nltk.download('universal_tagset')"
Unzip models and tool to their respective directories.
The models for sentiment are available for download here
tar -jxf senti_model.tar.bz2
Twitter Annotator (main service)
Running the Service
To see the available options, run:
```
./annotator.py --help
```
For example, to start the service on port 1984 with `--n_jobs` set to 10:
```
./annotator.py --port 1984 --n_jobs 10
```
To stop the service, send it an interrupt signal (SIGINT):
```
kill -s INT <pid>
```
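For scripted shutdowns, the same signal can be sent from Python. This is a minimal sketch that assumes a POSIX system; `stop_service` is a hypothetical helper name and is not part of this repository.

```python
import os
import signal


def stop_service(pid: int) -> None:
    """Send SIGINT to the given process, like `kill -s INT <pid>`.

    Hypothetical helper for illustration; `pid` is the process ID
    of the running service.
    """
    os.kill(pid, signal.SIGINT)
```

Call it as `stop_service(pid)` with the PID of the running service.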
`test_client.py` is a very basic client. It can serve as an example of how to write an annotator client, or it can be used to check that the service is working.
Pass the port number on which the annotator service is running:
```
./test_client.py [PORT]
```
Press CTRL-C to quit the test client.
Text Classifier (sgd.py)
Running as a pipe:
```
chmod +x sgd.py
cat test.txt | ./sgd.py --model models/model_file --preprocess > result.txt
```
where `test.txt` contains line-delimited text (one document per line).
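The pipe pattern above (one text per input line, one result per output line) can be sketched in a few lines of Python. This is only an illustration of the line-delimited filter idea. `classify_stub` and `run_pipe` are hypothetical names, and the stub's keyword rule is a placeholder, not the model or logic used by `sgd.py`.

```python
from typing import Callable, Iterable, TextIO


def classify_stub(text: str) -> str:
    """Hypothetical stand-in for a trained classifier (NOT sgd.py's logic)."""
    return "positive" if "good" in text.lower() else "negative"


def run_pipe(lines: Iterable[str], out: TextIO,
             classify: Callable[[str], str] = classify_stub) -> None:
    """Write exactly one label per input line, preserving input order."""
    for line in lines:
        text = line.rstrip("\n")  # drop the trailing newline only
        out.write(classify(text) + "\n")
```

Calling `run_pipe(sys.stdin, sys.stdout)` from a script turns it into a shell filter. Emitting one output line for every input line keeps `result.txt` aligned with `test.txt` line by line.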
Running as a zmq service
```
import sgd

clf = sgd.load('model_file')  # load a trained model from disk
sgd.classify(clf, text, preprocess=True)  # text: the input to classify
```
To train and test, files should be headerless TSV files with two columns:

* tokenized text
* class value
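As an illustration of this file format, the sketch below parses headerless TSV data with Python's standard `csv` module. It assumes the text is in the first column and the class in the second, following the order in which the columns are listed above. `read_tsv` is a hypothetical helper and not part of `sgd.py`.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_tsv(handle: TextIO) -> List[Tuple[str, str]]:
    """Parse headerless TSV rows into (tokenized_text, class_value) pairs.

    Assumption: column 1 is the text and column 2 is the class.
    """
    # QUOTE_NONE treats quote characters in the text as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore empty lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        pairs.append((row[0], row[1]))
    return pairs


# Example with in-memory data standing in for a training file.
sample = io.StringIO("good movie !\tpositive\nbad plot .\tnegative\n")
examples = read_tsv(sample)
```

Rejecting rows that do not have exactly two columns catches stray tabs inside the text early, before they end up as mislabeled training examples.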