Setting up Extraction libraries

Nithin Krishna edited this page Aug 10, 2016 · 1 revision

#Apache Tika

  1. Download and build Tika locally.
  2. Run Tika's HTTP server on port 9998:
java -jar tika-server/target/tika-server-1.13-SNAPSHOT.jar 9998
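
Once the server has started, a quick reachability check can confirm that it is listening. The following is a minimal sketch using only the Python standard library. It assumes the server runs on localhost and exposes a `/version` endpoint; both are assumptions, not something this page states. The helper names `tika_url` and `server_is_up` are hypothetical.

```python
import http.client
import urllib.request


def tika_url(host="localhost", port=9998, endpoint="version"):
    """Build the URL of a Tika server endpoint (host and endpoint are assumptions)."""
    return "http://{}:{}/{}".format(host, port, endpoint)


def server_is_up(url, timeout=3.0):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP errors all count as "not up".
        return False
```

Calling `server_is_up(tika_url())` after running the command above should return `True` if the server came up on port 9998.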

#Stanford Core NLP NER

  1. Download it from here.
  2. Set the STANFORD_MODELS environment variable to point to the downloaded model files.
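
To catch a missing or mistyped STANFORD_MODELS value early, a small check along these lines may help. It is a sketch under stated assumptions: the helper name `missing_model_paths` is hypothetical, and the code assumes the variable may hold one or more paths joined by the platform's path separator.

```python
import os


def missing_model_paths(env=None, variable="STANFORD_MODELS"):
    """Return the entries of `variable` that do not exist on disk.

    Raises KeyError if the variable is unset or empty.
    """
    env = os.environ if env is None else env
    value = env.get(variable, "")
    if not value:
        raise KeyError("{} is not set".format(variable))
    # Assumption: the value may list several paths joined by os.pathsep.
    entries = [entry for entry in value.split(os.pathsep) if entry]
    return [entry for entry in entries if not os.path.exists(entry)]
```

An empty list from `missing_model_paths()` means every configured path exists; it does not verify that the model files inside are the right ones.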

#Grobid Quantities

  1. Download, set up, and train Grobid Quantities as described here.
  2. Start the service on port 8080:
mvn -Dmaven.test.skip=true jetty:run-war
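
This page does not name the HTTP paths the service exposes, so the sketch below only checks at the TCP level that something is listening on port 8080. The helper name `port_is_open` and the `localhost` default are assumptions.

```python
import socket


def port_is_open(host="localhost", port=8080, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections and timeouts both mean the port is not reachable.
        return False
```

A `True` result shows only that the port accepts connections, not that the Grobid Quantities models loaded correctly.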

#Python dependencies

  1. Install the Python dependencies listed in the dependencies.txt file.
  2. Install the NLTK data.
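
The NLTK data can be fetched from Python with `nltk.download`. The sketch below wraps that call; this page does not say which data packages are required, so the `"punkt"` default is only a placeholder assumption, and the helper name `download_nltk_data` is hypothetical.

```python
def download_nltk_data(packages=("punkt",), quiet=True):
    """Download each named NLTK data package and report success per package."""
    # Imported lazily so this module loads even before nltk is installed.
    import nltk

    # nltk.download returns True when the package is available afterwards.
    return {name: nltk.download(name, quiet=quiet) for name in packages}
```

Replace the package tuple with whatever data sets the extraction code actually loads.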