Current version: 3.2.1

processors-server

What is it?

An akka-http server exposing a REST API for text annotation via the processors library

Requirements

  1. Java 8
  2. sbt
  3. npm

How is this useful?

This may be useful to people who want to do NLP in a non-JVM language that lacks a good existing parser. Currently there are services for using processors' CluProcessor, FastNLPProcessor (a wrapper for CoreNLP), and BioNLPProcessor.

Running processors-server

git clone https://github.com/myedibleenso/processors-server.git

Fire up the server. This may take a minute or so to load the large model files.

cd processors-server
sbt "runMain NLPServer"

By default, the server runs on localhost on port 8888, though you can start it with a different host and port:

sbt "runMain NLPServer --host <your favorite host here> --port <your favorite port here>"

Building a docker container

sbt docker

This will create a container named myedibleenso/processors-server:latest, which you can run with docker-compose up using the included docker-compose.yml file.

All of the official containers published for this project can be found on Docker Hub.

Logging

A server log is written to processors-server.log in the home directory of the user who launched the server.

Communicating with the server

NOTE: Once the server has started, a summary of the currently available services (including links to demos) can be found at the following url: http://<your host name here>:<your port here>

Annotating text

The following services are available:

  1. Text annotation (open-domain or biomedical) involving:
  • sentence splitting
  • tokenization
  • lemmatization
  • PoS tagging
  • NER
  • dependency parsing
  2. Sentiment analysis
  3. Rule-based IE using Odin

Text can be annotated by sending a POST request containing json with a "text" field to one of the following annotate endpoints (see example).

You may also send text already segmented into sentences by posting a SegmentedMessage (see example) to the same annotate endpoint. This is just a json frame with a "sentences" field pointing to an array of strings.
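Both message shapes can be sketched in Python as plain dictionaries (the field names come from the text above; the example sentences and the localhost:8888 address in the comment are assumptions):

```python
import json

# Plain text message: the server splits sentences and tokenizes for you.
text_message = {"text": "My name is Inigo Montoya. You killed my father."}

# SegmentedMessage: the client has already split the text into sentences.
segmented_message = {
    "sentences": [
        "My name is Inigo Montoya.",
        "You killed my father.",
    ]
}

# Either payload is POSTed as json to an annotate endpoint,
# e.g. http://localhost:8888/api/annotate (not executed here).
payload = json.dumps(segmented_message)
print(payload)
```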

CluProcessor

  • http://localhost:<your port here>/api/clu/annotate

FastNLPProcessor

  • http://localhost:<your port here>/api/annotate
  • http://localhost:<your port here>/api/fastnlp/annotate

BioNLPProcessor

The resources (model files) for this processor are loaded lazily when the first call is made.

Text can be annotated by sending a POST request containing json with a "text" field to the following endpoint (see example):

  • http://localhost:<your port here>/api/bionlp/annotate
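Because the BioNLP model files load lazily, the first request can take much longer than later ones, so a client should allow a generous timeout. A minimal Python sketch using only the standard library (the endpoint is the one listed above; the default host, port, timeout value, and function names are assumptions):

```python
import json
import urllib.request

def build_request(text, host="localhost", port=8888):
    """Construct a POST request for the BioNLP annotate endpoint."""
    url = f"http://{host}:{port}/api/bionlp/annotate"
    data = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

def annotate_bio(text, host="localhost", port=8888):
    """POST text to the server and decode the json response.

    The first call may block while the model files load, so the
    timeout here is deliberately generous.
    """
    req = build_request(text, host, port)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))
```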

Sentiment analysis with CoreNLP

You can also send text that has already been segmented into sentences.

Responses will be SentimentScores (see example).

Rule-based IE with Odin

For more info on Odin, see the manual.

Responses

A POST to an /api/annotate endpoint will return a Document of the form specified in document.json.

An example using cURL

To see it in action, you can POST json using cURL. The text to parse should be given as the value of the json's text field:

curl -H "Content-Type: application/json" -X POST -d '{"text": "My name is Inigo Montoya. You killed my father. Prepare to die."}' http://localhost:8888/api/annotate
{
  "text": "My name is Inigo Montoya. You killed my father. Prepare to die.",
  "sentences": [
    {
      "words": [
        "My",
        "name",
        "is",
        "Inigo",
        "Montoya",
        "."
      ],
      "startOffsets": [
        0,
        3,
        8,
        11,
        17,
        24
      ],
      "endOffsets": [
        2,
        7,
        10,
        16,
        24,
        25
      ],
      "lemmas": [
        "my",
        "name",
        "be",
        "Inigo",
        "Montoya",
        "."
      ],
      "tags": [
        "PRP$",
        "NN",
        "VBZ",
        "NNP",
        "NNP",
        "."
      ],
      "entities": [
        "O",
        "O",
        "O",
        "PERSON",
        "PERSON",
        "O"
      ],
      "dependencies": {
        "edges": [
          {
            "destination": 0,
            "source": 1,
            "relation": "poss"
          },
          {
            "destination": 1,
            "source": 4,
            "relation": "nsubj"
          },
          {
            "destination": 2,
            "source": 4,
            "relation": "cop"
          },
          {
            "destination": 3,
            "source": 4,
            "relation": "nn"
          },
          {
            "destination": 5,
            "source": 4,
            "relation": "punct"
          }
        ],
        "roots": [
          4
        ]
      }
    },
    {
      "words": [
        "You",
        "killed",
        "my",
        "father",
        "."
      ],
      "startOffsets": [
        26,
        30,
        37,
        40,
        46
      ],
      "endOffsets": [
        29,
        36,
        39,
        46,
        47
      ],
      "lemmas": [
        "you",
        "kill",
        "my",
        "father",
        "."
      ],
      "tags": [
        "PRP",
        "VBD",
        "PRP$",
        "NN",
        "."
      ],
      "entities": [
        "O",
        "O",
        "O",
        "O",
        "O"
      ],
      "dependencies": {
        "edges": [
          {
            "destination": 2,
            "source": 3,
            "relation": "poss"
          },
          {
            "destination": 3,
            "source": 1,
            "relation": "dobj"
          },
          {
            "destination": 4,
            "source": 1,
            "relation": "punct"
          },
          {
            "destination": 0,
            "source": 1,
            "relation": "nsubj"
          }
        ],
        "roots": [
          1
        ]
      }
    },
    {
      "words": [
        "Prepare",
        "to",
        "die",
        "."
      ],
      "startOffsets": [
        48,
        56,
        59,
        62
      ],
      "endOffsets": [
        55,
        58,
        62,
        63
      ],
      "lemmas": [
        "prepare",
        "to",
        "die",
        "."
      ],
      "tags": [
        "VB",
        "TO",
        "VB",
        "."
      ],
      "entities": [
        "O",
        "O",
        "O",
        "O"
      ],
      "dependencies": {
        "edges": [
          {
            "destination": 2,
            "source": 0,
            "relation": "xcomp"
          },
          {
            "destination": 3,
            "source": 0,
            "relation": "punct"
          },
          {
            "destination": 1,
            "source": 2,
            "relation": "aux"
          }
        ],
        "roots": [
          0
        ]
      }
    }
  ]
}
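A Document in the shape shown above can be walked with ordinary JSON tooling. A hypothetical Python sketch (the helper names are illustrative, not part of the API) that pairs each token with its PoS tag and turns dependency edges into (head, relation, dependent) triples:

```python
def token_tags(document):
    """Yield (word, tag) pairs across all sentences of a Document."""
    for sentence in document["sentences"]:
        yield from zip(sentence["words"], sentence["tags"])

def dependency_triples(document):
    """Yield (head_word, relation, dependent_word) triples per sentence."""
    for sentence in document["sentences"]:
        words = sentence["words"]
        for edge in sentence["dependencies"]["edges"]:
            yield (words[edge["source"]], edge["relation"], words[edge["destination"]])

# A minimal Document fragment in the shape returned above:
doc = {
    "sentences": [
        {
            "words": ["You", "killed", "my", "father", "."],
            "tags": ["PRP", "VBD", "PRP$", "NN", "."],
            "dependencies": {
                "edges": [
                    {"destination": 0, "source": 1, "relation": "nsubj"},
                    {"destination": 3, "source": 1, "relation": "dobj"},
                ],
                "roots": [1],
            },
        }
    ]
}

print(list(dependency_triples(doc)))
# -> [('killed', 'nsubj', 'You'), ('killed', 'dobj', 'father')]
```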

json schema for responses

Response schemas can be found at src/main/resources/json/schema.

Examples of each can be found at src/main/resources/json/examples.

Other Stuff

Shutting down the server

You can shut down the server by posting anything to /shutdown.

Checking the server's build

Send a GET request to /buildinfo.
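These two administrative calls can be sketched as request builders in Python (a minimal illustration using only the standard library; the helper names and the default host and port are assumptions, and the requests are not sent here):

```python
import urllib.request

def buildinfo_request(host="localhost", port=8888):
    """GET request for the server's build information."""
    return urllib.request.Request(f"http://{host}:{port}/buildinfo")

def shutdown_request(host="localhost", port=8888):
    """POST request that asks the server to shut down (any body works)."""
    return urllib.request.Request(
        f"http://{host}:{port}/shutdown", data=b"bye"
    )

# To actually send either request (not executed here):
# with urllib.request.urlopen(buildinfo_request()) as resp:
#     print(resp.read().decode("utf-8"))
```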

py-processors

If you're a Python user, you may be interested in using py-processors in your NLP project.

Where can I get the latest and greatest fat jar?

Cloning the project and running sbt jarify builds the latest jar. Published jars are available at this URL: http://py-processors.parsertongue.com/v?.?.?/processors-server.jar (substitute your desired version for ?.?.?).