Web-based synthesis of nifty NLP and entity extraction services

Parserbot

Parserbot is your one-stop shop for natural language parsing, tagging, and entity extraction. It wraps a variety of services and APIs into one app for easy parsing and cross-referencing. Currently it wraps Stanford NER, OpenCalais, and Zemanta, with Freebase support planned.

Built on Flask for the Artbot project.

Setup

Tested with Python 2.7.x. Setup inside a virtualenv is recommended (or, even better, virtualenvwrapper). After cloning the repo and activating the virtualenv:

  • pip install .
  • Run python key.py. It prints a secret key and an auth header token; save these in environment variables (e.g. in a shell profile or a .env file). This is a convenience function you can run as many times as you like.
  • python run.py to start the server.
  • Navigate to http://localhost:3000 and you should see a welcome message.

NOTE: All services require a PARSERBOT_SECRET_KEY environment variable.
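The exact implementation of key.py isn't shown here, but generating a secret key and token usually comes down to hex-encoding cryptographically random bytes. A minimal sketch (the generate_secret helper is hypothetical, not the repo's actual code):

```python
import binascii
import os

def generate_secret(nbytes=32):
    # Hypothetical helper: hex-encode cryptographically random bytes,
    # similar in spirit to what key.py prints out.
    return binascii.hexlify(os.urandom(nbytes)).decode('ascii')

secret_key = generate_secret()   # e.g. export as PARSERBOT_SECRET_KEY
auth_token = generate_secret()   # e.g. use as the Authorization header value
```

Because each call reads fresh random bytes, re-running it simply yields new credentials, which matches the note above that you can run it as many times as you like.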

Setting up specific NLP services:

Stanford NER -- /stanford
  • You must have a Java runtime installed.
  • pip install nltk==3.0.1
OpenCalais -- /opencalais
Zemanta -- /zemanta
Freebase
  • Not currently configured. If you set it up, let us know!

Use

Python example:

import json
import requests

headers = {'Authorization': '<YOUR_TOKEN_HERE>', 'Content-Type': 'application/json'}
data = json.dumps({'payload': 'This is a test for a man named Pablo Picasso'})
r = requests.post('http://localhost:3000/stanford', data=data, headers=headers)

If you want to experiment in an interactive shell, run python shell.py
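If you call several endpoints, the headers and body construction from the example above can be factored into a small helper. This is a sketch, not part of Parserbot itself; build_entity_request is a hypothetical name:

```python
import json

def build_entity_request(text, token):
    # Hypothetical helper that assembles the headers and JSON body
    # expected by Parserbot's POST endpoints.
    headers = {
        'Authorization': token,
        'Content-Type': 'application/json',
    }
    body = json.dumps({'payload': text})
    return headers, body

headers, body = build_entity_request(
    'This is a test for a man named Pablo Picasso', '<YOUR_TOKEN_HERE>')
# requests.post('http://localhost:3000/stanford', data=body, headers=headers)
# requests.post('http://localhost:3000/opencalais', data=body, headers=headers)
```

The same headers/body pair works against any of the service endpoints, so only the URL changes between calls.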

Tests

For now, tests are built for a local setup only:

  • pip install pytest pytest-flask
  • python setup.py test

Documentation

You can find the docs in the docs subfolder. To generate new docs:

  • pip install sphinx
  • sphinx-build docs/source docs

Deployment

Currently set up to deploy on Heroku; configure the environment variables you need and it should be good to go. Heroku may also complain about a missing JAVAHOME variable when the /stanford endpoint is used. A sample config:

DEBUG="False"
PARSERBOT_SECRET_KEY="<KEY_HERE>"
CALAIS_API_KEY="<KEY_HERE>"
ZEMANTA_API_KEY="<KEY_HERE>"
JAVAHOME="/usr/bin/java"
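The repo's config.py isn't reproduced here, but a config loader that reads the variables from the sample above might look like the following sketch (the load_config helper is hypothetical; note that DEBUG arrives as the string "False" and needs coercing to a real boolean):

```python
import os

def load_config(env=None):
    # Sketch: pull the deployment variables shown above out of the
    # environment, coercing DEBUG from a string to a boolean.
    env = os.environ if env is None else env
    return {
        'DEBUG': env.get('DEBUG', 'False') == 'True',
        'PARSERBOT_SECRET_KEY': env.get('PARSERBOT_SECRET_KEY'),
        'CALAIS_API_KEY': env.get('CALAIS_API_KEY'),
        'ZEMANTA_API_KEY': env.get('ZEMANTA_API_KEY'),
        'JAVAHOME': env.get('JAVAHOME', '/usr/bin/java'),
    }
```

Coercing DEBUG explicitly matters on Heroku, where every config var is a string and a bare bool("False") would be truthy.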

Future

There are more parsers we'd like to add someday; suggestions welcome.

License

Copyright (C) 2015 Massachusetts Institute of Technology

This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation (http://opensource.org/licenses/GPL-2.0).