Semanticizest: dump parser and client
Go Python Makefile
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
cmd
hash
internal
linking
nlp
python
wikidump
.gitignore
.travis.yml
LICENSE
README.rst
setup.py

README.rst

Travis

Semanticizer, standalone

Semanticizest is a package for doing entity linking, also known as semantic linking or semanticizing: you feed it text, and it outputs links to pertinent Wikipedia concepts. You can use these links as a "semantic representation" of the text for NLP or machine learning, or just to provide some links to background info on the Wikipedia.

Installing

Make sure you have a Go compiler (1.4 or newer) and Git. On new versions of Debian/Ubuntu/Mint, that's:

sudo apt-get install git golang-go

On CentOS:

sudo yum -y install git golang

In case your distro has older versions of go, you might consider installing a newer version via gvm.

Set up a Go workspace, if you haven't already. For example:

mkdir /some/where/go
cd /some/where/go
export GOPATH=$(pwd)

Fetch and compile:

go get -u github.com/semanticize/st/...

You now have a working parser at ${GOPATH}/bin/semanticizest-dumpparser. Issue:

${GOPATH}/bin/semanticizest-dumpparser --help

to figure out how to generate a semanticizer model, then use this model from the REST API:

${GOPATH}/bin/semanticizest --http=:5002 your_model
curl http://localhost:5002/all -d 'Does the entity linking work?'

You can also use semanticizest as a command-line tool by omitting --http. In that case, it will read paragraphs (double newline-separated) from standard input and emit a JSON representation of the candidate entities in each paragraph.

Python binding

Once you have built a semanticizer model, you can use semanticizest from Python using the Python wrapper. Install the Python wrapper with:

python setup.py install

You can use the python wrapper as follows:

The wrapper can operate in two modes: either you start the semanticizest REST server, or you have the wrapper start it for you.

If you already have a server running on port 5002, as described above:

from semanticizest import Client

sentence = 'Antwerpen'
url = 'http://localhost:5002/'
client = Client(url)
candidates = client.all_candidates(sentence)

To start the server automatically, you need to construct a Semanticizer from the path to the model file:

from semanticizest import Semanticizer

sentence = 'Antwerpen'
semanticizer = Semanticizer('/path/to/model.db')
candidates = client.all_candidates(sentence)