MediaWiki dump parser in Go

Wikipedia dump parser for semanticizest

This program parses Wikipedia database dumps for consumption by semanticizest.

Installing

Make sure you have a Go compiler (1.2 or newer) and Git. On Debian/Ubuntu/Mint, that's:

sudo apt-get install git golang-go

On CentOS:

sudo yum -y install git golang

Set up a Go workspace, if you haven't already. For example:

mkdir /some/where/go
cd /some/where/go
export GOPATH=$(pwd)

Fetch and compile:

go get github.com/semanticize/st
go install github.com/semanticize/st/dumpparser
go install github.com/semanticize/st/semanticizest

You now have a working parser at ${GOPATH}/bin/dumpparser and the REST server at ${GOPATH}/bin/semanticizest. Run:

${GOPATH}/bin/dumpparser --help

to see how to generate a semanticizer model, then serve that model over the REST API (your_model is a placeholder for the model you generated) and send it some text:

${GOPATH}/bin/semanticizest --http=:5002 your_model
curl http://localhost:5002/all -d 'Does the entity linking work?'
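
The same request can also be made from Go. The following is a minimal sketch, not part of this repository: it assumes the server from the previous command is listening on localhost:5002, and the helper names allURL and linkEntities are made up for illustration. It mirrors what curl -d does (a POST with a form-encoded content type) and prints the response body as-is, without assuming anything about its format. It uses io.ReadAll, so it needs Go 1.16 or newer; on older compilers, ioutil.ReadAll does the same job.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// allURL builds the URL of the /all endpoint from a base address,
// tolerating a trailing slash on the base. (Hypothetical helper.)
func allURL(base string) string {
	return strings.TrimRight(base, "/") + "/all"
}

// linkEntities POSTs text to the /all endpoint and returns the raw
// response body. The content type matches what `curl -d` sends by
// default. (Hypothetical helper.)
func linkEntities(base, text string) (string, error) {
	resp, err := http.Post(allURL(base),
		"application/x-www-form-urlencoded", strings.NewReader(text))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("server returned %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	// Assumes semanticizest was started with --http=:5002 as shown above.
	out, err := linkEntities("http://localhost:5002", "Does the entity linking work?")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Save this as, say, client.go and run it with go run client.go while the server is up.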