
Semanticizer, standalone

Semanticizest is a package for entity linking, also known as semantic linking or semanticizing: you feed it text, and it outputs links to pertinent Wikipedia concepts. You can use these links as a "semantic representation" of the text for NLP or machine learning, or simply to provide links to background information on Wikipedia.

Installation

  • pip install -r requirements.txt
  • pip install .

Usage

To train a semanticizer, download a Wikipedia database dump from https://dumps.wikimedia.org/. Then issue:

python -m semanticizest.parse_wikidump <dump> <model-filename>

The result will be a semanticizer model (in SQLite 3 format, if you must know).
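Because the model is an ordinary SQLite 3 file, it can be inspected with Python's standard sqlite3 module. The following is a minimal sketch, not part of semanticizest: the helper name list_tables is hypothetical, and nothing is assumed about the model's schema beyond it being a valid SQLite 3 database.

```python
import sqlite3
import sys
from pathlib import Path


def list_tables(model_path):
    """Return the sorted names of the tables in an SQLite 3 file.

    Hypothetical helper for illustration; not a semanticizest API.
    """
    # Open read-only so that inspecting a model can never modify it,
    # and so that a mistyped path raises instead of creating an empty file.
    uri = Path(model_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the model filename as the first command-line argument.
    print(list_tables(sys.argv[1]))
```

Opening the file in read-only mode is a deliberate choice here: a plain sqlite3.connect call would silently create an empty database if the path were wrong.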

Alternatively, you can use the --download flag to instruct semanticizest to download the latest Wikipedia dump. For example, to download and process the Scots Wikipedia (which is small and useful for testing), issue:

python -m semanticizest.parse_wikidump --download scowiki sco.model

This downloads https://dumps.wikimedia.org/scowiki/latest/scowiki-latest-pages-articles.xml.bz2 to scowiki.xml.bz2 and constructs the model from it.
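The URL and local filename in this example follow a visible pattern based on the wiki name. As a rough sketch, assuming the same pattern holds for other wikis (the README only shows it for scowiki), the mapping could be written as follows. The function names latest_dump_url and local_dump_name are hypothetical and are not taken from the semanticizest code.

```python
def latest_dump_url(wiki):
    """Build the 'latest' pages-articles dump URL for a wiki name.

    Assumption: the pattern shown for scowiki applies to other wikis too.
    """
    return (
        "https://dumps.wikimedia.org/{0}/latest/"
        "{0}-latest-pages-articles.xml.bz2".format(wiki)
    )


def local_dump_name(wiki):
    """Local filename the dump is saved under, per the scowiki example."""
    return "{0}.xml.bz2".format(wiki)


print(latest_dump_url("scowiki"))
# → https://dumps.wikimedia.org/scowiki/latest/scowiki-latest-pages-articles.xml.bz2
print(local_dump_name("scowiki"))
# → scowiki.xml.bz2
```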

Documentation

Full documentation can be found at https://semanticize.github.io/semanticizest/

Copyright and license

Copyright 2014 University of Amsterdam/Netherlands eScience Center. Semanticizest is licensed under the Apache License, Version 2.0. See the file LICENSE.txt for details.