Semanticizest is a package for entity linking, also known as semantic linking or semanticizing: you feed it text, and it outputs links to pertinent Wikipedia concepts. You can use these links as a "semantic representation" of the text for NLP or machine learning, or simply to give readers links to background information on Wikipedia.
To install semanticizest and its dependencies, run the following from the source directory:
pip install -r requirements.txt
pip install .
To train a semanticizer, download a Wikipedia database dump from
https://dumps.wikimedia.org/. Then issue:
python -m semanticizest.parse_wikidump <dump> <model-filename>
The result will be a semanticizer model (in SQLite 3 format, if you must know).
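Because the model is an ordinary SQLite 3 file, it can be inspected with Python's standard sqlite3 module. The sketch below only lists the tables in a given file; it assumes nothing about the schema semanticizest actually writes, and the helper name list_tables is made up for this illustration and is not part of semanticizest.

```python
import sqlite3
from pathlib import Path


def list_tables(model_path):
    """Return the sorted names of all tables in an SQLite 3 file.

    Hypothetical helper for illustration; not part of semanticizest.
    """
    # Build a file: URI and open it read-only so the model is never modified.
    uri = Path(model_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables on the model file produced by the command above returns whatever table names the model contains. Opening the file read-only means a mistyped path raises an error instead of silently creating an empty database.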
Alternatively, you can use the --download flag to instruct semanticizest to download the latest Wikipedia dump itself. For example, to download and process the Scots Wikipedia (which is small and useful for testing):
python -m semanticizest.parse_wikidump --download scowiki sco.model
This downloads scowiki.xml.bz2 and constructs the model from it.
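If you want to run this step from a script rather than a shell, one option is to call the same command through Python's subprocess module. This is a minimal sketch: build_command and train are hypothetical helper names, not part of semanticizest, and the only flag and arguments used are the ones shown in the command above.

```python
import subprocess
import sys


def build_command(wiki, model_path):
    """Build the argument list for the --download invocation.

    Hypothetical helper for illustration; not part of semanticizest.
    """
    return [
        sys.executable, "-m", "semanticizest.parse_wikidump",
        "--download", wiki, model_path,
    ]


def train(wiki, model_path):
    """Run the command in a child process.

    check=True makes a non-zero exit status raise CalledProcessError.
    """
    subprocess.run(build_command(wiki, model_path), check=True)
```

For the example above, train("scowiki", "sco.model") runs the same command. Using sys.executable keeps the child process on the same interpreter (and therefore the same installed packages) as the calling script.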
Full documentation can be found at https://semanticize.github.io/semanticizest/
Copyright and license
Copyright 2014 University of Amsterdam/Netherlands eScience Center. Semanticizest is licensed under the Apache License, Version 2.0. See the file LICENSE for details.