Package wikiassignment is a Go package that provides utility functions for automatically assigning Wikipedia pages to topics.
API documentation can be found in the associated godoc reference.
Topic data can be found in overpedia.
This package can be installed with the go get command:
go get github.com/negapedia/wikiassignment/...
You will need a machine with an internet connection, 16 GB of RAM (for the English version) and a properly configured Docker storage base directory.
This package depends on PETSc. The associated Dockerfile provides a complete environment in which to use this package. Otherwise, PETSc can be installed by following the same steps as in the Dockerfile or those on the PETSc installation page.
The export command of the Docker image accepts the following options:
-lang: Wikipedia nationalization (language edition) to parse, or custom JSON; default "it".
-date: Wikipedia dump date in the format YYYYMMDD; default "latest".
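The two options can be checked before launching a container. The following is a minimal sketch of a hypothetical client-side helper, not part of wikiassignment: the names validateDumpDate and exportArgs are illustrative, and it assumes that -lang takes a language code such as "en" and that -date is either "latest" or a YYYYMMDD string. Conveniently, Go's time package expresses exactly that format with the layout "20060102", the same date used in the usage example.

```go
package main

import (
	"fmt"
	"time"
)

// dumpDateLayout is Go's reference-time layout for YYYYMMDD.
const dumpDateLayout = "20060102"

// validateDumpDate (hypothetical helper) accepts "latest" or a valid YYYYMMDD date.
func validateDumpDate(date string) error {
	if date == "latest" {
		return nil
	}
	if _, err := time.Parse(dumpDateLayout, date); err != nil {
		return fmt.Errorf("invalid -date %q: want YYYYMMDD or latest", date)
	}
	return nil
}

// exportArgs (hypothetical helper) builds the docker arguments for a basic
// export run, filling in the documented defaults ("it", "latest") for empty values.
func exportArgs(lang, date string) ([]string, error) {
	if lang == "" {
		lang = "it"
	}
	if date == "" {
		date = "latest"
	}
	if err := validateDumpDate(date); err != nil {
		return nil, err
	}
	return []string{"run", "negapedia/wikiassignment", "export", "-lang", lang, "-date", date}, nil
}

func main() {
	args, err := exportArgs("en", "20060102")
	fmt.Println(args, err)

	// A dash-separated date does not match the YYYYMMDD layout.
	_, err = exportArgs("en", "2006-01-02")
	fmt.Println(err != nil)
}
```

The resulting slice could be passed to os/exec as exec.Command("docker", args...); the container itself may perform its own, possibly different, validation.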
docker run negapedia/wikiassignment export -lang en -date 20060102
Basic usage: run the image on the English dump dated 2 January 2006 and store the result in the in-container /data folder, which will contain:
1. semanticgraph.json, which maps each source page ID to the array of target page IDs;
2. partition.json, which maps each node type (article, category or topic) to the array of IDs of the pages belonging to it;
3. absorptionprobabilities.csv, which represents each page as a row containing its ID and its weight assignment for each topic;
4. pages.csv, which represents pages in the form required by wiki2overpediadb.

docker run -d -v /path/2/out/dir:/data negapedia/wikiassignment export -lang en
This command:
1. runs the image as before (here with the default dump date, latest);
2. mounts the container's /data folder as a volume on the host folder /path/2/out/dir, the output folder, so that at the end of the run /path/2/out/dir will contain the result; this folder can be replaced by any folder of your choice;
3. runs the container in detached mode.
For further details, please refer to the docker run reference.
docker pull negapedia/wikiassignment
Update the image to the latest revision.

docker kill --signal=SIGQUIT $(docker ps -ql)
Quit the most recently created container and log a trace dump.

docker logs -f $(docker ps -ql)
Fetch and follow the logs of the most recently created container.

docker system prune -fa --volumes
Remove all stopped containers, unused networks, unused images, build cache and unused volumes without asking for confirmation.