Creates a list of page titles and their corresponding Wikidata items.
Download the JSON dump from the Wikidata database downloads page and extract it somewhere on your file system. The download is approx. 4 GB of compressed JSON data (as of December 2015), which extracts to approx. 60 GB.
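The extracted dump is a single JSON array written with the opening "[" on its own line, one entity object per line (each followed by a comma except the last), and a closing "]". That layout is what lets a tool process all 60 GB line by line instead of loading it into memory. Below is a minimal Java sketch of such a streaming reader, assuming that layout. The class and method names are hypothetical and not taken from this project, and parsing each entity's JSON (for example with a library such as Jackson) is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: stream a Wikidata JSON dump one entity at a time.
 * Assumes the layout "[", one entity object per line (comma-terminated
 * except the last), "]".
 */
public final class DumpReader {

    /** Returns the entity JSON on this line, or empty for the brackets and blank lines. */
    static Optional<String> entityJson(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.equals("[") || trimmed.equals("]")) {
            return Optional.empty();
        }
        // Drop the array separator that follows every entity except the last one.
        if (trimmed.endsWith(",")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        return Optional.of(trimmed);
    }

    /** Reads the file sequentially, passes each entity's JSON text to the handler, returns the count. */
    static long forEachEntity(Path dump, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(dump, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Optional<String> json = entityJson(line);
                if (json.isPresent()) {
                    handler.accept(json.get()); // a real tool would parse the JSON here
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Example: count the entities in the dump passed as the first argument.
        long n = forEachEntity(Path.of(args[0]), json -> { });
        System.out.println(n + " entities");
    }
}
```

Only one line is held in memory at a time, so memory use stays flat regardless of dump size.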
Build the project (skipping the tests):
mvn clean install -DskipTests
Run the tests:
mvn test
Show the command-line help:
mvn exec:java -Dexec.args="--help"
--help prints the available options, which you pass in -Dexec.args in place of "--help":
Options:
-a, --aliases
Default: false
--help
Default: false
-i, --input
path to the file with the wikidata json dump
-l, --language
language to extract
Default: en
-o, --output
path to output file
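The -l/--language option selects which language's page titles are listed. A plausible reading, assumed here rather than confirmed from the source, is that the titles come from each entity's "sitelinks" object, whose keys are site IDs such as "enwiki" or "dewiki". The hypothetical helper below illustrates that mapping on an already-parsed entity; the tab-separated output line is an illustrative format, not necessarily what the tool actually writes.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch of what "-l/--language" selects, assuming titles are
 * read from the entity's "sitelinks" object (keys are site IDs like "enwiki").
 */
public final class TitleLookup {

    /** Maps a language code to its Wikipedia site ID, e.g. "en" -> "enwiki". */
    static String siteId(String language) {
        // Site IDs use underscores where language codes use hyphens ("zh-yue" -> "zh_yuewiki").
        return language.replace('-', '_') + "wiki";
    }

    /** Returns "title<TAB>id" if the entity links to that language's Wikipedia (assumed format). */
    static Optional<String> outputLine(String entityId, Map<String, String> sitelinkTitles, String language) {
        String title = sitelinkTitles.get(siteId(language));
        return title == null ? Optional.empty() : Optional.of(title + "\t" + entityId);
    }

    public static void main(String[] args) {
        // Q42 is the Wikidata item for Douglas Adams.
        Map<String, String> q42 = Map.of("enwiki", "Douglas Adams", "dewiki", "Douglas Adams");
        System.out.println(outputLine("Q42", q42, "en").orElse("(no sitelink)"));
        System.out.println(outputLine("Q42", q42, "xx").orElse("(no sitelink)"));
    }
}
```

Entities without a sitelink for the chosen language are simply skipped in this sketch.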
The process appears to be disk-bound: on a standard HDD (not an SSD) it took about 6 minutes to generate the list, which is probably roughly the time it takes to read the file once.
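If the tool reads the extracted ~60 GB file, 6 minutes corresponds to about 60,000 MB / 360 s ≈ 167 MB/s, which is in the range of sequential hard-disk throughput and fits the disk-bound explanation. To check this on your own machine, you can time a plain sequential read of the same file and compare it with the extraction run. Below is a minimal Java sketch (the class name is hypothetical). Run it on a cold cache, because a file already held in the OS page cache reads much faster than from disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: time one plain sequential read of a file as a baseline. */
public final class ReadBaseline {

    /** Reads the whole file through a fixed buffer and returns the number of bytes read. */
    static long readAll(Path file) throws IOException {
        byte[] buffer = new byte[1 << 20]; // 1 MiB chunks
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]);
        long start = System.nanoTime();
        long bytes = readAll(file);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d bytes in %.1f s (%.0f MB/s)%n", bytes, seconds, bytes / 1e6 / seconds);
    }
}
```

With Java 11 or later it can be run directly from source, e.g. `java ReadBaseline.java /path/to/dump.json`. If the reported time is close to the extraction time, reading the file is the bottleneck.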