gipplab/WikidataListGenerator

Wikidata List Generator

Creates a list of Page titles and their corresponding Wikidata items.

Data download

Download the JSON dump from the Wikidata database downloads and extract it somewhere on your file system. The download is approximately 4 GB of compressed JSON data (as of Dec 2015), which extracts to approximately 60 GB.
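To make the shape of the extracted dump concrete, here is a minimal Python sketch of one streaming pass over it. It is not this project's Java implementation. It assumes the standard Wikidata dump layout (one JSON array, one entity per line, each line ending in a comma) and assumes that titles come from each entity's labels, with aliases added optionally; the tool itself may use a different field, such as sitelinks. The names `iter_entities`, `label_pairs`, and `SAMPLE` are hypothetical and exist only for this illustration.

```python
import io
import json
from typing import Iterator, TextIO, Tuple


def iter_entities(dump: TextIO) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata-style JSON dump."""
    for line in dump:
        line = line.strip().rstrip(",")  # entity lines end with a comma
        if line in ("", "[", "]"):
            continue  # skip the brackets that wrap the whole array
        yield json.loads(line)


def label_pairs(
    dump: TextIO, language: str = "en", aliases: bool = False
) -> Iterator[Tuple[str, str]]:
    """Yield (title, item id) pairs for one language.

    Assumption: titles are the entity labels, and `aliases=True` adds
    the entity's aliases in the same language.
    """
    for entity in iter_entities(dump):
        label = entity.get("labels", {}).get(language)
        if label is not None:
            yield label["value"], entity["id"]
        if aliases:
            for alias in entity.get("aliases", {}).get(language, []):
                yield alias["value"], entity["id"]


# Tiny hand-written stand-in for the real dump file.
SAMPLE = """[
{"id":"Q42","labels":{"en":{"language":"en","value":"Douglas Adams"}},"aliases":{"en":[{"language":"en","value":"Douglas Noel Adams"}]}},
{"id":"Q64","labels":{"de":{"language":"de","value":"Berlin"}},"aliases":{}}
]"""

pairs = list(label_pairs(io.StringIO(SAMPLE), language="en", aliases=True))
print(pairs)
```

Because each entity sits on its own line, the file can be processed with constant memory, without loading the full extracted dump at once.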

Run

mvn clean install -DskipTests
mvn test
mvn exec:java -Dexec.args="--help"

Running with "--help" prints the available options, which can be passed in place of "--help":

  Options:
    -a, --aliases

       Default: false
    --help

       Default: false
    -i, --input
       path to the file with the wikidata json dump
    -l, --language
       language to extract
       Default: en
    -o, --output
       path to output file

Runtime

The process appears to be disk-bound. On a standard HDD (not an SSD), generating the list took about 6 minutes, which is probably close to the time needed to read the file once.
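As a rough plausibility check of the disk-bound guess, the read rate implied by these figures can be worked out. This sketch assumes the tool reads the extracted file of about 60 GB (not the compressed download) and uses decimal units; the function name is hypothetical.

```python
def implied_throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average read rate in MB/s needed to scan size_gb in the given minutes."""
    return size_gb * 1000 / (minutes * 60)


# Figures from this README: ~60 GB extracted, ~6 minutes per run.
rate = implied_throughput_mb_per_s(60, 6)
print(f"{rate:.0f} MB/s")  # prints "167 MB/s"
```

A rate of that order is in the range of sequential read speeds for spinning disks, which fits the view that one pass over the file dominates the runtime.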

See also

Wikibase data model