+ get_wikidata_humans_dump script
maxlath committed Jul 20, 2018
1 parent d7cb02c commit 2b5310c
Showing 3 changed files with 73 additions and 1 deletion.
61 changes: 61 additions & 0 deletions package-lock.json

3 changes: 2 additions & 1 deletion package.json
@@ -112,7 +112,8 @@
"shell-quote": "^1.4.3",
"should": "^11.1.1",
"sinon": "^1.17.1",
- "supervisor": "^0.10.0"
+ "supervisor": "^0.10.0",
+ "wikidata-filter": "^2.3.1"
},
"engines": {
"node": ">= 6.4"
10 changes: 10 additions & 0 deletions scripts/dumps/get_wikidata_humans_dump
@@ -0,0 +1,10 @@
#!/usr/bin/env zsh
# Generate a pre-filtered dump of humans in Wikidata to ease development setup
# cf https://github.com/inventaire/inventaire-deploy/blob/d280055/install_entities_search_engine#L24-L28

curl -s https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz |
gzip -d |
wikidata-filter --claim P31:Q5 --omit type,sitelinks |
gzip -c9 > humans.ndjson.gz

chmod 664 humans.ndjson.gz
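
The pipeline above writes newline-delimited JSON (one entity per line) into a gzip archive. A minimal sketch of how such a file might be sanity-checked follows. The sample lines and their ids are hand-made placeholders, not the real Wikidata entity format, and the temporary path is hypothetical; against a real dump, only the `gzip -t`, `wc -l`, and `head` steps would apply.

```shell
#!/usr/bin/env sh
# Sanity-check sketch for a gzipped NDJSON dump such as humans.ndjson.gz.
# The sample lines are simplified placeholders, NOT real Wikidata entities.
set -eu

workdir=$(mktemp -d)
dump="$workdir/humans.ndjson.gz"

# Build a tiny stand-in dump: one JSON object per line, compressed like the real one
printf '%s\n' \
  '{"id":"Q-sample-1"}' \
  '{"id":"Q-sample-2"}' |
  gzip -c9 > "$dump"

# Check that the archive is intact before reading it
gzip -t "$dump"

# NDJSON holds one entity per line, so the line count is the entity count
count=$(gzip -dc "$dump" | wc -l | tr -d ' ')

# Peek at the first entity without writing the decompressed file to disk
first=$(gzip -dc "$dump" | head -n 1)

echo "entities: $count"
echo "first: $first"
```

Streaming through `gzip -dc` keeps the check cheap even for a large file, since nothing is decompressed to disk.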