STATUS: WIP
Attributing scores to Wikidata items, making those available via a web API and dumps, under a CC0 license.
Motivation: when re-using Wikidata data, it can be useful to be able to sort a bunch of items by some kind of score [1], [2]. So instead of spamming query.wikidata.org with one SPARQL request per item, we pre-calculate a score for all items from a Wikidata Dump, and serve them in bulk.
There is existing work on a Wikidata Page Rank, but it offers no API to cherry-pick items of interest, and its data isn't under CC0. Other motivations may include traces of just having fun with scoring algorithms.
GET /scores?ids=Q8027|Q1001|Q216092|Q79969
GET /scores?ids=Q8027|Q1001|Q216092|Q79969&subscores=true
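As a rough illustration of how a client might assemble those requests, here is a minimal sketch. `buildScoresUrl` is a hypothetical helper, not part of this project, and the base URL is an assumption (a local server on port 7264, as in the install steps). Ids are joined with a raw `|`, as in the examples above.

```javascript
// Hypothetical helper (not part of wikidata-rank): builds a /scores request URL.
// `base` is an assumption, e.g. a local server started with `npm run watch`.
function buildScoresUrl (base, ids, options = {}) {
  // Ids are pipe-separated, as in the request examples
  const url = `${base}/scores?ids=${ids.join('|')}`
  // Optionally request the detail of the subscores
  return options.subscores ? `${url}&subscores=true` : url
}

console.log(buildScoresUrl('http://localhost:7264', [ 'Q8027', 'Q1001' ], { subscores: true }))
```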
coming soon
git clone https://github.com/maxlath/wikidata-rank
cd wikidata-rank
npm install
# Starts the server on port 7264 and watches for file changes to restart it
npm run watch
At this point, your server is set up, but it has nothing to serve: we need to populate the database with item scores
item base score = number of labels
+ number of descriptions * 0.5
+ number of aliases * 0.25
+ number of statements * 2
+ number of qualifiers
+ number of references
+ number of sitelinks * 4
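The formula above could be sketched as follows. This is not the code of `./scripts/calculate_base_scores`: it assumes the standard Wikidata JSON entity layout (`labels`, `descriptions`, `aliases`, `claims`, `sitelinks`), and the way qualifiers and references are counted (every qualifier value, every reference record) is an assumption. `baseScore` and `countValues` are hypothetical names.

```javascript
// Counts all the values of an object whose properties are arrays
const countValues = obj => Object.values(obj || {}).reduce((sum, arr) => sum + arr.length, 0)

// Hypothetical implementation of the base score formula, with the weights listed above
function baseScore (entity) {
  const labels = Object.keys(entity.labels || {}).length
  const descriptions = Object.keys(entity.descriptions || {}).length
  const aliases = countValues(entity.aliases)
  // claims are indexed by property id: flatten them into a single list of statements
  const statements = [].concat(...Object.values(entity.claims || {}))
  // Assumption: every qualifier value and every reference record counts for 1
  const qualifiers = statements.reduce((sum, s) => sum + countValues(s.qualifiers), 0)
  const references = statements.reduce((sum, s) => sum + (s.references || []).length, 0)
  const sitelinks = Object.keys(entity.sitelinks || {}).length
  return labels +
    descriptions * 0.5 +
    aliases * 0.25 +
    statements.length * 2 +
    qualifiers +
    references +
    sitelinks * 4
}

// A made-up, heavily simplified entity, for illustration only
const exampleEntity = {
  labels: { en: {}, fr: {} },
  descriptions: { en: {} },
  aliases: { en: [ {}, {} ] },
  claims: {
    P31: [ { qualifiers: { P580: [ {} ] }, references: [ {}, {} ] } ],
    P21: [ {} ]
  },
  sitelinks: { enwiki: {}, frwiki: {} }
}

// 2 + 1 * 0.5 + 2 * 0.25 + 2 * 2 + 1 + 2 + 2 * 4
console.log(baseScore(exampleEntity)) // 18
```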
wget -c https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz
cat latest-all.json.gz | gzip -d | ./scripts/calculate_base_scores
item network score = sum of the base scores of items linking to the item
./scripts/calculate_network_scores
item secondary network score = sum of the network scores of items linking to the item
./scripts/calculate_secondary_network_scores
item total score = base score + network score * 0.25 + secondary network score * 0.1
./scripts/calculate_total_scores
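To make the three formulas above concrete, here is a minimal in-memory sketch on a made-up graph of three items. The real scripts work from a dump and a database, so the data structures and the `sumIncoming` helper below are assumptions; in particular, each entry in `links` is assumed to count once.

```javascript
// Made-up base scores, for illustration only
const baseScores = { Q1: 10, Q2: 4, Q3: 2 }
// links[source] = ids of the items that `source` links to (hypothetical structure)
const links = { Q1: [ 'Q2' ], Q2: [ 'Q3' ], Q3: [ 'Q1', 'Q2' ] }

// For each item, sums the given scores of the items linking to it
function sumIncoming (scores, links) {
  const result = {}
  for (const id of Object.keys(scores)) result[id] = 0
  for (const [ source, targets ] of Object.entries(links)) {
    for (const target of targets) {
      if (target in result) result[target] += scores[source] || 0
    }
  }
  return result
}

// network score = sum of the base scores of items linking to the item
const networkScores = sumIncoming(baseScores, links)
// => { Q1: 2, Q2: 12, Q3: 4 }

// secondary network score = sum of the network scores of items linking to the item
const secondaryNetworkScores = sumIncoming(networkScores, links)
// => { Q1: 4, Q2: 6, Q3: 12 }

// total score = base score + network score * 0.25 + secondary network score * 0.1
const totalScores = {}
for (const id of Object.keys(baseScores)) {
  totalScores[id] = baseScores[id] + networkScores[id] * 0.25 + secondaryNetworkScores[id] * 0.1
}

console.log(totalScores)
```

Note that in this toy graph Q2 ends up with a higher total than its base score alone would suggest, since it is linked from the well-scored Q1.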
You can alternatively calculate all those scores at once:
./scripts/calculate_all_scores dump.json
See the Hub deploy doc, simply replacing hub by wd-rank, especially on step 4:
echo "module.exports = {
host: 'https://tools.wmflabs.org',
// Customize root to match the URL passed by Nginx
root: '/wd-rank'
}" > config/local.js
We can't access the Wikidata entities dump at /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public/wikidatawiki/entities/latest-all.json.gz from the NodeJS webservice (see Phabricator ticket T193646), so a workaround is to install our own NodeJS using NVM:
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash
. $HOME/.nvm/nvm.sh
# Use the same version as `webservice --backend=kubernetes nodejs shell`
nvm install 6.11.0
npm operations still need to be done from the webservice shell, as I can't find a way to make that environment use the new node binary rather than /usr/bin/node:
webservice --backend=kubernetes nodejs shell
cd ~/www/js
npm install
exit
# Force the use of our custom node binary
sed -i 's@node "./scripts@~/.nvm/versions/node/v6.11.0/bin/node "./scripts@' ./scripts/calculate_all_scores
./scripts/calculate_all_scores
cd
git clone https://github.com/AvianFlu/aeternum.git
cd aeternum
make
cd ~/www/js
~/aeternum/aeternum -o ./calculate_all_scores.log -e ./calculate_all_scores.err -- ./scripts/calculate_all_scores
# Follow the logs
tail -f ./calculate_all_scores*