Wikidata Rank

STATUS: WIP

Attributing scores to Wikidata items, making those available via a web API and dumps, under a CC0 license.

Motivation: when re-using Wikidata data, it can be useful to sort a set of items by some kind of score [1], [2]. So instead of spamming query.wikidata.org with one SPARQL request per item, we pre-calculate a score for all items from a Wikidata dump, and serve them in bulk.

There is prior work on a Wikidata PageRank, but no API to cherry-pick items of interest, and the data isn't released under CC0. Other motivations include just having fun with scoring algorithms.

Summary

Web API

GET /scores?ids=Q8027|Q1001|Q216092|Q79969
GET /scores?ids=Q8027|Q1001|Q216092|Q79969&subscores=true
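
For example, a minimal sketch of querying a local development server from NodeJS (assuming a Node version with built-in fetch; the response format isn't documented yet, so it is just printed as parsed JSON):

// Request scores for a few items from a local dev server
// (port 7264 is the development default, see the setup below)
const ids = [ 'Q8027', 'Q1001', 'Q216092', 'Q79969' ]
fetch(`http://localhost:7264/scores?ids=${ids.join('|')}&subscores=true`)
  .then(res => res.json())
  .then(scores => console.log(scores))
  .catch(console.error)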

Dumps

coming soon

Development setup

Dependencies

  • NodeJS > v6.4.0 (recommended way to install: NVM)

Install

git clone https://github.com/maxlath/wikidata-rank
cd wikidata-rank
npm install
# Start the server on port 7264 and watch for file changes to restart
npm run watch

At this point, your server is set up, but it has nothing to serve: we need to populate the database with item scores.

Calculate scores

Base scores

item base score = number of labels
                + number of descriptions * 0.5
                + number of aliases * 0.25
                + number of statements * 2
                + number of qualifiers
                + number of references
                + number of sitelinks * 4

wget -c https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz
gzip -dc latest-all.json.gz | ./scripts/calculate_base_scores
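
To illustrate the formula, here is a minimal sketch of computing such a base score for one parsed entity, assuming the standard Wikidata entity JSON layout and a recent NodeJS; this is not the actual calculate_base_scores script:

const count = obj => Object.keys(obj || {}).length
// Aliases and qualifiers are grouped by language/property in entity JSON,
// so count the grouped values rather than the keys (an assumption about
// what "number of aliases/qualifiers" means here)
const countGrouped = obj => {
  return Object.values(obj || {}).reduce((sum, arr) => sum + arr.length, 0)
}

const baseScore = entity => {
  let statements = 0
  let qualifiers = 0
  let references = 0
  Object.values(entity.claims || {}).forEach(propClaims => {
    statements += propClaims.length
    propClaims.forEach(claim => {
      qualifiers += countGrouped(claim.qualifiers)
      references += (claim.references || []).length
    })
  })
  return count(entity.labels) +
    count(entity.descriptions) * 0.5 +
    countGrouped(entity.aliases) * 0.25 +
    statements * 2 +
    qualifiers +
    references +
    count(entity.sitelinks) * 4
}

// Tiny made-up entity: 2 labels, 1 description, 1 alias, 1 statement, 1 sitelink
const example = {
  labels: { en: {}, fr: {} },
  descriptions: { en: {} },
  aliases: { en: [ 'some alias' ] },
  claims: { P31: [ { qualifiers: {}, references: [] } ] },
  sitelinks: { enwiki: {} }
}
console.log(baseScore(example)) // 2 + 0.5 + 0.25 + 2 + 0 + 0 + 4 = 8.75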

Network scores

item network score = sum of the base scores of items linking to the item

./scripts/calculate_network_scores

Secondary network scores

item secondary network score = sum of the network scores of items linking to the item

./scripts/calculate_secondary_network_scores
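
Both link-based passes are the same operation applied to different inputs. Here is a hedged sketch; the inlinks map (item id -> ids of items linking to it) and the example scores are made-up illustrations, not the actual data structures used by the scripts:

// Sum, for each item, the previous-pass scores of the items linking to it
const sumOfInlinkScores = (inlinks, previousScores) => {
  const scores = {}
  Object.keys(inlinks).forEach(id => {
    scores[id] = inlinks[id].reduce((sum, linkingId) => {
      return sum + (previousScores[linkingId] || 0)
    }, 0)
  })
  return scores
}

const baseScores = { Q1: 10, Q2: 4, Q3: 7 }
const inlinks = { Q1: [ 'Q2', 'Q3' ], Q2: [ 'Q1' ], Q3: [] }

// Network scores: sum of the base scores of linking items
const networkScores = sumOfInlinkScores(inlinks, baseScores)
// => { Q1: 11, Q2: 10, Q3: 0 }
// Secondary network scores: the same pass, fed with the network scores
const secondaryNetworkScores = sumOfInlinkScores(inlinks, networkScores)
// => { Q1: 10, Q2: 11, Q3: 0 }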

Total scores

item total score = base score + network score * 0.25 + secondary network score * 0.1

./scripts/calculate_total_scores
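
For example, an item with a base score of 100, a network score of 200 and a secondary network score of 300 gets a total score of 100 + 200 * 0.25 + 300 * 0.1 = 180.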

All scores

You can alternatively calculate all those scores at once:

./scripts/calculate_all_scores dump.json

Deploy to Toolforge

See the Hub deploy doc, simply replacing hub with wd-rank, especially at step 4:

echo "module.exports = {
  host: 'https://tools.wmflabs.org',
  // Customize root to match the URL passed by Nginx
  root: '/wd-rank'
}" > config/local.js

install NodeJS with NVM

We can't access the Wikidata entities dump at /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public/wikidatawiki/entities/latest-all.json.gz from the NodeJS webservice (see Phabricator ticket T193646), so a workaround is to install our own NodeJS using NVM:

curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash
. $HOME/.nvm/nvm.sh
# Use the same version as `webservice --backend=kubernetes nodejs shell`
nvm install 6.11.0

run with custom NodeJS

npm operations still need to be done from the webservice shell, as I can't find a way to make the environment use the new node binary rather than /usr/bin/node:

webservice --backend=kubernetes nodejs shell
cd ~/www/js
npm install
exit
# Force the use of our custom node binary
sed -i 's@node "./scripts@~/.nvm/versions/node/v6.11.0/bin/node "./scripts@' ./scripts/calculate_all_scores
./scripts/calculate_all_scores

run as a daemon

cd
git clone https://github.com/AvianFlu/aeternum.git
cd aeternum
make
cd ~/www/js
~/aeternum/aeternum -o ./calculate_all_scores.log -e ./calculate_all_scores.err -- ./scripts/calculate_all_scores
# Follow the logs
tail -f ./calculate_all_scores*
