In [1]:
from WikidataTreeBuilderSPARQL import WikidataTreeQuery
import pandas as pd
import simplejson as json
from datetime import datetime
import re

In [2]:
print("Program starts at "+str(datetime.now()))

Program starts at 2017-07-27 20:24:41.651602


We start by initializing the tree query with a list of "lookup claims": the Wikidata properties we consider relevant and want retrieved for every descendant of a given node.

In [3]:
tree = WikidataTreeQuery(lookupClaims=["P571", "P275", "P101", "P135", "P348", "P306", "P1482", "P277", "P577", "P366", "P178", "P31", "P279", "P2572", "P3966", "P144", "P170", "P1324"])

In this particular example, we have set up the "lookup claims," which are the properties of interest to us for each entry. The full list of parameters for the initialisation function can be found in the docstring of __init__.

After initializing the class, we extract all descendants of the node of interest. In this example, it is "software" (Wikidata entry Q7397).

In [4]:
print("Tree building starts at "+str(datetime.now()))
flare = tree.fromRoot("Q7397", forbidden=["Q7889", "Q28923"])
print("Tree building ends at "+str(datetime.now()))

Tree building starts at 2017-07-27 20:24:48.086157
Tree building ends at 2017-07-27 20:27:37.281942
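To make the idea of building a tree "from root" with forbidden nodes more concrete, here is a minimal sketch in plain Python. It is not the code of WikidataTreeQuery: build_flare is a hypothetical helper, the edges are toy data (X1 and X2 are made-up identifiers), and the nested name/children layout is an assumption based on the flare format commonly used with d3js.

```python
from collections import defaultdict


def build_flare(root, edges, forbidden=()):
    """Nest (parent, child) pairs under root, pruning forbidden subtrees."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    blocked = set(forbidden)

    def expand(node, path):
        # Skip forbidden nodes, and skip nodes already on the current
        # path so that a cycle in the data cannot recurse forever.
        kept = [
            expand(child, path | {child})
            for child in children[node]
            if child not in blocked and child not in path
        ]
        entry = {"name": node}
        if kept:
            entry["children"] = kept
        return entry

    return expand(root, frozenset([root]))


# Toy edges: X1 and X2 are invented identifiers, not Wikidata entries.
edges = [
    ("Q7397", "Q7889"),
    ("Q7397", "X1"),
    ("X1", "X2"),
    ("Q7889", "X3"),
    ("X2", "X1"),  # deliberate cycle
]
toy_flare = build_flare("Q7397", edges, forbidden=["Q7889", "Q28923"])
print(toy_flare)
```

In this toy run, Q7889 and everything below it are dropped, and the X2 -> X1 cycle is cut rather than followed.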


After the tree is built from the root Q7397, we can save it for visualisation with d3js:

In [5]:
# Text mode: on Python 3, json.dump writes str, which a binary-mode file rejects
with open("flareSoftware.json","w") as f:
    json.dump(flare,f,indent=4)

The result is here: http://webservices.uchange.co/api-viz/flared3-wikidata-software.html

Even though the hierarchical structure is visible, the result is not very user-friendly. We can convert it to human-readable information with the addLabels method:

In [6]:
flare = tree.addLabels(flare)
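As an illustration of what such a labelling step could look like, the sketch below walks a nested tree and swaps entity IDs for labels. It is an assumption about the general technique, not the implementation of addLabels: add_labels and the labels dictionary are hypothetical, the toy tree is not the real hierarchy, and in practice the labels would come from Wikidata rather than a hand-written mapping.

```python
def add_labels(node, labels):
    """Return a copy of a flare node with IDs replaced by labels where known."""
    # Fall back to the raw ID when no label is available.
    named = {"name": labels.get(node["name"], node["name"])}
    if "children" in node:
        named["children"] = [add_labels(child, labels) for child in node["children"]]
    return named


# Toy tree and hand-written labels, for illustration only.
toy_tree = {
    "name": "Q7397",
    "children": [{"name": "Q146768"}, {"name": "Q72688"}],
}
labels = {"Q7397": "software", "Q146768": "middleware"}

named_tree = add_labels(toy_tree, labels)
print(named_tree)
```

The sketch returns a new tree and leaves the input untouched; whether addLabels copies or modifies its argument is not stated here.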

In [7]:
# Text mode: on Python 3, json.dump writes str, which a binary-mode file rejects
with open("flareSoftwareNamed.json","w") as f:
    json.dump(flare,f,indent=4)

And ta-da: http://webservices.uchange.co/api-viz/flared3-wikidata-software-named.html

The tree is useful for visualisation, but for augmentation of text data, it is more useful to have the data in a table. The getPrettyDF function returns the data as a dataframe:

In [8]:
df = tree.getPrettyDF()

In [9]:
df

Unnamed: 0,altLabel_en,altLabel_fr,description_en,description_fr,entity,label_en,label_fr,P101_field_of_work,P1324_source_code_repository,P135_movement,...,P277_programming_language,P279_subclass_of,P306_operating_system,P31_instance_of,P348_software_version,P366_use,P3966_programming_paradigm,P571_inception,P577_publication_date,visitedNodes
0,gnucash,gnucash,accounting software,logiciel de comptabilité personnelle,Q123326,GnuCash,GnuCash,,https://github.com/Gnucash/gnucash,free software movement,...,"(Scheme, C++, C, Java)",,"(Microsoft Windows, GNU, Android, GNU/Linux, F...",personal accounting software,2.6.112016,,,,,"((software, accounting software, personal acco..."
9,,,software suite,logiciel informatique,Q127141,Microsoft Bob,Microsoft Bob,,,,...,,,Windows 3.1x,"(software suite, graphical user interface)",1.0a c Plus Pack,,,,,"((software, software suite), (software, system..."
15,MPI,,message-passing system for parallel computers,protocole réseau,Q127879,Message Passing Interface,Message Passing Interface,,,,...,,,,"(library, communications protocol)",,,,,,"((software, system software, utility software,..."
25,Frostbite Engine,,game engine,logiciel informatique,Q124514,Frostbite,Frostbite Engine,,,,...,C++,,Microsoft Windows,game engine,,,,,,"((software, system software, middleware, game ..."
29,,,derivative of the Ubuntu operating system,système d’exploitation,Q72688,Xubuntu,Xubuntu,,,,...,,,,"(Linux distribution, operating system)",17.04,,,,2016-04-21T00:00:00Z,"((software, system software, operating system,..."
30,,,,logiciel informatique,Q72914,xmonad,xmonad,,,,...,Haskell,,,Tiling window manager,,,,,,"((software, system software, window manager, T..."
31,,,,plateforme en ligne d'intégration continue,Q73134,Travis CI,Travis CI,,https://github.com/travis-ci/travis-ci,,...,Ruby,,,"(website, continuous integration software)",,,,,,"((software, continuous integration software),)"
34,,,,logiciel informatique,Q74452,MODX,MODx,,,,...,PHP,,,content management system,2.4.4,,,,,"((software, content management system), (softw..."
35,,,,programme d'échecs,Q147428,Zappa,Zappa,,,,...,,,,chess engine,,,,,,"((software, program, computer program, chess e..."
36,,intergiciel,computer software that provides services to so...,logiciel tiers facilitant l'interaction entre ...,Q146768,middleware,middleware,,,,...,,system software,,type of software,,,,,,"((software, system software),)"
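
Judging from the rows above, the visitedNodes column seems to hold, for each entity, every path from the root down to the entity's parent. The sketch below shows one way such paths could be collected from a nested tree. It is a guess at the technique, not the code behind getPrettyDF: collect_paths is a hypothetical helper, and "toy app" is an invented node added only to show an entity reached by two paths.

```python
from collections import defaultdict


def collect_paths(node, prefix=(), found=None):
    """Map each node name to the list of root-to-parent paths that reach it."""
    found = defaultdict(list) if found is None else found
    if prefix:  # the root itself has no parent path
        found[node["name"]].append(prefix)
    for child in node.get("children", []):
        collect_paths(child, prefix + (node["name"],), found)
    return found


# Toy tree; "toy app" is invented and appears under two parents.
toy_tree = {
    "name": "software",
    "children": [
        {"name": "system software",
         "children": [{"name": "middleware"}, {"name": "toy app"}]},
        {"name": "continuous integration software",
         "children": [{"name": "Travis CI"}, {"name": "toy app"}]},
    ],
}

paths = collect_paths(toy_tree)
rows = [{"entity": name, "visitedNodes": tuple(p)} for name, p in paths.items()]
for row in rows:
    print(row)
```

Each row is a plain dictionary, so a list like rows could be handed to pd.DataFrame to obtain one table row per entity.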


In [10]:
df.to_excel("tableSoftware.xlsx")

In [11]:
print("Program end at "+str(datetime.now()))

Program end at 2017-07-27 20:50:04.150030


And voilà! We have the table of all descendants of the node "software" with all the properties we consider relevant!