In [1]:
from WikidataTreeBuilderSPARQL import WikidataTreeQuery
import pandas as pd
import simplejson as json
from datetime import datetime
import re

In [2]:
print("Program starts at "+str(datetime.now()))

Program starts at 2017-07-28 09:58:19.297449


We start by choosing the node whose descendants we want to look up, together with a list of "lookup claims": the Wikidata properties we consider relevant and want to retrieve for each entry.

In [3]:
tree = WikidataTreeQuery(lookup_claims=["P571", "P275", "P101", "P135", "P348", "P306", "P1482", "P277", "P577", "P366", "P178", "P31", "P279", "P2572", "P3966", "P144","P170","P1324"])

In this example, we pass only the "lookup claims", i.e. the properties of interest to us for each entry. The full list of initialisation parameters can be found in the docstring of \_\_init\_\_.
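
To make the property IDs easier to read, here is a small sketch that pairs each lookup claim with a label. The labels are copied from the column names of the table shown further down in this notebook; the names `KNOWN_LABELS` and `describe_claims` are hypothetical helpers, not part of WikidataTreeBuilderSPARQL, and claims whose label is not visible in that table are left unlabeled rather than guessed.

```python
# Labels taken from the column names of the output table in this notebook.
# Claims that are hidden behind the "..." column are deliberately omitted.
KNOWN_LABELS = {
    "P101": "field of work",
    "P1324": "source code repository",
    "P1482": "Stack Exchange tag",
    "P275": "license",
    "P277": "programming language",
    "P279": "subclass of",
    "P306": "operating system",
    "P31": "instance of",
    "P348": "software version",
    "P366": "use",
    "P571": "inception",
    "P577": "publication date",
}


def describe_claims(claims, labels=KNOWN_LABELS):
    """Return (property_id, label) pairs; the label is None when unknown."""
    return [(pid, labels.get(pid)) for pid in claims]


lookup_claims = ["P571", "P275", "P101", "P135", "P348", "P306", "P1482",
                 "P277", "P577", "P366", "P178", "P31", "P279", "P2572",
                 "P3966", "P144", "P170", "P1324"]

for pid, label in describe_claims(lookup_claims):
    print(pid, label or "(label not shown in this notebook)")
```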

After initializing the class, we extract all descendants of the node of interest. In this example, it is "computer science" (Wikidata entry Q21198).

In [4]:
print("Tree building starts at "+str(datetime.now()))
flare = tree.from_root("Q21198")
print("Tree building ends at "+str(datetime.now()))

Tree building starts at 2017-07-28 09:58:26.265675
Tree building ends at 2017-07-28 09:58:27.495631
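
The query that the library sends to Wikidata is not shown in this notebook. As a rough illustration only, a descendant lookup could be expressed with a SPARQL property path. The sketch below assumes that descendants are reached through P279 (subclass of) and P31 (instance of); `build_descendants_query` is a hypothetical helper and only builds the query string, without contacting any endpoint. The `wd:` and `wdt:` prefixes are the ones predefined by the Wikidata Query Service.

```python
def build_descendants_query(root_qid, link_properties=("P279", "P31")):
    """Build a SPARQL query selecting every item linked to root_qid
    through one or more of the given properties (an assumption, not
    necessarily what WikidataTreeQuery does internally)."""
    # Alternation of properties, e.g. "wdt:P279|wdt:P31"
    path = "|".join("wdt:" + pid for pid in link_properties)
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        "  ?item (" + path + ")+ wd:" + root_qid + " .\n"
        "}"
    )


query = build_descendants_query("Q21198")
print(query)
```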


After the tree is built "from root" Q21198, we can save it for visualisation with d3js:

In [5]:
# Text mode: under Python 3, json.dump writes str and fails on a binary file
with open("flareCS.json", "w") as f:
    json.dump(flare, f, indent=4)
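
Before opening the visualisation, it can help to check the size of the tree. The sketch below assumes the usual d3 "flare" layout, where each node is a dict with a "name" and an optional "children" list; the exact keys produced by `from_root` are not shown here, so treat this as an assumption. The sample tree is hand-built from a few entities that appear in this notebook, and `count_nodes` and `max_depth` are hypothetical helpers.

```python
# Hand-built sample in the assumed flare layout (not real library output)
flare_sketch = {
    "name": "Q21198",  # computer science
    "children": [
        {"name": "Q2539", "children": [{"name": "Q1026367"}]},  # machine learning > scikit-learn
        {"name": "Q11660", "children": [{"name": "Q12253"}]},   # artificial intelligence > Watson
    ],
}


def count_nodes(node):
    """Count the node itself plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))


def max_depth(node):
    """Number of edges on the longest path from this node to a leaf."""
    children = node.get("children", [])
    if not children:
        return 0
    return 1 + max(max_depth(child) for child in children)


print(count_nodes(flare_sketch), max_depth(flare_sketch))
```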

The result is here: http://webservices.uchange.co/api-viz/flared3-wikidata-cs.html
(click on the circles to open subnodes, and on the labels to open the Wikidata page)

Although the hierarchical structure is visible, the result is not very user-friendly. We can convert it to human-readable information with the add_labels method:

In [6]:
flare = tree.add_labels(flare)

In [7]:
# Text mode: under Python 3, json.dump writes str and fails on a binary file
with open("flareCSNamed.json", "w") as f:
    json.dump(flare, f, indent=4)

And ta-daa: http://webservices.uchange.co/api-viz/flared3-wikidata-cs-named.html

The tree is useful for visualisation, but for augmenting text data it is more useful to have the data in a table. The get_pretty_DF method returns the data as a dataframe:

In [8]:
df = tree.get_pretty_DF()

In [9]:
df

Unnamed: 0,altLabel_en,altLabel_fr,description_en,description_fr,entity,label_en,label_fr,P101_field_of_work,P1324_source_code_repository,P1482_Stack_Exchange_tag,...,P275_license,P277_programming_language,P279_subclass_of,P306_operating_system,P31_instance_of,P348_software_version,P366_use,P571_inception,P577_publication_date,visitedNodes
0,"(scikit, scikits.learn, sklearn)",scikit.learn,Machine learning library for the Python progra...,librairie Python d'apprentissage statistique,Q1026367,scikit-learn,scikit-learn,,https://github.com/scikit-learn/scikit-learn,,...,BSD license,"(Python, C++, C, Cython)",,,"(library, Python library, machine learning)",0.18.1,,,,"((computer science, machine learning), (comput..."
10,,,,notion informatique,Q844824,Physical address,Adresse physique,,,,...,,,,,computer science,,,,,"((computer science,),)"
26,,"(Algorithme de fourmis, ACO, Algorithmes de co...",,algorithmes inspirés du comportement des fourm...,Q460851,Ant colony optimization algorithms,Algorithme de colonies de fourmis,,,https://stackoverflow.com/tags/ant-colony,...,,,optimization algorithm,,Metaheuristic,,,,,"((computer science, artificial intelligence, H..."
41,"(science of computing, computing science, comp...","(science informatique, informatique théorique,...",study of the theoretical foundations of inform...,domaine d'activité scientifique,Q21198,computer science,informatique,,,https://stackoverflow.com/tags/computer-science,...,,,"(engineering disciplines, formal science)",,academic discipline,,,,,"((),)"
45,"(Alentejo, Ireland)",Bliss,default computer wallpaper of Windows XP,Le fond d'écran installé par défaut sur Window...,Q2368,Bliss,Colline verdoyante,,,,...,,,,,"(photograph, computer wallpaper)",,,1996-01-01T00:00:00Z,,"((computer science, computer graphics, compute..."
48,,,,module électronique programmable,Q23773905,,kidule,,,,...,,,,,robotics,,,,,"((computer science, mechatronics engineering, ..."
51,,,machine learning software library,librairie d'apprentissage automatique,Q21447895,TensorFlow,TensorFlow,,https://github.com/tensorflow/tensorflow,,...,Apache-2.0,"(Python, C++)",,,"(free software, library, software framework, m...",1.2.1,machine learning,,,"((computer science, machine learning), (comput..."
57,AI,,branch of computer science that develops machi...,recherche de moyens susceptibles de doter les ...,Q11660,artificial intelligence,intelligence artificielle,,,,...,,,"(artificial entity, computer science, algorith...",,,,,,,"((computer science,),)"
69,IBM Watson,IBM Watson,artificial intelligence computer system made b...,programme informatique d'intelligence artifici...,Q12253,Watson,Watson,,,,...,,,computer,,"(supercomputer, artificial intelligence, one-o...",,,,,"((computer science, artificial intelligence),)"
100,ML,"(machine learning, apprentissage statistique)",construction and study of systems that can lea...,un des champs d'étude de l'intelligence artifi...,Q2539,machine learning,apprentissage automatique,,,,...,,,"(computer science, artificial intelligence)",,,,,,,"((computer science,), (computer science, artif..."
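
The visitedNodes column records the paths from the root that lead to each entity. As a small sketch, the depth of an entity can be derived from its shortest path. This assumes that each cell holds a tuple of label tuples, as the display above suggests; the sample values are copied from rows whose visitedNodes value is shown in full, and `shortest_depth` is a hypothetical helper.

```python
# visitedNodes values copied from the untruncated rows of the table above
visited = {
    "Q21198": ((),),                                             # computer science (root)
    "Q844824": (("computer science",),),                         # Physical address
    "Q11660": (("computer science",),),                          # artificial intelligence
    "Q12253": (("computer science", "artificial intelligence"),),  # Watson
}


def shortest_depth(paths):
    """Length of the shortest recorded path from the root to the entity."""
    return min(len(path) for path in paths)


depths = {entity: shortest_depth(paths) for entity, paths in visited.items()}
print(depths)
```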


In [10]:
df.to_excel("tableComputerScience.xlsx")

In [11]:
print("Program end at "+str(datetime.now()))

Program end at 2017-07-28 09:58:57.781822


And voilà! We have the table of all descendants of the node "computer science" (Q21198), with all the properties we consider relevant!