# MUHAI Kickoff - Knowledge Graphs tutorial

In this tutorial, we will learn the basics of interacting with knowledge graphs, i.e.:
 - modeling data
 - creating data with Python RDFlib
 - creating SPARQL services on top of KGs
 
 Cfr. slides [https://tinyurl.com/muhai2020](https://tinyurl.com/muhai2020)

***************

# Part 1 - Data Modeling


Choose a domain related to robots, e.g. ```navigation```, ```perception```, ```vision```, ```manipulation```…

1. what are the types and their hierarchy ( _classes_ )?
2. what are the relations and their types ( _domain, range_ )?
3. give a set of objects with their types and relations ( _instances_ )?

Draw the graph by hand  



**********

# Part 2 - RDFlib 

rdflib is a widely used Python library for RDF (all documentation can be found [here](https://rdflib.readthedocs.io/en/stable/index.html))

To install RDFlib, run `pip3 install rdflib` in your terminal

In [21]:
# Imports
# These are the main classes and types we will be using from rdflib
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import RDF, RDFS

#### Create an empty graph

In [22]:
#A Graph object is always required to load triples

g = Graph() # an empty graph

if len(g) == 0 :
    print("Graph is empty")
else :
    print ("Graph has %s statements." % len(g))

Graph is empty



#### Create RDF triples

Triples are added to the graph with the function Graph.add(), with a Python **tuple** (subject, predicate, object) given as parameter.

Notice the namespace convenience syntax!

In [23]:
# We will now create an entity and some characteristics

# create a namespace and a prefix, then bind it to the graph 
rec = Namespace("http://example.org/recipe/")
g.bind("rc", rec)


# let's create 2 new items:
# 1st item
flour = URIRef("http://example.org/recipe/flour") 

# 2nd item, defined in another way
bowl = rec.bowl # equivalent to : URIRef("http://example.org/recipe/bowl") 

# linking the two items (a bowl can contain flour )
g.add( (bowl, rec.contains, flour) )

# let's create triples with literal values, e.g. names 
g.add( (flour, RDFS.label, Literal("Plain Flour", lang="en")) )
g.add( (flour, RDFS.label, Literal("Farine de blé", lang="fr")) )  # same name in french
g.add( (bowl, RDFS.label, Literal("Round Bowl", lang="en")) )

# let's give the bowl a shape
g.add( (bowl, rec.size, Literal('Large', lang='en')) ) # a bowl that is large

# it works with numeric values too
g.add( (flour, rec.weight, Literal(250)) ) # for pancakes we need 250gr of flour

In [24]:
# show the graph (sorted alphabetically)

for (s,p,o) in sorted(g): 
    print(s,p,o)

http://example.org/recipe/bowl http://example.org/recipe/contains http://example.org/recipe/flour
http://example.org/recipe/bowl http://example.org/recipe/size Large
http://example.org/recipe/bowl http://www.w3.org/2000/01/rdf-schema#label Round Bowl
http://example.org/recipe/flour http://example.org/recipe/weight 250
http://example.org/recipe/flour http://www.w3.org/2000/01/rdf-schema#label Plain Flour
http://example.org/recipe/flour http://www.w3.org/2000/01/rdf-schema#label Farine de blé


#### Adding a bit of semantics

There is not much of semantics in the above graph. 

We can add semantics using special terms :  ```RDF.type```, ```RDFS:subClassOf``` and ```RDFS:domain```/```RDFS:range```. 

In [25]:
## let's give items a type (a class)
g.add( (flour, RDF.type, rec.Ingredient ) )
g.add( (bowl, RDF.type, rec.Container ))

# specify hierarchy for classes
g.add((rec.Ingredient, RDFS.subClassOf, rec.RecipeComponent))
g.add((rec.Container, RDFS.subClassOf, rec.RecipeComponent))

# domain and ranges for properties
g.add((rec.contains, RDFS.domain, rec.Container))
g.add((rec.contains, RDFS.range, rec.Ingredient))

g.add((rec.weight, RDFS.domain, rec.Ingredient)) # in our world, only ingredients have a weight (grams needed for the recipe)
g.add((rec.weight, RDFS.range, RDFS.Literal)) # RDFS.Literal = includes strings and numericals

g.add((rec.size, RDFS.domain, rec.Container)) # in our world, only containers have size (small/large etc)
g.add((rec.size, RDFS.range, RDFS.Literal)) 


for (s,p,o) in sorted(g): 
    print(s,p,o)


http://example.org/recipe/Container http://www.w3.org/2000/01/rdf-schema#subClassOf http://example.org/recipe/RecipeComponent
http://example.org/recipe/Ingredient http://www.w3.org/2000/01/rdf-schema#subClassOf http://example.org/recipe/RecipeComponent
http://example.org/recipe/bowl http://example.org/recipe/contains http://example.org/recipe/flour
http://example.org/recipe/bowl http://example.org/recipe/size Large
http://example.org/recipe/bowl http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/recipe/Container
http://example.org/recipe/bowl http://www.w3.org/2000/01/rdf-schema#label Round Bowl
http://example.org/recipe/contains http://www.w3.org/2000/01/rdf-schema#domain http://example.org/recipe/Container
http://example.org/recipe/contains http://www.w3.org/2000/01/rdf-schema#range http://example.org/recipe/Ingredient
http://example.org/recipe/flour http://example.org/recipe/weight 250
http://example.org/recipe/flour http://www.w3.org/1999/02/22-rdf-syntax-ns#type ht

#### Let's add more ingredients

In [26]:
### for a Dutch pancake, we need extra ingredients (~instances)

## to go faster, we can add the triples directly
g.add((rec.salt, RDF.type, rec.Ingredient))  
g.add((rec.vanilla_sugar , RDF.type , rec.Ingredient))
g.add((rec.cinnamon, RDF.type, rec.Ingredient)) 
g.add((rec.egg, RDF.type, rec.Ingredient))
g.add((rec.milk, RDF.type, rec.Ingredient))
g.add((rec.butter, RDF.type, rec.Ingredient))

## we also need additional containers (for milk, pouring etc...) 
g.add((rec.cup, RDF.type, rec.Container))
g.add((rec.teaspoon, RDF.type, rec.Container))
g.add((rec.cup, RDF.type, rec.Container))

## and some tools 
g.add((rec.whisker, RDF.type, rec.RecipeTool))
g.add((rec.fryingpan, RDF.type, rec.RecipeTool))
g.add((rec.spatula, RDF.type, rec.RecipeTool))

## since we do not have RecipeTool in our world, let's add it to the hierarchy
g.add((rec.RepiceTool, RDFS.subClassOf, rec.RecipeComponent))

In [27]:
# let's add some more characteristics

# how much of these ingredients we need?
g.add((rec.salt, rec.weight, Literal(2))) # two grams ~= a pinch
g.add((rec.egg, rec.weight, Literal(150))) # two eggs ~= 150 grams
g.add((rec.milk, rec.weight, Literal(500))) # 500 ml ~= 500 grams
g.add((rec.cinnamon, rec.weight, Literal(5))) # 1/2 teaspoon ~= 5 grams
g.add((rec.vanilla_sugar, rec.weight, Literal(20))) # 2 teaspoon ~= 20 grams
g.add((rec.butter, rec.weight, Literal(20))) #  20 grams for the pan

# size of containers (remember : in our world only RecipeContainers have a size)
g.add((rec.teaspoon, rec.size, Literal('Small')))
g.add((rec.cup, rec.size, Literal('Medium')))

# some of the ingredients are optional
g.add((rec.cinnamon, rec.optionalIngredient, Literal(True)))
g.add((rec.vanilla_sugar, rec.optionalIngredient, Literal(True)))

# how do we handle ingredients
g.add((rec.teaspoon, rec.contains, rec.cinnamon))
g.add((rec.teaspoon, rec.contains, rec.vanilla_sugar))
g.add((rec.teaspoon, rec.contains, rec.salt))

g.add((rec.bowl, rec.contains, rec.milk))
g.add((rec.bowl, rec.contains, rec.egg))
g.add((rec.bowl, rec.contains, rec.milk))

g.add((rec.spatula, rec.spreads, rec.butter))
g.add((rec.spatula, rec.spreadOn, rec.fryingpan))

#### Save my file

In [28]:
g.serialize(destination='myrecipe.ttl' , format="turtle") # also pretty-xml, n3, rdf/xml ...

#### See my graph

You can visualise the graph using this service: http://www.ldf.fi/service/rdf-grapher

# Your turn

Create a graph of at least 30 triples (and at least 5 classes/instances) based on the domain you chose earlier. 
* Define types and hierarchies
* Define domain and ranges
* Define some instances

Compare your drawing with the online one!

*************

# Part 3 - Querying with SPARQL 

1. Load your dataset to <https://krr.triply.cc/muhai> 
    * Add your description
    * Do not forget to set the dataset to *public*


2. See your entities using the [`Browser`](https://krr.triply.cc/EASE-fall-school/robots-test/browser?resource=http%3A%2F%2Fexample.org%2Frobots%2FTurtleBot1&direction=forward) tab


3. Check your triples in the [`Table`](https://krr.triply.cc/EASE-fall-school/robots-test/table) tab
    * You can create custom prefixes by double-clicking on the prefix


4. Activate a SPARQL service in [`Services`](https://krr.triply.cc/EASE-fall-school/robots-test/services)


5. Try running few SPARQL queries 
    * How many classes?
    * How many properties?
    * Can you plot classes by number of instances?


**************

# Additional material (not mandatory)

### Querying thourgh basic triple matching

We use method Graph.triples and a Python tuple that acts as a mask for specifying our criteria

In [29]:
# Printing subjects, predicates and objects out of the tuple omits Python datatypes
print("--- printing raw triples ---")
for s, p, o in sorted(g):
    print(s, p, o)

--- printing raw triples ---
http://example.org/recipe/Container http://www.w3.org/2000/01/rdf-schema#subClassOf http://example.org/recipe/RecipeComponent
http://example.org/recipe/Ingredient http://www.w3.org/2000/01/rdf-schema#subClassOf http://example.org/recipe/RecipeComponent
http://example.org/recipe/RepiceTool http://www.w3.org/2000/01/rdf-schema#subClassOf http://example.org/recipe/RecipeComponent
http://example.org/recipe/bowl http://example.org/recipe/contains http://example.org/recipe/egg
http://example.org/recipe/bowl http://example.org/recipe/contains http://example.org/recipe/flour
http://example.org/recipe/bowl http://example.org/recipe/contains http://example.org/recipe/milk
http://example.org/recipe/bowl http://example.org/recipe/size Large
http://example.org/recipe/bowl http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/recipe/Container
http://example.org/recipe/bowl http://www.w3.org/2000/01/rdf-schema#label Round Bowl
http://example.org/recipe/butter

In [30]:
# subjects only
print("PRINTING SUBJECTS")
for s in set(g.subjects()):
    print(s)

PRINTING SUBJECTS
http://example.org/recipe/Ingredient
http://example.org/recipe/flour
http://example.org/recipe/contains
http://example.org/recipe/vanilla_sugar
http://example.org/recipe/bowl
http://example.org/recipe/teaspoon
http://example.org/recipe/whisker
http://example.org/recipe/butter
http://example.org/recipe/fryingpan
http://example.org/recipe/salt
http://example.org/recipe/weight
http://example.org/recipe/milk
http://example.org/recipe/size
http://example.org/recipe/RepiceTool
http://example.org/recipe/Container
http://example.org/recipe/spatula
http://example.org/recipe/cinnamon
http://example.org/recipe/cup
http://example.org/recipe/egg


In [31]:
# predicates only
print("PRINTING PREDICATES")
for p in set(g.predicates()):
    print(p)

PRINTING PREDICATES
http://example.org/recipe/optionalIngredient
http://example.org/recipe/spreads
http://example.org/recipe/spreadOn
http://example.org/recipe/size
http://www.w3.org/2000/01/rdf-schema#label
http://www.w3.org/2000/01/rdf-schema#range
http://www.w3.org/2000/01/rdf-schema#subClassOf
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://example.org/recipe/contains
http://example.org/recipe/weight
http://www.w3.org/2000/01/rdf-schema#domain


In [32]:
# objects only
print("PRINTING OBJECTS")
for o in set(g.objects()):
    print(o)

PRINTING OBJECTS
http://www.w3.org/2000/01/rdf-schema#Literal
http://example.org/recipe/Ingredient
Farine de blé
Plain Flour
http://example.org/recipe/flour
Round Bowl
http://example.org/recipe/vanilla_sugar
true
5
150
http://example.org/recipe/butter
http://example.org/recipe/RecipeComponent
20
http://example.org/recipe/fryingpan
http://example.org/recipe/salt
http://example.org/recipe/milk
Large
Medium
500
http://example.org/recipe/Container
http://example.org/recipe/cinnamon
http://example.org/recipe/RecipeTool
250
Small
http://example.org/recipe/egg
2


In [33]:
# print things that have weight
print("--- printing resources with a weight ---")
for (s,p,o) in g.triples((None, rec.weight, None)):
    print(s) 

--- printing resources with a weight ---
http://example.org/recipe/egg
http://example.org/recipe/butter
http://example.org/recipe/salt
http://example.org/recipe/vanilla_sugar
http://example.org/recipe/milk
http://example.org/recipe/cinnamon
http://example.org/recipe/flour


In [34]:
print("--- printing properties of flour ---")
for (s,p,o) in g.triples((flour, None, None)):
    print(p) 

--- printing properties of flour ---
http://www.w3.org/2000/01/rdf-schema#label
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#label
http://example.org/recipe/weight


In [35]:
print("--- printing relationship between flour and bowl ---")
for (s,p,o) in g.triples((bowl, None, flour)):
    print(p) 

--- printing relationship between flour and bowl ---
http://example.org/recipe/contains


In [36]:
#everything that has a type, and its type
for s,p,o in g.triples( (None, RDF.type, None) ):
    print(s,o)

http://example.org/recipe/bowl http://example.org/recipe/Container
http://example.org/recipe/vanilla_sugar http://example.org/recipe/Ingredient
http://example.org/recipe/cup http://example.org/recipe/Container
http://example.org/recipe/whisker http://example.org/recipe/RecipeTool
http://example.org/recipe/fryingpan http://example.org/recipe/RecipeTool
http://example.org/recipe/egg http://example.org/recipe/Ingredient
http://example.org/recipe/salt http://example.org/recipe/Ingredient
http://example.org/recipe/spatula http://example.org/recipe/RecipeTool
http://example.org/recipe/cinnamon http://example.org/recipe/Ingredient
http://example.org/recipe/teaspoon http://example.org/recipe/Container
http://example.org/recipe/milk http://example.org/recipe/Ingredient
http://example.org/recipe/flour http://example.org/recipe/Ingredient
http://example.org/recipe/butter http://example.org/recipe/Ingredient


## Loading data from files

rdflib accepts importing RDF data from a variety of sources, either locally from a file (including an extensive support of serializations), or remotely via a URI (this is a great way of checking practically if URIs return RDF according to the 3rd Linked Data principle).

In [40]:
# Add triples to your graph
f = Graph()

f.parse("got.ttl", format="ttl")

print(len(f)) # prints number of triples in the graph 

20


In [41]:
# Parse directly from a string
remote_g = Graph()

remote_g.parse(URIRef('http://dbpedia.org/resource/Jon_Snow_(character)'))

for (s,p,o) in sorted(remote_g):
    print(s,p,o)

http://dbpedia.org/resource/Arya_Stark http://dbpedia.org/ontology/relative http://dbpedia.org/resource/Jon_Snow_(character)
http://dbpedia.org/resource/Bran_Stark http://dbpedia.org/ontology/relative http://dbpedia.org/resource/Jon_Snow_(character)
http://dbpedia.org/resource/Daenerys_Targaryen http://dbpedia.org/ontology/relative http://dbpedia.org/resource/Jon_Snow_(character)
http://dbpedia.org/resource/Jaehaerys_II_Targaryen http://dbpedia.org/ontology/wikiPageRedirects http://dbpedia.org/resource/Jon_Snow_(character)
http://dbpedia.org/resource/Jon_Snow http://dbpedia.org/ontology/wikiPageDisambiguates http://dbpedia.org/resource/Jon_Snow_(character)
http://dbpedia.org/resource/Jon_Snow_(A_Song_of_Ice_and_Fire) http://dbpedia.org/ontology/wikiPageRedirects http://dbpedia.org/resource/Jon_Snow_(character)
http://dbpedia.org/resource/Jon_Snow_(Game_of_Thrones) http://dbpedia.org/ontology/wikiPageRedirects http://dbpedia.org/resource/Jon_Snow_(character)
http://dbpedia.org/resource/