In [1]:
# for use in tutorial and development; do not include this `sys.path` change in production:
import sys ; sys.path.insert(0, "../")

# Measurement and inference

The following examples explore the use of [*inference*](https://www.w3.org/standards/semanticweb/inference) as an automated way to generate new relations (i.e., expand the graph) based on the data in the KG plus the rules of its vocabularies.
We'll also show how to use the `kglab.Measure` class to measure the size and composition of a KG.

Now let's load a KG from `dat/gorm.ttl` that describes a fictional small community of happy Vikings:

In [2]:
from os.path import dirname
import kglab
import os

namespaces = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gorm": "http://example.org/sagas#",
    "rel":  "http://purl.org/vocab/relationship/",
    }

kg = kglab.KnowledgeGraph(
    name = "Happy Vikings KG example for SKOS/OWL inference",
    namespaces=namespaces,
    )

kg.load_rdf(dirname(os.getcwd()) + "/dat/gorm.ttl") ;

In [3]:
text = kg.save_rdf_text()
print(text)

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix gorm: <http://example.org/sagas#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

gorm:Astrid a gorm:Viking ;
    gorm:childOf gorm:Bodil,
        gorm:Leif ;
    foaf:topic_interest gorm:Fighting .

gorm:childOf rdfs:domain gorm:Viking ;
    rdfs:range gorm:Viking ;
    owl:inverseOf gorm:ancestorOf .

gorm:spouseOf a owl:SymmetricProperty ;
    rdfs:domain gorm:Viking ;
    rdfs:range gorm:Viking .

gorm:Berserkr a foaf:Thing ;
    skos:broader gorm:Fighting .

gorm:Bjorn a gorm:Viking ;
    gorm:childOf gorm:Gorm ;
    foaf:topic_interest gorm:Pilaging .

gorm:Bodil a gorm:Viking ;
    gorm:spouseOf gorm:Leif .

gorm:Gorm a gorm:Viking ;
    foaf:topic_interest gorm:Berserkr .

gorm:Pilaging a foaf:Thing ;
    skos:broader gorm:Fighting .

gorm:Leif a gorm:Viking ;
    gorm:childOf gorm:Bjorn .

gorm:Fighting a foaf:Th

We can use the `Measure` class to count the number of nodes and edges in the graph at this point:

In [4]:
import pandas as pd

measure = kglab.Measure()
measure.measure_graph(kg)

print("edges", measure.get_edge_count())
print("nodes", measure.get_node_count())

edges 25
nodes 15
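
As a rough illustration of what these two numbers mean, the sketch below counts edges and nodes over a handful of triples in plain Python. It assumes that each distinct triple is one edge and that a node is any term used as a subject or an object. The helper names `count_edges` and `count_nodes` are hypothetical, and `kglab.Measure` may count differently.

```python
# Toy triples in (subject, predicate, object) form: a small subset
# of the statements in the Turtle listing above.
triples = [
    ("gorm:Astrid", "rdf:type", "gorm:Viking"),
    ("gorm:Astrid", "gorm:childOf", "gorm:Bodil"),
    ("gorm:Astrid", "gorm:childOf", "gorm:Leif"),
    ("gorm:Bodil", "gorm:spouseOf", "gorm:Leif"),
]

def count_edges(triples):
    """Assumption: each distinct triple counts as one edge."""
    return len(set(triples))

def count_nodes(triples):
    """Assumption: a node is any term used as a subject or an object."""
    nodes = set()
    for s, _, o in triples:
        nodes.add(s)
        nodes.add(o)
    return len(nodes)

edge_count = count_edges(triples)
node_count = count_nodes(triples)

print("edges", edge_count)
print("nodes", node_count)
```

Under these assumptions, predicates such as `gorm:childOf` only count as nodes when they also appear as a subject or object of some statement.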


Ancestors are important to Vikings, so let's see who is an ancestor of whom:

In [5]:
sparql = """
SELECT ?elder ?viking
  WHERE {
      ?elder gorm:ancestorOf ?viking
  }
  ORDER BY ASC(?viking)
  """

df = kg.query_as_df(sparql)
df

And who is a spouse of whom?

In [6]:
sparql = """
SELECT ?viking1 ?viking2
  WHERE {
      ?viking1 gorm:spouseOf ?viking2
  }
  """

df = kg.query_as_df(sparql)
df

Unnamed: 0,viking1,viking2
0,gorm:Bodil,gorm:Leif


Of course for Vikings one may not even need to ask, but who wants to fight?

In [7]:
sparql = """
SELECT ?viking ?hobby
  WHERE {
      ?viking foaf:topic_interest ?hobby .
      gorm:Fighting skos:narrower ?hobby
  }
  ORDER BY ASC(?viking)
  """

df = kg.query_as_df(sparql)
df

Huh. Nobody wants to fight?!? That doesn't seem especially Viking-like!
Nor do these query results seem to fit an intuitive sense of our graph data.
Let's use *inference* on the graph to help fix that.

## Inference based on `owlrl`

Now we can call the `infer_owlrl_closure()` method to add RDF statements to the graph based on OWL inference:

In [8]:
kg.infer_owlrl_closure()

In [9]:
text = kg.save_rdf_text()
print(text)

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix gorm: <http://example.org/sagas#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

gorm:childOf rdfs:domain gorm:Viking ;
    rdfs:range gorm:Viking ;
    owl:inverseOf gorm:ancestorOf ;
    owl:sameAs gorm:childOf .

gorm:spouseOf a owl:SymmetricProperty ;
    rdfs:domain gorm:Viking ;
    rdfs:range gorm:Viking ;
    owl:sameAs gorm:spouseOf .

rdf:HTML a rdfs:Datatype ;
    owl:sameAs rdf:HTML .

rdf:LangString a rdfs:Datatype ;
    owl:sameAs rdf:LangString .

rdf:PlainLiteral a rdfs:Datatype ;
    owl:sameAs rdf:PlainLiteral .

rdf:XMLLiteral a rdfs:Datatype ;
    owl:sameAs rdf:XMLLiteral .

rdf:type owl:sameAs rdf:type .

rdfs:Literal a rdfs:Datatype ;
    owl:sameAs rdfs:Literal .

rdfs:comment a owl:Anno

How much has the size of our KG increased?

In [10]:
measure = kglab.Measure()
measure.measure_graph(kg)

print("edges", measure.get_edge_count())
print("nodes", measure.get_node_count())

edges 156
nodes 74


In other words, the graph increased by 59 nodes and 131 edges.
How about the query results for ancestors and spouses, respectively?

In [11]:
sparql = """
SELECT ?elder_viking ?viking
  WHERE {
      ?elder_viking gorm:ancestorOf ?viking
  }
  ORDER BY ASC(?viking)
  """

df = kg.query_as_df(sparql)
df

Unnamed: 0,elder_viking,viking
0,gorm:Leif,gorm:Astrid
1,gorm:Bodil,gorm:Astrid
2,gorm:Gorm,gorm:Bjorn
3,gorm:Bjorn,gorm:Leif


In [12]:
sparql = """
SELECT ?viking1 ?viking2
  WHERE {
      ?viking1 gorm:spouseOf ?viking2
  }
  """

df = kg.query_as_df(sparql)
df

Unnamed: 0,viking1,viking2
0,gorm:Bodil,gorm:Leif
1,gorm:Leif,gorm:Bodil


Both the inverse `gorm:ancestorOf` relations (derived from `gorm:childOf` via `owl:inverseOf`) and the symmetric `gorm:spouseOf` relations have been inferred and added as RDF statements through the OWL-RL closure.
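
To make the two rules behind these results concrete, here is a minimal plain-Python sketch that applies an `owl:inverseOf` rule and an `owl:SymmetricProperty` rule to the `gorm:childOf` and `gorm:spouseOf` statements from the data. The function `infer_inverse_and_symmetric` is a hypothetical helper for illustration only; it is not how `owlrl` or `kglab` implement the closure, and it covers just these two rules.

```python
def infer_inverse_and_symmetric(triples, inverse_of, symmetric):
    """Return the input triples plus those implied by the two rules.

    inverse_of: dict mapping a property to its declared inverse
    symmetric:  set of properties declared symmetric
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            # (s p o) and (p owl:inverseOf q) imply (o q s)
            inferred.add((o, inverse_of[p], s))
        if p in symmetric:
            # (s p o) and (p a owl:SymmetricProperty) imply (o p s)
            inferred.add((o, p, s))
    return inferred

triples = {
    ("gorm:Astrid", "gorm:childOf", "gorm:Bodil"),
    ("gorm:Astrid", "gorm:childOf", "gorm:Leif"),
    ("gorm:Bjorn", "gorm:childOf", "gorm:Gorm"),
    ("gorm:Leif", "gorm:childOf", "gorm:Bjorn"),
    ("gorm:Bodil", "gorm:spouseOf", "gorm:Leif"),
}

closure = infer_inverse_and_symmetric(
    triples,
    inverse_of={"gorm:childOf": "gorm:ancestorOf"},
    symmetric={"gorm:spouseOf"},
)

ancestors = sorted((s, o) for s, p, o in closure if p == "gorm:ancestorOf")
spouses = sorted((s, o) for s, p, o in closure if p == "gorm:spouseOf")

print(ancestors)
print(spouses)
```

A single pass is enough for this small data set. A full reasoner repeats rule application until no new statements appear and handles many more rules than these two.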

The `Measure` class also tallies the counts for subjects, predicates, and objects:

In [13]:
measure.s_gen.get_tally()

Unnamed: 0,count
http://example.org/sagas#Leif,5
http://www.w3.org/2002/07/owl#Nothing,5
http://example.org/sagas#Astrid,5
http://example.org/sagas#Bjorn,5
http://example.org/sagas#childOf,4
http://www.w3.org/2002/07/owl#Thing,4
http://example.org/sagas#Bodil,4
http://example.org/sagas#spouseOf,4
http://example.org/sagas#Gorm,4
http://example.org/sagas#Berserkr,3


In [14]:
measure.p_gen.get_tally()

Unnamed: 0,count
http://www.w3.org/2002/07/owl#sameAs,74
http://www.w3.org/1999/02/22-rdf-syntax-ns#type,57
http://example.org/sagas#ancestorOf,4
http://example.org/sagas#childOf,4
http://www.w3.org/2000/01/rdf-schema#subClassOf,3
http://xmlns.com/foaf/0.1/topic_interest,3
http://example.org/sagas#spouseOf,2
http://www.w3.org/2002/07/owl#equivalentClass,2
http://www.w3.org/2000/01/rdf-schema#domain,2
http://www.w3.org/2004/02/skos/core#broader,2


In [15]:
measure.o_gen.get_tally()

Unnamed: 0,count
http://www.w3.org/2000/01/rdf-schema#Datatype,37
http://www.w3.org/2002/07/owl#AnnotationProperty,10
http://example.org/sagas#Viking,10
http://xmlns.com/foaf/0.1/Thing,4
http://www.w3.org/2002/07/owl#Thing,4
http://example.org/sagas#Leif,4
http://example.org/sagas#Fighting,4
http://example.org/sagas#Astrid,3
http://www.w3.org/2002/07/owl#Nothing,3
http://www.w3.org/2002/07/owl#Class,3
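
The idea behind these tallies can be sketched with `collections.Counter` from the standard library. This is only an illustration over a few toy triples; the variable names are hypothetical and it makes no claim about how `kglab.Measure` computes its tallies internally.

```python
from collections import Counter

# Toy triples in (subject, predicate, object) form.
triples = [
    ("gorm:Astrid", "rdf:type", "gorm:Viking"),
    ("gorm:Astrid", "gorm:childOf", "gorm:Bodil"),
    ("gorm:Astrid", "gorm:childOf", "gorm:Leif"),
    ("gorm:Bodil", "rdf:type", "gorm:Viking"),
    ("gorm:Bodil", "gorm:spouseOf", "gorm:Leif"),
]

# One counter per position in the triple.
subject_tally = Counter(s for s, _, _ in triples)
predicate_tally = Counter(p for _, p, _ in triples)
object_tally = Counter(o for _, _, o in triples)

# most_common() lists terms from the highest count to the lowest.
for term, count in subject_tally.most_common():
    print(term, count)
```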


## Inference based on `skosify`

Next, let's run [SKOS](https://www.w3.org/TR/skos-primer/) inference to expand the `skos:broader` and `skos:narrower` relations about Vikings' hobby interests:

In [16]:
kg.infer_skos_related()
kg.infer_skos_hierarchical(narrower=True)
kg.infer_skos_transitive(narrower=True)

measure = kglab.Measure()
measure.measure_graph(kg)

print("edges", measure.get_edge_count())
print("nodes", measure.get_node_count())

edges 158
nodes 74


This added 2 edges to the graph.
Let's run the same query again to see who wants to fight now:

In [17]:
sparql = """
SELECT ?viking ?hobby
  WHERE {
      ?viking foaf:topic_interest ?hobby .
      gorm:Fighting skos:narrower ?hobby
  }
  ORDER BY ASC(?viking)
  """

df = kg.query_as_df(sparql)
df

Unnamed: 0,viking,hobby
0,gorm:Bjorn,gorm:Pilaging
1,gorm:Gorm,gorm:Berserkr


These are the two relations added.
More fighters – now that seems much more Viking-like!
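
To see why the query only matches once `skos:narrower` statements exist, here is a minimal plain-Python sketch. It assumes the simplest form of hierarchical inference, where every `skos:broader` statement implies the inverse `skos:narrower` statement. The helpers `infer_narrower` and `who_wants_to_fight` are hypothetical stand-ins for the inference call and the SPARQL query, not part of `kglab` or `skosify`.

```python
def infer_narrower(triples):
    """Add (o skos:narrower s) for every (s skos:broader o)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "skos:broader":
            inferred.add((o, "skos:narrower", s))
    return inferred

def who_wants_to_fight(triples):
    """Mimic the query: interests that are narrower than gorm:Fighting."""
    narrower = {
        o for s, p, o in triples
        if s == "gorm:Fighting" and p == "skos:narrower"
    }
    return sorted(
        (s, o) for s, p, o in triples
        if p == "foaf:topic_interest" and o in narrower
    )

triples = {
    ("gorm:Berserkr", "skos:broader", "gorm:Fighting"),
    ("gorm:Pilaging", "skos:broader", "gorm:Fighting"),
    ("gorm:Astrid", "foaf:topic_interest", "gorm:Fighting"),
    ("gorm:Bjorn", "foaf:topic_interest", "gorm:Pilaging"),
    ("gorm:Gorm", "foaf:topic_interest", "gorm:Berserkr"),
}

before = who_wants_to_fight(triples)
after = who_wants_to_fight(infer_narrower(triples))

print(before)
print(after)
```

Astrid does not appear in the results even after inference: her interest is `gorm:Fighting` itself, which is not narrower than `gorm:Fighting`.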

---

## Exercises

**Exercise 1:**

Starting from an initial load `kg.load_rdf("dat/gorm.ttl")` of this example Viking graph, show how to combine use of OWL and SKOS inference plus SPARQL queries to enumerate the RDF nodes in the KG which represent children of Vikings who enjoy some form of fighting.

**Exercise 2:**

Use the `Measure` class to:

  1. Tally occurrences for each RDF node in the KG that's used as a *subject* or *object*
  1. Calculate a probability distribution for nodes based on this occurrence data
  1. Render this distribution as a histogram