# Graph databases & SPARQL

## Agenda

- Storing and retrieving triples
- Virtuoso
- GraphDB

*Beware*: commands may contain small typos. You must fix them to properly complete the course!

----

Prerequisites:

- JSON, YAML, xmlschema
- HTTP, OpenAPI 3
- SQL and database basics

---

## Graphs (again)

### RDF databases

An RDF graph is an (unordered) set of triples.

Each triple consists of a `subject`, `predicate`, `object`.
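
Since a graph is a set, a plain Python `set` of 3-tuples is enough to model one. The sketch below is a toy illustration, not how rdflib or a database stores data; the `ex:` identifiers are made up and written as plain strings instead of full IRIs.

```python
# Toy model: an RDF graph as a Python set of (subject, predicate, object)
# tuples. Plain strings stand in for IRIs; the ex: names are made up.
toy_graph = {
    ("ex:Italy", "rdf:type", "ex:Country"),
    ("ex:Italy", "ex:capital", "ex:Rome"),
}

# Adding a triple that is already there changes nothing:
# a set holds at most one copy of each element.
toy_graph.add(("ex:Italy", "rdf:type", "ex:Country"))

# Order is irrelevant: the same triples listed differently form the same graph.
same_graph = {
    ("ex:Italy", "ex:capital", "ex:Rome"),
    ("ex:Italy", "rdf:type", "ex:Country"),
}

print(len(toy_graph))           # 2
print(toy_graph == same_graph)  # True
```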

Graph databases such as [Virtuoso (open source)](https://virtuoso.openlinksw.com/),
[GraphDB (proprietary)](), and
[Amazon Neptune (proprietary SaaS)]()
store triples in graphs.

They can be queried using the [SPARQL]() language.

----

A SPARQL `SELECT` query retrieves all the variable bindings
matching one or more triple patterns ("sentences").

In [None]:
SELECT * WHERE {
  ?subject ?predicate ?object .
  # ... more sentences ...
}
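
Conceptually, the engine looks for every assignment of values to the `?variables` that makes all the patterns in `WHERE` match triples of the graph at the same time. The following toy evaluator sketches that idea in plain Python; `match_pattern` and `select` are hypothetical helpers, terms starting with `?` are treated as variables, and real engines rely on indexes and query planning rather than this brute-force scan.

```python
# Toy evaluation of a SPARQL-like basic graph pattern over a set of triples.
# Terms starting with "?" are variables; any other term must match exactly.
# Illustrative only: real engines use indexes and query planning.


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable that is already bound must keep the same value.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(graph, patterns):
    """Return every binding that satisfies all the patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


toy_graph = {
    ("ex:Italy", "rdf:type", "ex:Country"),
    ("ex:France", "rdf:type", "ex:Country"),
    ("ex:Italy", "ex:capital", "ex:Rome"),
}

# Roughly: SELECT * WHERE { ?c rdf:type ex:Country . ?c ex:capital ?city . }
rows = select(toy_graph, [
    ("?c", "rdf:type", "ex:Country"),
    ("?c", "ex:capital", "?city"),
])
print(rows)  # [{'?c': 'ex:Italy', '?city': 'ex:Rome'}]
```

Each pattern narrows down the bindings produced by the previous ones, so a shared variable such as `?c` acts like a join between the two patterns.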

This workshop provides a non-exhaustive introduction to SPARQL.

----

### Non-RDF databases

Other databases, such as [Neo4j (open source)](),
use a different approach to represent graphs,
for example [Labeled Property Graphs](https://en.wikipedia.org/wiki/Labeled_property_graph).
Neo4j can be queried using the [Cypher](https://neo4j.com/developer/cypher-query-language/) language.

Neo4j supports RDF datasets via the Neosemantics plugin.

----

## rdflib backends

We will simulate a graph database using
[rdflib](https://rdflib.readthedocs.io/en/stable/index.html),
which supports SPARQL queries.

rdflib supports multiple backends to parse and store triples.

oxrdflib is a performant one
based on [Oxigraph](https://github.com/oxigraph/oxigraph).

In [None]:
%pip install oxrdflib

Let's test it.

In [None]:
from rdflib import Graph

g = Graph()

# Use the default backend.
%time g.parse("countries-skos-ap-act.ttl", format="text/turtle")
print("The graph contains", len(g), "triples.")

In [None]:
g = Graph(store="Oxigraph")

# Use the Oxigraph store and the ox-turtle parser.
%time g.parse("countries-skos-ap-act.ttl", format="ox-turtle")
print("The graph contains", len(g), "triples.")

See also:

- <https://rdflib.readthedocs.io/en/stable/persistence.html>

---

## My first SPARQL query

Let's create a graph
and load the [European vocabulary for countries](countries.ttl) into it.

See also:

- [EU Authority Tables](https://op.europa.eu/en/web/eu-vocabularies/authority-tables)

In [None]:
from rdflib import Graph

# Let's create a graph.
g = Graph(store="Oxigraph")

# And load into it the European
# vocabulary for countries.
g.parse("countries-skos-ap-act.ttl", format="ox-turtle")

Now let's run our first SPARQL query!

In [None]:
# List the first 3 triples.
q = """
SELECT * WHERE {
  ?subject ?predicate ?object .
}
LIMIT 3
"""
result = g.query(q)

# Print it!
for r in result:
    print(r.asdict())

Now print the result using
variable names.

In [None]:
for r in result:
    print(r.subject, r.predicate, r.object, sep="\t")

Exercise:

- Replace `?subject` with `?foo`:
  what happens?

In [None]:
q = """
WRITEME!
"""
result = g.query(q)

# Print it!
for r in result:
    print(r.asdict())

- Remove the `LIMIT` clause.
  How many triples are in the graph?

In [None]:
# Use this cell for the exercise.

---

### Traversing the graph

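The country vocabulary contains more than countries.
The query below lists the concepts narrower than Italy (`country:ITA`),
together with their English labels.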

In [None]:
to_curie = g.namespace_manager.curie

q = """
PREFIX country: <http://publications.europa.eu/resource/authority/country/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT *
WHERE {
  country:ITA skos:narrower ?narrower .
  ?narrower skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
"""
result = g.query(q)

narrower = {to_curie(r.narrower): str(r.label) for r in result}

print(*narrower.items(), sep="\n")

Exercise:

- Run the above query replacing `skos:narrower` with `skos:narrower*`:
  what happens?
- Run the above query using `country:FRA` and see what happens;
  then replace `skos:narrower` with `skos:narrower/skos:narrower`:
  do you get the same number of results?

<b>
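The `*` operator in a property path such as `skos:narrower*` matches zero or more steps:
it finds all the nodes reachable from the starting node, including the starting node itself
(use `+` to require at least one step).
Property paths were introduced in SPARQL 1.1, so databases that only implement SPARQL 1.0 do not support them.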
</b>
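
Conceptually, `start p* ?x` behaves like a breadth-first search that collects every node reachable through `p` edges, beginning with the start node itself. A toy sketch in plain Python; the edges and the `reachable` helper are hypothetical, and the code illustrates the semantics only, not any engine's implementation.

```python
from collections import deque

# Hypothetical (node, narrower node) edges, standing in for skos:narrower.
narrower_edges = {
    ("ex:root", "ex:child"),
    ("ex:child", "ex:grandchild"),
    ("ex:other", "ex:unrelated"),
}


def reachable(start, edges):
    """Nodes matching `start p* ?x`: reachable in zero or more steps."""
    seen = {start}  # zero steps: the start node matches itself
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for source, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(reachable("ex:root", narrower_edges)))
# ['ex:child', 'ex:grandchild', 'ex:root']
```

For `p+`, the search would start from the direct neighbours of the start node instead, so the start node appears only if a cycle leads back to it.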

#### Creating a graph

A SPARQL `CONSTRUCT` query can create a new graph from the triples matched in an existing one.

In [None]:
q = """
PREFIX country: <http://publications.europa.eu/resource/authority/country/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

CONSTRUCT {
  ?narrower
    skos:prefLabel ?label ;
    skos:broader ?broader .
}
WHERE {
  ?narrower
    # All resources transitively related to country:FRA...
    skos:broader* country:FRA ;

    # ... with their labels ...
    skos:prefLabel ?label ;

    # ... and their broader relations.
    skos:broader ?broader .
  
  FILTER (lang(?label) = "en")
}
"""
result = g.query(q)
list(result.graph)

Let's visualize the graph.

In [None]:
from tools import plot_graph
from rdflib import SKOS

plot_graph(result.graph, label=SKOS.prefLabel)