# Tutorial on Using RDF Tools for Model-Based Methods

## Introduction

The [World Wide Web Consortium (W3C)](https://www.w3.org/), among other organizations, drives the development of several standards and tools for capturing [linked data](https://www.w3.org/DesignIssues/LinkedData) on the web.
In this tutorial, we will explore how these tools can be used for [modelling](https://en.wikipedia.org/wiki/Model-driven_engineering) domains, specifically to represent geometric concepts.
We also discuss how to leverage the power of these tools for creating _composable_ models.
A more detailed discussion on the design principles for developing composable models can be found [here](https://github.com/comp-rob2b/modelling-tutorial?tab=readme-ov-file#discussion-about-composable-models).

## Technologies

### Standards

- [Resource Description Framework (RDF)](https://www.w3.org/TR/rdf11-concepts/): framework for representing information on the Web
- [Turtle](https://www.w3.org/TR/turtle/): Terse RDF Triple Language, a textual syntax for RDF (other syntaxes, e.g. [XML](https://www.w3.org/TR/rdf12-xml/), are also available)
- [JSON-LD](https://www.w3.org/TR/json-ld11/): JSON-based format to serialize linked data
- [SPARQL](https://www.w3.org/TR/sparql11-overview/): query language for RDF graphs
- [Shapes Constraint Language (SHACL)](https://www.w3.org/TR/shacl/): language for specifying structural constraints on RDF graphs

### Python libraries

In [1]:
import rdflib              # https://rdflib.readthedocs.io/en/stable/
import pyshacl             # https://github.com/RDFLib/pySHACL
from pyld import jsonld    # https://github.com/digitalbazaar/pyld


## Model specification

### RDF Graphs

![](https://www.w3.org/TR/rdf11-concepts/rdf-graph.svg)

Both JSON-LD and Turtle are concrete syntaxes for RDF, in which a graph is defined as a set of triples `(subject, predicate, object)`.
The `object` can be a literal or a reference to another node in the form of an [Internationalized Resource Identifier (IRI)](https://datatracker.ietf.org/doc/html/rfc3987) (or a blank node, but we ignore those in this context).

The following code snippet shows how the same graph (describing 2 points and their relative position) can be represented using JSON-LD and Turtle.
Notable syntactic elements in the example:
- The [`@context`](https://www.w3.org/TR/json-ld11/#the-context) in JSON-LD, in essence, defines the vocabulary for specifying the graph, including the predicates/relations, types, and prefixes
- `@prefix` in Turtle and prefix definitions in the JSON-LD `@context` (e.g. `geom`) allow the use of [compact IRIs](https://www.w3.org/TR/json-ld11/#compact-iris) in the form of `<prefix>:<suffix>`
- `"@type": "@id"` in a term definition denotes that the values of that term are IRIs, i.e. references to other nodes in the graph
- Relative IRIs, e.g. the `@id` value `box-origin`, are resolved against the IRI specified by `@base`
- [XML Schema Datatypes](https://www.w3.org/TR/swbp-xsch-datatypes/) (`xsd`) allow typed literals such as doubles and strings

In [2]:
jsonld_graph_str = """
{
    "@context": {
        "@base": "https://my-url.com/model/tutorial/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",

        "geom": "https://my-url.com/metamodel/geometry#",
        "of-point": { "@id": "geom:of-point", "@type": "@id" },
        "wrt-point": { "@id": "geom:with-respect-to-point", "@type": "@id" },
        "of-position": { "@id": "geom:of-position", "@type": "@id" },
        "length": { "@id": "geom:length", "@type": "xsd:double" }
    },
    "@graph": [
        { "@id": "box-origin", "@type": "geom:Point" },
        { "@id": "table-origin", "@type": "geom:Point" },
        {
            "@id": "position-box-table", "@type": "geom:Position",
            "of-point": "box-origin", "wrt-point": "table-origin"
        },
        {
            "@id": "box-distance", "@type": "geom:PositionLength",
            "of-position": "position-box-table", "length": 10
        }
    ]
}
"""

position_graph = rdflib.Graph()
position_graph.parse(data=jsonld_graph_str, format="json-ld")
print(position_graph.serialize(format="turtle"))

@prefix geom: <https://my-url.com/metamodel/geometry#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://my-url.com/model/tutorial/box-distance> a geom:PositionLength ;
    geom:length 1e+01 ;
    geom:of-position <https://my-url.com/model/tutorial/position-box-table> .

<https://my-url.com/model/tutorial/box-origin> a geom:Point .

<https://my-url.com/model/tutorial/position-box-table> a geom:Position ;
    geom:of-point <https://my-url.com/model/tutorial/box-origin> ;
    geom:with-respect-to-point <https://my-url.com/model/tutorial/table-origin> .

<https://my-url.com/model/tutorial/table-origin> a geom:Point .
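
The absolute IRIs in the Turtle output follow from two rules: compact IRIs are expanded via their prefix, and relative IRIs are resolved against `@base`. The sketch below mimics just these two rules with the Python standard library. The helper `expand` is hypothetical (it is not part of `rdflib` or `pyld`) and ignores the rest of the JSON-LD expansion algorithm.

```python
from urllib.parse import urljoin

BASE = "https://my-url.com/model/tutorial/"
PREFIXES = {
    "geom": "https://my-url.com/metamodel/geometry#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(value: str) -> str:
    """Expand a compact IRI via PREFIXES, otherwise resolve it against BASE."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + suffix
    # Absolute IRIs pass through unchanged, relative ones are resolved
    return urljoin(BASE, value)


print(expand("geom:Point"))
print(expand("box-origin"))
```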




### Metamodeling

Metamodeling is a model-driven method, where a model of models (i.e. metamodel) is identified for a domain.
In the example, the geometry metamodel includes the `Point` and `Position` concepts and the `of-point` and `wrt-point` relations.
The graph then specifies the model `position-box-table`, which is a `Position` of `Point` `box-origin` (via relation `of-point`) with respect to `Point` `table-origin` (relation `wrt-point`).

In JSON-LD, the metamodel terms are typically introduced via the `@context`, whereas the model itself is in the `@graph`.
Turtle, in contrast, has no counterpart to these term definitions: relations between nodes, e.g. `geom:of-position`, are only referenced by their (compact) IRIs.

### What makes it composable?

Example: different coordinate systems:

In [3]:
jsonld_coordinate_extension = """
{
    "@context": {
        "@base": "https://my-url.com/model/tutorial/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",

        "geom": "https://my-url.com/metamodel/geometry#",
        "origin": { "@id": "geom:origin", "@type": "@id" },
        "of-position": { "@id": "geom:of-position", "@type": "@id" },
        "as-seen-by": { "@id": "geom:as-seen-by", "@type": "@id" },
        "x": { "@id": "geom:x", "@type": "xsd:double" },
        "y": { "@id": "geom:y", "@type": "xsd:double" },
        "z": { "@id": "geom:z", "@type": "xsd:double" }
    },
    "@graph": [
        { "@id": "frame-table", "@type": "geom:Frame", "origin": "table-origin" },
        {
            "@id": "position-coord-box-table",
            "@type": [ "geom:PositionReference", "geom:PositionCoordinate", "geom:VectorXYZ" ],
            "of-position": "position-box-table",
            "as-seen-by": "frame-table",
            "x": -0.000648,
            "y": -0.000166,
            "z":  0.084487
        }
    ]
}
"""
position_graph.parse(data=jsonld_coordinate_extension, format="json-ld")

<Graph identifier=N244309afa47946d6b313dbd6fcda5dbb (<class 'rdflib.graph.Graph'>)>

The second JSON-LD graph extends the first graph by choosing the Cartesian coordinate system and a specific data format to concretely represent the position relation.
This way of modelling the relation allows enriching the models of `position-box-table` without having to modify the original graph.
For example, other coordinate systems, e.g. cylindrical, or data formats, e.g. an array of 3 numbers instead of `x,y,z`, can be introduced simply by excluding the above graph and loading another one.
This follows the [_multi-conformance_ and _open-world assumption_ principles](https://github.com/comp-rob2b/modelling-tutorial?tab=readme-ov-file#discussion-about-composable-models) of designing composable models.

Notes:
- This approach is comparable to interfaces or plugins, as opposed to inheritance patterns
- Composability results from following these design principles, not from the format: **JSON-LD models can be uncomposable**!

## Querying the graph

### Searching through the paths

The [SPARQL](https://www.w3.org/TR/sparql11-overview/) standard specifies the query language for retrieving information from and manipulating RDF graphs.
A SPARQL query enforces some structural constraints on the graph, in the sense that invalid graphs would not result in a match.

The following example shows three different queries that construct a graph linking `position-box-table` to its frame of reference `frame-table`.
The query patterns follow the triple structure of RDF graphs, and names prefixed with `?`, e.g. `?frame`, denote variables.

In [4]:
import json
from timeit import default_timer as timer
from pprint import pprint

# queries for PositionCoordinate first
construct_query = """
PREFIX geom: <https://my-url.com/metamodel/geometry#>

CONSTRUCT {
    ?position geom:as-seen-by ?frame .
}
WHERE {
    ?posCoord a geom:PositionCoordinate, geom:PositionReference ;
        geom:of-position ?position ;
        geom:as-seen-by ?frame .
}
"""
start = timer()
q_res = position_graph.query(construct_query)
end = timer()
res_json = json.loads(q_res.serialize(format='json-ld'))
print(f"\nsearch PositionCoordinate first\nQuery time: {end - start:.5f} seconds")
pprint(res_json)

# queries for Position first, follows the reverse path to the position coordinate, then the forward path to the frame
construct_query = """
PREFIX geom: <https://my-url.com/metamodel/geometry#>

CONSTRUCT {
    ?position geom:as-seen-by ?frame .
}
WHERE {
    ?position a geom:Position ;
        ^geom:of-position ?posCoord .
    ?posCoord a geom:PositionReference ;
        geom:as-seen-by ?frame .
}
"""
start = timer()
q_res = position_graph.query(construct_query)
end = timer()
res_json = json.loads(q_res.serialize(format='json-ld'))
print(f"\nsearch Position first\nQuery time: {end - start:.5f} seconds")
pprint(res_json)

# queries for Position first, then chains both paths to the frame in a single property path
construct_query = """
PREFIX geom: <https://my-url.com/metamodel/geometry#>

CONSTRUCT {
    ?position geom:as-seen-by ?frame .
}
WHERE {
    ?position a geom:Position ;
        ^geom:of-position / geom:as-seen-by ?frame .
}
"""
start = timer()
q_res = position_graph.query(construct_query)
end = timer()
res_json = json.loads(q_res.serialize(format='json-ld'))
print(f"\nsearch Position first + chaining\nQuery time: {end - start:.5f} seconds")
pprint(res_json)


search PositionCoordinate first
Query time: 0.08451 seconds
[{'@id': 'https://my-url.com/model/tutorial/position-box-table',
  'https://my-url.com/metamodel/geometry#as-seen-by': [{'@id': 'https://my-url.com/model/tutorial/frame-table'}]}]

search Position first
Query time: 0.00375 seconds
[{'@id': 'https://my-url.com/model/tutorial/position-box-table',
  'https://my-url.com/metamodel/geometry#as-seen-by': [{'@id': 'https://my-url.com/model/tutorial/frame-table'}]}]

search Position first + chaining
Query time: 0.00290 seconds
[{'@id': 'https://my-url.com/model/tutorial/position-box-table',
  'https://my-url.com/metamodel/geometry#as-seen-by': [{'@id': 'https://my-url.com/model/tutorial/frame-table'}]}]


### Some advanced matching
`UNION`, `OPTIONAL`, `FILTER`

## Framing

## Structural constraints with SHACL

[The Shapes Constraint Language (SHACL)](https://www.w3.org/TR/shacl/) allows specifying rules to validate an RDF graph.
As such, SHACL provides a means to explicitly specify structural constraints on a graph.
SHACL models are themselves RDF graphs and can therefore be written in JSON-LD, Turtle, or any other supported format.

The two main types of shapes in SHACL are `sh:NodeShape`, for rules on nodes, and `sh:PropertyShape`, for rules on edges or properties of nodes.
The following example shows two different ways to constrain a path from a node.
The first is to specify the path directly inside the `sh:NodeShape`, here for the `geom:of-position` path.
The second is to define a standalone `sh:PropertyShape`, here for `geom:as-seen-by`, and then to include it in the `sh:NodeShape` for `geom:PositionCoordinate`.
The second way allows reusing the same `sh:PropertyShape` in different node shapes.

Notes:
- Cardinality can be specified with `sh:minCount` and `sh:maxCount`
- Compared to SPARQL, SHACL Core doesn't allow variables, which can limit its ability to specify some structural constraints, e.g. loops


In [5]:
shacl_str = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix geom: <https://my-url.com/metamodel/geometry#> .

geom:PositionReferenceShape
    a sh:NodeShape ;
    sh:targetClass geom:PositionReference ;
    sh:property [
        sh:path geom:of-position ;
        sh:class geom:Position ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .

geom:AsSeenByShape
    a sh:PropertyShape ;
    sh:path geom:as-seen-by ;
    sh:class geom:Frame ;
    sh:minCount 1 ;
    sh:maxCount 1 .

geom:PositionCoordinateShape
    a sh:NodeShape ;
    sh:targetClass geom:PositionCoordinate ;
    sh:property geom:AsSeenByShape .
"""

shacl_g = rdflib.Graph()
shacl_g.parse(data=shacl_str, format="turtle")
conforms, _, report_text = pyshacl.validate(
    position_graph,
    shacl_graph=shacl_g,
    data_graph_format="json-ld",
    shacl_graph_format="ttl",
    inference="rdfs",
)

if conforms:
    print("validation OK!")
else:
    print("Invalid graph:\n\n" + report_text)


validation OK!


Now we try to validate an invalid graph, where
- `geom:of-position` is pointing to a `geom:Frame` instead of a `geom:Position`, and
- a `geom:as-seen-by` relation to a `geom:Frame` is missing

These errors are described in the printed validation report!

In [6]:
jsonld_coordinate_invalid = """
{
    "@context": {
        "@base": "https://my-url.com/model/tutorial/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",

        "geom": "https://my-url.com/metamodel/geometry#",
        "origin": { "@id": "geom:origin", "@type": "@id" },
        "of-position": { "@id": "geom:of-position", "@type": "@id" },
        "as-seen-by": { "@id": "geom:as-seen-by", "@type": "@id" },
        "x": { "@id": "geom:x", "@type": "xsd:double" },
        "y": { "@id": "geom:y", "@type": "xsd:double" },
        "z": { "@id": "geom:z", "@type": "xsd:double" }
    },
    "@graph": [
        { "@id": "frame-table", "@type": "geom:Frame", "origin": "table-origin" },
        {
            "@id": "position-coord-box-table",
            "@type": [ "geom:PositionReference", "geom:PositionCoordinate", "geom:VectorXYZ" ],
            "of-position": "frame-table"
        }
    ]
}
"""

invalid_graph = rdflib.Graph()
invalid_graph.parse(data=jsonld_graph_str, format="json-ld")
invalid_graph.parse(data=jsonld_coordinate_invalid, format="json-ld")

conforms, _, report_text = pyshacl.validate(
    invalid_graph,
    shacl_graph=shacl_g,
    data_graph_format="json-ld",
    shacl_graph_format="ttl",
    inference="rdfs",
)

if conforms:
    print("validation OK!")
else:
    print("Invalid graph:\n\n" + report_text)



Invalid graph:

Validation Report
Conforms: False
Results (2):
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
	Severity: sh:Violation
	Source Shape: geom:AsSeenByShape
	Focus Node: <https://my-url.com/model/tutorial/position-coord-box-table>
	Result Path: geom:as-seen-by
	Message: Less than 1 values on <https://my-url.com/model/tutorial/position-coord-box-table>->geom:as-seen-by
Constraint Violation in ClassConstraintComponent (http://www.w3.org/ns/shacl#ClassConstraintComponent):
	Severity: sh:Violation
	Source Shape: [ sh:class geom:Position ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:path geom:of-position ]
	Focus Node: <https://my-url.com/model/tutorial/position-coord-box-table>
	Value Node: <https://my-url.com/model/tutorial/frame-table>
	Result Path: geom:of-position
	Message: Value does not have class geom:Position

