# Lab 1.2: Building RDF Schemas with ```rdflib```

**Learning Outcomes:**

*   Define RDF classes and properties programmatically using ```rdflib```.
*   Construct class hierarchies using ```rdfs:subClassOf```.
*   Integrate terms from existing vocabularies like [Schema.org](https://schema.org/).
*   Apply rules to properties using ```rdfs:domain``` and ```rdfs:range```.
*   Serialize a complete RDF schema to ```TURTLE``` (.ttl) format and open it in Protégé.

## RDF Concepts in Action
In Lab 1, we learned that RDF is a data model for representing information as linked data. The fundamental unit is the triple:

<center>
<code>&lt;subject&gt; &lt;predicate&gt; &lt;object&gt;</code> <br>
OR <br>
<code>&lt;head&gt; &lt;relation&gt; &lt;tail&gt;</code> <br>
OR <br>
<code>&lt;resource&gt; &lt;property&gt; &lt;resource&gt;</code> <br>
</center>

where each position commonly holds one of the following types:

*   **Subject:** URI, Blank Node
*   **Predicate:** URI
*   **Object:** URI, Blank Node, Literal


Now let's realize the statements from Lab 1 with ```rdflib```.




---

### **Step 1:** Installing ```rdflib``` and importing useful classes

In [1]:
!pip3 install rdflib -U

from rdflib import Graph, URIRef, Literal, BNode, Namespace
from rdflib.namespace import RDF, RDFS, XSD, SDO

Collecting rdflib
  Using cached rdflib-7.1.4-py3-none-any.whl.metadata (11 kB)
Collecting isodate<1.0.0,>=0.7.2 (from rdflib)
  Using cached isodate-0.7.2-py3-none-any.whl.metadata (11 kB)
Using cached rdflib-7.1.4-py3-none-any.whl (565 kB)
Using cached isodate-0.7.2-py3-none-any.whl (22 kB)
Installing collected packages: isodate, rdflib
Successfully installed isodate-0.7.2 rdflib-7.1.4





Imported names:

- ***Graph:*** The main object we will be adding triples to.

- ***URIRef:*** A **Uniform Resource Identifier** reference. URIs serve as unique, global identifiers for subjects, predicates, and objects. A URI typically consists of a **namespace** followed by the **resource name**.

- ***Literal:*** For actual values used as objects, such as a name or a number.

- ***BNode:*** For blank nodes (resources **without a URI**).

- ***Namespace:*** A class used to define new namespaces.

- ***RDF, RDFS, XSD, SDO:*** Predefined namespaces that are generally useful when building RDF schemas.


---


### **Step 2:** Initializing ```Graph()``` and adding example statements

In [2]:
# Create a Graph to store our triples
g = Graph()

# Define the resources
frodo = URIRef("http://example.org/lotr/Frodo")
gandalf = URIRef("http://example.org/lotr/Gandalf")
sauron = URIRef("http://example.org/lotr/Sauron")
oneRing = URIRef("http://example.org/lotr/OneRing")

# Define the properties
hasFriend = URIRef("http://example.org/lotr/hasFriend")
isBearerOf = URIRef("http://example.org/lotr/isBearerOf")
isEnemyOf = URIRef("http://example.org/lotr/isEnemyOf")

# Add statements
g.add((frodo, hasFriend, gandalf)) # Frodo is a friend of Gandalf.
g.add((frodo, isBearerOf, oneRing)) # Frodo is the bearer of the One Ring.
g.add((gandalf, isEnemyOf, sauron)) # Gandalf is an enemy of Sauron.

# Serialize the populated Graph in TURTLE format
print(g.serialize(format="ttl"))

@prefix ns1: <http://example.org/lotr/> .

ns1:Frodo ns1:hasFriend ns1:Gandalf ;
    ns1:isBearerOf ns1:OneRing .

ns1:Gandalf ns1:isEnemyOf ns1:Sauron .




Observe that ```rdflib``` automatically assigns a generic prefix (ns1) to the namespace used in our Graph ([http://example.org/lotr/](http://example.org/lotr/)). When serialized, the Graph groups its triples by subject and writes them in the requested format, ```TURTLE``` in this case.


---


### **Step 3:** Using prefixes

Writing out full URIs becomes tedious as the number of triples grows. Instead, we can define a namespace once and bind it to a prefix:

In [4]:
# Reset the previous Graph
g = Graph()

# Define namespace
LOTR = Namespace("http://example.org/lotr/")

# Bind namespace
g.bind("lotr", LOTR)

# Define the resources
frodo = LOTR.Frodo
gandalf = LOTR.Gandalf
sauron = LOTR.Sauron
oneRing = LOTR.OneRing

# Define the properties
hasFriend = LOTR.hasFriend
isBearerOf = LOTR.isBearerOf
isEnemyOf = LOTR.isEnemyOf

# Add statements
g.add((frodo, hasFriend, gandalf)) # Frodo is a friend of Gandalf.
g.add((frodo, isBearerOf, oneRing)) # Frodo is the bearer of the One Ring.
g.add((gandalf, isEnemyOf, sauron)) # Gandalf is an enemy of Sauron.

# Serialize the graph
print(g.serialize(format="ttl"))

@prefix lotr: <http://example.org/lotr/> .

lotr:Frodo lotr:hasFriend lotr:Gandalf ;
    lotr:isBearerOf lotr:OneRing .

lotr:Gandalf lotr:isEnemyOf lotr:Sauron .




The prefix we bind our namespace to appears in the serialized Graph. Prefixes keep both the code and the output organized, and using them consistently makes it easier to work with multiple Graphs.


---


### **Step 4:** Adding literals

Literals are concrete values attached to our subjects. These can be strings, dates, numbers, and more (see [XSD data types](https://www.ibm.com/docs/en/jfsm/1.1.2.1?topic=queries-xsd-data-types)). We can use the ```Literal``` class from ```rdflib``` to add values as objects to our Graph. We will use the imported ```XSD``` namespace to specify their datatypes.

In [6]:
# Defining literals
frodoAge = Literal(33, datatype=XSD.integer) # Frodo's age
oneRingDestructionDate = Literal("3019-03-25T00:00:00", datatype=XSD.dateTime) # One Ring's destruction date
gandalfTitle = Literal("The White", datatype=XSD.string) # Gandalf's title

# Define new properties for literals
hasAge = LOTR.hasAge
hasDestructionDate = LOTR.hasDestructionDate
hasTitle = LOTR.hasTitle

# Add statements with literals
g.add((frodo, hasAge, frodoAge))
g.add((oneRing, hasDestructionDate, oneRingDestructionDate))
g.add((gandalf, hasTitle, gandalfTitle))

# Serialize the graph to see the new triples with literals
print(g.serialize(format="ttl"))

@prefix lotr: <http://example.org/lotr/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

lotr:Frodo lotr:hasAge 33 ;
    lotr:hasFriend lotr:Gandalf ;
    lotr:isBearerOf lotr:OneRing .

lotr:Gandalf lotr:hasTitle "The White"^^xsd:string ;
    lotr:isEnemyOf lotr:Sauron .

lotr:OneRing lotr:hasDestructionDate "3019-03-25T00:00:00"^^xsd:dateTime .




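```rdflib``` logs a warning when a literal's lexical form cannot be converted to its declared datatype; the triple is still added to the Graph. Note also that Frodo's age is printed without an explicit datatype because ```TURTLE``` has a shorthand notation for ```xsd:integer``` values.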


---


### **[Optional] Step 5:** Blank Nodes


In [7]:
# Define a blank node
encounter = BNode() # Represents a specific encounter

# Define properties related to the encounter
hasParticipant = LOTR.hasParticipant
hasOutcome = LOTR.hasOutcome

# Add statements involving the blank node
g.add((encounter, hasParticipant, frodo)) # The encounter involved Frodo
g.add((encounter, hasParticipant, sauron)) # The encounter involved Sauron
g.add((encounter, hasOutcome, Literal("Frodo escapes", datatype=XSD.string))) # The outcome was Frodo escaping

# Serialize the graph to see the triples including the blank node
print(g.serialize(format="ttl"))

@prefix lotr: <http://example.org/lotr/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

lotr:Frodo lotr:hasAge 33 ;
    lotr:hasFriend lotr:Gandalf ;
    lotr:isBearerOf lotr:OneRing .

lotr:Gandalf lotr:hasTitle "The White"^^xsd:string ;
    lotr:isEnemyOf lotr:Sauron .

lotr:OneRing lotr:hasDestructionDate "3019-03-25T00:00:00"^^xsd:dateTime .

[] lotr:hasOutcome "Frodo escapes"^^xsd:string ;
    lotr:hasParticipant lotr:Frodo,
        lotr:Sauron .




A blank node that appears only as a subject is just an anonymous resource with a locally generated identifier, which is why it is serialized as ```[]```. Blank nodes make more sense when they bundle related data into one structure that is then linked to a named resource.

In [9]:
# Define a second blank node for another encounter
anotherEncounter = BNode() # Represents another specific encounter

# Add statements involving the second blank node
g.add((anotherEncounter, hasParticipant, gandalf)) # The encounter involved Gandalf
g.add((anotherEncounter, hasParticipant, LOTR.Balrog)) # The encounter involved a Balrog (a new resource, introduced simply by naming it)
g.add((anotherEncounter, hasOutcome, Literal("Gandalf falls", datatype=XSD.string))) # The outcome was Gandalf falling

# Define a resource for a movie and a property to link movies to encounters
theFellowshipOfTheRing = LOTR.TheFellowshipOfTheRing
hasEncounter = LOTR.hasEncounter

# Link the movie to both encounters
g.add((theFellowshipOfTheRing, hasEncounter, encounter))
g.add((theFellowshipOfTheRing, hasEncounter, anotherEncounter))


# Serialize the graph to see the new triples including the blank nodes and the movie link
print(g.serialize(format="ttl"))

@prefix lotr: <http://example.org/lotr/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

lotr:TheFellowshipOfTheRing lotr:hasEncounter [ lotr:hasOutcome "Gandalf falls"^^xsd:string ;
            lotr:hasParticipant lotr:Balrog,
                lotr:Gandalf ],
        [ lotr:hasOutcome "Frodo escapes"^^xsd:string ;
            lotr:hasParticipant lotr:Frodo,
                lotr:Sauron ] .

lotr:Frodo lotr:hasAge 33 ;
    lotr:hasFriend lotr:Gandalf ;
    lotr:isBearerOf lotr:OneRing .

lotr:OneRing lotr:hasDestructionDate "3019-03-25T00:00:00"^^xsd:dateTime .

lotr:Gandalf lotr:hasTitle "The White"^^xsd:string ;
    lotr:isEnemyOf lotr:Sauron .




## Schema-Level Graphs
The Graph we just created is at the instance level. One level of abstraction higher, we can also build schema-level graphs. The main difference is that a schema-level graph describes the rules and structure of your data (a blueprint), while an instance-level graph contains the actual data points that follow those rules.

Think of it like cooking. The schema level is the recipe: it defines the categories of things you need (e.g., flour, sugar, egg) and the rules for how they interact (e.g., "combine dry ingredients," "bake at 180°C for 30 minutes"). It is the abstract template. The instance level is the actual, physical dish you cook. It is a specific realization of the recipe, with particular ingredients (e.g., all-purpose flour, the specific brown egg you used, granulated sugar) combined as instructed. You can cook many different dishes (instances) from the same recipe (schema).


---


### **Step 1:** Set Up the Graph and Namespaces

In [10]:
g = Graph()

# Define our custom namespace
LOTR = Namespace("http://example.org/lotr/")

# Bind prefixes, best practice for readable output
g.bind("lotr", LOTR)
g.bind("rdfs", RDFS)
g.bind("xsd", XSD)
g.bind("schema", SDO)

The ```SDO``` namespace corresponds to the URI [https://schema.org/](https://schema.org/), a shared vocabulary that contains useful classes and properties. We will use it to connect our schema to widely used concepts.


---


### **Step 2:** Defining Classes and Hierarchies
We will start by defining our main concepts, i.e., classes.

In [11]:
g.add((LOTR.Character, RDF.type, RDFS.Class))
g.add((LOTR.Artifact, RDF.type, RDFS.Class))

print(g.serialize(format="ttl"))

@prefix lotr: <http://example.org/lotr/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

lotr:Artifact a rdfs:Class .

lotr:Character a rdfs:Class .




Notice that the ```RDF.type``` predicate ([http://www.w3.org/1999/02/22-rdf-syntax-ns#type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type)) is automatically printed as "a". This ```TURTLE``` shorthand improves readability, since the property is used so frequently when building schemas.

Next, we can give our schema more detail and structure by defining subclasses with the ```RDFS.subClassOf``` predicate.



In [12]:
g.add((LOTR.Hobbit, RDFS.subClassOf, LOTR.Character))
g.add((LOTR.Wizard, RDFS.subClassOf, LOTR.Character))
g.add((LOTR.DarkLord, RDFS.subClassOf, LOTR.Character))

g.add((LOTR.Character, RDFS.subClassOf, SDO.Person)) # linked to an external vocabulary

print(g.serialize(format="ttl"))

@prefix lotr: <http://example.org/lotr/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .

lotr:Artifact a rdfs:Class .

lotr:Character a rdfs:Class ;
    rdfs:subClassOf schema:Person .

lotr:DarkLord rdfs:subClassOf lotr:Character .

lotr:Hobbit rdfs:subClassOf lotr:Character .

lotr:Wizard rdfs:subClassOf lotr:Character .




Linking our schema to a public one like [Schema.org](https://schema.org/) is a highly recommended practice in Linked Data. It means any other system that understands ```schema:Person``` will have a clue about what our ```lotr:Character``` is. This keeps our data from being isolated and increases interoperability.

#### Why Define Classes and Subclasses?
*   **Creating Structure:** Defining classes and subclasses lets our model mirror the structure of the domain.
*   **Enabling Reasoning:** The hierarchy is not just for organization. An RDF reasoner can use the ```rdfs:subClassOf``` relationship to make inferences. For example, if we state that Frodo is a Hobbit, a reasoner will automatically conclude that he is also a Character. As a result, any property that applies to a Character (like ```hasAge```) also applies to a Hobbit without us needing to state it explicitly.


---


### **Step 3:** Defining Properties
Now we define the properties from our instance graph at the schema level by giving each of them an ```rdfs:domain``` and an ```rdfs:range```. We can distinguish two kinds of properties: datatype properties, whose objects are literals, and object properties, whose objects are resources identified by IRIs. In plain RDFS both kinds are declared as ```rdf:Property```; the range tells them apart.

Adding datatype properties:

In [13]:
# Define lotr:hasAge
g.add((LOTR.hasAge, RDF.type, RDF.Property))
g.add((LOTR.hasAge, RDFS.domain, LOTR.Character)) # subjects are instances of Character
g.add((LOTR.hasAge, RDFS.range, XSD.integer)) # objects are integer literals

# Define lotr:hasDestructionDate
g.add((LOTR.hasDestructionDate, RDF.type, RDF.Property))
g.add((LOTR.hasDestructionDate, RDFS.domain, LOTR.Artifact)) # subjects are instances of Artifact
g.add((LOTR.hasDestructionDate, RDFS.range, XSD.dateTime)) # objects are dateTime literals

print(g.serialize(format="ttl"))

@prefix lotr: <http://example.org/lotr/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

lotr:Artifact a rdfs:Class .

lotr:Character a rdfs:Class ;
    rdfs:subClassOf schema:Person .

lotr:DarkLord rdfs:subClassOf lotr:Character .

lotr:Hobbit rdfs:subClassOf lotr:Character .

lotr:Wizard rdfs:subClassOf lotr:Character .

lotr:hasAge a rdf:Property ;
    rdfs:domain lotr:Character ;
    rdfs:range xsd:integer .

lotr:hasDestructionDate a rdf:Property ;
    rdfs:domain lotr:Artifact ;
    rdfs:range xsd:dateTime .




Adding object properties:

In [14]:
# Define lotr:hasFriend
g.add((LOTR.hasFriend, RDF.type, RDF.Property))
g.add((LOTR.hasFriend, RDFS.domain, LOTR.Character)) # subjects are instances of Character
g.add((LOTR.hasFriend, RDFS.range, LOTR.Character)) # objects are instances of Character

# Define lotr:isBearerOf
g.add((LOTR.isBearerOf, RDF.type, RDF.Property))
g.add((LOTR.isBearerOf, RDFS.domain, LOTR.Character)) # subjects are instances of Character
g.add((LOTR.isBearerOf, RDFS.range, LOTR.Artifact)) # objects are instances of Artifact

print(g.serialize(format="ttl"))

@prefix lotr: <http://example.org/lotr/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

lotr:Artifact a rdfs:Class .

lotr:Character a rdfs:Class ;
    rdfs:subClassOf schema:Person .

lotr:DarkLord rdfs:subClassOf lotr:Character .

lotr:Hobbit rdfs:subClassOf lotr:Character .

lotr:Wizard rdfs:subClassOf lotr:Character .

lotr:hasAge a rdf:Property ;
    rdfs:domain lotr:Character ;
    rdfs:range xsd:integer .

lotr:hasDestructionDate a rdf:Property ;
    rdfs:domain lotr:Artifact ;
    rdfs:range xsd:dateTime .

lotr:hasFriend a rdf:Property ;
    rdfs:domain lotr:Character ;
    rdfs:range lotr:Character .

lotr:isBearerOf a rdf:Property ;
    rdfs:domain lotr:Character ;
    rdfs:range lotr:Artifact .




#### Why Define Properties?
* **Data Consistency:** Domains and ranges document how each property is meant to be used. Here they state that the subject of ```hasAge``` is a ```Character``` and that its value is an integer, which helps keep illogical statements out of the data and improves its quality.
* **Enabling Validation:** Tools can check instance data against these declarations and flag triples that do not fit them. Note that RDFS on its own does not reject such triples; a reasoner treats domain and range as rules for inference, so flagging violations is the job of a separate validation step.
* **Richer Inferences:** A reasoner can use these rules to infer new information. For instance, if a reasoner sees the triple <code>&lt;lotr:Gandalf&gt; &lt;lotr:hasAge&gt; 9000</code> and knows from the schema that the domain of ```hasAge``` is ```lotr:Character```, it can infer that ```lotr:Gandalf``` is an instance of the ```lotr:Character``` class even if that was never stated directly.


---


### **Step 4:** Exporting the Schema
Now that we have a small but fully-realized schema, we can save it as a ```TURTLE``` (.ttl) file.

In [15]:
g.serialize(destination="lotr_schema.ttl", format="ttl")

<Graph identifier=Na1d23630627949ecb2cbf3857e0ebb1f (<class 'rdflib.graph.Graph'>)>

This file can be opened in the ontology editor Protégé. Check the different tabs to explore our schema, and visualize it to get a better understanding of what we have constructed today.


---


## **Exercise 1:** Protégé for Visualization

1.   Download ```lotr_schema.ttl``` from the Files tab in the left sidebar (or find it in the same folder as this notebook if you are running locally).
2.   Open it in the Protégé application and explore the schema.
3.   Enable the OntoGraf tab via Window -> Tabs -> OntoGraf. Expand all the entities to see the visual representation of our schema.
4.   [OPTIONAL] Export it to PNG.


---


## **Exercise 2:** Reverse Engineering

1.   Study the screenshot of a schema's OntoGraf visualization below. \
![Visual Graph](./visual_graph.png)
2.   Rebuild it with ```rdflib``` to practice what we have learned.
3.   Serialize the final graph and load it into Protégé to validate your work.

### Answer:

In [16]:
g = Graph()

UNI = Namespace("http://example.org/uni/")

g.bind("uni", UNI)
g.bind("rdfs", RDFS)
g.bind("xsd", XSD)
g.bind("schema", SDO)

g.add((UNI.Person, RDF.type, RDFS.Class))
g.add((UNI.Professor, RDF.type, RDFS.Class))
g.add((UNI.Student, RDF.type, RDFS.Class))
print(g.serialize(format="ttl"))

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix uni: <http://example.org/uni/> .

uni:Person a rdfs:Class .

uni:Professor a rdfs:Class .

uni:Student a rdfs:Class .




In [18]:
g.add((UNI.Professor, RDFS.subClassOf, UNI.Person))
g.add((UNI.Student, RDFS.subClassOf, UNI.Person))
print(g.serialize(format="ttl"))

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix uni: <http://example.org/uni/> .

uni:Person a rdfs:Class .

uni:Professor a rdfs:Class ;
    rdfs:subClassOf uni:Person .

uni:Student a rdfs:Class ;
    rdfs:subClassOf uni:Person .




In [21]:
g.add((UNI.advises, RDF.type, RDF.Property))
g.add((UNI.advises, RDFS.domain, UNI.Professor)) # subjects are instances of Professor
g.add((UNI.advises, RDFS.range, UNI.Student))
print(g.serialize(format="ttl"))

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix uni: <http://example.org/uni/> .

uni:Person a rdfs:Class .

uni:Professor a rdfs:Class ;
    rdfs:subClassOf uni:Person .

uni:Student a rdfs:Class ;
    rdfs:subClassOf uni:Person .

uni:advises a rdf:Property ;
    rdfs:domain uni:Professor ;
    rdfs:range uni:Student .




In [22]:
g.add((UNI.hasName, RDF.type, RDF.Property))
g.add((UNI.hasName, RDFS.domain, UNI.Person)) # subjects are instances of Person
g.add((UNI.hasName, RDFS.range, XSD.string)) # objects are string literals
print(g.serialize(format="ttl"))

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix uni: <http://example.org/uni/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

uni:Person a rdfs:Class .

uni:Professor a rdfs:Class ;
    rdfs:subClassOf uni:Person .

uni:Student a rdfs:Class ;
    rdfs:subClassOf uni:Person .

uni:advises a rdf:Property ;
    rdfs:domain uni:Professor ;
    rdfs:range uni:Student .

uni:hasName a rdf:Property ;
    rdfs:domain uni:Person ;
    rdfs:range xsd:string .




In [23]:
g.serialize(destination="uni_schema.ttl", format="ttl")

<Graph identifier=Nfc7fab99e5cf4ea4aba0c2a7ac0011f9 (<class 'rdflib.graph.Graph'>)>

In [24]:
g.serialize(destination="uni_schema.trig", format="trig")


<Graph identifier=Nfc7fab99e5cf4ea4aba0c2a7ac0011f9 (<class 'rdflib.graph.Graph'>)>