# Using Python to read and print linked data

This notebook offers a quick introduction to the `rdflib`
library, which supports parsing and serializing linked data in Python.
The library can serialize to formats including RDF/XML, N3, Turtle, and JSON-LD.

## Setup

If you don't have [RDFLib](https://rdflib.readthedocs.io/en/) installed, install it:

In [None]:
!pip install rdflib

## Demonstration

The cells below provide a quick demonstration of importing the library,
reading and parsing an RDF document, then serializing that data
in another format. 

In [1]:
from rdflib import Graph

In [3]:
# create a graph
g = Graph()

# parse the data
g.parse('../data/xml/jajohnst-foaf.rdf')

<Graph identifier=N838a8a0dacce402b8aab3aa24a17f3cc (<class 'rdflib.graph.Graph'>)>

To see how many statements are in the graph, use the `len()` function:

In [4]:
print(f'The graph contains {len(g)} statements.')

The graph contains 31 statements.


Each statement is a triple, a tuple of three values: subject, predicate, object. Let's take a look:

In [5]:
# loop through the graph; each item is a (subject, predicate, object) tuple
for subj, pred, obj in g:
    print(subj, pred, obj)

file:///Users/jajohnst/Desktop/si676-2024-data/data/xml/jajohnst-foaf.rdf#ricky http://xmlns.com/foaf/0.1/mbox_sha1sum None
http://lh3.ggpht.com/VJzsbZ4cCNHmdaLDRxDBU14AapsOWxnJ8M-OTLopbw0-SAXxbsbyTyAb4OYN9QAa04WIWtSYy7Zin0rxmpvWYCp7=s200-c http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Image
file:///Users/jajohnst/Desktop/si676-2024-data/data/xml/jajohnst-foaf.rdf#jamis-2010 http://purl.org/dc/terms/abstract This article presents an English-language introduction to the cimbalom, known in Czech as cimbál, as it is played in the Czech Republic. The article presents a holistic perspective on the cimbalom in Moravia, beginning with a descriptive organography, which covers Moravian organology evidence, historical iconography, and ethnographic evidence, with particular attention to the nineteenth-century ethnographic expeditions of Leos Janácek and folkloric nationalism. The article also proceeds to discuss musical style and the cimbalom's role in traditional ense

To output, or serialize, the data, request a format such as Turtle or JSON-LD:

In [6]:
# serialize in turtle
print(g.serialize(format="turtle"))

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<file:///Users/jajohnst/Desktop/si676-2024-data/data/xml/jajohnst-foaf.rdf#jesse> a foaf:Person ;
    foaf:depiction <http://lh3.ggpht.com/VJzsbZ4cCNHmdaLDRxDBU14AapsOWxnJ8M-OTLopbw0-SAXxbsbyTyAb4OYN9QAa04WIWtSYy7Zin0rxmpvWYCp7=s200-c> ;
    foaf:family_name "Johnston" ;
    foaf:firstName "Jesse" ;
    foaf:homepage <http://www.jesseajohnston.net/> ;
    foaf:knows <file:///Users/jajohnst/Desktop/si676-2024-data/data/xml/jajohnst-foaf.rdf#ricky> ;
    foaf:name "Jesse A. Johnston" ;
    foaf:publications <file:///Users/jajohnst/Desktop/si676-2024-data/data/xml/jajohnst-foaf.rdf#jamis-2010> ;
    foaf:schoolHomepage <http://www.umich.edu/> ;
    foaf:title "Ph.D." ;
    foaf:workplaceHomepage <http://www.neh.gov/> .

<file:///Users/jajohnst/Desktop/si676-2024-data/data/xml/jajohnst-foaf.rdf#

Or try output to JSON-LD:

In [7]:
print(g.serialize(format="json-ld", indent=2))

[
  {
    "@id": "http://deepblue.lib.umich.edu/handle/2027.42/87955",
    "http://purl.org/dc/elements/1.1/format": [
      {
        "@value": "pdf"
      }
    ]
  },
  {
    "@id": "http://lh3.ggpht.com/VJzsbZ4cCNHmdaLDRxDBU14AapsOWxnJ8M-OTLopbw0-SAXxbsbyTyAb4OYN9QAa04WIWtSYy7Zin0rxmpvWYCp7=s200-c",
    "@type": [
      "http://xmlns.com/foaf/0.1/Image"
    ],
    "http://purl.org/dc/elements/1.1/creator": [
      {
        "@value": "Ricardo Punzalan"
      }
    ],
    "http://purl.org/dc/elements/1.1/date": [
      {
        "@value": "2010"
      }
    ],
    "http://purl.org/dc/elements/1.1/description": [
      {
        "@value": "Photo of Jesse Johnston"
      }
    ]
  },
  {
    "@id": "file:///Users/jajohnst/Desktop/si676-2024-data/data/xml/jajohnst-foaf.rdf#ricky",
    "@type": [
      "http://xmlns.com/foaf/0.1/Person"
    ],
    "http://www.w3.org/2000/01/rdf-schema#seeAlso": [
      {
        "@id": "http://www.rpunzalan.com/"
      }
    ],
    "http://xmlns.com/foa

## Try a larger example

This example builds a new graph by hand: it adds triples one at a time, queries them, and then serializes the result.

In [8]:
# import the classes used below (run this cell even if you already imported Graph)
from rdflib import Graph, Literal, RDF, URIRef

RDFLib also provides ready-made objects for some common namespaces (not all; see the list of available namespaces in the RDFLib documentation):

In [9]:
# rdflib knows about some namespaces, like FOAF
from rdflib.namespace import FOAF, XSD

In [10]:
# create a Graph
g = Graph()

In [12]:
# Create an RDF URI node to use as the subject for multiple triples
jane = URIRef("http://id.loc.gov/authorities/names/n79032879")

In [13]:
# Add triples using the graph's add() method.
g.add((jane, RDF.type, FOAF.Person))
g.add((jane, FOAF.nick, Literal("jane", lang="en")))
g.add((jane, FOAF.name, Literal("Jane Austen")))
g.add((jane, FOAF.mbox, URIRef("mailto:jane@austen.org")))

<Graph identifier=Nfe6e135c348e4453a746ae0a7ba06752 (<class 'rdflib.graph.Graph'>)>

In [14]:
# Add another person
jajohnst = URIRef("http://jesseajohnston.net/about")

In [18]:
# Add triples using the graph's add() method.
g.add((jajohnst, RDF.type, FOAF.Person))
g.add((jajohnst, FOAF.nick, Literal("Jesse", datatype=XSD.string)))
g.add((jajohnst, FOAF.name, Literal("Jesse A. Johnston")))
g.add((jajohnst, FOAF.mbox, URIRef("mailto:jajohnst@umich.edu")))

<Graph identifier=Nfe6e135c348e4453a746ae0a7ba06752 (<class 'rdflib.graph.Graph'>)>

In [17]:
# Iterate over triples in store and print them out.
print("--- printing raw triples ---")
for s, p, o in g:
    print((s, p, o))

--- printing raw triples ---
(rdflib.term.URIRef('http://jesseajohnston.net/about'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/nick'), rdflib.term.Literal('Jesse', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('http://id.loc.gov/authorities/names/n79032879'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Jane Austen'))
(rdflib.term.URIRef('http://id.loc.gov/authorities/names/n79032879'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))
(rdflib.term.URIRef('http://jesseajohnston.net/about'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))
(rdflib.term.URIRef('http://jesseajohnston.net/about'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Jesse A. Johnston'))
(rdflib.term.URIRef('http://jesseajohnston.net/about'), rdflib.term

In [19]:
# For each foaf:Person in the store, print out their mbox property's value.
print("--- printing mboxes ---")
for person in g.subjects(RDF.type, FOAF.Person):
    for mbox in g.objects(person, FOAF.mbox):
        print(mbox)


--- printing mboxes ---
mailto:jane@austen.org
mailto:e.jajohnst@umich.edu
mailto:jajohnst@umich.edu


In [20]:
# Bind the FOAF namespace to a prefix for more readable output
g.bind("foaf", FOAF)

In [21]:
# print all the data in the Notation3 format
print("--- printing N3 ---")
print(g.serialize(format='n3'))

--- printing N3 ---
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://id.loc.gov/authorities/names/n79032879> a foaf:Person ;
    foaf:mbox <mailto:jane@austen.org> ;
    foaf:name "Jane Austen" ;
    foaf:nick "jane"@en .

<http://jesseajohnston.net/about> a foaf:Person ;
    foaf:mbox <mailto:e.jajohnst@umich.edu>,
        <mailto:jajohnst@umich.edu> ;
    foaf:name "Jesse A. Johnston" ;
    foaf:nick "Jesse"^^xsd:string .


