# Using `rdflib` to create a data graph

Linked data is somewhat more idiomatic to work with in Python than XML on its own.
To parse, manage, and create linked data according to the W3C standard Resource Description Framework (RDF),
you can use the `rdflib` library, which is documented at <https://rdflib.readthedocs.io/en/stable/index.html>.

This notebook builds a data graph about Jane Austen and her novel *Pride and Prejudice*,
then adds standard metadata about an ebook manifestation of the work.

## Import & Setup

If you don't yet have the library installed, you can use `pip` to install.
If you already have it, then skip the next cell:

In [None]:
!pip install rdflib

In [1]:
from rdflib import Graph, URIRef, Literal, BNode, Namespace

## Creating and Serializing basic DublinCore

The next section illustrates how to build a small graph of DublinCore data from a handful of sample values.

### First, add the namespaces you may need

The `rdflib` library ships with several widely used metadata element sets and structures, including DublinCore.
A namespace becomes usable as a prefix in a graph through a process the library calls "binding".
Creating the graph with `Graph(bind_namespaces="rdflib")` binds the namespaces bundled with `rdflib`; once the graph exists, you can bind further namespaces yourself.
For more, see <https://rdflib.readthedocs.io/en/stable/namespaces_and_bindings.html>.

In [2]:
basicDCgraph = Graph(bind_namespaces="rdflib")

In [3]:
# you can add specific namespaces, like DublinCore
from rdflib.namespace import DCTERMS, DCMITYPE, FOAF, RDF, RDFS, XSD

In [4]:
# for the example, also add some other useful vocabularies

# id.loc.gov
# the trailing slash matters: terms are formed by appending the local name to this string
locid = Namespace("http://id.loc.gov/")
mods = Namespace("http://www.loc.gov/mods/v3")
# bibframe
bf = Namespace("http://id.loc.gov/ontologies/bibframe/")

# other
viaf = Namespace("https://viaf.org/viaf/")

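# wikidata
# note: these are the wiki page URLs; Wikidata's own RDF uses
# http://www.wikidata.org/entity/ and http://www.wikidata.org/prop/direct/
wdt = Namespace("https://www.wikidata.org/wiki/")
wdtprop = Namespace("https://www.wikidata.org/wiki/Property:")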


To make these namespaces usable as prefixes in the graph under construction, use the `.bind()` method:

In [5]:
basicDCgraph.bind('bf',bf)
basicDCgraph.bind('wdt',wdt)
basicDCgraph.bind('wdtprop',wdtprop)
basicDCgraph.bind('rdfs',RDFS)
basicDCgraph.bind('dcterms',DCTERMS)
basicDCgraph.bind('xsd',XSD)
basicDCgraph.bind('dcmitype',DCMITYPE)

The next cell defines some sample data about the work *Pride and Prejudice*, as manifested in the Standard Ebooks edition at <https://standardebooks.org/ebooks/jane-austen/pride-and-prejudice>.

In [6]:
pride = URIRef("http://id.loc.gov/authorities/names/n2002041181")
title = Literal("Pride and prejudice")
creator = Literal("Jane Austen")
jane = URIRef("http://id.loc.gov/authorities/names/n79032879")
creatorWDT = "Q36322"
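# note: a bare year is not a valid xsd:date lexical form; XSD.gYear is the datatype for a year on its own
date_of_publication = Literal(1813, datatype=XSD.date)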
ebook = URIRef("https://standardebooks.org/ebooks/jane-austen/pride-and-prejudice")


Now, build the graph by adding triples with the `.add()` method:

In [7]:
# note: each triple is passed to .add() as a (subject, predicate, object) tuple

# basic info about Jane Austen
basicDCgraph.add((jane, RDFS.label, Literal("Austen, Jane, 1775-1817")))
basicDCgraph.add((jane, wdtprop.P31, wdt.Q5)) # jane was a human
basicDCgraph.add((jane, wdtprop.P800, wdt.Q170583)) # jane has the notable work Pride and Prejudice as exemplified in wikidata

print(basicDCgraph.serialize())

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .

<http://id.loc.gov/authorities/names/n79032879> rdfs:label "Austen, Jane, 1775-1817" ;
    wdtprop:P31 wdt:Q5 ;
    wdtprop:P800 wdt:Q170583 .




Use the `.n3()` method to see how a particular term looks in N3 (Notation3) syntax; passing the graph's namespace manager abbreviates IRIs with the bound prefixes:

In [8]:
date_of_publication.n3(basicDCgraph.namespace_manager)

'"1813"^^xsd:date'

In [9]:
# now add some more information about Pride and Prejudice

basicDCgraph.add((pride, RDFS.label, title)) # assert this URI has title as its string label (i.e., nomen)
basicDCgraph.add((pride, wdtprop.P31, wdt.Q7725634)) # assert the resource is a literary work
basicDCgraph.add((pride, wdtprop.P1476, Literal("Pride and Prejudice",lang="en"))) # assert the resource has the title "Pride and Prejudice" in english
# note: in BIBFRAME, bf:Work is a class rather than a property, so the next two statements are loose approximations
basicDCgraph.add((pride, bf.Work, URIRef("https://id.loc.gov/resources/works/15665436.html"))) # connect to the Bibframe work record (unclear if this is the right one)
basicDCgraph.add((pride, bf.Work, Literal("Pride & Prejudice", lang="en"))) # assert the resource has the bibframe work title Pride & Prejudice

print(basicDCgraph.serialize())

@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .

<http://id.loc.gov/authorities/names/n2002041181> rdfs:label "Pride and prejudice" ;
    bf:Work <https://id.loc.gov/resources/works/15665436.html>,
        "Pride & Prejudice"@en ;
    wdtprop:P1476 "Pride and Prejudice"@en ;
    wdtprop:P31 wdt:Q7725634 .

<http://id.loc.gov/authorities/names/n79032879> rdfs:label "Austen, Jane, 1775-1817" ;
    wdtprop:P31 wdt:Q5 ;
    wdtprop:P800 wdt:Q170583 .




In [10]:
# now add the rest of the data in DC statements
basicDCgraph.add((pride, DCTERMS.title, Literal("Pride and Prejudice", lang="en")))
basicDCgraph.add((pride, DCTERMS.creator, jane))
basicDCgraph.add((pride, DCTERMS.creator, Literal("Jane Austen")))
basicDCgraph.add((pride, DCTERMS.created, Literal(1813, datatype=XSD.date)))

print(basicDCgraph.serialize(format="ttl")) # print the graph in Turtle (Terse RDF Triple Language)

@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://id.loc.gov/authorities/names/n2002041181> rdfs:label "Pride and prejudice" ;
    bf:Work <https://id.loc.gov/resources/works/15665436.html>,
        "Pride & Prejudice"@en ;
    dcterms:created "1813"^^xsd:date ;
    dcterms:creator <http://id.loc.gov/authorities/names/n79032879>,
        "Jane Austen" ;
    dcterms:title "Pride and Prejudice"@en ;
    wdtprop:P1476 "Pride and Prejudice"@en ;
    wdtprop:P31 wdt:Q7725634 .

<http://id.loc.gov/authorities/names/n79032879> rdfs:label "Austen, Jane, 1775-1817" ;
    wdtprop:P31 wdt:Q5 ;
    wdtprop:P800 wdt:Q170583 .




In [11]:
# and some information about the Standard Ebooks edition

basicDCgraph.add((pride, DCTERMS.hasVersion, ebook))
basicDCgraph.add((ebook, DCTERMS.isVersionOf, pride))
basicDCgraph.add((ebook, DCTERMS.identifier, BNode())) # blank node as a placeholder for the ISBN
basicDCgraph.add((ebook, DCTERMS.rights, URIRef("https://creativecommons.org/publicdomain/zero/1.0/")))
basicDCgraph.add((ebook, DCTERMS.publisher, Literal("Standard Ebooks", lang="en")))
basicDCgraph.add((ebook, DCTERMS.description, Literal("Pride and Prejudice may today be one of Jane Austen’s most enduring novels, having been widely adapted to stage, screen, and other media since its publication in 1813. The novel tells the tale of five unmarried sisters and how their lives change when a wealthy eligible bachelor moves in to their neighborhood.", lang="en")))
basicDCgraph.add((ebook, DCTERMS.title, Literal("Pride and Prejudice", lang="en")))
basicDCgraph.add((ebook, DCTERMS.type, DCMITYPE.Text))
basicDCgraph.add((ebook, DCTERMS.format, Literal("epub")))
basicDCgraph.add((pride, DCTERMS.subject, Literal("novel of manners; courtship; England; economics of marriage; romance; enemies-to-lovers", lang="en")))

print(basicDCgraph.serialize())

@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://id.loc.gov/authorities/names/n2002041181> rdfs:label "Pride and prejudice" ;
    bf:Work <https://id.loc.gov/resources/works/15665436.html>,
        "Pride & Prejudice"@en ;
    dcterms:created "1813"^^xsd:date ;
    dcterms:creator <http://id.loc.gov/authorities/names/n79032879>,
        "Jane Austen" ;
    dcterms:hasVersion <https://standardebooks.org/ebooks/jane-austen/pride-and-prejudice> ;
    dcterms:subject "novel of manners; courtship; England; economics of marriage; romance; enemies-to-lovers"@en ;
    dcterms:title "Pride and Prejudice"@en ;
    wdtprop:P1476 "Pride and Prejudice"@en ;
    wdtprop:P31 wd

It is now possible to serialize the graph in different formats, including RDF/XML, N-Triples, and JSON-LD:

In [12]:
# print the graph in RDF XML
print(basicDCgraph.serialize(format="xml"))

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
   xmlns:bf="http://id.loc.gov/ontologies/bibframe/"
   xmlns:dcterms="http://purl.org/dc/terms/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:wdtprop="https://www.wikidata.org/wiki/Property:"
>
  <rdf:Description rdf:about="http://id.loc.gov/authorities/names/n2002041181">
    <rdfs:label>Pride and prejudice</rdfs:label>
    <wdtprop:P31 rdf:resource="https://www.wikidata.org/wiki/Q7725634"/>
    <wdtprop:P1476 xml:lang="en">Pride and Prejudice</wdtprop:P1476>
    <bf:Work rdf:resource="https://id.loc.gov/resources/works/15665436.html"/>
    <bf:Work xml:lang="en">Pride &amp; Prejudice</bf:Work>
    <dcterms:title xml:lang="en">Pride and Prejudice</dcterms:title>
    <dcterms:creator rdf:resource="http://id.loc.gov/authorities/names/n79032879"/>
    <dcterms:creator>Jane Austen</dcterms:creator>
    <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema

In [13]:
# print the graph as N-Triples (one full triple per line, no prefixes)
print(basicDCgraph.serialize(format="nt"))

<http://id.loc.gov/authorities/names/n2002041181> <http://id.loc.gov/ontologies/bibframe/Work> "Pride & Prejudice"@en .
<http://id.loc.gov/authorities/names/n79032879> <https://www.wikidata.org/wiki/Property:P800> <https://www.wikidata.org/wiki/Q170583> .
<https://standardebooks.org/ebooks/jane-austen/pride-and-prejudice> <http://purl.org/dc/terms/isVersionOf> <http://id.loc.gov/authorities/names/n2002041181> .
<http://id.loc.gov/authorities/names/n2002041181> <http://purl.org/dc/terms/creator> "Jane Austen" .
<https://standardebooks.org/ebooks/jane-austen/pride-and-prejudice> <http://purl.org/dc/terms/type> <http://purl.org/dc/dcmitype/Text> .
<http://id.loc.gov/authorities/names/n2002041181> <https://www.wikidata.org/wiki/Property:P31> <https://www.wikidata.org/wiki/Q7725634> .
<http://id.loc.gov/authorities/names/n2002041181> <http://purl.org/dc/terms/created> "1813"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://id.loc.gov/authorities/names/n2002041181> <http://purl.org/dc/term

### Saving the output for export

If you want to reuse or export the graph, you can write it to a file by passing the `destination` argument to the `.serialize()` method:

In [14]:
basicDCgraph.serialize(format="xml", destination="pride-and-prejudice-rdf.xml")

<Graph identifier=N006288ed4a294248a5d0b1fab3bab05c (<class 'rdflib.graph.Graph'>)>

In [15]:
basicDCgraph.serialize(format="json-ld", destination="pride-and-prejudice.json")

<Graph identifier=N006288ed4a294248a5d0b1fab3bab05c (<class 'rdflib.graph.Graph'>)>