# Using Python to read and print linked data

This notebook offers a quick introduction to the `rdflib` 
library, which supports parsing and serializing linked data in Python.
The library can output several formats, including RDF/XML, N3, Turtle, and JSON-LD.

## Setup

If you don't have [RDFLib](https://rdflib.readthedocs.io/en/) installed, install it:

In [None]:
!pip install rdflib

## Background

Before getting started, it is useful to survey some of the 
resources and approaches in linked data, or semantic data. 
A few metadata schemes and services are widely used across 
the web and provide a core of "semantic data standards". 
These include:

* RDF (Resource Description Framework) - the initial Web data linking standard,
which was serialized and shared initially in XML according to a W3C standard.
* FOAF (Friend of a Friend) - a shared metadata scheme that defines many terms
useful for describing people and relationships between them, their work, and
sometimes even creations. 
* Wikidata - a linked data service that is openly maintained by a large
user community. The platform offers URIs for each concept, including a basic
distinction between entities (Q nodes) and properties (P nodes, which can act as both links and nodes).
* ID.loc.gov - the Library of Congress' Linked Data service, which provides URIs for 
all of the major LOC vocabularies, including subject headings, name authorities, and more.

The demonstration below makes use of all of these resources, amongst others.

For the purposes of the demonstration, let's build a linked data
graph of novels by Jane Austen and associated entities. 

Some useful Wikidata resources: 

* Instance-of property (`P31`)
* Jane Austen, English novelist (`Q36322`)
* Pride and Prejudice / by Jane Austen (`Q170583`)
* Literary work (`Q7725634`)
* title property (`P1476`) - typically takes a literal (a language-tagged string) as its value
* author property (`P50`) - note the directionality here: a work "has author" a person, rather than a person being "author of" a work, so the property applies to works, not agents. 

## Demonstration

The cells below provide a quick demonstration of importing the library,
reading and parsing an RDF document, then serializing that data
in another format.

This demonstrates using the sample data described in lecture. 

In [1]:
# if you haven't imported rdflib yet, run this cell
from rdflib import Graph, Literal, RDF, URIRef, Namespace

In [2]:
# you can import specific predefined namespaces, such as Dublin Core terms (RDF was already imported above)
from rdflib.namespace import DCTERMS, FOAF

Set up additional namespaces that aren't default in the module:

In [3]:
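# names are appended directly to the base string, so a base should end with '/', '#', or ':'
locid = Namespace('http://id.loc.gov/')
wdt = Namespace('https://www.wikidata.org/wiki/')
wdtprop = Namespace('https://www.wikidata.org/wiki/Property:')
mods = Namespace('http://www.loc.gov/mods/v3') # MODS XML namespace as published (no trailing separator); not used below
viaf = Namespace('https://viaf.org/viaf/')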

In [4]:
# create a Graph
g = Graph()

Use the `.bind()` method to attach the namespaces to the specific graph that was just created:

In [5]:
# set namespaces
g.bind('locid', locid)
g.bind('wdt', wdt)
g.bind('wdtprop', wdtprop)
g.bind('mods', mods)
g.bind('foaf', FOAF)
g.bind('dcterms', DCTERMS)

Let's create some triples about Jane Austen:

In [6]:
jane = URIRef('http://id.loc.gov/authorities/names/n79032879') # assign a URI for the person (in this case, we know already there is an LOC authority URI)
name = Literal('Jane Austen') # the name as a string or 'literal'

# create triples about Jane Austen and Pride and Prejudice using .add()
g.add((jane, RDF.type, FOAF.Person)) # assert that the URI represents a person
g.add((jane, FOAF.name, name)) # assert that the literal is the name of that person
g.add((jane, wdtprop.P31, wdt.Q36322)) # assert that the LOC authority is an instance of (P31) the Wikidata entity for Jane Austen
g.add((jane, DCTERMS.creator, Literal('Pride and Prejudice', lang='en'))) # link Jane to the work via dcterms:creator, with the work represented only by this string
g.add((wdt.Q170583, wdtprop.P1476, Literal('Pride and Prejudice', lang='en'))) # assert that Q170583 has the title "Pride and Prejudice"
g.add((wdt.Q170583, DCTERMS.creator, jane)) # assert that Jane is the creator of Q170583 as well
g.add((wdt.Q170583, wdtprop.P50, jane)) # assert that the novel Pride and Prejudice has the author Jane
g.add((jane, wdtprop.P2963, URIRef('https://www.goodreads.com/author/show/1265'))) # P2963 is the Goodreads author ID property

<Graph identifier=N5419ce11adcd4f30be524b60ab57dae5 (<class 'rdflib.graph.Graph'>)>

In [7]:
print(g.serialize(format='ttl'))

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .

wdt:Q170583 dcterms:creator <http://id.loc.gov/authorities/names/n79032879> ;
    wdtprop:P1476 "Pride and Prejudice"@en ;
    wdtprop:P50 <http://id.loc.gov/authorities/names/n79032879> .

<http://id.loc.gov/authorities/names/n79032879> a foaf:Person ;
    dcterms:creator "Pride and Prejudice"@en ;
    foaf:name "Jane Austen" ;
    wdtprop:P2963 <https://www.goodreads.com/author/show/1265> ;
    wdtprop:P31 wdt:Q36322 .




In [8]:
print(g.serialize(format='json-ld'))

[
  {
    "@id": "http://id.loc.gov/authorities/names/n79032879",
    "@type": [
      "http://xmlns.com/foaf/0.1/Person"
    ],
    "http://purl.org/dc/terms/creator": [
      {
        "@language": "en",
        "@value": "Pride and Prejudice"
      }
    ],
    "http://xmlns.com/foaf/0.1/name": [
      {
        "@value": "Jane Austen"
      }
    ],
    "https://www.wikidata.org/wiki/Property:P2963": [
      {
        "@id": "https://www.goodreads.com/author/show/1265"
      }
    ],
    "https://www.wikidata.org/wiki/Property:P31": [
      {
        "@id": "https://www.wikidata.org/wiki/Q36322"
      }
    ]
  },
  {
    "@id": "https://www.wikidata.org/wiki/Q170583",
    "http://purl.org/dc/terms/creator": [
      {
        "@id": "http://id.loc.gov/authorities/names/n79032879"
      }
    ],
    "https://www.wikidata.org/wiki/Property:P1476": [
      {
        "@language": "en",
        "@value": "Pride and Prejudice"
      }
    ],
    "https://www.wikidata.org/wiki/Property:P5

Now, add some more information. For example, add triples about another of Austen's novels, _Emma_, alongside _Pride and Prejudice_.

In [9]:
# add information about a second novel

g.add((jane, DCTERMS.creator, Literal("Emma",lang="en"))) #add Emma as a literal
g.add((jane, DCTERMS.creator, wdt.Q223880)) #add Emma as a wikidata URI
g.add((wdt.Q223880, wdtprop.P31, wdt.Q7725634)) # Emma is an instance of (P31) a literary work, (Q7725634)

<Graph identifier=N5419ce11adcd4f30be524b60ab57dae5 (<class 'rdflib.graph.Graph'>)>

In [10]:
print(g.serialize(format='ttl'))

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .

wdt:Q170583 dcterms:creator <http://id.loc.gov/authorities/names/n79032879> ;
    wdtprop:P1476 "Pride and Prejudice"@en ;
    wdtprop:P50 <http://id.loc.gov/authorities/names/n79032879> .

wdt:Q223880 wdtprop:P31 wdt:Q7725634 .

<http://id.loc.gov/authorities/names/n79032879> a foaf:Person ;
    dcterms:creator wdt:Q223880,
        "Emma"@en,
        "Pride and Prejudice"@en ;
    foaf:name "Jane Austen" ;
    wdtprop:P2963 <https://www.goodreads.com/author/show/1265> ;
    wdtprop:P31 wdt:Q36322 .




Let's link the "Emma" literal to the Emma Q URI (`Q223880`) using the title property (`P1476`):

In [11]:
# add title for Emma Q identifier
g.add((wdt.Q223880, wdtprop.P1476, Literal("Emma",lang="en")))

<Graph identifier=N5419ce11adcd4f30be524b60ab57dae5 (<class 'rdflib.graph.Graph'>)>

In [12]:
print(g.serialize(format="ttl"))

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .

wdt:Q170583 dcterms:creator <http://id.loc.gov/authorities/names/n79032879> ;
    wdtprop:P1476 "Pride and Prejudice"@en ;
    wdtprop:P50 <http://id.loc.gov/authorities/names/n79032879> .

wdt:Q223880 wdtprop:P1476 "Emma"@en ;
    wdtprop:P31 wdt:Q7725634 .

<http://id.loc.gov/authorities/names/n79032879> a foaf:Person ;
    dcterms:creator wdt:Q223880,
        "Emma"@en,
        "Pride and Prejudice"@en ;
    foaf:name "Jane Austen" ;
    wdtprop:P2963 <https://www.goodreads.com/author/show/1265> ;
    wdtprop:P31 wdt:Q36322 .




### Basic example: Octavia

Simple "graph" (one statement) about Octavia Butler: 

In [13]:
# initialize graph
octavia = Graph()

# bind prefixes / import namespaces
octavia.bind('wdt', wdt)
octavia.bind('wdtprop', wdtprop)
octavia.bind('viaf', viaf)
octavia.bind('dcterms', DCTERMS)

octavia

<Graph identifier=N0a7bcebd3a7e436d89fb210c1bf6aa69 (<class 'rdflib.graph.Graph'>)>

In [14]:
# create a statement: the VIAF record is an instance of (P31) human (Q5)
octavia.add((viaf.v34453955, wdtprop.P31, wdt.Q5))

# display
print(octavia.serialize(format='ttl'))

@prefix viaf: <https://viaf.org/viaf/> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .

viaf:v34453955 wdtprop:P31 wdt:Q5 .




## Working with a Bibliographic Example

The following cells add triples to the graph using URIs from the Library of Congress's linked data service:

In [15]:
# Create an RDF URI node to use as the subject for multiple triples
huckfinnURI = URIRef("http://id.loc.gov/authorities/names/n79132705")

In [16]:
# Add another resource
twainURI = URIRef("http://id.loc.gov/authorities/names/n79021164")

The `.add()` method is a way to add specific triple statements:

In [18]:
g.add((huckfinnURI, wdt.P50, twainURI))

<Graph identifier=N5419ce11adcd4f30be524b60ab57dae5 (<class 'rdflib.graph.Graph'>)>

In [20]:
g.add((huckfinnURI, DCTERMS.title, Literal('Adventures of Huckleberry Finn', lang='en')))

<Graph identifier=N5419ce11adcd4f30be524b60ab57dae5 (<class 'rdflib.graph.Graph'>)>

In [21]:
print(g.serialize(format='n3'))

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wdt: <https://www.wikidata.org/wiki/> .
@prefix wdtprop: <https://www.wikidata.org/wiki/Property:> .

<http://id.loc.gov/authorities/names/n79132705> dcterms:title "Adventures of Huckleberry Finn"@en ;
    wdt:P50 <http://id.loc.gov/authorities/names/n79021164> .

wdt:Q170583 dcterms:creator <http://id.loc.gov/authorities/names/n79032879> ;
    wdtprop:P1476 "Pride and Prejudice"@en ;
    wdtprop:P50 <http://id.loc.gov/authorities/names/n79032879> .

wdt:Q223880 wdtprop:P1476 "Emma"@en ;
    wdtprop:P31 wdt:Q7725634 .

<http://id.loc.gov/authorities/names/n79032879> a foaf:Person ;
    dcterms:creator wdt:Q223880,
        "Emma"@en,
        "Pride and Prejudice"@en ;
    foaf:name "Jane Austen" ;
    wdtprop:P2963 <https://www.goodreads.com/author/show/1265> ;
    wdtprop:P31 wdt:Q36322 .




In [22]:
print(g.serialize(format='json-ld'))

[
  {
    "@id": "http://id.loc.gov/authorities/names/n79032879",
    "@type": [
      "http://xmlns.com/foaf/0.1/Person"
    ],
    "http://purl.org/dc/terms/creator": [
      {
        "@language": "en",
        "@value": "Pride and Prejudice"
      },
      {
        "@language": "en",
        "@value": "Emma"
      },
      {
        "@id": "https://www.wikidata.org/wiki/Q223880"
      }
    ],
    "http://xmlns.com/foaf/0.1/name": [
      {
        "@value": "Jane Austen"
      }
    ],
    "https://www.wikidata.org/wiki/Property:P2963": [
      {
        "@id": "https://www.goodreads.com/author/show/1265"
      }
    ],
    "https://www.wikidata.org/wiki/Property:P31": [
      {
        "@id": "https://www.wikidata.org/wiki/Q36322"
      }
    ]
  },
  {
    "@id": "https://www.wikidata.org/wiki/Q223880",
    "https://www.wikidata.org/wiki/Property:P1476": [
      {
        "@language": "en",
        "@value": "Emma"
      }
    ],
    "https://www.wikidata.org/wiki/Property:P31"