# Linked Data with Omeka S Provided Example

Linked data can be retrieved from the Omeka S API
and processed with the `rdflib` library in Python.
In short, this notebook illustrates an extract-and-transform process
that retrieves data from two sources (i.e., *extracts* data from existing
Omeka S items and from an imaginary triple store), then combines
the information in a new data graph that can be expressed in
triples (i.e., *transforms* it into a new data structure).
An assumed final step, in which the data would be exported back to JSON-LD
and returned to Omeka S (i.e., *load*), is not illustrated here.

## Setup

This section imports `requests` for working with the REST API. More important, however,
are the import statements for `rdflib`.
Of particular note are the imports of the core `rdflib` classes, including `Graph`, `URIRef`, `Literal`, `BNode` (for blank nodes), and `Namespace`.
The serializer and parser plugins assist in converting graph data to and from various transport formats, including RDF/XML, JSON-LD, and Turtle, among others.
Finally, note the last import line, which imports several built-in vocabularies that can be used as namespaces,
including the core Resource Description Framework vocabularies (both RDF and RDFS), Friend of a Friend (FOAF), Dublin Core terms (DCTERMS), and schema.org (SDO).

In [1]:
import requests
import rdflib
from rdflib import Graph, URIRef, Literal, BNode, Namespace, plugin, Variable
from rdflib.serializer import Serializer
from rdflib.plugin import register, Parser
from rdflib.namespace import RDF, RDFS, FOAF, DCTERMS, SDO

In [2]:
# create sample data to add to the graph
newData = {
    'Jane Austen': {
        'https://schema.org/deathDate': 1817,
        'https://schema.org/birthDate': 1775,
        'https://schema.org/deathPlace': 'https://en.wikipedia.org/wiki/England'
    },
    'Octavia E. Butler': {
        'https://schema.org/deathDate': 2006,
        'https://schema.org/birthDate': 1947,
        'https://schema.org/deathPlace': 'https://en.wikipedia.org/wiki/Lake_Forest_Park,_Washington'
    },
    'Herman Melville': {
        'https://schema.org/deathDate': 1891,
        'https://schema.org/birthDate': 1819,
        'https://schema.org/deathPlace': 'https://en.wikipedia.org/wiki/New_York_City'
    }
}

Add the namespace for the Omeka S vocabulary:

In [3]:
omekas_ns = Namespace('http://omeka.org/s/vocabs/o#')

## Retrieve Data from Omeka S

Search for all of the items in the specified item set:

In [4]:
url = 'http://jajohnst.si676.si.umich.edu/omeka-s/api'

action = '/items'

# if you create items in your Omeka S site,
# your item set will have a different id (specific to your site)
parameters = {
    'item_set_id':311,
}

In [5]:
r = requests.get(url + action, params=parameters)

print(r.url)
print(r.status_code)

http://jajohnst.si676.si.umich.edu/omeka-s/api/items?item_set_id=311
200
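
The printed URL shows that `requests` encodes the `params` dictionary into the query string. A minimal sketch of the same construction, using only the standard library and making no network request, might look like this (the variable names are illustrative):

```python
from urllib.parse import urlencode

base_url = 'http://jajohnst.si676.si.umich.edu/omeka-s/api'
endpoint = '/items'
query = {'item_set_id': 311}

# urlencode turns the dictionary into a query string such as 'item_set_id=311'
full_url = f'{base_url}{endpoint}?{urlencode(query)}'
print(full_url)
```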


In [6]:
r.json()

[{'@context': 'http://jajohnst.si676.si.umich.edu/omeka-s/api-context',
  '@id': 'http://jajohnst.si676.si.umich.edu/omeka-s/api/items/303',
  '@type': 'o:Item',
  'o:id': 303,
  'o:is_public': True,
  'o:owner': {'@id': 'http://jajohnst.si676.si.umich.edu/omeka-s/api/users/2',
   'o:id': 2},
  'o:resource_class': None,
  'o:resource_template': None,
  'o:thumbnail': None,
  'o:title': 'A Mere Title for an item created via the API',
  'thumbnail_display_urls': {'large': None, 'medium': None, 'square': None},
  'o:created': {'@value': '2025-11-07T23:16:45+00:00',
   '@type': 'http://www.w3.org/2001/XMLSchema#dateTime'},
  'o:modified': {'@value': '2025-11-14T04:56:54+00:00',
   '@type': 'http://www.w3.org/2001/XMLSchema#dateTime'},
  'o:primary_media': None,
  'o:media': [],
  'o:item_set': [{'@id': 'http://jajohnst.si676.si.umich.edu/omeka-s/api/item_sets/195',
    'o:id': 195},
   {'@id': 'http://jajohnst.si676.si.umich.edu/omeka-s/api/item_sets/311',
    'o:id': 311}],
  'o:site': [{
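
Because the response is ordinary JSON, individual fields can be read with plain dictionary and list operations before any RDF processing. The sketch below uses a hand-copied subset of the first record shown above (not a live response) and collects each item's id, title, and item set ids.

```python
# a hand-copied subset of the first record in the output above
sample_response = [
    {
        '@id': 'http://jajohnst.si676.si.umich.edu/omeka-s/api/items/303',
        '@type': 'o:Item',
        'o:id': 303,
        'o:title': 'A Mere Title for an item created via the API',
        'o:item_set': [{'o:id': 195}, {'o:id': 311}],
    }
]

# one (id, title, item set ids) tuple per item
summary = [
    (item['o:id'], item['o:title'], [s['o:id'] for s in item['o:item_set']])
    for item in sample_response
]
print(summary)
```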

## Parse Data with the RDFLib Module

Use `rdflib` to parse this data.

First, create an RDF graph from it. Note the `bind_namespaces` argument in the following cell: the value `'rdflib'` binds prefixes for all of the namespaces that ship with RDFLib (not only those imported above),
so that they are available later when writing new triples and serializing the graph.

In [10]:
g = Graph(bind_namespaces='rdflib').parse(data=r.text, format='json-ld')

Add the Omeka S namespace (`omekas_ns`, prefixed as `o`):

In [11]:
g.bind('o',omekas_ns)

Now, look through the graph. The graph is a series of "triples",
which are subject-predicate-object tuples. These can be modified. For example, after the initial look, you can remove all of those that use the Omeka S namespace (`o`).
Note that RDFLib stores only triples: once every triple that mentions a node has been removed, that node no longer appears in the graph.

In [12]:
for s, p, o in g:
    print(f'{s} -> {p} -> {o} .')

http://jajohnst.si676.si.umich.edu/omeka-s/api/resource_classes/1 -> http://omeka.org/s/vocabs/o#id -> 1 .
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/309 -> http://omeka.org/s/vocabs/o#owner -> http://jajohnst.si676.si.umich.edu/omeka-s/api/users/1 .
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/312 -> http://www.w3.org/1999/02/22-rdf-syntax-ns#type -> http://omeka.org/s/vocabs/o#Item .
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/306 -> http://omeka.org/s/vocabs/o#item_set -> http://jajohnst.si676.si.umich.edu/omeka-s/api/item_sets/195 .
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/306 -> http://purl.org/dc/terms/title -> An image of an Orca, archived from an old website, and uploaded via the API .
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/306 -> http://omeka.org/s/vocabs/o#site -> http://jajohnst.si676.si.umich.edu/omeka-s/api/sites/1 .
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/306 -> http://omeka.org/s/vocabs/o#created -> 2025-1

### Outputting, Saving, and Serializing

Convert the graph to *Turtle* (the Terse RDF Triple Language):

In [13]:
ser = g.serialize(format='turtle')

print(ser)

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix o: <http://omeka.org/s/vocabs/o#> .
@prefix schema: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://jajohnst.si676.si.umich.edu/omeka-s/api/items/303> a o:Item ;
    o:created "2025-11-07T23:16:45+00:00"^^xsd:dateTime ;
    o:id 303 ;
    o:is_public true ;
    o:item_set <http://jajohnst.si676.si.umich.edu/omeka-s/api/item_sets/195>,
        <http://jajohnst.si676.si.umich.edu/omeka-s/api/item_sets/311> ;
    o:modified "2025-11-14T04:56:54+00:00"^^xsd:dateTime ;
    o:owner <http://jajohnst.si676.si.umich.edu/omeka-s/api/users/2> ;
    o:site <http://jajohnst.si676.si.umich.edu/omeka-s/api/sites/1> ;
    o:title "A Mere Title for an item created via the API" ;
    dcterms:rights "No known restrictions on publication."@en-us ;
    dcterms:title "A Mere Title for an item created via the API"@en-us .

<http://jajohnst.si676.si.umich.edu/omeka-s/api/item

Save it to a file:

In [14]:
with open('item-set-graph-1.ttl', 'w') as f:
    f.write(ser)

## Parsing, Modifying, and Adding to the Graph

Now remove the Omeka-specific data in order to get a closer look
at the collection-specific data.
To do this, the following loop uses `.remove()` to delete triples
whose *predicate* (element 1 of the tuple) is in the Omeka S vocabulary namespace.
Triples that use the namespace only in the *object*, such as `rdf:type o:Item`, are left in place.

In [15]:
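# remove the Omeka-specific data
# iterate over a snapshot (list) so the graph is not modified while it is being traversed
for triple in list(g):
    # triple[1] is the predicate; keep only predicates outside the Omeka S namespace
    if triple[1].startswith(str(omekas_ns)):
        g.remove(triple)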

In [16]:
for s, p, o in g:
    print(f'{s} -> {p} -> {o}')

http://jajohnst.si676.si.umich.edu/omeka-s/api/items/308 -> http://www.w3.org/1999/02/22-rdf-syntax-ns#type -> http://purl.org/dc/terms/Agent
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/312 -> http://xmlns.com/foaf/0.1/name -> Octavia E. Butler
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/312 -> https://schema.org/identifier -> Q239739
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/312 -> http://www.w3.org/1999/02/22-rdf-syntax-ns#type -> http://omeka.org/s/vocabs/o#Item
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/306 -> http://purl.org/dc/terms/title -> An image of an Orca, archived from an old website, and uploaded via the API
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/303 -> http://www.w3.org/1999/02/22-rdf-syntax-ns#type -> http://omeka.org/s/vocabs/o#Item
http://jajohnst.si676.si.umich.edu/omeka-s/api/items/309 -> http://www.w3.org/1999/02/22-rdf-syntax-ns#type -> http://omeka.org/s/vocabs/o#Item
http://jajohnst.si676.si.umich.edu/omeka-s/

### Adding New Triples to the Graph

To create new triples for the graph, use the information
previously stored in the `newData` variable,
a small dictionary keyed by author name. The process uses the `rdflib`
methods `.triples()` and `.add()`.

The `.triples()` method is used here to find the existing nodes
that the new data belongs to. It takes a triple pattern in which
`None` acts as a wildcard. In this case, the pattern matches
any triple whose *predicate* is `foaf:name`
and whose *object* is a literal exactly equal to the author's name.

Then, for each matching triple, the `.add()` method
creates new triples with the matched node as the *subject*,
a built-in schema.org property (written as `SDO.property`) as the *predicate*,
and the new data as the *object*.

In [17]:
# Note: this will only work if the keys of newData match foaf:name values already in the graph,
# so the items must be created on the site first

# loop through the "newData" dictionary; 'author_name' is the key, 'details' the nested dictionary
for author_name, details in newData.items():
    # find existing triples (3-part tuples) whose foaf:name equals 'author_name';
    # list() takes a snapshot so the graph is not modified while it is being traversed
    for s, p, o in list(g.triples((None, FOAF.name, Literal(author_name)))):
        # look up the new values for this author
        birthDate = details['https://schema.org/birthDate']
        deathDate = details['https://schema.org/deathDate']
        deathPlace = details['https://schema.org/deathPlace']

        # add the new data to the matched node (the subject of the matched triple)
        g.add((s, SDO.birthDate, Literal(birthDate)))
        g.add((s, SDO.deathDate, Literal(deathDate)))
        g.add((s, SDO.deathPlace, URIRef(deathPlace)))

To see how the graph changed, serialize the new graph:

In [18]:
ser2 = g.serialize(format='turtle')

with open('item-set-graph-2.ttl', 'w') as f:
    f.write(ser2)