# RDF version of Open Stemmata data

Author: Elena Spadini

Date: April 2023

Reusing data from: Jean-Baptiste Camps, Gustavo Fernandez Riva, Simon Gabay, *Open Stemmata: Database*, version 0.1, 2021, https://github.com/OpenStemmata/database/ (CC BY-SA 4.0).

[Open Stemmata](https://openstemmata.github.io/) is an organisation that aims to create an open-source database of textual genealogies (i.e. stemmata), curated by Jean-Baptiste Camps and Gustavo Riva. For each stemma, the database provides a DOT (Graphviz) transcription, alongside metadata (in TXT and XML/TEI) and images.

The DOT transcriptions from Open Stemmata have been transformed here to RDF. Please note that this RDF dataset has not been checked by the Open Stemmata curators and any error is the sole responsibility of the transformation author.

This transformation was made for testing purposes: the RDF version of the data has an enriched semantic structure that can be queried through SPARQL and might help to gain new insights from the database. The ontology is also part of the test: its URL is not resolvable and the data URIs will not be maintained.

This Jupyter notebook is a playground for exploring the RDF dataset. It can be run locally or through the myBinder button.

The data transformed to RDF is available [here](openStemmataData.ttl). The ontology used to structure the RDF data is available [here](stemma-onto.ttl).

## Data structure (ontology)

The ontology expresses in OWL the modelling choices of OpenStemmata (*[Prepare your stemma](https://openstemmata.github.io/guidelines.html)*).

A human-readable HTML version of the ontology is available at [stemma-onto.html](stemma-onto.html), produced with [pyLODE](https://github.com/RDFLib/pyLODE).

### Ontology description

There are two main classes, `:Stemma` and `:Witness`. The second has two subclasses, `:ExistingWitness` (also a subclass of `frbr:Expression`) and `:HypotheticalWitness`.

Each `:Witness` is related to a `:Stemma` through the property `:isWitnessIn`. For each stemma, the language of the work is recorded (`dct:language`) and the original location in OpenStemmata is provided as a link to the GitHub repo (`:hasOpenStemmataEntry`).

The stemmatic relations between the witnesses are all subproperties of `:isAncestorOf` (inverse property `:isDescendantOf`): `:isExemplarOf`, `:isHypotheticalExemplarOf`, `:isContaminatingExemplarOf`, and `:isHypotheticalContaminatingExemplarOf`.
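
To make the hierarchy concrete, here is a minimal plain-Python sketch (no RDF library involved) of what the relations above imply: every triple using one of the four specific relations also implies an `:isAncestorOf` triple, and every `:isAncestorOf` triple implies the inverse `:isDescendantOf` triple. The property names come from the description above; the function `expand` and the witness names `alpha`, `A` and `B` are made up for illustration and do not come from the dataset.

```python
# Plain-Python illustration of the property hierarchy described above.
# Property names come from the ontology description; everything else is hypothetical.
SUBPROPERTIES_OF_IS_ANCESTOR_OF = {
    "isExemplarOf",
    "isHypotheticalExemplarOf",
    "isContaminatingExemplarOf",
    "isHypotheticalContaminatingExemplarOf",
}


def expand(triples):
    """Return the asserted triples plus the implied ancestor/descendant triples."""
    inferred = set(triples)
    # Subproperty step: each specific relation implies isAncestorOf.
    for subject, prop, obj in triples:
        if prop in SUBPROPERTIES_OF_IS_ANCESTOR_OF:
            inferred.add((subject, "isAncestorOf", obj))
    # Inverse step: each isAncestorOf triple implies the reversed isDescendantOf triple.
    for subject, prop, obj in list(inferred):
        if prop == "isAncestorOf":
            inferred.add((obj, "isDescendantOf", subject))
    return inferred


# Made-up stemma: a hypothetical node alpha with two copies, one contaminating the other.
asserted = {
    ("alpha", "isHypotheticalExemplarOf", "A"),
    ("alpha", "isHypotheticalExemplarOf", "B"),
    ("B", "isContaminatingExemplarOf", "A"),
}

for triple in sorted(expand(asserted)):
    print(triple)
```

This only mimics two inference steps by hand; a real reasoner applies the full rule set declared in the ontology.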


## Query the data

Add your SPARQL query into `my_query` and run the cell. You might need to format the results.
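
As a hedged example of such formatting, the helper below keeps only the last segment of a URI, which is usually enough to make the output readable. The function name `local_name` is an assumption introduced here, not part of the notebook or of `rdflib`; it relies only on the fact that query results can be converted to strings.

```python
def local_name(term):
    """Return the part of a URI after its last '/' or '#'.

    Values without either separator (for example plain labels) are returned unchanged.
    """
    text = str(term)
    # Treat '#' like '/', ignore a trailing separator, then keep the last segment.
    return text.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]


# Illustrative inputs: one ontology URI and one placeholder data URI.
print(local_name("http://example.com/stemma-onto#isWitnessIn"))
print(local_name("file:///some/path/Kenney_1995_Metamorphoses_stemma"))
```

Inside a results loop this could be used as `print([local_name(value) for value in row])`.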

See below for **examples of queries**.

Please note that no inference is performed here (for example, over subclasses or subproperties) unless you expand the graph with owlrl; see Example 2 below.

In [1]:
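import rdflib

# Load the RDF data into an in-memory graph
g = rdflib.Graph()
g.parse("openStemmataData.ttl")

# Replace this query with your own
my_query = """
SELECT DISTINCT *
WHERE {
    ?s ?p ?o
} LIMIT 5"""

# Run the query and print each result row
qres = g.query(my_query)
for row in qres:
    print(row)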


(rdflib.term.Literal('υ'), rdflib.term.URIRef('file:///home/elena/Dropbox/b_boulot/workshop_conferenze_SPEAKER/2023_04_newcastleVirtual/openStemmata/repo/Kenney_1995_Metamorphoses_node_ypsilon'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'))
(rdflib.term.URIRef('file:///home/elena/Dropbox/b_boulot/workshop_conferenze_SPEAKER/2023_04_newcastleVirtual/openStemmata/repo/Cartelle_1987_LMinor_De_Coitu_stemma'), rdflib.term.URIRef('file:///home/elena/Dropbox/b_boulot/workshop_conferenze_SPEAKER/2023_04_newcastleVirtual/openStemmata/repo/Cartelle_1987_LMinor_De_Coitu_node_F'), rdflib.term.URIRef('http://example.com/stemma-onto#isWitnessIn'))
(rdflib.term.Literal('Mz'), rdflib.term.URIRef('file:///home/elena/Dropbox/b_boulot/workshop_conferenze_SPEAKER/2023_04_newcastleVirtual/openStemmata/repo/Korte_1914_RenMont1_node_Mz'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'))
(rdflib.term.URIRef('file:///home/elena/Dropbox/b_boulot/workshop_conferenze_SPEAKER/

## Example 1. All stemmas with contamination

The query selects all stemmas whose witnesses are linked through properties `:isContaminatingExemplarOf` or `:isHypotheticalContaminatingExemplarOf`, the two properties expressing contamination in this model.

At the moment (April 2023) there are 161 stemmas in total and 33 of them include contamination, that is, around 20%.

In [32]:
import rdflib
g = rdflib.Graph()
g.parse("openStemmataData.ttl")

my_query = """
PREFIX stemma-onto: <http://example.com/stemma-onto#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?stemma
WHERE {
    # a witness that contaminates another one, attested or hypothetical
    { ?witness stemma-onto:isContaminatingExemplarOf ?descendant }
    UNION
    { ?witness stemma-onto:isHypotheticalContaminatingExemplarOf ?descendant }

    # the stemma in which that witness appears
    ?witness stemma-onto:isWitnessIn ?stemma .
}
"""

qres = g.query(my_query)
print(str(len(qres))+" RESULTS\n")
for row in qres:
    # keep only the last segment of the stemma URI
    stemmaName = row.stemma.split('/')[-1]
    print(stemmaName)

33 RESULTS

Nicolodi_2003_Romani-Turco_stemma
Trovato_2004_Sannazaro-Arcadia_stemma
Einar_1954_Brennu-njalssaga_stemma
CataldiPalau_1982_LesAmours_stemma
Erbse_1961_Aratos-Phainomena_stemma
Erbse_1961_Thukydides-Historiai_stemma
Arnott_1996_Athenaeus_stemma
Zink_1984_Cleriadus_stemma
Stroński_1906_EliasDeBarjolsVII_stemma
Stroński_1906_EliasDeBarjolsXIV_stemma
Stroński_1906_EliasDeBarjolsX_stemma
Jardin_2013_SumaDeReyes_stemma
Pfannmueller_1911_Heidin_stemma
Spiller_1909_BayerischeChronik_stemma
Wetzel_1992_Tristan_stemma
Strippel_1978_KoeniginVonFrankreich_stemma
Mettke_1974_ArmerHeinrich_stemma
Brackert_1963_Nibelungenlied_stemma
Strauch_1900_Fuerstenbuch_stemma
LeonardiTrachsler_2015_Meliadus_stemma
Constans_1890_Thebes_stemma
Zufferey_2007_Alexis_stemma
Demaison_1887_Aimeri_stemma
Bedier_1902_Tristan_stemma
Salverda_1888_Eneas_stemma
Rolin_1897_Aliscans2-Rainouart_stemma
Gundlach_1883_SiegeBarbastre_stemma
Kenney_1995_Metamorphoses_stemma
Shaw_2009_Monarchia_stemma
Petrocchi_1966_C

## Example 2. Number of descendants per witness (and OWL-RL reasoner)

This query selects the number of descendants per witness, using the superproperty `:isAncestorOf` and the superclass `:Witness`.

The query returns 1299 witnesses, each with its number of descendants. The results are saved to a CSV file for better readability. Of the 1299 witnesses:
- 20 have more than 4 descendants (5 to 11)
- 40 have 4 descendants
- 123 have 3 descendants
- 791 have 2 descendants
- 325 have 1 descendant
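
A breakdown like the one above can be recomputed from the exported CSV. The following is a minimal sketch, assuming the file has two columns per row (witness name, number of descendants) and no header row; the function names and the sample rows are hypothetical and only illustrate the counting.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read (witness name, number of descendants) rows from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle) if row]


def descendants_distribution(rows):
    """Map each number of descendants to the number of witnesses that have it."""
    return Counter(int(count) for _name, count in rows)


def summarise(distribution, threshold=4):
    """Merge all counts above `threshold` into a single 'more than' bucket."""
    summary = {}
    for n in sorted(distribution):
        key = str(n) if n <= threshold else f"more than {threshold}"
        summary[key] = summary.get(key, 0) + distribution[n]
    return summary


# Made-up rows in the same shape as the CSV (not real data).
sample_rows = [("w1", "2"), ("w2", "2"), ("w3", "3"), ("w4", "5"), ("w5", "11")]
print(summarise(descendants_distribution(sample_rows)))
```

With the real export, `summarise(descendants_distribution(load_rows('resultExport/witnessANDnumberOfDescendants.csv')))` would give the corresponding figures.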

### Reasoner
To run this query, a reasoner is needed to infer the sub/super class and property relations. The same query could be written without them, but it would be much more verbose. The reasoner used here is [OWL-RL](https://github.com/RDFLib/OWL-RL), an extension of the `RDFLib` library. It is rather slow, but it is sufficient for our testing purposes. Note that both the ontology and the data must be loaded into the same graph (one `g.parse(...)` call for each file).
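
For comparison, the more verbose reasoner-free variant can be sketched with a SPARQL 1.1 property path that lists the four specific relations as alternatives. This is a sketch under assumptions, not a verified equivalent: it assumes the data only asserts these four properties in the ancestor-to-descendant direction, and its results have not been checked against the reasoner output. The function name is hypothetical.

```python
# Namespace and property names as used elsewhere in this notebook.
STEMMA_ONTO = "http://example.com/stemma-onto#"
ANCESTOR_RELATIONS = [
    "isExemplarOf",
    "isHypotheticalExemplarOf",
    "isContaminatingExemplarOf",
    "isHypotheticalContaminatingExemplarOf",
]


def descendants_query_without_reasoner():
    """Build a query that spells out the four relations instead of :isAncestorOf."""
    # "a|b|c" is the SPARQL 1.1 property-path syntax for alternatives.
    path = "|".join(f"stemma-onto:{name}" for name in ANCESTOR_RELATIONS)
    return f"""PREFIX stemma-onto: <{STEMMA_ONTO}>
SELECT ?witness (COUNT(DISTINCT ?descendant) AS ?numberOfDescendants)
WHERE {{
    ?witness ({path}) ?descendant .
}}
GROUP BY ?witness
"""


print(descendants_query_without_reasoner())
```

The resulting string can be passed to `g.query(...)` on a graph that contains only the data, without running the reasoner first.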


In [34]:
import csv
import os

import owlrl
import rdflib

# Load the ontology and the data into the same graph
g = rdflib.Graph()
g.parse("stemma-onto.ttl")
g.parse("openStemmataData.ttl")

# Add the inferred triples (sub/super classes and properties) to the graph; this can be slow
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

my_query = """
PREFIX stemma-onto: <http://example.com/stemma-onto#>
SELECT ?witness (COUNT(DISTINCT ?descendant) AS ?numberOfDescendants)
WHERE {
    ?witness stemma-onto:isAncestorOf ?descendant .
}
GROUP BY ?witness
"""

qres = g.query(my_query)
print(str(len(qres))+" RESULTS\n\n")
for row in qres:
    print(row.witness.split('/')[-1])
    print(row.numberOfDescendants)

# Save the results to a CSV file, creating the output folder if it does not exist
csvFileName = 'resultExport/witnessANDnumberOfDescendants.csv'
os.makedirs(os.path.dirname(csvFileName), exist_ok=True)
with open(csvFileName, 'w', newline='', encoding='utf-8') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in qres:
        witnessName = row.witness.split('/')[-1]
        numberOfDescendants = row.numberOfDescendants
        csvwriter.writerow([witnessName, numberOfDescendants])


1299 RESULTS


Rolin_1897_Aliscans2-Rainouart_node_B
2
Vietor_1876_Loherains_node_lambda
2
LeonardiTrachsler_2015_GuironCourtois2_node_delta3
2
Huebner_1964_Rennewart_node_Alpha
3
Erbse_1961_Lykophron-Alexandra_node_AB
2
Korte_1914_RenMont6_node_y
3
Wagner_1972_SpiritualisPhilosophia_node_Archetypus
2
Stroński_1906_EliasDeBarjolsXII_node_x
3
Bedier_1928_lai_node_O
2
Segre_1971_Roland_node_2
2
Jardin_2013_SumaDeReyes_node_gamma
2
LeonardiTrachsler_2015_GuironCourtois2_node_epsilon3
3
Behaghel_1882_Eneas_node_X1
2
LeonardiTrachsler_2015_Meliadus_node_delta4
2
Trovato_2004_Sannazaro-Arcadia_node_Fi1
2
Nicolodi_2003_Romani-Turco_node_Mi14
5
Masami_2005_Miracles2_node_beta
2
Ponceau_1997c_SaintGraal_node_5
2
Zufferey_2010_Alexis_node_b
3
Zufferey_2010_Alexis_node_a
3
Trovato_2004_Sannazaro-Arcadia_node_Bind1
4
Karnein_1970_DeAmoreDeutsch_node_Archetypus
3
Witzel_1995_ElsTrojaBuch_node_X3
2
Stroński_1906_EliasDeBarjolsIX_node_x
2
Stroński_1906_EliasDeBarjolsX_node_archetype
2
Resconi_2014_Ch