# Creating a SKOS Thesaurus from the cleaned signature schema

We'll use [RDFlib](https://rdflib.readthedocs.io) to manage the SKOS terms. You don't _have_ to: it is quite possible to output RDF through plain string manipulation. But since this is not a huge dataset, we can afford to build an in-memory RDF graph.

In [None]:
import pandas as pd
import re
from rdflib import Graph, Namespace, RDF
from rdflib.namespace import SKOS

## Load the data
We assume we are working with the final output of the [signatures_processing](signatures_processing.ipynb) notebook.

In [2]:
df = pd.read_csv('data/csv/sig_updated.csv', dtype={'numbis': str, 'backreference': str, 'text_4': str})

# Create a multi-index, as we might need to access rows over and over.
df.set_index(['lev', 'sys', 'numbis'], inplace=True)

# sort_index() returns a sorted copy, shown below as the cell output; df itself keeps the file order.
df.sort_index()

| lev | sys | numbis | text | backreference | text_1 | text_2 | text_3 | text_4 |
|---|---|---|---|---|---|---|---|---|
| 1.0 | A | | Handbücherei | | | | | |
| 1.0 | B | | Italienische Kunst | | | | | |
| 1.0 | D | | Topographie Rom | | | | | |
| 1.0 | F | | Reiseberichte | | | | | |
| 1.0 | G | | Quellenschriften und Quellenkunde | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6.0 | Hh 6946 | | 19. und 20. Jh. international, sonstige einzel... | 19. und 20. Jh. international | Ikonographie | Biblisch-christlicher Themenkreis | Der kanonische Bilderkreis (Altes und Neues Te... | Apokalypse und Weltgericht |
| 6.0 | Hh 6960 | | Deesis-Bilder | | Ikonographie | Biblisch-christlicher Themenkreis | Der kanonische Bilderkreis (Altes und Neues Te... | Apokalypse und Weltgericht |
| 6.0 | Hh 6990 | | sonstige Einzelfragen | | Ikonographie | Biblisch-christlicher Themenkreis | Der kanonische Bilderkreis (Altes und Neues Te... | Apokalypse und Weltgericht |
| | | | Bibliotheca Hertziana | | | | | |
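As a rough illustration of why the multi-index helps with repeated access, the sketch below builds a toy frame with the same index levels and looks rows up by key. The three rows are copied from the output above. The frame `toy` is hypothetical, and it uses empty strings for `numbis` where the real data has missing values, which keeps the exact-key lookup simple.

```python
import pandas as pd

# Toy frame mirroring the index levels of the real data (assumption: '' instead of NaN in numbis)
toy = pd.DataFrame({
    'lev': [1.0, 1.0, 6.0],
    'sys': ['A', 'B', 'Hh 6960'],
    'numbis': ['', '', ''],
    'text': ['Handbücherei', 'Italienische Kunst', 'Deesis-Bilder'],
})
toy = toy.set_index(['lev', 'sys', 'numbis']).sort_index()

# Exact lookup with the full (lev, sys, numbis) key returns a single value
label = toy.loc[(1.0, 'B', ''), 'text']

# Partial lookup on the first level returns every row at that level
level_one = toy.loc[1.0]

# Lookup on an inner level, regardless of lev and numbis
deesis = toy.xs('Hh 6960', level='sys')
```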


In [None]:
g = Graph()
NS_DATA = Namespace('http://data.biblhertz.it/term/sys/')

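for index, row in df.iterrows():
    # index is the (lev, sys, numbis) tuple; index[1] is the sys notation.
    if pd.isna(index[1]):
        # Skip rows without a sys value (e.g. the root row): str(nan) would mint a '.../nan' concept.
        continue
    # Collapse whitespace runs into '/', so that e.g. 'Hh 6946' becomes the path 'Hh/6946'.
    sys_uri = re.sub(r'\s+', '/', str(index[1]).strip())
    g.add((NS_DATA[sys_uri], RDF.type, SKOS.Concept))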
    

In [None]:
len(g)