# Simple semantics and "federated upload" of datasets via python and 

In [1]:
# load InteractiveShell: enable to show all output generated in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import os
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

# Slice the dataset and upload via the mesh

In this notebook, we slice the original dataset (stored in `../datasets/pmdco2_tto_example.ttl`) and distribute the slices to three `ontodocker` instances on the mesh. The dataset is split into three pieces according to the orientation in which the respective specimen was cut from the steel sheet relative to the rolling direction. We then serialize the generated graphs as Turtle files and upload them.

## Create three subgraphs
These orientations are *parallel*, *perpendicular* and *diagonal*. Each of the resulting graphs (datasets) should contain all "general" information of the original dataset, i.e. information such as general metadata that cannot be assigned to a specific orientation.

To do that, we have to
- collect all processes with a rolling direction
- collect all other triples belonging to the respective process
- collect all triples which are not related to a rolling direction

Finally, the resulting graphs are serialized as Turtle files.

First, we import the required objects from `rdflib` and define namespaces which are used:


In [2]:
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, XSD

ns_pmdco = Namespace("https://w3id.org/pmd/co/")
ns_tte = Namespace("https://w3id.org/pmd/ao/tte/")

Instantiate the `Graph()`-object for the original (full) graph/dataset and parse the data into it:

In [3]:
g = Graph()
g.parse("../datasets/pmdco2_tto_example.ttl", format="turtle")

<Graph identifier=N1c449a44e8424e8c8b6ec9961be23c53 (<class 'rdflib.graph.Graph'>)>

Now we collect all **processes** associated with a rolling direction. Using `set()` prevents duplicates. We iterate over all subjects in the full graph `g` that have the predicate `pmdco:value`, keep those whose IRI contains the string `_rollingDirection`, and sort the processes linked to them via `pmdco:characteristic` by the stored value.

In [4]:
processes_parallel = set()
processes_perpendicular = set()
processes_diagonal = set()

for rolling_dir in g.subjects(predicate=ns_pmdco.value, object=None):
    if "_rollingDirection" in str(rolling_dir):
        for s, p, o in g.triples((rolling_dir, ns_pmdco.value, None)):
            value = str(o).strip().lower()
            for proc in g.subjects(predicate=ns_pmdco.characteristic, object=rolling_dir):
                if value == "in rolling direction":
                    processes_parallel.add(proc)
                elif value == "perpendicular to rolling direction":
                    processes_perpendicular.add(proc)
                elif value == "diagonal to rolling direction":
                    processes_diagonal.add(proc)
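
The `if`/`elif` chain above silently skips any process whose value matches none of the three expected texts. As a minimal sketch of an alternative, the same classification can be written as a lookup table that also reports unmatched values. The function name `classify_by_orientation`, the plain `(process, value)` pairs and the sample entries `P1` to `P4` are assumptions made for illustration; in the notebook, the pairs would come from the `rdflib` queries.

```python
# Sketch only: plain strings stand in for the rdflib terms used in the notebook.
ORIENTATION_BY_VALUE = {
    "in rolling direction": "parallel",
    "perpendicular to rolling direction": "perpendicular",
    "diagonal to rolling direction": "diagonal",
}


def classify_by_orientation(pairs):
    """Group processes by orientation; collect values that match no orientation."""
    groups = {name: set() for name in ORIENTATION_BY_VALUE.values()}
    unmatched = set()
    for process, raw_value in pairs:
        value = str(raw_value).strip().lower()  # same normalization as in the cell above
        orientation = ORIENTATION_BY_VALUE.get(value)
        if orientation is None:
            unmatched.add((process, raw_value))
        else:
            groups[orientation].add(process)
    return groups, unmatched


# Hypothetical sample data
pairs = [
    ("P1", "in rolling direction"),
    ("P2", " Perpendicular to rolling direction "),
    ("P3", "diagonal to rolling direction"),
    ("P4", "45 degrees"),  # not one of the three expected texts
]
groups, unmatched = classify_by_orientation(pairs)
print(groups)
print(unmatched)
```

Inspecting `unmatched` after the run shows whether any rolling-direction value in the data uses a spelling the slicing does not cover.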

Next, we collect all other triples which are associated with the processes from above. For this, we define a function that collects every triple with the given node as subject and recursively follows the object of a triple as long as that object is an IRI from the TTE namespace.

In [5]:
def collect_tripels(process, graph, collected=None):
    if collected is None:
        collected = set()
    for t in graph.triples((process, None, None)):
        if t not in collected:
            collected.add(t)
            # collect recursively, if object (t[2]) is "URIRef" (IRI) in the same namespace (TTE)
            if isinstance(t[2], URIRef) and str(t[2]).startswith(str(ns_tte)):
                collect_tripels(t[2], graph, collected)
    return collected
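
To make the traversal rule concrete, here is a small sketch that applies the same idea to a toy graph stored as a set of plain string tuples instead of an `rdflib.Graph`. It uses an explicit stack instead of recursion, which is a swapped-in variant and not the function defined above. The prefix `tte:`, the function name `collect_triples_iterative` and all node names are hypothetical.

```python
# Sketch only: triples are (subject, predicate, object) string tuples,
# and "tte:" is a hypothetical stand-in for the TTE namespace IRI.
TTE_PREFIX = "tte:"

toy_graph = {
    ("tte:process1", "ex:hasOutput", "tte:result1"),
    ("tte:result1", "ex:value", "42"),
    ("tte:process1", "ex:type", "other:TensileTest"),   # object outside TTE: not followed
    ("other:TensileTest", "ex:label", "Tensile test"),  # therefore never collected
    ("tte:process2", "ex:hasOutput", "tte:result2"),    # unrelated process
}


def collect_triples_iterative(start, triples, prefix=TTE_PREFIX):
    """Collect all triples reachable from `start` via objects that begin with `prefix`."""
    collected = set()
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # guards against cycles
        visited.add(node)
        for t in triples:
            if t[0] != node:
                continue
            collected.add(t)
            if t[2].startswith(prefix):
                stack.append(t[2])
    return collected


result = collect_triples_iterative("tte:process1", toy_graph)
for t in sorted(result):
    print(t)
```

The triple pointing to `other:TensileTest` is collected, but nothing *about* `other:TensileTest` is, because the traversal only continues into nodes of the chosen namespace.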

Now we actually collect the triples by iterating over all processes of each orientation and building the union of the triples belonging to each process.

In [6]:
tripels_parallel = set()
tripels_perpendicular = set()
tripels_diagonal = set()

for proc in processes_parallel:
    tripels_parallel |= collect_tripels(proc, g)  # in-place set union, same effect as tripels_parallel.update(...)
for proc in processes_perpendicular:
    tripels_perpendicular |= collect_tripels(proc, g)
for proc in processes_diagonal:
    tripels_diagonal |= collect_tripels(proc, g)

We also need to collect the general information which is not related to any rolling-direction process. We do this by first building the set of all triples that *are* related to such a process and then subtracting it from the full graph `g`:

In [7]:
rolling_tripels = tripels_parallel | tripels_perpendicular | tripels_diagonal  # set union of all three orientations
general_tripels = set(g) - rolling_tripels
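
The set algebra can be checked on a toy example. The sketch below uses short hypothetical labels in place of real triples. It illustrates that the general part is what remains after removing everything orientation-specific, and that each slice is the union of its own triples with that general part.

```python
# Sketch only: short labels stand in for (subject, predicate, object) triples.
full = {"meta_a", "meta_b", "par_1", "par_2", "perp_1", "diag_1"}

parallel = {"par_1", "par_2"}
perpendicular = {"perp_1"}
diagonal = {"diag_1"}

rolling = parallel | perpendicular | diagonal  # union of all orientation-specific triples
general = full - rolling                       # set difference: belongs to no orientation

slice_parallel = parallel | general
slice_perpendicular = perpendicular | general
slice_diagonal = diagonal | general

print(sorted(general))
print(sorted(slice_parallel))

# Together, the slices cover the full dataset again.
print(slice_parallel | slice_perpendicular | slice_diagonal == full)
```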

Finally, we create the graphs from the union of the respective process-related triples and the general triples. The cell magic `%%capture` suppresses the cell output and stores it in the variable `cap`.

In [8]:
%%capture cap

g_parallel = Graph()
g_perpendicular = Graph()
g_diagonal = Graph()
for t in tripels_parallel | general_tripels:
    g_parallel.add(t)
for t in tripels_perpendicular | general_tripels:
    g_perpendicular.add(t)
for t in tripels_diagonal | general_tripels:
    g_diagonal.add(t)

The final step is to adjust the resource locations of the CSV files referenced in the datasets. Formerly, they were located in a GitHub repository. First, we define a function which performs URL replacements in the graph objects:

In [9]:
def apply_url_replacements(g: Graph, replacements: dict[str, str]):
    """
    In-place replacement of csvw:url objects according to a mapping.
    
    Args:
        g: rdflib.Graph loaded with your triples.
        replacements: dict mapping old URL strings -> new URL strings.
    
    Returns:
        List of (table_subject_uri, old_url_str, new_url_str) for each change.
    """
    ns_csvw = Namespace("http://www.w3.org/ns/csvw#")
    changes = []

    # materialize the matches first, because the graph is modified inside the loop
    for table, old in list(g.subject_objects(predicate=ns_csvw.url)):
        old_s = str(old)
        new_val = replacements.get(old_s)
        if new_val is None:
            continue

        # Preserve node kind: URIRef stays IRI; Literal stays xsd:anyURI
        new_node = URIRef(new_val) if isinstance(old, URIRef) else Literal(new_val, datatype=XSD.anyURI)

        if new_node != old:
            g.remove((table, ns_csvw.url, old))
            g.add((table, ns_csvw.url, new_node))
            changes.append((table, old_s, new_val))

    return changes

Then, we define the replacements to be performed:

In [10]:
replacements_parallel = {
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zx1.csv": "https://data.mpi-susmat.pmd.internal/Zx1.csv",
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zx2.csv": "https://data.mpi-susmat.pmd.internal/Zx2.csv",
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zx3.csv": "https://data.mpi-susmat.pmd.internal/Zx3.csv",
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zx4.csv": "https://data.mpi-susmat.pmd.internal/Zx4.csv",
}
replacements_perpendicular = {
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zy1.csv": "https://data.kit-3.pmd.internal/S355/01_primary_data/Zy1.csv",
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zy2.csv": "https://data.kit-3.pmd.internal/S355/01_primary_data/Zy2.csv",
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zy3.csv": "https://data.kit-3.pmd.internal/S355/01_primary_data/Zy3.csv",
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zy4.csv": "https://data.kit-3.pmd.internal/S355/01_primary_data/Zy4.csv",
}
replacements_diagonal = {
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zd2.csv": "https://ckan.iwm.pmd.internal/en/dataset/1bae2004-1729-46c1-8aeb-25e2652b8485/resource/c6d930b7-e06b-42f0-8a6e-c2ec5f721b21/download/zd2.csv",
    "https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zd3.csv": "https://ckan.iwm.pmd.internal/en/dataset/1bae2004-1729-46c1-8aeb-25e2652b8485/resource/4ee98330-c256-4c0c-abed-08ff8df6af0f/download/zd3.csv",
}
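
Where old and new URLs differ only in their base, the mapping can also be generated instead of typed out. The following is a minimal sketch. The helper name `build_replacements` is hypothetical, and it only fits targets that keep the original file names (the parallel and perpendicular slices above), not the CKAN download links of the diagonal slice, which contain per-resource IDs.

```python
OLD_BASE = (
    "https://github.com/materialdigital/application-ontologies/tree/main/"
    "tensile_test_ontology_TTO/data/primary_data/"
)


def build_replacements(old_base: str, new_base: str, filenames: list[str]) -> dict[str, str]:
    """Map old_base + name -> new_base + name for every file name."""
    return {old_base + name: new_base + name for name in filenames}


replacements_parallel_generated = build_replacements(
    OLD_BASE,
    "https://data.mpi-susmat.pmd.internal/",
    [f"Zx{i}.csv" for i in range(1, 5)],
)

for old, new in replacements_parallel_generated.items():
    print(old, "->", new)
```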

In [11]:
# %%capture cap

apply_url_replacements(g_parallel, replacements_parallel)
apply_url_replacements(g_perpendicular, replacements_perpendicular)
apply_url_replacements(g_diagonal, replacements_diagonal)

[(rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zx3_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zx3.csv',
  'https://data.mpi-susmat.pmd.internal/Zx3.csv'),
 (rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zx2_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zx2.csv',
  'https://data.mpi-susmat.pmd.internal/Zx2.csv'),
 (rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zx4_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zx4.csv',
  'https://data.mpi-susmat.pmd.internal/Zx4.csv'),
 (rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zx1_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zx1.csv',
  'https://data.mpi-susmat.pmd.internal/Zx1.csv')]

[(rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zy4_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zy4.csv',
  'https://data.kit-3.pmd.internal/S355/01_primary_data/Zy4.csv'),
 (rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zy1_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zy1.csv',
  'https://data.kit-3.pmd.internal/S355/01_primary_data/Zy1.csv'),
 (rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zy2_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zy2.csv',
  'https://data.kit-3.pmd.internal/S355/01_primary_data/Zy2.csv'),
 (rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zy3_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zy3.csv',
  'https://data.kit-3.pmd.internal/S355/01_primary_data

[(rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zd3_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zd3.csv',
  'https://ckan.iwm.pmd.internal/en/dataset/1bae2004-1729-46c1-8aeb-25e2652b8485/resource/4ee98330-c256-4c0c-abed-08ff8df6af0f/download/zd3.csv'),
 (rdflib.term.URIRef('https://w3id.org/pmd/ao/tte/Zd2_csv'),
  'https://github.com/materialdigital/application-ontologies/tree/main/tensile_test_ontology_TTO/data/primary_data/Zd2.csv',
  'https://ckan.iwm.pmd.internal/en/dataset/1bae2004-1729-46c1-8aeb-25e2652b8485/resource/c6d930b7-e06b-42f0-8a6e-c2ec5f721b21/download/zd2.csv')]

Define handles for the datasets and the related files:

In [12]:
datasetname_parallel = "pmdco2_tto_example_parallel"
filename_parallel = "../datasets/" + datasetname_parallel + ".ttl"

datasetname_perpendicular = "pmdco2_tto_example_perpendicular"
filename_perpendicular = "../datasets/" + datasetname_perpendicular + ".ttl"

datasetname_diagonal = "pmdco2_tto_example_diagonal"
filename_diagonal = "../datasets/" + datasetname_diagonal + ".ttl"

Serialize the graphs to turtle-files (`.ttl`):

In [13]:
g_parallel.serialize(filename_parallel, format="turtle")
g_perpendicular.serialize(filename_perpendicular, format="turtle")
g_diagonal.serialize(filename_diagonal, format="turtle")

<Graph identifier=N1b020514912d4654a76f063df89e8629 (<class 'rdflib.graph.Graph'>)>

<Graph identifier=N52dd40be9fc146ada5a84d5540351a50 (<class 'rdflib.graph.Graph'>)>

<Graph identifier=N974f263332124bb4a548b3a2b16015ca (<class 'rdflib.graph.Graph'>)>

## Upload the three slices via the mesh

This part requires some basic knowledge of the mesh-related tools used in the Demonstrator. Have a look at `01-setup_and_fist_steps.ipynb` for information.  
Before any usage of the mesh, we have to point `requests` to the right certificate bundle and populate the mesh-participant registry:

In [14]:
import os
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

from pmd_demo_tools import mesh_tools

# index all servers
partners_full = mesh_tools.mesh_namespace_grouped_by_company()

# attach hosted services
_ = mesh_tools.attach_services_in_place(partners_full)

# fill in web tokens
import json
with open('../secrets/tokens.json') as f:
    tokens = json.load(f, object_hook=mesh_tools.namespace_object_hook())

partners_full.Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT.iwt.services.ontodocker.token = tokens.Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT.ontodocker.token
partners_full.Fraunhofer_IWM.iwm.services.ontodocker.token = tokens.Fraunhofer_IWM.ontodocker.token
partners_full.KIT.kit_3.services.ontodocker_proxy.token = tokens.KIT.ontodocker_proxy.token
partners_full.MPISusMat.mpi_susmat.services.ontodocker.token = tokens.MPISusMat.ontodocker.token

# reduce to some selected partners
selection = ["Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT", "Fraunhofer_IWM", "KIT", "MPISusMat"]
partners = mesh_tools.select_toplevel(partners_full, selection, deepcopy=True)



In [15]:
from pmd_demo_tools import sparql_tools
import requests

endpoints = sparql_tools.list_sparql_endpoints(partners, verify=True)

Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT
Available SPARQL-endpoints at "ontodocker.iwt.pmd.internal":
https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql
https://ontodocker.iwt.pmd.internal/api/v1/jena/test_dataset/sparql

Fraunhofer_IWM
Available SPARQL-endpoints at "ontodocker.iwm.pmd.internal":
https://ontodocker.iwm.pmd.internal/api/v1/jena/test/sparql
https://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_diagonal/sparql

KIT
Available SPARQL-endpoints at "ontodocker-proxy.kit-3.pmd.internal":
http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/test_dataset/sparql
http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql
http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/tt_test/sparql

MPISusMat
Available SPARQL-endpoints at "ontodocker.mpi-susmat.pmd.internal":
http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/newset/sparql
http://ontodocker.mpi-susmat.pmd.internal/api/v

Optionally, delete existing datasets with the same names before uploading:

In [16]:
datasetname_list = [datasetname_parallel, datasetname_perpendicular, datasetname_diagonal]
filename_list = [filename_parallel, filename_perpendicular, filename_diagonal]

for company, server, service_key, service in mesh_tools.iter_servers_with_services_matching(partners, ["*ontodocker*"]):
    print(company+ "\n")
    address = service.address
    token = service.token
    for datasetname in datasetname_list:
        try:
            headers = {"Authorization": f"Bearer {token}"}
            timeout=(3, 3)
            result = requests.delete(f'https://{address}/api/v1/jena/{datasetname}', headers=headers, timeout=timeout, verify=True).content.decode()
            print(f'Deleting dataset {datasetname} at "{address}":')
            print(result)
            print("")
        except Exception as e:
            print (f"An error occurred for the service with address '{address}':\n")
            print(str(type(e))+"\n"+str(e)+"\n\n")

Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT

Deleting dataset pmdco2_tto_example_parallel at "ontodocker.iwt.pmd.internal":
"Dataset name pmdco2_tto_example_parallel destroyed"

Deleting dataset pmdco2_tto_example_perpendicular at "ontodocker.iwt.pmd.internal":
"No such dataset registered: /pmdco2_tto_example_perpendicular\n"

Deleting dataset pmdco2_tto_example_diagonal at "ontodocker.iwt.pmd.internal":
"No such dataset registered: /pmdco2_tto_example_diagonal\n"

Fraunhofer_IWM

Deleting dataset pmdco2_tto_example_parallel at "ontodocker.iwm.pmd.internal":
"401 - Unauthorized"

Deleting dataset pmdco2_tto_example_perpendicular at "ontodocker.iwm.pmd.internal":
"401 - Unauthorized"

Deleting dataset pmdco2_tto_example_diagonal at "ontodocker.iwm.pmd.internal":
"401 - Unauthorized"

KIT

Deleting dataset pmdco2_tto_example_parallel at "ontodocker-proxy.kit-3.pmd.internal":
"No such dataset registered: /pmdco2_tto_example_parallel\n"

Deleting dataset pmdco2_tto_example_
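
The loop above prints only the response body, so a `401 - Unauthorized` is easy to mistake for a successful deletion unless the text is read. A minimal sketch of a helper that also reports the HTTP status code is shown below. The function name `delete_dataset` and both stub classes are hypothetical; the stubs replace the network so that the example runs offline, and the status codes they return are assumptions, not observed behaviour of ontodocker. In the notebook, the `requests` module itself could be passed as `http`, since `requests.delete` accepts the same keyword arguments.

```python
def delete_dataset(http, address, datasetname, token, timeout=(3, 3)):
    """Send a DELETE request for one dataset and return (ok, status_code, text)."""
    response = http.delete(
        f"https://{address}/api/v1/jena/{datasetname}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=timeout,
        verify=True,
    )
    return response.ok, response.status_code, response.text


# --- Offline stand-ins for illustration only (not part of any library) ---
class FakeResponse:
    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text
        self.ok = status_code < 400


class FakeHttp:
    """Mimics the small part of the `requests` interface used above."""

    def delete(self, url, headers=None, timeout=None, verify=True):
        if headers["Authorization"] == "Bearer valid-token":
            return FakeResponse(200, "deleted")
        return FakeResponse(401, "401 - Unauthorized")


for token in ("valid-token", "wrong-token"):
    ok, status, text = delete_dataset(FakeHttp(), "ontodocker.example", "demo", token)
    print(f"ok={ok} status={status} body={text!r}")
```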

Do the upload via HTTP requests to the ontodocker API:

In [17]:
#select institutes to upload data to
ontodockers = [partners.Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT.iwt.services.ontodocker,
               partners.KIT.kit_3.services.ontodocker_proxy,
               partners.Fraunhofer_IWM.iwm.services.ontodocker
              ]

for (ontodocker, datasetname, filename) in zip(ontodockers, datasetname_list, filename_list):
    address = ontodocker.address
    token = ontodocker.token
    
    endpoint = f'https://{address}/api/v1/jena/{datasetname}'
    headers = {"Authorization": f'Bearer {token}'}

    # create dataset
    print(f'Creating & filling dataset at "{address}":')
    print("--> "+ requests.put(endpoint, headers=headers, verify=True).content.decode())

    # upload file (the context manager closes the file handle after the request)
    print(f'Upload "{filename}" to dataset "{datasetname}" at "{address}"')
    with open(filename, 'rb') as f:
        print("--> " + requests.post(endpoint, headers=headers, files={'file': f}, verify=True).content.decode())
    print("")

Creating & filling dataset at "ontodocker.iwt.pmd.internal":
--> "Dataset name pmdco2_tto_example_parallel created"
Upload "../datasets/pmdco2_tto_example_parallel.ttl" to dataset "pmdco2_tto_example_parallel" at "ontodocker.iwt.pmd.internal"
--> "Upload succeeded { \n  \"count\" : 1378 ,\n  \"tripleCount\" : 1378 ,\n  \"quadCount\" : 0\n}\n"

Creating & filling dataset at "ontodocker-proxy.kit-3.pmd.internal":
--> "Dataset name pmdco2_tto_example_perpendicular created"
Upload "../datasets/pmdco2_tto_example_perpendicular.ttl" to dataset "pmdco2_tto_example_perpendicular" at "ontodocker-proxy.kit-3.pmd.internal"
--> "Upload succeeded { \n  \"count\" : 1378 ,\n  \"tripleCount\" : 1378 ,\n  \"quadCount\" : 0\n}\n"

Creating & filling dataset at "ontodocker.iwm.pmd.internal":
--> "401 - Unauthorized"
Upload "../datasets/pmdco2_tto_example_diagonal.ttl" to dataset "pmdco2_tto_example_diagonal" at "ontodocker.iwm.pmd.internal"
--> "401 - Unauthorized"
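
Because `zip` pairs the three lists purely by position, the order of `ontodockers` has to match the order of `datasetname_list` and `filename_list`, and `zip` stops silently at the shortest list. One way to make the pairing explicit is a single list of records, sketched below with hypothetical string labels in place of the service objects.

```python
# Sketch only: the string labels stand in for the service objects used in the notebook.
upload_plan = [
    {"target": "IWT", "dataset": "pmdco2_tto_example_parallel"},
    {"target": "KIT", "dataset": "pmdco2_tto_example_perpendicular"},
    {"target": "IWM", "dataset": "pmdco2_tto_example_diagonal"},
]

for entry in upload_plan:
    # Derive the file name from the dataset name, as in the cell that defines the handles.
    entry["filename"] = "../datasets/" + entry["dataset"] + ".ttl"

for entry in upload_plan:
    print(f'{entry["dataset"]} -> {entry["target"]} ({entry["filename"]})')
```

With this layout, adding or removing a target changes one record instead of three parallel lists.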

