In [1]:
# load InteactiveShell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Slice the dataset and upload via the mesh

In this notebook, we slice the original dataset (stored in `../pmdco2_tto_example.ttl`) and distribute the slices to 3 `ontodocker` instances on the mesh. We will slice it into 3 pieces corresponding to the orientation in which the respective specimen was cut from the steel-sheet relative to the rolling direction. We then serialize the generated graphs as turtle-files and do the upload.

## Create three subgraphs
These orientations are *parallel*, *perpendicular* and *diagonal*. Each of the resulting graphs/ datasets should contain all "general" information of the original dataset, i.e. information like general metadata, which cannot be assigned to a specific orientation.

To do that, we have to
- collect all processes with a rolling direction
- collect all other triples belonging to the resp. process
- collect all triples which are not related to a rolling diretion

The resulting graphs then are finally serialized as turtle-files.

First, we import the required objects from `rdflib` and define namespaces which are used:


In [2]:
from rdflib import Graph, Namespace, URIRef, Literal

ns_pmdco = Namespace("https://w3id.org/pmd/co/")
ns_tte = Namespace("<https://w3id.org/pmd/ao/tte/")

Instantiate the `Graph()`-object for the original (full) graph/dataset and pars the data into it:

In [3]:
g = Graph()
g.parse("../pmdco2_tto_example.ttl", format="turtle")

<Graph identifier=N1b398964f0ae4aabbcfd235dedc748d2 (<class 'rdflib.graph.Graph'>)>

Now, we collect all **processes** accossiated with a rolling direction. Using `set()` prevents the occurance of doubled information (dublicates). We do this by iterating over all triples from the full graph `g` and searching all triples with predicate "pmdco.value" and the string "*_rollingDirection" in their subject.

In [4]:
processes_parallel = set()
processes_perpendicular = set()
processes_diagonal = set()

for rolling_dir in g.subjects(predicate=ns_pmdco.value, object=None):
    if "_rollingDirection" in str(rolling_dir):
        for s, p, o in g.triples((rolling_dir, ns_pmdco.value, None)):
            value = str(o).strip().lower()
            for proc in g.subjects(predicate=ns_pmdco.characteristic, object=rolling_dir):
                if value == "in rolling direction":
                    processes_parallel.add(proc)
                elif value == "perpendicular to rolling direction":
                    processes_perpendicular.add(proc)
                elif value == "diagonal to rolling direction":
                    processes_diagonal.add(proc)

Next, we collect al other triples, which are acossiated with the processes from above. For this, we define a function iterating over all triples as long as the triples IRI is from the TTE namespace.

In [5]:
def collect_tripels(process, graph, collected=None):
    if collected is None:
        collected = set()
    for t in graph.triples((process, None, None)):
        if t not in collected:
            collected.add(t)
            # collect recursively, if object (t[2]) is "URIRef" (IRI) in the same namespace (TTE)
            if isinstance(t[2], URIRef) and str(t[2]).startswith(str(ns_tte)):
                collect_tripels(t[2], graph, collected)
    return collected

Now we actualy collect the triples by iterating over all triples and creating the unions of triples belonging to a process (which, in turn, is accossiated with an orientation).

In [6]:
tripels_parallel = set()
tripels_perpendicular = set()
tripels_diagonal = set()

for proc in processes_parallel:
    tripels_parallel |= collect_tripels(proc, g) # (a|=b) == (a = a|b) == (a.update(b)) == (a = a.union(b)); in-place Union/Vereinigung
for proc in processes_perpendicular:
    tripels_perpendicular |= collect_tripels(proc, g)
for proc in processes_diagonal:
    tripels_diagonal |= collect_tripels(proc, g)

Also, we need to collect general information which is not related to a process at all. We do this, by first creating the graph which *only* contains information related to a process and substracting this from the overall full graph `g`:

In [7]:
rolling_tripels = tripels_parallel | tripels_perpendicular | tripels_diagonal # (c = a | b) == (c = a.union(b)); Union/Vereinigung
general_tripels = set(g) - rolling_tripels

Finally, we create the graphs from the union of the resp. (set of process-realted) triples and the general triples.

In [8]:
%%capture

g_parallel = Graph()
g_perpendicular = Graph()
g_diagonal = Graph()
for t in tripels_parallel | general_tripels:
    g_parallel.add(t)
for t in tripels_perpendicular | general_tripels:
    g_perpendicular.add(t)
for t in tripels_diagonal | general_tripels:
    g_diagonal.add(t)

Define handles for the datasets and the related files:

In [9]:
datasetname_parallel = "pmdco2_tto_example_parallel"
filename_parallel = "../" + datasetname_parallel + ".ttl"

datasetname_perpendicular = "pmdco2_tto_example_perpendicular"
filename_perpendicular = "../" + datasetname_perpendicular + ".ttl"

datasetname_diagonal = "pmdco2_tto_example_diagonal"
filename_diagonal = "../" + datasetname_diagonal + ".ttl"

Serialize the graphs to turtle-files (`.ttl`):

In [10]:
g_parallel.serialize(filename_parallel, format="turtle")
g_perpendicular.serialize(filename_perpendicular, format="turtle")
g_diagonal.serialize(filename_diagonal, format="turtle")

<Graph identifier=N85fe88645da14c69b1722b7d5df147a5 (<class 'rdflib.graph.Graph'>)>

<Graph identifier=N849e719f1b054bc49ccc46a8cbb0ad39 (<class 'rdflib.graph.Graph'>)>

<Graph identifier=N44213d5354b740d1a3e8f0db04e08b1e (<class 'rdflib.graph.Graph'>)>

## Upload the three slices via the mesh

Before any usage of the mesh, we have to point `requests` to the right certificate bundle:

In [11]:
import os
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

Load the mesh-participant information:

In [12]:
import sys
sys.path.append("..")
import helpers

import json

with open('../secrets/participant_registry.json') as f:
    partners = json.load(f, object_hook=lambda d: helpers.RecursiveNamespace(**d))

Do the upload via http requests to the ontodocker api (see also `00-technical_checks.ipynb`):

In [13]:
import requests

datasetname_list = [datasetname_parallel, datasetname_perpendicular, datasetname_diagonal]
filename_list = [filename_parallel, filename_perpendicular, filename_diagonal]

for (key, datasetname, filename) in zip(partners.__dict__, datasetname_list, filename_list):
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    endpoint = f'{address}/api/v1/jena/{datasetname}'
    headers = {"Authorization": f'Bearer {token}'}

    # create dataset
    print(f'Creating & filling dataset at "{address}":')
    print("--> "+ requests.put(endpoint, headers=headers).content.decode())

    # uplaod file
    print(f'Upload "{filename}" to dataset "{datasetname}" at "{address}"')
    print("--> " + requests.post(endpoint, headers=headers, files={'file': open(filename, 'rb')}).content.decode())
    print("")

Creating & filling dataset at "https://ontodocker.iwm.pmd.internal":
--> "Name already registered /pmdco2_tto_example_parallel\n"
Upload "../pmdco2_tto_example_parallel.ttl" to dataset "pmdco2_tto_example_parallel" at "https://ontodocker.iwm.pmd.internal"
--> "Upload succeeded { \n  \"count\" : 2476 ,\n  \"tripleCount\" : 2476 ,\n  \"quadCount\" : 0\n}\n"

Creating & filling dataset at "https://ontodocker.iwt.pmd.internal":
--> "Name already registered /pmdco2_tto_example_perpendicular\n"
Upload "../pmdco2_tto_example_perpendicular.ttl" to dataset "pmdco2_tto_example_perpendicular" at "https://ontodocker.iwt.pmd.internal"
--> "Upload succeeded { \n  \"count\" : 2476 ,\n  \"tripleCount\" : 2476 ,\n  \"quadCount\" : 0\n}\n"

Creating & filling dataset at "https://ontodocker.mpi-susmat.pmd.internal":
--> "Name already registered /pmdco2_tto_example_diagonal\n"
Upload "../pmdco2_tto_example_diagonal.ttl" to dataset "pmdco2_tto_example_diagonal" at "https://ontodocker.mpi-susmat.pmd.interna

**Deletion** (uncomment if you want to delete the datasets)

In [14]:
#for (key, datasetname) in zip(partners.__dict__, datasetname_list):
#    address = getattr(partners, key).ontodocker.address
#    token = getattr(partners, key).ontodocker.token
#    
#    endpoint = f'{address}/api/v1/jena/{datasetname}'
#    headers = {"Authorization": f'Bearer {token}'}
#    
#    print(f'Deleting dataset "{datasetname}" at "{address}":')
#    print(f'Endpoint was "{endpoint}"')
#    print(requests.delete(endpoint, headers=headers).content.decode()+"\n")