# Sharing data between `pyironhub` and `ontodocker` via the pmd-mesh
In this notebook, we demonstrate how to use the API provided by Ontodocker from within Jupyter notebooks. It is mostly a working example that performs API calls similar to those shown in the notebook "[`example.ipynb`](https://github.com/materialdigital/ontodocker/blob/dev2/example/example.ipynb)" from the ontodocker GitHub repository.

As guidelines for the showcased functionality, we used user stories:
- **Setup:** The user is a member of an institution which is part of a joint project between different institutions. Each institution has deployed a PMD-Server with running services (in particular Ontodocker and pyiron/JupyterHub). Those services are connected to the PMD-Mesh. Web interfaces are exposed via `https`. Access to those interfaces is then managed by the respective IT department via firewall rules (e.g. they are only accessible from within the institute's network/VPN).
- **US 1:** The user wants to access a set of known SPARQL endpoints to digest the hosted data.
- **NEXT STEP** **US 2:** Three disjoint parts of the S355 dataset are hosted on three different instances on the mesh. The user wants to access those, concatenate them and perform an analysis. The concatenated graph is extended by the analysis results and then uploaded to another Ontodocker instance.

Some basic imports and setting of the environment variable `REQUESTS_CA_BUNDLE` required for proper verification of the mesh certificates:

In [15]:
import requests
import os

os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

Assigning the mesh addresses of the Ontodocker instances and the corresponding access tokens (JWTs):

In [18]:
ontodocker_url_iwt = "https://ontodocker.iwt.pmd.internal"
ontodocker_jwt_iwt = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJNYXRlcmlhbERpZ2l0YWwiLCJpYXQiOjE3NDY2MTc5NDcuMzgzMDgsImV4cCI6MTc2MjE2OTk0Ny4zODMwOCwiYXVkIjoib250b2RvY2tlciIsInVzZXJpZCI6MX0.0J-txnTdnvNTJVmRpnv3JJdjdQWS838W1F7EVfk34Ko"

ontodocker_url_iwm = "https://ontodocker.iwm.pmd.internal"
ontodocker_jwt_iwm = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJNYXRlcmlhbERpZ2l0YWwiLCJpYXQiOjE3NDU4MjUzODUuODkyOTUsImV4cCI6MTc1MzYwMTM4NS44OTI5NSwiYXVkIjoib250b2RvY2tlciIsInVzZXJpZCI6MTB9.HIKJb7usjeDltbjdJkExPQM19zKtundHfuK9SyhEbFU"

ontodocker_url_mpisusmat = "https://ontodocker.mpi-susmat.pmd.internal"
ontodocker_jwt_mpisusmat = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJNYXRlcmlhbERpZ2l0YWwiLCJpYXQiOjE3NDU4NDI4NDguMTQ0NDY4LCJleHAiOjE3NTM2MTg4NDguMTQ0NDY4LCJhdWQiOiJvbnRvZG9ja2VyIiwidXNlcmlkIjoxfQ.kAk7xcBeEXdEb3ghlstQszMRFVmPfaV3c_Fz6DQjRzs"

ontodocker_urls = [ontodocker_url_iwt, ontodocker_url_iwm, ontodocker_url_mpisusmat]
ontodocker_jwts = [ontodocker_jwt_iwt, ontodocker_jwt_iwm, ontodocker_jwt_mpisusmat]
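The tokens above are hard-coded for demonstration. A minimal sketch of reading them from environment variables instead, so that they do not end up in a shared notebook. The helpers `load_jwt` and `bearer_headers` and any variable name passed to them (e.g. `ONTODOCKER_JWT_IWT`) are hypothetical and not part of ontodocker:

```python
import os


def load_jwt(env_var: str) -> str:
    """Return the token stored in the environment variable `env_var`.

    The variable names used with this helper are hypothetical.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {env_var!r} is not set or empty")
    return token


def bearer_headers(jwt: str) -> dict:
    """Build the Authorization header used by the API calls in this notebook."""
    return {"Authorization": f"Bearer {jwt}"}
```

With such a helper, an assignment could read `ontodocker_jwt_iwt = load_jwt("ONTODOCKER_JWT_IWT")`, assuming the variable was exported before the kernel was started.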

Request a list of all SPARQL endpoints on the instances:

In [19]:
for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    print(f'Available SPARQL-endpoints at "{url}":')
    headers = {"Authorization": f"Bearer {jwt}"}
    print(requests.get(f'{url}/api/v1/endpoints', headers=headers).content.decode())
    print("")

Available SPARQL-endpoints at "https://ontodocker.iwt.pmd.internal":
[]

Available SPARQL-endpoints at "https://ontodocker.iwm.pmd.internal":
["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]

Available SPARQL-endpoints at "https://ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/S355_pmdco-v2/sparql"]



Create an (empty) dataset:

In [None]:
dataset_name = "pmdco2_tto_example"
for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    endpoint = f'{url}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {jwt}'}
    print(f'Creating empty dataset at "{url}":')
    print(requests.put(endpoint, headers=headers).content.decode())
    print("")

Upload a turtle file to the datasets:

In [None]:
turtle_file_path = "./pmdco2_tto_example.ttl"
#dataset_name = "pmdco2_tto_example"
for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    endpoint = f'{url}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {jwt}'}
    print(f'Upload "{turtle_file_path}" to dataset "{dataset_name}" at "{url}"')
    with open(turtle_file_path, 'rb') as turtle_file:  # close the file handle after the upload
        print(requests.post(endpoint, headers=headers, files={'file': turtle_file}).content.decode())

Testing with a query and `SPARQLWrapper`:

In [None]:
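# Workaround for a problem with SPARQLWrapper; must be placed before SPARQLWrapper is imported.
# Note: this disables HTTPS certificate verification for connections opened via urllib/http.client.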
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

from SPARQLWrapper import SPARQLWrapper
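The workaround above turns off certificate verification for HTTPS connections opened through `urllib`. A minimal sketch of an alternative that keeps verification on by building a default context from a CA bundle. `make_verified_context` is a hypothetical helper, and it has not been verified here that this resolves the problem the workaround targets:

```python
import ssl
from typing import Optional


def make_verified_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Create an SSL context that verifies server certificates.

    With cafile=None the system default CA store is used; otherwise the
    given bundle (e.g. the path assigned to REQUESTS_CA_BUNDLE above).
    """
    return ssl.create_default_context(cafile=cafile)


context = make_verified_context()
# Both checks are enabled in a default client-side context.
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

To apply it in the same global way as the workaround, one could assign `ssl._create_default_https_context = lambda: make_verified_context("/etc/ssl/certs/ca-certificates.crt")` before importing `SPARQLWrapper`; this relies on the same private hook and is an assumption, not a tested fix.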

Search again for all endpoints to find the newly created one:

In [None]:
for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    print(f'Available SPARQL-endpoints at "{url}":')
    headers = {"Authorization": f"Bearer {jwt}"}
    print(requests.get(f'{url}/api/v1/endpoints', headers=headers).content.decode())
    print("")

**CAUTION, there's a bug: you need to remove `:None` and add `/v1/` between `api` and `jena`!**

The value returned by the API call to `.../api/v1/endpoints` is a string. Looking at the responses, they are usually *lists of strings*. We can use the `ast` module to convert the output to the "right" Python objects:  
**Question:** What is the actually returned object type? Can we change it to something that is easier to process than a string, e.g. a JSON/dict structure? Am I missing something?

In [None]:
import ast

# one key per instance, in the same order as `ontodocker_urls`
endpoint_keys = ["iwt_endpoints", "iwm_endpoints", "mpisusmat_endpoints"]
endpoints = dict.fromkeys(endpoint_keys)

for url, jwt, endpoint_key in zip(ontodocker_urls, ontodocker_jwts, endpoint_keys):
    headers = {"Authorization": f"Bearer {jwt}"}
    result = requests.get(f'{url}/api/v1/endpoints', headers=headers).content.decode()
    endpoints[endpoint_key] = ast.literal_eval(result)  # convert the string into a Python list

# make replacements/ insertions
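The endpoint lists printed earlier in this notebook look like JSON arrays, so `json.loads` is one alternative to `ast.literal_eval`. A minimal sketch that parses such a response and applies the two corrections from the caution above (drop `:None`, insert `/v1/` between `api` and `jena`). The helper names are hypothetical, and the URL scheme is deliberately left untouched:

```python
import json


def parse_endpoints(raw: str) -> list:
    """Parse the text returned by `.../api/v1/endpoints` into a list of URLs.

    Assumes the response body is a JSON array of strings.
    """
    return json.loads(raw)


def fix_endpoint_url(endpoint: str) -> str:
    """Apply the manual corrections described above to one endpoint URL."""
    fixed = endpoint.replace(":None", "")
    # Only insert the version segment if it is not already present.
    if "/api/v1/" not in fixed:
        fixed = fixed.replace("/api/jena/", "/api/v1/jena/", 1)
    return fixed


# Example input copied from the output shown earlier in this notebook.
raw = (
    '["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example/sparql",'
    '"http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]'
)
fixed_endpoints = [fix_endpoint_url(e) for e in parse_endpoints(raw)]
print(fixed_endpoints)
```

If the server does return JSON, `requests` can also decode it directly via the response's `.json()` method, which would make the string handling unnecessary.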

Query the whole graph. The result could later be instantiated using, e.g., `semantikon` or `tools4RDF` to perform simple reasoning or construct queries via tab completion.

In [None]:
#dataset_name = "pmdco2_tto_example"

query = """
SELECT ?s ?p ?o WHERE { ?s ?p ?o }
""" 

for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    sparql_endpoint = f'{url}/api/v1/jena/{dataset_name}/sparql'
    sparql = SPARQLWrapper(sparql_endpoint)
    sparql.setReturnFormat('json')
    sparql.addCustomHttpHeader("Authorization", f'Bearer {jwt}')
    print(f'Sending query to "{sparql_endpoint}". Result:')
    sparql.setQuery(query)
    result = sparql.queryAndConvert()
    print(str(result)[:1500]+" ... ") # shortened only for demonstration purposes
    print("")

Query other information, e.g. the resource location of the corresponding `.csv` files:

In [None]:
#dataset_name = "pmdco2_tto_example"

query = """
PREFIX base: <https://w3id.org/pmd/co/>
PREFIX csvw: <http://www.w3.org/ns/csvw#>
SELECT ?url ?p
WHERE {
    ?p a base:TensileTest .
    ?p base:characteristic ?dataset .
    ?dataset a base:Dataset .
    ?dataset base:resource ?table .
    ?table a csvw:Table .
    ?table csvw:url ?url .
}
ORDER BY ?p
""" 

for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    sparql_endpoint = f'{url}/api/v1/jena/{dataset_name}/sparql'
    sparql = SPARQLWrapper(sparql_endpoint)
    sparql.setReturnFormat('json')
    sparql.addCustomHttpHeader("Authorization", f'Bearer {jwt}')
    print(f'Sending query to "{sparql_endpoint}". Result:')
    sparql.setQuery(query)
    result = sparql.queryAndConvert()
    print(result)
    print("")
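With the return format set to JSON, `queryAndConvert()` should yield a dictionary in the W3C SPARQL JSON results layout, where the rows sit under `results`, then `bindings`. A minimal sketch of flattening such a result into plain dictionaries. `bindings_to_rows` is a hypothetical helper, and the sample values are made up for illustration, not taken from the dataset:

```python
def bindings_to_rows(result: dict) -> list:
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    rows = []
    for binding in result["results"]["bindings"]:
        # Each cell is a dict such as {"type": "uri", "value": "..."}.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Made-up sample in the SPARQL 1.1 JSON results format (not real data).
sample = {
    "head": {"vars": ["url", "p"]},
    "results": {
        "bindings": [
            {
                "url": {"type": "literal", "value": "https://example.org/data.csv"},
                "p": {"type": "uri", "value": "https://example.org/process_1"},
            }
        ]
    },
}
rows = bindings_to_rows(sample)
print(rows)
```

A list of flat dictionaries like this can be passed on to tabular tools or iterated over to fetch the referenced `.csv` files.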

In [None]:
query = """
PREFIX pmd: <https://w3id.org/pmd/co/>
SELECT distinct ?p ?rollingDirection
WHERE {
?s a pmd:TestPiece .
?p a pmd:TensileTest .
?p pmd:input ?s .
?p pmd:characteristic ?characteristic .
?characteristic a pmd:MaterialRelated .
?characteristic pmd:value ?rollingDirection .
} ORDER BY ?p
""" 

for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    sparql_endpoint = f'{url}/api/v1/jena/{dataset_name}/sparql'
    sparql = SPARQLWrapper(sparql_endpoint)
    sparql.setReturnFormat('json')
    sparql.addCustomHttpHeader("Authorization", f'Bearer {jwt}')
    print(f'Sending query to "{sparql_endpoint}". Result:')
    sparql.setQuery(query)
    result = sparql.queryAndConvert()
    print(result)
    print("")

Deletion:

In [21]:
dataset_name = "pmdco2_tto_example"
for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    endpoint = f'{url}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {jwt}'}
    print(f'Deleting dataset at "{url}":')
    print(f'Dataset endpoint was "{endpoint}"')
    print(requests.delete(endpoint, headers=headers).content.decode())
    print("")

Deleting dataset at "https://ontodocker.iwt.pmd.internal":
Dataset endpoint was "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example"
"No such dataset registered: /pmdco2_tto_example\n"

Deleting dataset at "https://ontodocker.iwm.pmd.internal":
Dataset endpoint was "https://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example"
"Dataset name pmdco2_tto_example destroyed"

Deleting dataset at "https://ontodocker.mpi-susmat.pmd.internal":
Dataset endpoint was "https://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/pmdco2_tto_example"
"No such dataset registered: /pmdco2_tto_example\n"



In [None]:
for url, jwt in zip(ontodocker_urls, ontodocker_jwts):
    print(f'Available SPARQL-endpoints at "{url}":')
    headers = {"Authorization": f"Bearer {jwt}"}
    print(requests.get(f'{url}/api/v1/endpoints', headers=headers).content.decode())
    print("")

Next steps:
- create three disjoint datasets, host them on three servers
- concatenate and upload to the mpi-susmat Ontodocker via the mesh
- ...
- set up PMD-CKAN as data resource location within the mesh
- digest the raw data from there (tensile test analysis)
- ...
- CKAN as EP registry (accessible via API and GUI)
- deploy JupyterHubs on all servers
- support notebook usage from remote kernels (i.e. running on a different server) to locate the job execution on the data owner's server.