# Sharing data between `pyironhub` and `ontodocker` via the PMD-Mesh
In this notebook, we demonstrate how to use the API provided by Ontodocker from within Jupyter notebooks. It is mainly a working example and performs API calls similar to those shown in the notebook "[`example.ipynb`](https://github.com/materialdigital/ontodocker/blob/dev2/example/example.ipynb)" from the Ontodocker GitHub repository.

As guidelines for the showcased functionality we used user stories:
- **Setup:** The user is a member of an institution that takes part in a joint project of several institutions. Each institution has deployed a PMD-Server with running services (in particular Ontodocker and pyiron/JupyterHub). These services are connected to the PMD-Mesh. Web interfaces are exposed via `https`. Access to those interfaces is managed by the respective IT department via firewall rules (e.g. they are only accessible from within the institute's network/VPN).
- **US 1:** The user wants to access a set of known SPARQL endpoints to digest the hosted data.
- **NEXT STEP** **US 2:** Three disjoint parts of the S355 dataset are hosted on three different instances on the mesh. The user wants to access them, concatenate them, and perform an analysis. The concatenated graph is extended by the analysis results and then uploaded to another Ontodocker instance.

Basic imports, plus the environment variable `REQUESTS_CA_BUNDLE`, which is required for proper verification of the mesh certificates:

In [1]:
import requests
import os
import json

os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

Load self-defined helpers, e.g. for tab completion of the connected servers' information. Some characters in key names, such as hyphens, might break this.

In [2]:
import helpers

Load all server information from the JSON file as a `RecursiveNamespace` object:

In [3]:
with open('../secrets/endpoints.json') as f:
    partners = json.load(f, object_hook=lambda d: helpers.RecursiveNamespace(**d))

In [6]:
partners.mpisusmat.ontodocker.name

'ontodocker-mpisusmat'

We can access the data required to connect to a server via dot notation: `partners.<institute-name>.<service-name>.address` now holds the service's (mesh) address. Have a look at `../secrets/endpoints.json` and `helpers.py` for more. You can browse the contents of `partners` using tab completion. To iterate over the entries, use the `dict` mapping (`partners.__dict__`) together with `getattr()`.
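
The contents of `helpers.py` are not shown in this notebook, so the following is only a minimal sketch of how such a namespace could behave, assuming `RecursiveNamespace` is a thin wrapper around `types.SimpleNamespace`. The JSON layout, the address, and the token are placeholders made up for illustration; only the `name` value matches the output above.

```python
import json
from types import SimpleNamespace


class RecursiveNamespace(SimpleNamespace):
    """Hypothetical stand-in for helpers.RecursiveNamespace (assumption)."""


# Illustrative layout of endpoints.json; address and token are placeholders.
raw = """
{
  "mpisusmat": {
    "ontodocker": {
      "name": "ontodocker-mpisusmat",
      "address": "https://ontodocker.example.invalid",
      "token": "<token>"
    }
  }
}
"""

# json calls object_hook for every decoded JSON object, innermost first,
# so nested objects end up as nested namespaces.
partners = json.loads(raw, object_hook=lambda d: RecursiveNamespace(**d))

print(partners.mpisusmat.ontodocker.name)

# Iterate over all institutes without knowing their names in advance.
for key, institute in vars(partners).items():
    print(key, institute.ontodocker.address)
```

A key containing a hyphen is not a valid Python identifier, so it cannot be used in dot notation or tab completion and would have to be read with `getattr()` instead.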

Request a list of all SPARQL endpoints on the instances:

In [7]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    print(f'Available SPARQL-endpoints at "{address}":')
    headers = {"Authorization": f"Bearer {token}"}
    print(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    print("")

Available SPARQL-endpoints at "https://ontodocker.iwt.pmd.internal":
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql"]

Available SPARQL-endpoints at "https://ontodocker.iwm.pmd.internal":
["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]

Available SPARQL-endpoints at "https://ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example_diagonal/sparql"]
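
The same request is repeated several times in this notebook, so it may be worth wrapping it in a small function. The sketch below is one possible shape, not part of Ontodocker or `helpers.py`: the names `auth_headers` and `list_endpoints` are hypothetical, and it assumes the response body is always valid JSON, as the output above suggests.

```python
def auth_headers(token):
    """Build the bearer-token header used by all calls in this notebook."""
    return {"Authorization": f"Bearer {token}"}


def list_endpoints(session, address, token, timeout=10):
    """Return the SPARQL endpoints reported by one Ontodocker instance.

    `session` only needs a `get()` method with the signature of
    `requests.get`, e.g. a `requests.Session` or the `requests` module.
    """
    response = session.get(
        f"{address}/api/v1/endpoints",
        headers=auth_headers(token),
        timeout=timeout,
    )
    # Raise on HTTP errors instead of printing an error body.
    response.raise_for_status()
    # Parse the JSON list instead of returning the raw string.
    return response.json()
```

Inside the loop above, this could be called as `list_endpoints(requests, address, token)`. Passing the HTTP client in as an argument keeps the function easy to test without network access.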



Create an (empty) dataset:

In [9]:
dataset_name = "pmdco2_tto_example"
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    endpoint = f'{address}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {token}'}
    print(f'Creating empty dataset at "{address}":')
    print(requests.put(endpoint, headers=headers).content.decode())
    print("")

Creating empty dataset at "https://ontodocker.iwt.pmd.internal":
"Dataset name pmdco2_tto_example created"

Creating empty dataset at "https://ontodocker.iwm.pmd.internal":
"Dataset name pmdco2_tto_example created"

Creating empty dataset at "https://ontodocker.mpi-susmat.pmd.internal":
"Dataset name pmdco2_tto_example created"



Upload a turtle file to the datasets:

In [10]:
turtle_file_path = "../pmdco2_tto_example.ttl"
#dataset_name = "pmdco2_tto_example"
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    endpoint = f'{address}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {token}'}
    print(f'Upload "{turtle_file_path}" to dataset "{dataset_name}" at "{address}"')
    with open(turtle_file_path, 'rb') as turtle_file:  # close the file handle after each upload
        print(requests.post(endpoint, headers=headers, files={'file': turtle_file}).content.decode())

Upload "../pmdco2_tto_example.ttl" to dataset "pmdco2_tto_example" at "https://ontodocker.iwt.pmd.internal"
"Upload succeeded { \n  \"count\" : 2752 ,\n  \"tripleCount\" : 2752 ,\n  \"quadCount\" : 0\n}\n"
Upload "../pmdco2_tto_example.ttl" to dataset "pmdco2_tto_example" at "https://ontodocker.iwm.pmd.internal"
"Upload succeeded { \n  \"count\" : 2752 ,\n  \"tripleCount\" : 2752 ,\n  \"quadCount\" : 0\n}\n"
Upload "../pmdco2_tto_example.ttl" to dataset "pmdco2_tto_example" at "https://ontodocker.mpi-susmat.pmd.internal"
"Upload succeeded { \n  \"count\" : 2752 ,\n  \"tripleCount\" : 2752 ,\n  \"quadCount\" : 0\n}\n"


Test with a query and `SPARQLWrapper`:

In [11]:
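# Workaround for certificate errors with SPARQLWrapper. Note that this disables HTTPS
# certificate verification for urllib-based clients in this process.
# Must be placed before SPARQLWrapper is imported.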
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

from SPARQLWrapper import SPARQLWrapper

List all endpoints again to find the newly created ones:

In [12]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    print(f'Available SPARQL-endpoints at "{address}":')
    headers = {"Authorization": f"Bearer {token}"}
    print(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    print("")

Available SPARQL-endpoints at "https://ontodocker.iwt.pmd.internal":
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql","https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example/sparql"]

Available SPARQL-endpoints at "https://ontodocker.iwm.pmd.internal":
["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]

Available SPARQL-endpoints at "https://ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example/sparql","http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example_diagonal/sparql"]



**CAUTION, there's a bug: you need to remove `:None` and add `/v1/` between `api` and `jena`!**
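
The two manual corrections can be sketched as a small function. `fix_endpoint_url` is a hypothetical helper, not part of Ontodocker, and it applies only the two changes named above.

```python
def fix_endpoint_url(url):
    """Apply the two manual corrections to a reported endpoint URL."""
    # Drop the bogus ":None" port, if present.
    url = url.replace(":None/", "/", 1)
    # Insert "/v1/" between "api" and "jena", unless it is already there.
    if "/api/v1/jena/" not in url:
        url = url.replace("/api/jena/", "/api/v1/jena/", 1)
    return url


reported = "http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"
print(fix_endpoint_url(reported))
# prints "http://ontodocker.iwm.pmd.internal/api/v1/jena/test/sparql"
```

The sketch leaves the scheme untouched, so a URL reported with `http` stays `http`, even though the addresses used in this notebook start with `https`.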

The value returned by the API call to `.../api/v1/endpoints` is a string. Looking at the output, it is the string representation of a *list of strings*. We can use the `ast` module to convert it into the corresponding Python object:

In [13]:
import ast

for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    headers = {"Authorization": f"Bearer {token}"}
    result = requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode()
    print(result)
    print(f"return value has type {type(result)}\n")
    print(ast.literal_eval(result))
    print(f"now casted to {type(ast.literal_eval(result))}\n\n")

# make replacements/ insertions

["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql","https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example/sparql"]
return value has type <class 'str'>

['https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql', 'https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example/sparql']
now casted to <class 'list'>


["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]
return value has type <class 'str'>

['http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example/sparql', 'http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql', 'http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql']
now casted to <class 'list'>


["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_
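
Since the response bodies above look like JSON arrays, `json.loads` may be an alternative to `ast.literal_eval`. This is a sketch under the assumption that the API always returns valid JSON, which is not confirmed here. The sample string is one of the raw outputs printed above.

```python
import json

# One of the raw response bodies printed above.
raw = (
    '["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql",'
    '"https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example/sparql"]'
)

endpoints = json.loads(raw)
print(type(endpoints).__name__, len(endpoints))
# prints "list 2"
```

Unlike `ast.literal_eval`, `json.loads` also accepts JSON-only literals such as `null`, `true`, and `false`, should a response ever contain them.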

Query the whole graph. The result could later be instantiated, e.g. using `semantikon` or `tools4RDF`, to perform simple reasoning or to construct queries via tab completion.

In [14]:
#dataset_name = "pmdco2_tto_example"

query ="""
SELECT ?s ?p ?o WHERE { ?s ?p ?o }
""" 

for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    sparql_endpoint = f'{address}/api/v1/jena/{dataset_name}/sparql'
    sparql = SPARQLWrapper(sparql_endpoint)
    sparql.setReturnFormat('json')
    sparql.addCustomHttpHeader("Authorization", f'Bearer {token}')
    print(f'Sending query to "{sparql_endpoint}". Result:')
    sparql.setQuery(query)
    result = sparql.queryAndConvert()
    print(str(result)[:1500]+" ... ") # shortened only for demonstration purposes
    print("")

Sending query to "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example/sparql". Result:
{'head': {'vars': ['s', 'p', 'o']}, 'results': {'bindings': [{'s': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/FunderIdentifierScheme'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/2000/01/rdf-schema#subClassOf'}, 'o': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/IdentifierScheme'}}, {'s': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/local-funder-identifier-scheme'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}, 'o': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/FunderIdentifierScheme'}}, {'s': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/Wikidata'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}, 'o': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/IdentifierScheme'}}, {'s': {'type': 'uri', 'value': 'https://orcid.org/0000-0002-3717-7104'}, 'p': {'t
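
The nested result structure can be flattened into plain tuples for further processing. The sketch below uses a hypothetical helper, `bindings_to_triples`, and takes the first binding of the output above as sample input. It assumes the query selects exactly `?s ?p ?o`.

```python
def bindings_to_triples(result):
    """Flatten a SPARQL JSON result for `SELECT ?s ?p ?o` into (s, p, o) tuples."""
    return [
        (row["s"]["value"], row["p"]["value"], row["o"]["value"])
        for row in result["results"]["bindings"]
    ]


# First binding of the output above, used as sample input.
sample = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [
        {
            "s": {"type": "uri", "value": "http://purl.org/spar/datacite/FunderIdentifierScheme"},
            "p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#subClassOf"},
            "o": {"type": "uri", "value": "http://purl.org/spar/datacite/IdentifierScheme"},
        },
    ]},
}

for s, p, o in bindings_to_triples(sample):
    print(s, p, o, sep="\n")
```

This keeps only the `value` fields and drops the `type` information (URI vs. literal), so it is meant for quick inspection rather than for rebuilding the graph.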

# Deletion:

In [15]:
#dataset_name = "pmdco2_tto_example"

for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    endpoint = f'{address}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {token}'}
    print(f'Deleting dataset at "{address}":')
    print(f'Dataset endpoint was "{endpoint}"')
    print(requests.delete(endpoint, headers=headers).content.decode()+"\n")

Deleting dataset at "https://ontodocker.iwt.pmd.internal":
Dataset endpoint was "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example"
"Dataset name pmdco2_tto_example destroyed"

Deleting dataset at "https://ontodocker.iwm.pmd.internal":
Dataset endpoint was "https://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example"
"Dataset name pmdco2_tto_example destroyed"

Deleting dataset at "https://ontodocker.mpi-susmat.pmd.internal":
Dataset endpoint was "https://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/pmdco2_tto_example"
"Dataset name pmdco2_tto_example destroyed"



List the endpoints once more to confirm that the datasets were removed:

In [16]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    print(f'Available SPARQL-endpoints at "{address}":')
    headers = {"Authorization": f"Bearer {token}"}
    print(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    print("")

Available SPARQL-endpoints at "https://ontodocker.iwt.pmd.internal":
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql"]

Available SPARQL-endpoints at "https://ontodocker.iwm.pmd.internal":
["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]

Available SPARQL-endpoints at "https://ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example_diagonal/sparql"]
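
Instead of checking the lists by eye, the dataset names could be extracted from the endpoint URLs and compared programmatically. `dataset_names` is a hypothetical helper, and it assumes URLs of the form `.../jena/<dataset>/sparql` as seen in the outputs above.

```python
def dataset_names(endpoints):
    """Extract dataset names from URLs of the form .../jena/<dataset>/sparql."""
    names = set()
    for url in endpoints:
        parts = url.rstrip("/").split("/")
        if len(parts) >= 3 and parts[-1] == "sparql" and parts[-3] == "jena":
            names.add(parts[-2])
    return names


# Endpoint list reported by one instance after the deletion (taken from the output above).
remaining = [
    "http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql",
    "http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql",
]

print("pmdco2_tto_example" in dataset_names(remaining))
# prints "False"
print(sorted(dataset_names(remaining)))
# prints "['pmdco2_tto_example_perpendicular', 'test']"
```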



# Next steps:
- create three disjoint datasets and host them on three servers
- concatenate them and upload the result to the mpi-susmat Ontodocker via the mesh
- ...
- set up PMD-CKAN as a data resource location within the mesh
- digest the raw data from there (tensile test analysis)
- ...
- CKAN as EP registry (accessible via API and GUI)
- deploy JupyterHubs on all servers
- support notebook usage from remote kernels (i.e. running on a different server) to locate the job execution on the data owner's server.