# Sharing data between `pyironhub` and `ontodocker` via the pmd-mesh
...

## Requirements
- The user has access to a (PMD-)Server hosting services that are connected to the pmd-mesh. Access to the web interfaces (e.g. via https) has to be managed by the respective IT department (e.g. via firewall rules).
- The user has collected information about other service instances she/he wants to access (e.g. via SPARQL-queries, for up-/downloading datasets etc.) in a `.json` file.  
  **Here, we collected everything in `pmd_mesh-demonstrator/secrets/participant_registry.json`.** The file has the following structure:
  ```json
    {
        "institution-1": {
            "service": {
                "name": "<service name>",
                "address": "<service address>",
                "token": "<service token>"
            }
        },
        ...
        "institution-N": {
            "service-1": {
                "name": "<service-1 name>",
                "address": "<service-1 address>",
                "token": "<service-1 token>"
            },
            ...
            "service-N": {
                "name": "<service-N name>",
                "address": "<service-N address>"
            }
        }
    }
  ```
  - The field names `name`, `address` and `token` are not prescribed, but it is recommended to use them. Each institution can have as many services as required. If not applicable, fields can be left out (e.g. `token` under `service-N` in `institution-N`).
  - Later on, we make the information from this file browsable via dot-notation and tab-completion. Therefore, **it is recommended to avoid any characters in the field keys which are not allowed in Python identifiers** (e.g. `.`, `,`, `<`, `>`, `:`, whitespace, `/`, `\`, `-`, `+`, `~`, ...).  
    **Otherwise, such characters are replaced by an underscore!** See `pmd_mesh-demonstrator/helpers.py: cananify_string(), RecursiveNamespace`.
    The example here contains such a "bad" key for demonstration purposes (e.g. there is a replacement `mpi-susmat` &rarr; `mpi_susmat`).
  - Tokens have to be obtained individually for each service via the responsible admin.
  - **This file is not version controlled** and it should not be exposed anywhere. Tokens are confidential: they allow their holder to delete anything on the respective instance.
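
The key replacement described above can be sketched as follows. This is only an illustration under assumptions, not the code from `helpers.py`: the names `canonify_key` and `to_namespace` are hypothetical, and `types.SimpleNamespace` stands in for `RecursiveNamespace`.

```python
import json
import re
from types import SimpleNamespace


def canonify_key(key: str) -> str:
    """Replace every character that is not a letter, digit or underscore by '_'."""
    return re.sub(r"\W", "_", key)


def to_namespace(mapping: dict) -> SimpleNamespace:
    """Hypothetical object hook: turn one JSON object into a namespace."""
    return SimpleNamespace(**{canonify_key(k): v for k, v in mapping.items()})


# A tiny registry with one "bad" key (the hyphen in "mpi-susmat").
registry_text = """
{"mpi-susmat": {"ontodocker": {"address": "https://ontodocker.mpi-susmat.pmd.internal"}}}
"""

# json calls the hook for every nested object, innermost first.
registry = json.loads(registry_text, object_hook=to_namespace)
print(registry.mpi_susmat.ontodocker.address)
```

Note that two keys which differ only in such characters (e.g. `a-b` and `a.b`) would collide after the replacement in this sketch.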

## Getting started

### Configuring the environment for using mesh addresses
First, we set the environment variable `REQUESTS_CA_BUNDLE`, which tells `requests` which CA bundle to use, so that the mesh certificates are verified correctly:

In [1]:
import os
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

This has to be done whenever one wants to use the mesh address of a service. Perhaps we can include this in the container definition of the pyiron image.
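
Since a wrong bundle path only shows up later as a certificate error, it can help to check the path when setting the variable. A minimal sketch, assuming the bundle path used above; the helper name `use_ca_bundle` is hypothetical:

```python
import os


def use_ca_bundle(path: str) -> bool:
    """Set REQUESTS_CA_BUNDLE to `path` if that file exists; report success."""
    if not os.path.isfile(path):
        return False
    os.environ["REQUESTS_CA_BUNDLE"] = path
    return True


# Path taken from the cell above; it may differ on other systems.
if not use_ca_bundle("/etc/ssl/certs/ca-certificates.crt"):
    print("CA bundle not found; requests to mesh addresses will likely fail.")
```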

### Loading the participant information
We need to import two modules:
- `json`: to handle `json` files
- `../helpers.py`: to construct the "right" object from the loaded json.

The file `helpers.py` contains different functions and classes which will be useful when using the mesh.

In [2]:
import json

import sys
sys.path.append("..")
import helpers

Now we can load the information on all services from the `json` file into a `RecursiveNamespace` object (defined in `helpers.py`):

In [3]:
with open('../secrets/participant_registry.json') as f:
    partners = json.load(f, object_hook=lambda d: helpers.RecursiveNamespace(**d))

Instances of `RecursiveNamespace` allow for browsing through the server information via dot-notation and tab completion:

In [4]:
partners.iwt.ontodocker.address

'https://ontodocker.iwt.pmd.internal'

Iteration can be done via the `dict` mapping (`partners.__dict__`) and `getattr()` (see below).  
Once again a reminder: **If any key contains characters that are invalid in Python identifiers, these characters are replaced by an underscore!** See `pmd_mesh-demonstrator/helpers.py: cananify_string(), RecursiveNamespace`.
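
The iteration pattern can be tried without the real registry. The following sketch uses `types.SimpleNamespace` as a stand-in for `RecursiveNamespace` (an assumption; the real class may behave differently) and placeholder values. It also shows how `getattr()` with a default copes with fields that were left out, such as a missing `token`:

```python
from types import SimpleNamespace

# Stand-in for the object loaded from the registry (placeholder token).
partners_demo = SimpleNamespace(
    iwt=SimpleNamespace(
        ontodocker=SimpleNamespace(
            address="https://ontodocker.iwt.pmd.internal",
            token="<service token>",
        )
    ),
    iwm=SimpleNamespace(
        ontodocker=SimpleNamespace(address="https://ontodocker.iwm.pmd.internal")
    ),
)

summary = []
for name, institution in vars(partners_demo).items():
    service = getattr(institution, "ontodocker", None)
    if service is None:
        continue  # this institution lists no ontodocker service
    token = getattr(service, "token", None)  # None if the field was left out
    summary.append((name, service.address, token is not None))

for name, address, has_token in summary:
    print(f"{name}: {address} (token {'present' if has_token else 'missing'})")
```

`vars(obj)` returns `obj.__dict__`, so it is interchangeable with the `partners.__dict__` spelling used in this notebook.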

### The Ontodocker API

Here, we demonstrate how to use the `ontodocker` API from within Jupyter notebooks. This is mostly a working example that performs API calls similar to those shown in the notebook `example/example.ipynb` from the [ontodocker GitHub repository](https://github.com/materialdigital/ontodocker/blob/dev2/).  
We need the `requests` module:

In [5]:
import requests

Now, we can make requests / API calls:  
- Get a list of all SPARQL endpoints on each instance:

In [6]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    print(f'Available SPARQL-endpoints at "{address}":')
    headers = {"Authorization": f"Bearer {token}"}
    print(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    print("")

Available SPARQL-endpoints at "https://ontodocker.iwt.pmd.internal":
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql"]

Available SPARQL-endpoints at "https://ontodocker.iwm.pmd.internal":
["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]

Available SPARQL-endpoints at "https://ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example_diagonal/sparql"]
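
The cells in this section rebuild the same header and URL in every loop. A minimal sketch of two helpers that could factor this out; the names `auth_headers` and `api_url` are hypothetical, and the URL layout is simply the one used by the calls in this notebook:

```python
def auth_headers(token=None):
    """Return the bearer-token header, or no header if no token is known."""
    return {"Authorization": f"Bearer {token}"} if token else {}


def api_url(address, *parts):
    """Join a service address and path parts below `/api/v1`."""
    return "/".join([address.rstrip("/"), "api", "v1", *parts])


address = "https://ontodocker.iwt.pmd.internal"
print(api_url(address, "endpoints"))
print(api_url(address, "jena", "pmdco2_tto_example", "sparql"))
print(auth_headers("<service token>"))
```

With `requests`, a call would then read `requests.get(api_url(address, "endpoints"), headers=auth_headers(token), timeout=30)`; passing `timeout` keeps the loop from hanging on an unreachable instance.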



- Create an (empty) dataset:

In [7]:
dataset_name = "pmdco2_tto_example"
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    endpoint = f'{address}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {token}'}
    print(f'Creating empty dataset at "{address}":')
    print(requests.put(endpoint, headers=headers).content.decode())
    print("")

Creating empty dataset at "https://ontodocker.iwt.pmd.internal":
"Dataset name pmdco2_tto_example created"

Creating empty dataset at "https://ontodocker.iwm.pmd.internal":
"Dataset name pmdco2_tto_example created"

Creating empty dataset at "https://ontodocker.mpi-susmat.pmd.internal":
"Dataset name pmdco2_tto_example created"



- Upload a turtle file to the datasets:

In [8]:
turtle_file_path = "../pmdco2_tto_example.ttl"
#dataset_name = "pmdco2_tto_example"
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    endpoint = f'{address}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {token}'}
    print(f'Upload "{turtle_file_path}" to dataset "{dataset_name}" at "{address}"')
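    # Open the file in a context manager so the handle is closed after the upload
    with open(turtle_file_path, 'rb') as turtle_file:
        print(requests.post(endpoint, headers=headers, files={'file': turtle_file}).content.decode())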
    print("")

Upload "../pmdco2_tto_example.ttl" to dataset "pmdco2_tto_example" at "https://ontodocker.iwt.pmd.internal"
"Upload succeeded { \n  \"count\" : 2752 ,\n  \"tripleCount\" : 2752 ,\n  \"quadCount\" : 0\n}\n"

Upload "../pmdco2_tto_example.ttl" to dataset "pmdco2_tto_example" at "https://ontodocker.iwm.pmd.internal"
"Upload succeeded { \n  \"count\" : 2752 ,\n  \"tripleCount\" : 2752 ,\n  \"quadCount\" : 0\n}\n"

Upload "../pmdco2_tto_example.ttl" to dataset "pmdco2_tto_example" at "https://ontodocker.mpi-susmat.pmd.internal"
"Upload succeeded { \n  \"count\" : 2752 ,\n  \"tripleCount\" : 2752 ,\n  \"quadCount\" : 0\n}\n"

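The upload response printed above is a JSON string that itself contains a JSON object with the triple counts. A minimal sketch for extracting these counts, assuming the response always has the form shown in this output (other ontodocker versions may answer differently); the helper name `parse_upload_response` is hypothetical:

```python
import json

# Response body exactly as printed above (a JSON-encoded string).
raw = r'"Upload succeeded { \n  \"count\" : 2752 ,\n  \"tripleCount\" : 2752 ,\n  \"quadCount\" : 0\n}\n"'


def parse_upload_response(body: str) -> dict:
    """Extract the counts object from an upload response body."""
    message = json.loads(body)          # outer layer: a JSON string
    start = message.index("{")          # the counts follow the status text
    return json.loads(message[start:])  # inner layer: the counts object


counts = parse_upload_response(raw)
print(counts["tripleCount"])
```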


### Making SPARQL-queries
We will make queries with `SPARQLWrapper`:

In [9]:
# Workaround for certificate errors in SPARQLWrapper: this disables HTTPS certificate
# verification for urllib-based requests. Must be placed before SPARQLWrapper is imported
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

from SPARQLWrapper import SPARQLWrapper

Search again for all endpoints to find the newly created one:

In [10]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    print(f'Available SPARQL-endpoints at "{address}":')
    headers = {"Authorization": f"Bearer {token}"}
    print(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    print("")

Available SPARQL-endpoints at "https://ontodocker.iwt.pmd.internal":
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql","https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example/sparql"]

Available SPARQL-endpoints at "https://ontodocker.iwm.pmd.internal":
["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]

Available SPARQL-endpoints at "https://ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example/sparql","http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example_diagonal/sparql"]



**CAUTION, there's a bug: you need to remove `:None` and add `/v1/` between `api` and `jena`!**  
Also, the value returned by the API call to `.../api/v1/endpoints` is a string, although its content usually represents a *list of strings*. We can use the `ast` module to convert the output to the "right" Python object. The function `rectify_endpoints()` from `helpers.py` takes care of both at once:

In [11]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    headers = {"Authorization": f"Bearer {token}"}
    endpoints_native = requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode()
    endpoints_rectified = helpers.rectify_endpoints(endpoints_native)
    print(f"Return value has type {type(endpoints_native)}:")
    print(endpoints_native)
    print(f"Rectified endpoints have type {type(endpoints_rectified)}:")
    print(endpoints_rectified)
    print("")

Return value has type <class 'str'>:
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql","https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example/sparql"]
Rectified endpoints have type <class 'list'>:
['https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql', 'https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example/sparql']

Return value has type <class 'str'>:
["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]
Rectified endpoints have type <class 'list'>:
['http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example/sparql', 'http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql', 'http://ontodocker.iwm.pmd.internal/api/v1/jena/test/sparql']

Return value has type <class 'str'>:
["ht
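
What such a rectification could look like is sketched below. This is not the implementation from `helpers.py`, which is not shown here; the function name `rectify_endpoints_sketch` is hypothetical, and the rules are inferred from the outputs above (drop the port part, whether `:None` or a number, and insert `v1`).

```python
import ast
import re


def rectify_endpoints_sketch(raw: str) -> list:
    """Parse the endpoint list and repair each URL (assumed rules, see text)."""
    endpoints = ast.literal_eval(raw)  # string -> list of strings
    fixed = []
    for url in endpoints:
        # Drop ":None" or a numeric port directly in front of the path.
        url = re.sub(r":(?:None|\d+)(?=/)", "", url, count=1)
        # Insert the missing API version.
        url = url.replace("/api/jena/", "/api/v1/jena/", 1)
        fixed.append(url)
    return fixed


raw = '["http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]'
print(rectify_endpoints_sketch(raw))
```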

Now, we can send SPARQL-queries to the endpoints by iterating over them. For example, here we query the whole graph:  
(output is truncated to improve readability)

In [12]:
#dataset_name = "pmdco2_tto_example"
query ="""
SELECT ?s ?p ?o WHERE { ?s ?p ?o }
""" 

for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    sparql_endpoint = f'{address}/api/v1/jena/{dataset_name}/sparql'
    sparql = SPARQLWrapper(sparql_endpoint)
    sparql.setReturnFormat('json')
    sparql.addCustomHttpHeader("Authorization", f'Bearer {token}')
    print(f'Sending query to "{sparql_endpoint}". Result:')
    sparql.setQuery(query)
    result = sparql.queryAndConvert()
    print(str(result)[:1500]+" ... ") # shortened only for demonstration purposes
    print("")

Sending query to "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example/sparql". Result:
{'head': {'vars': ['s', 'p', 'o']}, 'results': {'bindings': [{'s': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/FunderIdentifierScheme'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/2000/01/rdf-schema#subClassOf'}, 'o': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/IdentifierScheme'}}, {'s': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/local-funder-identifier-scheme'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}, 'o': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/FunderIdentifierScheme'}}, {'s': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/Wikidata'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}, 'o': {'type': 'uri', 'value': 'http://purl.org/spar/datacite/IdentifierScheme'}}, {'s': {'type': 'uri', 'value': 'https://orcid.org/0000-0002-3717-7104'}, 'p': {'t
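
The result is a nested dictionary in the SPARQL JSON results format: `head.vars` lists the variables and `results.bindings` holds one entry per solution. A minimal sketch for flattening it into tuples, using the first binding from the output above as sample data; the helper name `bindings_to_rows` is hypothetical:

```python
# First binding of the output above, used as sample data.
result_demo = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://purl.org/spar/datacite/FunderIdentifierScheme"},
         "p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#subClassOf"},
         "o": {"type": "uri", "value": "http://purl.org/spar/datacite/IdentifierScheme"}},
    ]},
}


def bindings_to_rows(result):
    """Turn a SPARQL JSON result into a list of value tuples, one per solution."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # A variable may be unbound in a solution, hence the None fallback.
        rows.append(tuple(binding[v]["value"] if v in binding else None for v in variables))
    return rows


for s, p, o in bindings_to_rows(result_demo):
    print(s, p, o)
```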

### Deletion of datasets via the ontodocker API

In [13]:
#dataset_name = "pmdco2_tto_example"

for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    endpoint = f'{address}/api/v1/jena/{dataset_name}'
    headers = {"Authorization": f'Bearer {token}'}
    print(f'Deleting dataset at "{address}":')
    print(f'Dataset endpoint was "{endpoint}"')
    print(requests.delete(endpoint, headers=headers).content.decode()+"\n")

Deleting dataset at "https://ontodocker.iwt.pmd.internal":
Dataset endpoint was "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example"
"Dataset name pmdco2_tto_example destroyed"

Deleting dataset at "https://ontodocker.iwm.pmd.internal":
Dataset endpoint was "https://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example"
"Dataset name pmdco2_tto_example destroyed"

Deleting dataset at "https://ontodocker.mpi-susmat.pmd.internal":
Dataset endpoint was "https://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/pmdco2_tto_example"
"Dataset name pmdco2_tto_example destroyed"



Listing the endpoints once more confirms that the dataset is gone from every instance:

In [14]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    print(f'Available SPARQL-endpoints at "{address}":')
    headers = {"Authorization": f"Bearer {token}"}
    print(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    print("")

Available SPARQL-endpoints at "https://ontodocker.iwt.pmd.internal":
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql"]

Available SPARQL-endpoints at "https://ontodocker.iwm.pmd.internal":
["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]

Available SPARQL-endpoints at "https://ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example_diagonal/sparql"]
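
Instead of comparing the lists by eye, the check can be automated. A minimal sketch, assuming the endpoint URLs end in `.../jena/<dataset name>/sparql` as in the outputs above; the helper name `dataset_is_listed` is hypothetical, and it expects the response already parsed into a list of strings:

```python
def dataset_is_listed(endpoints, dataset_name):
    """True if one of the SPARQL endpoint URLs belongs to `dataset_name`."""
    for url in endpoints:
        parts = url.rstrip("/").split("/")
        # Expected form: .../jena/<dataset name>/sparql
        if len(parts) >= 2 and parts[-2] == dataset_name:
            return True
    return False


before = [
    "https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql",
    "https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example/sparql",
]
after = before[:1]  # the listing after deletion
print(dataset_is_listed(before, "pmdco2_tto_example"))
print(dataset_is_listed(after, "pmdco2_tto_example"))
```

Comparing the full path segment matters here: a plain substring test would wrongly report `pmdco2_tto_example` as present because of `pmdco2_tto_example_parallel`.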



# Next steps:
- create three disjoint datasets, host them on three servers
- concatenate and upload to mpi-susmat Ontodocker via mesh
- ...
- set up PMD-CKAN as data resource location within the mesh
- digest the raw data from there (tensile test analysis)
- ...
- CKAN as EP registry (accessible via API and GUI)
- deploy jupyterhubs on all servers
- support notebook usage from remote kernels (i.e. running on a different server) to locate the job execution on the data owner's server.