# Ontodocker on the mesh

Here, we demonstrate how to use the `ontodocker` API from within Jupyter notebooks. The notebook is a working example and performs API calls similar to those in the notebook `example/example.ipynb` from the [ontodocker GitHub repository](https://github.com/materialdigital/ontodocker/blob/dev2/).

## Preparation
We need the `requests` module to perform requests via `http(s)`.
First, we demonstrate the API calls explicitly; then we introduce a few abstractions that make the code more readable.

### Basic imports

In [1]:
%load_ext autoreload
# reload modules automatically before each cell
%autoreload 2

import requests
import os
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt" # if not set by the OS; doesn't hurt

# workaround for SPARQLWrapper: globally disables certificate verification for urllib-based HTTPS requests
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

In [2]:
# Used for toggling SSL verification while debugging
VERIFY = False

# silence the related warnings
import warnings
from urllib3.exceptions import InsecureRequestWarning
warnings.filterwarnings("ignore", category=InsecureRequestWarning)
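
The `verify` argument of `requests` accepts either a boolean or a path to a CA bundle. The following minimal sketch shows one way to derive that value from a debugging switch. The helper name `resolve_verify` and its parameters are assumptions made for illustration and are not part of `pmd_demo_tools`.

```python
import os

def resolve_verify(insecure: bool, ca_bundle: str = "/etc/ssl/certs/ca-certificates.crt"):
    """Return a value suitable for the `verify` argument of `requests` calls.

    Hypothetical helper: `insecure=True` disables certificate verification;
    otherwise the CA bundle is used if the file exists, else the default (True).
    """
    if insecure:
        return False
    return ca_bundle if os.path.isfile(ca_bundle) else True

# Debugging mode: verification is switched off, as with VERIFY = False above
print(resolve_verify(insecure=True))
```

Passing the result as `verify=...` keeps the insecure mode confined to a single switch.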

### Populating a mesh-participant registry

In [3]:
from pmd_demo_tools import mesh_tools

In [4]:
partners_full = mesh_tools.mesh_namespace_grouped_by_company()
_ = mesh_tools.attach_services_in_place(partners_full)
# partners_full.show()



#### Reducing registries
We also derive a selection containing only the partners for which we have valid JWTs for ontodocker services.

In [5]:
import json
with open('../secrets/tokens.json') as f:
    tokens = json.load(f, object_hook=mesh_tools.namespace_object_hook())

partners_full.Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT.iwt.services.ontodocker.token = tokens.Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT.ontodocker.token
partners_full.Fraunhofer_IWM.iwm.services.ontodocker.token = tokens.Fraunhofer_IWM.ontodocker.token
partners_full.KIT.kit_3.services.ontodocker_proxy.token = tokens.KIT.ontodocker_proxy.token
partners_full.MPISusMat.mpi_susmat.services.ontodocker.token = tokens.MPISusMat.ontodocker.token

selection = ["Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT", "Fraunhofer_IWM", "KIT", "MPISusMat"]
partners = mesh_tools.select_toplevel(partners_full, selection, deepcopy=True)
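
The attribute access in the cell above implies a nested layout of `tokens.json` (company -> service -> `token`). As a minimal sketch, such a file can be turned into an attribute-accessible object with the standard library alone. Here `types.SimpleNamespace` is only a stand-in for whatever `mesh_tools.namespace_object_hook()` returns, the layout is inferred rather than documented, and the token values are placeholders.

```python
import json
from types import SimpleNamespace

# Assumed layout of tokens.json, inferred from the attribute access in the cell above;
# the token strings are placeholders, not real JWTs.
tokens_text = """
{
  "MPISusMat": {"ontodocker": {"token": "<JWT>"}},
  "KIT": {"ontodocker_proxy": {"token": "<JWT>"}}
}
"""

# object_hook is called for every decoded JSON object, innermost first,
# so nested dicts become nested namespaces.
tokens_sketch = json.loads(tokens_text, object_hook=lambda d: SimpleNamespace(**d))

print(tokens_sketch.MPISusMat.ontodocker.token)
```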

In [6]:
partners_full.show()

BAM:
    bam_s1:
        company: 'BAM'
        contact: 'philipp.beckmann@bam.de'
        dns:
            - 'pmd-s1.bam.de'
        services:
            ontodocker_internal:
                address: 'ontodocker-internal.bam-s1.pmd.internal'
                name: 'ontodocker-internal'
                token: <SECRET>
        wg_mesh_dns_zone: 'bam-s1.pmd.internal'
        wg_mesh_subnet: 'fd51:0:3:1::/64'
Fraunhofer_AISEC:
    aisec:
        company: 'Fraunhofer AISEC'
        contact: 'pmd@aisec.fraunhofer.de'
        dns:
            - 'material-digital.aisec.fraunhofer.de'
        services:
        wg_mesh_dns_zone: 'aisec.pmd.internal'
        wg_mesh_subnet: 'fd51:0:2:1::/64'
Fraunhofer_ISC:
    isc_dev:
        company: 'Fraunhofer ISC'
        contact: 'simon.stier@isc.fraunhofer.de'
        dns:
            - 'pmd-s.open-semantic-lab.org'
        services:
        wg_mesh_dns_zone: 'isc-dev.pmd.internal'
        wg_mesh_subnet: 'fd51:0:7:1::/64'
Fraunhofer_IWM:
    iwm:
      

## Fundamental API calls to Ontodocker

### A single instance

First, we store the address and token in variables and define some parameters for `https` requests:

In [7]:
address = partners_full.MPISusMat.mpi_susmat.services.ontodocker.address
token = partners_full.MPISusMat.mpi_susmat.services.ontodocker.token

headers = {"Authorization": f"Bearer {token}"}
timeout=(5, 5)
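
Since every API call below reuses the same headers, timeout and verification setting, they can be collected once. This is a minimal sketch; `request_kwargs` is a hypothetical helper name and the token is a placeholder.

```python
def request_kwargs(token: str, timeout=(5, 5), verify=False) -> dict:
    """Collect the keyword arguments shared by all API calls (hypothetical helper)."""
    return {
        "headers": {"Authorization": f"Bearer {token}"},
        # (connect timeout, read timeout) in seconds
        "timeout": timeout,
        "verify": verify,
    }

kwargs = request_kwargs("<JWT>")  # placeholder token
# Intended use: requests.get(f"https://{address}/api/v1/endpoints", **kwargs)
print(kwargs["headers"])
```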

#### Get a list of all SPARQL endpoints exposed by an ontodocker instance

In [8]:
result = requests.get(f'https://{address}/api/v1/endpoints', headers=headers, timeout=timeout, verify=VERIFY).content.decode()

In [9]:
print(f'Available SPARQL-endpoints at "{address}":')
print(result)

Available SPARQL-endpoints at "ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/newset/sparql","http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/not4all/sparql"]


`pmd_demo_tools` provides some helpful abstractions, e.g. `sparql_tools.list_sparql_endpoints()`. At the moment, this function only works with a registry structured as company -> server -> service, so for a single instance we first have to reduce the registry:

In [10]:
selection = ["MPISusMat"]
mpi_susmat_rns = mesh_tools.select_toplevel(partners_full, selection, deepcopy=True)

In [11]:
from pmd_demo_tools import sparql_tools

ep = sparql_tools.list_sparql_endpoints(mpi_susmat_rns)

MPISusMat
Available SPARQL-endpoints at "ontodocker.mpi-susmat.pmd.internal":
http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/newset/sparql
http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/not4all/sparql



In [12]:
type(ep)

dict

In [13]:
ep

{'MPISusMat_mpi_susmat': ['http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/newset/sparql',
  'http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/not4all/sparql']}

#### Create a dataset

In [14]:
dataset_name = "test_dataset"
endpoint = f'https://{address}/api/v1/jena/{dataset_name}'
result = requests.put(endpoint, headers=headers, timeout=timeout, verify=VERIFY).content.decode()
print(f'Creating empty dataset at "{address}":')
print(result)

Creating empty dataset at "ontodocker.mpi-susmat.pmd.internal":
"Dataset name test_dataset created"


#### Upload a turtle file to the dataset

In [15]:
turtle_file_path = "../datasets/test_dataset.ttl"
# open the file in a context manager so the handle is closed after the upload
with open(turtle_file_path, 'rb') as f:
    result = requests.post(endpoint, headers=headers, files={'file': f}, timeout=timeout, verify=VERIFY).content.decode()
print(f'Upload "{turtle_file_path}" to dataset "{dataset_name}" at "{address}"')
print(result)

Upload "../datasets/test_dataset.ttl" to dataset "test_dataset" at "ontodocker.mpi-susmat.pmd.internal"
"Upload succeeded { \n  \"count\" : 13 ,\n  \"tripleCount\" : 13 ,\n  \"quadCount\" : 0\n}\n"

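The upload response printed above is a JSON-encoded string that itself contains a JSON object with the load statistics. The following minimal sketch extracts the counts, assuming the message keeps the shape shown in this output. That shape is an observation from this notebook, not a documented contract.

```python
import json

# Response body as printed above: a JSON-encoded string with an embedded JSON object
raw = r'"Upload succeeded { \n  \"count\" : 13 ,\n  \"tripleCount\" : 13 ,\n  \"quadCount\" : 0\n}\n"'

# outer layer: JSON string -> Python str
message = json.loads(raw)
# inner layer: everything from the first "{" on is the embedded JSON object
stats = json.loads(message[message.index("{"):])

print(stats["tripleCount"])
```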

#### Query a single SPARQL endpoint

Explicitly:

In [16]:
from SPARQLWrapper import SPARQLWrapper
from pmd_demo_tools import sparql_tools
import json

endpoints_native = requests.get(f'https://{address}/api/v1/endpoints', headers=headers, timeout=timeout, verify=VERIFY).content.decode()
endpoints_rectified = sparql_tools.rectify_endpoints(endpoints_native)
print(f"Return value has type {type(endpoints_native)}:")
print(endpoints_native)
print(f"Rectified endpoints have type {type(endpoints_rectified)}:")
print(endpoints_rectified)

Return value has type <class 'str'>:
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/newset/sparql","http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/test_dataset/sparql","http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/not4all/sparql"]
Rectified endpoints have type <class 'list'>:
['http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/newset/sparql', 'http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/test_dataset/sparql', 'http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/not4all/sparql']
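
Judging from the printed values, the rectification parses the JSON string into a list, removes the port suffix (here `:None`) and inserts the `v1` path segment. The following is a minimal sketch of such a transformation. `rectify_endpoints_sketch` is a hypothetical stand-in and not the implementation of `sparql_tools.rectify_endpoints`.

```python
import json
import re

def rectify_endpoints_sketch(raw: str) -> list:
    """Hypothetical stand-in, not the pmd_demo_tools implementation."""
    fixed = []
    for url in json.loads(raw):  # the API returns a JSON-encoded list of URLs
        # drop a port-like suffix directly after the host, e.g. ":None"
        url = re.sub(r"^(https?://[^/:]+):[^/]*", r"\1", url)
        # insert the API version segment (assumes it is missing, as in the output above)
        url = url.replace("/api/jena/", "/api/v1/jena/", 1)
        fixed.append(url)
    return fixed

raw = '["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/newset/sparql"]'
print(rectify_endpoints_sketch(raw))
```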


In [17]:
query = """
SELECT * WHERE { ?s ?p ?o } LIMIT 10
"""

sparql_endpoint = endpoints_rectified[1]
sparql = SPARQLWrapper(sparql_endpoint)
sparql.setReturnFormat('json')
sparql.addCustomHttpHeader("Authorization", f'Bearer {token}')
sparql.setQuery(query)
result = sparql.queryAndConvert()
print(f'Sending query to "{sparql_endpoint}". Result:')
print(str(result)[:1500]+" ... ") # shortened only for demonstration purposes

Sending query to "http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/test_dataset/sparql". Result:
{'head': {'vars': ['s', 'p', 'o']}, 'results': {'bindings': [{'s': {'type': 'uri', 'value': 'http://example.org/Person'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}, 'o': {'type': 'uri', 'value': 'http://www.w3.org/2000/01/rdf-schema#Class'}}, {'s': {'type': 'uri', 'value': 'http://example.org/Person'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/2000/01/rdf-schema#label'}, 'o': {'type': 'literal', 'xml:lang': 'en', 'value': 'Person'}}, {'s': {'type': 'uri', 'value': 'http://example.org/Project'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}, 'o': {'type': 'uri', 'value': 'http://www.w3.org/2000/01/rdf-schema#Class'}}, {'s': {'type': 'uri', 'value': 'http://example.org/Project'}, 'p': {'type': 'uri', 'value': 'http://www.w3.org/2000/01/rdf-schema#label'}, 'o': {'type': 'literal', 'xml:lang': 'en', 'value': '
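
The result follows the SPARQL JSON results layout: `head.vars` lists the query variables, and each entry in `results.bindings` maps a variable to a dict with `type` and `value`. The following minimal sketch flattens such a result into rows. `bindings_to_rows` is a hypothetical helper, and the sample is one binding copied from the output above.

```python
def bindings_to_rows(result: dict) -> list:
    """Flatten a SPARQL JSON result into a list of value tuples (hypothetical helper)."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # a variable may be unbound in a given solution, hence .get()
        rows.append(tuple(binding.get(v, {}).get("value") for v in variables))
    return rows

sample = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/Person"},
         "p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
         "o": {"type": "literal", "xml:lang": "en", "value": "Person"}},
    ]},
}

for row in bindings_to_rows(sample):
    print(row)
```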

Using `sparql_tools` and `query_collection`:

In [18]:
from pmd_demo_tools import query_collection

query = query_collection.TestQueries().query_graph()

print("query ="+query.query)
print("")
print("query variables = "+str(query.qvars))
print("column headers  = "+str(query.headers))

query =SELECT ?s ?p ?o
            WHERE { ?s ?p ?o }

query variables = ['s', 'p', 'o']
column headers  = ['subject', 'predicate', 'object']
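
The query object bundles the query text with its variables and the column headers used for display. The following minimal sketch shows a container with the same three attributes. `QuerySketch` is a hypothetical stand-in and says nothing about how `query_collection` is actually implemented.

```python
from dataclasses import dataclass, field

@dataclass
class QuerySketch:
    """Hypothetical stand-in for the objects returned by query_collection."""
    query: str
    qvars: list = field(default_factory=list)
    headers: list = field(default_factory=list)

graph_query = QuerySketch(
    query="SELECT ?s ?p ?o WHERE { ?s ?p ?o }",
    qvars=["s", "p", "o"],
    headers=["subject", "predicate", "object"],
)

# one column header per query variable
column_map = dict(zip(graph_query.qvars, graph_query.headers))
print(column_map)
```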


In [19]:
result = sparql_tools.send_query(sparql_endpoint, token, query.query, query.headers)

Sending query to "http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/test_dataset/sparql". Result:
                        subject  \
0     http://example.org/Person   
1     http://example.org/Person   
2    http://example.org/Project   
3    http://example.org/Project   
4    http://example.org/worksOn   
5    http://example.org/worksOn   
6    http://example.org/worksOn   
7    http://example.org/worksOn   
8      http://example.org/alice   
9      http://example.org/alice   
10     http://example.org/alice   
11  http://example.org/projectX   
12  http://example.org/projectX   

                                          predicate  \
0   http://www.w3.org/1999/02/22-rdf-syntax-ns#type   
1        http://www.w3.org/2000/01/rdf-schema#label   
2   http://www.w3.org/1999/02/22-rdf-syntax-ns#type   
3        http://www.w3.org/2000/01/rdf-schema#label   
4   http://www.w3.org/1999/02/22-rdf-syntax-ns#type   
5        http://www.w3.org/2000/01/rdf-schema#label   
6       http://www.w3.

#### Query all SPARQL endpoints on an instance

In [20]:
for sparql_endpoint in endpoints_rectified:
    result = sparql_tools.send_query(sparql_endpoint, token, query.query, query.headers)

Sending query to "http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/newset/sparql". Result:
Empty DataFrame
Columns: [subject, predicate, object]
Index: []

Sending query to "http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/test_dataset/sparql". Result:
                        subject  \
0     http://example.org/Person   
1     http://example.org/Person   
2    http://example.org/Project   
3    http://example.org/Project   
4    http://example.org/worksOn   
5    http://example.org/worksOn   
6    http://example.org/worksOn   
7    http://example.org/worksOn   
8      http://example.org/alice   
9      http://example.org/alice   
10     http://example.org/alice   
11  http://example.org/projectX   
12  http://example.org/projectX   

                                          predicate  \
0   http://www.w3.org/1999/02/22-rdf-syntax-ns#type   
1        http://www.w3.org/2000/01/rdf-schema#label   
2   http://www.w3.org/1999/02/22-rdf-syntax-ns#type   
3        http://www.w3.org

#### Dataset deletion

In [21]:
endpoint = f'https://{address}/api/v1/jena/{dataset_name}'
result = requests.delete(endpoint, headers=headers, timeout=timeout, verify=VERIFY).content.decode()
result

'"Dataset name test_dataset destroyed"'

### Working with multiple instances

Iterating over all ontodocker instances on the mesh is straightforward: iterate over the respective services. You can just manually (quite simple via the tab-completion) create a list of all

#### Listing all SPARQL endpoints on the mesh

In [22]:
import fnmatch # shell-like pattern matching
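
As a quick illustration of shell-like matching, the sketch below filters a list of service names with `fnmatch.fnmatchcase`. The ontodocker names appear in the registry shown earlier; `jupyterhub` is a made-up name added only to have a non-matching entry.

```python
import fnmatch

# "jupyterhub" is a hypothetical service name used as a non-matching example
service_names = ["ontodocker", "ontodocker_proxy", "ontodocker_internal", "jupyterhub"]
patterns = ["*ontodocker*"]

# casefold() on both sides makes the comparison case-insensitive
matching = [
    name for name in service_names
    if any(fnmatch.fnmatchcase(name.casefold(), p.casefold()) for p in patterns)
]
print(matching)
```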

In [23]:
patterns = ["*ontodocker*"]
for company_name, company in vars(partners_full).items():
    print(company_name)
    for server_name, server in vars(company).items():
        svcs = getattr(server, "services", None)
        if not isinstance(svcs, mesh_tools.RecursiveNamespace):
            continue  # None or missing (e.g., 500 -> set_none)
        for svc_name, svc in vars(svcs).items():  # svc_name is sanitized
            if any(fnmatch.fnmatchcase(svc_name.casefold(), p.casefold()) for p in patterns):
                address = svc.address
                token = svc.token
                try:
                    headers = {"Authorization": f"Bearer {token}"}
                    timeout=(3, 3)
                    result = requests.get(f'https://{address}/api/v1/endpoints', headers=headers, timeout=timeout, verify=VERIFY).content.decode()
                    print(f'Available SPARQL-endpoints at "{address}":')
                    print(result)
                    print("")
                except Exception as e:
                    print(f"An error occurred for the service with address '{address}':\n")
                    print(str(type(e))+"\n"+str(e)+"\n\n")

Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT
Available SPARQL-endpoints at "ontodocker.iwt.pmd.internal":
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql"]

KIT
Available SPARQL-endpoints at "ontodocker-proxy.kit-3.pmd.internal":
["http://ontodocker-proxy.kit-3.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker-proxy.kit-3.pmd.internal:None/api/jena/tt_test/sparql"]

Fraunhofer_AISEC
MPISusMat
Available SPARQL-endpoints at "ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/newset/sparql","http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/not4all/sparql"]

BAM
An error occurred for the service with address 'ontodocker-internal.bam-s1.pmd.internal':

<class 'requests.exceptions.ConnectionError'>
HTTPSConnectionPool(host='ontodocker-internal.bam-s1.pmd.internal', port=443): Max retries exceeded with url: /api/v1/endpoints (Caused by NewConnectionError

That's basically the required pattern: check whether the services section is populated at all, then check whether there is a service called "ontodocker" or similar, then fill in the API call in the `try: ... except: ...` block.

Of course, this looks clumsy and complicated. There is an abstraction in `mesh_tools` that makes the loop easier to read and offers the option to select only certain services by matching their names against a set of patterns:

In [24]:
for company, server, service_key, service in mesh_tools.iter_servers_with_services_matching(partners_full, ["*ontodocker*"]):
    print(company)
    address = service.address
    token = service.token
    try:
        headers = {"Authorization": f"Bearer {token}"}
        timeout=(3, 3)
        result = requests.get(f'https://{address}/api/v1/endpoints', headers=headers, timeout=timeout, verify=VERIFY).content.decode()
        print(f'Available SPARQL-endpoints at "{address}":')
        print(result)
        print("")
    except Exception as e:
        print(f"An error occurred for the service with address '{address}':\n")
        print(str(type(e))+"\n"+str(e)+"\n\n")

Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT
Available SPARQL-endpoints at "ontodocker.iwt.pmd.internal":
["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql"]

KIT
Available SPARQL-endpoints at "ontodocker-proxy.kit-3.pmd.internal":
["http://ontodocker-proxy.kit-3.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker-proxy.kit-3.pmd.internal:None/api/jena/tt_test/sparql"]

MPISusMat
Available SPARQL-endpoints at "ontodocker.mpi-susmat.pmd.internal":
["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/newset/sparql","http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/not4all/sparql"]

BAM
An error occurred for the service with address 'ontodocker-internal.bam-s1.pmd.internal':

<class 'requests.exceptions.ConnectionError'>
HTTPSConnectionPool(host='ontodocker-internal.bam-s1.pmd.internal', port=443): Max retries exceeded with url: /api/v1/endpoints (Caused by NewConnectionError('<urllib3.connec
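
To make the idea behind such an iterator concrete, here is a minimal sketch of a generator that walks a company -> server -> services registry and yields only the matching services. It uses `types.SimpleNamespace` instead of `RecursiveNamespace` and yields names for company and server. Both are assumptions, and this is not the implementation of `mesh_tools.iter_servers_with_services_matching`.

```python
from fnmatch import fnmatchcase
from types import SimpleNamespace

def iter_matching_services(registry, patterns):
    """Yield (company, server, service_key, service) for matching services.

    Hypothetical sketch; not the mesh_tools implementation.
    """
    for company_name, company in vars(registry).items():
        for server_name, server in vars(company).items():
            services = getattr(server, "services", None)
            if not isinstance(services, SimpleNamespace):
                continue  # services section missing or empty
            for key, service in vars(services).items():
                if any(fnmatchcase(key.casefold(), p.casefold()) for p in patterns):
                    yield company_name, server_name, key, service

# Tiny registry mirroring the layout shown earlier; the token is a placeholder
registry = SimpleNamespace(
    MPISusMat=SimpleNamespace(mpi_susmat=SimpleNamespace(services=SimpleNamespace(
        ontodocker=SimpleNamespace(address="ontodocker.mpi-susmat.pmd.internal", token="<JWT>")))),
    Fraunhofer_AISEC=SimpleNamespace(aisec=SimpleNamespace(services=None)),
)

found = list(iter_matching_services(registry, ["*ontodocker*"]))
for company, server, key, service in found:
    print(company, server, key, service.address)
```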

Or even more compact:

In [25]:
endpoints = sparql_tools.list_sparql_endpoints(partners_full)

Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT
Available SPARQL-endpoints at "ontodocker.iwt.pmd.internal":
https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql

KIT
Available SPARQL-endpoints at "ontodocker-proxy.kit-3.pmd.internal":
http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql
http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/tt_test/sparql

MPISusMat
Available SPARQL-endpoints at "ontodocker.mpi-susmat.pmd.internal":
http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/newset/sparql
http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/not4all/sparql





In [26]:
type(endpoints)

dict

In [27]:
endpoints

{'Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT_iwt': ['https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql'],
 'KIT_kit_3': ['http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql',
  'http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/tt_test/sparql'],
 'MPISusMat_mpi_susmat': ['http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/newset/sparql',
  'http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/not4all/sparql']}

#### Creating datasets on all instances

In [None]:
for company, server, service_key, service in mesh_tools.iter_servers_with_services_matching(partners_full, ["*ontodocker*"]):
    print(company)
    address = service.address
    token = service.token
    try:
        headers = {"Authorization": f"Bearer {token}"}
        timeout=(3, 3)
        result = requests.put(f'https://{address}/api/v1/jena/{dataset_name}', headers=headers, timeout=timeout, verify=VERIFY).content.decode()
        print(f'Creating empty dataset "{dataset_name}" at "{address}":')
        print(result)
        print("")
    except Exception as e:
        print(f"An error occurred for the service with address '{address}':\n")
        print(str(type(e))+"\n"+str(e)+"\n\n")

#### Uploading turtle files into the datasets

In [None]:
for company, server, service_key, service in mesh_tools.iter_servers_with_services_matching(partners_full, ["*ontodocker*"]):
    print(company)
    address = service.address
    token = service.token
    try:
        headers = {"Authorization": f"Bearer {token}"}
        timeout=(3, 3)
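        # open the file per request so each upload reads it from the start and the handle is closed
        with open(turtle_file_path, 'rb') as f:
            result = requests.post(f'https://{address}/api/v1/jena/{dataset_name}', headers=headers, files={'file': f}, timeout=timeout, verify=VERIFY).content.decode()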
        print(f'Upload "{turtle_file_path}" to dataset "{dataset_name}" at "{address}"')
        print(result)
        print("")
    except Exception as e:
        print(f"An error occurred for the service with address '{address}':\n")
        print(str(type(e))+"\n"+str(e)+"\n\n")

#### Federating SPARQL queries

##### Querying all endpoints

In [None]:
results = sparql_tools.federated_query(partners=partners_full, query=query.query, columns=query.headers)

Results can be converted to a `RecursiveNamespace` object:

In [None]:
results_rns = mesh_tools.RecursiveNamespace(**results)

In [None]:
results_rns.Fraunhofer_IWM_iwm.ontodocker.test_dataset.result

##### Filtering Endpoints via patterns
Federated queries can also be narrowed by specifying shell-like patterns for the dataset names:

In [None]:
results = sparql_tools.federated_query(partners=partners_full, query=query.query, columns=query.headers, datasets=["*test*"], print_to_screen=False)
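
How such a dataset filter could work is sketched below: the dataset name is taken from the endpoint URL (`.../jena/<dataset>/sparql`, as in the listings above) and matched against the patterns. The function names are hypothetical, and this is not necessarily how `federated_query` filters internally.

```python
from fnmatch import fnmatchcase

def dataset_name(endpoint_url: str) -> str:
    """Extract <dataset> from '.../jena/<dataset>/sparql' (URL layout as listed above)."""
    return endpoint_url.rstrip("/").split("/")[-2]

def filter_endpoints(endpoints: dict, patterns: list) -> dict:
    """Keep only endpoints whose dataset name matches any pattern (hypothetical helper)."""
    filtered = {}
    for key, urls in endpoints.items():
        kept = [u for u in urls if any(fnmatchcase(dataset_name(u), p) for p in patterns)]
        if kept:
            filtered[key] = kept
    return filtered

# Subset of the endpoint dict printed earlier
endpoints_demo = {
    "KIT_kit_3": [
        "http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql",
        "http://ontodocker-proxy.kit-3.pmd.internal/api/v1/jena/tt_test/sparql",
    ],
    "MPISusMat_mpi_susmat": [
        "http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/newset/sparql",
    ],
}

filtered_demo = filter_endpoints(endpoints_demo, ["*test*"])
print(filtered_demo)
```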

In [None]:
results_rns = mesh_tools.RecursiveNamespace(**results)

In [None]:
results_rns.MPISusMat_mpi_susmat.ontodocker.test_dataset.result

Note/TODO: `federated_query` should return a dict with the structure company -> server -> ontodocker instance -> dataset name.

#### Dataset deletion

In [None]:
for company, server, service_key, service in mesh_tools.iter_servers_with_services_matching(partners_full, ["*ontodocker*"]):
    print(company)
    address = service.address
    token = service.token
    try:
        headers = {"Authorization": f"Bearer {token}"}
        timeout=(3, 3)
        result = requests.delete(f'https://{address}/api/v1/jena/{dataset_name}', headers=headers, timeout=timeout, verify=VERIFY).content.decode()
        print(f'Deleting dataset "{dataset_name}" at "{address}":')
        print(result)
        print("")
    except Exception as e:
        print(f"An error occurred for the service with address '{address}':\n")
        print(str(type(e))+"\n"+str(e)+"\n\n")
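
The loops for creating, uploading and deleting share the same skeleton and differ only in the API call. As a minimal sketch, the skeleton can be factored into one function that takes the call as an argument. `for_each_service` and the demo objects below are hypothetical; the fake action replaces the network call so the example runs offline.

```python
from types import SimpleNamespace

def for_each_service(services, action):
    """Run action(address, headers) for every service; collect results or errors.

    Hypothetical helper: `services` is an iterable of objects with
    `address` and `token` attributes.
    """
    outcomes = {}
    for service in services:
        headers = {"Authorization": f"Bearer {service.token}"}
        try:
            outcomes[service.address] = action(service.address, headers)
        except Exception as exc:  # keep going if one instance is unreachable
            outcomes[service.address] = exc
    return outcomes

# Demo with made-up addresses and a fake action instead of a real request
demo_services = [
    SimpleNamespace(address="a.example", token="<JWT>"),
    SimpleNamespace(address="b.example", token="<JWT>"),
]

def fake_action(address, headers):
    if address == "b.example":
        raise ConnectionError("unreachable")
    return f"ok: {address}"

outcomes = for_each_service(demo_services, fake_action)
print(outcomes)
```

With real instances, `action` could wrap a call such as `requests.delete(f'https://{address}/api/v1/jena/{dataset_name}', headers=headers, timeout=timeout, verify=VERIFY)`.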