# Exploring the distributed information via SPARQL
Next, we answer a few typical questions a user would ask when working with the datasets. The user has to know the addresses of the SPARQL endpoints of interest.  
There are basically two options:
1. Define the SPARQL query, then loop over all SPARQL endpoints and process each response.
2. Collect the graphs from the endpoints, merge them into one graph object, and then operate on this object.
   1. Collect via SPARQL, store, and merge.
   2. Fetch the Turtle file directly via the ontodocker API, store it locally, then load the files and merge them into one object.

Set up things to use the mesh:

In [1]:
import os
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

As a further preparation, we load the (mesh) participant information:

In [2]:
import json
import helpers
with open('../secrets/endpoints.json') as f:
    partners = json.load(f, object_hook=lambda d: helpers.RecursiveNamespace(**d))
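
`helpers.RecursiveNamespace` is not shown in this notebook. The following is a minimal sketch of how such a class could work, assuming it simply turns nested dicts into attribute-accessible objects. The class name `RecursiveNamespaceSketch` and the example address and token are hypothetical; only the `<partner>.ontodocker.address` / `.token` layout is taken from the cells below.

```python
import json
from types import SimpleNamespace


class RecursiveNamespaceSketch(SimpleNamespace):
    """Hypothetical stand-in for helpers.RecursiveNamespace (assumption)."""

    def __init__(self, **kwargs):
        # Convert nested dicts (and dicts inside lists) on construction.
        super().__init__(**{k: self._convert(v) for k, v in kwargs.items()})

    @classmethod
    def _convert(cls, value):
        if isinstance(value, dict):
            return cls(**value)
        if isinstance(value, list):
            return [cls._convert(item) for item in value]
        return value


# Made-up content mimicking the assumed layout of endpoints.json.
raw = (
    '{"iwt": {"ontodocker": '
    '{"address": "https://ontodocker.example.org", "token": "dummy-token"}}}'
)

# json calls object_hook for every decoded JSON object, innermost first.
partners_demo = json.loads(raw, object_hook=lambda d: RecursiveNamespaceSketch(**d))

address = partners_demo.iwt.ontodocker.address
partner_keys = list(partners_demo.__dict__)
print(address, partner_keys)
```

Because the conversion is recursive, the same class can also wrap an already nested dict directly, e.g. `RecursiveNamespaceSketch(**some_dict)`.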

## SPARQL queries
To make the following notebook easier to read, we define some queries in `queries.py`.

In [3]:
from queries import QueryCollection
queries = QueryCollection()

There you can find queries that ask for various information:
- `queries.query_materialDesignation()`: For which material is data available?  
  This reads out the `pmdco:value` stored under `pmdco:materialDesignation`.
- `queries.query_processType()`: What process type was performed on S355?
- `queries.query_orientation()`: In what orientation relative to the rolling direction was each specimen cut?
- `queries.query_deviceAndStandard()`: Which device and standard were used for the measurements?
- `queries.query_specimen()`: What is the specimen ID?
- `queries.query_csvurl()`: Under which URL can we find the CSV file?

The queries are defined as methods of the class. Each method returns a dataclass instance that holds the query (as a string, `query`) and the column headers (as a list of strings, `columns`) corresponding to the values returned by the respective query. This makes it easy to add new queries as methods and enables dot notation and tab completion.
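
Since `queries.py` is not reproduced here, the following sketch illustrates the pattern under stated assumptions. `QuerySketch` and `QueryCollectionSketch` are hypothetical names, the namespace IRI for `pmdco:` is an assumption, and the graph pattern is a placeholder rather than the query from `queries.py`. Only the variable names (`p`, `matDesVal`) and the column headers are taken from the outputs shown further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuerySketch:
    """Bundles a SPARQL query string with the column headers for its result."""
    query: str
    columns: list[str]


class QueryCollectionSketch:
    """Hypothetical stand-in for QueryCollection in queries.py."""

    # Assumed namespace IRI for the pmdco: prefix; check it against the data.
    PREFIXES = "PREFIX pmdco: <https://w3id.org/pmd/co/>\n"

    def material_designation(self) -> QuerySketch:
        # Placeholder graph pattern (assumption), not the original query.
        body = (
            "SELECT ?p ?matDesVal WHERE {\n"
            "  ?p pmdco:characteristic ?matDes .\n"
            "  ?matDes a pmdco:materialDesignation ;\n"
            "          pmdco:value ?matDesVal .\n"
            "}"
        )
        return QuerySketch(self.PREFIXES + body, ["URI", "materialDesignation"])


queries_demo = QueryCollectionSketch()
q = queries_demo.material_designation()
print(q.columns)
print(q.query)
```

Adding a new query then amounts to adding one more method that returns a `QuerySketch`.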

## Explore the datasets: directly on the instances
Here, we demonstrate how a simple data exploration can be done by sending queries directly to the instances' SPARQL endpoints and looping over the instances. We perform the queries using the `SPARQLWrapper` module.

First, we need to look at the accessible endpoints on the instances referenced in `partners`, to know which endpoint we have to query. We do this via HTTP requests to the ontodocker API (sending `GET` to `<address>/api/v1/endpoints`):

In [4]:
import requests
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    headers = {"Authorization": f"Bearer {token}"}
    result = requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode()
    print(result+"\n"+str(type(result))+"\n")

["https://ontodocker.iwt.pmd.internal:443/api/jena/pmdco2_tto_example_parallel/sparql"]
<class 'str'>

["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql","http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]
<class 'str'>

["http://ontodocker.mpi-susmat.pmd.internal:None/api/jena/pmdco2_tto_example_diagonal/sparql"]
<class 'str'>



From the printed results, we see that the datasets we are interested in are `pmdco2_tto_example_parallel`, `pmdco2_tto_example_perpendicular` and `pmdco2_tto_example_diagonal`.  
Note: these are plain strings that merely look like lists of strings. To get an actual list whose items can be accessed, each string has to be converted to a proper list, e.g. using the `ast` module. To make things easier, there is a function `rectify_endpoints()` defined in `helpers.py` which performs all necessary steps.
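
The implementation of `rectify_endpoints()` is not shown here. Comparing the raw strings above with the cleaned lists printed in the next cell suggests three steps: parse the string into a list, drop the port part (`:443` or `:None`), and rewrite `/api/jena/` to `/api/v1/jena/`. The sketch below reconstructs these steps from the printed outputs only; the name `rectify_endpoints_sketch` is hypothetical and the real helper may work differently.

```python
import ast
import re

# Matches scheme and host followed by a port part such as ":443" or ":None".
_PORT_PART = re.compile(r"^(https?://[^/:]+):[^/]*")


def rectify_endpoints_sketch(raw: str) -> list[str]:
    """Turn the raw response text into a cleaned list of endpoint URLs."""
    urls = ast.literal_eval(raw)  # safe parsing of the list literal
    cleaned = []
    for url in urls:
        url = _PORT_PART.sub(r"\1", url)  # drop the port part
        url = url.replace("/api/jena/", "/api/v1/jena/", 1)  # assumed path fix
        cleaned.append(url)
    return cleaned


raw_response = (
    '["http://ontodocker.iwm.pmd.internal:None/api/jena/pmdco2_tto_example_perpendicular/sparql",'
    '"http://ontodocker.iwm.pmd.internal:None/api/jena/test/sparql"]'
)
endpoints_demo = rectify_endpoints_sketch(raw_response)
print(endpoints_demo)
```

Since the response body is valid JSON, `json.loads` would be an alternative to `ast.literal_eval` for the parsing step.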

In [5]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token
    
    headers = {"Authorization": f"Bearer {token}"}
    endpoints = helpers.rectify_endpoints(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    print(endpoints)
    print(str(type(endpoints))+ "\n")

['https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql']
<class 'list'>

['http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql', 'http://ontodocker.iwm.pmd.internal/api/v1/jena/test/sparql']
<class 'list'>

['http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/pmdco2_tto_example_diagonal/sparql']
<class 'list'>



Now we can perform the exploration.

In [6]:
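# Workaround for SSL errors with SPARQLWrapper: this disables HTTPS certificate
# verification globally, so use it only within a trusted network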
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

from SPARQLWrapper import SPARQLWrapper

Let's perform a query asking for the material system (`materialDesignation`), as defined above:

In [7]:
for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token

    headers = {"Authorization": f"Bearer {token}"}
    endpoints = helpers.rectify_endpoints(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    
    for ep in endpoints:
        sparql = SPARQLWrapper(ep)
        sparql.setReturnFormat('json')
        sparql.addCustomHttpHeader("Authorization", f'Bearer {token}')
        print(f'Sending query to "{ep}". Result:')

        query = queries.material_designation()
        sparql.setQuery(query.query)
        result = sparql.queryAndConvert()
        print(result)
        print("")

Sending query to "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
{'head': {'vars': ['p', 'matDesVal']}, 'results': {'bindings': [{'p': {'type': 'uri', 'value': 'https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process'}, 'matDesVal': {'type': 'literal', 'value': 'S355'}}, {'p': {'type': 'uri', 'value': 'https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-2_process'}, 'matDesVal': {'type': 'literal', 'value': 'S355'}}, {'p': {'type': 'uri', 'value': 'https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-3_process'}, 'matDesVal': {'type': 'literal', 'value': 'S355'}}, {'p': {'type': 'uri', 'value': 'https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-4_process'}, 'matDesVal': {'type': 'literal', 'value': 'S355'}}]}}

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql". Result:
{'head': {'vars': ['p', 'matDesVal']}, 'results': {'bindings': [{'p': {'type': 'uri', 'value': 'https://w3id.org/pmd/ao/tte/pmdao-tto-t

This returns the query results in a JSON representation.

**NOTE:** What about other return formats (e.g. CSV, JSON-LD, ...)?

We can format the results in a more human-friendly way using `pandas` dataframes. Again, to keep things readable, there is a function `make_dataframe` defined in `helpers.py`:

In [8]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.colheader_justify', 'left')
pd.set_option('display.width', 200)

datasets = ["pmdco2_tto_example_parallel", "pmdco2_tto_example_perpendicular","pmdco2_tto_example_diagonal"]

for key in partners.__dict__:
    address = getattr(partners, key).ontodocker.address
    token = getattr(partners, key).ontodocker.token

    headers = {"Authorization": f"Bearer {token}"}
    endpoints = helpers.rectify_endpoints(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
    
    for ep in endpoints:
        if any(substring in ep for substring in datasets): # only check endpoints, whose address contains any of the dataset names we found above
            sparql = SPARQLWrapper(ep)
            sparql.setReturnFormat('json')
            sparql.addCustomHttpHeader("Authorization", f'Bearer {token}')
            print(f'Sending query to "{ep}". Result:')
            query = queries.material_designation()
            sparql.setQuery(query.query)
            result = sparql.queryAndConvert()
            result_df = helpers.make_dataframe(result, query.columns)
            print(result_df)
            print("")

Sending query to "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
  URI                                                      materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process  S355              
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-2_process  S355              
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-3_process  S355              
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-4_process  S355              

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql". Result:
  URI                                                      materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-5_process  S355              
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-6_process  S355              
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-7_process  S355              
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-8_process  S3
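
`make_dataframe` itself is not shown in this notebook. The core of such a helper is presumably a flattening step that pulls the `value` of each binding out of the SPARQL JSON result. Here is a minimal sketch of that step in plain Python, assuming that `columns` is given in the same order as the variables in `head.vars`. The function name `bindings_to_rows` is hypothetical, and the sample is the first binding from the raw output shown earlier.

```python
def bindings_to_rows(result: dict, columns: list[str]) -> list[dict]:
    """Flatten a SPARQL JSON result into one dict per solution."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        row = {}
        for column, var in zip(columns, variables):
            # A variable can be unbound in a solution (e.g. with OPTIONAL).
            cell = binding.get(var)
            row[column] = cell["value"] if cell is not None else None
        rows.append(row)
    return rows


sample = {
    "head": {"vars": ["p", "matDesVal"]},
    "results": {"bindings": [
        {"p": {"type": "uri",
               "value": "https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process"},
         "matDesVal": {"type": "literal", "value": "S355"}},
    ]},
}
rows = bindings_to_rows(sample, ["URI", "materialDesignation"])
print(rows)
```

Passing `rows` to `pd.DataFrame(rows, columns=[...])` would then yield a table like the ones printed above.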

In [9]:
def distributed_query(partners, datasets, query, columns):
    
    for key in partners.__dict__:
        address = getattr(partners, key).ontodocker.address
        token = getattr(partners, key).ontodocker.token

        headers = {"Authorization": f"Bearer {token}"}
        endpoints = helpers.rectify_endpoints(requests.get(f'{address}/api/v1/endpoints', headers=headers).content.decode())
        
        for ep in endpoints:
            if any(substring in ep for substring in datasets): # only query endpoints, whose address contains any of the dataset names we found above
                sparql = SPARQLWrapper(ep)
                sparql.setReturnFormat('json')
                sparql.addCustomHttpHeader("Authorization", f'Bearer {token}')
                print(f'Sending query to "{ep}". Result:')
                sparql.setQuery(query)
                result = sparql.queryAndConvert()
                result_df = helpers.make_dataframe(result, columns)
                print(result_df)
                print("")
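
The endpoint filter inside `distributed_query` is a plain substring test on the endpoint address. The sketch below isolates it and contrasts it with a stricter variant that compares the path segment before `/sparql` exactly. Both function names are hypothetical, and the stricter variant assumes that all endpoint addresses end in `<dataset>/sparql`, as they do in the outputs above.

```python
def select_endpoints(endpoints, datasets):
    """Keep endpoints whose address contains any dataset name (substring test)."""
    return [ep for ep in endpoints if any(name in ep for name in datasets)]


def select_endpoints_exact(endpoints, datasets):
    """Stricter variant: the path segment before '/sparql' must match exactly."""
    wanted = set(datasets)
    return [ep for ep in endpoints if ep.rstrip("/").split("/")[-2] in wanted]


endpoints_demo = [
    "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql",
    "http://ontodocker.iwm.pmd.internal/api/v1/jena/test/sparql",
]
datasets_demo = [
    "pmdco2_tto_example_parallel",
    "pmdco2_tto_example_perpendicular",
    "pmdco2_tto_example_diagonal",
]

selected = select_endpoints(endpoints_demo, datasets_demo)
# A partial name matches with the substring test but not with the exact one.
loose = select_endpoints(endpoints_demo, ["example"])
strict = select_endpoints_exact(endpoints_demo, ["example"])
print(selected, loose, strict)
```

The substring test doubles as a simple search, while the exact variant avoids accidental matches when one dataset name is contained in another.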

In [10]:
query = queries.material_designation()
distributed_query(partners, datasets, query.query, query.columns)

Sending query to "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
  URI                                                      materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process  S355              
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-2_process  S355              
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-3_process  S355              
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-4_process  S355              

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql". Result:
  URI                                                      materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-5_process  S355              
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-6_process  S355              
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-7_process  S355              
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-8_process  S3

This can be improved further by using the `federated_query()` function from `helpers.py`. It queries all ontodocker instances listed in `endpoints.json`, but only those endpoints whose addresses (which include the dataset names) contain any of the strings provided via `datasets`. It is thus basically a federated query method combined with a simple search.

In [11]:
matDes_results = helpers.federated_query(partners, datasets, query.query, query.columns, False)

Converting this object (it is a dict) to a `RecursiveNamespace` allows for browsing some information, including the results of the query, via dot notation and tab completion:

In [12]:
matDes_results_rns = helpers.RecursiveNamespace(**matDes_results)

In [13]:
matDes_results_rns.iwt.ontodocker_iwt.pmdco2_tto_example_parallel.result

Unnamed: 0,URI,materialDesignation
0,https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process,S355
1,https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-2_process,S355
2,https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-3_process,S355
3,https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-4_process,S355


In [14]:
matDes_results_rns.iwm.ontodocker_iwm.pmdco2_tto_example_perpendicular.endpoint

'http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql'
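
Judging from the attribute paths used in the two cells above, the dict returned by `federated_query()` appears to be nested as partner, then instance, then dataset, with `result` and `endpoint` entries at the lowest level. The sketch below builds such a structure under that assumption. `collect_results`, the input mapping and the `run_query` callable are hypothetical, and the query itself is replaced by a stub so that the example runs without network access.

```python
def collect_results(endpoints_by_partner, run_query):
    """Build {partner: {instance: {dataset: {"endpoint": ..., "result": ...}}}}.

    endpoints_by_partner maps partner -> instance -> list of endpoint URLs;
    run_query is any callable that takes an endpoint URL and returns a result.
    """
    collected = {}
    for partner, instances in endpoints_by_partner.items():
        for instance, endpoints in instances.items():
            for ep in endpoints:
                # Assumed address layout: .../<dataset>/sparql
                dataset = ep.rstrip("/").split("/")[-2]
                per_instance = collected.setdefault(partner, {}).setdefault(instance, {})
                per_instance[dataset] = {"endpoint": ep, "result": run_query(ep)}
    return collected


endpoints_by_partner = {
    "iwm": {
        "ontodocker_iwm": [
            "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql",
        ],
    },
}

# Stub standing in for the actual SPARQL request.
results_demo = collect_results(endpoints_by_partner, run_query=lambda ep: f"rows from {ep}")
entry = results_demo["iwm"]["ontodocker_iwm"]["pmdco2_tto_example_perpendicular"]
print(entry["endpoint"])
```

Returning the results instead of printing them is what makes the later dot-notation browsing possible once the dict is wrapped in a `RecursiveNamespace`.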

In [15]:
testURI = "https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process"
query = queries.primary_data(testURI)
distributed_query(partners, datasets, query.query, query.columns)

Sending query to "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
   URI                                                      Quantity                                                                          value    unit                               
0   https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process               https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_crossSectionArea_S0  120.636  http://qudt.org/vocab/unit/MilliM2
1   https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process               https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_crossSectionArea_Su   52.659  http://qudt.org/vocab/unit/MilliM2
2   https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process               https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_crossSectionArea_s1  120.675  http://qudt.org/vocab/unit/MilliM2
3   https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process               https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_crossSectionArea_s2

In [16]:
query = queries.secondary_data(testURI)
distributed_query(partners, datasets, query.query, query.columns)

Sending query to "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
  URI                                                      Quantity                                                                           value   unit                               
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process                       https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_maximumForce  62.007    http://qudt.org/vocab/unit/kiloN
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process                https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_modulusOfElasticity   194.0   http://qudt.org/vocab/unit/GigaPa
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_percentageElongationAfterFracture     NaN  http://qudt.org/vocab/unit/PERCENT
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process          https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_percentageReductionOfArea    

In [17]:
query = queries.metadata(testURI)
distributed_query(partners, datasets, query.query, query.columns)

Sending query to "https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
  URI                                                      Quantity                                                                  value unit                               
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_environmentalTemperature  20.0    http://qudt.org/vocab/unit/DEG_C
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process   https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_extensometerGaugeLength  50.0   http://qudt.org/vocab/unit/MilliM
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process           https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_transitionPoint   1.6  http://qudt.org/vocab/unit/PERCENT

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql". Result:
Empty DataFrame
Columns: [URI, Quantity, value, unit]
Index: []

Sending 