# Exploring the distributed information via SPARQL and helper methods
Next, we answer a few typical questions a user would ask when working with the datasets. A user has to know the addresses of the SPARQL endpoints of interest.  
There are basically two options:
1. Define the SPARQL query, loop over all SPARQL endpoints, and process the responses.
2. Collect the graphs from the endpoints, merge them into one graph object, and then operate on this object, either by
   1. collecting them via SPARQL, storing them, and merging them, or
   2. fetching the Turtle files directly via the ontodocker API, storing them locally, and then loading and merging them into one object.
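The practical difference between the two options can be illustrated with a toy model. The sketch below is not how `helpers.py` works: it uses plain Python sets of `(subject, predicate, object)` tuples as stand-ins for RDF graphs, and the graph contents, predicate names, and the `materials()` function are hypothetical. It shows that option 1 only finds matches whose triples all live in one endpoint, whereas option 2 can also join facts that are spread across endpoints.

```python
# Toy model: an RDF graph is a set of (subject, predicate, object) tuples.
Triple = tuple[str, str, str]

def materials(graph: set[Triple]) -> set[tuple[str, str]]:
    """Toy 'query': join process --input--> piece --materialDesignation--> value."""
    inputs = {(s, o) for s, p, o in graph if p == "input"}
    values = {(s, o) for s, p, o in graph if p == "materialDesignation"}
    return {(proc, val) for proc, piece in inputs for s, val in values if s == piece}

# Hypothetical graphs; the facts about proc2/piece2 are split across two endpoints
endpoint_graphs = {
    "endpoint_a": {("proc1", "input", "piece1"), ("piece1", "materialDesignation", "S355")},
    "endpoint_b": {("proc2", "input", "piece2")},
    "endpoint_c": {("piece2", "materialDesignation", "S355")},
}

# Option 1: send the same query to every endpoint, then combine the answers
per_endpoint = {name: materials(g) for name, g in endpoint_graphs.items()}
option1 = set().union(*per_endpoint.values())

# Option 2: merge all graphs into one object first, then query once
merged: set[Triple] = set().union(*endpoint_graphs.values())
option2 = materials(merged)

print(sorted(option1))
print(sorted(option2))
```

Option 2 pays for this with transferring and holding all graphs locally, while option 1 only moves query results.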

Set up the environment to use the mesh and load the participant information:

In [1]:
import os
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

import json
import sys
sys.path.append("..")
import helpers

with open('../secrets/participant_registry.json') as f:
    partners = json.load(f, object_hook=lambda d: helpers.RecursiveNamespace(**d))
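The `object_hook` passed to `json.load` is called on every JSON object, innermost first, so each nested dictionary becomes a namespace. The implementation of `helpers.RecursiveNamespace` is not shown here; the class below is a hypothetical stand-in built on `types.SimpleNamespace`, and the registry contents (partner key `iwm`, address, token) are made up. Only the `<partner>.ontodocker.address` / `.token` layout is taken from the cells further below.

```python
import json
from types import SimpleNamespace

class RecursiveNamespace(SimpleNamespace):
    """Hypothetical stand-in for helpers.RecursiveNamespace: nested dicts
    (also inside lists) become namespaces, enabling dot notation."""

    def __init__(self, **kwargs):
        super().__init__(**{key: self._convert(value) for key, value in kwargs.items()})

    @classmethod
    def _convert(cls, value):
        if isinstance(value, dict):
            return cls(**value)
        if isinstance(value, list):
            return [cls._convert(item) for item in value]
        return value  # already a namespace or a plain value

# Hypothetical registry with the same layout as participant_registry.json
registry_json = """{
  "iwm": {"ontodocker": {"address": "https://ontodocker.example.org", "token": "<token>"}}
}"""
partners = json.loads(registry_json, object_hook=lambda d: RecursiveNamespace(**d))
print(partners.iwm.ontodocker.address)
```

The same class also works without `object_hook`, e.g. `RecursiveNamespace(**nested_dict)`, which is how the query results are wrapped later on.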

## SPARQL queries
To keep the following notebook readable, we define the queries in `queries.py`. At the moment, they are hard-coded and only work with the specific versions of pmdCO and TTO used here.

In [2]:
from queries import QueryCollection
queries = QueryCollection()

The following queries retrieve various information:
- `queries.material_designation()`: For which material is data available?  
This reads out the `pmdco:value` stored under `pmdco:materialDesignation`.
- `queries.process_type()`: What process type was performed on S355?
- `queries.orientation()`: In what orientation relative to the rolling direction was each specimen cut?
- `queries.extensiometer()` and `queries.standard()`: Which device and which standard were used for the measurements?
- `queries.specimen_id()`: What is the specimen ID?
- `queries.csv_url()`: Under which URL can we find the CSV file?

The queries are defined as methods of the `QueryCollection` class. Each returns a dataclass instance holding the query (as a string) and the column headers (as a list of strings) corresponding to the values returned by the respective query. This makes it easy to add new methods (queries) and enables dot notation and tab completion.  
Because a `dataclass` is used, the returned contents are directly accessible as attributes via dot notation, without boilerplate code (defining `__init__` etc.):

In [3]:
query = queries.material_designation()
print("query ="+query.query)
print("")
print("columns = "+str(query.columns))

query =
        PREFIX pmd: <https://w3id.org/pmd/co/>
        SELECT DISTINCT ?p ?matDesVal
        WHERE {
            ?s a pmd:TestPiece .
            ?p pmd:input ?s .
            ?p pmd:characteristic ?matDes .
            ?matDes a pmd:materialDesignation .
            ?matDes pmd:value ?matDesVal .
        }
        ORDER BY ?p
        

columns = ['URI', 'materialDesignation']
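The pattern described above can be sketched as follows. This is a hypothetical reconstruction, not the contents of `queries.py`: the class name `Query` and the `PREFIX` attribute are assumptions, and only the query text and column names are copied from the output above.

```python
from dataclasses import dataclass, field

@dataclass
class Query:
    """Hypothetical container: SPARQL text plus display names for its result columns."""
    query: str
    columns: list[str] = field(default_factory=list)

class QueryCollection:
    """Hypothetical sketch of queries.py; only one query is shown."""

    PREFIX = "PREFIX pmd: <https://w3id.org/pmd/co/>"

    @classmethod
    def material_designation(cls) -> Query:
        # Query text copied from the printed output above
        sparql = cls.PREFIX + """
        SELECT DISTINCT ?p ?matDesVal
        WHERE {
            ?s a pmd:TestPiece .
            ?p pmd:input ?s .
            ?p pmd:characteristic ?matDes .
            ?matDes a pmd:materialDesignation .
            ?matDes pmd:value ?matDesVal .
        }
        ORDER BY ?p
        """
        # One display name per SELECT variable, in the same order
        return Query(query=sparql, columns=["URI", "materialDesignation"])

queries = QueryCollection()
q = queries.material_designation()
print(q.columns)
```

The `@dataclass` decorator generates `__init__`, `__repr__` and `__eq__`, so adding a new query only means adding another method that returns a `Query`.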


## Explore the datasets
Here, we demonstrate how a simple data exploration can be done by sending queries directly to the instances' SPARQL endpoints and looping over the instances. We perform the queries using the `SPARQLWrapper` module, wrapped in some convenience methods (from `../helpers.py`).
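SPARQL endpoints answer `SELECT` queries in the standard SPARQL 1.1 JSON results format (with `SPARQLWrapper`, `setReturnFormat(JSON)` followed by `.query().convert()` yields this structure as a dictionary). How `helpers.py` builds its tables is not shown; the function below is a hypothetical sketch of how such a response can be mapped onto the `columns` of a query, assuming the column names are given in the same order as the `SELECT` variables.

```python
import json

def bindings_to_rows(response_text: str, columns: list[str]) -> list[dict[str, str]]:
    """Turn a SPARQL 1.1 JSON results document into rows keyed by `columns`.

    Assumes `columns` matches head.vars in order; unbound variables become "".
    """
    doc = json.loads(response_text)
    variables = doc["head"]["vars"]
    if len(variables) != len(columns):
        raise ValueError(f"expected {len(variables)} column names, got {len(columns)}")
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}
        rows.append({col: binding.get(var, {}).get("value", "")
                     for var, col in zip(variables, columns)})
    return rows

# Hypothetical response for the material designation query
example = json.dumps({
    "head": {"vars": ["p", "matDesVal"]},
    "results": {"bindings": [
        {"p": {"type": "uri", "value": "https://example.org/process-1"},
         "matDesVal": {"type": "literal", "value": "S355"}},
    ]},
})
rows = bindings_to_rows(example, ["URI", "materialDesignation"])
print(rows)  # [{'URI': 'https://example.org/process-1', 'materialDesignation': 'S355'}]
```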

### Get a first idea
First, we look at the endpoints available on the instances referenced in `partners`. We do this via HTTP requests to the ontodocker API (sending `GET` to `<address>/api/v1/endpoints`):

In [4]:
import requests

# List the SPARQL endpoints exposed by each partner's ontodocker instance
for key, partner in vars(partners).items():
    address = partner.ontodocker.address
    token = partner.ontodocker.token

    # The ontodocker API expects the token as a Bearer authorization header
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.get(f'{address}/api/v1/endpoints', headers=headers)
    result = helpers.rectify_endpoints(response.content.decode())
    print(result)

['http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql', 'http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql', 'http://ontodocker.iwm.pmd.internal/api/v1/jena/test/sparql']
['https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql', 'https://ontodocker.iwt.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql']
['http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/pmdco2_tto_example_diagonal/sparql']
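For reference, the same API call can be assembled with only the standard library. The sketch below builds, but does not send, the request; the function name `endpoints_request` and the address and token are hypothetical, while the path and the Bearer header follow the cell above.

```python
import urllib.request

def endpoints_request(address: str, token: str) -> urllib.request.Request:
    """Build (without sending) the GET request for <address>/api/v1/endpoints.

    Sending it would be: urllib.request.urlopen(req).read().decode()
    """
    url = f"{address.rstrip('/')}/api/v1/endpoints"  # tolerate a trailing slash
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")

req = endpoints_request("https://ontodocker.example.org/", "my-token")
print(req.full_url, req.get_method())
```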


This gives us an idea of the dataset names. `helpers.py` provides a method `federated_query()`, which distributes the query among the endpoints. To avoid always querying all endpoints, even those that do not contain the dataset of interest, you can pass a list of strings that you expect to appear in the dataset names (e.g. `"_tto_example_"` in this case). You can also list the exact names of the datasets to query precisely the correct endpoints. If `datasets = None`, all endpoints are queried.
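The dataset selection described above can be sketched as a simple filter over endpoint URLs. This is a hypothetical illustration, not the code of `federated_query()`: it assumes endpoints follow the `.../jena/<dataset>/sparql` pattern seen in the output above and that an endpoint is kept if any given string occurs in its dataset name, which also covers exact names.

```python
from typing import Optional
from urllib.parse import urlparse

def dataset_name(endpoint: str) -> str:
    """Extract <dataset> from an endpoint URL of the form .../jena/<dataset>/sparql."""
    parts = urlparse(endpoint).path.rstrip("/").split("/")
    return parts[-2]

def select_endpoints(endpoints: list[str], datasets: Optional[list[str]]) -> list[str]:
    """Hypothetical filter: None keeps every endpoint; otherwise keep an endpoint
    if any of the given strings occurs in its dataset name."""
    if datasets is None:
        return list(endpoints)
    return [ep for ep in endpoints if any(d in dataset_name(ep) for d in datasets)]

# Endpoint URLs taken from the output above
endpoints = [
    "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql",
    "http://ontodocker.iwm.pmd.internal/api/v1/jena/test/sparql",
    "http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/pmdco2_tto_example_diagonal/sparql",
]
for ep in select_endpoints(endpoints, ["_tto_example_"]):
    print(dataset_name(ep))
```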

In [5]:
datasets = None

In [6]:
query = queries.material_designation()
matDes_results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
                                                 URI materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql". Result:
                                                 URI materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355

Sending query to "http://ontodocker.iwm.pmd

You can suppress the output to screen by setting `print_to_screen=False`. Note the empty output for datasets which do not match the query. This can be avoided by specifying `datasets` more precisely:

In [7]:
datasets = ["pmdco2_tto_example_parallel", "pmdco2_tto_example_perpendicular", "pmdco2_tto_example_diagonal"]
matDes_results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
                                                 URI materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql". Result:
                                                 URI materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-...                S355

Sending query to "https://ontodocker.iwt.pm

Again, we can use the `RecursiveNamespace` class to make the results browsable via dot notation and tab completion:

In [8]:
matDes_results_rns = helpers.RecursiveNamespace(**matDes_results)

In [9]:
print("endpoint =\n"+str(matDes_results_rns.mpi_susmat.ontodocker_mpisusmat.pmdco2_tto_example_diagonal.endpoint))
print("")
print("query ="+str(matDes_results_rns.mpi_susmat.ontodocker_mpisusmat.pmdco2_tto_example_diagonal.query))

endpoint =
http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/pmdco2_tto_example_diagonal/sparql

query =
        PREFIX pmd: <https://w3id.org/pmd/co/>
        SELECT DISTINCT ?p ?matDesVal
        WHERE {
            ?s a pmd:TestPiece .
            ?p pmd:input ?s .
            ?p pmd:characteristic ?matDes .
            ?matDes a pmd:materialDesignation .
            ?matDes pmd:value ?matDesVal .
        }
        ORDER BY ?p
        


And of course the result of the query is also accessible (as a `pandas` dataframe):

In [13]:
matDes_results_rns.mpi_susmat.ontodocker_mpisusmat.pmdco2_tto_example_diagonal.result

|   | URI | materialDesignation |
|---|-----|---------------------|
| 0 | https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-10_process | S355 |
| 1 | https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-9_process | S355 |


Adjust the `pandas` display options to show full URIs and left-aligned headers:

In [14]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.colheader_justify', 'left')
pd.set_option('display.width', 200)

matDes_results_rns.mpi_susmat.ontodocker_mpisusmat.pmdco2_tto_example_diagonal.result

|   | URI | materialDesignation |
|---|-----|---------------------|
| 0 | https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-10_process | S355 |
| 1 | https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-9_process | S355 |
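Because the results are nested as partner, instance, and dataset (as in `matDes_results_rns.mpi_susmat.ontodocker_mpisusmat.pmdco2_tto_example_diagonal`), answering a question across all partners means walking that nesting. The sketch below assumes the raw results are a dict with the keys `endpoint`, `query` and `result` at the dataset level; to stay self-contained it uses lists of plain dicts instead of `pandas` DataFrames, and the second partner, instance and dataset are placeholders.

```python
from typing import Any, Iterator

def iter_results(results: dict) -> Iterator[tuple[str, str, str, Any]]:
    """Yield (partner, instance, dataset, result) for the assumed nesting
    results[partner][instance][dataset] = {"endpoint", "query", "result"}."""
    for partner, instances in results.items():
        for instance, datasets in instances.items():
            for dataset, entry in datasets.items():
                yield partner, instance, dataset, entry["result"]

# Hypothetical results; rows are dicts standing in for DataFrame rows
results = {
    "mpi_susmat": {"ontodocker_mpisusmat": {"pmdco2_tto_example_diagonal": {
        "endpoint": "http://ontodocker.mpi-susmat.pmd.internal/api/v1/jena/pmdco2_tto_example_diagonal/sparql",
        "query": "<query text>",
        "result": [
            {"URI": "https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-10_process", "materialDesignation": "S355"},
            {"URI": "https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-9_process", "materialDesignation": "S355"},
        ],
    }}},
    "other_partner": {"other_instance": {"other_dataset": {  # placeholder names
        "endpoint": "https://ontodocker.example.org/api/v1/jena/other_dataset/sparql",
        "query": "<query text>",
        "result": [{"URI": "https://example.org/process-x", "materialDesignation": "S355"}],
    }}},
}

# Flatten into one list of rows, remembering which dataset each row came from
combined = [dict(row, dataset=dataset)
            for _partner, _instance, dataset, rows in iter_results(results)
            for row in rows]
materials = sorted({row["materialDesignation"] for row in combined})
print(materials)  # ['S355']
```

With real DataFrames, the per-dataset frames could instead be combined with `pd.concat(frames, keys=names)`.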


## The actual exploration
### 1.) What material was tested?

In [15]:
query = queries.material_designation()
results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)
results_rns = helpers.RecursiveNamespace(**results)

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
  URI                                                      materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process  S355              
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-2_process  S355              
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-3_process  S355              
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-4_process  S355              

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql". Result:
  URI                                                      materialDesignation
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-5_process  S355              
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-6_process  S355              
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-7_process  S355              
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-8_process  S35

### 2.) What processes were performed on the specimen? 

In [16]:
query = queries.process_type()
results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)
results_rns = helpers.RecursiveNamespace(**results)

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_parallel/sparql". Result:
  URI                                                      Process type                        
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process  https://w3id.org/pmd/co/TensileTest
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-2_process  https://w3id.org/pmd/co/TensileTest
2  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-3_process  https://w3id.org/pmd/co/TensileTest
3  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-4_process  https://w3id.org/pmd/co/TensileTest

Sending query to "http://ontodocker.iwm.pmd.internal/api/v1/jena/pmdco2_tto_example_perpendicular/sparql". Result:
  URI                                                      Process type                        
0  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-5_process  https://w3id.org/pmd/co/TensileTest
1  https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-6_process  https://w3id.org/pmd/co/TensileTest
2  htt

### 3.) In what orientation relative to the rolling direction was each specimen cut?

In [None]:
query = queries.orientation()
results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)
results_rns = helpers.RecursiveNamespace(**results)

### 4.) Which device was used for the measurements?

In [None]:
query = queries.extensiometer()
results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)
results_rns = helpers.RecursiveNamespace(**results)

### 5.) Which standard was applied?

In [None]:
query = queries.standard()
results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)
results_rns = helpers.RecursiveNamespace(**results)

### 6.) Which ID was assigned to the specimen?

In [None]:
query = queries.specimen_id()
results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)
results_rns = helpers.RecursiveNamespace(**results)

### 7.) For each process, show the URL to the data file.

In [None]:
query = queries.csv_url()
results = helpers.federated_query(partners, datasets, query.query, query.columns, print_to_screen=True)
results_rns = helpers.RecursiveNamespace(**results)

**WIP:** Metadata. The queries are there, but it makes sense to optionally perform these queries only for a specific URI to reduce the output.

In [None]:
testURI = "https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process"
query = queries.primary_data(testURI)
# helpers.federated_query(partners, datasets, query.query, query.columns)

In [None]:
query = queries.secondary_data(testURI)
# helpers.federated_query(partners, datasets, query.query, query.columns)

In [None]:
query = queries.metadata(testURI)
# helpers.federated_query(partners, datasets, query.query, query.columns)
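One way to restrict any of the existing queries to a single process URI, as suggested in the WIP note, is to pin the process variable with a SPARQL 1.1 `VALUES` block. The helper `restrict_to_process` below is hypothetical and not part of `queries.py` or `helpers.py`; it assumes the query contains a literal `WHERE {` and that the variable to pin is named as in the query (e.g. `p` for `?p` in `material_designation()`). The character check is a basic guard based on the characters SPARQL forbids inside `<...>`, not full IRI validation.

```python
import re

# Characters not allowed inside a SPARQL IRIREF <...>
_IRI_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')

def restrict_to_process(sparql: str, var: str, uri: str) -> str:
    """Hypothetical helper: pin ?var to one IRI by inserting a VALUES block
    directly after the first 'WHERE {' of the query."""
    if _IRI_FORBIDDEN.search(uri):
        raise ValueError(f"not a safe IRI: {uri!r}")
    marker = "WHERE {"
    pos = sparql.find(marker)
    if pos == -1:
        raise ValueError("query has no 'WHERE {' clause")
    insert_at = pos + len(marker)
    return f"{sparql[:insert_at]}\n    VALUES ?{var} {{ <{uri}> }}{sparql[insert_at:]}"

example_query = "SELECT ?p ?v WHERE { ?p <https://example.org/hasValue> ?v . }"
print(restrict_to_process(example_query, "p",
                          "https://w3id.org/pmd/ao/tte/pmdao-tto-tt-S355-1_process"))
```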