# S355 Tensile Test SPARQL Queries

This Jupyter Notebook provides example SPARQL queries for retrieving information relevant to tensile testing.
An [example dataset of tensile tests performed on an S355 steel](https://github.com/materialdigital/tensile-test-ontology/blob/main/tensile_test_data/S355_data_tto.rdf) serves as the basis. A local triple store is created using the OWLready2 Python package; the ontologies and the data are loaded into this triple store and can then be queried.
To this end, the required libraries are imported and helper functions are defined first.
The SPARQL queries are read from dedicated files, each containing only the text of one SPARQL query, which can be found in the [sparql folder](https://github.com/materialdigital/tensile-test-ontology/tree/main/tensile_test_data/sparql).

The queries follow the general pattern of SPARQL queries:

```SPARQL
PREFIX ex: <https://example.org/my/namespace/>

SELECT ?s ?p ?o
WHERE {
    ?s ?p ?o
}
```

## Import of relevant packages | Definition of helper functions

In [1]:
%%capture
# Import relevant and useful packages
import requests
from io import BytesIO
import os
import numpy as np
import pandas as pd
import owlready2 as or2
from owlready2 import World
import re
from tabulate import tabulate

# Definition of helper functions
# Function to transform inputs to IRIs.
def to_iri(item):
    try:
        return item.iri
    except AttributeError:
        # Literals (str, int, float, ...) have no .iri attribute; return them unchanged
        return item

# Function to write the result of a SPARQL query into a (pandas) data frame.
def sparql_result_to_df(res):
    l = []
    for row in res:
        r = [ to_iri(item)  for item in row]
        l.append(r)
    return pd.DataFrame(l)


def load_ontologies_to_world(*ontology_urls):
    """
    Loads ontologies from the given URLs into an OWLready2 World instance.
    
    Parameters:
        ontology_urls: A variable number of URLs pointing to ontologies.
    
    Returns:
        An OWLready2 World instance containing the loaded ontologies.
    """
    # Create a new World instance for loading ontologies
    world = World()
    
    # Iterate over each provided ontology URL
    for url in ontology_urls:
        try:
            # Fetch the ontology content, following redirects
            response = requests.get(url, allow_redirects=True)
            response.raise_for_status()  # Check for HTTP errors

            # Load the ontology from the response content
            world.get_ontology(url).load(fileobj=BytesIO(response.content))
        
        except requests.exceptions.RequestException as e:
            print(f"Failed to load ontology from {url}: {e}")
    
    return world


def load_sparql(query_name: str) -> str:
    """
    Loads a SPARQL query file directly from GitHub (raw URL).
    
    Parameters
    ---------
    query_name : str
        Name of the SPARQL file without extension (.sparql).
    
    Returns
    -------
    str
        Content of the SPARQL file as a string.
    """
    base_url = "https://raw.githubusercontent.com/materialdigital/tensile-test-ontology/main/tensile_test_data/sparql"
    url = f"{base_url}/{query_name}.sparql"
    
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    else:
        raise FileNotFoundError(f"File could not be loaded: {url} (status {response.status_code})")
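
`load_sparql` expects the query name without the `.sparql` extension, so passing a full file name would request `<name>.sparql.sparql`. As a minimal sketch, the URL construction could be made tolerant of both forms. `build_query_url` and `BASE_URL` are hypothetical names that are not part of the notebook; the base URL is copied from `load_sparql`.

```python
from urllib.parse import quote

# Base URL copied from the notebook's load_sparql helper
BASE_URL = (
    "https://raw.githubusercontent.com/materialdigital/"
    "tensile-test-ontology/main/tensile_test_data/sparql"
)


def build_query_url(query_name: str, base_url: str = BASE_URL) -> str:
    """Hypothetical helper: build the raw URL for a query name given with or without '.sparql'."""
    name = query_name.strip()
    # Drop a trailing extension so it is not appended twice
    if name.lower().endswith(".sparql"):
        name = name[: -len(".sparql")]
    if not name:
        raise ValueError("query_name must not be empty")
    # Percent-encode characters that are not safe inside a URL path segment
    return f"{base_url.rstrip('/')}/{quote(name, safe='')}.sparql"


print(build_query_url("count_all_entities.sparql"))
```

Under this assumption, `load_sparql` could call `build_query_url(query_name)` instead of formatting the URL inline.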


## Definition of Sources

In the following cell, the sources of ontologies to be parsed as well as the source of the A-Box (example dataset of tensile tests performed on an S355 steel) are specified.

In [None]:
# Definition of links to ontologies, files, etc. to be loaded in the local triple store
link_ontology_1 = "https://w3id.org/pmd/co/" # PMD Core Ontology (PMDco) as basis for tensile test ontology
link_ontology_2 = "https://w3id.org/pmd/ao/tto/" # Tensile Test Ontology (TTO)
link_data = "https://raw.githubusercontent.com/materialdigital/tensile-test-ontology/refs/heads/main/tensile_test_data/S355_data_tto.rdf" # Example data on S355 steel

# Loading ontologies and data files (A-Box) in the local triple store
triple_store = load_ontologies_to_world(link_ontology_1, link_ontology_2)
triple_store.get_ontology(link_data).load()

## SPARQL Query

In the following cell, the name of the SPARQL query file **must be specified by the user**.

The query contained in this file is executed in the subsequent cell.

### Depiction of Results 

To display the results as a table, the tabulate module is used.
Because the SPARQL query is defined in a dedicated query file (selected via `query_name`), the headers of the result table can be read from the SELECT clause of the query. This makes it easy to check manually whether the SELECT clause really addresses the information of interest. The following code therefore extracts the variables named in the SELECT clause and uses them as column headers.

In [None]:
# Specification of the SPARQL query of interest
# Which SPARQL query is to be performed?
# Please insert the name of the query (to be found in the "sparql" folder)

query_name = 'count_all_entities'

In [None]:
# Load the file from the resource and read the SPARQL query
query = load_sparql(query_name)

# Execute the SPARQL query
res = triple_store.sparql(query)

# Convert the result to a DataFrame
data = sparql_result_to_df(res)

# Visualization Part
# Step: Extract the terms from the SELECT clause
# This regular expression looks for the SELECT or SELECT DISTINCT clause and captures the terms.
select_clause_match = re.search(r'SELECT\s+(DISTINCT\s+)?(.*?)\s+WHERE', query, re.DOTALL | re.IGNORECASE)  # SPARQL keywords are case-insensitive

if select_clause_match:
    select_clause = select_clause_match.group(2)  # Use group(2) to capture the variables
    # Split the terms by whitespace and strip any leading or trailing spaces
    headers = [term.strip().lstrip('?') for term in select_clause.split() if term.strip().startswith('?')]
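else:
    # Fall back to the DataFrame column labels so that the tabulate call below still works
    headers = "keys"
    print("No headers were found. Please check the select clause within the SPARQL query.")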

# Step: Use the headers in the tabulate print statement
# Print the data with tabulate
print(tabulate(data, headers=headers, tablefmt='psql', showindex=True))
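
Splitting the SELECT clause on whitespace works for plain variables, but for an aggregate projection such as `(COUNT(?s) AS ?n)` it keeps the closing parenthesis in the header (`n)`). The following is a minimal sketch of a more tolerant extraction that collapses each parenthesized expression to its `AS` alias. `extract_select_headers` is a hypothetical helper, not part of the notebook, and it assumes that string literals in the SELECT clause contain no parentheses.

```python
import re

# Innermost parenthesized group, e.g. "(?s)" inside "(COUNT(?s) AS ?n)"
_INNERMOST = re.compile(r"\([^()]*\)")
# Alias at the end of a projection expression: "... AS ?n)"
_ALIAS = re.compile(r"\bAS\s+([?$]\w+)\s*\)$", re.IGNORECASE)


def _collapse(match):
    """Replace a parenthesized group by its alias variable, or drop it if it has none."""
    alias = _ALIAS.search(match.group(0))
    return f" {alias.group(1)} " if alias else " "


def extract_select_headers(query: str) -> list:
    """Hypothetical helper: return the output variable names of the first SELECT clause."""
    match = re.search(
        r"\bSELECT\s+(?:DISTINCT\s+|REDUCED\s+)?(.*?)\s*(?:\bWHERE\b|\{)",
        query,
        re.DOTALL | re.IGNORECASE,
    )
    if not match:
        return []
    clause = match.group(1)
    # Collapse parentheses from the inside out until none are left
    while "(" in clause:
        collapsed = _INNERMOST.sub(_collapse, clause)
        if collapsed == clause:  # unbalanced parentheses; stop instead of looping forever
            break
        clause = collapsed
    return re.findall(r"[?$](\w+)", clause)


print(extract_select_headers("SELECT (COUNT(?s) AS ?n) WHERE { ?s ?p ?o }"))  # ['n']
```

For `SELECT *` the function returns an empty list, in which case passing `headers="keys"` to `tabulate` is a reasonable fallback.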

## Perform all SPARQL Queries

Using the following cell, all SPARQL queries available in the [sparql folder](https://github.com/materialdigital/tensile-test-ontology/tree/main/tensile_test_data/sparql) are executed one after the other, and all results are displayed.

In [None]:
# GitHub API URL to list all files in the SPARQL folder
repo_api_url = "https://api.github.com/repos/materialdigital/tensile-test-ontology/contents/tensile_test_data/sparql"

# Get the JSON response (fail early on HTTP errors such as API rate limiting)
response = requests.get(repo_api_url)
response.raise_for_status()
files_json = response.json()

# Filter only .sparql files
query_files = [f['name'] for f in files_json if f['name'].endswith('.sparql')]

# Dictionary to store results
all_results = {}

for query_file in query_files:
    query_name = query_file.replace(".sparql", "")
    try:
        # Load SPARQL content from GitHub using your existing function
        query = load_sparql(query_name)
        
        # Execute SPARQL query on the triple store
        res = triple_store.sparql(query)
        
        # Convert to DataFrame
        df = sparql_result_to_df(res)
        
        # Store in dictionary
        all_results[query_name] = df
        
        # Extract headers from SELECT clause
        select_clause_match = re.search(r'SELECT\s+(DISTINCT\s+)?(.*?)\s+WHERE', query, re.DOTALL | re.IGNORECASE)
        if select_clause_match:
            select_clause = select_clause_match.group(2)
            headers = [term.strip().lstrip('?') for term in select_clause.split() if term.strip().startswith('?')]
        else:
            # tabulate cannot iterate over None; fall back to the DataFrame column labels
            headers = "keys"
        
        # Print results nicely
        print(f"\n=== Results for Query: {query_name} ===")
        print(tabulate(df, headers=headers, tablefmt='psql', showindex=True))
    
    except Exception as e:
        print(f"Error executing query '{query_name}': {e}")
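
The file listing above assumes that the GitHub contents API returns a JSON list. On errors (for example, when the rate limit is exceeded) it returns a JSON object with a `message` field instead, and the list comprehension then fails with a confusing error. A minimal sketch of a more defensive filter follows; `sparql_query_names` is a hypothetical helper, and apart from `count_all_entities.sparql` the entries in the sample listing are made up for illustration.

```python
def sparql_query_names(listing) -> list:
    """Hypothetical helper: extract sorted query names from a GitHub contents API response."""
    if not isinstance(listing, list):
        # Error responses are JSON objects, typically carrying a "message" field
        message = "unexpected response"
        if isinstance(listing, dict):
            message = listing.get("message", message)
        raise RuntimeError(f"Could not list query files: {message}")

    names = []
    for entry in listing:
        name = entry.get("name", "")
        # Keep regular files only and strip the extension from the end of the name
        if entry.get("type") == "file" and name.endswith(".sparql"):
            names.append(name[: -len(".sparql")])
    return sorted(names)


# Illustrative listing in the shape returned by the contents API
sample_listing = [
    {"name": "count_all_entities.sparql", "type": "file"},
    {"name": "README.md", "type": "file"},
    {"name": "archive.sparql", "type": "dir"},
]
print(sparql_query_names(sample_listing))  # ['count_all_entities']
```

In the cell above, `query_files` could then be replaced by `sparql_query_names(files_json)`, which already yields names without the extension.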