# BattINFO Example: Load and Query Time-Series Test Data Metadata

This Jupyter Notebook demonstrates how to process JSON-LD (JSON for Linked Data) using `rdflib`, a Python library for working with RDF (Resource Description Framework) graphs. We will load an example JSON-LD file containing metadata for battery time-series test data, parse it into an RDF graph, and run a SPARQL query to extract structured information. Along the way, `ontopy` is used to look up human-readable labels in the battery ontology.

But first, let's load the packages we will need.

In [15]:
import json
from rdflib import Graph
from ontopy import get_ontology

## Step 1: Load the Battery Ontology
Ontologies define a structured vocabulary for describing data in a machine-readable way. Here, we use `ontopy` to retrieve the `battinfo` ontology, which provides standardized definitions for battery-related test data concepts such as time, voltage, and current.

In [16]:
# Loading from web
battinfo = get_ontology('https://w3id.org/emmo/domain/battery/inferred').load()


## Step 2: Load the JSON-LD File
The JSON-LD file contains battery-related metadata in a structured format. We will load this file into Python so that we can convert it into an RDF graph.

In [17]:
# Load JSON-LD file
file_path = "battinfo_example_timeseries_test_data.metadata.json"
with open(file_path, "r") as file:
    jsonld_data = json.load(file)
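
The metadata file itself is not reproduced in this notebook. As a rough orientation, a CSVW-style column description in JSON-LD might look like the hypothetical fragment below. The `@context` and the overall nesting are illustrative assumptions, not the contents of the actual file; only the column name and title are taken from the output shown in Step 6.

```python
import json

# Hypothetical JSON-LD fragment: the structure is an assumption made for
# illustration, not the real contents of the example metadata file.
example_text = """
{
  "@context": {"csvw": "http://www.w3.org/ns/csvw#"},
  "@type": "csvw:Table",
  "csvw:tableSchema": {
    "csvw:column": [
      {
        "@type": "csvw:Column",
        "csvw:name": "voltage_volt",
        "csvw:titles": "Voltage  /  V",
        "csvw:required": true
      }
    ]
  }
}
"""

example = json.loads(example_text)

# JSON-LD is ordinary JSON, so it can be explored like any nested dict.
top_level_keys = sorted(example.keys())
column_names = [col["csvw:name"] for col in example["csvw:tableSchema"]["csvw:column"]]

print(top_level_keys)
print(column_names)
```

Inspecting `jsonld_data` in the same way is a quick check that the file loaded as expected before it is handed to `rdflib`.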

## Step 3: Initialize and Parse the RDF Graph
RDF (Resource Description Framework) is a standard for structuring linked data. Here, we create an RDF graph and populate it with data from the JSON-LD file.

In [18]:
g = Graph()
g.parse(data=json.dumps(jsonld_data), format="json-ld")

<Graph identifier=N605b45e594ae4031921ab3cb2a453e3b (<class 'rdflib.graph.Graph'>)>

## Step 4: Query the Graph using SPARQL
SPARQL is a query language for retrieving information from RDF graphs. In this query, we retrieve:
- The measurement types (test time, voltage, and current)
- Their associated units
- Whether they are required fields
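
The query in the following cell is built as a Python f-string so that the IRI of the unit property can be interpolated from the ontology. Inside an f-string, literal SPARQL braces must be doubled (`{{` and `}}`), while single braces insert a Python value. A minimal illustration, with a made-up IRI standing in for `battinfo.hasMeasurementUnit.iri`:

```python
# Hypothetical IRI standing in for battinfo.hasMeasurementUnit.iri.
unit_property_iri = "http://example.org/hasMeasurementUnit"

# Doubled braces become literal braces; single braces interpolate the variable.
snippet = f"OPTIONAL {{ ?column <{unit_property_iri}> ?unit . }}"
print(snippet)
```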

In [19]:
query = f"""
PREFIX csvw: <http://www.w3.org/ns/csvw#>
SELECT ?columnName ?columnTitle ?propertyType ?propertyClass ?unit ?required WHERE {{
    ?column a csvw:Column ;
            csvw:name ?columnName ;
            csvw:titles ?columnTitle ;
            csvw:propertyUrl ?propertyType .
    OPTIONAL {{ ?column csvw:required ?required . }}
    OPTIONAL {{ ?column <{battinfo.hasMeasurementUnit.iri}> ?unit . }}
    OPTIONAL {{ ?propertyType a ?propertyClass . }}
}}
"""


## Step 5: Execute the SPARQL Query
Running the query against the graph returns one row per matching column description. The rows contain IRIs for the property class and unit; these are resolved to human-readable labels in the next step.

In [20]:
# Execute query
results = g.query(query)

## Step 6: Display Results
We look up the `prefLabel` of each property class and unit in the ontology, then format each row as a readable line.

In [34]:
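def label_for(iri):
    """Return the ontology prefLabel for an IRI, or "n/a" if it is unbound or not found."""
    entity = battinfo.search_one(iri=str(iri)) if iri is not None else None
    return entity.prefLabel[0] if entity is not None and entity.prefLabel else "n/a"

output = []
for row in results:
    # ?required, ?propertyClass and ?unit come from OPTIONAL patterns and may be None
    required_status = "Required" if row.required and row.required.lower() == "true" else "Optional"
    property_name = label_for(row.propertyClass)
    unit_name = label_for(row.unit)
    output.append(f"Column: {row.columnName} ({row.columnTitle}), "
                  f"Property Class: {property_name}, Unit: {unit_name}, Required: {required_status}")

print("Time-Series Test Data Schema:")
for line in output:
    print(line)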

Time-Series Test Data Schema:
Column: test_time_millisecond (Test Time  /  ms), Property Class: TestTime, Unit: MilliSecond, Required: Required
Column: voltage_volt (Voltage  /  V), Property Class: Voltage, Unit: Volt, Required: Required
Column: current_ampere (Current  /  A), Property Class: ElectricCurrent, Unit: Ampere, Required: Required
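
The check in the cell above treats only the lexical form `true` as required. XSD booleans may also be written as `1`, so a slightly more tolerant helper could look like the sketch below. `is_required` is a hypothetical name introduced here; it works on plain strings, and since an rdflib `Literal` is a `str` subclass it can be passed in as-is.

```python
def is_required(value):
    """Return True if an optional csvw:required value means 'required'.

    `value` may be None (the OPTIONAL pattern did not match) or a string
    holding an xsd:boolean lexical form.
    """
    if value is None:
        return False
    return str(value).strip().lower() in {"true", "1"}

checks = [is_required("true"), is_required("1"), is_required("false"), is_required(None)]
print(checks)
```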


## Summary
This notebook demonstrated how to process and query JSON-LD metadata describing battery time-series test data. We parsed the metadata into an RDF graph, ran a SPARQL query to extract each column's name, title, property class, unit, and required flag, and resolved the resulting IRIs to human-readable labels using the ontology. Describing data this way makes battery test data easier to interpret for both humans and software agents.