# BattINFO Example: Load and Query an Electrolyte Composition

This Jupyter Notebook demonstrates how to process JSON-LD (Linked Data in JSON format) using `rdflib`, a Python library for working with RDF (Resource Description Framework) graphs. We will load an example JSON-LD file containing battery electrolyte metadata, convert it into an RDF graph, and perform SPARQL queries to extract structured information.

But first, let's load the packages we will need.

In [3]:
import json
from rdflib import Graph
from ontopy import get_ontology
from pathlib import Path

## Step 1: Load the Battery Ontology
Ontologies define a structured vocabulary for describing data in a machine-readable way. Here, we use `ontopy` to retrieve the `battinfo` ontology, which provides standardized definitions for battery-related concepts such as electrolytes, solvents, and numerical properties.

In [2]:
# Loading from web
battinfo = get_ontology('https://w3id.org/emmo/domain/battery/inferred').load()


## Step 2: Load the JSON-LD File
The JSON-LD file contains battery-related metadata in a structured format. We will load this file into Python so that we can convert it into an RDF graph.

In [4]:
# Load JSON-LD file (the path assumes the notebook runs from a folder that sits next to "metadata")
file_path = Path().resolve().parent / "metadata" / "battinfo_example_electrolyte.metadata.json"
with open(file_path, "r", encoding="utf-8") as file:
    jsonld_data = json.load(file)
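
For readers new to JSON-LD, the sketch below shows the general shape of such a file using only the standard library. The document is a hypothetical toy: the `ex:` terms under `http://example.org/` are placeholders and are not taken from the BattINFO example file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical toy document; the example.org terms are placeholders, not BattINFO IRIs.
toy_doc = {
    "@context": {"ex": "http://example.org/"},
    "@id": "ex:electrolyte-1",
    "@type": "ex:Electrolyte",
    "ex:hasSolvent": {"@id": "ex:solvent-1"},
}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy.metadata.json"
    toy_path.write_text(json.dumps(toy_doc, indent=2), encoding="utf-8")

    # Same json.load pattern as in the cell above.
    with open(toy_path, "r", encoding="utf-8") as file:
        loaded = json.load(file)

# JSON-LD keywords start with "@"; every other key is a property.
jsonld_keywords = sorted(key for key in loaded if key.startswith("@"))
print(jsonld_keywords)  # ['@context', '@id', '@type']
```

At this stage the data is still an ordinary Python dictionary. The `@context` only takes effect once a JSON-LD processor expands the short names into full IRIs.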

## Step 3: Initialize and Parse the RDF Graph
RDF (Resource Description Framework) is a standard for structuring linked data. Here, we create an RDF graph and populate it with data from the JSON-LD file.

In [5]:
g = Graph()
g.parse(data=json.dumps(jsonld_data), format="json-ld")

<Graph identifier=N63f9e8b2379a4cc79b8d2a92f112e375 (<class 'rdflib.graph.Graph'>)>

## Step 4: Query the Graph using SPARQL
SPARQL is a query language for retrieving information from RDF graphs. In this query, we retrieve:
- The electrolyte identifier and its human-readable label
- The constituents of the electrolyte's solvent (reached via `hasSolvent` and `hasConstituent`)
- The numerical amount attached to each constituent, reported below as a volume fraction

In [6]:
# Query the graph for the electrolyte composition.
# The ontology terms are resolved to their UUID-based IRIs via ontopy;
# doubled braces ({{ }}) are literal braces inside the f-string.
query = f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?electrolyte ?electrolyteLabel ?component ?amount ?componentLabel WHERE {{
    ?electrolyte a <{battinfo.Electrolyte.iri}> ;
                rdfs:label ?electrolyteLabel ;
                <{battinfo.hasSolvent.iri}> ?solvent .
    ?solvent <{battinfo.hasConstituent.iri}> ?component .
    ?component <{battinfo.hasProperty.iri}> ?property .
    ?component rdfs:label ?componentLabel .
    ?property <{battinfo.hasNumericalPart.iri}> ?value .
    ?value <{battinfo.hasNumberValue.iri}> ?amount .
}}
"""

## Step 5: Execute the SPARQL Query
Running the query returns one row per matching component. Because the pattern selects `rdfs:label` values, each row carries a human-readable label alongside the underlying UUID-based IRI or blank node.

In [7]:
# Execute query
results = g.query(query)

## Step 6: Display Results
We process the query results and format them for easy readability.

In [8]:
# Prefer the rdfs:label values returned by the query; fall back to the raw IRI or blank node
output = []
electrolyte_label = None  # stays None if the query returned no rows
for row in results:
    component_label = row.componentLabel if row.componentLabel else row.component
    electrolyte_label = row.electrolyteLabel if row.electrolyteLabel else row.electrolyte
    output.append(f"Component: {component_label}, Volume Fraction: {row.amount}")

# Print results
if electrolyte_label is None:
    print("No matching electrolyte found.")
else:
    print(f"Electrolyte: {electrolyte_label}")
    for line in output:
        print(line)

Electrolyte: 1M LiPF6 in DMC:EC:EMC 1:1:1 (vol.) + 2 wt% VC
Component: ethylene carbonate, Volume Fraction: 0.334
Component: diethylene carbonate, Volume Fraction: 0.333
Component: dimethyl carbonate, Volume Fraction: 0.333
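
As a quick sanity check on this output, the volume fractions of a complete solvent mixture should sum to roughly 1. The sketch below copies the values from the printed results rather than reading them from the graph, and the tolerance of 0.001 is an assumption chosen to absorb rounding to three decimals.

```python
import math

# Values copied from the printed output above.
fractions = {
    "ethylene carbonate": 0.334,
    "diethylene carbonate": 0.333,
    "dimethyl carbonate": 0.333,
}

total = sum(fractions.values())

# Allow for rounding of each fraction to three decimal places.
fractions_are_complete = math.isclose(total, 1.0, abs_tol=1e-3)
print(fractions_are_complete)  # True
```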


## Summary
This notebook demonstrated how to process and query JSON-LD data representing a battery electrolyte composition. We loaded the data into an RDF graph, executed a SPARQL query built from BattINFO ontology terms, and displayed the results using human-readable labels. Because the query is written against ontology terms rather than file-specific keys, the same pattern should apply to other metadata annotated with the same vocabulary, which makes battery-related knowledge more accessible to both humans and machine agents.