# BattINFO Example: Load and Query Coin Cell Metadata

This Jupyter Notebook demonstrates how to process JSON-LD (JSON for Linked Data) using `rdflib`, a Python library for working with RDF (Resource Description Framework) graphs. We load an example JSON-LD file containing battery coin cell metadata, convert it into an RDF graph, and run a SPARQL query to extract the cell's label, mass, and nominal voltage.

But first, let's load the packages we will need.

In [34]:
import json
from rdflib import Graph
from ontopy import get_ontology

## Step 1: Load the Battery Ontology
Ontologies define a structured vocabulary for describing data in a machine-readable way. Here, we use `ontopy` to retrieve the `battinfo` ontology, which provides standardized definitions for battery-related concepts such as electrolytes, solvents, and numerical properties.

In [35]:
# Loading from web
battinfo = get_ontology('https://w3id.org/emmo/domain/battery/inferred').load()


## Step 2: Load the JSON-LD File
The JSON-LD file contains battery-related metadata in a structured format. We will load this file into Python so that we can convert it into an RDF graph.

In [36]:
# Load JSON-LD file
file_path = "battinfo_example_coin_cell.metadata.json"
with open(file_path, "r", encoding="utf-8") as file:
    jsonld_data = json.load(file)
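
Before converting the file, it can help to see which JSON-LD keywords a document uses. The following is a minimal sketch on a made-up document: the `sample` dictionary, the `ex:` namespace, and the `describe_jsonld` helper are illustrative assumptions, and the real example file may be structured differently.

```python
# Hypothetical JSON-LD node; not taken from the example file.
sample = {
    "@context": {
        "ex": "http://example.org/",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
    "@id": "ex:cell1",
    "@type": "ex:CoinCell",
    "rdfs:label": "Demo cell",
}


def describe_jsonld(doc):
    """Split the top-level keys of a JSON-LD node into keywords and properties."""
    keywords = sorted(key for key in doc if key.startswith("@"))
    properties = sorted(key for key in doc if not key.startswith("@"))
    return {
        "keywords": keywords,      # JSON-LD keywords such as @context, @id, @type
        "properties": properties,  # ordinary property keys
        "type": doc.get("@type"),  # declared type, if any
    }


summary = describe_jsonld(sample)
print(summary)
```

Keys that start with `@` are JSON-LD keywords: `@context` maps short names to IRIs, `@id` names the node, and `@type` states its class. Everything else is a property of the node.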

## Step 3: Initialize and Parse the RDF Graph
RDF (Resource Description Framework) is a standard for structuring linked data. Here, we create an RDF graph and populate it with data from the JSON-LD file.

In [37]:
g = Graph()
g.parse(data=json.dumps(jsonld_data), format="json-ld")

<Graph identifier=N3b5e2e035b5249eda8b0f86719a82031 (<class 'rdflib.graph.Graph'>)>

## Step 4: Query the Graph using SPARQL
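SPARQL is a query language for retrieving information from RDF graphs. The query below takes its class and property IRIs from the ontology loaded in Step 1 and wraps the mass and voltage patterns in `OPTIONAL` blocks, so a cell is still returned when one of those properties is missing. It retrieves:
- The coin cell identifier and its human-readable label
- The nominal voltage
- The mass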

In [38]:
query = f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cell ?cellLabel ?mass ?voltage WHERE {{
    ?cell a <{battinfo.CoinCell.iri}> ;
          rdfs:label ?cellLabel .
    OPTIONAL {{
        ?cell <{battinfo.hasProperty.iri}> ?massProperty .
        ?massProperty a <{battinfo.Mass.iri}> ;
                     <{battinfo.hasNumericalPart.iri}> ?massValue .
        ?massValue <{battinfo.hasNumberValue.iri}> ?mass .
    }}
    OPTIONAL {{
        ?cell <{battinfo.hasProperty.iri}> ?voltageProperty .
        ?voltageProperty a <{battinfo.NominalVoltage.iri}> ;
                        <{battinfo.hasNumericalPart.iri}> ?voltageValue .
        ?voltageValue <{battinfo.hasNumberValue.iri}> ?voltage .
    }}
}}
"""

## Step 5: Execute the SPARQL Query
Running the query against the graph returns one row per matching coin cell. Because the query selects `rdfs:label`, each row carries a human-readable name rather than only an opaque IRI or blank-node identifier.

In [39]:
# Execute query
results = g.query(query)

## Step 6: Display Results
We process the query results and format them for easy readability.

In [40]:
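# Collect one formatted line per result row.
# The units (g, V) are hard-coded here; the query returns only the numeric values.
output = []
for row in results:
    output.append(f"Label: {row.cellLabel}, "
                  f"Mass: {row.mass} g, Voltage: {row.voltage} V")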

print("Coin Cell Metadata:")
for line in output:
    print(line)

Coin Cell Metadata:
Label: IFpR2032, Mass: 3 g, Voltage: 3.2 V
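
Because mass and voltage are matched in `OPTIONAL` blocks, either value can be missing for a given cell, and the formatting above would then print `None g` or `None V`. One possible way to handle this is sketched below. The helpers `format_quantity` and `format_cell` are hypothetical names, and the second call uses made-up data.

```python
def format_quantity(value, unit):
    """Format a value with its unit, or return 'n/a' when the value is missing."""
    if value is None:
        return "n/a"
    return f"{value} {unit}"


def format_cell(label, mass, voltage):
    """Build one display line for a coin cell result row."""
    return (f"Label: {label}, "
            f"Mass: {format_quantity(mass, 'g')}, "
            f"Voltage: {format_quantity(voltage, 'V')}")


line_full = format_cell("IFpR2032", 3, 3.2)      # values from the output above
line_missing = format_cell("Demo cell", None, 3.2)  # made-up row without a mass
print(line_full)
print(line_missing)
```

In the notebook loop, this would be called as `format_cell(row.cellLabel, row.mass, row.voltage)`.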


## Summary
This notebook demonstrated how to load the BattINFO ontology, parse JSON-LD coin cell metadata into an RDF graph, and run a SPARQL query that returns the cell's label, mass, and nominal voltage. Because the query is built from ontology IRIs rather than file-specific keys, the same approach should carry over to other metadata files annotated with the same terms. This is how semantic data models make battery-related knowledge more accessible to both humans and machine agents.