# Using SPARQL with lamindb

SPARQL is a query language used to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
In this tutorial, we demonstrate how Bionty ontologies can be queried with SPARQL.

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
!lamin load laminlabs/cellxgene

In [None]:
import json
import bionty as bt

from rdflib import Graph, Literal, RDF, URIRef, Namespace, OWL, RDFS

To query data with SPARQL, we first need to build a directed RDF graph composed of triple statements.
Each triple is represented by: 1) a node for the subject, 2) an arc from the subject to the object for the predicate, and 3) a node for the object.
Each of the three parts can be identified by a URI; the object may alternatively be a literal value such as a string.

To obtain the necessary information to build a graph from a lamindb registry, we can either use the registry DataFrame or a public registry Pronto object as an intermediate.

## DataFrame

In [None]:
diseases = bt.Disease.df()
diseases.head()

In [None]:
rdf_graph = Graph()

namespace = URIRef("http://sparql-example.org/")

# Convert DataFrame to RDF by generating triples
for _, row in diseases.iterrows():
    subject = URIRef(namespace + str(row['ontology_id']))
    rdf_graph.add((subject, RDF.type, URIRef(namespace + "Disease")))
    rdf_graph.add((subject, URIRef(namespace + "name"), Literal(row['name'])))
    rdf_graph.add((subject, URIRef(namespace + "description"), Literal(row['description'])))

query = """
SELECT ?name ?description
WHERE {
  ?disease a <http://sparql-example.org/Disease> .
  ?disease <http://sparql-example.org/name> ?name .
  ?disease <http://sparql-example.org/description> ?description .
}
LIMIT 5
"""

for row in rdf_graph.query(query):
    print(f"Name: {row.name}, Description: {row.description}")

## Pronto

In [None]:
# Currently, only the public ontology object supports `to_pronto`.
# Open question: could pydantic help serialize an entire registry to JSON?
disease_pronto = bt.Disease.public().to_pronto()

# Dump the ontology to JSON on disk ...
with open("disease.json", "wb") as f:
    disease_pronto.dump(f, format="json")

# ... and read it back as a plain Python dictionary
with open("disease.json", "r") as f:
    disease_data = json.load(f)

In [None]:
rdf_graph = Graph()

namespace = Namespace("http://example.org/ontology/")
DEF = Namespace("http://example.org/ontology/definition/")

# Convert the JSON data into RDF by generating triples
for graph in disease_data["graphs"]:
    for node in graph["nodes"]:
        term_uri = URIRef(node["id"])
        rdf_graph.add((term_uri, RDF.type, OWL.Class))

        if "lbl" in node:
            rdf_graph.add((term_uri, RDFS.label, Literal(node["lbl"])))

        if node.get("meta") and node["meta"].get("definition") and node["meta"]["definition"].get("val"):
            definition = node["meta"]["definition"]["val"]
            rdf_graph.add((term_uri, DEF.definition, Literal(definition)))

query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX def: <http://example.org/ontology/definition/>

SELECT ?label ?definition
WHERE {
  ?term rdf:type owl:Class .
  OPTIONAL { ?term rdfs:label ?label }
  OPTIONAL { ?term def:definition ?definition }
}
LIMIT 5
OFFSET 9999
"""

results = rdf_graph.query(query)

for row in results:
    # Variables inside OPTIONAL blocks are None when nothing matched
    label = row.label if row.label else "No label available"
    definition = row.definition if row.definition else "No definition available"
    print(f"Name: {label}, Description: {definition}")