# SPARQL Query Interface with grlc

This notebook demonstrates how to work with [grlc](https://grlc.io/), the git repository linked data API constructor, a tool that automatically builds a REST API from SPARQL query files (`.rq` files). It allows you to:

- Define reusable SPARQL queries
- Automatically generate OpenAPI specifications
- Dispatch and execute queries programmatically with parameters
- Integrate with external data sources

You can try out the queries on https://grlc.io/api/haddocking/protein-quest/src/protein_quest/sparql .

The rest of this notebook shows how you can do this locally.

## Installation

You need to install the grlc extra with:

```bash
pip install protein-quest[grlc]
```

Or in development mode:

```bash
uv sync --extra grlc --group docs
```


## Configure grlc to use local directory with SPARQL files

The SPARQL files live in the `src/protein_quest/sparql/` directory.

Normally you would add that directory to a `config.ini`, but I do not want to clutter the root directory any further.
So I point grlc at the directory at runtime instead.

In [None]:
from pathlib import Path

import grlc.static

sparql_dir = Path("../../src/protein_quest/sparql/").resolve()
grlc.static.LOCAL_SPARQL_DIR = str(sparql_dir)
grlc.static.LOCAL_SPARQL_DIR

'/home/verhoes/git/protein-detective/protein-quest/src/protein_quest/sparql'
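
Before handing the directory to grlc, it can help to confirm that it exists and actually contains `.rq` files. The following is a minimal sketch using only `pathlib`; the helper name `list_rq_files` is hypothetical and not part of grlc, and it only looks at files directly inside the directory.

```python
from pathlib import Path


def list_rq_files(directory: Path) -> list[str]:
    """Return the sorted names of the .rq files directly inside `directory`."""
    if not directory.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    return sorted(p.name for p in directory.glob("*.rq"))
```

Calling `list_rq_files(sparql_dir)` in this notebook should list one file per query exposed by grlc.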

## Generate OpenAPI Specification

Build the OpenAPI spec from the available `.rq` files and poke around.

In [37]:
from grlc.swagger import build_spec

In [38]:
spec, warning = build_spec(user=None, repo=None)
warning

[]

Available queries are:

In [39]:
[p["call_name"] for p in spec]

['/search_pdb',
 '/search_complexes',
 '/search_alphafold',
 '/get_uniprot_details',
 '/search_emdb',
 '/search_uniprot']
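
Looking entries up by position is brittle if the order of the `.rq` files changes. A small sketch that indexes the spec by query name, assuming each item has a `call_name` key as in the output above; `index_by_name` is a hypothetical helper and `example_spec` is a trimmed stand-in for the real spec.

```python
def index_by_name(spec: list[dict]) -> dict[str, dict]:
    """Map each query name (call_name without the leading slash) to its spec item."""
    return {item["call_name"].lstrip("/"): item for item in spec}


# Stand-in for the spec returned by build_spec, trimmed to the key used here.
example_spec = [{"call_name": "/search_pdb"}, {"call_name": "/search_emdb"}]
by_name = index_by_name(example_spec)
print(sorted(by_name))
```

With the real spec, `index_by_name(spec)["search_pdb"]` would return the same item as `spec[0]` in the listing above.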

Let's find the input parameters and SPARQL query for the first path.

In [None]:
spec[0]["params"]

[{'name': 'endpoint',
  'type': 'string',
  'in': 'query',
  'description': 'Alternative endpoint for SPARQL query',
  'default': 'https://sparql.uniprot.org/sparql'},
 {'name': 'uniprot',
  'type': 'string',
  'required': True,
  'in': 'query',
  'description': 'A value of type string@acc that will substitute ?_uniprot_acc in the original query'}]
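
The parameter list can be reduced to the names a caller must supply. A minimal sketch over the structure printed above; `required_params` is a hypothetical helper, and the sample list is copied from that output with the descriptions left out.

```python
def required_params(params: list[dict]) -> list[str]:
    """Return names of parameters flagged as required; a missing flag counts as optional."""
    return [p["name"] for p in params if p.get("required", False)]


# Sample copied from the spec output, descriptions omitted for brevity.
params = [
    {"name": "endpoint", "type": "string", "in": "query",
     "default": "https://sparql.uniprot.org/sparql"},
    {"name": "uniprot", "type": "string", "required": True, "in": "query"},
]
print(required_params(params))
```

Here only `uniprot` is required, since `endpoint` carries a default.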

In [None]:
print(spec[0]["query"])

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pdb_db ?pdb_method ?pdb_resolution (GROUP_CONCAT(DISTINCT ?pdb_chain; separator=",") AS ?pdb_chains)
WHERE {
  BIND (IRI(CONCAT("http://purl.uniprot.org/uniprot/",?_uniprot_acc)) AS ?protein)
  ?protein a up:Protein .
  ?protein rdfs:seeAlso ?pdb_db .
  ?pdb_db up:database <http://purl.uniprot.org/database/PDB> .
  ?pdb_db up:method ?pdb_method .
  ?pdb_db up:chainSequenceMapping ?chainSequenceMapping .
  BIND(STRAFTER(STR(?chainSequenceMapping), "isoforms/") AS ?isoformPart)
  FILTER(STRSTARTS(?isoformPart, CONCAT(?_uniprot_acc, "-")))
  ?chainSequenceMapping up:chain ?pdb_chain .
  OPTIONAL { ?pdb_db up:resolution ?pdb_resolution . }
}
GROUP BY ?pdb_db ?pdb_method ?pdb_resolution
LIMIT 10000
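
In the query above, `?_uniprot_acc` is the variable that grlc substitutes with the `uniprot` request argument, as the parameter description in the spec states. A rough sketch that lists such underscore-prefixed variables in a query string with a regular expression; treating every `?_` variable as a placeholder is an assumption made for illustration, and this is not grlc's own parser.

```python
import re

# Assumption for this sketch: variables starting with "?_" are placeholders.
PLACEHOLDER = re.compile(r"\?_(\w+)")


def find_placeholders(query: str) -> list[str]:
    """Return the distinct placeholder names in order of first appearance."""
    return list(dict.fromkeys(PLACEHOLDER.findall(query)))


# Two lines taken from the search_pdb query shown above.
query = '''
BIND (IRI(CONCAT("http://purl.uniprot.org/uniprot/",?_uniprot_acc)) AS ?protein)
FILTER(STRSTARTS(?isoformPart, CONCAT(?_uniprot_acc, "-")))
'''
print(find_placeholders(query))
```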



## Execute SPARQL Query Programmatically

Dispatch and run a query with parameters.

We'll execute the `search_pdb` query with UniProt accession **P05067** (Amyloid beta A4 protein) on the UniProt SPARQL endpoint. This query searches for PDB (Protein Data Bank) structures that contain this protein and returns the structure method, resolution, and chain mappings.

In [41]:
import json

from grlc.utils import dispatch_query

In [42]:
resp, code, headers = dispatch_query(user=None, repo=None, query_name="search_pdb", requestArgs={"uniprot": "P05067"})
code, headers

(200,
 {'Content-Type': 'application/sparql-results+json', 'Server': 'grlc/1.3.10'})

By default the response is a JSON string, so we need to parse it.

In [43]:
assert isinstance(resp, str)
raw = json.loads(resp)
data = raw["results"]["bindings"]
print(len(data))
data[0]

221


{'pdb_db': {'type': 'uri', 'value': 'http://rdf.wwpdb.org/pdb/1AAP'},
 'pdb_method': {'type': 'uri',
  'value': 'http://purl.uniprot.org/core/X-Ray_Crystallography'},
 'pdb_resolution': {'datatype': 'http://www.w3.org/2001/XMLSchema#decimal',
  'type': 'literal',
  'value': '1.5'},
 'pdb_chains': {'type': 'literal', 'value': 'A/B=287-344'}}
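
Each binding wraps its value in a dictionary with type metadata. A minimal sketch that flattens a row to plain values and splits the `pdb_chains` string into chain identifiers and a residue range; the helper names are hypothetical, and the chain format is assumed to follow the `A/B=287-344` pattern seen above, with several entries separated by commas and no negative residue numbers.

```python
def flatten_binding(row: dict) -> dict[str, str]:
    """Keep only the 'value' of each SPARQL JSON binding."""
    return {name: cell["value"] for name, cell in row.items()}


def parse_chains(pdb_chains: str) -> list[tuple[list[str], int, int]]:
    """Parse 'A/B=287-344' (comma-separated if several) into (chains, start, end)."""
    parsed = []
    for entry in pdb_chains.split(","):
        chains, _, residues = entry.partition("=")
        start, _, end = residues.partition("-")
        parsed.append((chains.split("/"), int(start), int(end)))
    return parsed


# Row trimmed from the first binding shown above.
row = {
    "pdb_db": {"type": "uri", "value": "http://rdf.wwpdb.org/pdb/1AAP"},
    "pdb_chains": {"type": "literal", "value": "A/B=287-344"},
}
flat = flatten_binding(row)
print(flat["pdb_db"], parse_chains(flat["pdb_chains"]))
```

Unbound optional variables such as `pdb_resolution` are simply absent from a row, so `flatten_binding` leaves them out as well.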

## Run grlc server locally

In a terminal, run:
```bash
cd src/protein_quest/sparql
grlc-server
```

In a web browser, go to http://0.0.0.0:8088/api-local/ where you can try out the queries.
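
The URL of a query on the local server consists of the base address, the query name, and the URL-encoded parameters. A small sketch with `urllib.parse`; `build_query_url` is a hypothetical helper, and the base URL is the one given above.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, query_name: str, **params: str) -> str:
    """Join base URL, query name and URL-encoded parameters."""
    url = f"{base_url.rstrip('/')}/{query_name}"
    return f"{url}?{urlencode(params)}" if params else url


print(build_query_url("http://0.0.0.0:8088/api-local/", "search_pdb", uniprot="P05067"))
```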

With the server running, you can query it from Python with:

In [52]:
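import urllib.request

# Ask the local grlc server for CSV instead of the default JSON results.
request = urllib.request.Request(
    "http://0.0.0.0:8088/api-local/search_pdb?uniprot=P05067",
    headers={"Accept": "text/csv"},
)
with urllib.request.urlopen(request) as resp:
    # Decode the raw bytes of the response body as UTF-8 text.
    data = resp.read().decode("utf-8")
    print(data)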


pdb_db, pdb_method, pdb_resolution, pdb_chains
http://rdf.wwpdb.org/pdb/1AAP,http://purl.uniprot.org/core/X-Ray_Crystallography,"1.5"^^<http://www.w3.org/2001/XMLSchema#decimal>,"A/B=287-344"
http://rdf.wwpdb.org/pdb/1AMB,http://purl.uniprot.org/core/NMR_Spectroscopy,,"A=672-699"
http://rdf.wwpdb.org/pdb/1AMC,http://purl.uniprot.org/core/NMR_Spectroscopy,,"A=672-699"
http://rdf.wwpdb.org/pdb/1AML,http://purl.uniprot.org/core/NMR_Spectroscopy,,"A=672-711"
http://rdf.wwpdb.org/pdb/1BA4,http://purl.uniprot.org/core/NMR_Spectroscopy,,"A=672-711"
http://rdf.wwpdb.org/pdb/1BA6,http://purl.uniprot.org/core/NMR_Spectroscopy,,"A=672-711"
http://rdf.wwpdb.org/pdb/1BJB,http://purl.uniprot.org/core/NMR_Spectroscopy,,"A=672-699"
http://rdf.wwpdb.org/pdb/1BJC,http://purl.uniprot.org/core/NMR_Spectroscopy,,"A=672-699"
http://rdf.wwpdb.org/pdb/1BRC,http://purl.uniprot.org/core/X-Ray_Crystallography,"2.5"^^<http://www.w3.org/2001/XMLSchema#decimal>,"I=287-342"
http://rdf.wwpdb.org/pdb/1CA0,http://purl.