# SPARQL Graph RAG Demo


An OpenAI API key is required to run this, as well as read/write access to a [SPARQL](https://en.wikipedia.org/wiki/SPARQL) store. One such store is available; its details are given below.

This replicates the parts of [Wey Gu](https://siwei.io/en/)'s [Jupyter Notebook](https://www.siwei.io/en/demos/graph-rag/) in which [LlamaIndex](https://www.llamaindex.ai/) uses a graph store. That Notebook uses a NebulaGraph store; here a SPARQL store is used. This is only a proof of concept.

The current version of `sparql.py` applies a simple mapping of the *subject-relation-object* text snippets into an RDF model, and builds SPARQL queries from there. The way it currently works has clear issues, performance among them, which need fixing. But the steps from there to using Linked Data from the Web at large are straightforward. (Later, exploiting structured data/content from HTML pages that the system has targeted for relevance shouldn't be difficult using existing libraries.)
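To make the idea of mapping a *subject-relation-object* snippet into RDF concrete, here is a minimal sketch. It is not the code in `sparql.py`: the URI-minting scheme, the function names (`to_uri`, `escape_literal`, `triplet_to_insert`) and the example triplet are all assumptions chosen for illustration.

```python
import re


def to_uri(base_uri: str, label: str) -> str:
    """Mint a URI from a free-text label (hypothetical scheme, not sparql.py's)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    return f"<{base_uri}/{slug}>"


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def triplet_to_insert(subj: str, rel: str, obj: str, graph: str, base_uri: str) -> str:
    """Build a SPARQL INSERT DATA update that stores one triplet in a named graph."""
    s = to_uri(base_uri, subj)
    p = to_uri(base_uri, rel)
    o = f'"{escape_literal(obj)}"'  # the object is kept as a plain literal here
    return f"INSERT DATA {{ GRAPH <{graph}> {{ {s} {p} {o} . }} }}"


# Made-up example triplet, for illustration only
query = triplet_to_insert(
    "Peter Quill",
    "is leader of",
    "Guardians of the Galaxy",
    graph="http://purl.org/stuff/guardians",
    base_uri="http://purl.org/stuff/data",
)
print(query)
```

Keeping the object as a literal is the simplest choice; a fuller mapping would probably mint URIs for objects that name entities, so that triplets can be joined in queries.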

For how the Graph RAG system as a whole works, see the original [Notebook](https://www.siwei.io/en/demos/graph-rag/) and the [LlamaIndex Documentation](https://gpt-index.readthedocs.io/en/latest/).


### Preparation 

* Install `llama_index`, `openai` and `sparqlwrapper`
* Make a SPARQL endpoint available and add its URL below (it must support UPDATE; the dataset used here is `/llama_index_sparql-test/`)
* For a clean start, run `DROP GRAPH <http://purl.org/stuff/guardians>`
* Add an OpenAI API key below
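The clean-start step can be sketched with the standard library alone, following the SPARQL 1.1 Protocol convention of POSTing a form-encoded `update` parameter. The helper name `build_update_request` and the `localhost` URL are hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

GRAPH = "http://purl.org/stuff/guardians"


def build_update_request(update_url: str, update: str) -> Request:
    """Prepare (but do not send) a SPARQL 1.1 Protocol update request."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_update_request(
    "http://localhost:3030/my-dataset/update",  # hypothetical update endpoint
    f"DROP GRAPH <{GRAPH}>",
)

# To actually run it against a store you control:
#   from urllib.request import urlopen
#   urlopen(request)
```

The same update can of course be sent with `sparqlwrapper` or pasted into the store's web UI; the point is only that a clean start is a single update operation on the named graph.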

#### 1. Imports, LLM Configuration

In [1]:
import logging
import os

import openai
from IPython.display import Markdown, display
from llama_index import (
    KnowledgeGraphIndex,
    ServiceContext,
    download_loader,
    load_index_from_storage,
)
from llama_index.graph_stores import SparqlGraphStore
from llama_index.llms import OpenAI
from llama_index.storage.storage_context import StorageContext

############
# LLM Config
############

# Two ways to supply the key; at least one should work
os.environ["OPENAI_API_KEY"] = ""

openai.api_key = ""

llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)

ImportError: cannot import name 'download_loader' from 'llama_index' (unknown location)

#### 1.1 SPARQL Store Configuration

SPARQL stores may vary in implementation. The [Fuseki](https://jena.apache.org/documentation/fuseki2/) server follows the [specifications](https://www.w3.org/TR/sparql11-query/) closely and uses the following scheme:

* multiple datasets (= DBs) are supported
* each dataset can contain a default graph as well as multiple named graphs
* each dataset can be configured with various endpoints, each providing facilities as required (query, update, etc.)

*Fuseki does include basic access control facilities, but the dataset used here is wide open.*
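As a rough illustration of the scheme above, the sketch below composes per-dataset service URLs and contrasts a query over the default graph with one over a named graph. The service names `query` and `update` are assumed from a default-style Fuseki configuration and may differ on a given server; the helper names and the `localhost` address are hypothetical.

```python
from typing import Optional


def service_url(server: str, dataset: str, service: str) -> str:
    """Compose a URL of the form <server>/<dataset>/<service> (assumed layout)."""
    return f"{server.rstrip('/')}/{dataset.strip('/')}/{service}"


def count_query(graph: Optional[str] = None) -> str:
    """Count triples in the default graph, or in a named graph if one is given."""
    pattern = "?s ?p ?o"
    if graph is not None:
        pattern = f"GRAPH <{graph}> {{ {pattern} }}"
    return f"SELECT (COUNT(*) AS ?n) WHERE {{ {pattern} }}"


# Hypothetical local server; service names depend on the dataset's configuration
query_endpoint = service_url("http://localhost:3030/", "my-dataset", "query")
update_endpoint = service_url("http://localhost:3030/", "my-dataset", "update")

default_graph_count = count_query()
named_graph_count = count_query("http://purl.org/stuff/guardians")
```

Running the named-graph count before and after indexing is a quick way to check that triples actually landed in the graph you configured.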


In [None]:
###############
# SPARQL Config
###############
ENDPOINT = 'https://fuseki.hyperdata.it/llama_index_sparql-test/'
GRAPH = 'http://purl.org/stuff/guardians'
BASE_URI = 'http://purl.org/stuff/data'

graph_store = SparqlGraphStore(
    sparql_endpoint=ENDPOINT,
    sparql_graph=GRAPH,
    sparql_base_uri=BASE_URI,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

#### 2.1 Load Augmentation Data

In [None]:
WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(
    pages=['Guardians of the Galaxy Vol. 3'], auto_suggest=False)

#### 2.2 Create Index from Augmentation Data 

In [None]:
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    max_triplets_per_chunk=10,
    sparql_endpoint=ENDPOINT,
    sparql_graph=GRAPH,
    sparql_base_uri=BASE_URI,
    include_embeddings=True,
)

In [None]:
kg_rag_query_engine = kg_index.as_query_engine(
    include_text=False,
    retriever_mode="keyword",
    response_mode="tree_summarize",
)

In [None]:
response_graph_rag = kg_rag_query_engine.query(
    "Who is Quill?")
display(Markdown(f"<b>{response_graph_rag}</b>"))