# SPARQL Graph RAG Demo


**This is only a minimal, special-case proof of concept.**

An OpenAI API key is required to run this, as well as read/write access to a [SPARQL](https://en.wikipedia.org/wiki/SPARQL) store. For now, at least, such a store is available at the endpoint configured below.

The aim is to replicate the parts of [Wey Gu](https://siwei.io/en/)'s Jupyter [Notebook](https://www.siwei.io/en/demos/graph-rag/) where [LlamaIndex](https://www.llamaindex.ai/) uses a graph store. That Notebook uses a NebulaGraph store; here a SPARQL store is used.

The description here covers only the SPARQL-specific details. For how the system as a whole works, see the original [Notebook](https://www.siwei.io/en/demos/graph-rag/) and the [LlamaIndex Documentation](https://gpt-index.readthedocs.io/en/latest/).


### Preparation 

*In general, right now you should only need an API key.*

* `pip install sparqlwrapper`
* Make a SPARQL endpoint available and add its URL below
* (make sure the endpoint supports UPDATE, e.g. `/llama_index_sparql-test/`)
* For a clean start, run `DROP GRAPH <http://purl.org/stuff/guardians>`

* **Add OpenAI API key below**
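
The clean-start step in the list above can be scripted rather than run by hand. The following is a minimal sketch, assuming the endpoint accepts SPARQL Update requests sent with POST to the dataset URL configured later in this notebook. The helper names `build_drop_graph` and `drop_graph` are hypothetical; they are not part of LlamaIndex or SPARQLWrapper.

```python
def build_drop_graph(graph_uri: str, silent: bool = False) -> str:
    """Return a SPARQL Update string that removes one named graph.

    With silent=True the SILENT keyword is added, so the request should
    still report success if the graph does not exist yet.
    """
    keyword = "DROP SILENT GRAPH" if silent else "DROP GRAPH"
    return f"{keyword} <{graph_uri}>"


def drop_graph(endpoint: str, graph_uri: str) -> None:
    """Send the DROP GRAPH update (needs network and write access)."""
    # Imported lazily so build_drop_graph works without sparqlwrapper installed.
    from SPARQLWrapper import POST, SPARQLWrapper

    client = SPARQLWrapper(endpoint)
    client.setMethod(POST)  # SPARQL updates are sent with POST
    client.setQuery(build_drop_graph(graph_uri))
    client.query()


print(build_drop_graph("http://purl.org/stuff/guardians"))
# prints: DROP GRAPH <http://purl.org/stuff/guardians>

# To actually clear the graph (endpoint assumed from the configuration cell):
# drop_graph("https://fuseki.hyperdata.it/llama_index_sparql-test/",
#            "http://purl.org/stuff/guardians")
```

Running `drop_graph` before building the index should avoid mixing triples from earlier runs with new ones.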

#### 1. Imports, LLM Configuration

In [None]:
import logging
import os

import openai
from IPython.display import Markdown, display
from llama_index import (
    KnowledgeGraphIndex,
    ServiceContext,
    download_loader,
    load_index_from_storage,
)
from llama_index.graph_stores import SparqlGraphStore
from llama_index.llms import OpenAI
from llama_index.storage.storage_context import StorageContext

logging.basicConfig(filename='loggy.log', filemode='w', level=logging.DEBUG)
logger = logging.getLogger(__name__)

############
# LLM Config
############

# Two ways to supply the key; at least one will work
os.environ["OPENAI_API_KEY"] = ""

openai.api_key = ""

llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)

#### 1.1 SPARQL Store Configuration

SPARQL stores vary in implementation. The [Fuseki](https://jena.apache.org/documentation/fuseki2/) server follows the [specifications](https://www.w3.org/TR/sparql11-query/) closely and uses the following scheme:

* multiple datasets (= DBs) are supported
* each dataset can contain a default graph as well as multiple named graphs
* each dataset can be configured with various endpoints, each providing facilities as required (query, update, etc.)

*Fuseki does include basic access control facilities, but the dataset used here is wide open.*
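
To illustrate the dataset and named-graph layout described above, the sketch below builds a query restricted to one named graph and, optionally, runs it with SPARQLWrapper. It assumes the dataset URL also answers queries, as the configuration in the next cell implies. The helper names `build_graph_sample_query` and `sample_triples` are made up for this example.

```python
def build_graph_sample_query(graph_uri: str, limit: int = 10) -> str:
    """Build a SELECT that reads triples only from the given named graph."""
    return (
        "SELECT ?s ?p ?o\n"
        f"WHERE {{ GRAPH <{graph_uri}> {{ ?s ?p ?o }} }}\n"
        f"LIMIT {limit}"
    )


def sample_triples(endpoint: str, graph_uri: str, limit: int = 10) -> list:
    """Run the query and return the JSON result bindings (needs network access)."""
    # Imported lazily so the query builder works without sparqlwrapper installed.
    from SPARQLWrapper import JSON, SPARQLWrapper

    client = SPARQLWrapper(endpoint)
    client.setReturnFormat(JSON)
    client.setQuery(build_graph_sample_query(graph_uri, limit))
    results = client.query().convert()
    return results["results"]["bindings"]


print(build_graph_sample_query("http://purl.org/stuff/guardians", limit=5))

# Example call against the store used in this notebook (assumed reachable):
# rows = sample_triples("https://fuseki.hyperdata.it/llama_index_sparql-test/",
#                       "http://purl.org/stuff/guardians")
```

Without the `GRAPH <...> { }` wrapper, the same pattern would typically be matched against the dataset's default graph instead of the named graph.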


In [None]:
###############
# SPARQL Config
###############
ENDPOINT = 'https://fuseki.hyperdata.it/llama_index_sparql-test/'
GRAPH = 'http://purl.org/stuff/guardians'
BASE_URI = 'http://purl.org/stuff/data'

graph_store = SparqlGraphStore(
    sparql_endpoint=ENDPOINT,
    sparql_graph=GRAPH,
    sparql_base_uri=BASE_URI,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

#### 2.1 Load Augmentation Data

In [None]:
WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(
    pages=['Guardians of the Galaxy Vol. 3'], auto_suggest=False)

#### 2.2 Create Index from Augmentation Data 

In [None]:
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    max_triplets_per_chunk=10,
    sparql_endpoint=ENDPOINT,
    sparql_graph=GRAPH,
    sparql_base_uri=BASE_URI,
    include_embeddings=True,
)

In [None]:
kg_rag_query_engine = kg_index.as_query_engine(
    include_text=False,
    retriever_mode="keyword",
    response_mode="tree_summarize",
)

In [None]:
response_graph_rag = kg_rag_query_engine.query(
    "Who is Quill?")
# print(str(response_graph_rag))
display(Markdown(f"<b>{response_graph_rag}</b>"))
response_graph_rag = kg_rag_query_engine.query(
    "Repeat the word 'fish'")
# print(str(response_graph_rag))
display(Markdown(f"<b>{response_graph_rag}</b>"))