# GraphRAG Python package - From PDF to Q&A with LUPUS example

In this notebook we will:

- Implement GraphRAG retrieval with the VectorRetriever and the VectorCypherRetriever
- Answer questions with the GraphRAG class and compare the two retrievers


## Setup

Define our variables:
- Neo4j credentials

In [23]:
import os
from dotenv import load_dotenv

# Load Neo4j credentials (the .env file also sets OPENAI_API_KEY, which the OpenAI clients read from the environment)
load_dotenv('.env', override=True)
NEO4J_URI = os.getenv('NEO4J_URI', 'bolt://localhost:7687')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME', 'neo4j')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
NEO4J_DATABASE = os.getenv('NEO4J_DATABASE', None)
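`NEO4J_PASSWORD` has no default, so a missing value only shows up later as an authentication error. A minimal fail-fast check could look like this, assuming the variable names above plus `OPENAI_API_KEY`, the environment variable the OpenAI client reads by default. `require_env` is a hypothetical helper, not part of the package:

```python
import os


def require_env(names):
    """Return {name: value} for the requested environment variables.

    Raises RuntimeError naming every variable that is unset or empty,
    so all problems are reported at once instead of one per run.
    """
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values


# Hypothetical usage: check the secrets that have no sensible default.
# require_env(["NEO4J_PASSWORD", "OPENAI_API_KEY"])
```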

In [24]:
import neo4j

driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD), database=NEO4J_DATABASE)

## Knowledge Graph Retrieval

In this section, we investigate two of the supported retrieval methods, starting with the VectorRetriever, which performs a simple vector search. It needs a vector index on the `embedding` property of the `Chunk` nodes created by the SimpleKGPipeline. The cell below only sets the index name, so the index must already exist:

In [66]:
VECTOR_INDEX_NAME = "text_embeddings"
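The cell above only names the index and assumes `text_embeddings` already exists. If it does not, here is a sketch of creating it with plain Cypher, using the `CREATE VECTOR INDEX` syntax of recent Neo4j 5.x releases. The `Chunk` label, `embedding` property, 1536 dimensions (the vector size of OpenAI's `text-embedding-ada-002`) and cosine similarity are assumptions to check against your graph and embedding model. The package also provides `neo4j_graphrag.indexes.create_vector_index` for the same job.

```python
def _quote(identifier):
    """Backtick-quote a Cypher identifier, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def vector_index_cypher(name, label="Chunk", prop="embedding",
                        dimensions=1536, similarity="cosine"):
    """Build a CREATE VECTOR INDEX statement.

    Index names, labels and property keys cannot be Cypher parameters,
    so they are quoted into the string instead.
    """
    if similarity not in ("cosine", "euclidean"):
        raise ValueError("similarity must be 'cosine' or 'euclidean'")
    return (
        f"CREATE VECTOR INDEX {_quote(name)} IF NOT EXISTS "
        f"FOR (n:{_quote(label)}) ON (n.{_quote(prop)}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {int(dimensions)}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )


def ensure_vector_index(driver, name, **kwargs):
    """Create the index if it is missing; `driver` is a neo4j.Driver."""
    driver.execute_query(vector_index_cypher(name, **kwargs))
```

For example, `ensure_vector_index(driver, VECTOR_INDEX_NAME)`. Index population runs in the background, so `CALL db.awaitIndexes()` can be used to wait before querying.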

In [25]:
from neo4j_graphrag.embeddings import OpenAIEmbeddings
embedder = OpenAIEmbeddings()

In [26]:
from neo4j_graphrag.retrievers import VectorRetriever

vector_retriever = VectorRetriever(
    driver,
    index_name=VECTOR_INDEX_NAME,
    embedder=embedder,
    return_properties=["text"],
)

vector_res = vector_retriever.search(
    query_text="How is precision medicine applied to Lupus?", 
    top_k=3,
)
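Conceptually, `search` embeds `query_text` with the same embedder used at ingestion and asks the vector index for the `top_k` most similar chunks. Below is a brute-force, pure-Python illustration of that ranking with made-up 3-dimensional vectors. Neo4j's vector index performs approximate nearest-neighbour search and OpenAI embeddings have far more dimensions, so treat this as a model of the idea only:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunks, k=3):
    """Return the k (text, score) pairs most similar to query_vec.

    `chunks` is a list of (text, embedding) pairs standing in for
    Chunk nodes and their embedding property.
    """
    scored = [(text, cosine(query_vec, emb)) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical chunks with invented embeddings.
chunks = [
    ("precision medicine in SLE", [0.9, 0.1, 0.0]),
    ("references and disclosures", [0.1, 0.9, 0.1]),
    ("biomarkers for lupus", [0.7, 0.3, 0.1]),
]
print(top_k_chunks([1.0, 0.0, 0.0], chunks, k=2))
```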

In [68]:
len(vector_res.items)

3

In [28]:
for i in vector_res.items: 
    print("====\n" + i.content)

====
{'text': 'precise and systematic fashion as suggested here.\nFuture care will involve molecular diagnostics throughout\nthe patient timecourse to drive the least toxic combination\nof therapies. Recent evidence suggests a paradigm shift is\non the way but it is hard to predict how fast it will come.\nDisclosure\nThe authors report no con ﬂicts of interest in this work.\nReferences\n1. Lisnevskaia L, Murphy G, Isenberg DA. Systemic lupus\nerythematosus. Lancet .2014 ;384:1878 –1888. doi:10.1016/S0140-\n6736(14)60128'}
====
{'text': 'd IS agents.\nPrecision medicine consists of a tailored approach to\neach patient, based on genetic and epigenetic singularities,\nwhich in ﬂuence disease pathophysiology and drug\nresponse. Precision medicine in SLE is trying to address\nthe need to assess SLE patients optimally, predict disease\ncourse and treatment response at diagnosis. Ideally every\npatient would undergo an initial evaluation that would\nproﬁle his/her disease, assessing the main 
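As the output shows, each item's `content` is the string form of the returned properties, e.g. `{'text': '...'}`, rather than the bare text. Assuming it is always the `str()` of a plain dict like this, `ast.literal_eval` can turn it back into a dict safely, since it evaluates literals only and never runs arbitrary code. `content_to_text` is a hypothetical helper:

```python
import ast


def content_to_text(content, key="text"):
    """Recover one property from a "{'text': ...}"-style content string.

    Falls back to the raw string if it is not a dict literal or lacks `key`.
    """
    try:
        props = ast.literal_eval(content)
    except (ValueError, SyntaxError, TypeError):
        return content
    if isinstance(props, dict) and key in props:
        return props[key]
    return content


sample = "{'text': 'Precision medicine consists of a tailored approach\\nto each patient'}"
print(content_to_text(sample))
```

In the notebook this would be `content_to_text(i.content)` inside the loop above.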

The GraphRAG Python package offers a whole host of other useful retrievers covering different retrieval patterns (for example HybridRetriever and Text2CypherRetriever).

Below we will use the VectorCypherRetriever, which runs a graph traversal after the vector search has found the matching text chunks. The traversal logic is defined in the Cypher query language.

As a simple starting point, let's go from each chunk to the entities extracted from it, traverse 1 to 2 more hops out in the entity graph, and textualize the relationships we pick up. We will use a quantified path pattern to accomplish this.


In [29]:
from neo4j_graphrag.retrievers import VectorCypherRetriever

vc_retriever = VectorCypherRetriever(
    driver,
    index_name=VECTOR_INDEX_NAME,
    embedder=embedder,
    retrieval_query="""
//1) Go out 2-3 hops in the entity graph and get relationships
WITH node AS chunk
MATCH (chunk)<-[:FROM_CHUNK]-()-[relList:!FROM_CHUNK]-{1,2}(:__Entity__)
UNWIND relList AS rel

//2) collect relationships and text chunks
WITH collect(DISTINCT chunk) AS chunks, 
  collect(DISTINCT rel) AS rels

//3) format and return context
RETURN '=== text ===\n' + apoc.text.join([c in chunks | c.text], '\n---\n') + '\n\n=== kg_rels ===\n' +
  apoc.text.join([r in rels | startNode(r).name + ' - ' + type(r) + '(' + coalesce(r.details, '') + ')' +  ' -> ' + endNode(r).name ], '\n---\n') AS info
"""
)
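The `MATCH` line above walks from each chunk back along `FROM_CHUNK` to an entity extracted from it, then follows 1 to 2 further relationships of any type except `FROM_CHUNK`, in either direction, ending on an `__Entity__` node; `UNWIND` plus `collect(DISTINCT rel)` then keeps each relationship once. The pure-Python model below mimics that walk over a toy edge list so the hop counting can be checked offline. Node ids and relationship types are invented, edges are assumed distinct, and no relationship repeats within one path, as in Cypher's default matching. It is an illustration, not how Neo4j executes the query:

```python
def qpp_relationships(chunk, edges, entities, min_hops=1, max_hops=2):
    """Toy model of
        (chunk)<-[:FROM_CHUNK]-()-[rels:!FROM_CHUNK]-{min,max}(:__Entity__)
    followed by UNWIND rels + collect(DISTINCT rel).

    `edges` is a list of distinct (start, type, end) tuples and `entities`
    is the set of node ids labelled __Entity__. Returns the distinct edges
    found, in first-seen order.
    """
    # Nodes with a FROM_CHUNK relationship pointing at the chunk.
    starts = [s for s, t, e in edges if t == "FROM_CHUNK" and e == chunk]
    # Only non-FROM_CHUNK relationships may appear in the quantified part.
    others = [edge for edge in edges if edge[1] != "FROM_CHUNK"]
    found = []

    def walk(node, path):
        # A path matches once it has enough hops and ends on an entity.
        if len(path) >= min_hops and node in entities:
            for edge in path:
                if edge not in found:
                    found.append(edge)
        if len(path) == max_hops:
            return
        for edge in others:
            if edge in path:  # a relationship is used at most once per path
                continue
            start, _, end = edge
            if start == node:  # the pattern is undirected: follow either way
                walk(end, path + [edge])
            elif end == node:
                walk(start, path + [edge])

    for node in starts:
        walk(node, [])
    return found


# Invented graph: c1/c2 are chunks, e1..e4 are entities.
edges = [
    ("e1", "FROM_CHUNK", "c1"),
    ("e2", "FROM_CHUNK", "c2"),
    ("e1", "TREATS", "e2"),
    ("e2", "ASSOCIATED_WITH", "e3"),
    ("e3", "CITES", "e4"),  # 3 hops from e1, so outside {1,2} for c1
]
entities = {"e1", "e2", "e3", "e4"}
print(qpp_relationships("c1", edges, entities))
```

Starting from `c2` instead also reaches `e3 - CITES -> e4`, because its entity `e2` sits one hop closer to it. Each extra hop in the bounds can enlarge the context considerably.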

In [65]:
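vc_res = vc_retriever.search(query_text="How is precision medicine applied to Lupus?", top_k=3)

# print output
# the retrieval query collects all matched chunks into a single record, so items[0] holds the whole context;
# its content is the str() of that Record, so newlines appear escaped (a backslash followed by "n")
context = vc_res.items[0].content
kg_rel_pos = context.find('\\n\\n=== kg_rels ===\\n')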
print("# Text Chunk Context:")
print(context[:kg_rel_pos][:500])
print()
print("# KG Context From Relationships:")
print(context[kg_rel_pos:][:500])

# Text Chunk Context:
<Record info="=== text ===\nprecise and systematic fashion as suggested here.\nFuture care will involve molecular diagnostics throughout\nthe patient timecourse to drive the least toxic combination\nof therapies. Recent evidence suggests a paradigm shift is\non the way but it is hard to predict how fast it will come.\nDisclosure\nThe authors report no con ﬂicts of interest in this work.\nReferences\n1. Lisnevskaia L, Murphy G, Isenberg DA. Systemic lupus\nerythematosus. Lancet .2014 ;384:1878 –1

# KG Context From Relationships:
\n\n=== kg_rels ===\nSystemic lupus erythematosus - AUTHORED(Published in) -> N. Engl. J. Med.\n---\nLisnevskaia L - AUTHORED() -> Systemic lupus erythematosus\n---\nMurphy G - AUTHORED() -> Systemic lupus erythematosus\n---\nIsenberg DA - AUTHORED() -> Systemic lupus erythematosus\n---\nSystemic lupus erythematosus - CITES(Published in) -> Lancet\n---\nSystemic lupus erythematosus - CITES(Systemic lupus erythematosus is discussed in the L
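The retrieval query returns one `info` string with a `=== text ===` section and a `=== kg_rels ===` section. The sketch below models the `RETURN` clause in Python (`format_context`, mirroring the Cypher string concatenation) and adds a `split_context` helper that accepts either real or escaped newlines. Both helpers are hypothetical, not part of the package. One design point: `str.find` returns -1 when the marker is absent, and `context[:-1]` then silently drops the last character, whereas `str.partition` has no such failure mode:

```python
MARKER = "\n\n=== kg_rels ===\n"


def format_context(texts, rels):
    """Mirror the query's RETURN clause.

    `texts` are chunk texts; `rels` are (start_name, rel_type, details,
    end_name) tuples. A None detail becomes '' like coalesce(r.details, '').
    """
    rel_lines = [
        f"{start} - {rtype}({'' if details is None else details}) -> {end}"
        for start, rtype, details, end in rels
    ]
    return ("=== text ===\n" + "\n---\n".join(texts)
            + MARKER + "\n---\n".join(rel_lines))


def split_context(context):
    """Split a context string into (text_part, kg_part).

    Tries the marker with real newlines first, then the escaped form that
    appears when `context` is the str() of a neo4j Record.
    """
    escaped = MARKER.encode("unicode_escape").decode("ascii")
    for marker in (MARKER, escaped):
        text_part, sep, kg_part = context.partition(marker)
        if sep:
            return text_part, kg_part
    return context, ""


ctx = format_context(
    ["precise and systematic fashion as suggested here."],
    [("Lisnevskaia L", "AUTHORED", None, "Systemic lupus erythematosus")],
)
text_part, kg_part = split_context(ctx)
print(kg_part)  # Lisnevskaia L - AUTHORED() -> Systemic lupus erythematosus
```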

## Q&A with GraphRAG

You can construct GraphRAG pipelines with the GraphRAG class. At minimum, you will need to pass the constructor an LLM and a retriever. Optionally, you can also pass a custom prompt template.

In [35]:
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import RagTemplate
from neo4j_graphrag.generation.graphrag import GraphRAG

llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0.0, "seed": 100})

rag_template = RagTemplate(template='''Answer the Question using the following Context. Only respond with information mentioned in the Context. Do not inject any speculative information not mentioned. 

# Question:
{query_text}
 
# Context:
{context}

# Answer:
''', expected_inputs=['query_text', 'context'])

v_rag  = GraphRAG(llm=llm, retriever=vector_retriever, prompt_template=rag_template)
vc_rag = GraphRAG(llm=llm, retriever=vc_retriever, prompt_template=rag_template)
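Roughly, each `search` call runs the retriever with `retriever_config` (here `top_k`), places the retrieved items in the `{context}` slot, fills `{query_text}`, and sends the prompt to the LLM. Here is a minimal sketch of that flow with stand-in callables so it runs without Neo4j or OpenAI. `rag_answer`, `fake_retrieve`, `fake_llm` and joining items with newlines are assumptions for illustration, not the package's internals:

```python
TEMPLATE = """Answer the Question using the following Context. Only respond with information mentioned in the Context. Do not inject any speculative information not mentioned.

# Question:
{query_text}

# Context:
{context}

# Answer:
"""


def rag_answer(query_text, retrieve, llm, top_k=5):
    """Retrieve, build the prompt, then generate.

    `retrieve(query_text, top_k)` returns a list of strings and
    `llm(prompt)` returns the answer text.
    """
    items = retrieve(query_text, top_k)
    prompt = TEMPLATE.format(query_text=query_text, context="\n".join(items))
    return llm(prompt)


def fake_retrieve(query_text, top_k):
    """Stand-in retriever returning canned chunks."""
    corpus = ["chunk about precision medicine", "chunk about biomarkers"]
    return corpus[:top_k]


def fake_llm(prompt):
    """Stand-in LLM: report how many context lines the prompt carried."""
    context = prompt.split("# Context:\n", 1)[1].split("\n\n# Answer:", 1)[0]
    return f"{len(context.splitlines())} context chunk(s) used"


print(rag_answer("How is precision medicine applied to Lupus?",
                 fake_retrieve, fake_llm, top_k=2))
```

The only difference between `v_rag` and `vc_rag` is the `retrieve` step; the template and LLM are shared, which keeps the comparison below fair.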

In [70]:
q = "How is precision medicine applied to Lupus? provide in list format."
print(f"Vector Response: \n{v_rag.search(q, retriever_config={'top_k':5}).answer}")
print("\n===========================\n")
print(f"Vector + Cypher Response: \n{vc_rag.search(q, retriever_config={'top_k':5}).answer}")

Vector Response: 
- Precision medicine in lupus involves a tailored approach based on genetic and epigenetic singularities.
- It aims to assess lupus patients optimally and predict disease course and treatment response at diagnosis.
- Ideally, each patient would undergo an initial evaluation to profile their disease, assessing the main pathophysiologic pathway through biomarkers.


Vector + Cypher Response: 
- Precision medicine in lupus involves a tailored approach to each patient based on genetic and epigenetic singularities.
- It aims to assess lupus patients optimally, predict disease course, and treatment response at diagnosis.
- Ideally, every patient would undergo an initial evaluation that profiles their disease, assessing the main pathophysiologic pathway through biomarkers.


In [63]:
q = "What are the most frequent symptoms of lupus and why is it difficult to diagnose? Show result in a list"

v_rag_result = v_rag.search(q, retriever_config={'top_k': 2}, return_context=True)
vc_rag_result = vc_rag.search(q, retriever_config={'top_k': 2}, return_context=True)

print(f"Vector Response: \n{v_rag_result.answer}")
print("\n===========================\n")
print(f"Vector + Cypher Response: \n{vc_rag_result.answer}")

Vector Response: 
- Most frequent symptoms of lupus:
  - Generalized pain
  - Fatigue
  - Depression

- Difficulty in diagnosing lupus:
  - Symptoms like generalized pain, fatigue, and depression are often considered unrelated to SLE by physicians and may not be well addressed during clinical evaluations.
  - Lupus has many different expressions, making it more complex to discuss and diagnose compared to simpler conditions like a cold.


Vector + Cypher Response: 
- Most frequent symptoms of lupus:
  - Skin lesions
  - Renal symptoms
  - Dermatological symptoms
  - Neuropsychiatric symptoms
  - Cardiovascular symptoms
  - Thrombocytopenia
  - Haemolytic anaemia
  - Accelerated atherosclerosis
  - Active disease
  - Previous damage
  - Complications of therapy

- Reasons for difficulty in diagnosing lupus:
  - Symptoms like generalized pain, fatigue, and depression are often considered unrelated to SLE and not well addressed during clinical evaluation.
  - The presence of ANA-negative c
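The comparison cells repeat the same steps for each question. A small helper can run both pipelines with the same `top_k` and print the answers in the same layout. `compare_answers` and `EchoRAG` are hypothetical names; the helper relies only on the `search(q, retriever_config=..., return_context=...)` call and the `.answer` attribute already used in these cells, and the stand-in class lets the sketch run offline:

```python
from types import SimpleNamespace

SEPARATOR = "\n===========================\n"


def compare_answers(question, pipelines, top_k=5):
    """Run every pipeline on `question` with the same top_k.

    `pipelines` maps a display name to an object whose
    search(query, retriever_config=..., return_context=...) returns
    something with an `.answer` attribute. Returns {name: result}.
    """
    results = {}
    for i, (name, rag) in enumerate(pipelines.items()):
        if i:
            print(SEPARATOR)
        result = rag.search(question, retriever_config={"top_k": top_k},
                            return_context=True)
        results[name] = result
        print(f"{name} Response: \n{result.answer}")
    return results


class EchoRAG:
    """Offline stand-in that reports the top_k it was given."""

    def __init__(self, label):
        self.label = label

    def search(self, query, retriever_config=None, return_context=False):
        top_k = (retriever_config or {}).get("top_k")
        return SimpleNamespace(answer=f"{self.label}: top_k={top_k}")


# In the notebook: compare_answers(q, {"Vector": v_rag, "Vector + Cypher": vc_rag}, top_k=2)
out = compare_answers("What are the most frequent symptoms of lupus?",
                      {"Vector": EchoRAG("v"), "Vector + Cypher": EchoRAG("vc")},
                      top_k=2)
```

The returned results keep the retriever context (via `return_context=True`) for later inspection.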

In [37]:
q = "Can you summarize systemic lupus erythematosus (SLE)? including common effects, biomarkers, and treatments? Provide in detailed list format."

v_rag_result = v_rag.search(q, retriever_config={'top_k': 5}, return_context=True)
vc_rag_result = vc_rag.search(q, retriever_config={'top_k': 5}, return_context=True)

print(f"Vector Response: \n{v_rag_result.answer}")
print("\n===========================\n")
print(f"Vector + Cypher Response: \n{vc_rag_result.answer}")

Vector Response: 
- **Systemic Lupus Erythematosus (SLE) Overview:**
  - SLE is a systemic autoimmune disease characterized by aberrant activity of the immune system.
  - It presents with a wide range of clinical manifestations and can cause damage to various organs.

- **Common Effects:**
  - SLE imposes a significant burden on patients' lives.
  - It affects health-related quality of life (HRQoL) due to its symptoms and disease activity.

- **Biomarkers:**
  - SLE is diagnosed and classified based on clinical symptoms, signs, and laboratory biomarkers.
  - Biomarkers reflect immune reactivity and inflammation in various organs.
  - Novel biomarkers have been discovered through "omics" research.
  - One particular biomarker may only reflect a specific aspect of SLE and not the overall state of the disease.

- **Treatments:**
  - Physicians focus on controlling disease activity to prevent damage accrual.
  - There is a gap between physicians' focus on disease control and patients' focu