<a href="https://colab.research.google.com/github/genaiconference/Agentic_KAG/blob/main/06_Comparison_Matrix.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Goal of the Notebook
This notebook compares four Retrieval-Augmented Generation (RAG) approaches:
- Traditional RAG
- Agentic RAG
- GraphRAG
- Agentic KAG

Each approach receives the same query, and the generated answers are compared for quality and relevance. A question bank can be used to build a comparison matrix that evaluates each method systematically.

In [None]:
!git clone https://github.com/genaiconference/Agentic_KAG.git

Cloning into 'Agentic_KAG'...
remote: Enumerating objects: 105, done.
remote: Counting objects: 100% (105/105), done.
remote: Compressing objects: 100% (104/104), done.
remote: Total 105 (delta 54), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (105/105), 1.04 MiB | 5.78 MiB/s, done.
Resolving deltas: 100% (54/54), done.


## Install Required Packages

In [None]:
!pip install -r /content/Agentic_KAG/requirements.txt

Collecting openai==1.97.1 (from -r /content/Agentic_KAG/requirements.txt (line 1))
  Downloading openai-1.97.1-py3-none-any.whl.metadata (29 kB)
Collecting python-dotenv==1.1.1 (from -r /content/Agentic_KAG/requirements.txt (line 2))
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Collecting langchain_openai==0.3.28 (from -r /content/Agentic_KAG/requirements.txt (line 3))
  Downloading langchain_openai-0.3.28-py3-none-any.whl.metadata (2.3 kB)
Collecting langchain_community==0.3.27 (from -r /content/Agentic_KAG/requirements.txt (line 4))
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting chromadb==1.0.15 (from -r /content/Agentic_KAG/requirements.txt (line 6))
  Downloading chromadb-1.0.15-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.0 kB)
Collecting neo4j==5.28.2 (from -r /content/Agentic_KAG/requirements.txt (line 7))
  Downloading neo4j-5.28.2-py3-none-any.whl.metadata (5.9 kB)
Collecting neo4j-graphrag=

## Load credentials from .env

In [None]:
import os

os.chdir("/content/Agentic_KAG")

from dotenv import load_dotenv

load_dotenv()  # This loads .env at project root

NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

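# Set OPENAI_API_KEY as env variable for openai/neo4j-graphrag compatibility.
# os.environ only accepts strings, so fail early if the key is missing from .env.
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set; add it to .env")
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY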
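Because every later cell depends on these credentials, it can help to check them all at once before connecting to anything. The following is a minimal sketch, not part of the repository: `missing_env_keys` and `demo_env` are hypothetical names, and the demo uses a stand-in mapping rather than the real environment.

```python
from typing import Mapping

REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")

def missing_env_keys(env: Mapping[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in required if not env.get(key)]

# Stand-in mapping for illustration; in the notebook you would pass os.environ.
demo_env = {
    "NEO4J_URI": "neo4j+s://example",
    "NEO4J_USERNAME": "neo4j",
    "OPENAI_API_KEY": "",  # empty values count as missing
}
print(missing_env_keys(demo_env))
```

Calling `missing_env_keys(os.environ)` right after `load_dotenv()` would list every missing credential in one message instead of failing on the first one.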

## Initialize the Neo4j Driver
The Neo4j driver lets you connect to the database and run read and write transactions against it.

In [None]:
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    NEO4J_URI,
    auth=(NEO4J_USERNAME, NEO4J_PASSWORD)
)

# (Optional) Test the connection
driver.verify_connectivity()

## Initialize OpenAI LLM and Embeddings via neo4j-graphrag
We will use OpenAI **GPT-4.1**. The GraphRAG Python package supports a wide range of LLMs, including models from OpenAI, Google VertexAI, Anthropic, Cohere, Azure OpenAI, local Ollama models, and any chat model that works with LangChain. You can also implement a custom interface for any other LLM.

Likewise, we will use OpenAI’s **text-embedding-3-small** for the embedding model, but you can use other embedders from different providers.

In [None]:
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings import OpenAIEmbeddings

neo4j_llm = OpenAILLM(
    model_name="gpt-4.1",
    model_params={"temperature": 0}
)

embedder = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

## Initialize OpenAI LLM and Embeddings via langchain_openai
We use the same models through LangChain: OpenAI GPT-4.1 as the chat model and text-embedding-3-small as the embedding model.

In [None]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Initialize OpenAI LLM using LangChain
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY,
                 model_name="gpt-4.1",
                 temperature=0)

# Initialize OpenAI Embedding model using LangChain
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY,
                              model="text-embedding-3-small")

## Initialize Langfuse Handler for Tracing

In [None]:
from langfuse.langchain import CallbackHandler
from langfuse import get_client

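# The Langfuse keys are expected in the environment (for example, loaded from .env above).
# Reassigning os.environ[key] = os.getenv(key) raises a TypeError when a key is missing,
# so check for presence instead.
for key in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"):
    if not os.getenv(key):
        raise ValueError(f"{key} is not set; add it to .env")
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"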

langfuse = get_client()

# Verify connection
if langfuse.auth_check():
    print("Langfuse client is authenticated and ready!")
else:
    print("Authentication failed. Please check your credentials and host.")

langfuse_handler = CallbackHandler()

## Load data
Load the data from the page_data.pkl file.

In [None]:
import pickle

with open("page_data.pkl", "rb") as f:
    pages = pickle.load(f)

## Traditional RAG

Retrieval combines BM25 (sparse) and vector (dense) search through an ensemble retriever.

A retrieval chain then places the retrieved documents into the prompt, and the LLM generates the answer.
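To illustrate how two rankings can be merged into one, the sketch below implements weighted Reciprocal Rank Fusion (RRF), the kind of rank merging an ensemble retriever performs. The document ids are hypothetical, and the constant `c=60` is an assumption (it is the value commonly used for RRF), so treat this as an illustration rather than the library's exact code.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (rank + c) to the document's score.
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties are broken by doc id for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search ranking
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 ranking
fused = weighted_rrf([dense_hits, sparse_hits], weights=[0.5, 0.5])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both rankings (`doc_b`, `doc_a`) rise to the top, which is the practical benefit of combining sparse and dense retrieval with equal weights.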

In [None]:
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.documents import Document
from urllib.parse import urlparse
from langchain_community.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
from IPython.display import display, Markdown


documents = []
for page in pages:
    doc = Document(page_content=page['content'], metadata={'url': page['url']})
    documents.append(doc)

print(f"Created {len(documents)} LangChain documents.")

def extract_section_from_url(url: str) -> str:
    path = urlparse(url).path
    parts = [p for p in path.strip("/").split("/") if p]

    if "datahacksummit-2025" in parts:
        parts.remove("datahacksummit-2025")

    if parts:
        return parts[0].replace("-", " ").title()
    else:
        return "Overview"

for item in documents:
    section = extract_section_from_url(item.metadata["url"])
    item.metadata["section"] = section

persist_directory = os.path.join(os.getcwd(), "vectorstore", "chroma")

# Create the vector store
vectordb = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory=persist_directory
)

print(vectordb._collection.count())

# Setup
similarity_search_retriever = vectordb.as_retriever(search_type="similarity", search_kwargs={"k": 5})
bm25_retriever = BM25Retriever.from_documents(documents=documents, k=5)
ensemble_retriever = EnsembleRetriever(retrievers=[similarity_search_retriever, bm25_retriever], weights=[0.5, 0.5])


template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.
Avoid using generic phrases like "Provide context" or "as per context".

Question: {input}

Context: {context}

Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

# Combine docs and chain
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(ensemble_retriever, combine_docs_chain)

def run_traditional_rag(question):
    response = rag_chain.invoke({"input": question}, config={"callbacks": [langfuse_handler]})
    return response["answer"]

Created 81 LangChain documents.
243
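As a quick check of the `section` metadata, the sketch below repeats the logic of `extract_section_from_url` in condensed form so it runs on its own, then applies it to a few URLs. The paths `some-session` and `hack-panels` are hypothetical and only illustrate the URL shapes.

```python
from urllib.parse import urlparse

def extract_section_from_url(url: str) -> str:
    # Same logic as the notebook cell, condensed so this example is self-contained.
    parts = [p for p in urlparse(url).path.strip("/").split("/") if p]
    if "datahacksummit-2025" in parts:
        parts.remove("datahacksummit-2025")
    return parts[0].replace("-", " ").title() if parts else "Overview"

base = "https://www.analyticsvidhya.com/datahacksummit-2025"
sections = [
    extract_section_from_url(url)
    for url in (base, f"{base}/sessions/some-session", f"{base}/hack-panels")
]
print(sections)  # ['Overview', 'Sessions', 'Hack Panels']
```

The first path segment after the event slug becomes the section name, and the event landing page falls back to `Overview`.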


## Agentic RAG

The ensemble retriever is exposed as a tool to a ReAct agent.

The agent can plan, retry, and reason through sub-steps before it answers.
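The control flow of a ReAct agent can be pictured as a loop of thought, action, and observation with an iteration cap. The toy sketch below is not how LangChain implements `AgentExecutor`; `scripted_policy` and `toy_tools` are hypothetical stand-ins for the LLM and the retriever tool, used only to show the loop.

```python
from typing import Callable

def scripted_policy(question: str, observations: list[str]) -> dict:
    """Stand-in for the LLM: search once, then answer from the observation."""
    if not observations:
        return {"action": "search", "input": question}
    return {"action": "finish", "input": f"Based on: {observations[-1]}"}

def run_react_loop(question: str, tools: dict[str, Callable[[str], str]],
                   policy=scripted_policy, max_iterations: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_iterations):
        step = policy(question, observations)      # thought + chosen action
        if step["action"] == "finish":
            return step["input"]                   # final answer
        tool = tools.get(step["action"])
        # An unknown tool becomes an observation so the policy can react to it.
        result = tool(step["input"]) if tool else f"unknown tool {step['action']}"
        observations.append(result)                # observation fed to next step
    return "Stopped: iteration limit reached."

toy_tools = {"search": lambda q: f"2 documents matching '{q}'"}
print(run_react_loop("Who presents the KAG session?", toy_tools))
```

The iteration cap plays the same role as `max_iterations=5` in the executor below: it bounds cost when the agent keeps calling tools without reaching an answer.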

In [None]:
from langchain.agents import create_react_agent, AgentExecutor
import prompts
from langchain.tools import Tool
from langchain.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    HumanMessagePromptTemplate,
    AIMessagePromptTemplate,
    PromptTemplate,
)

# Define the retriever as a tool
retriever_tool = Tool(
    name="AV_Agentic_RAG_tool",
    description="Useful for retrieving relevant documents based on input questions.",
    func=lambda query: ensemble_retriever.invoke(query),
)

tools = [retriever_tool]

# Get the ReAct prompt
prompt = prompts.REACT_PROMPT


# Create the ReAct agent
def get_react_agent(llm, tools, system_prompt, verbose=False):
    """Helper function for creating agent executor"""
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            MessagesPlaceholder(variable_name="conversation_history", optional=True),
            HumanMessagePromptTemplate(
                prompt=PromptTemplate(input_variables=["input"], template="{input}")
            ),
            AIMessagePromptTemplate(
                prompt=PromptTemplate(
                    input_variables=["agent_scratchpad"], template="{agent_scratchpad}"
                )
            ),
        ]
    )
    agent = create_react_agent(llm, tools, prompt)
    return AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=verbose,
        stream_runnable=True,
        handle_parsing_errors=True,
        max_iterations=5,
        return_intermediate_steps=True,
    )

generate_agent = get_react_agent(
    llm,
    tools,
    prompt,
    verbose=False,
)


def run_agentic_rag(question):
    answer = generate_agent.invoke({"input": question}, config={"callbacks": [langfuse_handler]})
    return answer["output"]


## GraphRAG

In [None]:
from neo4j_graphrag.retrievers import Text2CypherRetriever
from neo4j_graphrag.retrievers import HybridCypherRetriever
from neo4j_graphrag.generation import GraphRAG
import prompts
from examples import examples
from neo4j_graphrag.schema import get_schema

# Initialize the Text2Cypher retriever
t2c_retriever = Text2CypherRetriever(
    llm=neo4j_llm,
    neo4j_schema=get_schema(driver),
    driver=driver,
    custom_prompt=prompts.custom_text2cypher_prompt,
    examples=examples
)


def run_graphrag(question):
    # Generate a Cypher query for the question with Text2Cypher
    response = t2c_retriever.search(query_text=question)
    cypher_query = response.metadata["cypher"]
    print(cypher_query)

    # Reuse the generated Cypher as the retrieval query of a hybrid retriever
    hc_retriever = HybridCypherRetriever(
        driver=driver,
        vector_index_name="entity_vector_index",
        fulltext_index_name="entity_fulltext_index",
        retrieval_query=cypher_query,
        embedder=embedder,
    )
    rag = GraphRAG(retriever=hc_retriever, llm=neo4j_llm)

    # Query the graph.
    # The LangChain-style `config={"callbacks": ...}` argument is dropped here:
    # GraphRAG.search() does not accept it, so this call is not traced by Langfuse.
    response = rag.search(
        query_text=question,
        retriever_config={"top_k": 10},
        return_context=True,
        response_fallback="I cannot answer this question because I have no relevant context.",
    )
    return response.answer

## Agentic KAG

In [None]:
import json

import agent_utils
# Aliased so the module does not shadow the `tools` list defined in the Agentic RAG cell
import tools as kag_tools
from IPython.display import Markdown

def run_agentic_KAG(query):
    """
    Runs the answer generation flow: Text2Cypher -> Hybrid Retrieval -> Final Answer.
    """
    # Run Hybrid Retrieval using all the available tools
    hybrid_agent = agent_utils.get_react_agent(
        llm,
        [kag_tools.av_hybrid_tool, kag_tools.global_retriever_tool, kag_tools.local_retriever_tool],
        prompts.REACT_PROMPT,
        verbose=False
    )
    hybrid_input = json.dumps({
        "query": query,
    })

    print(f"Running Hybrid Retrieval with input: {hybrid_input}")
    answer = hybrid_agent.invoke({"input": hybrid_input}, config={"callbacks": [langfuse_handler]})
    final_answer = answer['output']
    return final_answer

Got 14 community summaries


## Compare the approaches

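Compare the four approaches (Traditional RAG, Agentic RAG, GraphRAG, Agentic KAG) on the same question. The GraphRAG entry is commented out in the cell below, so the output shown covers the other three.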


In [None]:
def compare_rag_answers(question: str) -> dict:
    return {
        "question": question,
        "traditional_rag": run_traditional_rag(question),
        "agentic_rag": run_agentic_rag(question),
        #"graphrag": run_graphrag(question),
        "agentic_kag": run_agentic_KAG(question),
    }

def show_rag_answers(result: dict):
    """
    Pretty-prints the results from each RAG pipeline in a structured Markdown format.
    """
    sections = [
        ("❓ Question", result["question"]),
        ("🔵 Traditional RAG", result["traditional_rag"]),
        ("🟢 Agentic RAG", result["agentic_rag"]),
        #("🟣 GraphRAG", result["graphrag"]),
        ("🟠 Agentic KAG", result["agentic_kag"]),
    ]

    for title, content in sections:
        display(Markdown(f"## {title}\n\n{content}"))
        display(Markdown("---"))
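One way to turn per-question results into a comparison matrix is to collect one row per question and one column per method. The sketch below is an assumption about how that could look, not code from the repository: `build_matrix`, `to_markdown`, and the stand-in pipelines are hypothetical, and a failing pipeline is recorded as an error string so one broken method does not stop the run.

```python
def build_matrix(questions, pipelines):
    """Run every pipeline on every question; errors are recorded, not raised."""
    rows = []
    for question in questions:
        row = {"question": question}
        for name, run in pipelines.items():
            try:
                row[name] = run(question)
            except Exception as exc:  # keep the matrix going if one method fails
                row[name] = f"ERROR: {exc}"
        rows.append(row)
    return rows

def to_markdown(rows):
    """Render the rows as a Markdown table, one column per pipeline."""
    headers = list(rows[0])
    lines = ["| " + " | ".join(headers) + " |",
             "|" + "|".join(" --- " for _ in headers) + "|"]
    for row in rows:
        # Flatten newlines and escape pipes so answers do not break the table.
        cells = [str(row[h]).replace("\n", " ").replace("|", "\\|") for h in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)

def failing(question):
    raise RuntimeError("index unavailable")

# Stand-in pipelines; in the notebook these would be run_traditional_rag, etc.
demo_pipelines = {"method_a": lambda q: q.upper(), "method_b": failing}
matrix = build_matrix(["when is lunch?"], demo_pipelines)
print(to_markdown(matrix))
```

In the notebook, the resulting string could be passed to `display(Markdown(...))` to view all methods side by side for the whole question bank.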

In [None]:
question_bank = [
    # "List down all the sessions or workshops which uses LangGraph?",
    # "when is Arun delivering a session",
    # "How many sessions are happening? Give me a breakdown by Session Type",
    # "give me all the sessions that happen between 12-1 on day 1 in which auditorium should I attend Arun's session?",
    # "I like only sessions related to knowledge graphs tell me when and where and who is delivering such sessions",
    # "how many tea breaks do we have and when are they ?",
    # "I am hungry now when do they serve lunch?",
    # "I am a big lover of ai agents and I am interested in putting things in production. Suggest me a tailor made agenda, mention the name of the Session or workshop along with Instructor name?",
    "I am a big lover of knowledge graphs (not Langgraphs). Suggest me a tailor made agenda, mention the name of the Session or workshop along with Instructor name. Use Global search?",
]

for q in question_bank:
    result = compare_rag_answers(q)
    show_rag_answers(result)

Running Hybrid Retrieval with input: {"query": "I am a big lover of knowledge graphs (not Langgraphs). Suggest me a tailor made agenda, mention the name of the Session or workshop along with Instructor name. Use Global search?"}


Processing communities:   0%|          | 0/14 [00:00<?, ?it/s]

## ❓ Question

I am a big lover of knowledge graphs (not Langgraphs). Suggest me a tailor made agenda, mention the name of the Session or workshop along with Instructor name. Use Global search?

---

## 🔵 Traditional RAG

For a knowledge graph enthusiast, the ideal agenda is:

**Session:** Agentic Knowledge Augmented Generation: The Next Leap After RAG  
**Instructor:** Arun Prakash Asokan  
This session focuses on building Knowledge Graphs from unstructured data, using graph databases, and designing autonomous AI agents to reason over these graphs—perfect for deepening your expertise in knowledge graphs.

---

## 🟢 Agentic RAG

**Tailor-Made Agenda for Knowledge Graph Enthusiasts**

---

**1. Session: Agentic Knowledge Augmented Generation: The Next Leap After RAG**  
- **Instructor:** Arun Prakash Asokan (Associate Director Data Science)  
- **Description:**  
  - Dive into building Knowledge Graphs from unstructured data.
  - Learn to use Graph Databases to organize and connect information meaningfully.
  - Design autonomous AI agents to navigate and reason over these graphs.
  - Focus on supercharging LLM applications with agents and knowledge graphs.
- **More Info:** [Session Details](https://www.analyticsvidhya.com/datahacksummit-2025/sessions/agentic-knowledge-augmented-generation-the-next-leap-after-rag)

---

**2. Workshop: Agentic RAG Workshop: From Fundamentals to Real-World Implementations**  
- **Instructor:** Arun Prakash Asokan (Associate Director Data Science)  
- **Description:**  
  - Hands-on workshop covering Agentic RAG (Retrieval-Augmented Generation) and its evolution.
  - Includes modules on building smarter AI applications, with a focus on integrating knowledge graphs and agentic behavior.
  - Practice with Google Colab notebooks and frameworks like LangGraph and LangChain.
- **More Info:** [Workshop Details](https://www.analyticsvidhya.com/datahacksummit-2025/workshops/agentic-rag-workshop-from-fundamentals-to-real-world-implemenno-title)

---

**Summary Table**

| Session/Workshop Name                                                                 | Instructor                | Focus Area                                  | Link                                                                                   |
|--------------------------------------------------------------------------------------|---------------------------|----------------------------------------------|----------------------------------------------------------------------------------------|
| Agentic Knowledge Augmented Generation: The Next Leap After RAG                      | Arun Prakash Asokan       | Building and reasoning over Knowledge Graphs | [Details](https://www.analyticsvidhya.com/datahacksummit-2025/sessions/agentic-knowledge-augmented-generation-the-next-leap-after-rag) |
| Agentic RAG Workshop: From Fundamentals to Real-World Implementations                | Arun Prakash Asokan       | Agentic RAG, Knowledge Graph Integration     | [Details](https://www.analyticsvidhya.com/datahacksummit-2025/workshops/agentic-rag-workshop-from-fundamentals-to-real-world-implemenno-title) |

---

- These sessions are highly relevant for anyone passionate about knowledge graphs, focusing on both the theory and hands-on implementation in modern AI systems.
- Both are led by Arun Prakash Asokan, a recognized expert in AI and knowledge graph applications.

**For a knowledge graph lover, these sessions will provide deep technical insights and practical skills!**

---

## 🟠 Agentic KAG

**Tailor-Made Agenda for Knowledge Graph Enthusiasts**

---

**1. Core Knowledge Graph Sessions & Workshops**

- **Session/Workshop:** Knowledge Graphs in Agentic AI Systems  
  - **Instructors:** Kartik Nighania, Anuj Saini, Miguel Otero Pedrido  
  - **Focus:** Integration of knowledge graphs within agentic AI frameworks, practical implementations, and real-world use cases  
  - **Notes:** These sessions are part of the broader workshop ecosystem and are ideal for those interested in technical depth and applications of knowledge graphs [Data: Reports].

- **Session/Workshop:** Advanced Applications of Knowledge Graphs in LLMs  
  - **Instructors:** Raghav Bali, Luis Serrano, Avinash Pathak, Saurav Agarwal  
  - **Focus:** Leveraging knowledge graphs for enhancing large language models, semantic search, and reasoning  
  - **Notes:** These sessions cover the intersection of knowledge graphs and modern AI, with hands-on demonstrations [Data: Reports].

- **Session/Workshop:** Building Agentic Systems with Knowledge Graphs  
  - **Instructors:** Arun Prakash Asokan, Sanathraj Narayan, Dipanjan Sarkar, Mayank Aggarwal, Praneeth Paikray  
  - **Focus:** End-to-end workflows for constructing agentic systems using knowledge graphs, including data ingestion, querying, and deployment  
  - **Notes:** Practical, tool-driven sessions with a focus on responsible AI and real-world deployment [Data: Reports].

---

**2. Related Technical Sessions**

- **Session:** Red Teaming GenAI: Securing Systems from the Inside Out  
  - **Instructors:** Satnam Singh, Shivaraj Mulimani  
  - **Focus:** While not exclusively about knowledge graphs, this session covers security and robustness in AI systems, which often leverage knowledge graphs for threat modeling and analysis [Data: Reports].

---

**3. Suggested Agenda Structure**

| Time Slot         | Session/Workshop Name                                   | Instructor(s)                                      |
|-------------------|--------------------------------------------------------|----------------------------------------------------|
| 09:00 - 10:30 AM  | Knowledge Graphs in Agentic AI Systems                 | Kartik Nighania, Anuj Saini, Miguel Otero Pedrido  |
| 11:00 - 12:30 PM  | Advanced Applications of Knowledge Graphs in LLMs       | Raghav Bali, Luis Serrano, Avinash Pathak, Saurav Agarwal |
| 01:30 - 03:00 PM  | Building Agentic Systems with Knowledge Graphs          | Arun Prakash Asokan, Sanathraj Narayan, Dipanjan Sarkar, Mayank Aggarwal, Praneeth Paikray |
| 03:30 - 05:00 PM  | Red Teaming GenAI: Securing Systems from the Inside Out | Satnam Singh, Shivaraj Mulimani                    |

---

**4. Notes**

- The sessions above are curated for maximum exposure to knowledge graph concepts, tools, and applications.
- While some sessions are part of broader agentic AI or LLM tracks, they include substantial knowledge graph content and hands-on components [Data: Reports].

---

**Summary**

- For a knowledge graph enthusiast, the above agenda ensures deep dives into technical, practical, and security aspects of knowledge graphs, led by top instructors in the field [Data: Reports].

---

In [None]:
t2c_retriever = Text2CypherRetriever(
    llm=neo4j_llm,
    neo4j_schema=get_schema(driver),
    driver=driver,
    custom_prompt=prompts.custom_text2cypher_prompt,
    examples=examples
)

response = t2c_retriever.search(query_text="when is Arun delivering a session")

In [None]:
response

RetrieverResult(items=[RetrieverResultItem(content="<Record session_title='Agentic Knowledge Augmented Generation: The Next Leap After RAG' start_time='12:00' date='2025-08-20' venue='ELON'>", metadata=None)], metadata={'cypher': "MATCH (p:Person)-[:PRESENTS]->(s:Session)\nWHERE toLower(p.name) CONTAINS toLower('Arun')\nRETURN DISTINCT s.title AS session_title, s.start_time AS start_time, s.date AS date, s.venue AS venue", '__retriever': 'Text2CypherRetriever'})