<a href="https://colab.research.google.com/github/genaiconference/Agentic_RAG_Workshop/blob/main/05_Agentic_RAG_Components.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Traditional RAG

In [1]:
!git clone https://github.com/genaiconference/Agentic_RAG_Workshop.git

Cloning into 'Agentic_RAG_Workshop'...
remote: Enumerating objects: 83, done.
remote: Counting objects: 100% (83/83), done.
remote: Compressing objects: 100% (80/80), done.
remote: Total 83 (delta 36), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (83/83), 2.25 MiB | 7.25 MiB/s, done.
Resolving deltas: 100% (36/36), done.


## Setup and Installations
Install necessary libraries for document processing, data handling, and interacting with Azure Document Intelligence and OpenAI.

In [1]:
!pip install -r /content/Agentic_RAG_Workshop/requirements.txt

Collecting langchain_tavily (from -r /content/Agentic_RAG_Workshop/requirements.txt (line 14))
  Downloading langchain_tavily-0.2.11-py3-none-any.whl.metadata (22 kB)
Downloading langchain_tavily-0.2.11-py3-none-any.whl (26 kB)
Installing collected packages: langchain_tavily
Successfully installed langchain_tavily-0.2.11


## Load Environment Variables and Initialize Clients
Load the API keys and endpoint settings from the `.env` file, then initialize the OpenAI chat model and the OpenAI embedding model.

In [None]:
import os

os.chdir("/content/Agentic_RAG_Workshop/")

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

load_dotenv()

llm = ChatOpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4.1",
    temperature=0,
)

embeddings = OpenAIEmbeddings(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="text-embedding-3-small"
)
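
A missing key usually surfaces only later as an authentication error, so a quick check right after `load_dotenv()` can save time. The sketch below is a minimal, assumed helper and not part of the workshop code: `missing_env_vars` is a hypothetical name, and `TAVILY_API_KEY` is an assumption about what the web search tool expects, while the other three names appear elsewhere in this notebook.

```python
import os

# Names read elsewhere in this notebook; TAVILY_API_KEY is an assumption.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_KEY",
    "TAVILY_API_KEY",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If you only use the Chroma vector store, the two Azure entries can be dropped from the list.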

## Load Parent Docs

In [None]:
import pickle

with open("parent_docs_and_ids.pkl", "rb") as f:
    parent_data = pickle.load(f)

parent_docs = parent_data["parent_docs"]
doc_ids = parent_data["doc_ids"]

## Create Child Docs, Summaries, and Hypothetical Questions

In [None]:
from multivector_utils import create_child_documents, generate_summaries, generate_hypothetical_questions

id_key = "doc_id"

child_docs = create_child_documents(parent_docs, doc_ids, id_key)
summaries = generate_summaries(parent_docs, llm, id_key, doc_ids)
questions = generate_hypothetical_questions(parent_docs, id_key, doc_ids)

## Define VectorStore

**Azure AI Search**

In [None]:
from langchain_community.vectorstores.azuresearch import AzureSearch

index_name: str = "agentic-rag-workshop"
search_type: str = "semantic_hybrid"

vectorstore: AzureSearch = AzureSearch(
    azure_search_endpoint=os.getenv("AZURE_SEARCH_ENDPOINT"),
    azure_search_key=os.getenv("AZURE_SEARCH_KEY"),
    index_name=index_name,
    search_type=search_type,
    semantic_configuration_name="agentic-rag-semantic-config",
    embedding_function=embeddings.embed_query,
)

**Chroma** (local alternative; running this cell replaces the Azure AI Search `vectorstore` defined above)

In [None]:
from langchain_community.vectorstores import Chroma

vectorstore = Chroma(
    collection_name="full_documents", embedding_function=embeddings
)

# Add child_docs, summaries, and questions to the vectorstore
vectorstore.add_documents(documents=child_docs)
vectorstore.add_documents(documents=summaries)
vectorstore.add_documents(documents=questions)


['1ec7aed4-f414-4a39-a606-79ff30045f95',
 '7a3f5746-94d0-4291-94d9-dbd6be3b03b1',
 '372792d4-c5cf-4ef9-b99c-86e7e48b427b',
 '0b238e73-15f4-43bb-9281-0af337aca364',
 '2f1e611d-01cf-4510-9fb6-9c7299c9068d',
 '5edf852f-59ff-45a7-9376-2a6a96b85a72',
 'e150870b-88df-40e8-b644-dfa663852754',
 '96df5ac3-36aa-4510-8870-0e632e281901',
 'fc8cc7f3-b033-41f8-834c-9d312ea9b32d',
 'a78a4ce1-9f5f-464b-b6de-1b21941f349e',
 'ac79bcac-0a55-4b9c-8ff7-5ce425551a23',
 '7a8350d5-4ff6-4496-a537-c26b597059f7',
 'd5c3984d-96ce-4e7c-a735-b01ba901df91',
 '7998413f-0e2d-4994-8850-0c262862f95e',
 'ceef0a7e-fe58-40fb-948c-430ec0e4ad55',
 'ecb2871a-6886-4c6d-ba5d-ad7bbbef0287',
 'ede8f6bb-2db7-4428-9b29-72ce41e9d451',
 'b25322b8-42ab-4579-b686-f1e9d719d915',
 '103426f9-4fa1-4cf2-8ec9-d2cd54956f1c',
 '08faf9b4-2cb0-46fb-81b6-27b1abf83973',
 'c90687d3-5a32-44c5-9d17-7587256f7e99',
 '8a7d7a15-a655-4b79-b474-43c2379f8f92',
 'b3256393-42a5-48ff-8f3b-9e65d538c248',
 '9069548f-d3a8-41c8-b9fc-b19edc1bf98b',
 '56a695c5-66e2-

In [None]:
vectorstore.similarity_search("maternity_benefits")

[Document(metadata={'source_type': 'summary', 'doc_id': '25'}, page_content='**Summary of Maternity Benefits:**\n\n- **Eligibility:** Maternity benefits are available only for in-patient hospitalizations in India. Employees with more than two living children are not eligible. Expenses for voluntary medical termination within the first 12 weeks, infertility treatment, and sterilization are not covered.\n- **Coverage Details:**\n  - Maximum benefit: INR 75,000 for normal delivery; INR 1,00,000 for C-section (within the sum insured limit).\n  - 9-month waiting period is waived.\n  - Pre- and post-natal expenses are covered up to INR 5,000, in addition to the maternity limit.\n  - Newborns are covered from day 1, provided HR is informed within 15 days of birth.\n- **Important:** Claims must be submitted within 30 days; inform HR immediately about newborn coverage to ensure eligibility and avoid claim denial.'),
 Document(metadata={'source_type': 'Children', 'parent_id': '25', 'Header_2': '

## Create MultiVectorRetriever

In [None]:
from multivector_utils import create_MVR

Leave_Policy_retriever = create_MVR(parent_docs, doc_ids, vectorstore, "source eq 'Leave Policy'")

Insurance_Policy_retriever = create_MVR(parent_docs, doc_ids, vectorstore, "source eq 'Insurance Policy'")

[Document(metadata={'Header_1': 'Employee Benefits Manual 2023-24 Novartis Group', 'Header_2': 'Your Plan Details Maternity Benefits', 'page_number': 11, 'custom_metadata': 'Your Plan Details Maternity Benefits', 'source': 'Insurance Policy'}, page_content='##Your Plan Details Maternity Benefits\n\n · Maternity benefits are admissible only if the expenses are incurred in Hospital / Nursing Home as in-patients in India.  \n· Those Insured Persons who already have two or more living children will not be eligible for this benefit.  \n· Expenses incurred in connection with voluntary medical termination of pregnancy during the first 12 weeks from the date\nof conception are not covered. Infertility Treatment and sterilization are excluded from the policy.  \n### The maternity benefit under your Group Medical Plan  \n<table>\n<tr>\n<td>Maximum Benefit</td>\n<td>INR 75,000 for Normal &amp; INR 1 Lac (New) for C - Section within Sum Insured Limit</td>\n</tr>\n<tr>\n<td>9-months waiting period<
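
The idea behind a multi-vector retriever can be sketched in plain Python: the search runs over small derived documents (chunks, summaries, hypothetical questions), and each hit is mapped back to its full parent document through a shared ID. This is a minimal illustration, not the implementation of `create_MVR`; the `parents_for_hits` helper, the `doc_id` metadata key, and the sample data are all hypothetical.

```python
def parents_for_hits(hits, parent_store, id_key="doc_id"):
    """Map search hits to their parent documents, de-duplicated in hit order."""
    seen = set()
    parents = []
    for hit in hits:
        parent_id = hit["metadata"].get(id_key)
        # Skip hits without a known parent and parents already returned.
        if parent_id in parent_store and parent_id not in seen:
            seen.add(parent_id)
            parents.append(parent_store[parent_id])
    return parents


# Hypothetical parent store and search hits, for illustration only.
parent_store = {
    "25": "Full maternity benefits section",
    "7": "Full leave policy section",
}
hits = [
    {"page_content": "Summary of maternity benefits", "metadata": {"doc_id": "25"}},
    {"page_content": "What is the C-section limit?", "metadata": {"doc_id": "25"}},
    {"page_content": "Chunk about annual leave", "metadata": {"doc_id": "7"}},
]
print(parents_for_hits(hits, parent_store))
```

Two of the three hits point at the same parent, so only two parent documents come back. This is why the summary and the child chunk in the similarity search output above both resolve to the same full section.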

# Agentic RAG Components

## 1. Tool Creation
Create tools that let the agent retrieve content from the private and public data sources listed below.

**List of Private Data Sources**
- Leave Policy Documents
- Insurance Policy Documents

**Public Data Sources**
- Tavily Web Search Tool

#### Create Retrievers as Tools

In [None]:
from langchain.tools.retriever import create_retriever_tool

Leave_Policy_tool = create_retriever_tool(retriever=Leave_Policy_retriever,
                            name = 'Leave_Policy_Retriever',
                            description="Use this tool to answer questions related to Leave Policies of a company.")

Insurance_Policy_tool = create_retriever_tool(retriever=Insurance_Policy_retriever,
                            name = 'Insurance_Policy_Retriever',
                            description="Use this tool to answer questions related to Health Insurance Policies of a company.")

#### Create Web Search Tool

In [None]:
from langchain_tavily import TavilySearch

# Initialize Tavily Search Tool
tavily_search_tool = TavilySearch(
    max_results=5,
    topic="general",
)

## 2. Defining the Prompt Template
The prompt passed to `create_react_agent` must include the following input keys:

- tools: contains descriptions and arguments for each tool.
- tool_names: contains all tool names.
- agent_scratchpad: contains previous agent actions and tool outputs as a string.

Here’s an example:

In [None]:
from langchain.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    HumanMessagePromptTemplate,
    AIMessagePromptTemplate,
    PromptTemplate,
)

template = '''Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}'''

prompt = ChatPromptTemplate.from_messages([
        ("system", template),
        MessagesPlaceholder(variable_name="conversation_history", optional=True),
        HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}')),
        AIMessagePromptTemplate(prompt=PromptTemplate(input_variables=['agent_scratchpad'], template='{agent_scratchpad}')),
    ])
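
To make the three input keys concrete, the sketch below fills a shortened version of the template with plain `str.format`. It assumes `{tools}` becomes one line per tool with its name and description, and `{tool_names}` a comma-separated list. The exact rendering LangChain uses may differ, and the `render_tools` helper and the shortened template are illustrative only.

```python
TEMPLATE = """You have access to the following tools:

{tools}

Action: the action to take, should be one of [{tool_names}]

Question: {input}
Thought:{agent_scratchpad}"""

# Names and descriptions copied from the retriever tools defined earlier.
TOOLS = [
    ("Leave_Policy_Retriever",
     "Use this tool to answer questions related to Leave Policies of a company."),
    ("Insurance_Policy_Retriever",
     "Use this tool to answer questions related to Health Insurance Policies of a company."),
]


def render_tools(tools):
    """Return (tools_text, tool_names_text) for the two prompt placeholders."""
    tools_text = "\n".join(f"{name}: {description}" for name, description in tools)
    tool_names_text = ", ".join(name for name, _ in tools)
    return tools_text, tool_names_text


tools_text, tool_names_text = render_tools(TOOLS)
filled = TEMPLATE.format(
    tools=tools_text,
    tool_names=tool_names_text,
    input="How many days of maternity leave are allowed?",
    agent_scratchpad="",  # empty on the first step; grows with each Action/Observation
)
print(filled)
```

On later steps the scratchpad holds the earlier Thought/Action/Observation text, so the model continues from where it left off.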

## 3. Initializing Chat History
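
The prompt in the previous section declares an optional `conversation_history` placeholder. One way to populate it, sketched here under assumptions, is to keep a bounded list of `(role, content)` tuples and pass it under that key when invoking the executor. The `ChatHistory` class and `max_turns` are hypothetical and not from the workshop; LangChain also ships its own chat-history classes, which may be a better fit.

```python
class ChatHistory:
    """Keep the most recent turns of a conversation as (role, content) tuples."""

    def __init__(self, max_turns=5):
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.max_turns = max_turns
        self.messages = []

    def add_turn(self, question, answer):
        """Record one human/AI exchange and drop turns older than max_turns."""
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))
        self.messages = self.messages[-2 * self.max_turns:]


history = ChatHistory(max_turns=2)
history.add_turn("What is the maternity benefit limit?", "INR 75,000 for normal delivery.")
history.add_turn("And for a C-section?", "INR 1,00,000.")
history.add_turn("Are newborns covered?", "Yes, from day 1 if HR is informed within 15 days.")
print(history.messages)  # only the last two turns remain

# Assumed usage once the agent executor exists:
# agent_executor.invoke({"input": "...", "conversation_history": history.messages})
```

Capping the history keeps the prompt from growing without bound across a long session.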



## 4. Initializing the Agent and Agent Executor
`create_react_agent` builds a ReAct-style agent from the LLM, the tools, and the prompt. `AgentExecutor` then runs the agent's Thought/Action/Observation loop, calling tools as requested, until the agent produces a final answer or reaches `max_iterations`.

In [None]:
from langchain.agents import AgentExecutor, create_react_agent

tools = [Leave_Policy_tool, Insurance_Policy_tool]

# Create the agent
agent = create_react_agent(llm, tools, prompt)

# Create an executor to run the agent
agent_executor = AgentExecutor(agent=agent,
                               tools=tools,
                               verbose=True,
                               stream_runnable=True,
                               handle_parsing_errors=True,
                               max_iterations=5,
                               return_intermediate_steps=True,
                              )
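
Conceptually, the executor repeats one loop: ask the model for the next step, run the requested tool, append the observation to the scratchpad, and stop on a final answer or after `max_iterations`. The sketch below imitates that loop with a scripted stand-in for the LLM and a dictionary of plain functions as tools. It is a simplified illustration, not LangChain's implementation; `run_react_loop`, `scripted_llm`, `toy_tools`, and the parsing regex are hypothetical.

```python
import re

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def run_react_loop(llm, tools, question, max_iterations=5):
    """Run a Thought/Action/Observation loop until a final answer or the step limit."""
    scratchpad = ""
    for _ in range(max_iterations):
        output = llm(question, scratchpad)
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            # Loosely mirrors handle_parsing_errors=True: report the problem and retry.
            scratchpad += output + "\nObservation: Invalid format, try again.\nThought:"
            continue
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"{tool_name} is not a valid tool."
        scratchpad += f"{output}\nObservation: {observation}\nThought:"
    return "Agent stopped due to iteration limit."


def scripted_llm(question, scratchpad):
    """Stand-in for a real model: call a tool first, then answer from the observation."""
    if "Observation:" not in scratchpad:
        return " I should look this up.\nAction: Leave_Policy_Retriever\nAction Input: maternity leave"
    return " I now know the final answer\nFinal Answer: See the retrieved leave policy text."


toy_tools = {"Leave_Policy_Retriever": lambda query: f"(retrieved text for '{query}')"}
print(run_react_loop(scripted_llm, toy_tools, "How long is maternity leave?"))
```

The step limit matters because a model that keeps requesting tools without ever emitting a final answer would otherwise loop indefinitely.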