# Agentic RAG

An agentic RAG solution combines two ideas:

- **Agentic:** The system acts autonomously, making decisions and taking actions based on the context of the interaction.
- **RAG (Retrieval-Augmented Generation):** Combines information retrieved from a knowledge base with the LLM’s generative capabilities.
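
Taken together, the two ideas form a small control loop: retrieve context, let the model judge whether that context is sufficient, and switch to another source when it is not. The sketch below illustrates that loop only. The four callables (`retrieve`, `can_answer`, `web_search`, `generate`) and the toy knowledge base are hypothetical stand-ins, not real retrievers or LLM calls.

```python
from typing import Callable, List


def agentic_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    can_answer: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, str], str],
) -> str:
    """Answer from the knowledge base when possible, else fall back to web search."""
    context = "\n\n".join(retrieve(question))
    if not can_answer(question, context):
        # The agent judges the local context insufficient and switches source.
        context = "\n\n".join(web_search(question))
    return generate(question, context)


# Toy stand-ins (assumptions for illustration only).
kb = {"llama": "Llama 3 is a family of open-weight LLMs from Meta."}


def retrieve(q: str) -> List[str]:
    return [text for key, text in kb.items() if key in q.lower()]


def can_answer(q: str, ctx: str) -> bool:
    return bool(ctx)


def web_search(q: str) -> List[str]:
    return [f"(web result for: {q})"]


def generate(q: str, ctx: str) -> str:
    return f"Based on: {ctx}"


print(agentic_rag("What is Llama 3?", retrieve, can_answer, web_search, generate))
print(agentic_rag("What is Elastic Stack?", retrieve, can_answer, web_search, generate))
```

The first question is answered from the toy knowledge base; the second finds nothing locally and takes the web-search branch.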

## System architecture


![agentic rag](images/agentic_rag_0.png)
![agentic rag](images/agentic_rag_1.png)
![agentic rag](images/agentic_rag_2.png)
![agentic rag](images/agentic_rag_3.jpg)

## LangChain Code

![](images/langchain_agentic_rag.jpeg)

## Read and load PDF files

In [1]:
import os
import PyPDF2
from tqdm.notebook import tqdm
import re
import json

from dotenv import load_dotenv
from elasticsearch import Elasticsearch
from langchain_openai import AzureOpenAIEmbeddings
from openai import AzureOpenAI

load_dotenv()

ES_USER= os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")

MODEL_NAME = os.getenv("MODEL_NAME")
AZURE_EMBEDDING_ENDPOINT = os.getenv("AZURE_EMBEDDING_ENDPOINT")
AZURE_EMBEDDING_API_KEY = os.getenv("AZURE_EMBEDDING_API_KEY")
AZURE_EMBEDDING_API_VERSION = os.getenv("AZURE_EMBEDDING_API_VERSION")

AZURE_API_KEY = os.getenv("AZURE_API_KEY")
AZURE_EDNPOINT = os.getenv("AZURE_EDNPOINT")
AZURE_API_VERSION = os.getenv("AZURE_API_VERSION")
AZURE_DEPLOYMENT_ID = os.getenv("AZURE_DEPLOYMENT_ID")

TAVILIO_API_KEY = os.getenv("TAVILIO_API_KEY")

url = f"https://{ES_USER}:{ES_PASSWORD}@{ES_ENDPOINT}:9200"
es = Elasticsearch(url, ca_certs = "./http_ca.crt", verify_certs = True)

print(es.info())

# all_documents = read_pdfs_from_folder(folder_path)


{'name': 'liuxgn.local', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'ofa0pMjxSzyxatVvJBMEvQ', 'version': {'number': '9.3.0', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': '17b451d8979a29e31935fe1eb901310350b30e62', 'build_date': '2026-01-29T10:05:46.708397977Z', 'build_snapshot': False, 'lucene_version': '10.3.2', 'minimum_wire_compatibility_version': '8.19.0', 'minimum_index_compatibility_version': '8.0.0'}, 'tagline': 'You Know, for Search'}


In [2]:
elastic_index_name = "agent_rag_index"

embeddings = AzureOpenAIEmbeddings(
    model=MODEL_NAME,
    azure_endpoint=AZURE_EMBEDDING_ENDPOINT, 
    api_key= AZURE_EMBEDDING_API_KEY,
    openai_api_version=AZURE_EMBEDDING_API_VERSION
)

chat = AzureOpenAI(
  api_key = AZURE_API_KEY,  
  api_version = AZURE_API_VERSION,
  azure_endpoint = AZURE_EDNPOINT
)


def read_pdfs_from_folder(folder_path):
    pdf_list = []
    
    # Loop through all files in the specified folder
    for filename in tqdm(os.listdir(folder_path)):
        if filename.endswith('.pdf'):
            file_path = os.path.join(folder_path, filename)
            
            # Open each PDF file
            with open(file_path, 'rb') as file:
                reader = PyPDF2.PdfReader(file)
                content = ""
                
                # Read each page's content and append it to a string
                for page in reader.pages:
                    # Guard against pages that yield no extractable text
                    content += page.extract_text() or ""
                
                # Add the PDF content to the list
                pdf_list.append({"content": content, "filename": filename})
    
    return pdf_list

folder_path = "./rag_data"

In [3]:
embeddings

AzureOpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x10eea9110>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x10ef39c10>, model='text-embedding-ada-002', dimensions=None, deployment=None, openai_api_version='2023-05-15', openai_api_base=None, openai_api_type='azure', openai_proxy='', embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=2048, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True, azure_endpoint='https://ada-embeddings1.openai.azure.com/', azure_ad_token=None, azure_ad_token_provider=None, validate_base_url=True)

## Read Web URLs

In [4]:
from typing import Optional
import requests

def fetch_url_content(url: str) -> Optional[str]:
    """
    Fetches content from a URL by performing an HTTP GET request.

    Parameters:
        url (str): The endpoint or URL to fetch content from.

    Returns:
        Optional[str]: The content retrieved from the URL as a string,
                       or None if the request fails.
    """
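    # Jina Reader prefix: returns the target page as LLM-friendly text
    prefix_url: str = "https://r.jina.ai/"
    full_url: str = prefix_url + url  # Concatenate the prefix URL with the provided URL
    
    try:
        response = requests.get(full_url, timeout=30)  # GET with a timeout so a stalled request cannot hang the notebook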
        if response.status_code == 200:
            return response.content.decode('utf-8')  # Return the content of the response as a string
        else:
            print(f"Error: HTTP GET request failed with status code {response.status_code}")
            return None
    except requests.RequestException as e:
        print(f"Error: Failed to fetch URL {full_url}. Exception: {e}")
        return None

In [5]:
# Replace this with the specific endpoint or URL you want to fetch
url: str = "https://em360tech.com/tech-article/what-is-llama-3"  
content: Optional[str] = fetch_url_content(url)


if content is not None:
    print("Content retrieved successfully:")
else:
    print("Failed to retrieve content from the specified URL.")

Content retrieved successfully:


In [6]:
content



## Split the texts

In [7]:
from langchain_text_splitters import MarkdownHeaderTextSplitter
from langchain_text_splitters import RecursiveCharacterTextSplitter
from litellm import completion

In [8]:
token_size = 150
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
            model_name="gpt-4",
            chunk_size=token_size,
            chunk_overlap=0,
        )
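
To see what `chunk_size` and `chunk_overlap` control, here is a simplified sketch that windows over whitespace-delimited words instead of real tiktoken tokens (an assumption made to keep it dependency-free). `RecursiveCharacterTextSplitter` is more sophisticated, since it prefers to break on separators such as paragraphs and sentences, so its output will differ. The `chunk_words` helper is hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into windows of at most chunk_size words sharing chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "one two three four five six seven"
print(chunk_words(sample, chunk_size=3))
# ['one two three', 'four five six', 'seven']
print(chunk_words(sample, chunk_size=3, chunk_overlap=1))
# ['one two three', 'three four five', 'five six seven']
```

With `chunk_overlap=0`, as in the cell above, no text is repeated across chunks. A small overlap repeats boundary text so that a sentence cut at a chunk edge still appears whole in one of the chunks.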

In [9]:
def clean_text(text):
    # Remove all newline characters
    text = text.replace('\n', ' ').replace('\r', ' ')
    
    # Replace multiple spaces with a single space
    text = re.sub(r'\s+', ' ', text)
    
    # Strip leading and trailing spaces
    text = text.strip()
    
    return text
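
A quick check of what this normalization does. The sketch below uses an equivalent one-line form: `\s` already matches `\n` and `\r`, so the regex substitution alone collapses newlines as well as repeated spaces. The sample string is made up for illustration.

```python
import re


def clean_text(text: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()


raw = "  Llama 3\n\nis   a model.\r\n"
print(clean_text(raw))
# Llama 3 is a model.
```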

In [10]:
text_chunks = text_splitter.split_text(content)
print(f"Total chunks: {len(text_chunks)}")

Total chunks: 40


In [11]:
text_chunks[0]



In [12]:
def get_embeddings(texts, model="text-embedding-3-small", api_key="your-api-key"):
    # Define the API URL
    url = "https://api.openai.com/v1/embeddings"
    
    # Prepare headers with the API key
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    
    # Prepare the request body
    data = {
        "input": texts,
        "model": model
    }
    
    # Send a POST request to the OpenAI API
    response = requests.post(url, headers=headers, data=json.dumps(data))
    
    # Check if the request was successful
    if response.status_code == 200:
        # Return the embeddings from the response
        return response.json()["data"]
    else:
        # Print error if the request fails
        print(f"Error {response.status_code}: {response.text}")
        return None

In [13]:
from langchain_elasticsearch import ElasticsearchStore

def ingest_data_into_es(texts):
    if not es.indices.exists(index=elastic_index_name):
        print("The index does not exist, going to generate embeddings")   
        docsearch = ElasticsearchStore.from_texts(
            texts,
            embedding=embeddings,
            es_url=url,
            es_connection=es,
            index_name=elastic_index_name,
            es_user=ES_USER,
            es_password=ES_PASSWORD,
        )
    else: 
        print("The index already existed")
    
        docsearch = ElasticsearchStore(
            es_connection=es,
            embedding=embeddings,
            es_url = url, 
            index_name = elastic_index_name, 
            es_user = ES_USER,
            es_password = ES_PASSWORD    
        )

    return docsearch
    

In [14]:
docsearch = ingest_data_into_es(text_chunks)

The index already existed


# Search for questions

In [15]:
def search(query):
    # Return the stored chunks most similar to the query text
    docs = docsearch.similarity_search(query)
    # similarity_threshold = 0.88
    # docs = docsearch.similarity_search_with_relevance_scores(query, score_threshold=similarity_threshold)
    return docs
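
Conceptually, a vector similarity search ranks stored chunks by how close their embeddings are to the query embedding, and a score threshold (like the commented-out `similarity_threshold`) discards weak matches. The sketch below illustrates this with cosine similarity over hand-made 2-D vectors. The vectors and the `top_k` helper are hypothetical, and Elasticsearch's actual indexing and scoring are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int = 4,
    score_threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (text, score) pairs scoring at least score_threshold, best first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in indexed]
    kept = [(text, score) for text, score in scored if score >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:k]


# Hand-made toy "embeddings" (assumptions for illustration only).
index = [
    ("about llamas", [1.0, 0.0]),
    ("about search", [0.0, 1.0]),
    ("llamas and search", [1.0, 1.0]),
]
hits = top_k([1.0, 0.1], index, k=2, score_threshold=0.5)
print([text for text, _ in hits])
# ['about llamas', 'llamas and search']
```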

In [17]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Prompts

1. The first prompt checks whether the *retrieved context* can answer the user's question.
2. The second prompt takes the context and the question and generates the response.
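
Because the first prompt's reply drives a branch, it can help to parse it defensively: a model may return `1` wrapped in whitespace, quotes, or a trailing period. A small sketch of such a parser follows. The `parse_decision` helper is hypothetical, and it treats anything other than a clear `1` as "cannot answer".

```python
def parse_decision(raw: str) -> bool:
    """Return True only when the model's reply is a clear '1'."""
    # Drop surrounding whitespace, then surrounding quotes, backticks, and periods.
    cleaned = raw.strip().strip("\"'`.").strip()
    return cleaned == "1"


print(parse_decision(" 1\n"))   # True
print(parse_decision("'1'."))   # True
print(parse_decision("0"))      # False
print(parse_decision("Yes"))    # False
```

Defaulting to `False` on unexpected output means an ambiguous reply triggers the fallback path rather than an answer built on context the model never confirmed.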

## First Prompt

In [18]:
decision_system_prompt = """Your job is to decide whether a given question can be answered with a given context.
If the context can answer the question, return 1.
If not, return 0.

Do not return anything else except for 0 or 1.

Context: {context}
"""

user_prompt = """
Question: {question}

Answer:"""

## Second Prompt

In [19]:
system_prompt = """You are an expert at answering questions. Answer the question using only the given context.
If the question cannot be answered using the context, simply say I don't know. Do not make things up.
Your answer MUST be informative, concise, and action-driven. Your response must be in Markdown.

Context: {context}
"""

user_prompt = """
Question: {question}

Answer:"""

## Ask questions

In [20]:
def azure_openai_completion(question, context, is_system_prompt=False):
    # is_system_prompt=True selects the answer-generation prompt;
    # the default (False) selects the 0/1 decision prompt.
    prompt = system_prompt if is_system_prompt else decision_system_prompt
    summary = chat.chat.completions.create(
        model=AZURE_DEPLOYMENT_ID,
        messages=[
            {"role": "system", "content": prompt.format(context=context)},
            {"role": "user", "content": user_prompt.format(question=question)},
        ],
    )

    # print(summary)
    return summary

In [21]:
question = "what is openai o1 model"
results = search(question)
context = format_docs(results)
response = azure_openai_completion(question, context)

has_answer = response.choices[0].message.content.strip()  # tolerate stray whitespace around the 0/1 reply
has_answer

'0'

In [22]:
# question = "what is Llama 3?"
# question = "what is openai o1 model"
# question = "中国最长的河流是那条河？"  # "Which is the longest river in China?"
question = "What is the latest version of Elastic Stack？"
results = search(question)
context = format_docs(results)
response = azure_openai_completion(question, context)

has_answer = response.choices[0].message.content.strip()  # tolerate stray whitespace around the 0/1 reply
has_answer

'0'

In [23]:
from IPython.display import Markdown, display
from duckduckgo_search import DDGS

import requests

def tavily_search(query, max_results=5):
    """
    Call Tavily Search API and return search results.

    The API key is read from the module-level TAVILIO_API_KEY variable.

    :param query: search query string
    :param max_results: number of results to return
    :return: response JSON (dict)
    """
    url = "https://api.tavily.com/search"

    payload = {
        "api_key": TAVILIO_API_KEY,
        "query": query,
        "max_results": max_results
    }

    # Time out rather than hang if the API does not respond
    response = requests.post(url, json=payload, timeout=30)
    response.raise_for_status()

    return response.json()


# Check to see if retrieved context can answer the question or not

In [27]:
def format_search_results(tavily_response):
    """
    Extract and format content from Tavily search response
    """
    docs = tavily_response.get("results", [])
    return "\n\n".join(doc.get("content", "") for doc in docs)
    

print(f"Question: {question}")
if has_answer == '1':
    print("Context can answer the question")
    response = azure_openai_completion(question, context, True)
    print("Answer:")
    display(Markdown(response.choices[0].message.content))
else:
    print("Context is NOT relevant. Searching online...")
    results = tavily_search(question)
    print(results)
    context = format_search_results(results)
    print("Found online sources. Generating the response...")
    response = azure_openai_completion(question, context, True)
    print("Answer:")
    display(Markdown(response.choices[0].message.content))
    

Question: What is the latest version of Elastic Stack？
Context is NOT relevant. Searching online...
{'query': 'What is the latest version of Elastic Stack？', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://www.elastic.co/blog/elastic-stack-9-2-5-released', 'title': 'Elastic Stack 9.2.5 released', 'content': 'Version 9.2.5 of the Elastic Stack was released today. We recommend you upgrade to this latest version. We recommend 9.2.5 over the previous', 'score': 0.88420963, 'raw_content': None}, {'url': 'https://en.wikipedia.org/wiki/Elasticsearch', 'title': 'Elasticsearch', 'content': 'In January 2021, Elastic announced that starting with version 7.11, they ... "Elastic brings order to its product line with Elastic Stack". TechCrunch', 'score': 0.71995544, 'raw_content': None}, {'url': 'https://logz.io/learn/complete-guide-elk-stack/', 'title': 'The Complete Guide to the ELK Stack', 'content': 'In early 2021, Elastic announced a bombshell in the open

The latest version of Elastic Stack mentioned in the context is version 9.3.0, with a release date of February 03, 2026.
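
Each Tavily result in the output above carries a `title` and a `url` alongside its `content`. As a possible variation, the context could keep those fields so the generated answer can point back to its sources. The sketch below is one way to do that under stated assumptions: the `format_with_sources` helper and the mock response are hypothetical, and only the `results`, `title`, `url`, and `content` keys seen in the output are relied on.

```python
def format_with_sources(tavily_response: dict) -> str:
    """Join result snippets, appending each one's title and URL as a source line."""
    blocks = []
    for doc in tavily_response.get("results", []):
        content = doc.get("content", "")
        if not content:
            continue  # skip results that carry no usable text
        title = doc.get("title", "untitled")
        url = doc.get("url", "no url")
        blocks.append(f"{content}\n(Source: {title}, {url})")
    return "\n\n".join(blocks)


# Mock response shaped like the Tavily output above (made-up values).
mock_response = {
    "results": [
        {"url": "https://example.com/a", "title": "Page A", "content": "First snippet."},
        {"url": "https://example.com/b", "title": "Page B", "content": ""},
    ]
}
print(format_with_sources(mock_response))
# First snippet.
# (Source: Page A, https://example.com/a)
```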

In [None]:
# !pip install -U duckduckgo_search

In [38]:
print(results)

[{'title': '中华人民共和国 - 知乎', 'href': 'https://www.zhihu.com/topic/19586942', 'body': '文化大革命之后开始改革开放，逐步确立了中国特色社会主义制度。 中华人民共和国陆地面积约960万平方公里，大陆海岸线1.8万多千米，岛屿岸线1.4万多千米，内海和边海的水域面积约470多万平方千 …'}, {'title': '2025 胡润中国 500 强发布，台积电、腾讯、字节位列前三 ...', 'href': 'https://www.zhihu.com/question/2002714528765457419', 'body': '2026年2月5日 · “如果你想了解中国民营经济的发展，胡润中国500强是一个很好的切入点。 这些企业是中国经济的支柱。 ”胡润集团董事长兼首席调研官胡润介绍，这500家企业吸纳了约1300万名员工， …'}, {'title': '中国的三个缩写 PRC CHN CN，各用在什么场合或领域？', 'href': 'https://www.zhihu.com/question/22379997/answers/updated', 'body': "2013年12月27日 · PRC, ZRG是国家字母 缩写： PRC是中国 英文 全称the People's Republic of China的缩写，主要用于外交等场合，强调一个中国原则； ZRG是 汉语拼音 全称Zhonghua Renmin Gongheguo …"}, {'title': '如何看待权威第三方公布的2026年1月华为中国手机市场份额 ...', 'href': 'https://www.zhihu.com/question/2001595690313348791', 'body': '如何看待权威第三方公布的2026年1月华为中国手机市场份额第一？ 2月1日，根据权威第三方销量数据（Sell out）, 华为领跑新年手机市场，市场份额提升至18.6% ，华为再夺1月中国手机市场份额第一。 [ …'}, {'title': '荷兰安世停止向中国工厂供应晶圆，安世中国称「已建立充足 ...', 'href': 'https://www.zhihu.com/question/196828013522