# Agentic RAG

An agent-based RAG solution combines two ideas:

- **Agentic:** The system acts autonomously, making decisions and taking actions based on the context of the interaction.
- **RAG (Retrieval-Augmented Generation):** The system combines information retrieval from a knowledge base with the LLM’s generative capabilities.
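
The two ideas can be sketched together as one small routing function. This is a minimal illustration, not the notebook's implementation: `answer_with_agentic_rag` and the four callables it receives are hypothetical names, and the stubs in the usage part stand in for a vector store, an LLM, and a web search.

```python
from typing import Callable, List


def answer_with_agentic_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    can_answer: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, str], str],
) -> str:
    """Route a question through retrieval, a relevance check, and generation."""
    # Step 1: retrieve candidate chunks from the knowledge base.
    context = "\n\n".join(retrieve(question))

    # Step 2 (the "agentic" decision): is the retrieved context sufficient?
    if not can_answer(question, context):
        # Step 3: fall back to another source, here a web search.
        context = "\n\n".join(web_search(question))

    # Step 4: generate the final answer from whichever context was chosen.
    return generate(question, context)


# Usage with stand-in callables (hypothetical stubs, no network access).
kb = {"llama": "Llama 3 is a large language model released by Meta."}
answer = answer_with_agentic_rag(
    "what is Llama 3?",
    retrieve=lambda q: [text for key, text in kb.items() if key in q.lower()],
    can_answer=lambda q, ctx: bool(ctx),
    web_search=lambda q: ["(web result)"],
    generate=lambda q, ctx: f"Based on: {ctx}",
)
print(answer)
```

Passing the steps in as callables keeps the decision logic separate from any particular vector store or model provider.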

## System architecture


![agentic rag](images/agentic_rag_0.png)
![agentic rag](images/agentic_rag_1.png)
![agentic rag](images/agentic_rag_2.png)
![agentic rag](images/agentic_rag_3.jpg)

## LangChain Code

![](images/langchain_agentic_rag.jpeg)

## Read and load PDF files

In [47]:
import os
import PyPDF2
from tqdm.notebook import tqdm
import re
import json

from dotenv import load_dotenv
from elasticsearch import Elasticsearch
from langchain_openai import AzureOpenAIEmbeddings
# from langchain.chat_models import AzureChatOpenAI
from openai import AzureOpenAI
from openai import OpenAI

load_dotenv()

ES_USER= os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")

MODEL_NAME = os.getenv("MODEL_NAME")
AZURE_EMBEDDING_ENDPOINT = os.getenv("AZURE_EMBEDDING_ENDPOINT")
AZURE_EMBEDDING_API_KEY = os.getenv("AZURE_EMBEDDING_API_KEY")
AZURE_EMBEDDING_API_VERSION = os.getenv("AZURE_EMBEDDING_API_VERSION")

AZURE_API_KEY = os.getenv("AZURE_API_KEY")
AZURE_EDNPOINT = os.getenv("AZURE_EDNPOINT")
AZURE_API_VERSION = os.getenv("AZURE_API_VERSION")
AZURE_DEPLOYMENT_ID = os.getenv("AZURE_DEPLOYMENT_ID")

OPENAI_API_KEY=os.getenv("OPENAI_API_KEY")
DEEPSEEK_URL=os.getenv("DEEPSEEK_URL")

url = f"https://{ES_USER}:{ES_PASSWORD}@{ES_ENDPOINT}:9200"
es = Elasticsearch(url, ca_certs = "./http_ca.crt", verify_certs = True)

print(es.info())

elastic_index_name = "agent_rag_index"

embeddings = AzureOpenAIEmbeddings(
    model=MODEL_NAME,
    azure_endpoint=AZURE_EMBEDDING_ENDPOINT, 
    api_key= AZURE_EMBEDDING_API_KEY,
    openai_api_version=AZURE_EMBEDDING_API_VERSION
)

openai_client = OpenAI(
    api_key=OPENAI_API_KEY,
    base_url=DEEPSEEK_URL
)

# chat = AzureOpenAI(
#   api_key = AZURE_API_KEY,  
#   api_version = AZURE_API_VERSION,
#   azure_endpoint = AZURE_EDNPOINT
# )


def read_pdfs_from_folder(folder_path):
    pdf_list = []
    
    # Loop through all files in the specified folder
    for filename in tqdm(os.listdir(folder_path)):
        if filename.endswith('.pdf'):
            file_path = os.path.join(folder_path, filename)
            
            # Open each PDF file
            with open(file_path, 'rb') as file:
                reader = PyPDF2.PdfReader(file)
                content = ""
                
                # Read each page's content and append it to a string
                for page in reader.pages:
                    # extract_text() may yield nothing for image-only pages
                    content += page.extract_text() or ""
                
                # Add the PDF content to the list
                pdf_list.append({"content": content, "filename": filename})
    
    return pdf_list

folder_path = "./rag_data"

# all_documents = read_pdfs_from_folder(folder_path)


{'name': 'liuxgn.local', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'h66jNmrlQoGZ0j1RdU0j8Q', 'version': {'number': '8.17.2', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': '747663ddda3421467150de0e4301e8d4bc636b0c', 'build_date': '2025-02-05T22:10:57.067596412Z', 'build_snapshot': False, 'lucene_version': '9.12.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}


In [48]:
embeddings

AzureOpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x11cb705d0>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x11cb5bdd0>, model='text-embedding-ada-002', dimensions=None, deployment=None, openai_api_version='2023-05-15', openai_api_base=None, openai_api_type='azure', openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=2048, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True, azure_endpoint='https://ada-embeddings1.openai.azure.com/', azure_ad_token=None, azure_ad_token_provider=None, azure_ad_async_token_provider=None, validate_base_url=True)

## Read Web URLs

In [49]:
from typing import Optional
import requests

def fetch_url_content(url: str) -> Optional[str]:
    """
    Fetches content from a URL by performing an HTTP GET request.

    Parameters:
        url (str): The endpoint or URL to fetch content from.

    Returns:
        Optional[str]: The content retrieved from the URL as a string,
                       or None if the request fails.
    """
    prefix_url: str = "https://r.jina.ai/"
    full_url: str = prefix_url + url  # Concatenate the prefix URL with the provided URL
    
    try:
        response = requests.get(full_url, timeout=30)  # Perform a GET request; the timeout avoids hanging indefinitely
        if response.status_code == 200:
            return response.content.decode('utf-8')  # Return the content of the response as a string
        else:
            print(f"Error: HTTP GET request failed with status code {response.status_code}")
            return None
    except requests.RequestException as e:
        print(f"Error: Failed to fetch URL {full_url}. Exception: {e}")
        return None
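
`fetch_url_content` sends every request through the `https://r.jina.ai/` prefix, which appears to be a reader service that returns a text rendering of the target page (the notebook does not say so explicitly, so treat that as an assumption). Because the prefix is simply concatenated with the input, a malformed URL only fails once the request is made. A small sketch of validating the input first, using a hypothetical helper `to_reader_url`:

```python
from urllib.parse import urlparse

READER_PREFIX = "https://r.jina.ai/"  # same prefix as in fetch_url_content


def to_reader_url(url: str) -> str:
    """Prefix an absolute http(s) URL with the reader endpoint."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got: {url!r}")
    return READER_PREFIX + url


print(to_reader_url("https://em360tech.com/tech-article/what-is-llama-3"))
```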

In [50]:
# Replace this with the specific endpoint or URL you want to fetch
url: str = "https://em360tech.com/tech-article/what-is-llama-3"  
content: Optional[str] = fetch_url_content(url)


if content is not None:
    print("Content retrieved successfully:")
else:
    print("Failed to retrieve content from the specified URL.")

Content retrieved successfully:


In [51]:
content



## Split the texts

In [52]:
from langchain_text_splitters import MarkdownHeaderTextSplitter
from langchain_text_splitters import RecursiveCharacterTextSplitter
from litellm import completion

In [53]:
token_size = 150
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-4",
    chunk_size=token_size,
    chunk_overlap=0,
)

In [54]:
def clean_text(text):
    # Remove all newline characters
    text = text.replace('\n', ' ').replace('\r', ' ')
    
    # Replace multiple spaces with a single space
    text = re.sub(r'\s+', ' ', text)
    
    # Strip leading and trailing spaces
    text = text.strip()
    
    return text

In [55]:
text_chunks = text_splitter.split_text(content)
print(f"Total chunks: {len(text_chunks)}")

Total chunks: 79
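
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window chunker. It is only an illustration: it counts whitespace-separated words rather than tiktoken tokens, and it does not reproduce how `RecursiveCharacterTextSplitter` splits on separators before merging pieces. The function name `chunk_tokens` is hypothetical.

```python
from typing import List


def chunk_tokens(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into windows of at most chunk_size whitespace tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()  # assumption: whitespace tokens, not tiktoken tokens
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


sample = "one two three four five"
print(chunk_tokens(sample, chunk_size=2))
# ['one two', 'three four', 'five']
print(chunk_tokens(sample, chunk_size=2, chunk_overlap=1))
# ['one two', 'two three', 'three four', 'four five']
```

With an overlap of 0, as in the notebook, chunks never share text, so a sentence that straddles a boundary is split between two chunks; a small overlap trades extra storage for less context loss at the boundaries.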


In [56]:
text_chunks[0]



In [57]:
def get_embeddings(texts, model="text-embedding-3-small", api_key="your-api-key"):
    # Define the API URL
    url = "https://api.openai.com/v1/embeddings"
    
    # Prepare headers with the API key
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    
    # Prepare the request body
    data = {
        "input": texts,
        "model": model
    }
    
    # Send a POST request to the OpenAI API
    response = requests.post(url, headers=headers, data=json.dumps(data))
    
    # Check if the request was successful
    if response.status_code == 200:
        # Return the embeddings from the response
        return response.json()["data"]
    else:
        # Print error if the request fails
        print(f"Error {response.status_code}: {response.text}")
        return None

In [58]:
from langchain_elasticsearch import ElasticsearchStore

def ingest_data_into_es(texts):
    # Reuse the existing `es` client. Do not pass es_url here: the global `url`
    # was reassigned to the article URL in an earlier cell.
    if not es.indices.exists(index=elastic_index_name):
        print("The index does not exist, going to generate embeddings")
        docsearch = ElasticsearchStore.from_texts(
            texts,
            embedding=embeddings,
            es_connection=es,
            index_name=elastic_index_name,
        )
    else:
        print("The index already existed")
        docsearch = ElasticsearchStore(
            es_connection=es,
            embedding=embeddings,
            index_name=elastic_index_name,
        )

    return docsearch
    

In [59]:
docsearch = ingest_data_into_es(text_chunks)

The index already existed


# Search for questions

In [60]:
def search(query):
    # Avoid naming the parameter `str`, which would shadow the built-in type
    docs = docsearch.similarity_search(query)
    return docs
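
Conceptually, a similarity search embeds the query and ranks the stored chunks by how close their vectors are to it. The sketch below shows that idea with plain cosine similarity over a toy in-memory index. It is not how Elasticsearch executes the query internally; `cosine_similarity`, `top_k`, and the two-dimensional vectors are illustrative assumptions, and `k=4` simply mirrors the four documents returned in the notebook's output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranked = top_k([1.0, 0.0], toy_index, k=2)
print([doc_id for doc_id, _ in ranked])
# ['a', 'c']
```

Note that a top-k search always returns k results, however weak the match, which is why the later relevance check is needed.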

In [61]:
question = "what is openai o1 model?"
docs = search(question)
print("Found docs: ", len(docs))
print(docs)

Found docs:  4
[Document(metadata={}, page_content="##### [What is GPT-4.5? A Guide to OpenAI's Latest Model](https://em360tech.com/tech-articles/what-gpt-45-guide-openais-latest-model)\n\nby Katie Baker\n\n2 min\n\n![Image 18: what is gpt-4.5](https://em360tech.com/sites/default/files/styles/content_card_secondary/public/2025-03/what-is-gpt-4.5.jpg.webp?itok=X-U2My4X)"), Document(metadata={}, page_content="[What is GPT-4.5? A Guide to OpenAI's Latest Model](https://em360tech.com/tech-articles/what-gpt-45-guide-openais-latest-model)\n\nTech Article\n\n#### What is GPT-4.5? A Guide to OpenAI's Latest Model\n\nby Katie Baker\n\n2 min\n\n[What is Alexa+? Amazon’s AI Upgrade](https://em360tech.com/tech-article/what-is-alexa%2B)\n\nTech Article\n\n#### What is Alexa+? Amazon’s AI Upgrade\n\nby Katie Baker\n\n3 min\n\n[Explore all](https://em360tech.com/ai-feed)"), Document(metadata={}, page_content='![Image 17: ema](https://em360tech.com/sites/default/files/styles/content_card_primary/publi

In [62]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

print(format_docs(docs))

##### [What is GPT-4.5? A Guide to OpenAI's Latest Model](https://em360tech.com/tech-articles/what-gpt-45-guide-openais-latest-model)

by Katie Baker

2 min

![Image 18: what is gpt-4.5](https://em360tech.com/sites/default/files/styles/content_card_secondary/public/2025-03/what-is-gpt-4.5.jpg.webp?itok=X-U2My4X)

[What is GPT-4.5? A Guide to OpenAI's Latest Model](https://em360tech.com/tech-articles/what-gpt-45-guide-openais-latest-model)

Tech Article

#### What is GPT-4.5? A Guide to OpenAI's Latest Model

by Katie Baker

2 min

[What is Alexa+? Amazon’s AI Upgrade](https://em360tech.com/tech-article/what-is-alexa%2B)

Tech Article

#### What is Alexa+? Amazon’s AI Upgrade

by Katie Baker

3 min

[Explore all](https://em360tech.com/ai-feed)

![Image 17: ema](https://em360tech.com/sites/default/files/styles/content_card_primary/public/2025-03/eae-podcast-logo.jpg.webp?itok=ih7-bInu)

Podcast

AI

#### Episode 10 – AI and Observability in Workload Automation and Orchestration

by Dan T

# Prompts

1. The first prompt checks whether the *retrieved context* can answer the user's question.
2. The second prompt takes the context and the question and generates the response.
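
Since the first prompt asks the model for a bare `0` or `1`, the calling code has to turn that reply into a boolean. Models sometimes add whitespace or a few extra words, so a tolerant parser helps. The following is a sketch under that assumption; `parse_decision` is a hypothetical helper, and treating an unclear reply as "cannot answer" is a design choice, not something the notebook prescribes.

```python
import re


def parse_decision(raw: str) -> bool:
    """Return True only when the model's reply clearly means '1'."""
    text = raw.strip()
    if text in ("0", "1"):
        return text == "1"
    # Fallback: accept a lone 0/1 digit surrounded by other text.
    match = re.search(r"\b([01])\b", text)
    return bool(match) and match.group(1) == "1"


print(parse_decision("1"))          # True
print(parse_decision(" 0\n"))       # False
print(parse_decision("Answer: 1"))  # True
print(parse_decision("not sure"))   # False
```

Defaulting to `False` means an ambiguous reply triggers the fallback path rather than an answer built on possibly irrelevant context.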

## First Prompt

In [63]:
decision_system_prompt = """Your job is to decide whether a given question can be answered with a given context.
If the context can answer the question, return 1.
If not, return 0.

Do not return anything other than 0 or 1.

Context: {context}
"""

user_prompt = """
Question: {question}

Answer:"""

## Second Prompt

In [64]:
system_prompt = """You are an expert at answering questions. Answer the question using only the given context.
If the question cannot be answered using the context, simply say "I don't know". Do not make things up.
Your answer MUST be informative, concise, and action-driven. Your response must be in Markdown.

Context: {context}
"""

user_prompt = """
Question: {question}

Answer:"""
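
Both prompts are plain `str.format` templates: the context goes into the system message and the question into the user message. The sketch below shows that assembly with shortened stand-in templates (`SYSTEM_TEMPLATE`, `USER_TEMPLATE`, and `build_messages` are illustrative names, not the notebook's). It also shows that curly braces inside the retrieved context are safe, because `format` only interprets braces in the template itself.

```python
from typing import Dict, List

# Shortened stand-ins for the prompt templates defined in the notebook.
SYSTEM_TEMPLATE = "Answer using only this context.\n\nContext: {context}\n"
USER_TEMPLATE = "\nQuestion: {question}\n\nAnswer:"


def build_messages(question: str, context: str) -> List[Dict[str, str]]:
    """Fill both templates and return a chat-style message list."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(context=context)},
        {"role": "user", "content": USER_TEMPLATE.format(question=question)},
    ]


messages = build_messages("what is Llama 3?", "Llama 3 is a model. It uses {braces}.")
for message in messages:
    print(message["role"], "->", repr(message["content"]))
```

Literal braces in the templates themselves would need to be doubled (`{{` and `}}`) to survive `format`.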

## Ask questions

In [65]:
def azure_openai_completion(question, context, is_system_prompt=False):
    # Requires the AzureOpenAI client `chat`, which is commented out in the setup cell;
    # uncomment it there before calling this function, or a NameError is raised.
    # is_system_prompt=True selects the answer prompt; False selects the 0/1 decision prompt.
    prompt = system_prompt if is_system_prompt else decision_system_prompt
    summary = chat.chat.completions.create(
        model=AZURE_DEPLOYMENT_ID,
        messages=[
            {"role": "system", "content": prompt.format(context=context)},
            {"role": "user", "content": user_prompt.format(question=question)},
        ],
    )

    print(summary)
    return summary

def deepseek_openai_completion(question, context, is_system_prompt=False):
    # Same flow as above, sent to the DeepSeek endpoint through the OpenAI-compatible client.
    prompt = system_prompt if is_system_prompt else decision_system_prompt
    summary = openai_client.chat.completions.create(
        model='deepseek-chat',
        messages=[
            {"role": "system", "content": prompt.format(context=context)},
            {"role": "user", "content": user_prompt.format(question=question)},
        ],
        stream=False,
    )
    print(summary)
    return summary


In [66]:
# question = "what is openai o1 model"
question = "what is Llama 3?"
results = search(question)
context = format_docs(results)
# response = azure_openai_completion(question, context)
response = deepseek_openai_completion(question, context)

has_answer = response.choices[0].message.content
has_answer

ChatCompletion(id='b713e48a-cdae-4519-89d1-548954bf098d', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='1', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None))], created=1741763822, model='deepseek-chat', object='chat.completion', service_tier=None, system_fingerprint='fp_3a5770e1b4_prod0225', usage=CompletionUsage(completion_tokens=1, prompt_tokens=541, total_tokens=542, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0), prompt_cache_hit_tokens=0, prompt_cache_miss_tokens=541))


'1'

In [67]:
# question = "what is Llama 3?"
# question = "what is openai o1 model"
# question = "Which is the longest river in China?"
question = "What is the latest version of Elastic Stack?"
results = search(question)
context = format_docs(results)
response = deepseek_openai_completion(question, context)

has_answer = response.choices[0].message.content
has_answer

ChatCompletion(id='7212506a-3a65-43b3-b155-93468e8e2146', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='0', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None))], created=1741763838, model='deepseek-chat', object='chat.completion', service_tier=None, system_fingerprint='fp_3a5770e1b4_prod0225', usage=CompletionUsage(completion_tokens=1, prompt_tokens=563, total_tokens=564, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=512), prompt_cache_hit_tokens=512, prompt_cache_miss_tokens=51))


'0'

# Check whether the retrieved context can answer the question

In [68]:
from IPython.display import Markdown, display
from duckduckgo_search import DDGS

In [69]:
def format_search_results(results):
    return "\n\n".join(doc["body"] for doc in results)


print(f"Question: {question}")
if has_answer.strip() == '1':
    print("Context can answer the question")
else:
    print("Context is NOT relevant. Searching online...")
    results = DDGS().text(question, max_results=5)
    context = format_search_results(results)
    print("Found online sources. Generating the response...")

# Generate the final answer from whichever context was selected
response = deepseek_openai_completion(question, context, True)
print("Answer:")
display(Markdown(response.choices[0].message.content))
    

Question: What is the latest version of Elastic Stack?
Context is NOT relevant. Searching online...
Found online sources. Generating the response...


NameError: name 'chat' is not defined
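
`format_search_results` indexes `doc["body"]` directly, so a single result without a `body` key would raise a `KeyError`, and repeated snippets would be sent to the model more than once. A more defensive variant could look like the sketch below. The `title`/`href`/`body` keys match the search results printed in this notebook; the `max_chars` cap is a hypothetical parameter added for illustration.

```python
from typing import Dict, List


def format_search_results(results: List[Dict[str, str]], max_chars: int = 4000) -> str:
    """Join result snippets, skipping empty or duplicate bodies."""
    bodies: List[str] = []
    for item in results:
        body = (item.get("body") or "").strip()
        if body and body not in bodies:  # skip empty and duplicate snippets
            bodies.append(body)
    # max_chars is an assumed cap to keep the prompt size bounded
    return "\n\n".join(bodies)[:max_chars]


sample_results = [
    {"title": "A", "href": "https://example.com/a", "body": "First snippet."},
    {"title": "B", "href": "https://example.com/b"},
    {"title": "C", "href": "https://example.com/c", "body": "First snippet."},
    {"title": "D", "href": "https://example.com/d", "body": " Second snippet. "},
]
print(format_search_results(sample_results))
```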

In [None]:
# !pip install -U duckduckgo_search

In [182]:
print(results)

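[{'title': 'Beijing weather forecast, Beijing 7-day weather forecast, Beijing 15-day weather forecast, Beijing weather lookup', 'href': 'https://www.weather.com.cn/weather/101010100.shtml', 'body': 'Beijing weather forecast: timely and accurate weather information from the Central Meteorological Observatory, with convenient lookup of Beijing weather today, Beijing weekend weather, the Beijing one-week forecast, the Beijing blue-sky forecast, the Beijing weather forecast, and the Beijing 40-day forecast; also provides living, health, traffic, and travel indexes for Beijing, and promptly publishes Beijing weather warning signals and all kinds of meteorological news.'}, {'title': '[Beijing weather forecast today] Beijing 24-hour weather forecast details_Beijing Weather Network', 'href': 'https://www.tianqi.com/beijing/today/', 'body': 'Beijing Weather Network provides 24-hour Beijing weather forecast details and the Beijing forecast for today, including current temperature, 24-hour precipitation probability, humidity, PM2.5, wind direction, UV intensity, and more, so you can travel with peace of mind. ... Simple home-style Kung Pao chicken recipe; the simplest way to make Kung Pao chicken ... Stories and legends about how the Lantern Festival came about; the origin of the Lantern Festival ...'}, {'title': 'Beijing overcast with scattered light rain today, light sleet in mountain areas, high of 9℃_Tencent News', 'href': 'https://news.qq.com/rain/a/20250307A01F8900', 'body': 'Forecast issued by the Beijing Meteorological Observatory at 06:00 on March 7: overcast with scattered light rain during the day, light sleet in mountain areas, northerly winds turning southerly at force 2-3, high of 9℃; at night overcast turning clear, southerly winds turning northerly at force 1-2 ...'}, {'title': '[Beijing weather] Beijing weather forecast today, today, weather today, 7-day, 15-day weather forecast, one-week weather forecast, 15-day weather forecast lookup', 'href': 'https://www.weather.com.cn/weather1d/999999999.shtml', 'body': 'Beijing weather forecast: timely and accurate weather information from the Central Meteorological Observatory, with convenient lookup of Beijing weather today, Beijing weekend weather, the Beijing one-week forecast, the Beijing 15-day forecast, and the Beijing 40-day forecast; the Beijing weather forecast also provides living, health, traffic, and travel indexes for each district and county of Beijing, and promptly publishes Beijing weather warning signals and all kinds of meteorological news.'}, {'title': '[Beijing weather lookup]_How is the weather in Beijing_2345 Weather Forecast', 'href': 'https://tianqi.2345.com/four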

In [183]:
import requests

# URL of the file
url = 'https://chrt.fm/track/46DD7B/media.transistor.fm/7387a8a4/cefc95d5.mp3?download=true&src=player'

# Send a HTTP request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Open a local file in binary write mode
    with open('audio_file.mp3', 'wb') as file:
        # Write the content of the response to the file
        file.write(response.content)
    print('File downloaded successfully')
else:
    print('Failed to download file. Status code:', response.status_code)


File downloaded successfully
