# Agentic RAG

An agent-based RAG solution combines two ideas:

- **Agentic:** The system is autonomous: it makes decisions and takes actions based on the context of the interaction.
- **RAG (Retrieval-Augmented Generation):** Combines information retrieval from the knowledge base with the LLM’s generative capabilities.
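
The two ideas can be combined in a small control loop. The following is a minimal sketch, not the notebook's actual implementation: the function `agentic_answer` and its four callable parameters are hypothetical names introduced here for illustration, and the stand-ins in the usage example return canned data instead of calling a vector store, an LLM, or a search engine.

```python
from typing import Callable, List


def agentic_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    can_answer: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, str], str],
) -> str:
    """Answer from the knowledge base when possible, else fall back to web search."""
    # Retrieval step: pull candidate passages from the knowledge base.
    context = "\n\n".join(retrieve(question))

    # Agentic step: decide whether the retrieved context is sufficient.
    if not can_answer(question, context):
        # Take a different action when it is not: search the web instead.
        context = "\n\n".join(web_search(question))

    # Generation step: produce the answer from whichever context was chosen.
    return generate(question, context)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_retrieve(question: str) -> List[str]:
    return ["Llama 3 is a family of open-weight language models."]


def fake_can_answer(question: str, context: str) -> bool:
    return "llama" in question.lower()


def fake_web_search(question: str) -> List[str]:
    return ["(web result) " + question]


def fake_generate(question: str, context: str) -> str:
    return f"Based on: {context}"


stubs = (fake_retrieve, fake_can_answer, fake_web_search, fake_generate)
print(agentic_answer("What is Llama 3?", *stubs))
print(agentic_answer("What is o1?", *stubs))
```

Passing the steps in as callables keeps the routing logic separate from the concrete retriever, model, and search tool, so each can be swapped without touching the loop.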

## System architecture


![agentic rag](images/agentic_rag_0.png)
![agentic rag](images/agentic_rag_1.png)
![agentic rag](images/agentic_rag_2.png)
![agentic rag](images/agentic_rag_3.jpg)

## LangChain Code

![](images/langchain_agentic_rag.jpeg)

## Read and load PDF files

In [12]:
import os
import PyPDF2
from tqdm.notebook import tqdm
import re
import json

def read_pdfs_from_folder(folder_path):
    pdf_list = []
    
    # Loop through all files in the specified folder
    for filename in tqdm(os.listdir(folder_path)):
        if filename.lower().endswith('.pdf'):
            file_path = os.path.join(folder_path, filename)
            
            # Open each PDF file
            with open(file_path, 'rb') as file:
                reader = PyPDF2.PdfReader(file)
                content = ""
                
                # Read each page's content and append it to a string
                for page in reader.pages:
                    # Guard against pages with no extractable text (e.g. scanned images)
                    content += page.extract_text() or ""
                
                # Add the PDF content to the list
                pdf_list.append({"content": content, "filename": filename})
    
    return pdf_list

folder_path = "./rag_data"

# all_documents = read_pdfs_from_folder(folder_path)


## Read Web URLs

In [1]:
from typing import Optional
import requests

def fetch_url_content(url: str) -> Optional[str]:
    """
    Fetches content from a URL by performing an HTTP GET request.

    Parameters:
        url (str): The endpoint or URL to fetch content from.

    Returns:
        Optional[str]: The content retrieved from the URL as a string,
                       or None if the request fails.
    """
    prefix_url: str = "https://r.jina.ai/"
    full_url: str = prefix_url + url  # Concatenate the prefix URL with the provided URL
    
    try:
        response = requests.get(full_url, timeout=30)  # Perform a GET request; the timeout prevents hanging indefinitely
        if response.status_code == 200:
            return response.content.decode('utf-8')  # Return the content of the response as a string
        else:
            print(f"Error: HTTP GET request failed with status code {response.status_code}")
            return None
    except requests.RequestException as e:
        print(f"Error: Failed to fetch URL {full_url}. Exception: {e}")
        return None

In [2]:
# Replace this with the specific endpoint or URL you want to fetch
url: str = "https://em360tech.com/tech-article/what-is-llama-3"  
content: Optional[str] = fetch_url_content(url)


if content is not None:
    print("Content retrieved successfully:")
else:
    print("Failed to retrieve content from the specified URL.")

Content retrieved successfully:


## Split the texts

In [5]:
from langchain_text_splitters import MarkdownHeaderTextSplitter
from langchain_text_splitters import RecursiveCharacterTextSplitter
from litellm import completion

In [6]:
token_size = 150
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
            model_name="gpt-4",
            chunk_size=token_size,
            chunk_overlap=0,
        )

In [7]:
def clean_text(text):
    # Replace newline and carriage-return characters with spaces
    text = text.replace('\n', ' ').replace('\r', ' ')
    
    # Replace multiple spaces with a single space
    text = re.sub(r'\s+', ' ', text)
    
    # Strip leading and trailing spaces
    text = text.strip()
    
    return text

In [8]:
text_chunks = text_splitter.split_text(content)
print(f"Total chunks: {len(text_chunks)}")

Total chunks: 58


In [134]:
text_chunks[0]



In [9]:
def get_embeddings(texts, model="text-embedding-3-small", api_key="your-api-key"):
    # Define the API URL
    url = "https://api.openai.com/v1/embeddings"
    
    # Prepare headers with the API key
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    
    # Prepare the request body
    data = {
        "input": texts,
        "model": model
    }
    
    # Send a POST request to the OpenAI API
    response = requests.post(url, headers=headers, data=json.dumps(data))
    
    # Check if the request was successful
    if response.status_code == 200:
        # Return the embeddings from the response
        return response.json()["data"]
    else:
        # Print error if the request fails
        print(f"Error {response.status_code}: {response.text}")
        return None

In [14]:
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

embeddings_objects = get_embeddings(text_chunks, api_key=OPENAI_API_KEY)
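# get_embeddings returns None on failure, so check that before comparing lengths
assert embeddings_objects is not None, "Embedding request failed"
assert len(embeddings_objects) == len(text_chunks)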
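
The call above sends every chunk in a single request. For larger corpora it may be safer to send the chunks in batches; the helper below is a hedged sketch of that idea. `batched` and `embed_in_batches` are hypothetical names, the batch size is an arbitrary assumption, and `embed_fn` stands for any function with the same contract as `get_embeddings` (a list of texts in, a list of result objects out, or `None` on failure). A stand-in embedder is used so the sketch runs offline.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], Optional[List[Dict]]],
    batch_size: int = 100,  # assumption: tune to the provider's request limits
) -> List[Dict]:
    """Embed `texts` batch by batch, preserving the input order."""
    results: List[Dict] = []
    for batch in batched(texts, batch_size):
        batch_result = embed_fn(batch)
        if batch_result is None:
            # Fail loudly instead of silently producing a short result list.
            raise RuntimeError("Embedding request failed for a batch")
        results.extend(batch_result)
    return results


# Stand-in embedder (hypothetical): one-dimensional "embedding" = text length.
def fake_embed(batch: List[str]) -> List[Dict]:
    return [{"embedding": [float(len(text))]} for text in batch]


chunks = ["a", "bb", "ccc", "dddd", "eeeee"]
objects = embed_in_batches(chunks, fake_embed, batch_size=2)
print(len(objects))
```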

In [15]:
embeddings = [obj["embedding"] for obj in embeddings_objects]
len(embeddings[0])

1536

In [16]:
# !pip install qdrant-client

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient("http://localhost:6333")

In [17]:
collection_name = "agent_rag_index"
VECTOR_SIZE = 1536

client.delete_collection(collection_name)

client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE),
    
)

True
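
`Distance.COSINE` configures the collection to rank vectors by cosine similarity. As a rough illustration of what that metric measures, here is a plain-Python sketch; it is not how Qdrant computes the score internally, and `cosine_similarity` is a name introduced here for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Same direction, different magnitude: similarity is 1.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
# Orthogonal vectors: similarity is 0.
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))
```

Because the score depends only on direction, two chunks of different length can still match closely if their embeddings point the same way.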

In [18]:
ids = []
payload = []

for idx, text in enumerate(text_chunks):
    ids.append(idx)
    payload.append({"url": url, "content": text})

payload[0]

{'url': 'https://em360tech.com/tech-article/what-is-llama-3',

In [19]:
client.upload_collection(
    collection_name=collection_name,
    vectors=embeddings,
    payload=payload,
    ids=ids,
    batch_size=256,
)

In [20]:
client.count(collection_name)

CountResult(count=58)

In [127]:
def search(text: str, top_k: int):
    query_embedding = get_embeddings(text, api_key=OPENAI_API_KEY)[0]["embedding"]
    
    search_result = client.search(
        collection_name=collection_name,
        query_vector=query_embedding,
        query_filter=None,  
        limit=top_k
    )
    return search_result


def format_docs(docs):
    return "\n\n".join(doc.payload["content"] for doc in docs)

# Prompts

1. The first prompt checks whether the *retrieved context* can answer the user's question.
2. The second prompt takes the context and the question and generates the response.
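
Both prompts are sent the same way: the system message carries the context and the user message carries the question. A minimal sketch of that assembly follows, with `build_messages` as a hypothetical helper and short placeholder templates standing in for the real prompts defined in the following cells.

```python
from typing import Dict, List


def build_messages(
    system_template: str,
    user_template: str,
    context: str,
    question: str,
) -> List[Dict[str, str]]:
    """Fill the two templates and return a chat-style message list."""
    return [
        {"role": "system", "content": system_template.format(context=context)},
        {"role": "user", "content": user_template.format(question=question)},
    ]


# Placeholder templates (assumptions for this example only).
decision_template = "Return 1 if the context answers the question, else 0.\n\nContext: {context}"
question_template = "Question: {question}\n\nAnswer:"

messages = build_messages(
    decision_template,
    question_template,
    context="Llama 3 was released by Meta.",
    question="Who released Llama 3?",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The same helper could serve both the decision step and the answer step, since only the system template differs between them.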

## First Prompt

In [104]:
decision_system_prompt = """Your job is to decide if a given question can be answered with a given context.
If the context can answer the question, return 1.
If not, return 0.

Do not return anything else except for 0 or 1.

Context: {context}
"""

user_prompt = """
Question: {question}

Answer:"""

## Second Prompt

In [107]:
system_prompt = """You are an expert at answering questions. Answer the question using only the given context.
If the question cannot be answered using the context, simply say "I don't know". Do not make stuff up.
Your answer MUST be informative, concise, and action-driven. Your response must be in Markdown.

Context: {context}
"""

user_prompt = """
Question: {question}

Answer:"""

## Ask questions

In [140]:
question = "what is openai o1 model"
results = search(question, top_k=3)
context = format_docs(results)

In [141]:
response = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": decision_system_prompt.format(context=context)},
        {"role": "user", "content": user_prompt.format(question=question)},
    ],
    max_tokens=500,
)
# Strip whitespace so a trailing newline does not break the later comparison with '1'
has_answer = response.choices[0].message.content.strip()
has_answer

'0'
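
The model is asked for a bare `0` or `1`, but replies can carry stray whitespace, quotes, or punctuation. The following is a hedged sketch of a tolerant parser: `parse_decision` is a hypothetical helper, and treating anything other than `1` as "cannot answer" is a design assumption that errs toward the web-search fallback.

```python
def parse_decision(raw: str) -> bool:
    """Return True only when the model's reply is, in effect, the single digit 1."""
    # Drop surrounding whitespace, then common wrapping characters, then whitespace again.
    cleaned = raw.strip().strip("'\"`.").strip()
    return cleaned == "1"


for reply in ["1", " 1\n", "'1'", "1.", "0", "yes", "10", ""]:
    print(repr(reply), "->", parse_decision(reply))
```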

# Answer from the retrieved context, or fall back to web search

In [124]:
from IPython.display import Markdown, display
from duckduckgo_search import DDGS

In [142]:
def format_search_results(results):
    return "\n\n".join(doc["body"] for doc in results)
    

print(f"Question: {question}")
if has_answer == '1':
    print("Context can answer the question")
    response = completion(
        model="gpt-4o-mini",
        messages=[{"content": system_prompt.format(context=context),"role": "system"}, {"content": user_prompt.format(question=question),"role": "user"}],
        max_tokens=500
    )
    print("Answer:")
    display(Markdown(response.choices[0].message.content))
else:
    print("Context is NOT relevant. Searching online...")
    results = DDGS().text(question, max_results=5)
    context = format_search_results(results)
    print("Found online sources. Generating the response...")
    response = completion(
        model="gpt-4o-mini",
        messages=[{"content": system_prompt.format(context=context),"role": "system"}, {"content": user_prompt.format(question=question),"role": "user"}],
        max_tokens=500
    )
    print("Answer:")
    display(Markdown(response.choices[0].message.content))
    
    

Question: what is openai o1 model
Context is NOT relevant. Searching online...
Found online sources. Generating the response...
Answer:


OpenAI o1 is a new large language model trained with reinforcement learning to perform complex reasoning tasks. It is designed to take more time to think before responding, allowing it to generate a thorough internal thought process. This model excels in advanced reasoning and is suitable for solving complex problems, including those in math and science. It aims to provide deep contextual understanding and support agentic workflows.

In [110]:
# !pip install -U duckduckgo_search

In [143]:
print(results)

[{'title': 'Introducing OpenAI o1', 'href': 'https://openai.com/index/introducing-openai-o1-preview/', 'body': "OpenAI o1-mini. The o1 series excels at accurately generating and debugging complex code. To offer a more efficient solution for developers, we're also releasing OpenAI o1-mini, a faster, cheaper reasoning model that is particularly effective at coding. As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost ..."}, {'title': 'Learning to Reason with LLMs - OpenAI', 'href': 'https://openai.com/index/learning-to-reason-with-llms/', 'body': 'We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding to the user. We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex ...'}, {'title': 'OpenAI o1 Hub | OpenAI', 'href': 'https://openai.com/o

In [144]:
import requests

# URL of the file
url = 'https://chrt.fm/track/46DD7B/media.transistor.fm/7387a8a4/cefc95d5.mp3?download=true&src=player'

# Send a HTTP request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Open a local file in binary write mode
    with open('audio_file.mp3', 'wb') as file:
        # Write the content of the response to the file
        file.write(response.content)
    print('File downloaded successfully')
else:
    print('Failed to download file. Status code:', response.status_code)


File downloaded successfully
