## Retrieval-Augmented Agents



Retrieval-Augmented Generation (RAG) agents combine information retrieval with text generation to produce more factual, context-aware responses. Unlike generative models that rely solely on knowledge learned during pre-training, RAG systems retrieve relevant documents from an external knowledge base at query time and incorporate that information into their answers. This lets the model give up-to-date, verifiable, and domain-specific answers instead of depending only on training data that may be outdated or incomplete.

The core mechanism of a RAG agent involves two main steps: retrieval and synthesis. First, a retrieval model searches a structured or unstructured knowledge base (such as databases, documents, or APIs) to fetch the most relevant information based on the user’s query. Then, a generative model (e.g., GPT) processes this retrieved data and integrates it into a coherent, context-rich response. This approach is particularly useful in applications like customer support, research assistants, coding helpers, and medical or legal AI advisors, where accuracy and contextual awareness are critical.
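The two steps can be illustrated without any external services. This is a minimal sketch that assumes a toy keyword-overlap scorer in place of a real embedding model and stops at prompt construction instead of calling an LLM; `tokenize`, `retrieve`, and `build_prompt` are hypothetical names used only for illustration.

```python
import re


def tokenize(text):
    """Lowercase word tokens (a toy stand-in for an embedding model)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, knowledge_base, top_k=2):
    """Step 1 (retrieval): rank documents by how many query words they share."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in knowledge_base]
    # Highest overlap first; keep only documents sharing at least one word
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, docs):
    """Step 2 (input to synthesis): place the retrieved text in front of the question."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


knowledge_base = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Paris is the capital of France.",
    "Sentence embeddings map text to fixed-size vectors.",
]
question = "Which library does similarity search over vectors?"
docs = retrieve(question, knowledge_base)
prompt = build_prompt(question, docs)
print(prompt)
```

In a real RAG agent the prompt would then be sent to a generative model, which writes the final answer from the supplied context.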

By grounding answers in external knowledge sources, retrieval-augmented agents reduce (though do not eliminate) hallucinations, improve response reliability, and adapt to changing information: updating the knowledge base changes what the agent can draw on, without retraining the language model. The retriever can also be tuned, and the knowledge base curated, for a specific domain, which makes these agents effective in specialized fields. This makes RAG a practical way around the fixed knowledge cutoff of static language models.

### What is Parallelization?
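Parallelization means running multiple agents or tasks at the same time to improve throughput and responsiveness. It is especially useful in multi-agent systems, reinforcement learning, and AI-powered automation, where many independent steps (model calls, tool calls, environment rollouts) would otherwise wait on one another. In this notebook it appears at the retrieval layer, for example when large document sets are embedded in concurrent batches and when FAISS is allowed to use several OpenMP threads.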
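A minimal sketch of the idea using Python's standard `concurrent.futures` module. The `slow_task` function is a hypothetical stand-in for an I/O-bound call such as an HTTP request; threads help here because each task spends its time waiting rather than computing in Python.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    """Hypothetical I/O-bound task: waits 0.2 s, then returns a result."""
    time.sleep(0.2)
    return n * n


inputs = [1, 2, 3, 4]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() runs the calls concurrently but returns results in input order
    results = list(executor.map(slow_task, inputs))
elapsed = time.perf_counter() - start

print(results)  # [1, 4, 9, 16]
print(f"{elapsed:.2f} s with 4 threads vs about 0.80 s run one after another")
```

For CPU-bound pure-Python work, threads give little speedup because of the GIL; libraries such as PyTorch and FAISS release it inside their native code, which is why threading still helps for embedding and vector search.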


#### 1. Importing Dependencies

```python
import concurrent.futures

import faiss
import numpy as np
import requests
from sentence_transformers import SentenceTransformer
```

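```python
import os

# Read the key from the environment instead of hard-coding it in the notebook
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "your_openai_api_key")
```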


```python
# Load the sentence embedding model (all-MiniLM-L6-v2 produces 384-dimensional vectors)
embedder = SentenceTransformer('all-MiniLM-L6-v2')
```

```python
# Document store: documents[i] holds the text for FAISS vector id i
documents = []
```

```python
# FAISS setup: exact inner-product search over L2-normalized vectors (= cosine similarity)
dimension = 384  # embedding size of all-MiniLM-L6-v2
index = faiss.IndexFlatIP(dimension)
faiss.omp_set_num_threads(4)  # let FAISS use up to 4 OpenMP threads
```
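Why `IndexFlatIP` combined with `faiss.normalize_L2` yields cosine similarity: cosine similarity is `cos(a, b) = (a · b) / (|a| |b|)`, so once both vectors have unit length the denominator is 1 and the inner product alone is the cosine. The check below uses plain Python lists rather than FAISS to illustrate that identity; `dot`, `normalize`, and `cosine` are illustrative helpers, not FAISS functions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit L2 length (what faiss.normalize_L2 does in place)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]  # length 5
b = [1.0, 2.0, 2.0]  # length 3

# Inner product of the normalized vectors equals the cosine of the originals (11 / 15)
print(round(cosine(a, b), 6))
print(round(dot(normalize(a), normalize(b)), 6))
```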

#### 2. Dynamic Document Addition with Embedding Update
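The **add_documents** function grows the knowledge base at runtime. It encodes the new documents (splitting larger inputs into batches that are embedded concurrently in a thread pool), L2-normalizes the embeddings, and appends the texts and vectors to the document list and the FAISS index so that later searches can find them.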

```python
def add_documents(new_docs, batch_size=32):
    """Embed new documents (in parallel batches for larger inputs) and add them to the index."""
    if not new_docs:
        return

    if len(new_docs) > batch_size:
        # Split into batches and encode them concurrently in worker threads
        batches = [new_docs[i:i + batch_size] for i in range(0, len(new_docs), batch_size)]
        with concurrent.futures.ThreadPoolExecutor() as executor:
            # map() preserves batch order, so rows stay aligned with new_docs
            embeddings = np.vstack(list(executor.map(embedder.encode, batches)))
    else:
        # A single batch gains nothing from a thread pool; encode directly
        embeddings = embedder.encode(new_docs)

    # FAISS expects a contiguous float32 matrix; normalize so inner product == cosine
    embeddings = np.ascontiguousarray(embeddings, dtype="float32")
    faiss.normalize_L2(embeddings)

    # Keep texts and vectors aligned: documents[i] corresponds to FAISS id i
    documents.extend(new_docs)
    index.add(embeddings)

    print(f"Added {len(new_docs)} documents. Knowledge base now has {len(documents)} documents.")
```
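The batching pattern in `add_documents` relies on one property: `executor.map` returns results in the same order as its inputs, not in completion order. Because FAISS flat indexes assign ids sequentially as vectors are added, that ordering keeps embedding row *i* aligned with `documents[i]`. The sketch below isolates the pattern with a hypothetical `fake_encode` function in place of `embedder.encode`, so it runs without a model; `encode_in_batches` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_encode(batch):
    """Hypothetical stand-in for embedder.encode: one 2-d 'vector' per text."""
    return [[float(len(text)), float(text.count(" "))] for text in batch]


def encode_in_batches(docs, encode_fn, batch_size=32):
    """Split docs into batches, encode them concurrently, and flatten in input order."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor() as executor:
        # map() yields results in the order of `batches`, whichever thread finishes first
        encoded_batches = list(executor.map(encode_fn, batches))
    # Equivalent of np.vstack: concatenate the per-batch rows
    return [row for batch in encoded_batches for row in batch]


docs = [f"doc number {i}" for i in range(70)]
vectors = encode_in_batches(docs, fake_encode, batch_size=32)
print(len(vectors))  # 70 rows from 3 batches (32 + 32 + 6)
print(vectors[0])    # [12.0, 2.0]
```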

#### 3. Efficient Document Retrieval Using Semantic Search
The **search_documents** function retrieves the documents most semantically similar to a query. It encodes and normalizes the query, asks the FAISS index for the `top_k` stored vectors with the highest inner product (equal to cosine similarity for normalized vectors), and returns the matching texts, discarding invalid indices and non-positive scores.

```python
def search_documents(query, top_k=3):
    """Return up to top_k documents ranked by cosine similarity to the query."""
    if not documents:
        return ["No documents in the knowledge base."]

    # Encode and L2-normalize the query so inner product equals cosine similarity
    query_embedding = np.ascontiguousarray(embedder.encode([query]), dtype="float32")
    faiss.normalize_L2(query_embedding)

    # Exact search over all stored vectors; never request more hits than exist
    k = min(top_k, index.ntotal)
    scores, indices = index.search(query_embedding, k)

    results = []
    for idx, score in zip(indices[0], scores[0]):
        # FAISS returns -1 for empty result slots; also skip non-positive similarities
        if 0 <= idx < len(documents) and score > 0:
            results.append(documents[idx])

    return results if results else ["No relevant documents found."]
```
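`IndexFlatIP` performs exact (exhaustive) search: it scores the query against every stored vector and keeps the `k` best, returning id `-1` for any slot it cannot fill. A pure-Python sketch of that behaviour, assuming the vectors are already normalized; `flat_ip_search` is a hypothetical name, and the `-inf` padding score is this sketch's own sentinel rather than the exact value FAISS uses.

```python
import heapq


def flat_ip_search(query, vectors, k):
    """Exact top-k by inner product, returning (scores, ids) for a single query."""
    scored = [(sum(q * v for q, v in zip(query, vec)), i) for i, vec in enumerate(vectors)]
    best = heapq.nlargest(k, scored)
    scores = [s for s, _ in best]
    ids = [i for _, i in best]
    # Pad unfilled slots with id -1, as FAISS does when k exceeds the stored count
    missing = k - len(best)
    return scores + [float("-inf")] * missing, ids + [-1] * missing


stored = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # unit-length vectors
query = [1.0, 0.0]

scores, ids = flat_ip_search(query, stored, k=4)
print(ids)   # [0, 2, 1, -1]

# Keep only real hits with positive similarity, as search_documents does
hits = [i for i, s in zip(ids, scores) if i >= 0 and s > 0]
print(hits)  # [0, 2]
```

Note that Python accepts negative list indices, so a check like `idx < len(documents)` on its own would let `-1` through and silently return the last document; the lower bound `idx >= 0` matters.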

#### 4. Context-Aware RAG Agent for Intelligent Question Answering
The **rag_agent** function ties retrieval and generation together. It retrieves the most relevant documents from the knowledge base, inserts them into a structured system prompt, and sends the prompt and question to OpenAI's Chat Completions API. Grounding the model in retrieved text makes answers more factual and easier to verify, although it cannot fully rule out hallucinations.

```python
def rag_agent(question, top_k=3, model="gpt-3.5-turbo"):
    """RAG agent: retrieve supporting documents, then generate a grounded answer."""
    print("Retrieving and generating...")

    # Retrieval must finish before generation can start (the prompt depends on it),
    # so the two steps run sequentially; a thread pool here would add no parallelism.
    docs = search_documents(question, top_k)
    context = "\n\n".join(f"Document {i + 1}: {doc}" for i, doc in enumerate(docs))

    # Build a system prompt that grounds the model in the retrieved documents
    system_prompt = f"""You are an intelligent assistant. Use this relevant information:

{context}

When answering:
1. Synthesize information from the sources
2. Use your own words for a coherent response
3. If the information is insufficient, acknowledge this
4. Never make up information"""

    try:
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {OPENAI_API_KEY}",
                "Content-Type": "application/json",
            },
            json={
                "model": model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": question},
                ],
                "temperature": 0.3,
            },
            timeout=60,  # fail instead of hanging on network problems
        )
    except requests.exceptions.RequestException as exc:
        return f"Error: request failed ({exc})"

    if response.status_code != 200:
        return f"Error: {response.status_code}, {response.text}"

    return response.json()["choices"][0]["message"]["content"]
```
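Within a single call, generation has to wait for retrieval, so those two steps cannot overlap. Concurrency pays off across independent questions, where each request spends most of its time waiting on the network. A minimal sketch, assuming a hypothetical `answer_many` helper and a stub agent so it runs offline; in the notebook you would pass `rag_agent` instead and keep `max_workers` within your API rate limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def answer_many(questions, agent, max_workers=4):
    """Run an agent function over independent questions concurrently; answers keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(agent, questions))


def stub_agent(question):
    """Hypothetical stand-in for rag_agent: simulates network latency, then answers."""
    time.sleep(0.3)
    return f"answer to: {question}"


questions = ["What is RAG?", "What is FAISS?", "What is an embedding?"]
start = time.perf_counter()
answers = answer_many(questions, stub_agent)
elapsed = time.perf_counter() - start

print(answers[0])  # answer to: What is RAG?
print(f"{elapsed:.2f} s for {len(questions)} questions")
```

With a real agent, progress messages printed from several threads may interleave in the output.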

```python
if __name__ == "__main__":
    # Add sample documents
    sample_docs = [
        "OpenAI was founded in December 2015 by Sam Altman, Elon Musk, and others with the mission to ensure that artificial general intelligence benefits all of humanity.",
        "GPT-4 is a multimodal large language model created by OpenAI in 2023, capable of processing both text and image inputs.",
        "RAG stands for Retrieval-Augmented Generation, a technique to enhance LLM responses with external knowledge by retrieving relevant information and incorporating it into the generation process.",
        "Vector databases store embeddings of text which can be searched by similarity using mathematical operations like cosine similarity.",
        "Retrieval-Augmented Generation (RAG) helps address hallucination problems in language models by grounding responses in factual information from reliable sources.",
        "The key components of a RAG system include an embedding model, a vector database, a retrieval mechanism, and a text generation model."
    ]
    add_documents(sample_docs)

    question = "What are the key components of a RAG"
    print(f"Question: {question}")
    answer = rag_agent(question)
    print(f"Answer: {answer}")
```

```text
Added 6 documents. Knowledge base now has 6 documents.
Question: What are the key components of a RAG
Retrieving and generating...
Answer: The key components of a Retrieval-Augmented Generation (RAG) system include an embedding model, a vector database, a retrieval mechanism, and a text generation model. RAG is a technique that enhances language model responses by incorporating external knowledge retrieved from reliable sources to improve the accuracy and factual grounding of generated text.
```
