# Agentic RAG with Instantly

This notebook demonstrates how to build an advanced Retrieval-Augmented Generation (RAG) system using the Instantly library for inference. We'll create an intelligent agent that can retrieve and reason over documentation to answer questions accurately.

Key features of this implementation:
- Multi-step reasoning
- Iterative retrieval
- Query optimization
- Self-critique and refinement

## 1. Setup and Dependencies

First, let's install the required packages and configure our environment:

In [None]:
# Install required packages
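!pip install instantly langchain langchain-core langchain-community langchain-text-splitters sentence-transformers datasets python-dotenv rank_bm25 --upgrade

# Import necessary libraries
import os
from dotenv import load_dotenv
from instantly import OpenAIClient
import datasets
# Document and the text splitter are imported from their dedicated packages;
# the older langchain.docstore and langchain.text_splitter paths may be missing after --upgrade.
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever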

# Load environment variables (HF_TOKEN)
load_dotenv()

# Initialize Instantly client
client = OpenAIClient(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"]
)
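
When the token is missing, `os.environ["HF_TOKEN"]` raises a bare `KeyError`. A minimal sketch of a fail-fast check, assuming the token lives in `HF_TOKEN` as above; `require_env` is a hypothetical helper written for this notebook, not part of any library:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a readable error.

    Hypothetical helper for this notebook; not part of any library.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "before running this notebook."
        )
    return value
```

Passing `api_key=require_env("HF_TOKEN")` to the client would then surface a missing token with an actionable message.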

## 2. Knowledge Base Preparation

Now we'll prepare our knowledge base by loading and processing documents. We'll use the Hugging Face documentation dataset as our source:

In [None]:
# Load the Hugging Face documentation dataset
knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")

# Filter to include only Transformers documentation
knowledge_base = knowledge_base.filter(lambda row: row["source"].startswith("huggingface/transformers"))

# Convert dataset entries to Document objects with metadata
source_docs = [
    Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]})
    for doc in knowledge_base
]

# Create text splitter for processing documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Characters per chunk
    chunk_overlap=50,  # Overlap between chunks to maintain context
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""]  # Priority order for splitting
)

# Process documents into chunks
docs_processed = text_splitter.split_documents(source_docs)

print(f"Knowledge base prepared with {len(docs_processed)} document chunks")
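
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sketch that slides a fixed-size character window over a string. It is an illustration only: `chunk_with_overlap` is a hypothetical function, and the real splitter additionally tries to break on the separators listed above rather than at arbitrary character positions.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration: no separator-aware splitting is attempted.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far the window advances each time.
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The current window already reached the end of the text.
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears partly in both chunks.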

## 3. Build Retriever Tool

We'll create a custom retriever tool that our agent can use to search the knowledge base. This tool uses BM25, a keyword-based ranking function that needs no embedding model and is fast to build over a corpus of this size:

In [None]:
class RetrieverTool:
    """A tool for retrieving relevant documents from the knowledge base."""
    
    def __init__(self, docs, max_results=5):
        self.retriever = BM25Retriever.from_documents(docs, k=max_results)
        
    def search(self, query: str) -> str:
        """Search the knowledge base with the given query."""
        # Retrieve relevant documents
        docs = self.retriever.invoke(query)
        
        # Format results
        results = []
        for i, doc in enumerate(docs, 1):
            results.append(f"\n=== Document {i} ===\n{doc.page_content}\nSource: {doc.metadata['source']}\n")
            
        return "\n".join(results)

# Initialize retriever tool
retriever = RetrieverTool(docs_processed)

# Test the retriever
test_query = "How does model training work?"
print(retriever.search(test_query))
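
For readers curious about what BM25 computes, the following is a minimal pure-Python sketch of Okapi BM25 scoring over whitespace-tokenized text. The function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant are assumptions made for illustration; the `rank_bm25` package used by the retriever may differ in tokenization and IDF details.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (illustrative)."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Longer-than-average documents are penalized, controlled by b.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Non-negative IDF variant: rarer terms receive a larger weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


toy_docs = [
    "the trainer class handles model training loops",
    "tokenizers convert text into input ids",
    "training a model requires a dataset and an optimizer",
]
scores = bm25_scores("model training", toy_docs)
ranking = sorted(range(len(toy_docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Because the scoring is purely lexical, a document that shares no terms with the query scores zero, which is one reason the later sections try to improve the query before retrieving.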

## 4. Create Agent with RAG Capabilities

Now we'll create an agent that can use our retriever tool to answer questions. We'll use Instantly's OpenAI-compatible interface to interact with Hugging Face models:

In [None]:
class RAGAgent:
    """An agent that combines retrieval with language model reasoning."""
    
    def __init__(self, client: OpenAIClient, retriever: RetrieverTool):
        self.client = client
        self.retriever = retriever
        
    def _format_prompt(self, query: str, context: str) -> list:
        """Format the conversation prompt with retrieved context."""
        return [
            {"role": "system", "content": "You are a helpful AI assistant. Use the provided context to answer questions accurately. If you're not sure about something, say so."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
        ]
    
    def answer(self, query: str) -> str:
        """Answer a question using retrieved context and LLM reasoning."""
        # First, get relevant context
        context = self.retriever.search(query)
        
        # Generate response using the LLM
        messages = self._format_prompt(query, context)
        response = self.client.chat.completions.create(
            model="HuggingFaceH4/zephyr-7b-beta",  # You can change this to other models
            messages=messages,
            temperature=0.7
        )
        
        return response.choices[0].message.content

# Initialize the RAG agent
agent = RAGAgent(client, retriever)

# Test the agent
test_question = "What are the key steps in fine-tuning a transformer model?"
print(f"Question: {test_question}\n")
print(f"Answer: {agent.answer(test_question)}")
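
The retrieve-then-generate flow can also be exercised offline, without a token or network access. The sketch below uses two hypothetical stand-ins, `StubRetriever` and `StubClient`, that mimic only the call shapes used in this notebook (`search` and `chat.completions.create`); they are assumptions for testing, not part of any library.

```python
from types import SimpleNamespace


class StubRetriever:
    """Hypothetical stand-in for RetrieverTool: returns a fixed context string."""

    def search(self, query: str) -> str:
        return (
            "=== Document 1 ===\n"
            "Fine-tuning starts from a pretrained checkpoint.\n"
            "Source: transformers"
        )


class StubClient:
    """Hypothetical stand-in that mimics the chat.completions.create call shape."""

    def __init__(self):
        self.last_messages = None
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages, temperature=0.7):
        # Record the prompt so it can be inspected, then return a canned reply.
        self.last_messages = messages
        message = SimpleNamespace(content="stubbed answer")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def answer_with_context(client, retriever, query: str) -> str:
    """Retrieve context, build the prompt, and return the model's reply."""
    context = retriever.search(query)
    messages = [
        {"role": "system", "content": "Use the provided context to answer."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]
    response = client.chat.completions.create(
        model="stub-model", messages=messages, temperature=0.0
    )
    return response.choices[0].message.content


stub_client = StubClient()
reply = answer_with_context(stub_client, StubRetriever(), "How do I fine-tune?")
print(reply)
print(stub_client.last_messages[1]["content"])
```

Inspecting `last_messages` shows exactly what the model would receive, which helps when debugging prompt formatting separately from model behavior.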

## 5. Advanced RAG Features: HyDE and Query Refinement

Let's enhance our RAG system with two advanced techniques:
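1. Hypothetical Document Embeddings (HyDE): Generate a hypothetical answer first and retrieve with it instead of the raw question. Because our retriever is BM25, the hypothetical text serves as an expanded keyword query rather than being embedded.
2. Self-Query Refinement: Ask the model to propose a refined query based on the initial results, then retrieve again with it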

In [None]:
class AdvancedRAGAgent(RAGAgent):
    """Enhanced RAG agent with HyDE and query refinement capabilities."""
    
    def _generate_hypothetical_document(self, query: str) -> str:
        """Generate a hypothetical document that might contain the answer (HyDE)."""
        messages = [
            {"role": "system", "content": "Given a question, write a short technical document that would contain its answer."},
            {"role": "user", "content": query}
        ]
        
        response = self.client.chat.completions.create(
            model="HuggingFaceH4/zephyr-7b-beta",
            messages=messages,
            temperature=0.7
        )
        
        return response.choices[0].message.content
    
    def _refine_query(self, query: str, initial_results: str) -> str:
        """Analyze initial results and generate a refined query."""
        messages = [
            {"role": "system", "content": "Analyze the search results and suggest a refined search query to find more relevant information. Reply with the refined query only, on a single line."},
            {"role": "user", "content": f"Original query: {query}\n\nInitial results:\n{initial_results}"}
        ]
        
        response = self.client.chat.completions.create(
            model="HuggingFaceH4/zephyr-7b-beta",
            messages=messages,
            temperature=0.7
        )
        
        return response.choices[0].message.content
    
    def answer(self, query: str) -> str:
        """Enhanced answer method using HyDE and query refinement."""
        # Step 1: Generate hypothetical document
        hyde_doc = self._generate_hypothetical_document(query)
        
        # Step 2: Use hypothetical document to get initial results
        initial_results = self.retriever.search(hyde_doc)
        
        # Step 3: Refine the query based on initial results
        refined_query = self._refine_query(query, initial_results)
        
        # Step 4: Get additional results with refined query
        additional_results = self.retriever.search(refined_query)
        
        # Step 5: Combine all context
        combined_context = f"Initial results:\n{initial_results}\n\nAdditional results:\n{additional_results}"
        
        # Step 6: Generate final answer
        messages = self._format_prompt(query, combined_context)
        response = self.client.chat.completions.create(
            model="HuggingFaceH4/zephyr-7b-beta",
            messages=messages,
            temperature=0.7
        )
        
        return response.choices[0].message.content

# Initialize the advanced RAG agent
advanced_agent = AdvancedRAGAgent(client, retriever)

# Test the advanced agent with a complex question
complex_question = "What are the trade-offs between different attention mechanisms in transformers?"
print(f"Question: {complex_question}\n")
print(f"Answer: {advanced_agent.answer(complex_question)}")
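
Two practical details are worth handling when combining retrieval passes: the same chunk can come back from both searches, and a model's reply to the refinement prompt may contain more than the query itself. The sketch below shows one way to address both. `merge_unique` and `extract_query` are hypothetical helpers, and `merge_unique` assumes you have the raw chunk texts (for example each `doc.page_content`) rather than the formatted string that `search` returns.

```python
def merge_unique(*result_lists: list[str]) -> list[str]:
    """Merge chunk texts from several retrieval passes, dropping repeats."""
    seen = set()
    merged = []
    for results in result_lists:
        for chunk in results:
            key = " ".join(chunk.split())  # Normalize whitespace before comparing.
            if key not in seen:
                seen.add(key)
                merged.append(chunk)
    return merged


def extract_query(raw_reply: str, fallback: str) -> str:
    """Take the first non-empty line of a model reply as the search query."""
    for line in raw_reply.splitlines():
        cleaned = line.strip().strip('"').strip()
        if cleaned:
            return cleaned
    return fallback  # The reply was empty, so keep the original query.


initial = ["Attention is O(n^2).", "Sparse attention reduces cost."]
additional = ["Sparse attention  reduces cost.", "Linear attention approximates softmax."]
merged = merge_unique(initial, additional)
print(len(merged))

raw = '\n"sparse vs linear attention cost"\nThis query targets efficiency trade-offs.'
print(extract_query(raw, fallback="attention trade-offs"))
```

Deduplicating keeps the final prompt shorter, and falling back to the original query avoids searching with an empty string when the model returns nothing usable.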

## Conclusion

In this notebook, we've built an advanced RAG system that:
1. Uses Instantly to interface with powerful language models
2. Retrieves documentation chunks with BM25 keyword search
3. Improves retrieval using HyDE and query refinement
4. Passes source-labeled context to the model so answers can be traced back to the documentation

Try experimenting with different:
- Language models (via Instantly's Hugging Face integration)
- Retrieval methods (e.g., embedding-based search)
- Prompt strategies
- Document processing approaches

Grounding the model's answers in retrieved documentation makes them easier to verify than answers produced from the model's parameters alone.