# 🤖 Agentic RAG Tutorial

Welcome! In this notebook, you'll learn how to build an **Agentic Retrieval-Augmented Generation (RAG)** system.

## 📚 What You'll Learn
- How to set up a vector store for document retrieval
- How to create a retrieval tool
- How to build an AI agent that can search documents intelligently
- How to integrate everything into an interactive chatbot

---

## 📦 Step 1: Import Required Libraries

First, let's import all the tools we need for our RAG system.

In [1]:
import os
from pathlib import Path

from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_core.messages.base import BaseMessage
from langchain.messages import HumanMessage, AIMessage
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from langfuse import Langfuse
from qdrant_client import QdrantClient
from typing import Optional
import gradio as gr

## ⚙️ Step 2: Configure Your Model

Here we set up the key parameters for our AI model. **Don't change these values!**

In [3]:
# 🎯 Model Configuration (DO NOT MODIFY)
MODEL_NAME = "gpt-5-nano"
REASONING_EFFORT = "minimal"  # "minimal" | "low" | "medium" | "high"
TEMPERATURE = 0
K_RETRIEVAL = 4

## 🔧 Step 3: Helper Functions

These functions help us:
- Connect to Langfuse for observability
- Load our vector database
- Format documents for display
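
The Langfuse helper in the next cell indexes `os.environ` directly, so an unset key surfaces as a bare `KeyError`. A minimal sketch of a pre-flight check, assuming the three variable names used in this notebook; `missing_env_vars` is a hypothetical helper, not part of Langfuse:

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable names taken from create_langfuse_client in this notebook.
REQUIRED_LANGFUSE_VARS = (
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_BASE_URL",
)


def missing_env_vars(
    names=REQUIRED_LANGFUSE_VARS, env: Optional[Mapping] = None
) -> list:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_env_vars(env={"LANGFUSE_PUBLIC_KEY": "pk-example"}))
```

Calling `missing_env_vars()` right after `load_dotenv()` gives a readable list of what still needs to go into the `.env` file.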

In [4]:
def create_langfuse_client() -> Langfuse:
    """Initialize Langfuse for observability tracking."""
    load_dotenv()
    return Langfuse(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host=os.environ["LANGFUSE_BASE_URL"],
    )


def load_vector_store(path_to_vector_store: Optional[Path] = None):
    """Load the vector database containing FAQ documents."""
    sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")
    load_dotenv()

    this_dir = Path.cwd().parent
    path_to_vector_store = path_to_vector_store or this_dir / "vector_store"

    print(f"📂 Loading vector store from: {path_to_vector_store}")

    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

    # Remove lock file if exists
    lock_file = path_to_vector_store / ".lock"
    if lock_file.exists():
        os.remove(lock_file)

    client = QdrantClient(path=path_to_vector_store)
    return QdrantVectorStore(
        client=client,
        collection_name="faq_collection",
        embedding=embeddings,
        sparse_embedding=sparse_embeddings,
        vector_name="dense",
        sparse_vector_name="sparse",
        retrieval_mode=RetrievalMode.HYBRID,
    )


def format_docs_alternative(docs: list[Document]):
    """Format retrieved documents for the prompt."""
    formatted = []
    for i, doc in enumerate(docs, 1):
        formatted.append(
            f"Document {i}:\n{doc.metadata['faq_body']}\nSource: {doc.metadata['faq_id']}"
        )
    return "\n\n".join(formatted)
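
To see the string layout that `format_docs_alternative` produces without loading the vector store, the sketch below reruns the same formatting logic on two hand-made documents. `FakeDoc` is a stand-in for `langchain_core.documents.Document`, `format_docs` is a hypothetical copy of the helper, and the FAQ texts and IDs are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for Document: only the `metadata` attribute is used here."""

    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Same layout as format_docs_alternative: body, then source ID, per document."""
    return "\n\n".join(
        f"Document {i}:\n{doc.metadata['faq_body']}\nSource: {doc.metadata['faq_id']}"
        for i, doc in enumerate(docs, 1)
    )


# Invented FAQ entries, for illustration only.
docs = [
    FakeDoc(metadata={"faq_body": "Open the app and tap 'Post an ad'.", "faq_id": "faq-001"}),
    FakeDoc(metadata={"faq_body": "Use the message button on the listing.", "faq_id": "faq-002"}),
]
formatted_example = format_docs(docs)
print(formatted_example)
```

Because the function indexes `metadata` directly, a document without `faq_body` or `faq_id` raises `KeyError` rather than being skipped.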

---

## 🎯 YOUR TASK: Create a Retrieval Tool (adapt the earlier retrieval code for use with LangChain agents)

### 📖 Instructions

Your goal is to implement a `retrieve_context` tool that:
1. Takes a user query as input
2. Searches the vector store for relevant documents
3. Returns formatted results

### 📚 Documentation
Refer to: [LangChain RAG Documentation](https://docs.langchain.com/oss/python/langchain/rag)

### 💡 Hint
Use the `@tool` decorator with `response_format="content_and_artifact"`
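
With `response_format="content_and_artifact"`, the decorated function is expected to return a two-tuple: a string that is sent to the model, and an artifact (here, the raw documents) that stays attached to the tool message for downstream code. The sketch below shows only that return shape, using a toy keyword search over an in-memory list instead of the vector store. `TOY_FAQS` and `toy_retrieve` are hypothetical names, and the FAQ texts are invented.

```python
import re

# Invented FAQ entries standing in for the vector store contents.
TOY_FAQS = [
    {"faq_id": "faq-001", "faq_body": "To post an ad, open the app and tap 'Post an ad'."},
    {"faq_id": "faq-002", "faq_body": "To contact a seller, use the message button."},
    {"faq_id": "faq-003", "faq_body": "Your messages are listed in the inbox tab."},
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def toy_retrieve(query: str, k: int = 2):
    """Return (content, artifact): a formatted string plus the raw matches."""
    query_words = tokenize(query)
    # Score each FAQ by the number of word tokens it shares with the query.
    scored = [(len(query_words & tokenize(faq["faq_body"])), faq) for faq in TOY_FAQS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    matches = [faq for score, faq in ranked if score > 0][:k]
    content = "\n\n".join(
        f"Document {i}:\n{faq['faq_body']}\nSource: {faq['faq_id']}"
        for i, faq in enumerate(matches, 1)
    )
    return content, matches


content, artifact = toy_retrieve("How do I post an ad?")
print(content)
```

The real tool keeps the same shape: the first element is what the model reads, and the second is what your own code can inspect later.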

In [5]:
from langchain.tools import tool

load_dotenv()

# 📂 Set up paths
notebook_dir = Path.cwd()
PATH_TO_VECTOR_STORE = notebook_dir.parent / "5_Evaluation" / "vector_store"

vector_store = load_vector_store(path_to_vector_store=PATH_TO_VECTOR_STORE)


@tool(response_format="content_and_artifact")
def retrieve_context(query: str, k: int = K_RETRIEVAL):
    """Retrieve information to help answer a query."""
    retrieved_docs = vector_store.similarity_search(query, k=k)
    serialized = format_docs_alternative(retrieved_docs)
    return serialized, retrieved_docs

📂 Loading vector store from: /Users/aniszakari/Documents/ai-training-for-devs/5_Evaluation/vector_store


### 🧪 Test Your Retrieval Tool

Let's verify that the retrieval tool works correctly!

In [None]:
# Test the retrieval logic. The @tool decorator wraps the function in a tool object,
# so call the original function through `.func` to get both return values.
serialized, retrieved_docs = retrieve_context.func(query="How do I post an ad?")
print("✅ Retrieval test successful!")
print(f"Retrieved {len(retrieved_docs)} documents")

---

## 🤖 YOUR TASK: Build the Agentic RAG System

### 📖 Instructions

Now create an agent that:
1. Uses the `retrieve_context` tool
2. Follows specific instructions for answering questions
3. Only provides information from retrieved documents

### 📚 Documentation
Follow: [LangChain Agentic RAG Guide](https://docs.langchain.com/oss/python/langchain/rag)

### ✅ Solution

In [7]:
# ============================================
# SOLUTION: Create the Agent
# ============================================

from langchain.agents import create_agent

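# Use the reasoning effort configured in Step 2 (it was defined but never passed).
model = ChatOpenAI(model=MODEL_NAME, temperature=TEMPERATURE, reasoning_effort=REASONING_EFFORT)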
tools = [retrieve_context]

# Define the system prompt
prompt = """\
You are a helpful assistant answering questions about customer care for AI-Bay.

Use the context documents to answer the user's question. If the answer is not in the provided documents, say "I don't have that information in the provided documents."

Instructions:
1. Answer based ONLY on the provided documents
2. Be specific and cite which document(s) you used
3. If information is unclear or missing, say so
4. Keep answers concise but complete
5. Use a friendly, informative tone
"""

agent = create_agent(model, tools, system_prompt=prompt)
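
Conceptually, a tool-calling agent alternates between the model and its tools until the model replies without requesting a tool. The sketch below imitates that loop with a scripted stand-in for the model. It is an illustration of the idea, not how `create_agent` is implemented, and `fake_model`, `fake_retrieve`, and `run_agent_loop` are hypothetical names.

```python
def fake_retrieve(query: str) -> str:
    """Stand-in for the retrieval tool; returns a canned context string."""
    return f"Document 1:\n(placeholder context for: {query})\nSource: faq-000"


def fake_model(messages: list) -> dict:
    """Scripted model: request the tool once, then answer from the tool result."""
    if not any(m["role"] == "tool" for m in messages):
        question = messages[-1]["content"]
        return {
            "role": "assistant",
            "content": "",
            "tool_call": {"name": "retrieve_context", "args": {"query": question}},
        }
    return {"role": "assistant", "content": "Answer based on Document 1.", "tool_call": None}


def run_agent_loop(question: str, model, tools: dict, max_steps: int = 5) -> list:
    """Alternate between model and tools until the model stops requesting tools."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            return messages  # no tool requested: this reply is the final answer
        result = tools[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")


trace = run_agent_loop("Will I still get paid?", fake_model, {"retrieve_context": fake_retrieve})
print([m["role"] for m in trace])
```

The role sequence in `trace` (user, assistant with a tool call, tool, assistant) is the same pattern to look for when the real agent's messages are printed.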

---

## 🧪 Test the Agent

### Test 1: Stream Response

In [8]:
query = """The person who bought my item received a notification saying their package was available, 
but when they went to the pickup point, the order couldn't be found. Will I still get paid?"""

print("🔄 Streaming response...\n")
for event in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()

🔄 Streaming response...


The person who bought my item received a notification saying their package was available, 
but when they went to the pickup point, the order couldn't be found. Will I still get paid?
Tool Calls:
  retrieve_context (call_tKEeYf34bb1XhbH10qmsuU6I)
 Call ID: call_tKEeYf34bb1XhbH10qmsuU6I
  Args:
    query: seller asks about payment if pickup point cannot locate order; whether payment is guaranteed when notification says package available but couldn't be found at pickup point
Name: retrieve_context

Document 1:
Delivery by Colissimo  
If an issue occurs during the delivery of your parcel with Colissimo (such as tracking problems, damaged parcel or item, etc.), we invite you to contact Colissimo Customer Service to take the necessary steps.  
Learn more.  

Delivery by Mondial Relay  
If an issue occurs during the delivery of your parcel with Mondial Relay (parcel marked as delivered but unavailable, tracking stuck, etc.), please contact Mondial Relay Customer S

### Test 2: Get the Complete Response and Inspect the Output

In [None]:
output = agent.invoke({"messages": [{"role": "user", "content": query}]})
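
`agent.invoke` returns the final agent state, whose `"messages"` list holds every step of the run rather than just the answer. A minimal sketch for inspecting it, assuming LangChain's message attributes `type`, `content`, and (on tool messages) `artifact`. The `SimpleNamespace` objects stand in for real messages so the snippet runs without an API call, and `summarize_output` is a hypothetical helper.

```python
from types import SimpleNamespace


def summarize_output(output: dict) -> dict:
    """Collect the message types, the final answer, and any tool artifacts."""
    messages = output["messages"]
    retrieved = []
    for msg in messages:
        if getattr(msg, "type", None) == "tool":
            # Tool messages carry the second element returned by the tool.
            retrieved.extend(getattr(msg, "artifact", None) or [])
    return {
        "message_types": [getattr(m, "type", None) for m in messages],
        "answer": messages[-1].content,
        "retrieved_docs": retrieved,
    }


# Stand-in for a real agent output; contents are invented.
fake_output = {
    "messages": [
        SimpleNamespace(type="human", content="Will I still get paid?"),
        SimpleNamespace(type="ai", content=""),
        SimpleNamespace(type="tool", content="Document 1: ...", artifact=["doc-a", "doc-b"]),
        SimpleNamespace(type="ai", content="Final answer."),
    ]
}
summary = summarize_output(fake_output)
print(summary["message_types"], summary["answer"])
```

With the real agent, passing `output` from the cell above to `summarize_output` should show the same kind of breakdown, with `Document` objects in `retrieved_docs`.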

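Adapt the `rag_assistant_response` code below accordingly: `output` is a state dictionary, so the answer text is the `content` of the last entry in `output["messages"]`.

To add tracing with Langfuse, see the [Langfuse LangChain integration](https://langfuse.com/integrations/frameworks/langchain), which uses a `CallbackHandler`.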

---

## 🎨 Build an Interactive Chat Interface

### 🔍 Optional: Add Langfuse Tracing

Langfuse helps you monitor and debug your RAG system. Learn more: [Langfuse Integration](https://langfuse.com/integrations/frameworks/langchain)

### 💬 Create the Chat Function

In [None]:
from langfuse import get_client
from langfuse.langchain import CallbackHandler


# Initialize Langfuse CallbackHandler for Langchain (tracing)
langfuse_handler = CallbackHandler()


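def rag_assistant_response(message, history):
    """Handle chat messages from the Gradio interface.

    Note: `history` is not forwarded, so each message is answered on its own.
    """
    # Streaming alternative: replace the invoke/return below with this loop.
    # for event in agent.stream({"messages": [{"role": "user", "content": message}]}, stream_mode="values"):
    #     yield event["messages"][-1].content
    output = agent.invoke(
        {"messages": [{"role": "user", "content": message}]},
        config={"callbacks": [langfuse_handler]},
    )
    # The last message in the returned state is the agent's final answer.
    return output["messages"][-1].content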
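
The chat function above sends only the latest message, so follow-up questions lose their context. A minimal sketch of one way to forward earlier turns, assuming Gradio passes `history` as a list of `{"role", "content"}` dicts (the "messages" format); the older list-of-pairs format would need a different conversion. `build_agent_messages` and `max_turns` are hypothetical names.

```python
def build_agent_messages(message: str, history: list, max_turns: int = 6) -> list:
    """Convert chat history plus the new message into agent input messages."""
    past = [
        {"role": turn["role"], "content": turn["content"]}
        for turn in history
        # Keep plain-text user/assistant turns; skip files and other payloads.
        if turn.get("role") in {"user", "assistant"}
        and isinstance(turn.get("content"), str)
    ]
    # Keep only the most recent turns to bound the prompt size.
    recent = past[-max_turns:] if max_turns > 0 else []
    return recent + [{"role": "user", "content": message}]


example_history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(build_agent_messages("How do I post an ad?", example_history))
```

Inside `rag_assistant_response`, the returned list could replace the single-message list passed to `agent.invoke`.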

### 🚀 Launch the Interactive Chat

In [12]:
# Launch interactive chat
demo = gr.ChatInterface(
    fn=rag_assistant_response,
    title="🤖 AI-Bay Customer Support Assistant",
    description="Ask me anything about AI-Bay! Answers come from the FAQ documents, with Langfuse tracing.",
    examples=[
        "How do I post an ad?",
        "How can I contact a seller?",
        "Where can I see my messages?",
        "What are the posting guidelines?",
    ],
)

demo.launch()

* Running on local URL:  http://127.0.0.1:7865
* To create a public link, set `share=True` in `launch()`.




---

## 🎓 Next Steps

### Challenge: Integrate with Evaluation

Now that you have a working agentic RAG system, try integrating it with the evaluation framework from Exercise 5!

This will help you:
- Measure the quality of your agent's responses
- Compare different configurations
- Identify areas for improvement

### 💡 Tips
- Use the same evaluation metrics
- Test with various query types
- Monitor performance with Langfuse

---

## 📚 Additional Resources

- [LangChain RAG Documentation](https://docs.langchain.com/oss/python/langchain/rag)
- [Langfuse Tracing Guide](https://langfuse.com/integrations/frameworks/langchain)
- [Gradio Documentation](https://www.gradio.app/docs)

---

**Great job! You've built a complete agentic RAG system! 🎉**