<img src="https://opea.dev/wp-content/uploads/sites/9/2024/04/opea-horizontal-color.svg" alt="OPEA Logo">

<div style="text-align: center; margin: 20px 0;">
<a href="https://colab.research.google.com/github/opea-project/Course-Material/blob/main/Curriculas/Kubernetes/AgentQnA/2_Customize_rag_agent.ipynb" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
</div>

> **💡 Quick Start:** Click the button above to open this notebook directly in Google Colab - no local setup required!

# <span style="color: #2E86AB; font-weight: bold;">🛠️ Customize Your RAG Agent</span>

## <span style="color: #A23B72; font-size: 1.2em;">⚡ Advanced Agent Customization with </span><span style="color: #F18F01; font-size: 1.2em;">OPEA AgentQnA</span>

---

### <span style="color: #C73E1D;">🎯 What You'll Learn</span>

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 15px; border-radius: 10px; color: white; margin: 10px 0;">
<strong>🚀 By the end of this tutorial, you'll be able to:</strong>
<ul>
<li>🧹 <strong>Clean</strong> - Reset and manage your RAG knowledge database</li>
<li>📚 <strong>Ingest</strong> - Add new knowledge sources to your agents</li>
<li>⚙️ <strong>Configure</strong> - Customize agent tools and capabilities</li>
<li>🧠 <strong>Specialize</strong> - Create domain-specific knowledge agents</li>
<li>🧪 <strong>Test</strong> - Validate your customized agent behaviors</li>
</ul>
</div>

# Prerequisites

This notebook assumes you have successfully completed **Notebook 1: Deploy AgentQnA** and have:

## 1. **Running AgentQnA Deployment**
   - All AgentQnA pods are running in your Kubernetes cluster
   - Supervisor, RAG Agent, and SQL Agent are operational
   - Port forwarding configured for accessing services

## 2. **Required Tools Available**
   - **`kubectl`** - Configured and connected to your cluster
   - **`curl`** and **`jq`** - For testing API endpoints
   - **`docker`** - If using KIND cluster for file mounting

## 3. **Agent Configuration Files**
   - Agent tools and configuration files should be accessible
   - `/mnt/tools/` directory mounted in your cluster (for KIND)

## 4. **Network Access**
   - Port forwarding capabilities for services (ports 9090, 6007)
   - Internet access for downloading new knowledge sources

---

# 1. Introduction: Why Customize Your RAG Agent?


## 1.1 🧠 From Generic to Specialized: The Power of Domain Knowledge

In [Notebook 1](./1_Deploy_AgentQnA.ipynb), we deployed a general-purpose AgentQnA system with basic music-related knowledge. While this demonstrates the core capabilities, **real-world applications require specialized knowledge bases** tailored to specific domains or use cases.

> **Think of it this way:**
> - 🏥 **Healthcare Agent**: Needs medical literature, drug databases, and clinical guidelines  
> - 💼 **Enterprise Agent**: Requires company policies, internal documentation, and workflows  
> - 🎓 **Education Agent**: Benefits from course materials, research papers, and learning resources  
> - 🏛️ **Legal Agent**: Must access legal precedents, regulations, and case studies

### 🔑 Key Benefits of Customization:

| Benefit | Description | Impact |
|---------|-------------|---------|
| **🎯 Relevance** | Domain-specific knowledge improves answer accuracy | Higher user satisfaction |
| **🚀 Performance** | Focused knowledge base reduces noise in retrieval | Faster, more precise responses |
| **🔒 Control** | Manage exactly what information your agent can access | Better security and compliance |
| **📈 Scalability** | Update knowledge without redeploying the entire system | Faster iteration cycles |

---

## 1.2 🛠️ What We'll Customize Today

In this notebook, we'll transform our music-focused RAG agent into a **1960s counterculture specialist** by:

1. **🧹 Cleaning the existing knowledge base** - Remove the music data that ships with the blueprint
2. **📚 Ingesting new specialized content into the RAG Agent** - Add Summer of Love information  
3. **⚙️ Updating agent tools** - Modify tool descriptions for new domain
4. **🧪 Testing specialized behaviors** - Validate domain expertise

This process demonstrates how to adapt any OPEA agent for your specific use case.

---


# 2. Setup and Initialization

In [None]:
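# Point kubectl at the kubeconfig file for your cluster.
# `!export` only changes a throwaway subshell, so use %env to set the
# variable for the whole notebook session (later `!kubectl` calls inherit it).

%env KUBECONFIG=/path/to/your/kubeconfig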

In [None]:
# Verify kubectl connection and cluster info
!kubectl cluster-info
!kubectl get nodes

In [None]:
# First, let's ensure our Supervisor agent is accessible
# This assumes you have completed Notebook 1 and have AgentQnA running

import subprocess
import socket
import time

def is_port_open(host='localhost', port=9090):
    """Check if port is already open"""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        try:
            sock.connect((host, port))
            return True
        except OSError:
            return False

if is_port_open():
    print("✅ Port-forward already running on localhost:9090")
else:
    print("🔁 Starting port-forward to Supervisor agent...")
    subprocess.Popen(
        ["kubectl", "port-forward", "svc/agentqna-supervisor", "9090:9090"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL
    )
    time.sleep(2)
    if is_port_open():
        print("✅ Supervisor agent now accessible on localhost:9090")
    else:
        print("❌ Failed to start port-forward")
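
The cell above waits a fixed two seconds before checking the port, which can report a failure on a slow cluster even though the port-forward comes up a moment later. A minimal sketch of a polling alternative follows. The helper name `wait_for_port` and its default timeout and interval are illustrative assumptions, not part of the original notebook.

```python
import socket
import time


def wait_for_port(host="localhost", port=9090, timeout=15.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (refused, timed out, unreachable)
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

After starting `kubectl port-forward` with `subprocess.Popen`, calling `wait_for_port(port=9090)` in place of `time.sleep(2)` followed by a single check gives the tunnel up to 15 seconds to become reachable.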

# 3. Clean and Reset the Knowledge Base



## 3.1 🧹 Understanding Knowledge Base Management

Before adding new specialized knowledge, we need to clear the knowledge currently available to the RAG Agent. This ensures:

- **🎯 Focused Results**: Removes irrelevant information that could interfere with new domain queries
- **🚀 Better Performance**: Smaller, focused knowledge bases improve retrieval speed and accuracy  
- **🧠 Cleaner Reasoning**: Agents can focus on the most relevant information for their specialized domain

> **💡 Best Practice**: Always start with a clean knowledge base when specializing an agent for a new domain

## 3.2 Access the Data Preparation Service

The Data Prep microservice is part of the [DocIndexRetriever blueprint](https://github.com/opea-project/GenAIExamples/tree/main/DocIndexRetriever). It handles all knowledge base operations - adding, cleaning, and deleting documents.

- Validate that the service is accessible from outside the cluster

In [None]:
# Make data prep microservice available outside of the cluster
# This service manages our knowledge base (vector database)

import subprocess
import socket
import time

def is_port_open(host='localhost', port=6007):
    """Check if data prep port is already open"""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        try:
            sock.connect((host, port))
            return True
        except OSError:
            return False

if is_port_open(port=6007):
    print("✅ Data Prep service already accessible on localhost:6007")
else:
    print("🔁 Starting port-forward to Data Prep service...")
    subprocess.Popen(
        ["kubectl", "port-forward", "svc/agentqna-data-prep", "6007:6007"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL
    )
    time.sleep(2)
    if is_port_open(port=6007):
        print("✅ Data Prep service now accessible on localhost:6007")
    else:
        print("❌ Failed to start port-forward for Data Prep service")

In [None]:
# Get files present in the knowledge base (vector database). Files were ingested in notebook 1
# This shows us what documents are currently indexed

print("📋 Current files in knowledge base:")
!curl -X POST "http://localhost:6007/v1/dataprep/get" \
    -H "Content-Type: application/json"
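
If you would rather inspect the listing as a Python object than read raw `curl` output, the same request can be sent with the standard library. This is a sketch under assumptions: the helper names are hypothetical, and the shape of the JSON returned by the Data Prep service is not documented in this notebook, so the parsed value is returned as-is.

```python
import json
import urllib.request

# Same base URL as the curl cell above (requires the port-forward on 6007)
DATAPREP_URL = "http://localhost:6007/v1/dataprep"


def build_get_request(base_url=DATAPREP_URL):
    """Build the POST request that asks Data Prep for its indexed files."""
    return urllib.request.Request(
        f"{base_url}/get",
        data=b"",  # empty body, matching `curl -X POST` with no -d
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_indexed_files(base_url=DATAPREP_URL, timeout=30):
    """Send the request and return the parsed JSON response.

    The response structure is an assumption left uninterpreted here.
    """
    with urllib.request.urlopen(build_get_request(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `list_indexed_files()` raises `urllib.error.URLError` if the port-forward is not running, which makes a missing tunnel easier to spot than an empty `curl` output.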

## 3.3 Clear the Knowledge Base

Now let's remove all existing documents to prepare for our new specialized content:


In [None]:
# Delete all documents from the knowledge base
# Using "all" removes everything - perfect for starting fresh

print("🧹 Clearing knowledge base...")
!curl -X POST "http://localhost:6007/v1/dataprep/delete" \
    -d '{"file_path": "all"}' \
    -H "Content-Type: application/json"

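# List the files again; the music documents should no longer appear
!curl -X POST "http://localhost:6007/v1/dataprep/get" \
    -H "Content-Type: application/json"

print("\n✅ Delete request sent. Confirm that the listing above is empty before adding new content.")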



# 4. Add Specialized Knowledge



## 4.1 🌻 Transforming to a 1960s Counterculture Expert

Now we'll ingest specialized content about the **Summer of Love (1967)** - a pivotal moment in 1960s counterculture. This demonstrates how to:

- **🎯 Focus your agent** on a specific historical period and cultural movement
- **📚 Add authoritative sources** from reputable knowledge repositories  
- **🧠 Create domain expertise** that wasn't present in the original deployment

### Why the Summer of Love?

The Summer of Love represents a fascinating case study in:
- **🎵 Cultural Revolution**: Music, art, and social movements intersecting
- **🏛️ Historical Significance**: Impact on society, politics, and culture
- **🌐 Rich Documentation**: Well-documented period with multiple perspectives

## 4.2 Ingest New Knowledge Source

In [None]:
# Ingest specialized knowledge about the Summer of Love from Britannica
# This authoritative source will become our agent's primary knowledge base

print("📚 Ingesting Summer of Love knowledge from Britannica...")
print("🔄 This may take a few moments as the content is processed and vectorized...")

!curl -X POST "http://localhost:6007/v1/dataprep/ingest" \
    -H "Content-Type: multipart/form-data" \
    -F 'link_list=["https://www.britannica.com/event/Summer-of-Love-1967"]'

print("\n✅ Knowledge ingestion complete! Your agent now specializes in 1960s counterculture.")
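
The `link_list` form field in the request above is a JSON array, so more than one URL can presumably be sent in a single call. Quoting that array by hand inside a shell string is error-prone, so the sketch below builds the field value and the matching `curl` argument list in Python. The helper names are hypothetical, and the endpoint default simply repeats the one used in the cell above.

```python
import json


def build_link_list_field(urls):
    """Return the value of the `link_list` form field: a JSON array of URLs."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    for u in cleaned:
        if not u.startswith(("http://", "https://")):
            raise ValueError(f"not an http(s) URL: {u!r}")
    return json.dumps(cleaned)


def build_ingest_command(urls, endpoint="http://localhost:6007/v1/dataprep/ingest"):
    """Return the curl argument list equivalent to the ingest cell above."""
    return [
        "curl", "-X", "POST", endpoint,
        "-H", "Content-Type: multipart/form-data",
        "-F", f"link_list={build_link_list_field(urls)}",
    ]
```

Passing the result to `subprocess.run(build_ingest_command([...]))` avoids shell quoting entirely, because each argument is handed to `curl` as a separate list element.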

# 5. Update Agent Tool Configuration



## 5.1 🔧 Why Update Tool Descriptions?

Agent tools have **descriptions** that guide the Supervisor agent in deciding when to use them. By updating these descriptions, we can:

- **🎯 Improve Routing**: Help the supervisor understand the new domain expertise
- **🧠 Enhance Reasoning**: Provide context about what the RAG agent now knows
- **⚡ Optimize Performance**: Ensure questions get routed to the right specialized agent

## 5.2 Modify the Tool Configuration

We need to update the `supervisor_agent_tools.yaml` file to reflect our agent's new specialization:

### 📝 Tool Configuration Overview

You can edit the `supervisor_agent_tools.yaml` file in `/mnt/tools/` to update tool descriptions. The configuration file tells the Supervisor agent what each tool does and when to use it.

**Replace the `search_knowledge_base` block with the following configuration for our Summer of Love specialization:**

```yaml
search_knowledge_base:
  description: Search a knowledge base for getting more information about 1960s Counterculture for a given query. Returns text related to the query.
  callable_api: tools.py:search_knowledge_base
  args_schema:
    query:
      type: str
      description: query
  return_output: retrieved_data

```

Save the file, then run the next cell to copy it to the KIND control-plane node.
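
A typo in the tool file only surfaces after the Supervisor restarts, so a quick structural check before copying can save a round trip. The sketch below checks one tool entry that has already been loaded into a Python dict. The function name is hypothetical, and the list of expected keys is taken from the block above; whether the agent strictly requires every one of them is an assumption.

```python
# Keys that appear in the search_knowledge_base block above.
# Treating all of them as mandatory is an assumption of this sketch.
EXPECTED_KEYS = ("description", "callable_api", "args_schema", "return_output")


def check_tool_entry(name, entry):
    """Return a list of problems found in one tool entry (empty list means OK)."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in entry:
            problems.append(f"{name}: missing '{key}'")
    description = entry.get("description", "")
    if not isinstance(description, str) or not description.strip():
        problems.append(f"{name}: description is empty")
    # Each argument in args_schema should at least declare a type
    for arg, spec in (entry.get("args_schema") or {}).items():
        if "type" not in spec:
            problems.append(f"{name}: argument '{arg}' has no type")
    return problems
```

If PyYAML is installed in your environment, `yaml.safe_load` on the contents of `supervisor_agent_tools.yaml` should yield a dict mapping tool names to entries of this shape, and each pair can be passed to `check_tool_entry`.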

In [None]:
# Deploy updated agent tools and configuration
# This copies our customized tool configuration to the cluster

import os

WORKDIR = os.getcwd()
TOOLS_DIR = os.path.join(WORKDIR, "mnt/tools")

print("🔧 Deploying updated agent configuration...")

# Ensure the tools directory exists in the cluster
if os.system("docker exec kind-control-plane mkdir -p /mnt/tools") == 0:
    print("✅ Verified /mnt/tools directory exists")
else:
    print("❌ Failed to create directory")

# Copy updated tools and configuration to the cluster
if os.system(f"docker cp {TOOLS_DIR}/. kind-control-plane:/mnt/tools/") == 0:
    print("✅ Updated tools deployed to cluster")
    print("📝 Tool descriptions now reflect Summer of Love specialization")
else:
    print("❌ Failed to deploy updated tools")


In [None]:
# Restart the Supervisor agent to load the updated configuration
# This ensures the agent picks up our new tool descriptions

print("🔄 Restarting Supervisor agent to apply changes...")
!kubectl rollout restart deployment/agentqna-supervisor

print("⏳ Waiting for rollout to complete...")
!kubectl rollout status deployment/agentqna-supervisor --timeout=60s

print("✅ Supervisor agent restarted with updated configuration!")
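
The cell above prints the success message even if `kubectl rollout status` times out, because `!` commands do not stop the cell on failure. A sketch of a variant that checks exit codes follows. `run_step` and `restart_supervisor` are hypothetical helper names, while the `kubectl` arguments are the same ones used above.

```python
import subprocess


def run_step(cmd, timeout=120):
    """Run a command and return (ok, output); ok is True only on exit code 0."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # Binary not on PATH, or the command ran longer than `timeout`
        return False, str(exc)
    return result.returncode == 0, (result.stdout + result.stderr).strip()


def restart_supervisor(deployment="deployment/agentqna-supervisor"):
    """Restart the deployment and report success only if the rollout completes."""
    ok, out = run_step(["kubectl", "rollout", "restart", deployment])
    if ok:
        ok, out = run_step(
            ["kubectl", "rollout", "status", deployment, "--timeout=60s"],
            timeout=90,
        )
    print(("✅ " if ok else "❌ ") + out)
    return ok
```

`restart_supervisor()` returns `False` when either step fails, so a later cell can stop early instead of testing against a Supervisor that never picked up the new tool descriptions.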

# 6. Test Your Customized Agent



## 6.1 Validating Specialization

Now let's test our customized agent to ensure it:
- **🎯 Understands its new domain** (Summer of Love & 1960s counterculture)
- **🧠 Retrieves relevant information** from the specialized knowledge base
- **⚡ Routes questions appropriately** through the Supervisor agent

### Expected Behavior:
- Questions about 1960s counterculture should route to our specialized RAG agent
- The agent should provide detailed, accurate information about the Summer of Love
- Responses should be more focused and relevant than generic answers

## 6.2 Setup Port Forwarding for Testing

In [None]:
# Enable port forwarding to the Supervisor agent (skipped if it is already running)
import subprocess
import socket
import time

def is_port_open(host='localhost', port=9090):
    """Check if port is already open"""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        try:
            sock.connect((host, port))
            return True
        except OSError:
            return False

if is_port_open():
    print("✅ Port-forward already running on localhost:9090")
else:
    print("🔁 Starting port-forward to Supervisor agent...")
    subprocess.Popen(
        ["kubectl", "port-forward", "svc/agentqna-supervisor", "9090:9090"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL
    )
    time.sleep(2)
    if is_port_open():
        print("✅ Supervisor agent now accessible on localhost:9090")
    else:
        print("❌ Failed to start port-forward")

In [None]:
!curl -s http://localhost:9090/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"role": "user", "messages": "How did the Fillmore Auditorium and Avalon Ballroom contribute to the music scene?", "thread_id": "18364", "stream": false}' \
  | jq -r '.text'

Check the logs of the RAG agent and the Supervisor. They show that the Supervisor recognizes the question as one the RAG agent could answer, but the search returns documents that are not closely related to it, so the Supervisor discards them and answers from its own knowledge.

In [None]:
!kubectl logs -l app.kubernetes.io/name=ragagent --tail=30

In [None]:
!kubectl logs -l app.kubernetes.io/name=supervisor --tail=30

Now let's try a different question:

In [None]:
!curl -s http://localhost:9090/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"role": "user", "messages": "How did the 1967 gatherings in San Francisco’s Haight-Ashbury district reflect the rise of countercultural values in the United States?", "thread_id": "18364", "stream": false}' \
  | jq -r '.text'

In [None]:
!kubectl logs -l app.kubernetes.io/name=ragagent --tail=30

In [None]:
!kubectl logs -l app.kubernetes.io/name=supervisor --tail=30
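
To try several questions without editing the `curl` command each time, the same request can be wrapped in a small Python function. This is a sketch under assumptions: the helper names are hypothetical, the payload fields mirror the `curl` body above, and the answer is read from the `text` field because that is what the `jq` filter above extracts.

```python
import json
import urllib.request

# Same endpoint as the curl cells above (requires the port-forward on 9090)
SUPERVISOR_URL = "http://localhost:9090/v1/chat/completions"


def build_chat_payload(question, thread_id="18364"):
    """Build the request body used by the curl cells above."""
    return {
        "role": "user",
        "messages": question,
        "thread_id": thread_id,
        "stream": False,
    }


def ask_supervisor(question, thread_id="18364", url=SUPERVISOR_URL, timeout=300):
    """POST a question to the Supervisor and return the `text` field of the reply."""
    body = json.dumps(build_chat_payload(question, thread_id)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    return reply.get("text")
```

Because the body is produced by `json.dumps`, questions that contain quotes or apostrophes need no shell escaping.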

# 🎉 Congratulations!

You have successfully customized your OPEA AgentQnA system! You now have:

✅ **A specialized knowledge base** focused on 1960s counterculture and the Summer of Love  
✅ **Updated tool configurations** that guide the Supervisor agent appropriately  
✅ **Domain-specific expertise** that provides more relevant and accurate responses  
✅ **A repeatable customization workflow** that can be applied to any domain

### 🔄 What You've Learned

This notebook demonstrated the complete customization workflow:

1. **🧹 Knowledge Base Management** - How to clean and reset your agent's knowledge
2. **📚 Content Ingestion** - Adding specialized domain knowledge from authoritative sources
3. **⚙️ Tool Configuration** - Updating agent descriptions to improve routing decisions
4. **🧪 Validation Testing** - Ensuring your customized agent behaves as expected

### 🚀 Next Steps

Now that you understand the customization process, you can:

- **🏥 Create healthcare agents** with medical literature and clinical guidelines
- **💼 Build enterprise agents** with company policies and internal documentation  
- **🎓 Develop educational agents** with course materials and research papers
- **🏛️ Design legal agents** with regulations and case studies

### 💡 Key Takeaways

- **Domain specialization** dramatically improves agent relevance and accuracy
- **Tool descriptions** are crucial for proper question routing in multi-agent systems
- **Knowledge base management** allows for rapid iteration and domain switching
- **OPEA's modular architecture** makes customization straightforward and scalable

---

**Ready to build your next specialized agent?**
