# Simple CLM Dataset Search Agent

This notebook demonstrates a basic agent that can:
1. Search for datasets based on topics
2. Answer questions about dataset metadata
3. Maintain conversation history

## Step 1: Install Required Packages

In [None]:
# Install required packages
!pip -q install python-dotenv ipywidgets pydantic-ai fastmcp openai nest-asyncio
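You can confirm that the install worked, or record versions when reporting a problem, with the standard library's `importlib.metadata`. This is a small sketch. `installed_versions` is a hypothetical helper, and the names passed to it are the distribution names from the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as written in the pip command above
for name, ver in installed_versions(
    ["python-dotenv", "ipywidgets", "pydantic-ai", "fastmcp", "openai", "nest-asyncio"]
).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```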

## Step 2: Import Libraries

In [None]:
import asyncio
import os
import nest_asyncio
from pydantic_ai import Agent, RunContext
from fastmcp import Client
from dataclasses import dataclass
from typing import Optional
import ipywidgets as widgets
from IPython.display import display, clear_output
from datetime import datetime

# Enable nested asyncio for Jupyter notebooks
nest_asyncio.apply()
print("✓ Libraries imported successfully")
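Jupyter runs its own asyncio event loop, and a standard loop refuses to be re-entered while it is running. The sketch below reproduces that restriction using only the standard library. `answer` and `call_blocking_inside_loop` are illustrative names. Run as a plain Python script, it prints a `RuntimeError`. `nest_asyncio.apply()` patches the loop so that this kind of nested call is permitted. The chat interface in Step 7 relies on this when it calls `run_until_complete` from a button handler.

```python
import asyncio


async def answer() -> int:
    return 42


async def call_blocking_inside_loop() -> str:
    """Try to block on a coroutine from code that is already running inside an event loop."""
    loop = asyncio.get_running_loop()
    coro = answer()
    try:
        return str(loop.run_until_complete(coro))
    except RuntimeError as exc:
        coro.close()  # the coroutine was never started; close it to avoid a warning
        return f"RuntimeError: {exc}"


print(asyncio.run(call_blocking_inside_loop()))
```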

## Step 3: Set Up API Keys

You need an OpenAI API key or an NRP API key to use this agent. Only the key for the model you select below is required. Create a `.env` file containing:
```
OPENAI_API_KEY=your_openai_key

NRP_API_KEY=your_nrp_key
```

In [None]:
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Check API keys
openai_key = os.getenv('OPENAI_API_KEY')
nrp_key = os.getenv('NRP_API_KEY')

if not openai_key:
    print("⚠️ Warning: OPENAI_API_KEY not set!")
else:
    print("✓ OpenAI API key found")

if not nrp_key:
    print("⚠️ Warning: NRP_API_KEY not set!")
else:
    print("✓ NRP API key found")

# Choose your model: "openai" or "nrp"
MODEL = "openai"  # Change this to "nrp" to use Qwen3

if MODEL == "openai":
    print("✓ Using OpenAI GPT-4o-mini")
elif MODEL == "nrp":
    print("✓ Using NRP Qwen3")
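The cell above only prints a warning when a key is missing, so a run can still fail later with an authentication error. The stricter variant below uses a hypothetical `require_key` helper. It maps each `MODEL` value to the environment variable that model needs and fails immediately if that variable is empty.

```python
from collections.abc import Mapping

# Hypothetical table: which environment variable each MODEL choice needs
REQUIRED_KEY = {"openai": "OPENAI_API_KEY", "nrp": "NRP_API_KEY"}


def require_key(model: str, env: Mapping) -> str:
    """Return the API key for the chosen model, or raise a clear error if it is missing."""
    if model not in REQUIRED_KEY:
        raise ValueError(f"Unknown MODEL {model!r}; expected one of {sorted(REQUIRED_KEY)}")
    key = env.get(REQUIRED_KEY[model])
    if not key:
        raise RuntimeError(f"{REQUIRED_KEY[model]} must be set to use MODEL={model!r}")
    return key
```

In the notebook, you could call it as `require_key(MODEL, os.environ)` right after `MODEL` is set.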

## Step 4: Initialize MCP Client

The MCP (Model Context Protocol) client connects to the dataset search service.

In [None]:
# Initialize the MCP client for dataset search
mcp_client = Client("https://wenokn.fastmcp.app/mcp")
print("✓ MCP client initialized")
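Before wiring the client into the agent, it can help to see which tools the MCP server actually exposes. This is a minimal sketch that assumes the fastmcp `Client` interface used in this notebook: `async with client` opens the connection, and `list_tools()` returns objects with a `name` attribute. `list_tool_names` is a hypothetical helper.

```python
from typing import List


async def list_tool_names(client) -> List[str]:
    """Connect with a fastmcp-style client and return the names of the tools it advertises."""
    async with client:  # opens the connection and closes it when the block exits
        tools = await client.list_tools()
    return [tool.name for tool in tools]
```

In a notebook cell, `await list_tool_names(mcp_client)` should list the server's tools. The list would be expected to include `search_datasets`, since the agent tool in Step 5 calls the server by that name.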

## Step 5: Create the Agent

This is the core of our application. The agent:
- Understands natural language questions
- Uses the search_datasets tool to find relevant datasets
- Reads the conversation history that the Step 6 wrapper prepends to each question
- Can answer follow-up questions about dataset metadata

In [None]:
@dataclass
class AgentContext:
    """Stores information about the current dataset being discussed"""
    current_dataset: Optional[dict] = None

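def get_model_config(model_name: str = "openai"):
    """Return the pydantic-ai model string for the chosen provider.

    NRP serves an OpenAI-compatible API, so both options use the OpenAI client;
    only the base URL and API key environment variables differ.
    """
    if model_name == "nrp":
        # Point the OpenAI client at the NRP endpoint and use the NRP key
        os.environ['OPENAI_BASE_URL'] = 'https://ellm.nrp-nautilus.io/v1'
        os.environ['OPENAI_API_KEY'] = os.getenv('NRP_API_KEY', '')
        return 'openai:qwen3'
    else:
        # Use the default OpenAI endpoint and restore the original OpenAI key,
        # in case an earlier NRP run overwrote OPENAI_API_KEY
        os.environ.pop('OPENAI_BASE_URL', None)
        if openai_key:
            os.environ['OPENAI_API_KEY'] = openai_key
        return 'openai:gpt-4o-mini'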

# Create the agent with selected model
agent = Agent(
    model=get_model_config(MODEL),
    deps_type=AgentContext,
    system_prompt="""You are a helpful assistant that helps users find and learn about California Landscape Metrics datasets.

You have access to a search_datasets tool that can find relevant datasets based on user queries.

When a user asks about a topic:
1. Use the search_datasets tool to find the most relevant dataset
2. Present the top result with key information (title, description, units)
3. Answer any follow-up questions about the dataset metadata

The conversation history is provided in this format:
User: <question1>
Assistant: <answer1>
User: <question2>

Use this history to understand context for follow-up questions.

Be concise and helpful!"""
)

@agent.tool
async def search_datasets(
    ctx: RunContext[AgentContext],
    query: str,
    top_k: int = 3
) -> dict:
    """Search for datasets related to the query.
    
    Args:
        query: The search query (e.g., 'carbon turnover', 'burn probability')
        top_k: Number of results to return (default: 3)
    
    Returns:
        Dictionary with search results and metadata
    """
    async with mcp_client:
        result = await mcp_client.call_tool(
            "search_datasets",
            {"query": query, "top_k": top_k}
        )
        
        data = result.data
        if data.get('success') and data.get('datasets'):
            # Store the top dataset in context for follow-up questions
            best_dataset = data['datasets'][0]
            ctx.deps.current_dataset = best_dataset
            
            return {
                'success': True,
                'top_dataset': best_dataset,
                'alternatives': data['datasets'][1:] if len(data['datasets']) > 1 else [],
                'message': f"Found: {best_dataset['title']}"
            }
        else:
            return {
                'success': False,
                'message': 'No datasets found',
                'error': data.get('error', 'Unknown error')
            }

print(f"✓ Agent created successfully with {MODEL}!")
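The post-processing inside `search_datasets` is ordinary dictionary handling, so it can be pulled out and checked without calling the MCP server. The sketch below makes the same assumption as the tool code: the server's response is a dict with `success`, `datasets`, and optionally `error` keys. `summarize_search` and the sample response are hypothetical.

```python
from typing import Any, Dict


def summarize_search(data: Dict[str, Any]) -> Dict[str, Any]:
    """Shape a raw search_datasets response the same way the agent tool does."""
    datasets = data.get("datasets") or []
    if data.get("success") and datasets:
        best = datasets[0]
        return {
            "success": True,
            "top_dataset": best,
            "alternatives": datasets[1:],  # empty list when only one result came back
            "message": f"Found: {best['title']}",
        }
    return {
        "success": False,
        "message": "No datasets found",
        "error": data.get("error", "Unknown error"),
    }


# Made-up response, only to exercise the function
sample = {"success": True, "datasets": [{"title": "Dataset A"}, {"title": "Dataset B"}]}
print(summarize_search(sample)["message"])  # Found: Dataset A
```

Because `datasets[1:]` already returns an empty list when there is only one result, no length check is needed for `alternatives`.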

## Step 6: Add Conversation History Management

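This wrapper class gives the agent a memory of previous exchanges. It prepends the running conversation history, in the `User:` / `Assistant:` format described in the system prompt, to each new question.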
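Each call to `ask` in the class below sends the whole transcript as one string. The following sketch reproduces that prompt construction in isolation. `build_input` is a hypothetical name. The optional `max_turns` argument is an assumed extension that is not part of this notebook. It keeps the prompt from growing without bound in long sessions.

```python
from typing import List, Optional


def build_input(history: List[str], question: str, max_turns: Optional[int] = None) -> str:
    """Join earlier "User:"/"Assistant:" lines with the new question, as ask() does.

    max_turns (hypothetical) keeps only the last N question/answer pairs.
    """
    if max_turns is not None:
        # Each turn is two lines; guard 0 because history[-0:] would keep everything
        history = history[-2 * max_turns:] if max_turns > 0 else []
    return "\n".join(history + [f"User: {question}"])
```

Without `max_turns`, the input grows by two lines per exchange, so very long conversations cost more tokens per request.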

In [None]:
class ConversationalAgent:
    """Wrapper that adds conversation history to the agent"""
    
    def __init__(self, agent, model_name="openai"):
        self.agent = agent
        self.model_name = model_name
        self.history = []  # Stores conversation history as "User: ..." / "Assistant: ..." lines
        # One context object shared across turns, so the dataset found by
        # search_datasets is still available for follow-up questions
        self.context = AgentContext()
        # NRP-hosted models can respond more slowly, so give them a longer default timeout
        self.default_timeout = 180 if model_name == "nrp" else 60
    
    async def ask(self, question: str, timeout: Optional[int] = None) -> str:
        """Ask a question and get a response.
        
        Args:
            question: The user's question
            timeout: Maximum seconds to wait for response (uses default if None)
        
        Returns:
            The agent's response as a string
        """
        # Use default timeout if not specified
        if timeout is None:
            timeout = self.default_timeout
            
        # Build the full input with history
        if self.history:
            full_input = "\n".join(self.history) + f"\nUser: {question}"
        else:
            full_input = f"User: {question}"
        
        try:
            # Run the agent with a timeout, reusing the persistent context
            result = await asyncio.wait_for(
                self.agent.run(full_input, deps=self.context),
                timeout=timeout
            )
            
            # Extract the response
            response = result.output if hasattr(result, 'output') else str(result)
            
            # Update history only after a successful run
            self.history.append(f"User: {question}")
            self.history.append(f"Assistant: {response}")
            
            return response
            
        except asyncio.TimeoutError:
            return f"Error: Request timed out after {timeout} seconds. Try a simpler question or switch to the OpenAI model."
        except Exception as e:
            return f"Error: {type(e).__name__}: {e}"
    
    def clear_history(self):
        """Clear the conversation history and the remembered dataset"""
        self.history = []
        self.context = AgentContext()

# Create the conversational agent with model-aware timeout
conv_agent = ConversationalAgent(agent, model_name=MODEL)
print(f"✓ Conversational agent ready with {MODEL} (timeout: {conv_agent.default_timeout}s)!")
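The timeout in `ask` comes from `asyncio.wait_for`. When the deadline passes, it cancels the wrapped coroutine and raises `asyncio.TimeoutError`. The sketch below shows that behaviour, with a stand-in coroutine in place of `agent.run`. `slow_answer` and `ask_with_timeout` are hypothetical names.

```python
import asyncio


async def slow_answer(delay: float) -> str:
    """Stand-in for agent.run: sleeps, then returns a canned reply."""
    await asyncio.sleep(delay)
    return "done"


async def ask_with_timeout(delay: float, timeout: float) -> str:
    try:
        return await asyncio.wait_for(slow_answer(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # By now wait_for has cancelled slow_answer
        return f"Error: Request timed out after {timeout} seconds."


print(asyncio.run(ask_with_timeout(delay=0.01, timeout=1)))  # done
print(asyncio.run(ask_with_timeout(delay=1, timeout=0.01)))  # Error: Request timed out after 0.01 seconds.
```

Inside this notebook, `await ask_with_timeout(...)` works as well.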

## Step 7: Create Chat Interface

A simple, user-friendly chat interface using Jupyter widgets.

In [None]:
class SimpleChatInterface:
    """Simple chat interface for the agent"""
    
    def __init__(self, agent):
        self.agent = agent
        self.messages = []
        
        # Create UI components
        self.output_area = widgets.VBox(
            layout=widgets.Layout(
                border='1px solid #ddd',
                height='400px',
                overflow_y='auto',
                padding='10px',
                margin='10px 0'
            )
        )
        
        self.input_box = widgets.Textarea(
            placeholder='Ask about datasets (e.g., "Find datasets about carbon turnover")...',
            layout=widgets.Layout(width='100%', height='80px')
        )
        
        self.send_button = widgets.Button(
            description='Send',
            button_style='primary',
            layout=widgets.Layout(width='100px')
        )
        
        self.clear_button = widgets.Button(
            description='Clear',
            button_style='warning',
            layout=widgets.Layout(width='100px', margin='0 0 0 10px')
        )
        
        self.status_label = widgets.HTML(value="✅ Ready")
        
        # Connect buttons
        self.send_button.on_click(self.on_send)
        self.clear_button.on_click(self.on_clear)
        
        # Layout
        button_row = widgets.HBox([self.send_button, self.clear_button, self.status_label])
        self.interface = widgets.VBox([
            widgets.HTML("<h3>🤖 Dataset Search Agent</h3>"),
            self.output_area,
            self.input_box,
            button_row
        ])
        
        # Welcome message
        self.add_message(
            "Welcome! I can help you find California Landscape Metrics datasets.\n\n"
            "Try asking:\n"
            "• Find datasets about carbon turnover\n"
            "• What datasets are available for burn probability?\n"
            "• Tell me about the units used in this dataset\n"
            "• What's the description of this dataset?",
            "system"
        )
    
    def add_message(self, text, role="user"):
        """Add a message to the chat display"""
        from html import escape  # escape text so characters like < and & display literally
        
        timestamp = datetime.now().strftime("%H:%M:%S")
        
        if role == "user":
            color = "#007bff"
            icon = "👤"
            label = "You"
            bg = "#e7f3ff"
        elif role == "assistant":
            color = "#28a745"
            icon = "🤖"
            label = "Agent"
            bg = "#e8f5e9"
        else:
            color = "#6c757d"
            icon = "ℹ️"
            label = "System"
            bg = "#f8f9fa"
        
        message = widgets.HTML(
            value=f"""
            <div style='margin: 10px 0; padding: 10px; background: {bg}; 
                        border-radius: 8px; border-left: 4px solid {color};'>
                <div style='display: flex; justify-content: space-between; margin-bottom: 5px;'>
                    <strong style='color: {color};'>{icon} {label}</strong>
                    <span style='color: #999; font-size: 0.85em;'>{timestamp}</span>
                </div>
                <div style='white-space: pre-wrap;'>{escape(text)}</div>
            </div>
            """
        )
        
        self.messages.append(message)
        self.output_area.children = tuple(self.messages)
    
    def on_send(self, button):
        """Handle send button click"""
        question = self.input_box.value.strip()
        if not question:
            return
        
        # Show user message
        self.add_message(question, "user")
        self.input_box.value = ""
        
        # Disable input while processing
        self.send_button.disabled = True
        self.input_box.disabled = True
        self.status_label.value = "<span style='color: orange;'>⏳ Thinking...</span>"
        
        try:
            # Get response from agent
            response = asyncio.get_event_loop().run_until_complete(
                self.agent.ask(question)
            )
            
            # Show agent response
            self.add_message(response, "assistant")
            self.status_label.value = "<span style='color: green;'>✅ Ready</span>"
            
        except Exception as e:
            error_msg = f"Error: {str(e)}"
            self.add_message(error_msg, "system")
            self.status_label.value = "<span style='color: red;'>❌ Error</span>"
        
        finally:
            # Re-enable input
            self.send_button.disabled = False
            self.input_box.disabled = False
    
    def on_clear(self, button):
        """Clear the chat"""
        self.messages = []
        self.agent.clear_history()
        self.output_area.children = tuple(self.messages)
        self.add_message(
            "Chat cleared! Ready for new questions.",
            "system"
        )
    
    def display(self):
        """Display the chat interface"""
        clear_output(wait=True)
        display(self.interface)

print("✓ Chat interface ready!")
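`add_message` could be split so that a pure function builds the HTML, which makes the output easy to check without widgets. The sketch below replaces the if/elif chain with a lookup table and escapes the message text with `html.escape`. This way, characters such as `<` in an agent reply are shown literally instead of being parsed as HTML. The names `ROLE_STYLES` and `render_message` are hypothetical.

```python
from datetime import datetime
from html import escape
from typing import Optional

# Hypothetical lookup table: role -> (border color, icon, label, background),
# using the same values as the if/elif chain in add_message
ROLE_STYLES = {
    "user": ("#007bff", "👤", "You", "#e7f3ff"),
    "assistant": ("#28a745", "🤖", "Agent", "#e8f5e9"),
    "system": ("#6c757d", "ℹ️", "System", "#f8f9fa"),
}


def render_message(text: str, role: str = "user", now: Optional[datetime] = None) -> str:
    """Return the HTML for one chat bubble; unknown roles use the system style."""
    color, icon, label, bg = ROLE_STYLES.get(role, ROLE_STYLES["system"])
    timestamp = (now or datetime.now()).strftime("%H:%M:%S")
    return (
        f"<div style='margin: 10px 0; padding: 10px; background: {bg}; "
        f"border-radius: 8px; border-left: 4px solid {color};'>"
        f"<strong style='color: {color};'>{icon} {label}</strong> "
        f"<span style='color: #999; font-size: 0.85em;'>{timestamp}</span>"
        # escape() turns characters such as <, > and & into entities so they are shown, not parsed
        f"<div style='white-space: pre-wrap;'>{escape(text)}</div>"
        "</div>"
    )
```

Inside `SimpleChatInterface`, `add_message` would then only need to wrap the result as `widgets.HTML(value=render_message(text, role))`.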

## Step 8: Launch the Chat Interface

Run this cell to start chatting with your agent!

In [None]:
# Create and display the chat interface
chat = SimpleChatInterface(conv_agent)
chat.display()
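The widget is convenient for interactive use, but you can also drive the same agent from code, which is handy for repeatable checks. In this minimal sketch, `run_script` (hypothetical) sends questions one after another to any object with an async `ask` method, such as `conv_agent`. Each follow-up therefore sees the earlier turns.

```python
from typing import List, Protocol, Tuple


class AsksQuestions(Protocol):
    async def ask(self, question: str) -> str: ...


async def run_script(agent: AsksQuestions, questions: List[str]) -> List[Tuple[str, str]]:
    """Send questions in order, so follow-ups see earlier turns, and collect the replies."""
    transcript = []
    for question in questions:
        answer = await agent.ask(question)
        transcript.append((question, answer))
    return transcript
```

For example, in a notebook cell: `for q, a in await run_script(conv_agent, ["Find datasets about carbon turnover", "What are the units for this dataset?"]): print(q, "->", a)`.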

## Example Questions to Try

1. **Find datasets about carbon turnover**
2. **What datasets are available for burn probability?**
3. **What are the units for this dataset?**
4. **Can you describe this dataset in more detail?**
5. **Search for datasets related to fire risk**

## How This Works

### Agent Architecture

```
User Question
     ↓
ConversationalAgent (adds history)
     ↓
Pydantic AI Agent (processes with context)
     ↓
search_datasets tool (queries MCP server)
     ↓
Response (with dataset metadata)
```

### Key Components

1. **LLM (Large Language Model)**: The AI brain that processes natural language and generates responses
   - OpenAI GPT-4o-mini: Fast and cost-effective model from OpenAI
   - NRP Qwen3: Open-source model hosted on NRP infrastructure
2. **Agent**: The core orchestrator that understands questions and decides when to use tools
3. **Tool (search_datasets)**: Connects to the MCP server to search for datasets
4. **Context**: Stores the current dataset being discussed for follow-up questions
5. **History**: Maintains conversation flow to enable contextual responses
6. **Chat Interface**: User-friendly UI for interaction

## Next Steps

To extend this agent, you could:
- Add more tools (statistics, visualization, etc.)
- Improve the system prompt for better responses
- Add filtering options for search results
- Add support for multiple datasets simultaneously
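For the filtering idea above, one simple option is to filter the results the tool already returns on the client side. This sketch assumes each dataset is a dict with text fields such as `title` and `description`, the fields the system prompt asks the agent to present. `filter_datasets` is a hypothetical helper, not part of the MCP server's API.

```python
from typing import Dict, List, Tuple


def filter_datasets(
    datasets: List[Dict],
    keyword: str,
    fields: Tuple[str, ...] = ("title", "description"),
) -> List[Dict]:
    """Keep datasets whose chosen text fields contain the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [
        ds for ds in datasets
        if any(needle in str(ds.get(field, "")).casefold() for field in fields)
    ]
```

Applied to the tool's output, it could run over `[top_dataset] + alternatives` before the agent presents them.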