# 🤖 Simple AI Agent Evaluation Lab

Welcome! In this lab, you'll learn how to evaluate AI agents using IBM's Granite model. We'll keep things simple and focus on the basics.

## What You'll Learn
1. How to connect to an AI model
2. How to create and use simple tools
3. How to evaluate AI responses

Let's start by installing the required packages:

In [None]:
!pip install replicate

## 1. Setup and Imports

First, let's import the libraries we need and set up our connection to the Granite model:

In [None]:
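import os
import json
from getpass import getpass

import replicate

# Enter your Replicate API token (getpass hides the input so the token
# is not echoed into the notebook output)
api_token = getpass('Enter your Replicate API token: ')
os.environ['REPLICATE_API_TOKEN'] = api_token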

# Initialize the Granite model
client = replicate.Client(api_token=api_token)
model = client.models.get("ibm-granite/granite-3.3-8b-instruct")

print('✅ Setup complete!')

## 2. Simple Tools

Now let's create some simple tools that our AI agent can use:

In [None]:
class SimpleAgent:
    def __init__(self):
        """Initialize our simple agent with basic tools"""
        # Weather data (simulated)
        self.weather_data = {
            "new york": "Sunny, 72°F",
            "london": "Cloudy, 65°F",
            "tokyo": "Rainy, 68°F",
            "paris": "Clear, 70°F"
        }
    
    def use_calculator(self, expression: str) -> str:
        """Simple calculator tool"""
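        try:
            # Only allow digits, whitespace, parentheses, and basic operators
            allowed = set('0123456789+-*/.() ')
            if not all(c in allowed for c in expression):
                return "Error: Only basic math operations allowed"
            # Reject "**": exponentiation can produce huge numbers and hang the notebook
            if '**' in expression:
                return "Error: Only basic math operations allowed"
            # Evaluate with builtins removed as a second line of defense
            result = eval(expression, {"__builtins__": {}}, {})
            return f"The result of {expression} is {result}"
        except Exception:
            return "Error: Could not calculate"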
    
    def check_weather(self, city: str) -> str:
        """Simple weather lookup tool"""
        return self.weather_data.get(city.lower(), "Weather data not available")
    
    def analyze_text(self, text: str) -> str:
        """Simple text analysis tool"""
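        # Strip surrounding punctuation so "great!" still matches "great"
        words = [w.strip('.,!?;:"\'()') for w in text.lower().split()]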
        positive = {'good', 'great', 'excellent', 'happy'}
        negative = {'bad', 'poor', 'terrible', 'sad'}
        
        pos_count = sum(1 for w in words if w in positive)
        neg_count = sum(1 for w in words if w in negative)
        
        if pos_count > neg_count:
            return "Positive sentiment detected"
        elif neg_count > pos_count:
            return "Negative sentiment detected"
        else:
            return "Neutral sentiment detected"

# Create our agent
agent = SimpleAgent()

# Test the tools
print("Calculator Test:", agent.use_calculator("2 + 2"))
print("Weather Test:", agent.check_weather("London"))
print("Text Analysis Test:", agent.analyze_text("This is a great day!"))
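
The calculator above filters characters before calling `eval`. As an alternative, here is a minimal sketch that parses the expression with Python's `ast` module and only evaluates the node types it recognizes. The name `safe_calculate` is hypothetical and not part of the lab's `SimpleAgent`; treat it as one possible approach rather than the lab's method.

```python
import ast
import operator

# Map supported AST operator nodes to the functions that implement them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / and parentheses without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain numbers only (no strings, booleans, or names).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Anything else (function calls, "**", attribute access) is rejected.
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression.strip(), mode="eval"))


print(safe_calculate("2 + 2"))         # 4
print(safe_calculate("(10 - 4) * 3"))  # 18
```

Because unsupported syntax raises `ValueError`, a caller can catch that error and return the same "Only basic math operations allowed" message the lab's tool uses.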

## 3. Evaluation Function

Let's create a simple function to evaluate our agent's responses:

In [None]:
def evaluate_response(query: str, tool_used: str, response: str) -> dict:
    """Evaluate an agent's response using the Granite model"""
    
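    prompt = f"""Please evaluate this AI agent interaction:

User Query: {query}
Tool Used: {tool_used}
Agent Response: {response}

Rate on a scale of 1-5 (5 being best) and provide a brief explanation.
Return only your evaluation in this JSON format:
{{"score": <1-5>, "explanation": "<your brief explanation>"}}"""

    try:
        # Get evaluation from Granite. For language models, client.run()
        # returns the output as text chunks, so join them before parsing.
        # Assumption: this model accepts "prompt" and "max_tokens" inputs;
        # check the model's input schema on Replicate if the call fails.
        output = client.run(
            "ibm-granite/granite-3.3-8b-instruct",
            input={"prompt": prompt, "max_tokens": 200},
        )
        result = "".join(output)
        return json.loads(result.strip())
    except Exception as e:
        return {"score": 0, "explanation": f"Evaluation failed: {e}"}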

# Test the evaluation
test_query = "What's 5 plus 3?"
test_response = agent.use_calculator("5 + 3")
eval_result = evaluate_response(test_query, "calculator", test_response)

print(f"Query: {test_query}")
print(f"Response: {test_response}")
print(f"Evaluation Score: {eval_result['score']}/5")
print(f"Explanation: {eval_result['explanation']}")
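
Judge models do not always return bare JSON; they may add a sentence before or after it, or return a score outside the requested range. The sketch below shows one way to extract and validate the evaluation from raw text. `parse_evaluation` is a hypothetical helper, not part of the lab code, and it assumes the same `{"score": 0, ...}` failure marker that `evaluate_response` uses.

```python
import json
import re


def parse_evaluation(raw: str) -> dict:
    """Extract {"score", "explanation"} from a judge model's raw text output."""
    fallback = {"score": 0, "explanation": "Evaluation failed"}
    # Grab the span from the first "{" to the last "}" in case the model
    # wrapped the JSON in extra prose or a Markdown code fence.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return fallback
    try:
        data = json.loads(match.group(0))
        score = int(data["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return fallback
    # Reject scores outside the 1-5 scale requested in the prompt.
    if not 1 <= score <= 5:
        return fallback
    return {"score": score, "explanation": str(data.get("explanation", ""))}


print(parse_evaluation('Rating: {"score": 4, "explanation": "Correct sum"}'))
# {'score': 4, 'explanation': 'Correct sum'}
```

Validating the score here means downstream code can rely on every evaluation having an integer `score` and a string `explanation`.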

## 4. Try It Yourself!

Now you can try different queries and evaluate the responses. Here's an example to get you started:

In [None]:
def try_agent(query: str, tool: str, input_value: str):
    """Test the agent with different queries"""
    # Get the right tool function
    if tool == "calculator":
        response = agent.use_calculator(input_value)
    elif tool == "weather":
        response = agent.check_weather(input_value)
    elif tool == "text":
        response = agent.analyze_text(input_value)
    else:
        # Report the problem instead of silently returning a string
        print(f"Unknown tool: {tool}")
        return
    
    # Evaluate the response
    evaluation = evaluate_response(query, tool, response)
    
    # Print results
    print(f"Query: {query}")
    print(f"Tool Used: {tool}")
    print(f"Response: {response}")
    print(f"Evaluation Score: {evaluation['score']}/5")
    print(f"Feedback: {evaluation['explanation']}")

# Example usage:
print("Example 1: Calculator")
try_agent("What's 10 times 5?", "calculator", "10 * 5")

print("\nExample 2: Weather")
try_agent("What's the weather in Tokyo?", "weather", "Tokyo")

print("\nExample 3: Text Analysis")
try_agent("How does this text sound?", "text", "This is a great and wonderful day!")
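
Once you have run several examples, a single summary number is often easier to compare than individual printouts. The sketch below assumes you collect the dictionaries returned by `evaluate_response` in a list; `summarize_scores` and the sample data are hypothetical and only illustrate the idea.

```python
from statistics import mean


def summarize_scores(evaluations: list) -> dict:
    """Summarize a list of {"score", "explanation"} evaluation dicts."""
    # A score of 0 is the lab's marker for a failed evaluation, so keep
    # failures out of the average and report them separately.
    valid = [e["score"] for e in evaluations if e["score"] > 0]
    return {
        "evaluated": len(valid),
        "failed": len(evaluations) - len(valid),
        "average_score": round(mean(valid), 2) if valid else None,
    }


# Hypothetical results, shaped like the output of evaluate_response().
sample = [
    {"score": 5, "explanation": "Correct product"},
    {"score": 4, "explanation": "Accurate but terse"},
    {"score": 0, "explanation": "Evaluation failed"},
]
print(summarize_scores(sample))
# {'evaluated': 2, 'failed': 1, 'average_score': 4.5}
```

Reporting failures separately keeps a flaky model call from dragging down the average for responses that were actually scored.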

## Your Turn!

Try creating your own queries below. Here are some ideas:
- Try different calculations
- Check weather for different cities
- Analyze different text samples

Just copy and modify the example below:

In [None]:
# Your tests here!
try_agent(
    query="What's 25 divided by 5?",  # Your question
    tool="calculator",                # Choose: calculator, weather, or text
    input_value="25 / 5"             # Your input
)

## 🎉 Congratulations!

You've completed the simple AI agent evaluation lab! You've learned:
- How to work with a simple AI agent
- How to use different tools
- How to evaluate AI responses

Feel free to experiment with different queries and tools!