# EcoHome Energy Advisor - Agent Run & Evaluation

In this notebook, you'll run the Energy Advisor agent with various real-world scenarios and see how it helps customers optimize their energy usage.

## Learning Objectives
- Create the agent's instructions
- Run the Energy Advisor with different types of questions
- Evaluate response quality and accuracy
- Measure tool usage effectiveness
- Identify areas for improvement
- Implement evaluation metrics

## Evaluation Criteria
- **Accuracy**: Correct information and calculations
- **Relevance**: Responses address the user's question
- **Completeness**: Comprehensive answers with actionable advice
- **Tool Usage**: Appropriate use of available tools
- **Reasoning**: Clear explanation of recommendations


## 1. Import and Initialize

In [12]:
from datetime import datetime
from agent import Agent

In [13]:
ECOHOME_SYSTEM_PROMPT = """
You are EcoHome Energy Advisor, an AI assistant that helps users optimize home energy usage.

Your objectives:
- Minimize electricity cost
- Maximize solar energy usage when available
- Maintain user comfort and practical constraints

You have access to tools that provide:
- Weather forecasts (including solar irradiance)
- Time-of-use electricity pricing
- Historical energy usage and solar generation
- Energy-saving tips and best practices
- Energy savings calculations

Guidelines:
- Use tools whenever data is required to answer accurately.
- For questions about timing, cost, or solar optimization, consider BOTH weather and electricity prices.
- For questions based on past behavior, use historical energy usage data.
- Always provide a clear, actionable recommendation (what to do and when).
- Explain briefly WHY the recommendation is optimal.
- Include cost or energy impact estimates when possible.

If required information is missing, make reasonable assumptions and state them clearly.
If a tool fails, proceed with best-effort reasoning.

Response style:
- Clear and concise
- Practical and realistic
- Structured when helpful (steps or bullet points)
"""

In [14]:
ecohome_agent = Agent(
    instructions=ECOHOME_SYSTEM_PROMPT,
)

In [16]:
response = ecohome_agent.invoke(
    question="When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
    context="Location: Madrid, Spain"
)

In [17]:
print(response["messages"][-1].content)

To minimize costs and maximize solar power when charging your electric car tomorrow (October 7, 2023), follow these recommendations:

### Optimal Charging Time
- **Charge your electric car between 12:00 PM and 3:00 PM.**

### Reasons for This Recommendation
1. **Solar Power Availability**:
   - Solar irradiance is expected to peak around **12:00 PM to 2:00 PM**, with the highest solar generation at **12:00 PM** (543 W/m²) and **1:00 PM** (639.8 W/m²). This means more solar energy will be available to offset your charging needs during this time.

2. **Electricity Pricing**:
   - The electricity rates during this time are **$0.18 per kWh** (from 12:00 PM to 3:00 PM), which is higher than the off-peak rate of **$0.09 per kWh** (from 12:00 AM to 6:00 AM and after 11:00 PM). However, charging during peak solar generation will allow you to utilize solar energy, reducing your reliance on grid electricity.

### Summary of Electricity Rates
- **12:00 PM - 3:00 PM**: $0.18 per kWh (peak)
- **Off

In [18]:
print("TOOLS:")
for msg in response["messages"]:
    obj = msg.model_dump()
    if obj.get("tool_call_id"):
        print("-", msg.name)

TOOLS:
- get_weather_forecast
- get_electricity_prices


## 2. Define Test Cases

In [19]:
# TODO: Define comprehensive test cases for the Energy Advisor
# Create 10 test cases covering different scenarios:
# - EV charging optimization
# - Thermostat settings
# - Appliance scheduling
# - Solar power maximization
# - Cost savings calculations

In [20]:
test_cases = [
    {
        "id": "ev_charging_1",
        "question": "When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "time recommendation, cost analysis and solar consideration",
    },
    {
        "id": "hvac_peak_prices",
        "question": "What temperature should I set my thermostat tomorrow evening if electricity prices are high?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "thermostat recommendation considering peak prices",
    },
    {
        "id": "dishwasher_off_peak",
        "question": "How much can I save by running my dishwasher during off-peak hours?",
        "expected_tools": ["get_electricity_prices", "calculate_energy_savings"],
        "expected_response": "cost savings estimate",
    },
    {
        "id": "solar_usage_today",
        "question": "What's the best time today to use appliances to maximize solar energy?",
        "expected_tools": ["get_weather_forecast"],
        "expected_response": "solar-based timing recommendation",
    },
    {
        "id": "weekly_usage_tips",
        "question": "Suggest three ways I can reduce energy use based on my recent usage.",
        "expected_tools": ["get_recent_energy_summary", "search_energy_tips"],
        "expected_response": "three personalized recommendations",
    },
    {
        "id": "ev_cost_analysis",
        "question": "Is it cheaper to charge my EV at night or in the afternoon?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "comparison of charging costs",
    },
    {
        "id": "solar_generation_history",
        "question": "How much solar energy did I generate last week?",
        "expected_tools": ["query_solar_generation"],
        "expected_response": "summary of solar generation",
    },
    {
        "id": "energy_summary_24h",
        "question": "Give me a summary of my energy usage in the last 24 hours.",
        "expected_tools": ["get_recent_energy_summary"],
        "expected_response": "usage and cost summary",
    },
    {
        "id": "appliance_scheduling",
        "question": "When should I run my washing machine to reduce costs?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "optimal scheduling recommendation",
    },
    {
        "id": "hvac_solar_preheat",
        "question": "Should I pre-cool my house tomorrow using solar power?",
        "expected_tools": ["get_weather_forecast"],
        "expected_response": "pre-cooling recommendation based on solar forecast",
    },
]

if len(test_cases) < 10:
    raise ValueError("You MUST have at least 10 test cases")

## 3. Run Agent Tests

In [None]:
CONTEXT = "Location: Madrid, Spain"

In [21]:
# Run the agent tests
# For each test case, call the agent and collect the response
# Store results for evaluation

print("=== Running Agent Tests ===")
test_results = []

for i, test_case in enumerate(test_cases):
    print(f"\nTest {i+1}: {test_case['id']}")
    print(f"Question: {test_case['question']}")
    print("-" * 50)
    
    try:
        # Call the agent
        response = ecohome_agent.invoke(
            question=test_case['question'],
            context=CONTEXT
        )
        
        # Store the result
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': response,
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat()
        }
        test_results.append(result)
                
    except Exception as e:
        print(f"Error: {e}")
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': f"Error: {str(e)}",
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat(),
            'error': str(e)
        }
        test_results.append(result)

print(f"\nCompleted {len(test_results)} tests")


=== Running Agent Tests ===

Test 1: ev_charging_1
Question: When should I charge my electric car tomorrow to minimize cost and maximize solar power?
--------------------------------------------------

Test 2: hvac_peak_prices
Question: What temperature should I set my thermostat tomorrow evening if electricity prices are high?
--------------------------------------------------

Test 3: dishwasher_off_peak
Question: How much can I save by running my dishwasher during off-peak hours?
--------------------------------------------------

Test 4: solar_usage_today
Question: What's the best time today to use appliances to maximize solar energy?
--------------------------------------------------

Test 5: weekly_usage_tips
Question: Suggest three ways I can reduce energy use based on my recent usage.
--------------------------------------------------

Test 6: ev_cost_analysis
Question: Is it cheaper to charge my EV at night or in the afternoon?
-------------------------------------------------

In [23]:
test_results

[{'test_id': 'ev_charging_1',
  'question': 'When should I charge my electric car tomorrow to minimize cost and maximize solar power?',
  'response': {'messages': [SystemMessage(content='Location: Madrid, Spain', additional_kwargs={}, response_metadata={}, id='1663b7c0-31b4-4279-9732-79824086809a'),
    HumanMessage(content='When should I charge my electric car tomorrow to minimize cost and maximize solar power?', additional_kwargs={}, response_metadata={}, id='e96efc0d-166b-4d74-8d04-b74fdeb7c103'),
    AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 60, 'prompt_tokens': 1081, 'total_tokens': 1141, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 1024}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_29330a9688', 'id': 'cha

## 4. Evaluate Responses

In [24]:

def evaluate_response(question, final_response, expected_response):
    """Evaluate a single response against expected response"""

    if not final_response or not isinstance(final_response, str):
        return 0
    keywords= ["reccomend", "should", "best", "cost", "save", "price", "off-peak"]
    score= sum(1 for k in keywords if k in final_response.lower())
    return min(score/4, 1.0)

In [25]:
def evaluate_tool_usage(messages, expected_tools):
    """Evaluate if the right tools were used"""
    used_tools= set()
    
    for msg in messages:
        obj= msg.model_dump()
        if obj.get("tool_call_id"):
            used_tools.add(msg.name)
    matches= sum(1 for t in expected_tools if t in used_tools)

    return matches / len(expected_tools) if expected_tools else 1.0

In [28]:
def generate_evaluation_report():
    total_tests= len(test_results)
    response_scores= []
    tool_scores= []

    for r in test_results:
        resp= r["response"]
        messages= resp["messages"]

        final_text= messages[-1].content if messages else ""
        response_scores.append(evaluate_response(r["question"], final_text, r["expected_response"]))
        tool_scores.append(evaluate_tool_usage(messages, r["expected_tools"]))
    
    report = {
        "total_tests": total_tests,
        "avg_response_score": round(sum(response_scores) / total_tests, 2),
        "avg_tool_usage_score": round(sum(tool_scores) / total_tests, 2),
        "strengths": "Uses tools appropriately and provides actionable recommendations",
        "weaknesses": "Mock pricing and weather limit real-world accuracy",
        "recommendations": "Integrate real APIs and improve numeric precision",
    }

    return report

print (generate_evaluation_report())

{'total_tests': 10, 'avg_response_score': 0.57, 'avg_tool_usage_score': 0.85, 'strengths': 'Uses tools appropriately and provides actionable recommendations', 'weaknesses': 'Mock pricing and weather limit real-world accuracy', 'recommendations': 'Integrate real APIs and improve numeric precision'}
