# EcoHome Energy Advisor - Agent Run & Evaluation

In this notebook, you'll run the Energy Advisor agent on a range of real-world scenarios and evaluate how well it helps customers optimize their energy usage.

## Learning Objectives
- Create the agent's instructions
- Run the Energy Advisor with different types of questions
- Evaluate response quality and accuracy
- Measure tool usage effectiveness
- Identify areas for improvement
- Implement evaluation metrics

## Evaluation Criteria
- **Accuracy**: Correct information and calculations
- **Relevance**: Responses address the user's question
- **Completeness**: Comprehensive answers with actionable advice
- **Tool Usage**: Appropriate use of available tools
- **Reasoning**: Clear explanation of recommendations


## 1. Import and Initialize

In [None]:
from datetime import datetime
from agent import Agent

In [None]:
ECOHOME_SYSTEM_PROMPT = """You are EcoHome Energy Advisor, a proactive energy optimization specialist helping residential customers maximize solar generation, minimize grid costs, and maintain comfort.

Mission & Role:
- Serve as a trusted advisor for homeowners with solar panels, EVs, smart thermostats, and connected appliances.
- Interpret the user's context (location, device details, goals) and deliver data-backed recommendations.

Operating Procedure:
1. Clarify context: restate key details (location, time horizon, devices, goals) and ask follow-up questions if any are missing.
2. Decide which tools to call. Use:
   - get_weather_forecast for solar/temperature planning.
   - get_electricity_prices for rate schedules or cost comparisons.
   - query_energy_usage / query_solar_generation for historical insights.
   - get_recent_energy_summary for quick rollups.
   - search_energy_tips for curated best practices.
   - calculate_energy_savings for quantitative comparisons.
3. Synthesize findings: combine tool outputs with domain knowledge.
4. Present a clear recommendation plan with actionable steps, quantified savings, and timing guidance.
5. Close with next-step suggestions, stated assumptions, and an offer to answer follow-up questions.

Key Capabilities:
- Translate forecasts and pricing into hourly schedules.
- Compare on-peak vs off-peak charging or appliance use.
- Estimate savings and solar utilization percentages.
- Explain trade-offs in plain language.
- Cite which tools were used and why.

Recommendation Guidelines:
- Provide at least two concrete actions with timing or setpoints.
- Quantify expected savings (cost, kWh, or %), noting assumptions.
- Highlight solar usage opportunities and comfort/maintenance tips.
- If data is missing, state the uncertainty and propose how to obtain it.

Example Questions You Can Answer:
- "When should I charge my EV tomorrow to maximize my solar output?"
- "What thermostat schedule minimizes cost during this week's heatwave?"
- "Suggest weekend appliance run times that keep bills low."
- "How can I reduce my energy usage by 10% based on last month?"
- "Summarize solar production trends and maintenance actions for me."
"""


In [None]:
ecohome_agent = Agent(
    instructions=ECOHOME_SYSTEM_PROMPT,
)

In [None]:
response = ecohome_agent.invoke(
    question="When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
    context="Location: San Francisco, CA"
)

In [None]:
print(response['final_response'])


In [None]:
print("TOOLS:")
for msg in response["messages"]:
    obj = msg.model_dump()
    # Only messages that carry a tool_call_id are listed; print the tool name for each
    if obj.get("tool_call_id"):
        print("-", msg.name)

## 2. Define Test Cases

In [None]:
# Define comprehensive test cases for the Energy Advisor
# Create at least 10 test cases covering different scenarios:
# - EV charging optimization
# - Thermostat settings
# - Appliance scheduling
# - Solar power maximization
# - Cost savings calculations

In [None]:
test_cases = [
    {
        "id": "ev_charging_peak",
        "question": "When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "Recommend charging window aligned with daytime solar output, include rate comparison and savings estimate.",
    },
    {
        "id": "ev_overnight_grid",
        "question": "Is it cheaper to charge my EV overnight this weekend or during midday solar hours?",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast", "calculate_energy_savings"],
        "expected_response": "Compare weekend rates versus solar availability and quantify expected savings for each option.",
    },
    {
        "id": "thermostat_heatwave",
        "question": "How should I adjust my thermostat settings this week to stay comfortable during the heatwave without high bills?",
        "expected_tools": ["get_weather_forecast", "search_energy_tips"],
        "expected_response": "Provide daily setpoints, pre-cooling strategy, and efficiency tips referencing forecasted highs.",
    },
    {
        "id": "thermostat_winter",
        "question": "Give me an energy-efficient heating schedule for the next 3 days while I'm working from home.",
        "expected_tools": ["get_weather_forecast", "query_energy_usage"],
        "expected_response": "Outline hourly temperature plan and reference recent usage to explain savings.",
    },
    {
        "id": "appliance_laundry",
        "question": "When should I run my washer and dryer tomorrow to use the most solar and pay the least?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices", "calculate_energy_savings"],
        "expected_response": "Recommend two scheduling windows, note expected kWh cost and solar contribution.",
    },
    {
        "id": "appliance_dishwasher",
        "question": "Should I run my dishwasher right after dinner or early morning to save more energy?",
        "expected_tools": ["get_electricity_prices", "search_energy_tips"],
        "expected_response": "Compare evening vs morning rates, mention load shifting benefits and tips.",
    },
    {
        "id": "solar_output_drop",
        "question": "My solar production dropped 15% this month—what should I look into?",
        "expected_tools": ["query_solar_generation", "search_energy_tips"],
        "expected_response": "Identify production patterns, suggest maintenance checks and troubleshooting steps.",
    },
    {
        "id": "solar_maximization_weekend",
        "question": "Help me plan weekend activities to maximize usage of my solar power and minimize grid draw.",
        "expected_tools": ["get_weather_forecast", "query_solar_generation", "get_recent_energy_summary"],
        "expected_response": "Provide schedule aligning high-load tasks with peak irradiance, show expected grid savings.",
    },
    {
        "id": "savings_goal",
        "question": "How can I reduce my total energy bill by 15% next month based on my recent usage?",
        "expected_tools": ["query_energy_usage", "calculate_energy_savings", "search_energy_tips"],
        "expected_response": "List prioritized actions with projected savings per action and timeline.",
    },
    {
        "id": "battery_strategy",
        "question": "What battery charging/discharging strategy should I use this week to cover peak evening rates?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices", "query_solar_generation"],
        "expected_response": "Recommend charge/discharge windows, include rate comparison and solar availability.",
    },
    {
        "id": "hvac_vs_fans",
        "question": "Is it cheaper to cool my home with ceiling fans this week instead of running the AC continuously?",
        "expected_tools": ["get_weather_forecast", "calculate_energy_savings", "search_energy_tips"],
        "expected_response": "Quantify energy difference, provide comfort tips, and advise when AC is still needed.",
    },
]

if len(test_cases) < 10:
    raise ValueError("You MUST have at least 10 test cases")
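
Beyond the count check, it can help to confirm that every test case has the fields the test loop reads and that ids are unique. The following is a minimal sketch under those assumptions; `REQUIRED_KEYS`, `validate_test_cases`, and `sample_cases` are hypothetical names introduced here for illustration and are not part of the notebook.

```python
# Hypothetical helper: structural checks for test case dictionaries.
REQUIRED_KEYS = {"id", "question", "expected_tools", "expected_response"}


def validate_test_cases(cases):
    """Return a list of problems found; an empty list means the cases look well-formed."""
    problems = []
    seen_ids = set()
    for index, case in enumerate(cases):
        # Every case should carry the four fields used by the test loop
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            problems.append(f"case {index}: missing keys {sorted(missing)}")
        # Duplicate ids would make results ambiguous
        case_id = case.get("id")
        if case_id in seen_ids:
            problems.append(f"case {index}: duplicate id {case_id!r}")
        seen_ids.add(case_id)
        # An empty expected_tools list leaves nothing to compare tool usage against
        if not case.get("expected_tools"):
            problems.append(f"case {index}: expected_tools is empty")
    return problems


# Illustrative data only, deliberately containing two problems
sample_cases = [
    {"id": "a", "question": "q1", "expected_tools": ["get_weather_forecast"], "expected_response": "r1"},
    {"id": "a", "question": "q2", "expected_tools": [], "expected_response": "r2"},
]
print(validate_test_cases(sample_cases))
```

Calling `validate_test_cases(test_cases)` on the real list should return an empty list if the cases are well-formed.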


## 3. Run Agent Tests

In [None]:
CONTEXT = "Location: San Francisco, CA"

In [None]:
# Run the agent tests
# For each test case, call the agent and collect the response
# Store results for evaluation

print("=== Running Agent Tests ===")
test_results = []

for i, test_case in enumerate(test_cases, start=1):
    print(f"\nTest {i}: {test_case['id']}")
    print(f"Question: {test_case['question']}")
    print("-" * 50)

    # Fields shared by successful and failed runs
    result = {
        'test_id': test_case['id'],
        'question': test_case['question'],
        'expected_tools': test_case['expected_tools'],
        'expected_response': test_case['expected_response'],
    }

    try:
        # Call the agent
        result['response'] = ecohome_agent.invoke(
            question=test_case['question'],
            context=CONTEXT
        )
    except Exception as e:
        # Record the failure so one bad test does not stop the run
        print(f"Error: {e}")
        result['response'] = f"Error: {e}"
        result['error'] = str(e)

    # Store the result
    result['timestamp'] = datetime.now().isoformat()
    test_results.append(result)

print(f"\nCompleted {len(test_results)} tests")


In [None]:
test_results

## 4. Evaluate Responses

In [None]:
# TODO: Implement evaluation functions
# Create functions to evaluate:
# - Final Response
# - Tool usage

In [None]:
# TODO: Create a response evaluator
def evaluate_response(question, final_response, expected_response):
    """Evaluate a single response against expected response"""
    pass
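
One possible starting point for the response evaluator is a keyword-coverage heuristic: measure how many of the meaningful words in the expected response also appear in the agent's final response. This is only a rough sketch, and an LLM-based judge may be closer to what the exercise intends. The name `evaluate_response_sketch`, the `STOPWORDS` list, and the sample strings are assumptions made for illustration.

```python
import re

# Hypothetical stopword list; extend it to suit your expected responses
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "with", "in", "on",
             "is", "or", "each", "per", "versus", "vs"}


def keyword_set(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return {tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOPWORDS}


def evaluate_response_sketch(question, final_response, expected_response):
    """Score a response by the share of expected keywords it contains.

    `question` is unused here; it is kept so the signature matches the stub above.
    """
    expected = keyword_set(expected_response)
    found = expected & keyword_set(final_response)
    coverage = len(found) / len(expected) if expected else 0.0
    return {
        "coverage": round(coverage, 2),
        "matched_keywords": sorted(found),
        "missing_keywords": sorted(expected - found),
    }


# Illustrative strings only
sample_eval = evaluate_response_sketch(
    question="Should I run my dishwasher right after dinner or early morning?",
    final_response="Morning rates are lower, so run it before 7 am.",
    expected_response="Compare evening vs morning rates",
)
print(sample_eval)
```

Exact word matching is brittle (for example, "rate" does not match "rates"), so treat the coverage value as a coarse signal rather than a quality score.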

In [None]:
# TODO: Create a tool usage evaluator
def evaluate_tool_usage(messages, expected_tools):
    """Evaluate if the right tools were used"""
    pass
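
A simple way to score tool usage is to compare the set of tools the agent called with the expected set and report precision, recall, and F1. The sketch below assumes, as in the tool listing in section 1, that tool messages expose `tool_call_id` and `name` either as dictionary keys or through `model_dump()`. The names `extract_tool_names` and `evaluate_tool_usage_sketch` and the sample messages are hypothetical.

```python
def extract_tool_names(messages):
    """Collect the names of messages that carry a tool_call_id, in order of appearance."""
    names = []
    for msg in messages:
        # Assumption: a message is either a plain dict or an object with model_dump()
        obj = msg if isinstance(msg, dict) else msg.model_dump()
        if obj.get("tool_call_id") and obj.get("name"):
            names.append(obj["name"])
    return names


def evaluate_tool_usage_sketch(messages, expected_tools):
    """Compare the tools actually used with the expected tools."""
    used = set(extract_tool_names(messages))
    expected = set(expected_tools)
    hits = used & expected
    # Precision: share of used tools that were expected
    precision = len(hits) / len(used) if used else 0.0
    # Recall: share of expected tools that were used
    recall = len(hits) / len(expected) if expected else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "precision": round(precision, 2),
        "recall": round(recall, 2),
        "f1": round(f1, 2),
        "missing_tools": sorted(expected - used),
        "unexpected_tools": sorted(used - expected),
    }


# Illustrative messages only, written as plain dicts
sample_messages = [
    {"type": "human", "content": "How should I set my thermostat?"},
    {"name": "get_weather_forecast", "tool_call_id": "call_1"},
    {"name": "search_energy_tips", "tool_call_id": "call_2"},
]
sample_usage = evaluate_tool_usage_sketch(
    sample_messages, ["get_weather_forecast", "get_electricity_prices"]
)
print(sample_usage)
```

An unexpected tool is not necessarily a mistake, since the agent may find a valid alternative path, so it may be worth weighting recall more heavily than precision.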

In [None]:
# TODO: Generate a comprehensive evaluation report
# Calculate overall scores and metrics
# Identify strengths and weaknesses
# Provide recommendations for improvement
def generate_evaluation_report():
    pass
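
For the report, one option is to aggregate per-test scores into averages and flag the tests that fall below a threshold. The sketch below assumes each scored result is a dictionary with `test_id`, `response_score`, and `tool_score` in the range 0 to 1. Those field names, the `summarize_scores` function, the 0.7 threshold, and the sample scores are all illustrative assumptions, not values produced by the agent.

```python
from statistics import mean


def summarize_scores(scored_results, pass_threshold=0.7):
    """Aggregate per-test scores and list the tests that need attention."""
    if not scored_results:
        return {"count": 0, "avg_response_score": None,
                "avg_tool_score": None, "below_threshold": []}
    # A test is flagged if either of its scores is under the threshold
    below = [
        r["test_id"] for r in scored_results
        if min(r["response_score"], r["tool_score"]) < pass_threshold
    ]
    return {
        "count": len(scored_results),
        "avg_response_score": round(mean(r["response_score"] for r in scored_results), 2),
        "avg_tool_score": round(mean(r["tool_score"] for r in scored_results), 2),
        "below_threshold": below,
    }


# Made-up scores for illustration only
sample_scores = [
    {"test_id": "ev_charging_peak", "response_score": 0.9, "tool_score": 1.0},
    {"test_id": "thermostat_heatwave", "response_score": 0.6, "tool_score": 0.5},
]
sample_report = summarize_scores(sample_scores)
print(sample_report)
```

The `below_threshold` list gives a natural starting point for the "areas for improvement" part of the report.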