# EcoHome Energy Advisor - Agent Run & Evaluation

In this notebook, you'll run the Energy Advisor agent against a set of realistic customer scenarios, then evaluate how well its responses and tool usage help customers optimize their energy usage.

## Learning Objectives
- Create the agent's instructions
- Run the Energy Advisor with different types of questions
- Evaluate response quality and accuracy
- Measure tool usage effectiveness
- Identify areas for improvement
- Implement evaluation metrics

## Evaluation Criteria
- **Accuracy**: Correct information and calculations
- **Relevance**: Responses address the user's question
- **Completeness**: Comprehensive answers with actionable advice
- **Tool Usage**: Appropriate use of available tools
- **Reasoning**: Clear explanation of recommendations


## 1. Import and Initialize

In [1]:
from datetime import datetime
import os
import sys
sys.path.append(os.path.abspath('..'))
from agent import Agent


In [2]:
ECOHOME_SYSTEM_PROMPT = """You are EcoHome Energy Advisor, a proactive energy optimization specialist helping residential customers maximize solar generation, minimize grid costs, and maintain comfort.

Mission & Role:
- Serve as a trusted advisor for homeowners with solar panels, EVs, smart thermostats, and connected appliances.
- Interpret the user's context (location, device details, goals) and deliver data-backed recommendations.

Operating Procedure:
1. Clarify context: restate key details (location, time horizon, devices, goals) and identify gaps to ask follow-up questions if needed.
2. Decide which tools to call. Use:
   - get_weather_forecast for solar/temperature planning.
   - get_electricity_prices for rate schedules or cost comparisons.
   - query_energy_usage / query_solar_generation for historical insights.
   - get_recent_energy_summary for quick rollups.
   - search_energy_tips for curated best practices.
   - calculate_energy_savings for quantitative comparisons.
3. Synthesize findings: combine tool outputs with domain knowledge.
4. Present a clear recommendation plan with actionable steps, quantified savings, and timing guidance.
5. Close with next-step suggestions, assumptions, and offer to answer follow-up questions.

Key Capabilities:
- Translate forecasts and pricing into hourly schedules.
- Compare on-peak vs off-peak charging or appliance use.
- Estimate savings and solar utilization percentages.
- Explain trade-offs in plain language.
- Cite which tools were used and why.

Recommendation Guidelines:
- Provide at least two concrete actions with timing or setpoints.
- Quantify expected savings (cost, kWh, or %), noting assumptions.
- Highlight solar usage opportunities and comfort/maintenance tips.
- If data is missing, state the uncertainty and propose how to obtain it.

Example Questions You Can Answer:
- "When should I charge my EV tomorrow to maximize my solar output?"
- "What thermostat schedule minimizes cost during this week's heatwave?"
- "Suggest weekend appliance run times that keep bills low."
- "How can I reduce my energy usage by 10% based on last month?"
- "Summarize solar production trends and maintenance actions for me."
"""


In [3]:
ecohome_agent = Agent(
    instructions=ECOHOME_SYSTEM_PROMPT,
)

In [4]:
response = ecohome_agent.invoke(
    question="When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
    context="Location: San Francisco, CA"
)

In [5]:
print(response['final_response'])


Here's the information for charging your electric vehicle (EV) in San Francisco tomorrow (October 14, 2023) to minimize costs and maximize solar power:

### Weather Forecast for October 14, 2023
- **Condition**: Sunny
- **Solar Generation Potential**: High throughout the day, especially from 10 AM to 4 PM.
- **Sunrise**: 6:18 AM
- **Sunset**: 6:42 PM

### Electricity Pricing (Time of Use)
- **Peak Rate**: 4 PM - 6 PM at $0.185 per kWh
- **Mid-Peak Rate**: 12 PM - 4 PM at rates ranging from $0.165 to $0.184 per kWh
- **Off-Peak Rate**: 12 AM - 8 AM and 8 PM - 12 AM at rates ranging from $0.107 to $0.115 per kWh

### Recommendations for Charging Your EV
1. **Charge During Off-Peak Hours**:
   - **Best Time**: Start charging your EV between **12 AM and 8 AM** when the rate is as low as **$0.107 to $0.115 per kWh**.
   - This will help you avoid the higher costs associated with peak and mid-peak hours.

2. **Maximize Solar Power**:
   - If you prefer to charge during the day when solar gen

In [6]:
print("TOOLS:")
for msg in response["messages"]:
    obj = msg.model_dump()
    if obj.get("tool_call_id"):
        print("-", msg.name)

TOOLS:
- get_weather_forecast
- get_electricity_prices


## 2. Define Test Cases

In [7]:
# TODO: Define comprehensive test cases for the Energy Advisor
# Create at least 10 test cases covering different scenarios:
# - EV charging optimization
# - Thermostat settings
# - Appliance scheduling
# - Solar power maximization
# - Cost savings calculations

In [8]:
test_cases = [
    {
        "id": "ev_charging_peak",
        "question": "When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "Recommend charging window aligned with daytime solar output, include rate comparison and savings estimate.",
    },
    {
        "id": "ev_overnight_grid",
        "question": "Is it cheaper to charge my EV overnight this weekend or during midday solar hours?",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast", "calculate_energy_savings"],
        "expected_response": "Compare weekend rates versus solar availability and quantify expected savings for each option.",
    },
    {
        "id": "thermostat_heatwave",
        "question": "How should I adjust my thermostat settings this week to stay comfortable during the heatwave without high bills?",
        "expected_tools": ["get_weather_forecast", "search_energy_tips"],
        "expected_response": "Provide daily setpoints, pre-cooling strategy, and efficiency tips referencing forecasted highs.",
    },
    {
        "id": "thermostat_winter",
        "question": "Give me an energy-efficient heating schedule for the next 3 days while I'm working from home.",
        "expected_tools": ["get_weather_forecast", "query_energy_usage"],
        "expected_response": "Outline hourly temperature plan and reference recent usage to explain savings.",
    },
    {
        "id": "appliance_laundry",
        "question": "When should I run my washer and dryer tomorrow to use the most solar and pay the least?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices", "calculate_energy_savings"],
        "expected_response": "Recommend two scheduling windows, note expected kWh cost and solar contribution.",
    },
    {
        "id": "appliance_dishwasher",
        "question": "Should I run my dishwasher right after dinner or early morning to save more energy?",
        "expected_tools": ["get_electricity_prices", "search_energy_tips"],
        "expected_response": "Compare evening vs morning rates, mention load shifting benefits and tips.",
    },
    {
        "id": "solar_output_drop",
        "question": "My solar production dropped 15% this month—what should I look into?",
        "expected_tools": ["query_solar_generation", "search_energy_tips"],
        "expected_response": "Identify production patterns, suggest maintenance checks and troubleshooting steps.",
    },
    {
        "id": "solar_maximization_weekend",
        "question": "Help me plan weekend activities to maximize usage of my solar power and minimize grid draw.",
        "expected_tools": ["get_weather_forecast", "query_solar_generation", "get_recent_energy_summary"],
        "expected_response": "Provide schedule aligning high-load tasks with peak irradiance, show expected grid savings.",
    },
    {
        "id": "savings_goal",
        "question": "How can I reduce my total energy bill by 15% next month based on my recent usage?",
        "expected_tools": ["query_energy_usage", "calculate_energy_savings", "search_energy_tips"],
        "expected_response": "List prioritized actions with projected savings per action and timeline.",
    },
    {
        "id": "battery_strategy",
        "question": "What battery charging/discharging strategy should I use this week to cover peak evening rates?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices", "query_solar_generation"],
        "expected_response": "Recommend charge/discharge windows, include rate comparison and solar availability.",
    },
    {
        "id": "hvac_vs_fans",
        "question": "Is it cheaper to cool my home with ceiling fans this week instead of running the AC continuously?",
        "expected_tools": ["get_weather_forecast", "calculate_energy_savings", "search_energy_tips"],
        "expected_response": "Quantify energy difference, provide comfort tips, and advise when AC is still needed.",
    },
]

if len(test_cases) < 10:
    raise ValueError("You MUST have at least 10 test cases")
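
A quick consistency check can catch typos in `expected_tools` or duplicate ids before any agent calls are made. The sketch below is one possible approach and is not part of the original notebook: `validate_test_cases`, `KNOWN_TOOLS`, and `REQUIRED_KEYS` are hypothetical names, and the tool list is assumed from the tool names mentioned in the system prompt rather than read from the agent itself.

```python
# Assumed tool names, copied from the system prompt; adjust to the agent's real tool registry.
KNOWN_TOOLS = {
    "get_weather_forecast", "get_electricity_prices", "query_energy_usage",
    "query_solar_generation", "get_recent_energy_summary",
    "search_energy_tips", "calculate_energy_savings",
}
REQUIRED_KEYS = {"id", "question", "expected_tools", "expected_response"}


def validate_test_cases(cases, known_tools=KNOWN_TOOLS):
    """Return a list of problems found; an empty list means the cases look consistent."""
    problems = []
    seen_ids = set()
    for index, case in enumerate(cases):
        label = case.get("id", f"<case {index}>")
        missing_keys = REQUIRED_KEYS - case.keys()
        if missing_keys:
            problems.append(f"{label}: missing keys {sorted(missing_keys)}")
            continue
        if case["id"] in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(case["id"])
        unknown = sorted(set(case["expected_tools"]) - known_tools)
        if unknown:
            problems.append(f"{label}: unknown tools {unknown}")
    return problems


# Made-up cases for illustration: the second repeats an id and misspells a tool name.
sample_cases = [
    {"id": "ok_case", "question": "q", "expected_tools": ["get_weather_forecast"], "expected_response": "r"},
    {"id": "ok_case", "question": "q", "expected_tools": ["get_wether_forecast"], "expected_response": "r"},
]
for problem in validate_test_cases(sample_cases):
    print(problem)
```

Calling `validate_test_cases(test_cases)` on the list defined above would apply the same checks to the real cases.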


## 3. Run Agent Tests

In [9]:
CONTEXT = "Location: San Francisco, CA"

In [10]:
# Run the agent tests
# For each test case, call the agent and collect the response
# Store results for evaluation

def summarize_messages(messages):
    """Convert LangChain messages into a serializable log and tool summary."""
    log = []
    requested_tools = []
    executed_tools = []
    for msg in messages:
        entry = {
            'type': getattr(msg, 'type', msg.__class__.__name__.lower()),
            'name': getattr(msg, 'name', None),
            'content': getattr(msg, 'content', None),
        }
        if hasattr(msg, 'tool_calls') and msg.tool_calls:
            entry['tool_calls'] = [call.get('name') for call in msg.tool_calls]
            requested_tools.extend(entry['tool_calls'])
        if getattr(msg, 'tool_call_id', None):
            entry['tool_call_id'] = msg.tool_call_id
        if entry['type'] == 'tool' and entry['name']:
            executed_tools.append(entry['name'])
        log.append(entry)
    return {
        'messages': log,
        'requested_tools': list(dict.fromkeys(requested_tools)),
        'executed_tools': list(dict.fromkeys(executed_tools)),
    }

print("=== Running Agent Tests ===")
test_results = []

for i, test_case in enumerate(test_cases):
    print(f"\nTest {i+1}: {test_case['id']}")
    print(f"Question: {test_case['question']}")
    print("-" * 50)

    try:
        # Call the agent
        response = ecohome_agent.invoke(
            question=test_case['question'],
            context=CONTEXT
        )

        summary = summarize_messages(response['messages'])
        final_response = response['final_response']

        # Store the result
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'final_response': final_response,
            'tool_summary': summary,
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat(),
        }
        test_results.append(result)

    except Exception as e:
        print(f"Error: {e}")
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'final_response': f"Error: {str(e)}",
            'tool_summary': {
                'messages': [],
                'requested_tools': [],
                'executed_tools': []
            },
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat(),
            'error': str(e)
        }
        test_results.append(result)

print(f"\nCompleted {len(test_results)} tests")


=== Running Agent Tests ===

Test 1: ev_charging_peak
Question: When should I charge my electric car tomorrow to minimize cost and maximize solar power?
--------------------------------------------------



Test 2: ev_overnight_grid
Question: Is it cheaper to charge my EV overnight this weekend or during midday solar hours?
--------------------------------------------------



Test 3: thermostat_heatwave
Question: How should I adjust my thermostat settings this week to stay comfortable during the heatwave without high bills?
--------------------------------------------------



Test 4: thermostat_winter
Question: Give me an energy-efficient heating schedule for the next 3 days while I'm working from home.
--------------------------------------------------



Test 5: appliance_laundry
Question: When should I run my washer and dryer tomorrow to use the most solar and pay the least?
--------------------------------------------------



Test 6: appliance_dishwasher
Question: Should I run my dishwasher right after dinner or early morning to save more energy?
--------------------------------------------------



Test 7: solar_output_drop
Question: My solar production dropped 15% this month—what should I look into?
--------------------------------------------------



Test 8: solar_maximization_weekend
Question: Help me plan weekend activities to maximize usage of my solar power and minimize grid draw.
--------------------------------------------------



Test 9: savings_goal
Question: How can I reduce my total energy bill by 15% next month based on my recent usage?
--------------------------------------------------



Test 10: battery_strategy
Question: What battery charging/discharging strategy should I use this week to cover peak evening rates?
--------------------------------------------------



Test 11: hvac_vs_fans
Question: Is it cheaper to cool my home with ceiling fans this week instead of running the AC continuously?
--------------------------------------------------



Completed 11 tests
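
Because each agent run can be slow and may produce different text, it can help to persist the collected results before evaluating them. The following is a minimal sketch, assuming the result dictionaries contain only JSON-friendly values plus the occasional object that can be stringified; `results_to_json` and the sample record are hypothetical and not part of the original notebook.

```python
import json
from datetime import datetime


def results_to_json(results):
    """Serialize test results to a JSON string; values JSON cannot encode fall back to str()."""
    return json.dumps(results, indent=2, default=str)


# Illustrative record in the shape built by the test loop (all values are made up).
sample_results = [{
    "test_id": "example_case",
    "final_response": "Charge between 10 AM and 2 PM.",
    "tool_summary": {"messages": [], "requested_tools": [], "executed_tools": []},
    "timestamp": datetime(2024, 1, 1, 12, 0).isoformat(),
}]

serialized = results_to_json(sample_results)
restored = json.loads(serialized)
print(restored[0]["test_id"])
```

The resulting string could then be written to disk, for example with `pathlib.Path("test_results.json").write_text(serialized)`, so the evaluation cells can be rerun without calling the agent again.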


In [11]:
test_results

[{'test_id': 'ev_charging_peak',
  'question': 'When should I charge my electric car tomorrow to minimize cost and maximize solar power?',
  'final_response': 'To minimize costs and maximize solar power for charging your electric vehicle (EV) tomorrow in San Francisco, here are the key details:\n\n### Weather Forecast for October 7, 2023\n- **Condition**: Sunny\n- **Sunrise**: 6:18 AM\n- **Sunset**: 6:42 PM\n- **Peak Solar Generation**: Expected between 10 AM and 3 PM\n\n### Electricity Pricing (Time of Use)\n- **Peak Rate (9 AM - 4 PM)**: $0.182 per kWh\n- **Mid-Peak Rate (8 AM - 9 AM, 4 PM - 7 PM)**: $0.169 - $0.174 per kWh\n- **Off-Peak Rate (12 AM - 8 AM, 7 PM - 12 AM)**: $0.106 - $0.127 per kWh\n\n### Recommendations for Charging Your EV\n1. **Charge During Off-Peak Hours**:\n   - **Best Time**: Start charging your EV between **12 AM and 8 AM** to take advantage of the off-peak rates (as low as $0.106 per kWh).\n   - **Example**: If you charge from 12 AM to 6 AM, you will benefit 

## 4. Evaluate Responses

In [12]:
# TODO: Implement evaluation functions
# Create functions to evaluate:
# - Final Response
# - Tool usage

In [13]:
def evaluate_response(question, final_response, expected_response):
    """Evaluate response quality using simple heuristic metrics."""
    if not isinstance(final_response, str) or not final_response.strip():
        return {
            'score': 0.0,
            'word_count': 0,
            'actionable_language': False,
            'includes_numeric_detail': False,
            'addresses_solar_context': False,
            'addresses_cost_context': False,
            'notes': 'No valid response returned.'
        }

    text = final_response.lower()
    word_count = len(final_response.split())
    actionable_language = any(keyword in text for keyword in ['recommend', 'should', 'schedule', 'plan', 'suggest'])
    includes_numeric_detail = any(char.isdigit() for char in final_response)

    requires_solar = any(keyword in (question + ' ' + expected_response).lower() for keyword in ['solar', 'pv', 'irradiance'])
    addresses_solar_context = ('solar' in text) if requires_solar else True
    requires_cost = any(keyword in expected_response.lower() for keyword in ['cost', 'bill', 'price', 'savings', 'rate'])
    addresses_cost_context = any(word in text for word in ['cost', 'price', 'bill', 'savings', 'rate']) if requires_cost else True

    metrics = {
        'word_count': word_count,
        'actionable_language': actionable_language,
        'includes_numeric_detail': includes_numeric_detail,
        'addresses_solar_context': addresses_solar_context,
        'addresses_cost_context': addresses_cost_context,
    }
    # Five equally weighted checks; the score is the fraction that pass.
    checks = [
        word_count >= 60,
        actionable_language,
        includes_numeric_detail or not requires_cost,
        addresses_solar_context,
        addresses_cost_context,
    ]
    score = sum(checks) / len(checks)
    metrics['score'] = round(score, 2)
    return metrics
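
One limitation of substring checks such as `'rate' in text` is that they also match inside longer words ("generate" contains "rate", "explanation" contains "plan"), which can inflate scores. A possible refinement, shown here only as a sketch, is to match whole words with a regular expression. `contains_keyword` and `COST_WORDS` are hypothetical names, and the optional trailing "s" is an assumption made to cover simple plurals.

```python
import re


def contains_keyword(text, keywords):
    """True if any keyword (optionally followed by 's') appears as a whole word, ignoring case."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")s?\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


COST_WORDS = ["cost", "price", "bill", "savings", "rate"]

# The substring check matches the "rate" hidden inside "generate".
print("rate" in "solar panels generate power")
# The whole-word check does not.
print(contains_keyword("Solar panels generate power", COST_WORDS))
# A real mention of rates is still detected.
print(contains_keyword("Off-peak rates are lower overnight", COST_WORDS))
```

Whole-word matching is stricter, so derived forms such as "billing" or "pricing" would need to be added to the keyword list explicitly.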


In [14]:
def evaluate_tool_usage(tool_summary, expected_tools):
    """Check whether the agent used the anticipated tools."""
    executed = tool_summary.get('executed_tools', [])
    requested = tool_summary.get('requested_tools', [])
    coverage = set(executed) | set(requested)
    expected_set = set(expected_tools)
    missing = sorted(expected_set - coverage)
    extra = sorted(set(executed) - expected_set)
    if expected_set:
        score = (len(expected_set) - len(missing)) / len(expected_set)
    else:
        # No tools were expected, so the ideal run executes none.
        score = 1.0 if not executed else 0.0
    return {
        'requested_tools': requested,
        'executed_tools': executed,
        'missing_tools': missing,
        'unexpected_tools': extra,
        'score': round(max(score, 0.0), 2)
    }
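
The score computed by `evaluate_tool_usage` is the fraction of expected tools that were used, which is effectively recall; unexpected tools are listed but do not lower it. If over-calling tools should also be penalized, precision can be reported alongside. The sketch below is an optional complement under that assumption, and `tool_precision_recall` is a hypothetical helper, not part of the notebook.

```python
def tool_precision_recall(used_tools, expected_tools):
    """Compare the tools an agent used against the expected set.

    precision: share of used tools that were expected
    recall:    share of expected tools that were used
    """
    used, expected = set(used_tools), set(expected_tools)
    hits = used & expected
    if used:
        precision = len(hits) / len(used)
    else:
        # Nothing used: perfect only if nothing was expected either.
        precision = 1.0 if not expected else 0.0
    recall = len(hits) / len(expected) if expected else 1.0
    return {"precision": round(precision, 2), "recall": round(recall, 2)}


# Made-up example: two of three expected tools were used, plus one unexpected tool.
example = tool_precision_recall(
    used_tools=["get_weather_forecast", "get_electricity_prices", "search_energy_tips"],
    expected_tools=["get_weather_forecast", "get_electricity_prices", "calculate_energy_savings"],
)
print(example)
```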


In [15]:
def generate_evaluation_report(results):
    """Generate summary metrics and annotate results in place."""
    if not results:
        return {
            'tests_evaluated': 0,
            'average_response_score': 0.0,
            'average_tool_score': 0.0,
            'summary': []
        }

    summary_rows = []
    response_scores = []
    tool_scores = []
    for result in results:
        response_eval = evaluate_response(
            result.get('question', ''),
            result.get('final_response', ''),
            result.get('expected_response', '')
        )
        tool_eval = evaluate_tool_usage(
            result.get('tool_summary', {}),
            result.get('expected_tools', [])
        )
        result['response_eval'] = response_eval
        result['tool_eval'] = tool_eval
        response_scores.append(response_eval['score'])
        tool_scores.append(tool_eval['score'])
        summary_rows.append({
            'test_id': result['test_id'],
            'response_score': response_eval['score'],
            'tool_score': tool_eval['score'],
            'missing_tools': tool_eval['missing_tools'],
        })

    average_response = round(sum(response_scores) / len(response_scores), 2)
    average_tools = round(sum(tool_scores) / len(tool_scores), 2)
    return {
        'tests_evaluated': len(results),
        'average_response_score': average_response,
        'average_tool_score': average_tools,
        'summary': summary_rows
    }
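
To turn the report into concrete areas for improvement, the `summary` rows can be aggregated, for instance by counting which tools are most often missing and which tests score lowest. This is a minimal sketch: `rank_missing_tools`, `lowest_scoring`, and the sample rows are hypothetical, and the rows only mimic the shape of the entries produced by `generate_evaluation_report`.

```python
from collections import Counter


def rank_missing_tools(summary_rows):
    """Count how often each tool appears in 'missing_tools', most frequent first."""
    counts = Counter(tool for row in summary_rows for tool in row.get("missing_tools", []))
    return counts.most_common()


def lowest_scoring(summary_rows, key="tool_score", limit=3):
    """Return up to `limit` rows with the lowest value for `key`."""
    return sorted(summary_rows, key=lambda row: row[key])[:limit]


# Made-up rows for illustration only.
sample_rows = [
    {"test_id": "a", "response_score": 1.0, "tool_score": 0.67,
     "missing_tools": ["calculate_energy_savings"]},
    {"test_id": "b", "response_score": 1.0, "tool_score": 0.0,
     "missing_tools": ["search_energy_tips", "calculate_energy_savings"]},
    {"test_id": "c", "response_score": 1.0, "tool_score": 1.0, "missing_tools": []},
]

print(rank_missing_tools(sample_rows))
print([row["test_id"] for row in lowest_scoring(sample_rows, limit=2)])
```

A tool that is missing across many tests may point to a gap in the system prompt's tool guidance rather than to a problem with any single test case.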


In [16]:
evaluation_report = generate_evaluation_report(test_results)
evaluation_report


{'tests_evaluated': 11,
 'average_response_score': 1.0,
 'average_tool_score': 0.33,
 'summary': [{'test_id': 'ev_charging_peak',
   'response_score': 1.0,
   'tool_score': 1.0,
   'missing_tools': []},
  {'test_id': 'ev_overnight_grid',
   'response_score': 1.0,
   'tool_score': 0.67,
   'missing_tools': ['calculate_energy_savings']},
  {'test_id': 'thermostat_heatwave',
   'response_score': 1.0,
   'tool_score': 0.5,
   'missing_tools': ['search_energy_tips']},
  {'test_id': 'thermostat_winter',
   'response_score': 1.0,
   'tool_score': 0.5,
   'missing_tools': ['query_energy_usage']},
  {'test_id': 'appliance_laundry',
   'response_score': 1.0,
   'tool_score': 0.67,
   'missing_tools': ['calculate_energy_savings']},
  {'test_id': 'appliance_dishwasher',
   'response_score': 1.0,
   'tool_score': 0.0,
   'missing_tools': ['get_electricity_prices', 'search_energy_tips']},
  {'test_id': 'solar_output_drop',
   'response_score': 1.0,
   'tool_score': 0.0,
   'missing_tools': ['query_s