# Example 27: py_agent - Complete AI-Powered Forecasting Pipeline

## Overview

This notebook demonstrates the **py_agent AI-Powered Forecasting Agent**, the flagship feature of py-tidymodels v1.0.0.

py_agent provides **5 progressive phases of intelligence** for automated forecasting:

### Phase 1: Rule-Based Workflow Generation
- Fast, deterministic model selection based on data characteristics
- Pattern recognition (trends, seasonality, autocorrelation)
- Domain-aware preprocessing recommendations

### Phase 2: LLM-Enhanced Reasoning
- Integration with Claude Sonnet 4.5 for explainable AI
- Natural language model selection reasoning
- Budget management for API costs
- Constraint handling (speed, interpretability, accuracy priority)

### Phase 3.3: Multi-Model Comparison
- Automatic generation of diverse model candidates
- Diversity scoring to ensure variety
- Cross-validation evaluation
- Best model selection with confidence metrics

### Phase 3.4: RAG Knowledge Base
- Retrieval-Augmented Generation with 8 foundational examples
- Learn from similar forecasting scenarios
- Example-driven recommendations

### Phase 3.5: Autonomous Iteration
- Try-evaluate-improve loops
- Automatic iteration toward performance targets
- Adaptive strategy selection
- Convergence tracking

---

## Use Cases

**When to use py_agent**:
- ✅ Rapid prototyping of forecasting pipelines
- ✅ Exploring new datasets where optimal model is unknown
- ✅ Automatic model selection based on data characteristics
- ✅ Explainable AI for model recommendations
- ✅ Educational/demonstration purposes
- ✅ Iterative improvement toward performance targets

**When NOT to use py_agent**:
- ❌ You know exactly which model and parameters to use
- ❌ Production with strict latency requirements (<1 second)
- ❌ API costs are prohibitive (LLM mode)
- ❌ Full control over every preprocessing step is critical

---

## Prerequisites

- Basic understanding of time series forecasting
- Anthropic API key for LLM features (Phase 2 only - optional)
- Familiarity with py-tidymodels workflow concepts

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# py_agent imports
from py_agent import ForecastAgent

# Core py-tidymodels
from py_parsnip import linear_reg, prophet_reg, arima_reg, rand_forest
from py_workflows import Workflow
from py_rsample import initial_time_split, time_series_cv
from py_yardstick import metric_set, rmse, mae, r_squared, mape

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

print("✓ Imports successful")

## Load Real-World Data

We'll use European gas demand data with weather features:
- **96,433 rows** (daily data)
- **10 countries** (2013-2023)
- **Features**: temperature, wind_speed, gas_demand
- **Domain**: Energy forecasting with strong seasonality

In [None]:
# Load data
raw_data = pd.read_csv('../_md/__data/european_gas_demand_weather_data.csv')
raw_data['date'] = pd.to_datetime(raw_data['date'])

print(f"Total dataset: {len(raw_data):,} rows")
print(f"Countries: {raw_data['country'].nunique()}")
print(f"Date range: {raw_data['date'].min()} to {raw_data['date'].max()}")
print(f"\nColumns: {list(raw_data.columns)}")

# Show country distribution
print("\nRows per country:")
print(raw_data['country'].value_counts().head(10))

### Focus on Germany

For this demo, we'll focus on Germany's gas demand data.

In [None]:
# Focus on Germany
germany_data = raw_data[raw_data['country'] == 'Germany'].copy()
germany_data = germany_data.sort_values('date').reset_index(drop=True)

print(f"Germany data: {len(germany_data):,} rows")
print(f"Date range: {germany_data['date'].min()} to {germany_data['date'].max()}")
print(f"\nFirst few rows:")
print(germany_data.head())

print(f"\nData statistics:")
print(germany_data[['temperature', 'wind_speed', 'gas_demand']].describe())

### Data Exploration

In [None]:
# Visualize time series
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

# Gas demand
axes[0].plot(germany_data['date'], germany_data['gas_demand'], linewidth=0.5, alpha=0.7)
axes[0].set_title('Germany Gas Demand (Daily)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Gas Demand')
axes[0].grid(True, alpha=0.3)

# Temperature
axes[1].plot(germany_data['date'], germany_data['temperature'], linewidth=0.5, alpha=0.7, color='orange')
axes[1].set_title('Temperature (°C)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Temperature')
axes[1].grid(True, alpha=0.3)

# Wind speed
axes[2].plot(germany_data['date'], germany_data['wind_speed'], linewidth=0.5, alpha=0.7, color='green')
axes[2].set_title('Wind Speed', fontsize=12, fontweight='bold')
axes[2].set_ylabel('Wind Speed')
axes[2].set_xlabel('Date')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Key observations:")
print("- Strong yearly seasonality (winter heating demand)")
print("- Inverse relationship with temperature (cold → high demand)")
print("- Potential weekly patterns (business vs weekend)")
print("- Multiple exogenous variables available")

## Train/Test Split

Split the data chronologically: the first 80% of rows for training, the last 20% for testing.

In [None]:
# Time-based split
split = initial_time_split(germany_data, prop=0.8)
train_data = split.training()
test_data = split.testing()

print(f"Train: {len(train_data):,} rows ({train_data['date'].min()} to {train_data['date'].max()})")
print(f"Test:  {len(test_data):,} rows ({test_data['date'].min()} to {test_data['date'].max()})")
print(f"\nSplit ratio: {len(train_data) / len(germany_data):.1%} train, {len(test_data) / len(germany_data):.1%} test")

---

# Phase 1: Rule-Based Workflow Generation

The most basic py_agent mode uses **rule-based heuristics** to select models based on data characteristics.

**Advantages**:
- ⚡ Very fast (no API calls)
- 💰 Zero cost
- 🔒 Works offline
- 🎯 Deterministic and reproducible

**How it works**:
1. Analyze data characteristics (trend, seasonality, autocorrelation)
2. Match patterns to model capabilities
3. Generate appropriate workflow
4. Return ready-to-fit model
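
The steps above can be illustrated with a small, self-contained sketch. It is not py_agent's actual rule set: the `autocorr` and `pick_model` helpers, the 0.5 thresholds, and the mapping to model names are assumptions chosen only to show how measured data characteristics can drive a deterministic choice.

```python
import math

def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[i] - mean) * (series[i - lag] - mean) for i in range(lag, n))
    return cov / var

def pick_model(series, season_length):
    """Toy rule table (illustrative thresholds, not py_agent's real rules)."""
    acf_season = autocorr(series, season_length)
    acf_half = autocorr(series, max(1, season_length // 2))
    # Seasonal: strong correlation at the seasonal lag that also exceeds the
    # half-season lag, so a plain trend is not mistaken for seasonality.
    if acf_season > 0.5 and acf_season > acf_half:
        return "prophet_reg"
    # Persistent but not seasonal: strong lag-1 correlation.
    if autocorr(series, 1) > 0.5:
        return "arima_reg"
    return "linear_reg"

# Synthetic daily series with a 7-day cycle
weekly = [10 + 5 * math.sin(2 * math.pi * t / 7) for t in range(140)]
choice = pick_model(weekly, season_length=7)
```

Because the rules are plain comparisons on computed statistics, the same input always yields the same choice, which is what makes this mode reproducible and usable offline.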

In [None]:
# Initialize Phase 1 agent (rule-based)
agent_phase1 = ForecastAgent(verbose=True)

print("🤖 Phase 1: Rule-Based Agent initialized")
print("Mode: Rule-based heuristics")
print("Cost: $0")

In [None]:
# Generate workflow with natural language request
workflow_phase1 = agent_phase1.generate_workflow(
    data=train_data,
    request="Forecast daily gas demand considering temperature effects and yearly seasonality for energy domain"
)

print("\n✅ Phase 1 Workflow Generated")
print(f"Model selected: {workflow_phase1.spec.model_type}")
print(f"Engine: {workflow_phase1.spec.engine}")
# Formula will be available after fitting


In [None]:
# Fit and evaluate Phase 1
print("Training Phase 1 model...")
fit_phase1 = workflow_phase1.fit(train_data)

# Evaluate on test data
eval_phase1 = fit_phase1.evaluate(test_data)
outputs1, coeffs1, stats1 = eval_phase1.extract_outputs()

print("\n📊 Phase 1 Performance:")
print(f"Test RMSE: {stats1[stats1['split']=='test']['rmse'].iloc[0]:.2f}")
print(f"Test MAE:  {stats1[stats1['split']=='test']['mae'].iloc[0]:.2f}")
print(f"Test R²:   {stats1[stats1['split']=='test']['r_squared'].iloc[0]:.4f}")
print(f"Test MAPE: {stats1[stats1['split']=='test']['mape'].iloc[0]:.2f}%")

---

# Phase 2: LLM-Enhanced Reasoning

Phase 2 adds **Claude Sonnet 4.5 intelligence** for explainable model selection.

**Advantages**:
- 🧠 Explainable reasoning for model choices
- 🎯 Constraint handling (speed, interpretability, accuracy)
- 📊 Domain-aware recommendations
- 💬 Natural language understanding

**Requirements**:
- Anthropic API key (set in environment: `ANTHROPIC_API_KEY`)
- Budget management (cost per request: ~$0.001-0.01)
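
To make the idea of budget management concrete, here is a minimal sketch of a daily spend guard. It is not how py_agent enforces `budget_per_day` internally; `DailyBudget` and its methods are hypothetical names used only to illustrate checking an estimated cost against a cap before each request.

```python
from dataclasses import dataclass

@dataclass
class DailyBudget:
    """Hypothetical guard: refuse LLM calls once the daily spend cap is hit."""
    limit_usd: float
    spent_usd: float = 0.0

    def can_afford(self, estimated_cost_usd: float) -> bool:
        return self.spent_usd + estimated_cost_usd <= self.limit_usd

    def charge(self, cost_usd: float) -> None:
        if not self.can_afford(cost_usd):
            raise RuntimeError("daily LLM budget exceeded")
        self.spent_usd += cost_usd

budget = DailyBudget(limit_usd=0.05)
budget.charge(0.02)                    # first request fits
budget.charge(0.02)                    # second request still fits
blocked = not budget.can_afford(0.02)  # a third request would exceed the cap
```

A real deployment would also reset `spent_usd` at the start of each day and record actual rather than estimated costs.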

**Note**: This section requires an API key. If not available, Phase 2 will be skipped.

In [None]:
import os

# Check if API key is available
has_api_key = bool(os.environ.get('ANTHROPIC_API_KEY'))  # an empty value counts as missing

if has_api_key:
    print("✓ Anthropic API key found - Phase 2 will run")
    
    # Initialize Phase 2 agent (LLM-enhanced)
    agent_phase2 = ForecastAgent(
        use_llm=True,
        model="claude-sonnet-4.5",
        budget_per_day=10.0,  # $10/day budget
        verbose=True
    )
    
    print("\n🤖 Phase 2: LLM-Enhanced Agent initialized")
    print("Mode: Claude Sonnet 4.5")
    print("Budget: $10/day")
    
else:
    print("⚠️ No API key found - Phase 2 will be skipped")
    print("To enable Phase 2: export ANTHROPIC_API_KEY='your-key-here'")
    agent_phase2 = None

In [None]:
if agent_phase2:
    # Generate workflow with constraints
    workflow_phase2 = agent_phase2.generate_workflow(
        data=train_data,
        request="Forecast daily gas demand with high accuracy for energy domain",
        constraints={
            'priority': 'accuracy',  # vs 'speed' or 'interpretability'
            'domain': 'energy',
            'interpretability': 'medium',
            'special_requirements': [
                'Strong yearly seasonality (winter heating)',
                'Inverse temperature relationship',
                'Handle extreme weather events'
            ]
        }
    )
    
    print("\n✅ Phase 2 Workflow Generated")
    print(f"Model selected: {workflow_phase2.spec.model_type}")
    # Formula will be available after fitting
    
    # Show LLM reasoning
    if agent_phase2.last_workflow_info:
        reasoning = agent_phase2.last_workflow_info.get('model_selection_reasoning', 'N/A')
        print(f"\n🧠 LLM Reasoning:\n{reasoning[:500]}...")  # First 500 chars

else:
    print("⏭️ Phase 2 skipped (no API key)")
    workflow_phase2 = None

In [None]:
if workflow_phase2:
    # Fit and evaluate Phase 2
    print("Training Phase 2 model...")
    fit_phase2 = workflow_phase2.fit(train_data)
    
    eval_phase2 = fit_phase2.evaluate(test_data)
    outputs2, coeffs2, stats2 = eval_phase2.extract_outputs()
    
    print("\n📊 Phase 2 Performance:")
    print(f"Test RMSE: {stats2[stats2['split']=='test']['rmse'].iloc[0]:.2f}")
    print(f"Test MAE:  {stats2[stats2['split']=='test']['mae'].iloc[0]:.2f}")
    print(f"Test R²:   {stats2[stats2['split']=='test']['r_squared'].iloc[0]:.4f}")
    print(f"Test MAPE: {stats2[stats2['split']=='test']['mape'].iloc[0]:.2f}%")
    
    # Compare Phase 1 vs Phase 2
    rmse1 = stats1[stats1['split']=='test']['rmse'].iloc[0]
    rmse2 = stats2[stats2['split']=='test']['rmse'].iloc[0]
    improvement = (rmse1 - rmse2) / rmse1 * 100
    
    print(f"\n📈 Phase 2 vs Phase 1:")
    print(f"RMSE improvement: {improvement:+.2f}%")
else:
    print("⏭️ Phase 2 evaluation skipped")

---

# Phase 3.3: Multi-Model Comparison

Phase 3.3 automatically generates and compares **multiple diverse models**.

**Advantages**:
- 🎲 Diversity scoring ensures variety (not just variants of same model)
- 🔬 Cross-validation for robust evaluation
- 🏆 Best model selection with confidence
- 📊 Comparison metrics across all candidates

**How it works**:
1. Generate N diverse model candidates
2. Evaluate each with cross-validation
3. Rank by performance metric
4. Return best model
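
The generate, evaluate, rank, and select steps can be sketched without any library. The example below is an assumption-laden illustration, not `compare_models` itself: it uses three toy forecasters and a hypothetical `expanding_window_cv` helper, and ranks the candidates by mean cross-validated RMSE.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Three toy forecasters: each maps (history, horizon) -> list of predictions
def naive(history, horizon):
    return [history[-1]] * horizon

def seasonal_naive(history, horizon, season=7):
    # Repeat the value observed one season before each forecast step
    return [history[-season + (h % season)] for h in range(horizon)]

def mean_forecast(history, horizon):
    return [sum(history) / len(history)] * horizon

def expanding_window_cv(series, forecaster, initial, horizon):
    """Mean RMSE over folds whose training window grows by `horizon` each fold."""
    scores = []
    for end in range(initial, len(series) - horizon + 1, horizon):
        train, test = series[:end], series[end:end + horizon]
        scores.append(rmse(test, forecaster(train, horizon)))
    return sum(scores) / len(scores)

# Synthetic series with a clean 7-day cycle
series = [100 + 20 * math.sin(2 * math.pi * t / 7) for t in range(70)]
candidates = {"naive": naive, "seasonal_naive": seasonal_naive, "mean": mean_forecast}
ranking = sorted(
    (expanding_window_cv(series, f, initial=28, horizon=7), name)
    for name, f in candidates.items()
)
best_score, best_name = ranking[0]
```

Each fold trains only on observations that precede its test window, which is the property that distinguishes time series cross-validation from a shuffled k-fold split.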

In [None]:
# Phase 3.3: Multi-model comparison
print("🔬 Phase 3.3: Multi-Model Comparison")
print("Generating and evaluating diverse models...\n")

# Use rule-based agent for demo (works without API key)
agent_phase3 = ForecastAgent(verbose=True)

# Compare 5 diverse models
results_phase3 = agent_phase3.compare_models(
    data=train_data,
    request="Forecast daily gas demand with temperature and seasonality",
    n_models=5,
    cv_strategy='time_series',
    metric='rmse'
)

print("\n✅ Phase 3.3 Complete")
print(f"Models evaluated: {len(results_phase3['models'])}")

In [None]:
# Show comparison results
comparison_df = pd.DataFrame({
    'Model': [m['model_type'] for m in results_phase3['models']],
    'CV_RMSE': [m['cv_rmse'] for m in results_phase3['models']],
    'CV_MAE': [m['cv_mae'] for m in results_phase3['models']],
    'Diversity_Score': [m['diversity_score'] for m in results_phase3['models']]
}).sort_values('CV_RMSE')

print("\n📊 Model Comparison (sorted by CV RMSE):")
print(comparison_df.to_string(index=False))

# Best model: select the lowest CV RMSE explicitly instead of relying on list order
best_model_info = min(results_phase3['models'], key=lambda m: m['cv_rmse'])
print(f"\n🏆 Best Model: {best_model_info['model_type']}")
print(f"   CV RMSE: {best_model_info['cv_rmse']:.2f}")
print(f"   Diversity: {best_model_info['diversity_score']:.3f}")

In [None]:
# Fit best model on full training data and evaluate
best_workflow_phase3 = results_phase3['best_workflow']
fit_phase3 = best_workflow_phase3.fit(train_data)

eval_phase3 = fit_phase3.evaluate(test_data)
outputs3, coeffs3, stats3 = eval_phase3.extract_outputs()

print("\n📊 Phase 3.3 Best Model - Test Performance:")
print(f"Test RMSE: {stats3[stats3['split']=='test']['rmse'].iloc[0]:.2f}")
print(f"Test MAE:  {stats3[stats3['split']=='test']['mae'].iloc[0]:.2f}")
print(f"Test R²:   {stats3[stats3['split']=='test']['r_squared'].iloc[0]:.4f}")
print(f"Test MAPE: {stats3[stats3['split']=='test']['mape'].iloc[0]:.2f}%")

---

# Phase 3.4: RAG Knowledge Base

Phase 3.4 uses **Retrieval-Augmented Generation (RAG)** with 8 foundational forecasting examples.

**Advantages**:
- 📚 Learn from similar forecasting scenarios
- 🎯 Example-driven recommendations
- 🔍 Context-aware model selection
- 💡 Best practices from past successes

**Knowledge Base Examples**:
1. Retail sales forecasting
2. Energy demand prediction
3. Financial time series
4. Multi-step ahead forecasting
5. Grouped/panel data
6. Exogenous variables
7. Multiple seasonality
8. Irregular patterns
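
Retrieval over a knowledge base like this usually means scoring each stored example against the request and keeping the closest matches. The sketch below uses bag-of-words cosine similarity as a stand-in; py_agent's actual retrieval method is not shown in this notebook, and the example descriptions and the `retrieve` helper are hypothetical.

```python
import math
from collections import Counter

# Short stand-in descriptions for three knowledge-base entries (hypothetical text)
examples = {
    "Retail sales forecasting": "weekly retail sales promotions holidays",
    "Energy demand prediction": "daily energy demand temperature yearly seasonality",
    "Financial time series": "daily stock returns volatility",
}

def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    """Return the k (similarity, name) pairs most similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(desc)), name) for name, desc in examples.items()]
    return sorted(scored, reverse=True)[:k]

top = retrieve("forecast daily gas demand with temperature effects and yearly seasonality")
```

For this request the energy example ranks first because it shares the most terms with the query. A production system would more likely compare dense embeddings than raw word counts.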

In [None]:
# Initialize Phase 3.4 agent with RAG
agent_phase3_4 = ForecastAgent(use_rag=True, verbose=True)

print("🤖 Phase 3.4: RAG-Enhanced Agent initialized")
print("Knowledge base: 8 foundational forecasting examples")
print("Mode: Example-driven recommendations")

In [None]:
# Generate workflow with RAG
workflow_phase3_4 = agent_phase3_4.generate_workflow(
    data=train_data,
    request="Forecast daily gas demand with temperature effects and yearly seasonality"
)

print("\n✅ Phase 3.4 Workflow Generated (RAG-enhanced)")
print(f"Model selected: {workflow_phase3_4.spec.model_type}")
# Formula will be available after fitting

# Show which examples were retrieved
if agent_phase3_4.last_workflow_info:
    retrieved = agent_phase3_4.last_workflow_info.get('retrieved_examples', [])
    if retrieved:
        print(f"\n📚 Retrieved {len(retrieved)} relevant examples:")
        for i, ex in enumerate(retrieved[:3], 1):  # Show top 3
            print(f"  {i}. {ex.get('name', 'Example')} (similarity: {ex.get('similarity', 0):.3f})")

In [None]:
# Fit and evaluate Phase 3.4
fit_phase3_4 = workflow_phase3_4.fit(train_data)
eval_phase3_4 = fit_phase3_4.evaluate(test_data)
outputs3_4, coeffs3_4, stats3_4 = eval_phase3_4.extract_outputs()

print("📊 Phase 3.4 Performance (RAG-enhanced):")
print(f"Test RMSE: {stats3_4[stats3_4['split']=='test']['rmse'].iloc[0]:.2f}")
print(f"Test MAE:  {stats3_4[stats3_4['split']=='test']['mae'].iloc[0]:.2f}")
print(f"Test R²:   {stats3_4[stats3_4['split']=='test']['r_squared'].iloc[0]:.4f}")
print(f"Test MAPE: {stats3_4[stats3_4['split']=='test']['mape'].iloc[0]:.2f}%")

---

# Phase 3.5: Autonomous Iteration

Phase 3.5 provides **autonomous try-evaluate-improve loops** toward performance targets.

**Advantages**:
- 🔁 Automatic iteration toward RMSE/MAE targets
- 🎯 Adaptive strategy selection
- 📈 Convergence tracking
- 🛑 Early stopping when target achieved

**How it works**:
1. Try initial model
2. Evaluate performance
3. If target not met, try alternative approach
4. Repeat until target achieved or max iterations
5. Return best model found
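
The loop described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation of `iterate`: the strategies are toy forecasters, `iterate_until_target` is a hypothetical helper, and the history entries reuse the `model_type` and `rmse` keys that the notebook reads from `iteration_history`.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def iterate_until_target(strategies, train, valid, target_rmse, max_iterations=5):
    """Try strategies in order; stop early once validation RMSE meets the target."""
    history, best = [], None
    for name, forecaster in strategies[:max_iterations]:
        score = rmse(valid, forecaster(train, len(valid)))
        history.append({"model_type": name, "rmse": score})
        if best is None or score < best["rmse"]:
            best = history[-1]
        if score < target_rmse:
            break  # target met, so later strategies are never tried
    return best, history

# Synthetic series with a 7-day cycle; hold out the final week for validation
series = [50 + 10 * math.sin(2 * math.pi * t / 7) for t in range(63)]
train, valid = series[:56], series[56:]
strategies = [
    ("mean", lambda h, n: [sum(h) / len(h)] * n),
    ("naive", lambda h, n: [h[-1]] * n),
    ("seasonal_naive", lambda h, n: [h[-7 + (i % 7)] for i in range(n)]),
    ("never_reached", lambda h, n: [0.0] * n),
]
best, history = iterate_until_target(strategies, train, valid, target_rmse=1.0)
```

Here the first two strategies miss the target, the third meets it, and the fourth is skipped. The loop scores candidates on a validation slice carved out of the training data, so the held-out test set stays untouched until the final evaluation.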

In [None]:
# Phase 3.5: Autonomous iteration
print("🔁 Phase 3.5: Autonomous Iteration")
print("Goal: Achieve RMSE < 15000\n")

agent_phase3_5 = ForecastAgent(verbose=True)

# Iterate toward performance target
best_workflow_iter, iteration_history = agent_phase3_5.iterate(
    data=train_data,
    request="Forecast daily gas demand with temperature and seasonality",
    target_metric='rmse',
    target_value=15000.0,
    max_iterations=5
)

print("\n✅ Phase 3.5 Complete")
print(f"Iterations completed: {len(iteration_history)}")

In [None]:
# Show iteration history
iter_df = pd.DataFrame([
    {
        'Iteration': i + 1,
        'Model': h['model_type'],
        'RMSE': h['rmse'],
        'Target_Met': '✓' if h['rmse'] < 15000 else '✗'
    }
    for i, h in enumerate(iteration_history)
])

print("\n📊 Iteration History:")
print(iter_df.to_string(index=False))

# Best iteration
best_iter = min(iteration_history, key=lambda x: x['rmse'])
print(f"\n🏆 Best Result: {best_iter['model_type']}")
print(f"   RMSE: {best_iter['rmse']:.2f}")
print(f"   Target (< 15000): {'✓ ACHIEVED' if best_iter['rmse'] < 15000 else '✗ Not achieved'}")

In [None]:
# Evaluate best model from iteration
fit_phase3_5 = best_workflow_iter.fit(train_data)
eval_phase3_5 = fit_phase3_5.evaluate(test_data)
outputs3_5, coeffs3_5, stats3_5 = eval_phase3_5.extract_outputs()

print("📊 Phase 3.5 Best Model - Test Performance:")
print(f"Test RMSE: {stats3_5[stats3_5['split']=='test']['rmse'].iloc[0]:.2f}")
print(f"Test MAE:  {stats3_5[stats3_5['split']=='test']['mae'].iloc[0]:.2f}")
print(f"Test R²:   {stats3_5[stats3_5['split']=='test']['r_squared'].iloc[0]:.4f}")
print(f"Test MAPE: {stats3_5[stats3_5['split']=='test']['mape'].iloc[0]:.2f}%")

---

# Final Comparison: All Phases

Let's compare all phases side-by-side to see the progression of intelligence.

In [None]:
# Compile results
comparison_all = pd.DataFrame([
    {
        'Phase': 'Phase 1 (Rule-Based)',
        'Model': workflow_phase1.spec.model_type,
        'Test_RMSE': stats1[stats1['split']=='test']['rmse'].iloc[0],
        'Test_MAE': stats1[stats1['split']=='test']['mae'].iloc[0],
        'Test_R²': stats1[stats1['split']=='test']['r_squared'].iloc[0],
        'Test_MAPE': stats1[stats1['split']=='test']['mape'].iloc[0]
    },
    {
        'Phase': 'Phase 3.3 (Multi-Model)',
        'Model': best_workflow_phase3.spec.model_type,
        'Test_RMSE': stats3[stats3['split']=='test']['rmse'].iloc[0],
        'Test_MAE': stats3[stats3['split']=='test']['mae'].iloc[0],
        'Test_R²': stats3[stats3['split']=='test']['r_squared'].iloc[0],
        'Test_MAPE': stats3[stats3['split']=='test']['mape'].iloc[0]
    },
    {
        'Phase': 'Phase 3.4 (RAG)',
        'Model': workflow_phase3_4.spec.model_type,
        'Test_RMSE': stats3_4[stats3_4['split']=='test']['rmse'].iloc[0],
        'Test_MAE': stats3_4[stats3_4['split']=='test']['mae'].iloc[0],
        'Test_R²': stats3_4[stats3_4['split']=='test']['r_squared'].iloc[0],
        'Test_MAPE': stats3_4[stats3_4['split']=='test']['mape'].iloc[0]
    },
    {
        'Phase': 'Phase 3.5 (Iteration)',
        'Model': best_workflow_iter.spec.model_type,
        'Test_RMSE': stats3_5[stats3_5['split']=='test']['rmse'].iloc[0],
        'Test_MAE': stats3_5[stats3_5['split']=='test']['mae'].iloc[0],
        'Test_R²': stats3_5[stats3_5['split']=='test']['r_squared'].iloc[0],
        'Test_MAPE': stats3_5[stats3_5['split']=='test']['mape'].iloc[0]
    }
])

# Add Phase 2 if available
if workflow_phase2:
    phase2_row = pd.DataFrame([{
        'Phase': 'Phase 2 (LLM)',
        'Model': workflow_phase2.spec.model_type,
        'Test_RMSE': stats2[stats2['split']=='test']['rmse'].iloc[0],
        'Test_MAE': stats2[stats2['split']=='test']['mae'].iloc[0],
        'Test_R²': stats2[stats2['split']=='test']['r_squared'].iloc[0],
        'Test_MAPE': stats2[stats2['split']=='test']['mape'].iloc[0]
    }])
    comparison_all = pd.concat([comparison_all.iloc[:1], phase2_row, comparison_all.iloc[1:]], ignore_index=True)

# Sort by RMSE
comparison_all = comparison_all.sort_values('Test_RMSE')

print("\n" + "="*80)
print("📊 FINAL COMPARISON: All py_agent Phases")
print("="*80 + "\n")
print(comparison_all.to_string(index=False))

# Best phase
best_phase = comparison_all.iloc[0]
print(f"\n🏆 BEST PHASE: {best_phase['Phase']}")
print(f"   Model: {best_phase['Model']}")
print(f"   Test RMSE: {best_phase['Test_RMSE']:.2f}")
print(f"   Test R²: {best_phase['Test_R²']:.4f}")

# Improvement from Phase 1 to best
phase1_rmse = comparison_all[comparison_all['Phase'] == 'Phase 1 (Rule-Based)']['Test_RMSE'].iloc[0]
best_rmse = best_phase['Test_RMSE']
improvement = (phase1_rmse - best_rmse) / phase1_rmse * 100

print(f"\n📈 Improvement from Phase 1 to Best: {improvement:+.2f}%")

## Visualize Phase Comparison

In [None]:
# Create comparison visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# RMSE comparison
axes[0].barh(comparison_all['Phase'], comparison_all['Test_RMSE'], color='steelblue', alpha=0.7)
axes[0].set_xlabel('Test RMSE (lower is better)', fontsize=11)
axes[0].set_title('Test RMSE by Phase', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='x')
axes[0].invert_yaxis()

# R² comparison
axes[1].barh(comparison_all['Phase'], comparison_all['Test_R²'], color='coral', alpha=0.7)
axes[1].set_xlabel('Test R² (higher is better)', fontsize=11)
axes[1].set_title('Test R² by Phase', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='x')
axes[1].invert_yaxis()

plt.tight_layout()
plt.show()

---

# Key Takeaways

## Phase Summary

| Phase | Strength | Use Case | Cost |
|-------|----------|----------|------|
| **Phase 1** | ⚡ Fast, free, deterministic | Quick prototypes, offline work | $0 |
| **Phase 2** | 🧠 Explainable, constraint-aware | Need reasoning, domain-specific | ~$0.01/request |
| **Phase 3.3** | 🏆 Best model selection | When accuracy is critical | $0 (rule-based) |
| **Phase 3.4** | 📚 Example-driven | Learn from similar scenarios | $0 |
| **Phase 3.5** | 🎯 Target-seeking | Achieve specific performance | $0 (rule-based) |

## When to Use Each Phase

**Phase 1 (Rule-Based)**:
- ✅ Rapid prototyping
- ✅ Offline work
- ✅ Cost-sensitive applications
- ✅ Simple, straightforward data patterns

**Phase 2 (LLM-Enhanced)**:
- ✅ Need explainable model selection
- ✅ Complex domain requirements
- ✅ Constraint handling (speed/interpretability/accuracy tradeoffs)
- ✅ Natural language interaction

**Phase 3.3 (Multi-Model)**:
- ✅ Accuracy is paramount
- ✅ Unsure which model type is best
- ✅ Need model diversity
- ✅ Have compute budget for CV

**Phase 3.4 (RAG)**:
- ✅ Similar forecasting problems solved before
- ✅ Learn from domain examples
- ✅ Best practices important
- ✅ Educational/learning context

**Phase 3.5 (Iteration)**:
- ✅ Specific performance target (e.g., RMSE < 100)
- ✅ Willing to try multiple approaches
- ✅ Convergence toward goal important
- ✅ Adaptive improvement needed

## Best Practices

1. **Start with Phase 1** for a quick baseline
2. **Use Phase 3.3** when accuracy matters most
3. **Enable RAG** for example-driven learning
4. **Use Phase 3.5** for target-seeking
5. **Add Phase 2 LLM** only when explainability is critical (costs apply)

## Common Pitfalls

- ❌ Using LLM mode without budget management
- ❌ Expecting Phase 1 to find a globally optimal model
- ❌ Not validating agent output before production deployment
- ❌ Relying solely on automation without domain expertise
- ❌ Using iteration without reasonable target values

## Production Considerations

**For Production Use**:
- Validate agent-generated workflows on holdout data
- Set reasonable iteration limits (max_iterations)
- Monitor API costs if using LLM mode
- Implement fallback to Phase 1 if API unavailable
- Cache workflows for repeated similar requests
- Review generated formulas for domain appropriateness
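
The fallback recommendation can be sketched as a thin wrapper around two generator callables. This is a generic pattern, not a py_agent feature: `generate_with_fallback`, `flaky_llm`, and `rule_based` are hypothetical stand-ins. In practice the two callables might wrap `generate_workflow` on an LLM-enabled agent and on a rule-based agent respectively.

```python
def generate_with_fallback(llm_generate, rule_generate, request):
    """Try the LLM path first; on any failure fall back to the rule-based path."""
    try:
        return llm_generate(request), "llm"
    except Exception as exc:  # e.g. network error, missing key, budget exhausted
        print(f"LLM path failed ({exc}); falling back to rule-based mode")
        return rule_generate(request), "rule_based"

# Stand-in callables for illustration only
def flaky_llm(request):
    raise ConnectionError("API unavailable")

def rule_based(request):
    return {"model_type": "linear_reg", "request": request}

workflow, source = generate_with_fallback(flaky_llm, rule_based, "forecast gas demand")
```

Returning the source label alongside the workflow makes it easy to log how often the fallback path is taken.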

---

# References

- **py_agent README**: `py_agent/README.md`
- **Tutorial Notebook 22**: Complete Agent Overview
- **Tutorial Notebook 23**: LLM-Enhanced Mode
- **Tutorial Notebook 24**: Domain-Specific Examples
- **Tutorial Notebook 25**: Advanced Features & Production
- **CLAUDE.md**: Complete architecture documentation