# py_agent: LLM-Enhanced Mode Tutorial (Phase 2)

This notebook demonstrates **Phase 2: LLM Integration** with Claude Sonnet 4.5 for advanced reasoning and model selection.

## What You'll Learn

1. **Basic LLM Mode**: Enable Claude Sonnet 4.5 for intelligent model selection
2. **Advanced Reasoning**: See LLM reasoning for model and preprocessing choices
3. **Budget Management**: Control API costs with daily budget limits
4. **Dual-Mode Comparison**: Compare rule-based ($0) vs LLM-enhanced ($4-10) approaches
5. **Complex Constraints**: Use natural language constraints for specialized requirements

## Phase 2 Overview

**Objective**: Enhance workflow generation with Claude Sonnet 4.5 for advanced reasoning

**Benefits**:
- Better model selection for complex patterns
- Natural language constraint handling
- Explainable recommendations with reasoning
- Domain-specific insights

**Cost**: $4-10 per workflow (vs $0 for rule-based)

**When to Use LLM Mode**:
- Complex forecasting scenarios
- Need explainability and reasoning
- Domain-specific requirements
- Willing to pay for better accuracy

**When to Use Rule-Based Mode**:
- Simple forecasting tasks
- Cost-sensitive applications
- Fast prototyping
- Batch processing many datasets

## Setup

**Important**: You need an Anthropic API key to run LLM mode. Set the environment variable:

```bash
export ANTHROPIC_API_KEY="your-api-key-here"
```

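Alternatively, set it inside this notebook via `os.environ`. This is not recommended for production, because the key is then saved with the notebook and can leak through version control.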
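
If the variable is not already exported, one option is to prompt for the key at runtime so that it never lands in a saved cell. The following is a minimal sketch using only the standard library; `resolve_api_key` is a hypothetical helper for this tutorial, not part of py_agent.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the Anthropic API key, prompting only if it is not already set.

    Hypothetical helper: it assumes, as the setup note above states, that
    LLM mode picks up ANTHROPIC_API_KEY from the environment.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        # getpass hides the input, so the key is not echoed into cell output.
        key = prompt("Anthropic API key: ").strip()
        if not key:
            raise ValueError("No API key provided")
        env["ANTHROPIC_API_KEY"] = key
    return key
```

Calling `resolve_api_key()` once before constructing the agent keeps the key in process memory only; nothing is written to the notebook file.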

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from datetime import datetime, timedelta

# Import py_agent
from py_agent import ForecastAgent

# Check for API key
if 'ANTHROPIC_API_KEY' not in os.environ:
    print("⚠️  WARNING: ANTHROPIC_API_KEY not set. LLM mode will not work.")
    print("   Set it with: os.environ['ANTHROPIC_API_KEY'] = 'your-key-here'")
else:
    print("✓ ANTHROPIC_API_KEY found")

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Setup complete!")

## Generate Sample Data: E-Commerce Sales

We'll create a complex e-commerce sales dataset with:
- **Multiple seasonality**: Weekly + yearly patterns
- **External factors**: Marketing spend, competitor pricing, weather
- **Special events**: Holidays, promotions, product launches
- **Non-linear relationships**: Interaction effects

This complexity will showcase LLM's reasoning abilities.
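
To make the seasonal building blocks concrete, here is a small standalone sketch of a weekly and a yearly component. The function names and the cosine form are illustrative assumptions, separate from the notebook's own generator. The point it shows: a cosine centred on `peak_day` reaches its maximum exactly on that day, whereas a sine with the same offset peaks about a quarter of a year later.

```python
import math


def weekly_component(day_of_week, weekend_boost=30.0, weekday_dip=-10.0):
    """Weekend uplift: days 5 and 6 (Saturday, Sunday) get the boost."""
    return weekend_boost if day_of_week >= 5 else weekday_dip


def yearly_component(day_of_year, peak_day=359, amplitude=50.0):
    """Cosine seasonal term that reaches +amplitude on peak_day."""
    return amplitude * math.cos(2 * math.pi * (day_of_year - peak_day) / 365)


days = range(1, 366)

# Day of year on which each formulation is largest.
cos_peak = max(days, key=yearly_component)
sin_peak = max(days, key=lambda d: math.sin(2 * math.pi * (d - 359) / 365))

print(cos_peak)  # 359 (25 December in a non-leap year)
print(sin_peak)  # 85 (late March)
```

When a comment promises a peak on a particular date, checking the argmax like this is a quick way to confirm that the phase offset does what was intended.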

In [None]:
# Generate 2 years of daily data
np.random.seed(42)
n_days = 730
dates = pd.date_range('2022-01-01', periods=n_days, freq='D')

# Base sales with trend
trend = np.linspace(100, 200, n_days)

# Weekly seasonality (weekend spike)
day_of_week = dates.dayofweek
weekly_season = np.where(day_of_week >= 5, 30, -10)  # Weekend boost

# Yearly seasonality (holiday season)
day_of_year = dates.dayofyear
yearly_season = 50 * np.cos(2 * np.pi * (day_of_year - 359) / 365)  # Cosine peaks on day 359 (Christmas)

# Marketing spend (smooth yearly cycle plus noise)
marketing_spend = 5 + 3 * np.sin(2 * np.pi * day_of_year / 365) + np.random.normal(0, 0.5, n_days)
marketing_spend = np.maximum(marketing_spend, 0)

# Competitor pricing (affects our sales negatively)
competitor_price = 50 + 10 * np.sin(2 * np.pi * day_of_year / 180) + np.random.normal(0, 2, n_days)

# Weather (temperature affects sales)
temperature = 60 + 20 * np.sin(2 * np.pi * (day_of_year - 80) / 365) + np.random.normal(0, 5, n_days)

# Special events (holidays, product launches)
is_holiday = np.isin(day_of_year, [1, 150, 185, 244, 316, 359])  # Major holidays
holiday_boost = np.where(is_holiday, 80, 0)

# Product launch (day 400)
product_launch = np.where(np.arange(n_days) >= 400, 40, 0)

# Sales with non-linear interaction: marketing × temperature
marketing_temp_interaction = 0.3 * marketing_spend * (temperature - 60) / 10

# Competitor price effect (linear around the $50 baseline)
competitor_effect = -0.5 * (competitor_price - 50)

# Combine all components
sales = (trend + 
         weekly_season + 
         yearly_season + 
         5 * marketing_spend +
         competitor_effect +
         0.2 * temperature +
         marketing_temp_interaction +
         holiday_boost +
         product_launch +
         np.random.normal(0, 15, n_days))  # Noise

sales = np.maximum(sales, 0)  # No negative sales

# Create DataFrame
data = pd.DataFrame({
    'date': dates,
    'sales': sales,
    'marketing_spend': marketing_spend,
    'competitor_price': competitor_price,
    'temperature': temperature,
    'is_holiday': is_holiday.astype(int)
})

print(f"Generated {len(data)} days of e-commerce sales data")
print(f"Date range: {data['date'].min()} to {data['date'].max()}")
print(f"\nSample data:")
print(data.head(10))

# Plot the data
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

# Sales over time
axes[0].plot(data['date'], data['sales'], label='Sales', alpha=0.7)
axes[0].set_title('E-Commerce Daily Sales (2 Years)', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Sales ($1000s)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# External factors
ax2_1 = axes[1]
ax2_2 = ax2_1.twinx()
ax2_1.plot(data['date'], data['marketing_spend'], color='green', label='Marketing Spend', alpha=0.7)
ax2_2.plot(data['date'], data['competitor_price'], color='red', label='Competitor Price', alpha=0.7)
ax2_1.set_title('External Factors', fontsize=12, fontweight='bold')
ax2_1.set_ylabel('Marketing Spend ($1000s)', color='green')
ax2_2.set_ylabel('Competitor Price ($)', color='red')
ax2_1.legend(loc='upper left')
ax2_2.legend(loc='upper right')
ax2_1.grid(True, alpha=0.3)

# Temperature
axes[2].plot(data['date'], data['temperature'], color='orange', label='Temperature', alpha=0.7)
axes[2].scatter(data[data['is_holiday'] == 1]['date'], 
                data[data['is_holiday'] == 1]['temperature'], 
                color='red', s=100, label='Holidays', zorder=5)
axes[2].set_title('Temperature & Holidays', fontsize=12, fontweight='bold')
axes[2].set_xlabel('Date')
axes[2].set_ylabel('Temperature (°F)')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nData characteristics:")
print(f"  - Mean sales: ${data['sales'].mean():.2f}k")
print(f"  - Std sales: ${data['sales'].std():.2f}k")
print(f"  - Min sales: ${data['sales'].min():.2f}k")
print(f"  - Max sales: ${data['sales'].max():.2f}k")
print(f"  - Holidays: {data['is_holiday'].sum()} days")

## Split Data: Train/Test

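Train on roughly the first 18 months (540 days) and test on the remaining 190 days (about 6 months). The split is chronological, so the test period always lies after the training period.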
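
A fixed count of `18 * 30` rows only approximates 18 calendar months. If an exact calendar boundary matters, a date-based cutoff is an alternative. The sketch below uses plain `datetime` objects and a hypothetical `split_by_date` helper; with pandas, the same idea is a boolean mask on the `date` column.

```python
from datetime import date, timedelta


def split_by_date(rows, cutoff):
    """Split chronologically ordered (date, value) rows at a calendar cutoff.

    Rows dated before `cutoff` go to train and the rest go to test, so no
    future observation can leak into the training set.
    """
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


# 730 daily rows starting 2022-01-01, mirroring the tutorial's date range.
start = date(2022, 1, 1)
rows = [(start + timedelta(days=i), float(i)) for i in range(730)]

train, test = split_by_date(rows, cutoff=date(2023, 7, 1))
print(len(train), len(test))  # 546 184
```

By comparison, slicing at row 540 places the boundary at 2023-06-25, six days before the calendar cutoff used here.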

In [None]:
# Split by row count: first 540 days (~18 months) for training, remaining 190 days (~6 months) for testing
train_days = 18 * 30  # 540 days
train_data = data.iloc[:train_days].copy()
test_data = data.iloc[train_days:].copy()

print(f"Train data: {len(train_data)} days ({train_data['date'].min()} to {train_data['date'].max()})")
print(f"Test data: {len(test_data)} days ({test_data['date'].min()} to {test_data['date'].max()})")

## Example 1: Rule-Based Mode (Baseline)

First, let's try the **rule-based mode** (Phase 1) as a baseline.

**Cost**: $0  
**Speed**: <1 second

In [None]:
# Initialize agent in rule-based mode
agent_rule = ForecastAgent(verbose=True, use_llm=False)

print("\n" + "="*80)
print("RULE-BASED MODE (Phase 1)")
print("="*80)

# Generate workflow
workflow_rule = agent_rule.generate_workflow(
    data=train_data,
    request="Forecast daily e-commerce sales with weekly and yearly seasonality, accounting for marketing, competitor pricing, and temperature effects"
)

print(f"\n✓ Workflow generated (rule-based)")
print(f"  Model: {workflow_rule.extract_spec_parsnip().model_type}")
print(f"  Cost: $0.00")

In [None]:
# Fit and evaluate
fit_rule = workflow_rule.fit(train_data)
eval_rule = fit_rule.evaluate(test_data)

# Extract metrics
outputs_rule, coeffs_rule, stats_rule = eval_rule.extract_outputs()

test_stats_rule = stats_rule[stats_rule['split'] == 'test']
rmse_rule = test_stats_rule['rmse'].iloc[0]
mae_rule = test_stats_rule['mae'].iloc[0]
r2_rule = test_stats_rule['r_squared'].iloc[0]

print(f"\nRule-Based Performance:")
print(f"  RMSE: {rmse_rule:.2f}")
print(f"  MAE: {mae_rule:.2f}")
print(f"  R²: {r2_rule:.4f}")

## Example 2: LLM-Enhanced Mode (Phase 2)

Now let's use **LLM mode** with Claude Sonnet 4.5 for intelligent reasoning.

**Cost**: $4-10 per workflow  
**Speed**: 10-30 seconds  
**Benefits**: Better model selection, explainable reasoning

In [None]:
# Initialize agent in LLM mode
agent_llm = ForecastAgent(
    verbose=True,
    use_llm=True,  # Enable LLM mode
    model="claude-sonnet-4.5",  # Specify model
    budget_per_day=100.0  # Set daily budget ($100/day default)
)

print("\n" + "="*80)
print("LLM-ENHANCED MODE (Phase 2)")
print("="*80)

# Generate workflow with same request
workflow_llm = agent_llm.generate_workflow(
    data=train_data,
    request="Forecast daily e-commerce sales with weekly and yearly seasonality, accounting for marketing, competitor pricing, and temperature effects"
)

print(f"\n✓ Workflow generated (LLM-enhanced)")
print(f"  Model: {workflow_llm.extract_spec_parsnip().model_type}")
print(f"  Cost: ${agent_llm.llm_client.total_cost:.4f}")

### View LLM Reasoning

One of the key benefits of LLM mode is **explainability**: you can see why the agent chose a specific model and preprocessing strategy.

In [None]:
# Access LLM reasoning from last workflow generation
if hasattr(agent_llm, 'last_workflow_info'):
    info = agent_llm.last_workflow_info
    
    print("\n" + "="*80)
    print("LLM REASONING")
    print("="*80)
    
    if 'model_selection_reasoning' in info:
        print("\nModel Selection Reasoning:")
        print(info['model_selection_reasoning'])
    
    if 'feature_engineering_reasoning' in info:
        print("\nFeature Engineering Reasoning:")
        print(info['feature_engineering_reasoning'])
    
    if 'data_analysis_insights' in info:
        print("\nData Analysis Insights:")
        print(info['data_analysis_insights'])
else:
    print("\n⚠️  LLM reasoning not available (LLM mode may not be enabled or API key missing)")

In [None]:
# Fit and evaluate LLM workflow
fit_llm = workflow_llm.fit(train_data)
eval_llm = fit_llm.evaluate(test_data)

# Extract metrics
outputs_llm, coeffs_llm, stats_llm = eval_llm.extract_outputs()

test_stats_llm = stats_llm[stats_llm['split'] == 'test']
rmse_llm = test_stats_llm['rmse'].iloc[0]
mae_llm = test_stats_llm['mae'].iloc[0]
r2_llm = test_stats_llm['r_squared'].iloc[0]

print(f"\nLLM-Enhanced Performance:")
print(f"  RMSE: {rmse_llm:.2f}")
print(f"  MAE: {mae_llm:.2f}")
print(f"  R²: {r2_llm:.4f}")

## Compare Rule-Based vs LLM-Enhanced

Let's compare the two approaches side-by-side.
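
For reference, the error metrics and the percentage-improvement figure used in this comparison can be written out directly. This is a plain-Python sketch of the standard definitions, not the library's internal code; the function names and the toy numbers are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: average miss in the units of the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pct_improvement(baseline_error, candidate_error):
    """Positive when the candidate has lower error than the baseline."""
    return (baseline_error - candidate_error) / baseline_error * 100


actual = [100.0, 120.0, 140.0, 160.0]
baseline = [110.0, 110.0, 150.0, 150.0]   # off by 10 everywhere
candidate = [105.0, 115.0, 145.0, 155.0]  # off by 5 everywhere

print(rmse(actual, baseline), rmse(actual, candidate))                 # 10.0 5.0
print(pct_improvement(mae(actual, baseline), mae(actual, candidate)))  # 50.0
```

A negative improvement means the candidate did worse than the baseline, and the baseline's improvement over itself is always 0.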

In [None]:
# Create comparison DataFrame
comparison = pd.DataFrame({
    'Mode': ['Rule-Based', 'LLM-Enhanced'],
    'Model': [
        workflow_rule.extract_spec_parsnip().model_type,
        workflow_llm.extract_spec_parsnip().model_type
    ],
    'RMSE': [rmse_rule, rmse_llm],
    'MAE': [mae_rule, mae_llm],
    'R²': [r2_rule, r2_llm],
    'Cost': [0.0, agent_llm.llm_client.total_cost if hasattr(agent_llm, 'llm_client') else 0.0],
    'Speed': ['<1s', '10-30s']
})

# Calculate improvement
comparison['RMSE_Improvement'] = ((rmse_rule - comparison['RMSE']) / rmse_rule * 100).round(2)
comparison['MAE_Improvement'] = ((mae_rule - comparison['MAE']) / mae_rule * 100).round(2)

print("\n" + "="*80)
print("PERFORMANCE COMPARISON")
print("="*80)
print(comparison.to_string(index=False))

# Visual comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Metrics comparison
metrics = ['RMSE', 'MAE']
rule_vals = [rmse_rule, mae_rule]
llm_vals = [rmse_llm, mae_llm]

x = np.arange(len(metrics))
width = 0.35

axes[0].bar(x - width/2, rule_vals, width, label='Rule-Based', alpha=0.8)
axes[0].bar(x + width/2, llm_vals, width, label='LLM-Enhanced', alpha=0.8)
axes[0].set_ylabel('Error')
axes[0].set_title('Performance Comparison (Lower is Better)', fontweight='bold')
axes[0].set_xticks(x)
axes[0].set_xticklabels(metrics)
axes[0].legend()
axes[0].grid(True, alpha=0.3, axis='y')

# Predictions comparison
test_outputs_rule = outputs_rule[outputs_rule['split'] == 'test']
test_outputs_llm = outputs_llm[outputs_llm['split'] == 'test']

axes[1].plot(test_data['date'].values, test_outputs_rule['actuals'].values, 
             label='Actual', linewidth=2, alpha=0.7)
axes[1].plot(test_data['date'].values, test_outputs_rule['fitted'].values,
             label='Rule-Based', linestyle='--', alpha=0.7)
axes[1].plot(test_data['date'].values, test_outputs_llm['fitted'].values,
             label='LLM-Enhanced', linestyle='--', alpha=0.7)
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Sales ($1000s)')
axes[1].set_title('Test Set Predictions', fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Example 3: Advanced Constraints with LLM

LLM mode excels at handling **complex natural language constraints** that are difficult to encode in rules.

Let's add domain-specific requirements.

In [None]:
# Create new agent with complex constraints
agent_constrained = ForecastAgent(
    verbose=True,
    use_llm=True,
    model="claude-sonnet-4.5"
)

print("\n" + "="*80)
print("LLM WITH COMPLEX CONSTRAINTS")
print("="*80)

# Generate workflow with specific constraints
workflow_constrained = agent_constrained.generate_workflow(
    data=train_data,
    request="Forecast e-commerce sales with seasonality and external factors",
    constraints={
        'interpretability': 'high',  # Need to explain to stakeholders
        'domain': 'retail',  # E-commerce domain
        'priority': 'accuracy',  # Prioritize accuracy over speed
        'special_requirements': [
            'Must handle multiple seasonality (weekly + yearly)',
            'Account for marketing campaign effects',
            'Consider competitor pricing dynamics',
            'Temperature interactions with sales'
        ]
    }
)

print(f"\n✓ Workflow generated with constraints")
print(f"  Model: {workflow_constrained.extract_spec_parsnip().model_type}")
print(f"  Cost: ${agent_constrained.llm_client.total_cost:.4f}")

In [None]:
# View reasoning for constrained workflow
if hasattr(agent_constrained, 'last_workflow_info'):
    info = agent_constrained.last_workflow_info
    
    print("\n" + "="*80)
    print("CONSTRAINT-AWARE REASONING")
    print("="*80)
    
    if 'model_selection_reasoning' in info:
        print("\nHow constraints influenced model selection:")
        print(info['model_selection_reasoning'])
    
    if 'constraint_satisfaction' in info:
        print("\nConstraint satisfaction analysis:")
        print(info['constraint_satisfaction'])
else:
    print("\n⚠️  LLM reasoning not available")

In [None]:
# Evaluate constrained workflow
fit_constrained = workflow_constrained.fit(train_data)
eval_constrained = fit_constrained.evaluate(test_data)

outputs_constrained, coeffs_constrained, stats_constrained = eval_constrained.extract_outputs()

test_stats_constrained = stats_constrained[stats_constrained['split'] == 'test']
rmse_constrained = test_stats_constrained['rmse'].iloc[0]
mae_constrained = test_stats_constrained['mae'].iloc[0]
r2_constrained = test_stats_constrained['r_squared'].iloc[0]

print(f"\nConstrained LLM Performance:")
print(f"  RMSE: {rmse_constrained:.2f}")
print(f"  MAE: {mae_constrained:.2f}")
print(f"  R²: {r2_constrained:.4f}")
print(f"  Cost: ${agent_constrained.llm_client.total_cost:.4f}")

## Example 4: Budget Management

LLM mode includes **budget controls** to prevent runaway API costs.
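
The general idea behind a daily budget guard can be sketched as a small tracker that refuses a call once the projected spend would exceed the limit. `DailyBudget` and `BudgetExceededError` are hypothetical names used for illustration only; how py_agent's `llm_client` enforces its limit internally is not shown in this notebook.

```python
from datetime import date


class BudgetExceededError(RuntimeError):
    """Raised when a charge would push spend past the daily limit."""


class DailyBudget:
    def __init__(self, budget_per_day, today=date.today):
        self.budget_per_day = budget_per_day
        self._today = today  # injectable clock, handy for testing
        self._day = today()
        self.total_cost = 0.0

    def _roll_over(self):
        # Reset the running total when the calendar day changes.
        current = self._today()
        if current != self._day:
            self._day = current
            self.total_cost = 0.0

    @property
    def remaining(self):
        self._roll_over()
        return self.budget_per_day - self.total_cost

    def charge(self, cost):
        """Record a cost, or raise without recording if it would overspend."""
        self._roll_over()
        if self.total_cost + cost > self.budget_per_day:
            raise BudgetExceededError(
                f"charge of ${cost:.2f} exceeds remaining ${self.remaining:.2f}"
            )
        self.total_cost += cost


budget = DailyBudget(budget_per_day=50.0)
budget.charge(8.0)
print(budget.remaining)  # 42.0
```

Checking the limit before recording the charge means a rejected call leaves the running total unchanged, which keeps the remaining-budget figure trustworthy after a failure.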

In [None]:
# Create agent with strict budget
agent_budget = ForecastAgent(
    verbose=True,
    use_llm=True,
    model="claude-sonnet-4.5",
    budget_per_day=50.0  # Lower daily budget
)

print(f"\nAgent initialized with daily budget: ${agent_budget.llm_client.budget_per_day:.2f}")
print(f"Current total cost: ${agent_budget.llm_client.total_cost:.4f}")
print(f"Remaining budget: ${agent_budget.llm_client.budget_per_day - agent_budget.llm_client.total_cost:.2f}")

# Try to generate workflow
try:
    workflow_budget = agent_budget.generate_workflow(
        data=train_data,
        request="Forecast sales"
    )
    print(f"\n✓ Workflow generated successfully")
    print(f"  Cost: ${agent_budget.llm_client.total_cost:.4f}")
    print(f"  Remaining budget: ${agent_budget.llm_client.budget_per_day - agent_budget.llm_client.total_cost:.2f}")
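except Exception as e:
    # Any failure lands here (missing API key, network error, budget limit), so report it as-is
    print(f"\n❌ Workflow generation failed (the daily budget may have been exceeded): {e}")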

## Example 5: Multi-Model Comparison with LLM Reasoning

Combine **Phase 2 (LLM)** with **Phase 3.3 (Multi-Model Comparison)** for intelligent model selection across multiple candidates.
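
The comparison relies on time-series cross-validation with an initial window, an assessment window, and a skip length. Assuming these follow the usual rolling-origin convention (train up to an origin, assess on the block that follows, then move the origin forward), the fold layout can be sketched in month units. The function below and its `cumulative` flag are illustrative assumptions; the exact semantics of `skip` in py_agent may differ.

```python
def rolling_origin_folds(n_periods, initial, assess, step=1, cumulative=True):
    """Return a list of (train_range, test_range) pairs over period indices.

    Each fold trains on periods before the origin and assesses on the
    `assess` periods that follow, so test data is always later than train.
    With cumulative=True the training window expands; otherwise it slides.
    """
    folds = []
    origin = initial
    while origin + assess <= n_periods:
        train_start = 0 if cumulative else origin - initial
        folds.append((range(train_start, origin), range(origin, origin + assess)))
        origin += step
    return folds


# 18 months of training data, 12-month initial window, 3-month assessment,
# origin advanced by 1 month per fold.
for train, test in rolling_origin_folds(18, initial=12, assess=3, step=1):
    print(f"train months {train.start}-{train.stop - 1}, "
          f"test months {test.start}-{test.stop - 1}")
```

Under these assumptions, an 18-month training set yields four folds, each scored on three months the model has not seen.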

In [None]:
# Create agent for multi-model comparison
agent_multi = ForecastAgent(
    verbose=True,
    use_llm=True,
    model="claude-sonnet-4.5"
)

print("\n" + "="*80)
print("MULTI-MODEL COMPARISON WITH LLM REASONING")
print("="*80)

# Compare 5 models with LLM reasoning for each
results_multi = agent_multi.compare_models(
    data=train_data,
    request="Forecast e-commerce sales with complex patterns",
    n_models=5,  # Compare top 5 LLM recommendations
    cv_strategy='time_series',
    date_column='date',
    initial='12 months',
    assess='3 months',
    skip='1 month',
    return_ensemble=True
)

print(f"\n✓ Compared {len(results_multi['model_ids'])} models with LLM reasoning")
print(f"  Total cost: ${agent_multi.llm_client.total_cost:.4f}")

In [None]:
# View rankings
print("\nModel Rankings (with LLM reasoning):")
print(results_multi['rankings'].head(10).to_string(index=False))

print(f"\nBest model: {results_multi['best_model_id']}")
print(f"Ensemble recommended: {results_multi['ensemble_recommended']}")

if results_multi['ensemble_recommended']:
    print(f"Ensemble models: {results_multi['ensemble_models']}")

## Key Takeaways

### When to Use LLM Mode (Phase 2)

**✅ Use LLM Mode When**:
1. **Complex patterns**: Multiple seasonality, non-linear relationships, external factors
2. **Need explainability**: Must explain model choices to stakeholders
3. **Domain-specific requirements**: Retail, finance, healthcare constraints
4. **Natural language constraints**: Complex requirements hard to encode
5. **Willing to pay**: $4-10 per workflow acceptable for better accuracy

**❌ Use Rule-Based Mode When**:
1. **Simple patterns**: Basic trend + seasonality
2. **Cost-sensitive**: Need $0 cost for batch processing
3. **Fast prototyping**: Immediate results (<1s)
4. **Standard forecasting**: No special requirements

### Performance vs Cost Trade-off

| Mode | Cost | Speed | Accuracy | Explainability |
|------|------|-------|----------|----------------|
| Rule-Based | $0 | <1s | 70-80% | Basic |
| LLM-Enhanced | $4-10 | 10-30s | 80-85% | High |

### Budget Management Best Practices

1. **Set daily budgets**: Use `budget_per_day` parameter
2. **Monitor costs**: Check `agent.llm_client.total_cost`
3. **Start small**: Test with rule-based, upgrade to LLM if needed
4. **Batch processing**: Use rule-based for many datasets, LLM for critical ones
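
Practices 3 and 4 amount to a simple routing rule: send only critical datasets to LLM mode while the projected spend fits the daily budget, and fall back to rule-based mode otherwise. The sketch below is hypothetical; `plan_modes` is not a py_agent function, and `est_cost_per_workflow` defaults to the upper end of the $4-10 range quoted in this tutorial as a conservative assumption.

```python
def plan_modes(datasets, budget_per_day, est_cost_per_workflow=10.0):
    """Assign 'llm' or 'rule' to each dataset name.

    `datasets` is a list of (name, is_critical) pairs. Critical datasets
    get LLM mode, in order, until the estimated spend would exceed the budget.
    """
    plan = {}
    projected = 0.0
    for name, is_critical in datasets:
        if is_critical and projected + est_cost_per_workflow <= budget_per_day:
            plan[name] = "llm"
            projected += est_cost_per_workflow
        else:
            plan[name] = "rule"
    return plan, projected


datasets = [("flagship_store", True), ("outlet_a", False),
            ("online", True), ("outlet_b", False), ("wholesale", True)]
plan, projected = plan_modes(datasets, budget_per_day=25.0)
print(plan)
print(projected)  # 20.0
```

The resulting mapping can then drive the `use_llm` flag when constructing a `ForecastAgent` for each dataset. Here the third critical dataset falls back to rule-based mode because a further $10 would exceed the $25 limit.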

### Next Steps

1. **Try on your data**: Apply to your own forecasting problems
2. **Experiment with constraints**: Test different domain/interpretability requirements
3. **Compare modes**: Evaluate rule-based vs LLM for your use case
4. **Combine phases**: Use LLM (Phase 2) + RAG (Phase 3.4) + Iteration (Phase 3.5)
5. **Production deployment**: Set up API key management and budget controls

See other tutorials for:
- **22_agent_complete_tutorial.ipynb**: All phases overview
- **24_agent_domain_specific_examples.ipynb**: Industry-specific examples
- **25_agent_advanced_features.ipynb**: Advanced capabilities