# py_agent: Domain-Specific Examples Tutorial

This notebook demonstrates how to apply **py_agent** to three industries and domains:

1. **Retail**: Daily store sales forecasting
2. **Finance**: Stock price prediction
3. **Energy**: Electricity demand forecasting

The same pattern extends to other domains, such as healthcare (hospital admission forecasting) and manufacturing (production demand planning), which are not worked through here.

Each domain has unique characteristics, and we'll show how py_agent adapts to them.

## What You'll Learn

- Domain-adapted preprocessing strategies
- Industry-specific model selection
- Constraint handling for different business requirements
- Using the RAG knowledge base to retrieve similar examples
- Best practices per industry

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Import py_agent
from py_agent import ForecastAgent

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Setup complete!")

## Domain 1: Retail Sales Forecasting

**Business Context**: Multi-location retail chain needs daily sales forecasts for inventory planning.

**Data Characteristics**:
- Strong weekly seasonality (weekends higher)
- Holiday effects (spikes on major holidays)
- Promotional impacts
- Weather effects on foot traffic

**Business Requirements**:
- High interpretability (explain to store managers)
- Handle multiple stores (grouped modeling)
- Inventory optimization (minimize stockouts + overstock)

In [None]:
# Generate retail sales data
np.random.seed(42)
n_days = 365 * 2  # 2 years
dates = pd.date_range('2022-01-01', periods=n_days, freq='D')

# Base sales with growth trend
trend = np.linspace(50, 80, n_days)

# Strong weekly seasonality (weekend boost)
day_of_week = dates.dayofweek
weekly_season = np.where(day_of_week >= 5, 25, 0)  # Weekend spike
weekly_season += np.where(day_of_week == 4, 10, 0)  # Friday boost

# Holiday effects
day_of_year = dates.dayofyear
is_holiday = np.isin(day_of_year, [1, 150, 185, 244, 316, 359])
holiday_boost = np.where(is_holiday, 40, 0)

# Promotions (random)
promotion = np.random.choice([0, 1], size=n_days, p=[0.85, 0.15])
promo_boost = promotion * np.random.uniform(15, 30, n_days)

# Weather (temperature affects foot traffic)
temperature = 60 + 20 * np.sin(2 * np.pi * (day_of_year - 80) / 365) + np.random.normal(0, 5, n_days)
temp_effect = 0.3 * (temperature - 60)

# Combine
sales = trend + weekly_season + holiday_boost + promo_boost + temp_effect + np.random.normal(0, 8, n_days)
sales = np.maximum(sales, 0)

retail_data = pd.DataFrame({
    'date': dates,
    'sales': sales,
    'promotion': promotion,
    'temperature': temperature
})

# Train/test split
train_retail = retail_data.iloc[:int(0.75 * len(retail_data))].copy()
test_retail = retail_data.iloc[int(0.75 * len(retail_data)):].copy()

print(f"\n📊 RETAIL SALES DATA")
print(f"  - {len(retail_data)} days of daily sales")
print(f"  - Train: {len(train_retail)} days")
print(f"  - Test: {len(test_retail)} days")
print(f"  - Mean sales: ${retail_data['sales'].mean():.2f}k")
print(f"  - Promotions: {retail_data['promotion'].sum()} days ({retail_data['promotion'].mean()*100:.1f}%)")

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

axes[0].plot(retail_data['date'], retail_data['sales'], alpha=0.7)
axes[0].scatter(retail_data[retail_data['promotion']==1]['date'],
                retail_data[retail_data['promotion']==1]['sales'],
                color='red', s=20, alpha=0.5, label='Promotions')
axes[0].axvline(train_retail['date'].iloc[-1], color='gray', linestyle='--', label='Train/Test Split')
axes[0].set_title('Retail Daily Sales', fontweight='bold', fontsize=12)
axes[0].set_ylabel('Sales ($1000s)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(retail_data['date'], retail_data['temperature'], color='orange', alpha=0.7)
axes[1].set_title('Temperature', fontweight='bold', fontsize=12)
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Temperature (°F)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Initialize agent for retail domain
agent_retail = ForecastAgent(
    verbose=True,
    use_rag=True  # Use RAG for retail examples
)

print("\n" + "="*80)
print("RETAIL SALES FORECASTING")
print("="*80)

# Generate workflow with retail-specific constraints
workflow_retail = agent_retail.generate_workflow(
    data=train_retail,
    request="Forecast daily retail sales with weekly seasonality, promotions, and weather effects",
    constraints={
        'domain': 'retail',
        'interpretability': 'high',  # Need to explain to store managers
        'priority': 'accuracy',  # Minimize inventory costs
        'special_requirements': [
            'Strong weekly patterns (weekend spikes)',
            'Holiday effects',
            'Promotional impacts'
        ]
    }
)

print(f"\n✓ Retail workflow generated")
print(f"  Model: {workflow_retail.extract_spec_parsnip().model_type}")

In [None]:
# Fit and evaluate
fit_retail = workflow_retail.fit(train_retail)
eval_retail = fit_retail.evaluate(test_retail)

outputs_retail, _, stats_retail = eval_retail.extract_outputs()

test_stats = stats_retail[stats_retail['split'] == 'test']
print(f"\nRetail Forecast Performance:")
print(f"  RMSE: ${test_stats['rmse'].iloc[0]:.2f}k")
print(f"  MAE: ${test_stats['mae'].iloc[0]:.2f}k")
print(f"  R²: {test_stats['r_squared'].iloc[0]:.4f}")

# Visualize predictions
test_outputs = outputs_retail[outputs_retail['split'] == 'test']

plt.figure(figsize=(14, 6))
plt.plot(test_retail['date'].values, test_outputs['actuals'].values, 
         label='Actual Sales', linewidth=2, alpha=0.7)
plt.plot(test_retail['date'].values, test_outputs['fitted'].values,
         label='Forecast', linestyle='--', linewidth=2, alpha=0.7)
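# Rough band: ±2*RMSE approximates a 95% interval only if test errors are
# roughly normal, unbiased, and of constant variance.
plt.fill_between(test_retail['date'].values,
                 test_outputs['fitted'].values - 2*test_stats['rmse'].iloc[0],
                 test_outputs['fitted'].values + 2*test_stats['rmse'].iloc[0],
                 alpha=0.2, label='Approx. 95% band (±2×RMSE)')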
plt.title('Retail Sales Forecast (Test Period)', fontweight='bold', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Sales ($1000s)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Domain 2: Financial Markets

**Business Context**: Quantitative trading firm needs short-term price predictions.

**Data Characteristics**:
- High volatility and noise
- Non-stationary (trends change)
- Autocorrelation (momentum effects)
- External market factors

**Business Requirements**:
- Daily forecast frequency
- Handle volatility clustering
- Risk management (prediction intervals)
- Fast execution (<5 min)
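
For the risk-management requirement, one simple way to attach an interval to a point forecast is to use the empirical quantiles of held-out forecast errors instead of assuming they are normal. The block below is a minimal sketch in plain Python; `quantile` and `empirical_interval` are hypothetical helpers (not part of py_agent), and the residuals are made up for illustration.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1 - frac) + sorted_vals[upper] * frac


def empirical_interval(point_forecast, residuals, coverage=0.95):
    """Interval around a point forecast from held-out residuals (actual - forecast).

    Hypothetical helper for illustration; not a py_agent function.
    """
    ordered = sorted(residuals)
    tail = (1 - coverage) / 2
    return (point_forecast + quantile(ordered, tail),
            point_forecast + quantile(ordered, 1 - tail))


# Made-up residuals: evenly spread from -50 to +50
residuals = [float(r) for r in range(-50, 51)]
low, high = empirical_interval(100.0, residuals)
print(f"95% interval: [{low:.1f}, {high:.1f}]")
```

Because the quantiles come straight from the observed errors, skewed or heavy-tailed errors (common for prices) produce an asymmetric interval, which a symmetric ±2×RMSE band cannot represent.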

In [None]:
# Generate financial data (stock prices)
np.random.seed(123)
n_days_fin = 500
dates_fin = pd.date_range('2022-01-01', periods=n_days_fin, freq='B')  # Business days approximate trading days

# Random walk with drift (typical stock behavior)
returns = np.random.normal(0.0005, 0.02, n_days_fin)  # Daily returns
price = 100 * np.exp(np.cumsum(returns))  # Compound to price

# Market index (returns correlated with the stock through a shared component)
market_returns = 0.0003 + 0.5 * (returns - 0.0005) + np.random.normal(0, 0.012, n_days_fin)
market_index = 1000 * np.exp(np.cumsum(market_returns))

# Volatility proxy (scaled absolute return plus noise; not a fitted GARCH model)
volatility = np.abs(returns) * 10 + np.random.uniform(0.01, 0.03, n_days_fin)

# Trading volume
volume = np.random.lognormal(15, 0.5, n_days_fin)

finance_data = pd.DataFrame({
    'date': dates_fin,
    'price': price,
    'market_index': market_index,
    'volatility': volatility,
    'volume': volume
})

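# Add technical indicators. These use same-day prices, so lag them
# (e.g. with .shift(1)) before using them to predict the same day's price.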
finance_data['returns'] = finance_data['price'].pct_change()
finance_data['ma_7'] = finance_data['price'].rolling(7).mean()
finance_data['ma_30'] = finance_data['price'].rolling(30).mean()
finance_data = finance_data.dropna()

# Train/test split
train_finance = finance_data.iloc[:int(0.8 * len(finance_data))].copy()
test_finance = finance_data.iloc[int(0.8 * len(finance_data)):].copy()

print(f"\n💹 FINANCIAL DATA (Stock Prices)")
print(f"  - {len(finance_data)} trading days")
print(f"  - Train: {len(train_finance)} days")
print(f"  - Test: {len(test_finance)} days")
print(f"  - Price range: ${finance_data['price'].min():.2f} - ${finance_data['price'].max():.2f}")
print(f"  - Mean daily return: {finance_data['returns'].mean()*100:.3f}%")
print(f"  - Volatility (std): {finance_data['returns'].std()*100:.2f}%")

# Visualize
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

axes[0].plot(finance_data['date'], finance_data['price'], alpha=0.7, label='Price')
axes[0].plot(finance_data['date'], finance_data['ma_7'], alpha=0.5, label='MA(7)', linestyle='--')
axes[0].plot(finance_data['date'], finance_data['ma_30'], alpha=0.5, label='MA(30)', linestyle='--')
axes[0].axvline(train_finance['date'].iloc[-1], color='gray', linestyle='--', label='Train/Test')
axes[0].set_title('Stock Price with Moving Averages', fontweight='bold', fontsize=12)
axes[0].set_ylabel('Price ($)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(finance_data['date'], finance_data['returns'], alpha=0.7)
axes[1].set_title('Daily Returns', fontweight='bold', fontsize=12)
axes[1].set_ylabel('Returns')
axes[1].axhline(0, color='black', linestyle='-', linewidth=0.5)
axes[1].grid(True, alpha=0.3)

axes[2].bar(finance_data['date'], finance_data['volume'], alpha=0.7, width=1)
axes[2].set_title('Trading Volume', fontweight='bold', fontsize=12)
axes[2].set_xlabel('Date')
axes[2].set_ylabel('Volume')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Initialize agent for finance domain
agent_finance = ForecastAgent(
    verbose=True,
    use_rag=True
)

print("\n" + "="*80)
print("FINANCIAL PRICE FORECASTING")
print("="*80)

# Generate workflow with finance-specific constraints
workflow_finance = agent_finance.generate_workflow(
    data=train_finance,
    request="Forecast stock price with momentum and market correlation",
    constraints={
        'domain': 'finance',
        'priority': 'accuracy',
        'speed': 'fast',  # Trading requires quick execution
        'special_requirements': [
            'Handle non-stationarity',
            'Account for autocorrelation',
            'Market index correlation',
            'Provide prediction intervals for risk management'
        ]
    }
)

print(f"\n✓ Finance workflow generated")
print(f"  Model: {workflow_finance.extract_spec_parsnip().model_type}")

In [None]:
# Fit and evaluate
fit_finance = workflow_finance.fit(train_finance)
eval_finance = fit_finance.evaluate(test_finance)

outputs_finance, _, stats_finance = eval_finance.extract_outputs()

test_stats_fin = stats_finance[stats_finance['split'] == 'test']
print(f"\nFinancial Forecast Performance:")
print(f"  RMSE: ${test_stats_fin['rmse'].iloc[0]:.2f}")
print(f"  MAE: ${test_stats_fin['mae'].iloc[0]:.2f}")
print(f"  MAPE: {test_stats_fin['mape'].iloc[0]:.2f}%")
print(f"  R²: {test_stats_fin['r_squared'].iloc[0]:.4f}")

# Visualize predictions
test_outputs_fin = outputs_finance[outputs_finance['split'] == 'test']

plt.figure(figsize=(14, 6))
plt.plot(test_finance['date'].values, test_outputs_fin['actuals'].values,
         label='Actual Price', linewidth=2, alpha=0.7)
plt.plot(test_finance['date'].values, test_outputs_fin['fitted'].values,
         label='Forecast', linestyle='--', linewidth=2, alpha=0.7)
plt.title('Stock Price Forecast (Test Period)', fontweight='bold', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Domain 3: Energy Demand Forecasting

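**Business Context**: Utility company needs electricity demand forecasts for grid optimization. Operational forecasts are often hourly; this example uses daily demand to keep the data small.

**Data Characteristics**:
- Multiple seasonality (weekly + yearly in this daily series; hourly data adds a daily cycle)
- Strong temperature dependence
- Time-of-day patterns (only visible at hourly resolution)
- Calendar effects (weekday vs weekend)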

**Business Requirements**:
- High accuracy (grid stability)
- Handle extreme weather events
- Peak demand prediction critical
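
Because peak demand matters most, an overall error metric can hide poor performance on the highest-demand days. The sketch below compares MAE over all days with MAE restricted to the top fraction of days by actual demand. `mae` and `peak_mae` are hypothetical helpers (not part of py_agent), and the numbers are made up for illustration.

```python
def mae(actuals, forecasts):
    """Mean absolute error over all observations."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def peak_mae(actuals, forecasts, top_fraction=0.1):
    """MAE restricted to the highest-demand days, ranked by actual value."""
    n_peak = max(1, int(len(actuals) * top_fraction))
    ranked = sorted(zip(actuals, forecasts), key=lambda pair: pair[0], reverse=True)
    return sum(abs(a - f) for a, f in ranked[:n_peak]) / n_peak


# Made-up daily demand (MW): two spike days that the forecast underestimates
actuals = [1000, 1050, 980, 1400, 1020, 990, 1010, 1500, 1030, 1005]
forecasts = [1010, 1040, 990, 1250, 1015, 1000, 1005, 1300, 1025, 1000]

print(f"Overall MAE: {mae(actuals, forecasts):.1f} MW")
print(f"Peak MAE (top 20%): {peak_mae(actuals, forecasts, top_fraction=0.2):.1f} MW")
```

A large gap between the two numbers suggests the model fits ordinary days well but misses spikes, which is worth checking before relying on it for peak planning.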

In [None]:
# Generate synthetic energy demand data at daily resolution
np.random.seed(456)
n_days_energy = 365 * 2
dates_energy = pd.date_range('2022-01-01', periods=n_days_energy, freq='D')

# Base load with slight growth
base_load = np.linspace(1000, 1100, n_days_energy)

# Yearly seasonality (summer cooling, winter heating)
day_of_year_e = dates_energy.dayofyear.to_numpy()
# Clipped cosines give exactly one bump per year for each effect
summer_cooling = 200 * np.maximum(0, np.cos(2 * np.pi * (day_of_year_e - 196) / 365))  # Peak mid-July
winter_heating = 150 * np.maximum(0, np.cos(2 * np.pi * (day_of_year_e - 15) / 365))  # Peak mid-January
yearly_season = summer_cooling + winter_heating

# Weekly seasonality (weekday vs weekend)
dow_energy = dates_energy.dayofweek
weekly_season = np.where(dow_energy < 5, 100, -80)  # Weekdays higher

# Temperature effect (cooling + heating)
temperature_energy = 60 + 25 * np.sin(2 * np.pi * (day_of_year_e - 80) / 365) + np.random.normal(0, 8, n_days_energy)
temp_effect_energy = np.where(temperature_energy > 75, 5 * (temperature_energy - 75),  # Cooling
                              np.where(temperature_energy < 40, 3 * (40 - temperature_energy), 0))  # Heating

# Extreme weather events (random)
extreme_events = np.random.choice([0, 1], size=n_days_energy, p=[0.97, 0.03])
extreme_boost = extreme_events * np.random.uniform(200, 400, n_days_energy)

# Combine
demand = base_load + yearly_season + weekly_season + temp_effect_energy + extreme_boost + np.random.normal(0, 30, n_days_energy)
demand = np.maximum(demand, 500)

energy_data = pd.DataFrame({
    'date': dates_energy,
    'demand_mw': demand,
    'temperature': temperature_energy,
    'is_extreme_weather': extreme_events
})

# Train/test split
train_energy = energy_data.iloc[:int(0.75 * len(energy_data))].copy()
test_energy = energy_data.iloc[int(0.75 * len(energy_data)):].copy()

print(f"\n⚡ ENERGY DEMAND DATA")
print(f"  - {len(energy_data)} days")
print(f"  - Train: {len(train_energy)} days")
print(f"  - Test: {len(test_energy)} days")
print(f"  - Mean demand: {energy_data['demand_mw'].mean():.0f} MW")
print(f"  - Peak demand: {energy_data['demand_mw'].max():.0f} MW")
print(f"  - Extreme weather events: {energy_data['is_extreme_weather'].sum()} days")

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

axes[0].plot(energy_data['date'], energy_data['demand_mw'], alpha=0.7)
axes[0].scatter(energy_data[energy_data['is_extreme_weather']==1]['date'],
                energy_data[energy_data['is_extreme_weather']==1]['demand_mw'],
                color='red', s=30, alpha=0.7, label='Extreme Weather', zorder=5)
axes[0].axvline(train_energy['date'].iloc[-1], color='gray', linestyle='--', label='Train/Test')
axes[0].set_title('Electricity Demand', fontweight='bold', fontsize=12)
axes[0].set_ylabel('Demand (MW)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(energy_data['date'], energy_data['temperature'], color='orange', alpha=0.7)
axes[1].axhline(75, color='red', linestyle='--', alpha=0.5, label='Cooling Threshold')
axes[1].axhline(40, color='blue', linestyle='--', alpha=0.5, label='Heating Threshold')
axes[1].set_title('Temperature', fontweight='bold', fontsize=12)
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Temperature (°F)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Initialize agent for energy domain
agent_energy = ForecastAgent(
    verbose=True,
    use_rag=True
)

print("\n" + "="*80)
print("ENERGY DEMAND FORECASTING")
print("="*80)

# Generate workflow with energy-specific constraints
workflow_energy = agent_energy.generate_workflow(
    data=train_energy,
    request="Forecast electricity demand with multiple seasonality and temperature effects",
    constraints={
        'domain': 'energy',
        'priority': 'accuracy',  # Grid stability critical
        'special_requirements': [
            'Multiple seasonality (weekly + yearly)',
            'Strong temperature dependence (cooling + heating)',
            'Handle extreme weather spikes',
            'Peak demand prediction critical'
        ]
    }
)

print(f"\n✓ Energy workflow generated")
print(f"  Model: {workflow_energy.extract_spec_parsnip().model_type}")

In [None]:
# Fit and evaluate
fit_energy = workflow_energy.fit(train_energy)
eval_energy = fit_energy.evaluate(test_energy)

outputs_energy, _, stats_energy = eval_energy.extract_outputs()

test_stats_energy = stats_energy[stats_energy['split'] == 'test']
print(f"\nEnergy Forecast Performance:")
print(f"  RMSE: {test_stats_energy['rmse'].iloc[0]:.0f} MW")
print(f"  MAE: {test_stats_energy['mae'].iloc[0]:.0f} MW")
print(f"  MAPE: {test_stats_energy['mape'].iloc[0]:.2f}%")
print(f"  R²: {test_stats_energy['r_squared'].iloc[0]:.4f}")

# Visualize predictions
test_outputs_energy = outputs_energy[outputs_energy['split'] == 'test']

plt.figure(figsize=(14, 6))
plt.plot(test_energy['date'].values, test_outputs_energy['actuals'].values,
         label='Actual Demand', linewidth=2, alpha=0.7)
plt.plot(test_energy['date'].values, test_outputs_energy['fitted'].values,
         label='Forecast', linestyle='--', linewidth=2, alpha=0.7)
plt.title('Electricity Demand Forecast (Test Period)', fontweight='bold', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Demand (MW)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Domain Comparison Summary

Let's compare how py_agent adapted to each domain.

In [None]:
# Create comparison table
comparison = pd.DataFrame({
    'Domain': ['Retail', 'Finance', 'Energy'],
    'Data_Size': [len(train_retail), len(train_finance), len(train_energy)],
    'Model_Selected': [
        workflow_retail.extract_spec_parsnip().model_type,
        workflow_finance.extract_spec_parsnip().model_type,
        workflow_energy.extract_spec_parsnip().model_type
    ],
    'Test_RMSE': [
        stats_retail[stats_retail['split']=='test']['rmse'].iloc[0],
        test_stats_fin['rmse'].iloc[0],
        test_stats_energy['rmse'].iloc[0]
    ],
    'Test_MAE': [
        stats_retail[stats_retail['split']=='test']['mae'].iloc[0],
        test_stats_fin['mae'].iloc[0],
        test_stats_energy['mae'].iloc[0]
    ],
    'Test_R2': [
        stats_retail[stats_retail['split']=='test']['r_squared'].iloc[0],
        test_stats_fin['r_squared'].iloc[0],
        test_stats_energy['r_squared'].iloc[0]
    ],
    'Key_Challenge': [
        'Weekly seasonality + promotions',
        'Non-stationarity + volatility',
        'Multiple seasonality + extreme events'
    ]
})

print("\n" + "="*80)
print("DOMAIN COMPARISON")
print("="*80)
print(comparison.to_string(index=False))

# Visual comparison of R² scores
fig, ax = plt.subplots(figsize=(10, 6))

domains = comparison['Domain'].values
r2_scores = comparison['Test_R2'].values

colors = ['#1f77b4', '#ff7f0e', '#2ca02c']
bars = ax.bar(domains, r2_scores, color=colors, alpha=0.7)

ax.set_ylabel('R² Score', fontsize=12)
ax.set_title('Model Performance Across Domains', fontweight='bold', fontsize=14)
ax.set_ylim(min(0.0, r2_scores.min() - 0.1), 1.1)  # R² can be negative on hard series such as prices
ax.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for bar, val in zip(bars, r2_scores):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height + 0.02,
            f'{val:.4f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

## Key Takeaways by Domain

### Retail Sales
**Characteristics**:
- Strong weekly patterns (weekend spikes)
- Promotional effects (binary events)
- Holiday spikes
- Weather impacts

**Best Practices**:
- Use models that handle seasonality well (Prophet, ARIMA, Seasonal Decomposition)
- Include promotional indicators
- Account for holiday calendars
- High interpretability for store managers
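
The retail practices above (weekly patterns, promotional indicators, holiday calendars) come down to giving the model explicit calendar columns. The sketch below shows what such columns look like in plain Python. `calendar_features` is a hypothetical helper and `HOLIDAYS` an illustrative set; py_agent may generate comparable features on its own, so treat this as an illustration of the idea and not as its implementation.

```python
from datetime import date, timedelta

# Illustrative holiday set (assumption): replace with your own holiday calendar.
HOLIDAYS = {date(2022, 1, 1), date(2022, 7, 4), date(2022, 12, 25)}


def calendar_features(day, holidays=HOLIDAYS):
    """Return explicit calendar indicators for one date (hypothetical helper)."""
    dow = day.weekday()  # Monday=0 ... Sunday=6, same convention as pandas dayofweek
    return {
        "date": day,
        "day_of_week": dow,
        "is_weekend": int(dow >= 5),
        "is_friday": int(dow == 4),
        "is_holiday": int(day in holidays),
    }


start = date(2022, 1, 1)
rows = [calendar_features(start + timedelta(days=i)) for i in range(7)]
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` and merged onto the sales data by `date`. Binary indicators like these also keep the model easy to explain to store managers, since each one has a direct business meaning.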

### Financial Markets
**Characteristics**:
- High volatility and noise
- Non-stationary (regime changes)
- Autocorrelation (momentum)
- Market correlations

**Best Practices**:
- Use models robust to non-stationarity (ARIMA with differencing)
- Include lagged features (AR terms)
- Market indicators as exogenous variables
- Prediction intervals for risk management
- Fast execution critical for trading
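
Lagged features only help if every predictor for day t uses information available before day t. A moving average that includes the same day's price, for example, leaks the target into the features. The sketch below builds leak-free rows from a plain list of prices; `make_lagged_rows` is a hypothetical helper (not part of py_agent) and the prices are made up.

```python
import math


def make_lagged_rows(prices, ma_window=3):
    """Build feature rows for day t using only prices observed before t."""
    rows = []
    start = max(ma_window, 2)  # need enough history for the average and one return
    for t in range(start, len(prices)):
        past = prices[:t]  # strictly before day t
        rows.append({
            "t": t,
            "lag_1_price": past[-1],
            "lag_1_log_return": math.log(past[-1] / past[-2]),  # yesterday's log return
            "ma_prev": sum(past[-ma_window:]) / ma_window,      # average ending yesterday
            "target_price": prices[t],
        })
    return rows


prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # made-up prices
rows = make_lagged_rows(prices, ma_window=3)
for row in rows:
    print(row)
```

In pandas the same effect comes from applying `.shift(1)` after `.rolling(...).mean()` or `.pct_change()`. Working with log returns instead of raw prices is also one common way to address the non-stationarity noted above.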

### Energy Demand
**Characteristics**:
- Multiple nested seasonality (daily + weekly + yearly)
- Strong temperature dependence (non-linear: cooling + heating)
- Extreme weather spikes
- Calendar effects (weekday/weekend)

**Best Practices**:
- Use models with multiple seasonality (Prophet, seasonal regression)
- Non-linear temperature effects (polynomial or splines)
- Handle outliers (extreme weather)
- Peak demand prediction critical
- High accuracy for grid stability
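
As a simpler alternative to polynomials or splines, the non-linear temperature response can be captured with two hinge (degree-day style) features: one that is active only above a cooling threshold and one only below a heating threshold. The sketch below uses the 75°F and 40°F thresholds from the synthetic data in this notebook; `degree_features` is a hypothetical helper, not part of py_agent.

```python
def degree_features(temp_f, cool_base=75.0, heat_base=40.0):
    """Split a temperature into cooling and heating hinge terms."""
    return {
        "cooling_degrees": max(temp_f - cool_base, 0.0),
        "heating_degrees": max(heat_base - temp_f, 0.0),
    }


for temp in (30.0, 60.0, 85.0):
    feats = degree_features(temp)
    # Slopes of 5 and 3 MW per degree mirror the synthetic generator in this notebook
    effect = 5 * feats["cooling_degrees"] + 3 * feats["heating_degrees"]
    print(temp, feats, f"demand effect: {effect:.0f} MW")
```

With these two columns, a plain linear model can represent a V-shaped response with a flat middle, and each coefficient reads directly as extra MW per degree beyond the threshold.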

## Universal Best Practices

1. **Use RAG Knowledge Base**: `use_rag=True` leverages similar domain examples
2. **Set Domain Constraints**: Set the `domain` key in `constraints` for domain-adapted preprocessing
3. **Specify Business Requirements**: Use `constraints` dict for interpretability, speed, accuracy priorities
4. **Iterate if Needed**: Use Phase 3.5 autonomous iteration for complex cases
5. **Multi-Model Comparison**: Compare 5+ models for critical forecasts (Phase 3.3)
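
When comparing models, it helps to include a trivial baseline so that the reported errors have a reference point. The sketch below implements a seasonal-naive forecast (repeat the last observed week) and its RMSE in plain Python. The helper names are hypothetical and the data is a toy example; nothing here is a py_agent function.

```python
import math


def seasonal_naive_forecast(history, horizon, season_length=7):
    """Repeat the last observed season forward for `horizon` steps."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def rmse(actuals, forecasts):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals))


# Toy data: four identical weeks, then a test week shifted up by 1
train = [50, 52, 51, 53, 60, 75, 74] * 4
test = [51, 53, 52, 54, 61, 76, 75]

baseline = seasonal_naive_forecast(train, horizon=len(test))
print(f"Seasonal-naive RMSE: {rmse(test, baseline):.2f}")
```

If a selected model's test RMSE is not clearly below this baseline on the same split, the extra complexity is probably not paying for itself.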

## Next Steps

1. **Try your domain**: Apply to manufacturing, healthcare, transportation, etc.
2. **Combine phases**: Use LLM (Phase 2) + RAG (Phase 3.4) + Iteration (Phase 3.5)
3. **Grouped modeling**: Forecast multiple stores/products/regions with `fit_nested()`
4. **Custom constraints**: Add domain-specific requirements

See other tutorials:
- **22_agent_complete_tutorial.ipynb**: All phases overview
- **23_agent_llm_mode_tutorial.ipynb**: LLM-enhanced forecasting
- **25_agent_advanced_features.ipynb**: Advanced capabilities