# TSForge Recipes & Workflows

This notebook provides an in-depth exploration of TSForge's **Recipes** and **Workflows** using the M5 competition dataset.

### What You'll Learn:

1. **Recipe Fundamentals** - Building blocks of feature engineering
2. **Simple to Complex Recipes** - Progressive examples
3. **Workflow Types** - MLForecast, StatsForecast, Ensemble
4. **Combining Recipes & Workflows** - Real-world patterns
5. **Advanced Techniques** - Custom steps, dynamic recipes
6. **Comparison Strategies** - Testing multiple approaches
7. **Best Practices** - Production-ready patterns

---

## Setup

In [1]:
import pandas as pd
import numpy as np
import os
from functools import partial

# TSForge
from tsforge import Workflow, WorkflowManager, Recipe
from tsforge.evaluation import accuracy_table
from tsforge.logging.logger import WorkflowLogger, get_recipe_callbacks, get_manager_callbacks

# Feature engineering
from tsforge.feature_engineering import (
    as_category,
    one_hot,
    TargetMeanEncoder,
    drop_features
)

# Lag transforms
from mlforecast.lag_transforms import (
    RollingMean, RollingStd, RollingMin, RollingMax,
    ExpandingMean, ExpandingStd, ExponentiallyWeightedMean
)

# ML models
import lightgbm as lgb
from sklearn.ensemble import RandomForestRegressor

os.environ["NIXTLA_ID_AS_COL"] = "1"

from tsforge.display import enable_notebook_style
enable_notebook_style()

print("✅ Setup complete")

  __import__("pkg_resources").declare_namespace(__name__)  # type: ignore
  from tqdm.autonotebook import tqdm


✅ Setup complete


---

## 1. Understanding TSForge's Framework
### Recipe ▸ Workflow ▸ Manager
TSForge is built on **three core architectural layers** that work together to provide a flexible and powerful forecasting framework.

- **Recipe = one‑time, static transformations** (cast types, encode IDs, add immutable columns).  
- **Workflow = dynamic modeling config** (lags, rolling windows, date features, models).  
- **Manager = orchestration** (cross‑validation, comparisons, final forecasts).


```
Data → Recipe → Workflow → Manager → Results
         ↓         ↓           ↓         ↓
    Transform   Configure   Execute   Analyze
```

Data → Recipe → Workflow → Manager → Results
         ↓         ↓           ↓         ↓
    Transform   Configure   Execute   Analyze

## 1.1 Three-Layer Architecture

```

┌───────────────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE (APIs)                              │
│  Recipe API     • build static feature pipeline                            │
│  Workflow API   • bind engine + dynamic TS features + models               │
│  Manager API    • run CV/forecast, compare, log, select                    │
└───────────────────────────────────────────────────────────────────────────┘
                                    │ delegates
                                    ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                         PROCESSING / ADAPTER LAYER                         │
│  Feature Pipeline     • execute Recipe (pure transforms, encoders)         │
│  Engine Adapter       • standardize in/out to a unified schema             │
│  Ensemble Combiner    • merge member predictions & intervals               │
│  Cache & Fingerprint  • reuse results across workflows/ensembles           │
└───────────────────────────────────────────────────────────────────────────┘
                                    │ wraps
                                    ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                               ENGINE LAYER                                 │
│  MLForecast (Nixtla)   • tabular ML with lags/rolling & date features      │
│  StatsForecast (Nixtla)• ARIMA/ETS/Naive baselines, fast & interpretable   │
│  NeuralForecast (Nixtla)• deep TS models (N-Beats, TFT, etc.)              │
└───────────────────────────────────────────────────────────────────────────┘

```

| Layer | You define… | It contains | Examples | Why here |
|---|---|---|---|---|
| **Recipe** | **one‑time, static** transforms | dtype casts, OOF encoders, immutable flags | `as_category`, `TargetMeanEncoder`, renames/drops | Doesn’t depend on cutoff or horizon; fit once and reuse. |
| **Workflow** | **dynamic, time‑dependent** features + **models** | lags, rolling, date parts, engine + hyperparams | `.with_lags([1,7,14])`, `.with_lag_transforms(...)`, `.add_model("lgbm")` | Depends on history/cutoff/horizon; recomputed each window. |
| **Manager** | **orchestration** | CV windows, parallel runs, standardization, ensembles, metrics | `.cross_validation(...)`, `.forecast(...)` | Ensures fair comparisons + unified outputs. |
| **Nixtla engines** | **training/prediction** | ML/Stats/Neural implementations | MLForecast, StatsForecast, NeuralForecast | Best‑in‑class engines; TSForge adapts/standardizes around them. |

### Key Design Principles

1. **Separation of Concerns** - Each component has a single, well-defined responsibility
2. **Builder Pattern** - Fluent API for intuitive configuration
3. **Strategy Pattern** - Pluggable engines and combiners
4. **Observer Pattern** - Callbacks for logging and monitoring
5. **Pipeline Pattern** - Sequential, composable transformation steps

## Load Data

In [3]:
# Load M5 data
train_df = pd.read_parquet('tsforge/data/input/processed/01_train_df.parquet')
train_df['sales'] = train_df['sales'].fillna(0)

print(f"Dataset shape: {train_df.shape}")
print(f"Date range: {train_df['date'].min()} to {train_df['date'].max()}")
print(f"Unique series: {train_df['unique_id'].nunique():,}")
print(f"\nColumns: {list(train_df.columns)}")
train_df.head(3)

Dataset shape: (6141593, 8)
Date range: 2012-06-19 00:00:00 to 2016-05-22 00:00:00
Unique series: 4,836

Columns: ['unique_id', 'date', 'sales', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id']


Unnamed: 0,unique_id,date,sales,item_id,dept_id,cat_id,store_id,state_id
0,HOBBIES_1_001_CA_1,2013-07-18,1.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
1,HOBBIES_1_001_CA_1,2013-07-19,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
2,HOBBIES_1_001_CA_1,2013-07-20,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA


---

# Part 1: Understanding Recipes

## What is a Recipe?

A **Recipe** is a pipeline of data transformation steps that:
- Ensures reproducibility
- Makes feature engineering modular
- Can be shared across workflows
- Tracks transformations applied

### Recipe Structure:
```python
Recipe(name="My Recipe")
    .add_step(function1, arg1=value1)
    .add_step(function2, arg2=value2)
    .add_step(function3)
```

## 1.1 Example 1: Basic Recipe

A simple recipe - just convert columns to categorical.

In [11]:
# Create the simpe recipe
simple_recipe = (
    Recipe(name="set categorical variables")
    .add_step(as_category, cols=['unique_id', 'item_id'])
)

print("\n Recipe Structure:")
display(simple_recipe.summary())
df_baked = simple_recipe.bake(train_df)
df_baked.head()


 Recipe Structure:


Unnamed: 0,Step,Function
0,1,"as_category(cols=['unique_id', 'item_id'])"


Unnamed: 0,unique_id,date,sales,item_id,dept_id,cat_id,store_id,state_id
0,HOBBIES_1_001_CA_1,2013-07-18,1.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
1,HOBBIES_1_001_CA_1,2013-07-19,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
2,HOBBIES_1_001_CA_1,2013-07-20,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
3,HOBBIES_1_001_CA_1,2013-07-21,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
4,HOBBIES_1_001_CA_1,2013-07-22,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA


## 1.2 Example 2: Basic Recipe with Multiple Steps

Add categorical conversion + one-hot encoding.

In [12]:
one_hot_recipe = (
    Recipe(name="Basic - Categories + One-Hot")
    # Step 1: Convert to categorical
    .add_step(
        as_category,
        cols=['unique_id', 'item_id', 'dept_id', 'store_id']
    )
    # Step 2: One-hot encode state
    .add_step(
        one_hot,
        cols=['state_id'],
    )
)

print("\n Recipe Structure:")
display(one_hot_recipe.summary())
df_baked = one_hot_recipe.bake(train_df)
df_baked.head()


 Recipe Structure:


Unnamed: 0,Step,Function
0,1,"as_category(cols=['unique_id', 'item_id', 'dep..."
1,2,one_hot(cols=['state_id'])


Unnamed: 0,unique_id,date,sales,item_id,dept_id,cat_id,store_id,state_id_CA,state_id_TX
0,HOBBIES_1_001_CA_1,2013-07-18,1.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,1,0
1,HOBBIES_1_001_CA_1,2013-07-19,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,1,0
2,HOBBIES_1_001_CA_1,2013-07-20,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,1,0
3,HOBBIES_1_001_CA_1,2013-07-21,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,1,0
4,HOBBIES_1_001_CA_1,2013-07-22,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,1,0


## 1.3 Example 3: Intermediate Recipe with Target Encoding

Add target mean encoding to capture category-level patterns.

In [14]:
logger = WorkflowLogger(verbose=1)

target_enc_recipe = (
    Recipe(
        name="Target Encoding",
        callbacks=get_recipe_callbacks(logger)
    )
    # Step 1: Convert to categorical
    .add_step(as_category,
              cols=['unique_id', 'item_id', 'dept_id', 'cat_id', 
                    'store_id', 'state_id']) 
           
    # Step 2: Target mean encoding for department
    .add_step(TargetMeanEncoder,
              cols=['category_id'],
              target_col='sales',
              id_cols=['unique_id'],
              oof=True)
)

print(f"Number of steps: {len(target_enc_recipe.steps)}")
print("\n Recipe Structure:")
display(target_enc_recipe.summary())
df_baked = target_enc_recipe.bake(train_df)
df_baked.head()

Number of steps: 2

 Recipe Structure:


Unnamed: 0,Step,Function
0,1,"as_category(cols=['unique_id', 'item_id', 'dep..."
1,2,"TargetMeanEncoder(cols=['category_id'], target..."



Recipe: Target Encoding
Steps: 2
Input shape: (6141593, 8)
Final shape: (6141593, 8)
✓ Recipe completed in 1.76s



Unnamed: 0,unique_id,date,sales,item_id,dept_id,cat_id,store_id,state_id
0,HOBBIES_1_001_CA_1,2013-07-18,1.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
1,HOBBIES_1_001_CA_1,2013-07-19,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
2,HOBBIES_1_001_CA_1,2013-07-20,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
3,HOBBIES_1_001_CA_1,2013-07-21,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA
4,HOBBIES_1_001_CA_1,2013-07-22,0.0,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA


---

# Part 2: Understanding Workflows 

## What is a Workflow?

A **Workflow** combines:
- A Recipe (optional)
- One or more models
- Model-specific configurations (lags, features)

### Three Workflow Types:
1. **MLForecast** - Machine learning models (LightGBM, XGBoost, etc.)
2. **StatsForecast** - Statistical models (ARIMA, ETS, etc.)
3. **Ensemble** - Combines other workflows

## 2.1 Example 1: Simple MLForecast Workflow

LightGBM with minimal configuration.

In [None]:
wf_simple = (
    Workflow.mlforecast("Simple LGBM")
    .use_recipe(simple_recipe)  # Use our minimal recipe
    .add_model(
        "lgbm",
        n_estimators=50,
        random_state=42,
        enable_categorical=True,
    )
    .with_lags([7])  # Just 1-week lag
    .build()
)

## 2.2 Example 2: MLForecast with Dynamic Features

Add more lags and lag transformations.

In [None]:
wf_transformed = (
    Workflow.mlforecast("LGBM")
    .use_recipe(target_enc_recipe)  # Use recipe with target encoding
    .add_model(
        "lgbm",
        n_estimators=100,
        enable_categorical=True,
        random_state=42
    )
    # Multiple lags
    .with_lags([1, 7, 14, 21, 28])
    # Lag transformations
    .with_lag_transforms({
        1: [ExpandingMean()],
        7: [
            RollingMean(window_size=7),
            RollingStd(window_size=7)
        ],
        28: [
            RollingMean(window_size=28),
            RollingStd(window_size=28)
        ]
    })
    # Date features
    .with_date_features(["dayofweek", "month", "year"])
    .build()
)

print("Transformed workflow created")

Transformed workflow created


## 2.4 Example 4: StatsForecast Workflow

Statistical baseline models.

In [18]:
wf_statistical = (
    Workflow.statsforecast("Statistical Baselines")
    .add_model("naive")
    .add_model("seasonal_naive", season_length=7)
    .build()
)

print("Statistical workflow created")

Statistical workflow created


## 2.5 Example 5: Ensemble Workflow

Combine ML and statistical models.

In [31]:
wf_ensemble = (
    Workflow.ensemble("Ensemble - Mean")
    .add_member(wf_transformed)
    .add_member(wf_statistical)
    .combine_using("mean")  # Average predictions
    .build()
)

print("Ensemble workflow created")

Ensemble workflow created


## 2.6 Example 6: Weighted Ensemble

Give more weight to the better models.

In [20]:
wf_weighted_ensemble = (
    Workflow("Ensemble - Weighted")
    .add_member(wf_transformed)      # First member
    .add_member(wf_statistical)        # Second member
    .combine_using("weighted", weights=[0.7, 0.3])  # Weights in order
    .build()
)


---

# Part 3: Understanding WorkflowManager 

## What is a WorkflowManager?

The **WorkflowManager** is the orchestration layer that:
- **Coordinates** multiple workflows for fair comparison
- **Standardizes** inputs/outputs across different engines
- **Manages** cross-validation and forecasting
- **Tracks** execution times and performance metrics
- **Enables** parallel execution (when configured)

### Key Responsibilities:

```

─────────────────────────────────────────────────────┐
│                    WorkflowManager                       │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  1. Workflow Orchestration                              │
│     • Run multiple workflows on same data               │
│     • Ensure fair comparison (same cutoffs, horizons)   │
│     • Manage workflow lifecycle                         │
│                                                          │
│  2. Cross-Validation                                     │
│     • Time series specific CV windows                   │
│     • Expanding/sliding window strategies               │
│     • Multi-window backtesting                          │
│                                                          │
│  3. Result Standardization                               │
│     • Unified output format across engines              │
│     • Consistent column naming                          │
│     • Prediction intervals (when available)             │
│                                                          │
│  4. Performance Tracking                                 │
│     • Execution time per workflow                       │
│     • Memory usage (optional)                           │
│     • Model-level metrics                               │
│                                                          │
│  5. Callbacks & Logging                                  │
│     • Progress tracking                                 │
│     • Error handling                                    │
│     • Custom monitoring                                 │
│                                                          │
└─────────────────────────────────────────────────────────┘
```

## 3.1 Basic Manager Setup

### Creating a Manager

In [None]:
manager = WorkflowManager([wf_statistical])

# Method 2: With logging/callbacks
logger = WorkflowLogger(verbose=1)  # verbose: 0=silent, 1=progress, 2=detailed
manager = WorkflowManager(
    workflows=[wf_statistical],
    callbacks=get_manager_callbacks(logger)
)



🚀 WorkflowManager initialized with 1 workflow(s)
   [1] Statistical Baselines | Engine: statsforecast | Models:  2 | Recipe: ✗



## 3.2 Example 2: Multi-Model Workflow


In [32]:
multi_manager = WorkflowManager([wf_transformed,
                            wf_statistical,
                            wf_ensemble],
                        callbacks=get_manager_callbacks(logger)
                    )



🚀 WorkflowManager initialized with 3 workflow(s)
   [1] LGBM - Rich Features | Engine: mlforecast   | Models:  1 | Recipe: ✓
   [2] Statistical Baselines | Engine: statsforecast | Models:  2 | Recipe: ✗
   [3] Ensemble - Mean      | Engine: ensemble     | Models:  0 | Recipe: ✗



## 3.2 Cross-Validation 

### Understanding Time Series Cross-Validation

Time series CV is different from standard ML cross-validation:
- **No random shuffling** - maintains temporal order
- **Expanding window** - training set grows over time
- **Multiple cutoffs** - test on different future periods

**Parameter Details:**
- **h (horizon)**: How many steps ahead to forecast
  - `h=7` → forecast next 7 days
  - `h=28` → forecast next 28 days
  
- **n_windows**: Number of backtesting windows
  - `n_windows=1` → single test period (fast, less robust)
  - `n_windows=3` → three test periods (balanced)
  - `n_windows=5+` → many test periods (slower, more robust)

- **step_size**: Distance between window cutoffs
  - `None` or `h` → non-overlapping windows
  - `h//2` → 50% overlap between windows
  - `1` → maximum overlap (sliding window)

- **level**: Prediction interval confidence levels
  - `[80]` → 80% prediction intervals
  - `[80, 95]` → both 80% and 95% intervals
  - `None` → no intervals

In [27]:
cv_results = manager.cross_validation(
    train_df,
    h=14,
    n_windows=1
)
cv_results.head()


🔬 STARTING CROSS-VALIDATION
   Horizon: 14 steps
   Windows: 1
   Step size: 14
   Workflows: 1

✓ CROSS-VALIDATION COMPLETE
   Predictions: 135,408
   Models: 2
   Workflows: 1
   Time: 1.61s



Unnamed: 0,unique_id,date,cutoff,sales,yhat,model,workflow,engine
0,HOBBIES_1_001_CA_1,2016-05-09,2016-05-08,2.0,1.0,Naive,Statistical Baselines,statsforecast
1,HOBBIES_1_001_CA_1,2016-05-10,2016-05-08,2.0,1.0,Naive,Statistical Baselines,statsforecast
2,HOBBIES_1_001_CA_1,2016-05-11,2016-05-08,1.0,1.0,Naive,Statistical Baselines,statsforecast
3,HOBBIES_1_001_CA_1,2016-05-12,2016-05-08,0.0,1.0,Naive,Statistical Baselines,statsforecast
4,HOBBIES_1_001_CA_1,2016-05-13,2016-05-08,2.0,1.0,Naive,Statistical Baselines,statsforecast


In [33]:
cv_results_multi_model = multi_manager.cross_validation(
    train_df,
    h=14,
    n_windows=1
)
cv_results_multi_model.head()


🔬 STARTING CROSS-VALIDATION
   Horizon: 14 steps
   Windows: 1
   Step size: 14
   Workflows: 3

Recipe: Target Encoding
Steps: 2
Input shape: (6141593, 8)

Recipe: Target Encoding
Steps: 2
Input shape: (6141593, 8)
Final shape: (6141593, 8)
✓ Recipe completed in 1.56s

Final shape: (6141593, 8)
✓ Recipe completed in 2.50s


✓ CROSS-VALIDATION COMPLETE
   Predictions: 338,520
   Models: 4
   Workflows: 3
   Time: 18.64s



Unnamed: 0,unique_id,date,cutoff,sales,yhat,model,workflow,engine
0,HOBBIES_1_001_CA_1,2016-05-09,2016-05-08,2.0,0.379742,LGBMRegressor,LGBM - Rich Features,mlforecast
1,HOBBIES_1_001_CA_1,2016-05-10,2016-05-08,2.0,0.379742,LGBMRegressor,LGBM - Rich Features,mlforecast
2,HOBBIES_1_001_CA_1,2016-05-11,2016-05-08,1.0,0.482527,LGBMRegressor,LGBM - Rich Features,mlforecast
3,HOBBIES_1_001_CA_1,2016-05-12,2016-05-08,0.0,0.967906,LGBMRegressor,LGBM - Rich Features,mlforecast
4,HOBBIES_1_001_CA_1,2016-05-13,2016-05-08,2.0,0.447598,LGBMRegressor,LGBM - Rich Features,mlforecast


In [34]:
from tsforge.evaluation import accuracy_table

# Aggregate metrics across all series and windows
aggregate_scores = accuracy_table(
    cv_results_multi_model, 
    train_df, 
    mode='aggregate'
)

print("=" * 80)
print("ACCURACY METRICS (AGGREGATE)")
print("=" * 80)
print(aggregate_scores[['workflow', 'model', 'mae', 'rmse', 'mape']])


ACCURACY METRICS (AGGREGATE)
                workflow          model       mae      rmse       mape
0        Ensemble - Mean  Ensemble-mean       NaN       NaN        NaN
1   LGBM - Rich Features  LGBMRegressor  0.872007  1.532689  53.054906
2  Statistical Baselines          Naive  1.006735  2.049776  84.035706
3  Statistical Baselines  SeasonalNaive  0.959559  2.021647  82.648522


---

# Part 4: Running Experiments 

## 4.1 Experiment 1: Recipe Impact Analysis

Compare how different recipes affect model performance.

In [None]:
# Run recipe comparison experiment
print("🔬 Running Recipe Comparison Experiment...")
print("This may take a few minutes...\n")

logger_exp1 = WorkflowLogger(verbose=1)
manager_exp1 = WorkflowManager(
    workflows_recipe_comparison,
    callbacks=get_manager_callbacks(logger_exp1)
)

cv_recipe_comparison = manager_exp1.cross_validation(
    train_df,
    h=14,  # Shorter horizon for faster testing
    n_windows=1  # Single window
)

print("\n✅ Recipe comparison completed")

In [None]:
# Evaluate recipe comparison
scores_recipe = accuracy_table(cv_recipe_comparison, train_df, mode='aggregate')
exec_times_recipe = manager_exp1.execution_summary()

results_recipe = scores_recipe.merge(
    exec_times_recipe[['workflow', 'execution_time_s']],
    on='workflow'
).sort_values('mae')

print("=" * 80)
print("RECIPE COMPARISON RESULTS")
print("=" * 80)
print(results_recipe[['workflow', 'mae', 'rmse', 'mape', 'execution_time_s']])

## 4.2 Experiment 2: Lag Configuration Impact

In [None]:
# Run lag comparison experiment
print("🔬 Running Lag Configuration Experiment...\n")

logger_exp2 = WorkflowLogger(verbose=1)
manager_exp2 = WorkflowManager(
    workflows_lag_comparison,
    callbacks=get_manager_callbacks(logger_exp2)
)

cv_lag_comparison = manager_exp2.cross_validation(
    train_df,
    h=14,
    n_windows=1
)

print("\n✅ Lag comparison completed")

## 4.3 Experiment 3: Transform Strategy Impact

---

# Part 5: Advanced Patterns 

## 5.1 Dynamic Recipe Creation

Create recipes programmatically based on data characteristics.

## 5.2 Recipe Factory Pattern

Create a factory to generate recipes for different use cases.

In [None]:
class RecipeFactory:
    """Factory for creating common recipe patterns."""
    
    @staticmethod
    def minimal():
        """Minimal recipe for fast prototyping."""
        return Recipe("Minimal").add_step(
            as_category,
            cols=['unique_id', 'item_id']
        )
    
    @staticmethod
    def standard():
        """Standard recipe for most use cases."""
        return (
            Recipe("Standard")
            .add_step(as_category, cols=['unique_id', 'item_id', 'dept_id', 'store_id'])
            .add_step(one_hot, cols=['state_id'], keep_original=False)
        )
    
    @staticmethod
    def production():
        """Production-ready recipe with all bells and whistles."""
        return (
            Recipe("Production")
            .add_step(as_category, cols=['unique_id', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id'])
            .add_step(TargetMeanEncoder, cols=['dept_id', 'cat_id'], target_col='sales', id_cols=['unique_id'], oof=True)
            .add_step(one_hot, cols=['state_id'], keep_original=True)
            .add_step(drop_features, cols=['event_name_1', 'event_type_1'])
        )

# Use factory
recipe_from_factory = RecipeFactory.production()
print("✅ Recipe created from factory")
print(f"Steps: {len(recipe_from_factory.steps)}")

## 5.3 Workflow Presets

Create reusable workflow configurations.

In [None]:
class WorkflowPresets:
    """Presets for common workflow configurations."""
    
    @staticmethod
    def fast_prototype(recipe):
        """Quick workflow for rapid prototyping."""
        return (
            Workflow.mlforecast("Fast Prototype")
            .use_recipe(recipe)
            .add_model("lgbm", n_estimators=50, random_state=42)
            .with_lags([7])
            .build()
        )
    
    @staticmethod
    def balanced(recipe):
        """Balanced workflow - good accuracy and speed."""
        return (
            Workflow.mlforecast("Balanced")
            .use_recipe(recipe)
            .add_model("lgbm", n_estimators=100, learning_rate=0.05, random_state=42)
            .with_lags([7, 14])
            .with_lag_transforms({7: [RollingMean(window_size=7)]})
            .with_date_features(["dayofweek"])
            .build()
        )
    
    @staticmethod
    def accurate(recipe):
        """High accuracy workflow - slower but better."""
        return (
            Workflow.mlforecast("Accurate")
            .use_recipe(recipe)
            .add_model("lgbm", n_estimators=200, learning_rate=0.01, random_state=42)
            .with_lags([1, 7, 14, 21, 28])
            .with_lag_transforms({
                1: [ExpandingMean()],
                7: [RollingMean(window_size=7), RollingStd(window_size=7)],
                28: [RollingMean(window_size=28)]
            })
            .with_date_features(["dayofweek", "month", "year"])
            .build()
        )

# Create workflows from presets
wf_fast = WorkflowPresets.fast_prototype(recipe_basic)
wf_balanced = WorkflowPresets.balanced(recipe_intermediate)
wf_accurate = WorkflowPresets.accurate(recipe_advanced)

print("✅ Workflow presets created:")
print("  - Fast Prototype")
print("  - Balanced")
print("  - Accurate")

## 5.4 Compare Workflow Presets

In [None]:
# Compare presets
print("🔬 Comparing Workflow Presets...\n")

logger_presets = WorkflowLogger(verbose=1)
manager_presets = WorkflowManager(
    [wf_fast, wf_balanced, wf_accurate],
    callbacks=get_manager_callbacks(logger_presets)
)

cv_presets = manager_presets.cross_validation(
    train_df,
    h=14,
    n_windows=1
)

print("\n✅ Preset comparison completed")

In [None]:
# Evaluate presets
scores_presets = accuracy_table(cv_presets, train_df, mode='aggregate')
exec_times_presets = manager_presets.execution_summary()

results_presets = scores_presets.merge(
    exec_times_presets[['workflow', 'execution_time_s']],
    on='workflow'
).sort_values('mae')

print("=" * 80)
print("WORKFLOW PRESET COMPARISON")
print("=" * 80)
print(results_presets[['workflow', 'mae', 'rmse', 'execution_time_s']])
print("\n💡 Insight: Notice the accuracy vs. speed tradeoff")

---

# Part 7: Complete End-to-End Example 

## Putting It All Together

Let's create a complete workflow from scratch using everything we've learned.

---

# Summary & Key Takeaways

## What We Learned

### Recipes:
- **Modular** - Each step is independent
- **Reusable** - Share across workflows
- **Traceable** - Know exactly what transformations applied
- **Testable** - Easy to compare different approaches

### Workflows:
- **Flexible** - Mix and match recipes and models
- **Comparable** - Test multiple approaches easily
- **Scalable** - From prototype to production
- **Maintainable** - Clear separation of concerns

### Best Practices:
1. Start simple, add complexity gradually
2. Always include baseline models
3. Use cross-validation with multiple windows
4. Monitor accuracy vs. computational cost
5. Document recipe and workflow choices

### Advanced Patterns:
- Dynamic recipe creation based on data
- Factory patterns for reusable configurations
- Preset workflows for common use cases
- Systematic experimentation framework

---

## Next Steps

1. **Experiment** with your own data
2. **Create** custom recipes for your domain
3. **Build** a library of workflow presets
4. **Share** successful patterns with your team
5. **Monitor** production model performance

**Happy Forecasting! **