# Baseline Models: null_model() and naive_reg()

This notebook demonstrates **baseline models** in py-tidymodels, which are essential for benchmarking any modeling project.

## Why Baselines are Critical

**Rule #1 of Machine Learning:** If your model can't beat a simple baseline, it's not adding value.

Baseline models provide:
- **Reality check**: Is your complex model actually learning something useful?
- **Minimum bar**: The simplest possible prediction strategy
- **Quick implementation**: Minutes to fit, not hours
- **Interpretability**: Everyone understands "predict the mean" or "use last value"

## Models Covered

1. **null_model()** - Predicts a constant (mean, median, or mode)
   - Regression: mean or median of training target
   - Classification: most frequent class
   - Use case: General baseline for any problem

2. **naive_reg()** - Time series naive forecasting methods
   - **naive**: Last observed value (random walk)
   - **seasonal_naive**: Last value from same season
   - **drift**: Linear trend extrapolation
   - Use case: Essential baseline for time series forecasting

In [1]:
!pip install -e .

Obtaining file:///Users/matthewdeane/Documents/Data%20Science/python/_projects/py-tidymodels/examples
ERROR: file:///Users/matthewdeane/Documents/Data%20Science/python/_projects/py-tidymodels/examples does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

In [2]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Import py-tidymodels packages
from py_parsnip import null_model, naive_reg, linear_reg, rand_forest
from py_recipes import recipe
from py_workflows import workflow
from py_rsample import initial_time_split
from py_visualize import plot_model_comparison

print("✓ All packages imported successfully")

✓ All packages imported successfully


## Part 1: null_model() - The Simplest Baseline

The null model is the most basic baseline: predict a constant for all observations.

### For Regression:
- Predicts the **mean** (or median) of the training target
- Represents "what if I just guessed the average every time?"
- Training RMSE of the null model = standard deviation of the training target (population form, dividing by n); test RMSE is usually close to it

### For Classification:
- Predicts the **mode** (most frequent class)
- Represents "what if I just guessed the majority class every time?"
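
As a rough illustration of what "predict a constant" means, the sketch below fits and applies such a baseline in plain Python. The helpers `fit_constant` and `predict_constant` are hypothetical names used only for this example; they are not part of py-tidymodels, and the library's own implementation may differ.

```python
from collections import Counter
from statistics import mean, median


def fit_constant(y_train, strategy="mean"):
    """Return the single value a null model would predict (hypothetical helper)."""
    if strategy == "mean":
        return mean(y_train)
    if strategy == "median":
        return median(y_train)
    if strategy == "mode":
        # Most frequent class; ties resolve to the first class encountered
        return Counter(y_train).most_common(1)[0][0]
    raise ValueError(f"unknown strategy: {strategy}")


def predict_constant(constant, n_rows):
    """Repeat the fitted constant once per row to be predicted."""
    return [constant] * n_rows


reg_pred = predict_constant(fit_constant([2.0, 4.0, 9.0]), 3)
cls_pred = predict_constant(fit_constant(["a", "b", "a"], "mode"), 2)
print(reg_pred)  # [5.0, 5.0, 5.0]
print(cls_pred)  # ['a', 'a']
```

The features never enter the calculation, which is why the formula used later for the null model has only an intercept (`y ~ 1`).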

### Setup: Create Sample Data

In [3]:
# Create simple regression data
np.random.seed(42)
n = 200

# Features
x1 = np.random.randn(n)
x2 = np.random.randn(n)
x3 = np.random.randn(n)

# Target: linear relationship + noise
y = 10 + 2 * x1 + 1.5 * x2 - 0.8 * x3 + np.random.randn(n) * 2

data = pd.DataFrame({
    'x1': x1,
    'x2': x2,
    'x3': x3,
    'y': y
})

# Train/test split
train_data = data.iloc[:150]
test_data = data.iloc[150:]

print(f"Training observations: {len(train_data)}")
print(f"Testing observations: {len(test_data)}")
print(f"\nTarget statistics (training):")
print(f"  Mean: {train_data['y'].mean():.4f}")
print(f"  Std:  {train_data['y'].std():.4f}")
print(f"  Min:  {train_data['y'].min():.4f}")
print(f"  Max:  {train_data['y'].max():.4f}")

Training observations: 150
Testing observations: 50

Target statistics (training):
  Mean: 9.9444
  Std:  3.3383
  Min:  -0.8428
  Max:  19.4448


### Fit null_model() - Predicts Mean

In [4]:
# Create null model specification
spec_null = null_model()

print("Null Model Specification:")
print(spec_null)
print("\nThis model will predict the mean of the training target for all observations.")

Null Model Specification:
ModelSpec(model_type='null_model', engine='parsnip', mode='regression', args={})

This model will predict the mean of the training target for all observations.


In [5]:
# Fit null model (no features needed, but we'll use the formula for consistency)
fit_null = spec_null.fit(formula='y ~ 1', data=train_data).evaluate(test_data)

# Generate predictions
pred_null = fit_null.predict(test_data)

print("✓ Null model fitted")
print(f"\nPredicted value (constant): {pred_null['.pred'].iloc[0]:.4f}")
print(f"Training mean:              {train_data['y'].mean():.4f}")
print("\nNote: All predictions are identical (the training mean)")

✓ Null model fitted

Predicted value (constant): 9.9444
Training mean:              9.9444

Note: All predictions are identical (the training mean)


### Compare to Linear Regression and Random Forest

In [6]:
# Fit linear regression
spec_linear = linear_reg()
fit_linear = spec_linear.fit(formula='y ~ x1 + x2 + x3', data=train_data).evaluate(test_data)
pred_linear = fit_linear.predict(test_data)

print("✓ Linear regression fitted")

✓ Linear regression fitted


In [7]:
# Fit random forest
spec_rf = rand_forest(trees=100).set_mode('regression')
fit_rf = spec_rf.fit(formula='y ~ x1 + x2 + x3', data=train_data).evaluate(test_data)
pred_rf = fit_rf.predict(test_data)

print("✓ Random forest fitted")

✓ Random forest fitted


In [8]:
# Extract stats DataFrames
_, _, stats_null = fit_null.extract_outputs()
_, _, stats_linear = fit_linear.extract_outputs()
_, _, stats_rf = fit_rf.extract_outputs()

print("Null Model Performance (Test Set):")
print("=" * 50)
test_stats_null = stats_null[stats_null['split'] == 'test']
for _, row in test_stats_null.iterrows():
    metric = str(row['metric'])
    value = row['value']
    # Format numeric values, convert others to string
    if isinstance(value, (int, float)):
        print(f"{metric:12s}: {value:.6f}")
    else:
        print(f"{metric:12s}: {value}")

Null Model Performance (Test Set):
rmse        : 3.266588
mae         : 2.647944
mape        : 28.318774
r_squared   : -0.056764


In [9]:
# Visual comparison
fig = plot_model_comparison(
    stats_list=[stats_null, stats_linear, stats_rf],
    model_names=["Null Model (Mean)", "Linear Regression", "Random Forest"],
    metrics=["rmse", "mae", "r_squared"],
    split="test",
    plot_type="bar",
    title="Model Performance vs Null Baseline",
    height=500
)

fig.show()

print("\n📊 Interpretation:")
print("  • Null model: test RMSE ≈ Std(y) - no learning, just predicts the training mean")
print("  • Linear & RF: RMSE < Std(y) - they've learned something!")
print("  • R² = 0 for the null model on training data; on test data it is ≈ 0 (here -0.057)")
print("  • If R² < 0, your model is WORSE than predicting the mean of that split!")


📊 Interpretation:
  • Null model: test RMSE ≈ Std(y) - no learning, just predicts the training mean
  • Linear & RF: RMSE < Std(y) - they've learned something!
  • R² = 0 for the null model on training data; on test data it is ≈ 0 (here -0.057)
  • If R² < 0, your model is WORSE than predicting the mean of that split!


### Key Insights: null_model()

1. **RMSE of null model ≈ Standard deviation of target**
   - This is the "no learning" baseline
   - Any model with RMSE > σ(y) is worse than predicting the mean

2. **R² is defined relative to null model**
   - R² = 1 - (SSE / SST), where SST is the total sum of squares, i.e. the SSE of a mean-only (null) model
   - R² = 0 means you're as good as null model
   - R² < 0 means you're WORSE than null model

3. **Use cases:**
   - First baseline for ANY regression problem
   - Sanity check before complex modeling
   - Quick benchmark for model selection
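
The two relationships above can be checked by hand on toy numbers. This is a minimal sketch that assumes R² is computed as 1 - SSE/SST with SST taken around the mean of the split being scored (the common convention; how py-tidymodels computes it internally is not shown in this notebook). The data values are invented for the example.

```python
import math
from statistics import pstdev


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """1 - SSE/SST, with SST measured around the mean of `actual`."""
    center = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - center) ** 2 for a in actual)
    return 1 - sse / sst


y_train = [8.0, 10.0, 12.0, 14.0]   # toy values, mean = 11
y_test = [13.0, 15.0, 17.0]         # toy values, mean = 15
train_mean = sum(y_train) / len(y_train)

# On the training data the null model's SSE equals SST, so R² is exactly 0
r2_train = r_squared(y_train, [train_mean] * len(y_train))
# On the test data the test mean differs from the training mean, so R² < 0
r2_test = r_squared(y_test, [train_mean] * len(y_test))
# Training RMSE equals the population standard deviation (divide by n)
rmse_train = rmse(y_train, [train_mean] * len(y_train))

print(r2_train, r2_test)             # 0.0 -6.0
print(rmse_train, pstdev(y_train))
```

This is also why the training RMSE reported for the null model (3.3271) is slightly below the `Std` printed earlier (3.3383): pandas' `.std()` divides by n - 1 by default, while RMSE divides by n.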

## Part 2: naive_reg() - Time Series Baselines

For time series forecasting, we need different baselines that account for temporal structure.

### Three Naive Methods:

1. **Naive (Random Walk)**
   - Formula: ŷ_t = y_{t-1}
   - Prediction: "Tomorrow will be like today"
   - Best for: Random walks, stock prices

2. **Seasonal Naive**
   - Formula: ŷ_t = y_{t-seasonal_period}
   - Prediction: "This Monday will be like last Monday"
   - Best for: Strong seasonal patterns

3. **Drift Method**
   - Formula: ŷ_t = y_{t-1} + (y_T - y_1) / (T - 1)
   - Prediction: "Continue the linear trend"
   - Best for: Data with consistent trend
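
The three rules can be sketched in a few lines of plain Python. The sketch assumes multi-step forecasting from the end of the training series, where future actuals are unavailable, so the last training value stands in for y_{t-1}; this matches the behaviour shown later, where the first naive prediction equals the last training value. The function names are illustrative and are not the py-tidymodels API.

```python
def naive_forecast(y, horizon):
    """Repeat the last observed value for every future step."""
    return [y[-1]] * horizon


def seasonal_naive_forecast(y, horizon, period):
    """Copy the value observed at the same position in the last full season."""
    last_season = y[-period:]
    return [last_season[h % period] for h in range(horizon)]


def drift_forecast(y, horizon):
    """Extend the straight line through the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + slope * h for h in range(1, horizon + 1)]


history = [10, 12, 11, 13, 15, 14, 16]
print(naive_forecast(history, 3))              # [16, 16, 16]
print(seasonal_naive_forecast(history, 4, 3))  # [15, 14, 16, 15]
print(drift_forecast(history, 3))              # [17.0, 18.0, 19.0]
```

Note that the drift slope depends only on the first and last observations, so it is sensitive to noise in those two points.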

### Setup: Create Time Series Data

In [10]:
# Create time series with trend + seasonality
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=365, freq='D')
time_index = np.arange(len(dates))

# Components
trend = time_index * 0.3  # Upward trend
weekly_season = 10 * np.sin(2 * np.pi * time_index / 7)  # Weekly pattern
noise = np.random.randn(len(dates)) * 3

y = trend + weekly_season + noise + 100

ts_data = pd.DataFrame({
    'date': dates,
    'value': y
})

# Split: 80% train, 20% test
split = initial_time_split(ts_data, prop=0.8)
train_ts = split.training()
test_ts = split.testing()

print(f"Training: {len(train_ts)} observations ({train_ts['date'].min()} to {train_ts['date'].max()})")
print(f"Testing:  {len(test_ts)} observations ({test_ts['date'].min()} to {test_ts['date'].max()})")
print(f"\nData characteristics:")
print(f"  - Trend: +0.3 per day")
print(f"  - Seasonality: Weekly (period=7)")
print(f"  - Noise: σ=3")

Training: 292 observations (2020-01-01 00:00:00 to 2020-10-18 00:00:00)
Testing:  73 observations (2020-10-19 00:00:00 to 2020-12-30 00:00:00)

Data characteristics:
  - Trend: +0.3 per day
  - Seasonality: Weekly (period=7)
  - Noise: σ=3


### Method 1: Naive (Last Value)

In [11]:
# Create naive model (method="naive" is default)
spec_naive = naive_reg(method="naive")

print("Naive Model (Random Walk):")
print(spec_naive)
print("\nPrediction strategy: ŷ_t = y_{t-1}")
print("Use case: Random walks, efficient markets")

Naive Model (Random Walk):
ModelSpec(model_type='naive_reg', engine='parsnip', mode='regression', args={'method': 'naive'})

Prediction strategy: ŷ_t = y_{t-1}
Use case: Random walks, efficient markets


In [12]:
# Fit naive model
fit_naive = spec_naive.fit(formula='value ~ 1', data=train_ts).evaluate(test_ts)
pred_naive = fit_naive.predict(test_ts)

print("✓ Naive model fitted")
print(f"\nLast training value: {train_ts['value'].iloc[-1]:.4f}")
print(f"First prediction:    {pred_naive['.pred'].iloc[0]:.4f}")
print("\nNote: First prediction = last training value (random walk)")

✓ Naive model fitted

Last training value: 181.4822
First prediction:    181.4822

Note: First prediction = last training value (random walk)


### Method 2: Seasonal Naive (Last Seasonal Value)

In [13]:
# Create seasonal naive model (weekly seasonality = 7 days)
spec_snaive = naive_reg(seasonal_period=7, method="seasonal_naive")

print("Seasonal Naive Model:")
print(spec_snaive)
print("\nPrediction strategy: ŷ_t = y_{t-7}")
print("Use case: Strong seasonal patterns (e.g., Monday like last Monday)")

Seasonal Naive Model:
ModelSpec(model_type='naive_reg', engine='parsnip', mode='regression', args={'method': 'seasonal_naive', 'seasonal_period': 7})

Prediction strategy: ŷ_t = y_{t-7}
Use case: Strong seasonal patterns (e.g., Monday like last Monday)


In [14]:
# Fit seasonal naive model
fit_snaive = spec_snaive.fit(formula='value ~ 1', data=train_ts).evaluate(test_ts)
pred_snaive = fit_snaive.predict(test_ts)

print("✓ Seasonal naive model fitted")
print("\nPrediction logic:")
print("  - First test date (Monday, 2020-10-19): uses the previous Monday's value")
print("  - Each day: uses same weekday from 7 days ago")

✓ Seasonal naive model fitted

Prediction logic:
  - First test date (Monday, 2020-10-19): uses the previous Monday's value
  - Each day: uses same weekday from 7 days ago
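
To make the weekday matching concrete, the sketch below maps a forecast date to the training date whose value would be copied. It assumes daily data with no gaps and that forecasts beyond the first week keep reusing the last training week; `seasonal_source_date` is a hypothetical helper, and the library's handling of longer horizons may differ.

```python
from datetime import date, timedelta


def seasonal_source_date(last_train, target, period=7):
    """Training date whose value a seasonal-naive forecast for `target` copies."""
    h = (target - last_train).days           # forecast horizon in days
    if h < 1:
        raise ValueError("target must fall after the last training date")
    offset = (h - 1) % period                # position within the last season
    return last_train - timedelta(days=period - 1 - offset)


last_train = date(2020, 10, 18)              # final training date printed earlier
for target in (date(2020, 10, 19), date(2020, 10, 27), date(2020, 12, 30)):
    source = seasonal_source_date(last_train, target)
    same_weekday = target.weekday() == source.weekday()
    print(target, "<-", source, "| same weekday:", same_weekday)
```

Because the source dates never advance past the training period, a seasonal-naive forecast cannot pick up any trend that continues into the test window.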


### Method 3: Drift (Linear Trend)

In [15]:
# Create drift model
spec_drift = naive_reg(method="drift")

print("Drift Model:")
print(spec_drift)
print("\nPrediction strategy: ŷ_t = y_{t-1} + (y_T - y_1) / (T - 1)")
print("Use case: Data with consistent linear trend")

Drift Model:
ModelSpec(model_type='naive_reg', engine='parsnip', mode='regression', args={'method': 'drift'})

Prediction strategy: ŷ_t = y_{t-1} + (y_T - y_1) / (T - 1)
Use case: Data with consistent linear trend


In [16]:
# Fit drift model
fit_drift = spec_drift.fit(formula='value ~ 1', data=train_ts).evaluate(test_ts)
pred_drift = fit_drift.predict(test_ts)

# Calculate drift rate
first_val = train_ts['value'].iloc[0]
last_val = train_ts['value'].iloc[-1]
n_obs = len(train_ts)
drift_rate = (last_val - first_val) / (n_obs - 1)

print("✓ Drift model fitted")
print(f"\nDrift rate: {drift_rate:.4f} per day")
print(f"First training value: {first_val:.4f}")
print(f"Last training value:  {last_val:.4f}")
print(f"\nPredictions extrapolate this linear trend forward")

✓ Drift model fitted

Drift rate: 0.2749 per day
First training value: 101.4901
Last training value:  181.4822

Predictions extrapolate this linear trend forward
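
The drift rate can be reproduced from the printed values alone. The sketch below redoes that arithmetic and extends it to a forecast h days ahead, assuming the usual drift rule of adding h times the rate to the last training value; the inputs are the rounded numbers from the output, so later digits may differ slightly from the library's result.

```python
first_val = 101.4901   # first training value printed above
last_val = 181.4822    # last training value printed above
n_obs = 292            # number of training observations

# Average change per step between the first and last training points
drift_rate = (last_val - first_val) / (n_obs - 1)


def drift_at(h):
    """Drift forecast h days after the last training observation."""
    return last_val + h * drift_rate


print(f"{drift_rate:.4f}")    # 0.2749
print(f"{drift_at(1):.4f}")   # one day ahead
print(f"{drift_at(73):.4f}")  # end of the 73-day test window
```

The estimated rate (about 0.27 per day) is close to the true simulated trend of 0.3 per day, with the gap coming from noise and seasonality in the two endpoint values.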


### Compare All Three Naive Methods

In [17]:
# Extract stats
_, _, stats_naive = fit_naive.extract_outputs()
_, _, stats_snaive = fit_snaive.extract_outputs()
_, _, stats_drift = fit_drift.extract_outputs()

# Visual comparison
fig = plot_model_comparison(
    stats_list=[stats_naive, stats_snaive, stats_drift],
    model_names=["Naive (Last Value)", "Seasonal Naive (Weekly)", "Drift (Trend)"],
    metrics=["rmse", "mae", "r_squared"],
    split="test",
    plot_type="bar",
    title="Naive Methods Comparison",
    height=500
)

fig.show()

print("\n📊 Which naive method performs best?")
# Filter for test split and rmse metric - extract scalar properly
def get_metric_value(stats_df, metric_name, split_name='test'):
    filtered = stats_df[(stats_df['metric'] == metric_name) & (stats_df['split'] == split_name)]
    if len(filtered) > 0:
        val = filtered['value'].iloc[0]
        # If val is a DataFrame, extract the scalar
        if isinstance(val, pd.DataFrame):
            return val.iloc[0, 0]
        return float(val)
    return None

test_rmses = {
    "Naive": get_metric_value(stats_naive, 'rmse'),
    "Seasonal Naive": get_metric_value(stats_snaive, 'rmse'),
    "Drift": get_metric_value(stats_drift, 'rmse')
}

best_method = min(test_rmses, key=test_rmses.get)
print(f"\nBest naive method: {best_method}")
print(f"  RMSE: {test_rmses[best_method]:.4f}")
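print("\nNote: a strong trend can let Drift beat Seasonal Naive even on seasonal data,")
print("because Seasonal Naive repeats past values and cannot follow the trend")


📊 Which naive method performs best?

Best naive method: Drift
  RMSE: 9.9784

Note: a strong trend can let Drift beat Seasonal Naive even on seasonal data,
because Seasonal Naive repeats past values and cannot follow the trend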
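
Drift coming out ahead is less surprising once the trend is taken into account: over a 73-day test window a +0.3/day trend moves the series by roughly 22 units, more than the seasonal amplitude of 10. The sketch below repeats the comparison on a noise-free copy of the same series, using plain-Python versions of the three rules. These are illustrative re-implementations, not the library code, so the RMSE values will not match the notebook output exactly.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Noise-free version of the series: level 100, +0.3/day trend, weekly sine wave
series = [100 + 0.3 * t + 10 * math.sin(2 * math.pi * t / 7) for t in range(365)]
train, test = series[:292], series[292:]
horizon, period = len(test), 7

slope = (train[-1] - train[0]) / (len(train) - 1)
forecasts = {
    "Naive": [train[-1]] * horizon,
    "Seasonal Naive": [train[-period:][h % period] for h in range(horizon)],
    "Drift": [train[-1] + slope * (h + 1) for h in range(horizon)],
}

scores = {name: rmse(test, pred) for name, pred in forecasts.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {score:.2f}")
```

Seasonal naive reproduces the weekly shape perfectly here but falls further behind the trend with every week of the horizon, while drift follows the trend and misses only the weekly swing.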


### Compare Naive Methods to ML Models

In [18]:
# Build ML models with feature engineering
rec = (
    recipe()
    .step_date('date', features=['dow', 'week'])
    .step_lag(['value'], lags=[1, 7, 14])  # Lag the target by 1, 7, and 14 days (columns passed as a list)
    .step_naomit()  # Remove rows with NaN from lag features
    .step_normalize(['value_lag_1', 'value_lag_7', 'value_lag_14'])
)

# Linear regression with features
wf_linear = (
    workflow()
    .add_recipe(rec)
    .add_model(linear_reg())
)
fit_linear_ts = wf_linear.fit(train_ts).evaluate(test_ts)

# Random forest with features
wf_rf = (
    workflow()
    .add_recipe(rec)
    .add_model(rand_forest(trees=100).set_mode('regression'))
)
fit_rf_ts = wf_rf.fit(train_ts).evaluate(test_ts)

print("✓ ML models fitted with date features and lags")

✓ ML models fitted with date features and lags
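
For readers unfamiliar with lag features, here is a small plain-Python sketch of what the lag and NA-removal steps accomplish conceptually. `add_lags` is a hypothetical helper written for this example; the column naming follows the `value_lag_1` pattern used in the recipe, but the recipe steps themselves may behave differently in detail.

```python
def add_lags(values, lags):
    """Build rows of lagged features, dropping rows without a full history."""
    max_lag = max(lags)
    rows = []
    # The first max_lag observations lack at least one lag, so they are skipped
    for t in range(max_lag, len(values)):
        row = {"value": values[t]}
        for lag in lags:
            row[f"value_lag_{lag}"] = values[t - lag]
        rows.append(row)
    return rows


rows = add_lags(list(range(100, 120)), lags=[1, 7, 14])
print(len(rows))   # 6
print(rows[0])     # {'value': 114, 'value_lag_1': 113, 'value_lag_7': 107, 'value_lag_14': 100}
```

With lags up to 14, the first 14 rows of any series are lost, which is the cost of giving the model access to last week's and the week before's values.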


In [19]:
# Extract stats
_, _, stats_linear_ts = fit_linear_ts.extract_outputs()
_, _, stats_rf_ts = fit_rf_ts.extract_outputs()

# Compare all models
fig = plot_model_comparison(
    stats_list=[
        stats_naive, stats_snaive, stats_drift,
        stats_linear_ts, stats_rf_ts
    ],
    model_names=[
        "Naive", "Seasonal Naive", "Drift",
        "Linear Regression", "Random Forest"
    ],
    metrics=["rmse", "mae", "r_squared"],
    split="test",
    plot_type="bar",
    title="Naive Baselines vs ML Models",
    height=500,
    width=900
)

fig.show()

print("\n📊 Key Question: Do ML models beat the baselines?")
print("\nTest RMSE:")
all_rmses = {
    "Naive": get_metric_value(stats_naive, 'rmse'),
    "Seasonal Naive": get_metric_value(stats_snaive, 'rmse'),
    "Drift": get_metric_value(stats_drift, 'rmse'),
    "Linear Regression": get_metric_value(stats_linear_ts, 'rmse'),
    "Random Forest": get_metric_value(stats_rf_ts, 'rmse')
}

for model, rmse in sorted(all_rmses.items(), key=lambda x: x[1]):
    print(f"  {model:20s}: {rmse:.4f}")

best_overall = min(all_rmses, key=all_rmses.get)
print(f"\n🏆 Best model: {best_overall}")

# Check if the best ML model beats the best naive baseline
best_naive_rmse = min([all_rmses[k] for k in ["Naive", "Seasonal Naive", "Drift"]])
ml_beat_naive = min(all_rmses['Linear Regression'], all_rmses['Random Forest']) < best_naive_rmse

if ml_beat_naive:
    print("\n✓ At least one ML model beats the best naive baseline - it's adding value!")
else:
    print("\n⚠️  No ML model beats the best naive baseline - reconsider your approach!")


📊 Key Question: Do ML models beat the baselines?

Test RMSE:
  Linear Regression   : 2.7982
  Drift               : 9.9784
  Random Forest       : 10.1583
  Seasonal Naive      : 15.1230
  Naive               : 19.4213

🏆 Best model: Linear Regression

✓ At least one ML model beats the best naive baseline - it's adding value!


### When to Use Each Naive Method

| Method | Best For | Example Use Cases |
|--------|----------|------------------|
| **Naive** | Random walks, no pattern | Stock prices, exchange rates |
| **Seasonal Naive** | Strong seasonality | Retail sales, website traffic, temperature |
| **Drift** | Consistent linear trend | Population growth, cumulative metrics |

### Rule of Thumb:
1. **Always** try all three naive methods
2. The best naive method is your **minimum acceptable baseline**
3. If your ML model can't beat it, **don't deploy the ML model**
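
The rule of thumb can be applied per model rather than to the ML models as a group. The sketch below does that with the test RMSEs reported earlier in this notebook; `beats_best_baseline` is a hypothetical helper, not a library function.

```python
def beats_best_baseline(baseline_rmses, candidate_rmse):
    """Return (best_baseline_name, candidate_is_better) for lower-is-better RMSE."""
    best_name = min(baseline_rmses, key=baseline_rmses.get)
    return best_name, candidate_rmse < baseline_rmses[best_name]


# Test RMSEs reported earlier in this notebook
naive_rmses = {"Naive": 19.4213, "Seasonal Naive": 15.1230, "Drift": 9.9784}

print(beats_best_baseline(naive_rmses, 2.7982))    # ('Drift', True)   linear regression
print(beats_best_baseline(naive_rmses, 10.1583))   # ('Drift', False)  random forest
```

Checked individually, linear regression clears the bar but the random forest does not: its RMSE of 10.1583 is slightly worse than the drift baseline's 9.9784.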

## Part 3: Benchmarking Workflow

Here's a recommended workflow for any modeling project:

In [20]:
# Step 1: Fit baseline(s)
print("Step 1: Establish Baseline")
print("=" * 50)

# For regression: null_model
baseline = null_model()
fit_baseline = baseline.fit(formula='y ~ 1', data=train_data).evaluate(test_data)
_, _, stats_baseline = fit_baseline.extract_outputs()
baseline_rmse = get_metric_value(stats_baseline, 'rmse')

print(f"Baseline RMSE: {baseline_rmse:.4f}")
print("This is the minimum bar. Any model must beat this.\n")

Step 1: Establish Baseline
Baseline RMSE: 3.2666
This is the minimum bar. Any model must beat this.



In [21]:
# Step 2: Try simple model
print("Step 2: Try Simple Model (Linear Regression)")
print("=" * 50)

simple_model = linear_reg()
fit_simple = simple_model.fit(formula='y ~ x1 + x2 + x3', data=train_data).evaluate(test_data)
_, _, stats_simple = fit_simple.extract_outputs()
simple_rmse = get_metric_value(stats_simple, 'rmse')

print(f"Simple Model RMSE: {simple_rmse:.4f}")

improvement_pct = ((baseline_rmse - simple_rmse) / baseline_rmse) * 100
print(f"Improvement: {improvement_pct:.1f}% over baseline")

if simple_rmse < baseline_rmse:
    print("✓ Simple model beats baseline - proceed to complex models\n")
else:
    print("⚠️  Simple model doesn't beat baseline - check data quality!\n")

Step 2: Try Simple Model (Linear Regression)
Simple Model RMSE: 1.9269
Improvement: 41.0% over baseline
✓ Simple model beats baseline - proceed to complex models



In [22]:
# Step 3: Try complex model (only if simple model worked)
print("Step 3: Try Complex Model (Random Forest)")
print("=" * 50)

complex_model = rand_forest(trees=100).set_mode('regression')
fit_complex = complex_model.fit(formula='y ~ x1 + x2 + x3', data=train_data).evaluate(test_data)
_, _, stats_complex = fit_complex.extract_outputs()
complex_rmse = get_metric_value(stats_complex, 'rmse')

print(f"Complex Model RMSE: {complex_rmse:.4f}")

improvement_vs_simple = ((simple_rmse - complex_rmse) / simple_rmse) * 100
improvement_vs_baseline = ((baseline_rmse - complex_rmse) / baseline_rmse) * 100

print(f"Improvement vs simple model: {improvement_vs_simple:.1f}%")
print(f"Improvement vs baseline:     {improvement_vs_baseline:.1f}%")

if complex_rmse < simple_rmse * 0.95:  # At least 5% better
    print("✓ Complex model significantly better - use it!")
elif complex_rmse < simple_rmse:
    print("⚠️  Complex model slightly better - maybe not worth the complexity")
else:
    print("⚠️  Complex model not better - stick with simple model!")

Step 3: Try Complex Model (Random Forest)
Complex Model RMSE: 2.1900
Improvement vs simple model: -13.7%
Improvement vs baseline:     33.0%
⚠️  Complex model not better - stick with simple model!


In [23]:
# Step 4: Summarize
print("\n" + "=" * 50)
print("FINAL RECOMMENDATION")
print("=" * 50)

print("\nPerformance Ladder:")
results = [
    ("Baseline (null_model)", baseline_rmse),
    ("Simple (linear_reg)", simple_rmse),
    ("Complex (rand_forest)", complex_rmse)
]

for i, (name, rmse) in enumerate(sorted(results, key=lambda x: x[1]), 1):
    print(f"  {i}. {name:25s} RMSE = {rmse:.4f}")

print("\n📊 Decision Framework:")
print("  1. If NOTHING beats baseline → Don't use ML, predict mean")
print("  2. If only simple beats baseline → Use simple model")
print("  3. If complex beats simple by >5% → Use complex model")
print("  4. If complex beats simple by <5% → Stick with simple (interpretability)")


FINAL RECOMMENDATION

Performance Ladder:
  1. Simple (linear_reg)       RMSE = 1.9269
  2. Complex (rand_forest)     RMSE = 2.1900
  3. Baseline (null_model)     RMSE = 3.2666

📊 Decision Framework:
  1. If NOTHING beats baseline → Don't use ML, predict mean
  2. If only simple beats baseline → Use simple model
  3. If complex beats simple by >5% → Use complex model
  4. If complex beats simple by <5% → Stick with simple (interpretability)
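
The decision framework can also be written as one small function, which makes the rules easy to reuse and test. This is a sketch under stated assumptions: `recommend` is a hypothetical helper, the 5% threshold is the one used in this notebook, and the case where only the complex model beats the baseline is not covered by the framework, so the choice made for it here is an assumption.

```python
def recommend(baseline_rmse, simple_rmse, complex_rmse, min_gain=0.05):
    """Apply the decision framework to three test RMSEs (lower is better)."""
    simple_ok = simple_rmse < baseline_rmse
    complex_ok = complex_rmse < baseline_rmse
    if not simple_ok and not complex_ok:
        return "baseline"          # rule 1: nothing beats the baseline
    if not complex_ok:
        return "simple"            # rule 2: only the simple model beats it
    if not simple_ok:
        return "complex"           # assumption: case not listed in the framework
    # Relative RMSE reduction of the complex model over the simple one
    gain = (simple_rmse - complex_rmse) / simple_rmse
    return "complex" if gain > min_gain else "simple"   # rules 3 and 4


# RMSEs from the performance ladder above
print(recommend(3.2666, 1.9269, 2.1900))   # simple
```

For the values in this notebook the relative gain is negative (about -13.7%), so the simple model is the recommendation.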


## Part 4: Extract Outputs - Three-DataFrame Structure

All fitted models in py-tidymodels return three DataFrames via `extract_outputs()`:
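1. **outputs**: Actuals, fitted values, forecasts, and residuals per observation, labeled by split
2. **coefficients**: Parameter estimates (the cells below store this in a variable named `residuals_*`, but its columns are coefficient-level; for a baseline it holds the single fitted constant)
3. **stats**: Performance metrics per split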

Let's see this for baseline models:

In [24]:
# Extract from null model
outputs_null, residuals_null, stats_null_full = fit_null.extract_outputs()

print("DataFrame 1: outputs (predictions + actuals)")
print("=" * 50)
print(outputs_null.head(10))
print(f"\nShape: {outputs_null.shape}")
print(f"Columns: {list(outputs_null.columns)}")

DataFrame 1: outputs (predictions + actuals)
     actuals    fitted  forecast  residuals  split
0  14.319629  9.944448  9.944448   4.375181  train
1   9.199818  9.944448  9.944448  -0.744630  train
2  14.654971  9.944448  9.944448   4.710523  train
3  17.300454  9.944448  9.944448   7.356006  train
4   8.652111  9.944448  9.944448  -1.292337  train
5  11.380300  9.944448  9.944448   1.435852  train
6  13.237496  9.944448  9.944448   3.293048  train
7   9.930143  9.944448  9.944448  -0.014305  train
8   6.179946  9.944448  9.944448  -3.764502  train
9  19.444755  9.944448  9.944448   9.500307  train

Shape: (200, 5)
Columns: ['actuals', 'fitted', 'forecast', 'residuals', 'split']


In [25]:
print("\nDataFrame 2: coefficients (stored here as residuals_null)")
print("=" * 50)
print(residuals_null.head(10))
print(f"\nShape: {residuals_null.shape}")
print(f"Columns: {list(residuals_null.columns)}")


DataFrame 2: coefficients (stored here as residuals_null)
      variable  coefficient  std_error  p_value  ci_0.025  ci_0.975
0  (Intercept)     9.944448        NaN      NaN       NaN       NaN

Shape: (1, 6)
Columns: ['variable', 'coefficient', 'std_error', 'p_value', 'ci_0.025', 'ci_0.975']


In [26]:
print("\nDataFrame 3: stats (performance metrics)")
print("=" * 50)
print(stats_null_full)
print(f"\nShape: {stats_null_full.shape}")
print(f"\nMetrics available: {stats_null_full['metric'].unique()}")
print(f"Splits: {stats_null_full['split'].unique()}")


DataFrame 3: stats (performance metrics)
           metric      value  split
0            rmse   3.327120  train
1             mae   2.653543  train
2            mape  41.136218  train
3       r_squared   0.000000  train
4  baseline_value   9.944448  train
5            rmse   3.266588   test
6             mae   2.647944   test
7            mape  28.318774   test
8       r_squared  -0.056764   test

Shape: (9, 3)

Metrics available: ['rmse' 'mae' 'mape' 'r_squared' 'baseline_value']
Splits: ['train' 'test']
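
The four performance metrics in this table follow standard definitions. The sketch below computes them in plain Python and returns rows in the same long format (`metric`, `value`, `split`). It assumes MAPE is reported on a percent scale, which is consistent with the values shown, and that R² uses the mean of the split being scored; `regression_stats` is a hypothetical helper, not the library's implementation.

```python
import math


def regression_stats(actual, predicted, split):
    """Return long-format rows (metric, value, split) for one data split."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    center = sum(actual) / n
    sst = sum((a - center) ** 2 for a in actual)
    metrics = {
        "rmse": math.sqrt(sum(e ** 2 for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        # Percent scale; undefined when any actual value is 0
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "r_squared": 1 - sum(e ** 2 for e in errors) / sst,
    }
    return [{"metric": m, "value": v, "split": split} for m, v in metrics.items()]


# Toy data: a constant prediction equal to the mean of the actuals
rows = regression_stats([10.0, 20.0, 30.0], [20.0, 20.0, 20.0], split="train")
for row in rows:
    print(row)
```

Keeping one row per metric and split is what allows the filtering pattern used throughout this notebook, such as selecting `metric == 'rmse'` and `split == 'test'`.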


In [27]:
# Same structure for naive_reg
outputs_naive, residuals_naive, stats_naive_full = fit_naive.extract_outputs()

print("naive_reg() also returns three DataFrames:")
print("=" * 50)
print(f"outputs:   {outputs_naive.shape}")
print(f"residuals: {residuals_naive.shape}")
print(f"stats:     {stats_naive_full.shape}")
print("\nConsistent structure across ALL models in py-tidymodels!")

naive_reg() also returns three DataFrames:
outputs:   (365, 5)
residuals: (1, 6)
stats:     (9, 3)

Consistent structure across ALL models in py-tidymodels!


## Summary: Baseline Models

### null_model() - Universal Baseline

**What it does:**
- Regression: Predicts mean (or median) of training target
- Classification: Predicts mode (most frequent class)

**When to use:**
- First baseline for ANY regression/classification problem
- Reality check before complex modeling
- Quick benchmark for model selection

**Key insight:**
- RMSE of null model ≈ σ(y)
- R² = 0 on the training data by definition (it can be slightly negative on test data)
- If your model has R² < 0, it's worse than null model!

### naive_reg() - Time Series Baselines

**Three methods:**

1. **Naive (Last Value)**
   - Best for: Random walks, no pattern
   - Formula: ŷ_t = y_{t-1}

2. **Seasonal Naive**
   - Best for: Strong seasonality
   - Formula: ŷ_t = y_{t-seasonal_period}

3. **Drift (Linear Trend)**
   - Best for: Consistent linear trend
   - Formula: ŷ_t = y_{t-1} + trend

**When to use:**
- Essential baseline for ANY time series forecasting
- Try all three, use the best as minimum bar
- If ML can't beat them, don't use ML!

### The Golden Rule

**If your model can't beat a simple baseline, it's not adding value.**

- Baselines are fast to implement (minutes, not hours)
- Baselines are easy to explain (everyone understands "predict the mean")
- Baselines provide minimum acceptable performance
- Complex models must justify their complexity by beating baselines

### Best Practice Workflow

1. Fit appropriate baseline (null_model or naive_reg)
2. Try simple model (e.g., linear regression)
3. Only try complex models if simple model beats baseline
4. Complex model must beat simple model by significant margin (>5%)
5. Always report performance relative to baseline

**Remember:** The best model is often the simplest one that beats the baseline!