# Demand Planning Regression with TabPFN

This notebook demonstrates how to use **TabPFN** for regression tasks in retail/CPG demand planning.

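TabPFN is a pretrained foundation model for tabular data. It delivers competitive regression performance on small to medium-sized datasets without hyperparameter tuning and can return quantile predictions, which makes it a good fit for scenarios where understanding prediction confidence is important.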

**Use Cases Covered:**
1. **Price Elasticity Prediction** - Understand how price changes affect demand
2. **Promotion Lift Prediction** - Predict the sales impact of planned promotions

**Business Value:**
- Optimize pricing strategies with elasticity insights
- Plan promotions with accurate ROI forecasts
- Improve demand forecast accuracy

**Prerequisites:** Run the `00_data_preparation` notebook first to create the datasets used here.

**References:**
- [TabPFN Client GitHub](https://github.com/PriorLabs/tabpfn-client)
- [Prior Labs Documentation](https://docs.priorlabs.ai/)

## Compute Setup

We recommend running this notebook on **Serverless Compute** with the **Base Environment V4**.

## 1. Installation

In [None]:
%pip install tabpfn-client scikit-learn pandas matplotlib seaborn --quiet

In [None]:
dbutils.library.restartPython()

## 2. Authentication

See the `01_classification` notebook for detailed instructions on setting up Databricks Secrets.

In [None]:
import tabpfn_client

token = dbutils.secrets.get(scope="tabpfn-client", key="token")
tabpfn_client.set_access_token(token)

## 3. Configuration

In [None]:
CATALOG = "tabpfn_databricks"
SCHEMA = "default"

spark.sql(f"USE CATALOG {CATALOG}")
spark.sql(f"USE SCHEMA {SCHEMA}")

## 4. Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

from tabpfn_client import TabPFNRegressor

## 5. Price Elasticity Prediction

**Business Context:** Revenue management and demand planning teams need to understand how price changes affect demand to:
- Set optimal prices for different products and markets
- Plan price increases without losing market share
- Design effective discount strategies

**Price Elasticity:** Measures the % change in demand for a 1% change in price.
- Elasticity of -2.0 means: 1% price increase → 2% demand decrease
- More negative = more price sensitive (elastic)
- Closer to 0 = less price sensitive (inelastic)
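
As a worked illustration of the definition above, the sketch below computes elasticity from a hypothetical before/after price and demand observation, then uses it as a first-order approximation of demand after a price change. The function names and all numbers are illustrative assumptions; they are not part of the dataset or of the TabPFN API.

```python
def price_elasticity(p0: float, p1: float, q0: float, q1: float) -> float:
    """Elasticity = (% change in quantity) / (% change in price)."""
    pct_price_change = (p1 - p0) / p0
    pct_qty_change = (q1 - q0) / q0
    return pct_qty_change / pct_price_change


def demand_after_price_change(q0: float, elasticity: float, pct_price_change: float) -> float:
    """First-order approximation: demand scales by (1 + elasticity * % price change)."""
    return q0 * (1 + elasticity * pct_price_change)


# Hypothetical observation: a 1% price increase (10.00 -> 10.10)
# coincides with a 2% drop in weekly units (1000 -> 980).
elasticity_demo = price_elasticity(p0=10.0, p1=10.10, q0=1000, q1=980)
print(round(elasticity_demo, 2))  # -2.0

# With elasticity -2.0, a 5% price increase implies roughly 10% fewer units.
units_demo = demand_after_price_change(q0=1000, elasticity=-2.0, pct_price_change=0.05)
print(round(units_demo, 1))
```

The linear approximation is only reasonable for small price changes; for larger moves a constant-elasticity (log-log) form is a common alternative.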

In [None]:
# Load Price Elasticity dataset from Delta table
df_elasticity = spark.table("price_elasticity").toPandas()

print(f"Dataset shape: {df_elasticity.shape}")
print(f"\nFeatures:")
print([col for col in df_elasticity.columns if col != 'price_elasticity'])
print(f"\nTarget (price_elasticity) statistics:")
print(df_elasticity['price_elasticity'].describe())

In [None]:
# Visualize elasticity distribution by category
fig, ax = plt.subplots(figsize=(10, 6))
df_elasticity.boxplot(column='price_elasticity', by='category', ax=ax)
ax.set_title('Price Elasticity by Product Category')
ax.set_xlabel('Category')
ax.set_ylabel('Price Elasticity')
plt.suptitle('')  # Remove automatic title
plt.tight_layout()
plt.show()

In [None]:
# Prepare features - encode categorical columns
df_elas_encoded = pd.get_dummies(df_elasticity, 
                                  columns=['category', 'price_tier', 'purchase_frequency'], 
                                  drop_first=True)

# Separate features and target
feature_cols = [col for col in df_elas_encoded.columns if col != 'price_elasticity']
X = df_elas_encoded[feature_cols].values
y = df_elasticity['price_elasticity'].values

print(f"Feature matrix shape: {X.shape}")

In [None]:
# Use a sample for faster demonstration (TabPFN works best with smaller datasets)
np.random.seed(42)
sample_size = min(2000, len(X))
sample_idx = np.random.choice(len(X), size=sample_size, replace=False)
X_sample = X[sample_idx]
y_sample = y[sample_idx]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X_sample, y_sample, test_size=0.2, random_state=42
)

print(f"Training set size: {len(X_train)}")
print(f"Test set size: {len(X_test)}")

In [None]:
# Initialize and train TabPFN regressor
reg = TabPFNRegressor()
reg.fit(X_train, y_train)

# Make predictions
y_pred = reg.predict(X_test)

# Evaluate performance
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"TabPFN Price Elasticity Prediction Results:")
print(f"  RMSE: {rmse:.4f}")
print(f"  MAE:  {mae:.4f}")
print(f"  R²:   {r2:.4f}")
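
To put these numbers in context, it can help to compare them with a naive baseline that always predicts the mean. The sketch below uses a small synthetic array and a hypothetical helper, `regression_metrics`, that reproduces the three metrics with NumPy; a mean-only predictor scores an R² of 0 on the data it was averaged from, so any useful model should land clearly above that.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Return RMSE, MAE and R² for two equally sized arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "r2": float(1.0 - ss_res / ss_tot),
    }


# Synthetic elasticities, for illustration only
y_true_demo = np.array([-2.0, -1.5, -1.0, -0.5])

# Baseline: always predict the mean of the observed values
baseline_pred = np.full_like(y_true_demo, y_true_demo.mean())
baseline_metrics = regression_metrics(y_true_demo, baseline_pred)
print(baseline_metrics)
```

In the notebook, the same idea would apply with `y_test` and a constant prediction of `y_train.mean()`, in which case R² can be slightly below 0.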

In [None]:
# Visualize predictions vs actual values
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(y_test, y_pred, alpha=0.5, edgecolors='none')
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
ax.set_xlabel('Actual Price Elasticity')
ax.set_ylabel('Predicted Price Elasticity')
ax.set_title(f'Price Elasticity: Predicted vs Actual (R² = {r2:.3f})')
plt.tight_layout()
plt.show()

## 6. Uncertainty Quantification for Price Elasticity

TabPFN can return quantile predictions, which can be combined into prediction intervals. These intervals indicate how confident the model is in each elasticity estimate.

In [None]:
# Get predictions with uncertainty (quantiles)
# Predict 5th, 50th (median), and 95th percentiles for 90% prediction interval
y_lower = reg.predict(X_test, output_type="quantiles", quantiles=[0.05]).flatten()
y_median = reg.predict(X_test, output_type="quantiles", quantiles=[0.5]).flatten()
y_upper = reg.predict(X_test, output_type="quantiles", quantiles=[0.95]).flatten()

# Calculate coverage (what percentage of true values fall within the prediction interval)
coverage = np.mean((y_test >= y_lower) & (y_test <= y_upper))
print(f"90% Prediction Interval Coverage: {coverage:.1%}")
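
Coverage alone does not say whether the intervals are usefully narrow. The sketch below uses synthetic data and hypothetical helper names (none of them are part of `tabpfn_client`) to show how coverage, average interval width, and the pinball (quantile) loss could be computed with NumPy for any pair of lower and upper quantile arrays, such as `y_lower` and `y_upper`.

```python
import numpy as np


def interval_coverage(y_true, lower, upper) -> float:
    """Share of true values that fall inside [lower, upper]."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


def mean_interval_width(lower, upper) -> float:
    """Average width of the interval; narrower is sharper."""
    return float(np.mean(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)))


def pinball_loss(y_true, y_quantile, q: float) -> float:
    """Quantile loss for a predicted q-quantile; lower is better."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_quantile, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


# Synthetic example: values drawn from N(-1.5, 0.5) and the matching
# theoretical 5% / 95% quantiles used as a constant interval.
rng = np.random.default_rng(0)
y_true_demo = rng.normal(loc=-1.5, scale=0.5, size=5000)
lower_demo = np.full_like(y_true_demo, -1.5 - 1.645 * 0.5)
upper_demo = np.full_like(y_true_demo, -1.5 + 1.645 * 0.5)

coverage_demo = interval_coverage(y_true_demo, lower_demo, upper_demo)
width_demo = mean_interval_width(lower_demo, upper_demo)
print(f"coverage={coverage_demo:.3f}, mean width={width_demo:.3f}")
```

A well-calibrated 90% interval should show coverage close to 0.90; among models with similar coverage, the one with the smaller mean width or pinball loss is generally preferable.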

In [None]:
# Visualize predictions with uncertainty
# Sort by predicted value for better visualization
sort_idx = np.argsort(y_median)
n_show = min(50, len(y_median))  # Show the 50 lowest predictions for clarity

fig, ax = plt.subplots(figsize=(12, 6))
x_range = np.arange(n_show)

ax.fill_between(x_range, 
                y_lower[sort_idx[:n_show]], 
                y_upper[sort_idx[:n_show]], 
                alpha=0.3, color='blue', label='90% Prediction Interval')
ax.plot(x_range, y_median[sort_idx[:n_show]], 'b-', linewidth=2, label='Predicted (median)')
ax.scatter(x_range, y_test[sort_idx[:n_show]], color='red', s=20, label='Actual', zorder=5)

ax.set_xlabel('Sample Index (sorted by prediction)')
ax.set_ylabel('Price Elasticity')
ax.set_title('Price Elasticity Prediction with Uncertainty Quantification')
ax.legend()
plt.tight_layout()
plt.show()

## 7. Promotion Lift Prediction

**Business Context:** Trade promotion managers need to predict the incremental sales lift from promotions to:
- Optimize promotion ROI
- Plan inventory for promotional periods
- Negotiate trade spend with retailers

**Promotion Lift:** The % increase in sales during a promotion compared to baseline.
- Lift of 100% means: Sales double during the promotion
- Higher lift = more effective promotion
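
The definition above amounts to simple arithmetic, sketched here with hypothetical unit counts. The function names are illustrative assumptions and do not come from the dataset or any library.

```python
def promotion_lift_pct(baseline_units: float, promo_units: float) -> float:
    """Lift (%) = (promo units - baseline units) / baseline units * 100."""
    return (promo_units - baseline_units) / baseline_units * 100.0


def promo_units_from_lift(baseline_units: float, lift_pct: float) -> float:
    """Invert the definition: expected promo units for a given lift."""
    return baseline_units * (1 + lift_pct / 100.0)


# Hypothetical product selling 200 units in a normal week
lift_demo = promotion_lift_pct(baseline_units=200, promo_units=400)
print(lift_demo)  # 100.0, i.e. sales double

# A predicted lift of 50% implies 300 units during the promotion
units_demo = promo_units_from_lift(baseline_units=200, lift_pct=50)
print(units_demo)  # 300.0
```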

In [None]:
# Load Promotion Lift dataset from Delta table
df_promo = spark.table("promotion_lift").toPandas()

print(f"Dataset shape: {df_promo.shape}")
print(f"\nTarget (promotion_lift_pct) statistics:")
print(df_promo['promotion_lift_pct'].describe())
print(f"\nPromotion type distribution:")
print(df_promo['promotion_type'].value_counts())

In [None]:
# Visualize promotion lift by type
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Box plot by promotion type
df_promo.boxplot(column='promotion_lift_pct', by='promotion_type', ax=axes[0])
axes[0].set_title('Promotion Lift by Promotion Type')
axes[0].set_xlabel('Promotion Type')
axes[0].set_ylabel('Lift (%)')
plt.suptitle('')

# Scatter: Discount Depth vs Lift
axes[1].scatter(df_promo['discount_depth_pct'] * 100, df_promo['promotion_lift_pct'], alpha=0.3)
axes[1].set_xlabel('Discount Depth (%)')
axes[1].set_ylabel('Promotion Lift (%)')
axes[1].set_title('Promotion Lift vs Discount Depth')

plt.tight_layout()
plt.show()

In [None]:
# Prepare features - encode categorical columns
df_promo_encoded = pd.get_dummies(df_promo, 
                                   columns=['promotion_type', 'category'], 
                                   drop_first=True)

# Separate features and target
promo_feature_cols = [col for col in df_promo_encoded.columns if col != 'promotion_lift_pct']
X_promo = df_promo_encoded[promo_feature_cols].values
y_promo = df_promo['promotion_lift_pct'].values

print(f"Feature matrix shape: {X_promo.shape}")

# Sample and split
np.random.seed(42)
sample_size = min(2000, len(X_promo))
sample_idx = np.random.choice(len(X_promo), size=sample_size, replace=False)
X_promo_sample = X_promo[sample_idx]
y_promo_sample = y_promo[sample_idx]

X_train_p, X_test_p, y_train_p, y_test_p = train_test_split(
    X_promo_sample, y_promo_sample, test_size=0.2, random_state=42
)

print(f"Training: {len(X_train_p)}, Test: {len(X_test_p)}")

In [None]:
# Train TabPFN regressor for promotion lift
reg_promo = TabPFNRegressor()
reg_promo.fit(X_train_p, y_train_p)

# Make predictions
y_pred_promo = reg_promo.predict(X_test_p)

# Evaluate performance
rmse_promo = np.sqrt(mean_squared_error(y_test_p, y_pred_promo))
mae_promo = mean_absolute_error(y_test_p, y_pred_promo)
r2_promo = r2_score(y_test_p, y_pred_promo)

print(f"TabPFN Promotion Lift Prediction Results:")
print(f"  RMSE: {rmse_promo:.2f}%")
print(f"  MAE:  {mae_promo:.2f}%")
print(f"  R²:   {r2_promo:.4f}")

In [None]:
# Visualize predictions vs actual
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(y_test_p, y_pred_promo, alpha=0.5, edgecolors='none')
ax.plot([y_test_p.min(), y_test_p.max()], [y_test_p.min(), y_test_p.max()], 'r--', lw=2)
ax.set_xlabel('Actual Promotion Lift (%)')
ax.set_ylabel('Predicted Promotion Lift (%)')
ax.set_title(f'Promotion Lift: Predicted vs Actual (R² = {r2_promo:.3f})')
plt.tight_layout()
plt.show()

## 8. Model Comparison

Let's compare TabPFN with other popular regression models on the promotion lift use case.

In [None]:
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge

# Define models
models = {
    "TabPFN": TabPFNRegressor(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=100, random_state=42),
    "Ridge Regression": Ridge(alpha=1.0),
}

# Evaluate each model on promotion lift prediction
print("Promotion Lift Prediction - Model Comparison:")
print("="*60)
results_promo = {}
for name, model in models.items():
    model.fit(X_train_p, y_train_p)
    y_pred_model = model.predict(X_test_p)
    
    results_promo[name] = {
        "RMSE": np.sqrt(mean_squared_error(y_test_p, y_pred_model)),
        "MAE": mean_absolute_error(y_test_p, y_pred_model),
        "R²": r2_score(y_test_p, y_pred_model)
    }
    print(f"{name:20s}: RMSE = {results_promo[name]['RMSE']:.2f}%, R² = {results_promo[name]['R²']:.4f}")

In [None]:
# Visualize comparison
df_results = pd.DataFrame(results_promo).T

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

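# RMSE comparison (lower is better)
# Build the colors from the sorted index so the TabPFN highlight lands on the right bar
rmse_sorted = df_results['RMSE'].sort_values(ascending=False)
rmse_colors = ['#2ecc71' if name == 'TabPFN' else '#3498db' for name in rmse_sorted.index]
rmse_sorted.plot(kind='barh', ax=axes[0], color=rmse_colors)
axes[0].set_xlabel('RMSE (%) - Lower is better')
axes[0].set_title('Model Comparison - RMSE')

# R² comparison (higher is better)
r2_sorted = df_results['R²'].sort_values()
r2_colors = ['#2ecc71' if name == 'TabPFN' else '#3498db' for name in r2_sorted.index]
r2_sorted.plot(kind='barh', ax=axes[1], color=r2_colors)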
axes[1].set_xlabel('R² - Higher is better')
axes[1].set_title('Model Comparison - R²')

plt.tight_layout()
plt.show()

## 9. Business Application: Promotion ROI Calculator

Let's demonstrate how the promotion lift predictions can be used for promotion planning.

In [None]:
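# Create sample promotion scenarios from the test rows.
# train_test_split shuffles, so re-split the sampled indices with the same
# test_size and random_state to recover the rows behind X_test_p, in order.
_, test_idx = train_test_split(sample_idx, test_size=0.2, random_state=42)
df_test_promo = df_promo.iloc[test_idx].copy().reset_index(drop=True)
df_test_promo['predicted_lift_pct'] = y_pred_promo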

# Calculate ROI for each promotion
# Assumptions:
# - Average margin: 30%
# - Promo cost = discount_depth * promo_revenue + fixed_cost

avg_margin = 0.30
fixed_promo_cost = 500  # Display, feature ad costs

df_test_promo['baseline_revenue'] = df_test_promo['baseline_weekly_units'] * df_test_promo['base_price_usd']
df_test_promo['predicted_promo_units'] = df_test_promo['baseline_weekly_units'] * (1 + df_test_promo['predicted_lift_pct']/100)
df_test_promo['promo_revenue'] = df_test_promo['predicted_promo_units'] * df_test_promo['base_price_usd'] * (1 - df_test_promo['discount_depth_pct'])
df_test_promo['incremental_revenue'] = df_test_promo['promo_revenue'] - df_test_promo['baseline_revenue']
df_test_promo['promo_cost'] = (df_test_promo['discount_depth_pct'] * df_test_promo['promo_revenue']) + fixed_promo_cost
df_test_promo['incremental_profit'] = df_test_promo['incremental_revenue'] * avg_margin - df_test_promo['promo_cost']
df_test_promo['roi_pct'] = (df_test_promo['incremental_profit'] / df_test_promo['promo_cost']) * 100

print("Top 10 Highest ROI Promotions:")
top_roi = df_test_promo.nlargest(10, 'roi_pct')[[
    'promotion_type', 'category', 'discount_depth_pct', 
    'predicted_lift_pct', 'incremental_profit', 'roi_pct'
]]
top_roi['discount_depth_pct'] = (top_roi['discount_depth_pct'] * 100).round(1).astype(str) + '%'
top_roi['predicted_lift_pct'] = top_roi['predicted_lift_pct'].round(1).astype(str) + '%'
top_roi['roi_pct'] = top_roi['roi_pct'].round(1).astype(str) + '%'
top_roi['incremental_profit'] = '$' + top_roi['incremental_profit'].round(0).astype(int).astype(str)
display(top_roi)
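
To make the arithmetic in the cell above easier to audit, here is a sketch that applies the same formulas to a single hypothetical promotion in plain Python. The `PromoScenario` class, the `promo_roi` function, and all input numbers are made up for illustration; the 30% margin and the fixed cost of 500 are the assumptions already stated in the cell.

```python
from dataclasses import dataclass


@dataclass
class PromoScenario:
    baseline_weekly_units: float
    base_price_usd: float
    discount_depth: float       # fraction, e.g. 0.20 for 20% off
    predicted_lift_pct: float   # e.g. 80.0 for +80% units


def promo_roi(s: PromoScenario, avg_margin: float = 0.30,
              fixed_promo_cost: float = 500.0) -> dict:
    """Mirror the notebook's ROI formulas for one scenario."""
    baseline_revenue = s.baseline_weekly_units * s.base_price_usd
    promo_units = s.baseline_weekly_units * (1 + s.predicted_lift_pct / 100)
    promo_revenue = promo_units * s.base_price_usd * (1 - s.discount_depth)
    incremental_revenue = promo_revenue - baseline_revenue
    promo_cost = s.discount_depth * promo_revenue + fixed_promo_cost
    incremental_profit = incremental_revenue * avg_margin - promo_cost
    return {
        "promo_revenue": promo_revenue,
        "incremental_revenue": incremental_revenue,
        "promo_cost": promo_cost,
        "incremental_profit": incremental_profit,
        "roi_pct": incremental_profit / promo_cost * 100,
    }


# Hypothetical scenario: 1000 units/week at $5.00, 20% discount, +80% predicted lift
example = PromoScenario(baseline_weekly_units=1000, base_price_usd=5.0,
                        discount_depth=0.20, predicted_lift_pct=80.0)
result = promo_roi(example)
print({name: round(value, 1) for name, value in result.items()})
```

In this example the promotion loses money under the stated assumptions even with an 80% lift, which is why ranking scenarios by `roi_pct` rather than by lift alone is useful.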

In [None]:
# Summary statistics by promotion type
print("\nAverage Predicted Lift and ROI by Promotion Type:")
summary = df_test_promo.groupby('promotion_type').agg({
    'predicted_lift_pct': 'mean',
    'roi_pct': 'mean',
    'promotion_type': 'count'
}).rename(columns={'promotion_type': 'count'})
summary = summary.sort_values('roi_pct', ascending=False)
display(summary.round(1))

## Summary

In this notebook, we demonstrated:

- ✅ **Price Elasticity Prediction** - Predict price sensitivity by product and market
- ✅ **Promotion Lift Prediction** - Forecast incremental sales from promotions
- ✅ **Uncertainty Quantification** - Get prediction intervals for risk assessment
- ✅ **Model Comparison** - TabPFN vs. traditional regression algorithms
- ✅ **Business Application** - Promotion ROI calculation and planning

**Key Takeaways:**
1. TabPFN provides competitive regression performance without hyperparameter tuning
2. Built-in uncertainty quantification enables risk-aware decision making
3. Predictions can be directly integrated into pricing and promotion planning workflows

**Next Steps:**
- Run `03_outlier_detection` notebook for production anomaly detection
- Run `04_time_series_forecasting` notebook for demand forecasting
- Integrate predictions into demand planning systems