<a href="https://colab.research.google.com/github/john-d-noble/callcenter/blob/main/CB_Step_2_Baseline_Models_(Simple_Benchmarks).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error
from sklearn.model_selection import TimeSeriesSplit

# Load the updated dataset
df = pd.read_csv('updated_final_merged_data.csv', index_col='Date', parse_dates=True)

# Assume 'Calls' is the target column
target = 'Calls'

# Prepare data: Sort by date if not already
df = df.sort_index()

# Forecast horizon (e.g., 7 days for weekly). Note: not used below; each
# TimeSeriesSplit test fold spans len(df) // (n_splits + 1) rows instead.
horizon = 7

# Time series cross-validation: 5 splits
tscv = TimeSeriesSplit(n_splits=5)

# Function to calculate metrics
def calculate_metrics(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mape = mean_absolute_percentage_error(y_true, y_pred) * 100  # As percentage
    return {'MAE': mae, 'RMSE': rmse, 'MAPE': mape}

# Dictionary to store average metrics for each model
model_metrics = {}

# 1. Naive Forecast (Last observed value)
naive_preds = []
naive_trues = []
for train_idx, test_idx in tscv.split(df):
    train = df.iloc[train_idx]
    test = df.iloc[test_idx]

    # Predict the last training value for every point in the test fold (a multi-step forecast spanning the whole fold)
    last_value = train[target].iloc[-1]
    pred = np.full(len(test), last_value)

    naive_preds.extend(pred)
    naive_trues.extend(test[target])

naive_metrics = calculate_metrics(naive_trues, naive_preds)
model_metrics['Naive'] = naive_metrics

# 2. Mean Forecast (overall mean)
# Caution: the mean is computed on the full series (including the CV test
# periods) and scored in-sample over every row, so these metrics are
# optimistic and not directly comparable to the cross-validated baselines.
mean_value = df[target].mean()  # Global mean
mean_preds = np.full(len(df), mean_value)
mean_metrics = calculate_metrics(df[target], mean_preds)
model_metrics['Mean'] = mean_metrics

# 2b. Median Forecast (overall median); same in-sample caveat as the mean
median_value = df[target].median()
median_preds = np.full(len(df), median_value)
median_metrics = calculate_metrics(df[target], median_preds)
model_metrics['Median'] = median_metrics

# 3. Seasonal Naive (Same day last week, lag=7)
seasonal_preds = []
seasonal_trues = []
for train_idx, test_idx in tscv.split(df):
    train = df.iloc[train_idx]
    test = df.iloc[test_idx]

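    # For each test point, predict the actual value from 7 days earlier.
    # The lagged value can lie inside the test fold, so each prediction only
    # needs data up to 7 days before its target date (unlike Naive above).
    pred = []
    for i in test_idx:
        lag_idx = i - 7
        if lag_idx >= 0:
            pred.append(df[target].iloc[lag_idx])
        else:
            # Fallback: only reached if a test fold starts within the first 7 rows
            pred.append(train[target].mean())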

    seasonal_preds.extend(pred)
    seasonal_trues.extend(test[target])

seasonal_metrics = calculate_metrics(seasonal_trues, seasonal_preds)
model_metrics['Seasonal Naive'] = seasonal_metrics

# Summarize performance
print("\nModel Performance Summary:")
metrics_df = pd.DataFrame(model_metrics).T
print(metrics_df)

# Pick winner: Lowest MAE (primary metric)
winner = metrics_df['MAE'].idxmin()
print(f"\nChampion Baseline Model: {winner}")
print(f"Metrics: {metrics_df.loc[winner].to_dict()}")



Model Performance Summary:
                        MAE         RMSE       MAPE
Naive           2351.456790  2942.377655  24.836730
Mean            1634.558511  2154.487520  18.230934
Median          1613.907787  2177.893679  17.377252
Seasonal Naive   907.700000  1359.046947   9.665718

Champion Baseline Model: Seasonal Naive
Metrics: {'MAE': 907.7, 'RMSE': 1359.0469468357978, 'MAPE': 9.665718451288303}
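The cell above defines `horizon = 7` but never uses it, and the Naive forecast (one value held across a whole test fold) and the Seasonal Naive forecast (lagged actuals, possibly from inside the fold) are scored at different effective horizons. A minimal sketch of a like-for-like alternative, assuming we want both baselines forecast from the same origins over exactly `horizon` days using training data only; the helper names `horizon_forecast` and `rolling_origin_mae` and the default of 10 origins are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def horizon_forecast(y_train, horizon, season_length):
    """Forecast `horizon` steps past the end of y_train using only y_train.

    season_length=1 gives the Naive forecast (repeat the last value);
    season_length=7 gives the Seasonal Naive (repeat the last observed week).
    """
    history = np.asarray(y_train, dtype=float)
    if len(history) < season_length:
        raise ValueError("need at least one full season of training data")
    steps = np.arange(horizon)
    # Each future step copies the matching position in the last full season
    idx = len(history) - season_length + (steps % season_length)
    return history[idx]


def rolling_origin_mae(y, season_length, horizon=7, n_splits=10):
    """MAE over n_splits forecast origins, each scored on exactly `horizon` points."""
    y = np.asarray(y, dtype=float)
    tscv = TimeSeriesSplit(n_splits=n_splits, test_size=horizon)
    abs_errors = []
    for train_idx, test_idx in tscv.split(y):
        preds = horizon_forecast(y[train_idx], horizon, season_length)
        abs_errors.append(np.abs(y[test_idx] - preds))
    return float(np.mean(np.concatenate(abs_errors)))


# Synthetic demo (not the call-center data): a perfectly weekly series
y_demo = pd.Series(1000.0 + 200.0 * (np.arange(120) % 7))
naive_mae = rolling_origin_mae(y_demo, season_length=1)
seasonal_mae = rolling_origin_mae(y_demo, season_length=7)
print(naive_mae, seasonal_mae)
```

On the notebook's data this would be called as `rolling_origin_mae(df[target], season_length=7, horizon=horizon)`; no results are claimed here because the sketch has not been run on that dataset.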




Summary: The baseline models evaluated here provide a foundational benchmark for forecasting call center volume and a yardstick for the more sophisticated approaches that follow. Four simple methods were tested on the filled dataset: Naive (repeating the last observed value), Mean (using the overall average), Median (using the overall median), and Seasonal Naive (repeating the value from the same day last week). Naive and Seasonal Naive were scored with 5-fold time-series cross-validation, while Mean and Median were computed on the full series and scored in-sample, so their errors are optimistic. Performance was measured by Mean Absolute Error (MAE, the average deviation in calls), Root Mean Squared Error (RMSE, which penalizes large errors more heavily), and Mean Absolute Percentage Error (MAPE, relative accuracy).
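To make the three metrics concrete, here is a small worked example on made-up numbers that computes each one directly from its formula and checks it against the scikit-learn functions the notebook imports. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction (hence the `* 100` in `calculate_metrics`) and guards against division by zero with a tiny epsilon, so zero actuals produce very large MAPE values rather than an error.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error

# Toy values for illustration only (not call-center data)
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

errors = y_true - y_pred                               # [-10, 20, 0]
mae = np.mean(np.abs(errors))                          # (10 + 20 + 0) / 3 = 10.0
rmse = np.sqrt(np.mean(errors ** 2))                   # sqrt((100 + 400 + 0) / 3) ≈ 12.91
mape = np.mean(np.abs(errors) / np.abs(y_true)) * 100  # (0.10 + 0.10 + 0) / 3 * 100 ≈ 6.67

# The hand-rolled values match scikit-learn (its MAPE is a fraction, not a percentage)
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(mape, mean_absolute_percentage_error(y_true, y_pred) * 100)
print(round(mae, 2), round(rmse, 2), round(mape, 2))
```

RMSE exceeds MAE here because the single 20-call miss is squared before averaging, which is why RMSE is described as emphasizing larger errors.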
The Naive model performed the worst, with an MAE of about 2,351 calls, RMSE of 2,942, and MAPE of 25%. In this setup it repeats the last training value across the entire test fold (len(df) // 6 rows with TimeSeriesSplit's default settings), so its error reflects how far call volume drifts from a single starting value over many weeks rather than day-to-day persistence. The Mean and Median models did better, with MAEs of 1,635 and 1,614 calls and MAPEs of about 18% and 17%, but they were fit on the whole dataset, including the periods they are scored on. The Median's lower MAE and the Mean's lower RMSE (2,154 vs. 2,178) are largely expected by construction: in-sample, the median minimizes absolute error and the mean minimizes squared error. The gap therefore says little about skew on its own and is at most weakly consistent with the skew hinted at in the EDA's histogram.
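Because the Mean and Median values are fit on the full series, a leak-free variant would recompute them on each training fold and score them only on the matching test fold, as the Naive loop does. A minimal sketch, assuming the same `TimeSeriesSplit(n_splits=5)` setup; the function name `fold_constant_forecast` is hypothetical and the 12-point toy series is illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def fold_constant_forecast(y: pd.Series, stat: str = "mean", n_splits: int = 5):
    """Return (y_true, y_pred) for a constant forecast fit on each training fold only."""
    if stat not in ("mean", "median"):
        raise ValueError("stat must be 'mean' or 'median'")
    tscv = TimeSeriesSplit(n_splits=n_splits)
    trues, preds = [], []
    for train_idx, test_idx in tscv.split(y):
        train = y.iloc[train_idx]
        # The constant comes from the training fold alone, so no test data leaks in
        value = train.mean() if stat == "mean" else train.median()
        trues.append(y.iloc[test_idx].to_numpy())
        preds.append(np.full(len(test_idx), value))
    return np.concatenate(trues), np.concatenate(preds)


# Toy demo: 12 points -> 5 folds with 2 test points each
y_demo = pd.Series(np.arange(12, dtype=float))
y_true_demo, y_pred_demo = fold_constant_forecast(y_demo, "mean")
print(y_pred_demo)
```

In the notebook, `y_true, y_pred = fold_constant_forecast(df[target], "median")` followed by `calculate_metrics(y_true, y_pred)` would give cross-validated figures comparable to the Naive row, which may differ from the in-sample numbers in the summary table.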
The Seasonal Naive model is the clear champion, with the lowest errors on every metric: MAE of 908 calls, RMSE of 1,359, and MAPE under 10%. This supports the strong weekly seasonality identified in the EDA (e.g., via decomposition and day-of-week averages), where patterns repeat every 7 days with the business cycle, even after filling weekends and holidays. Relative to the Median, the next-best baseline by MAE, it cuts MAE by about 44% (908 vs. 1,614), MAPE by about 44% (9.7% vs. 17.4%), and RMSE by about 38%. One caveat: each Seasonal Naive prediction uses the actual value from 7 days earlier, which can fall inside the test fold, whereas the Naive forecast holds a single value for the whole fold, so the two are not scored at the same effective horizon. Even so, incorporating basic seasonality clearly yields substantial gains.
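The error reductions versus the Median can be checked directly from the summary table. This sketch re-enters the Median and Seasonal Naive rows by hand (values copied from the printed output) and computes the relative reduction for each metric.

```python
import pandas as pd

# Rows copied from the "Model Performance Summary" output
metrics_df = pd.DataFrame(
    {
        "MAE": [1613.907787, 907.700000],
        "RMSE": [2177.893679, 1359.046947],
        "MAPE": [17.377252, 9.665718],
    },
    index=["Median", "Seasonal Naive"],
)

# Percentage reduction of each metric when moving from Median to Seasonal Naive
reduction = (1 - metrics_df.loc["Seasonal Naive"] / metrics_df.loc["Median"]) * 100
print(reduction.round(1))
```

MAE and MAPE both drop by roughly 44%, while RMSE drops by roughly 38%, so an "over 40%" claim holds for MAE and MAPE but not for RMSE.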
Overall, these results confirm the EDA's finding that weekly seasonality is a key driver and set a benchmark of roughly 10% MAPE (about 908 calls MAE) for advanced models such as SARIMA or Prophet to beat, ideally scored on the same cross-validation folds and forecast horizon. If they cannot beat it, the added complexity is probably not justified for this dataset; if they can, the Seasonal Naive result provides a concrete margin against which to measure the improvement in subsequent modeling tiers.
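One scale-free way to express "beats the Seasonal Naive" for later models is the mean absolute scaled error (MASE), which divides a model's MAE by the in-sample MAE of a seasonal naive forecast on the training data; values below 1 mean the model's errors are smaller than that benchmark's. MASE is not used in the original notebook. The sketch below assumes a 7-day season, and the toy numbers are for illustration only.

```python
import numpy as np


def mase(y_true, y_pred, y_train, season_length=7):
    """Mean absolute scaled error with a seasonal naive scale.

    The denominator is the mean |y_t - y_{t-m}| over the training data,
    i.e. the in-sample MAE of a seasonal naive forecast with season m.
    """
    y_train = np.asarray(y_train, dtype=float)
    if len(y_train) <= season_length:
        raise ValueError("y_train must be longer than season_length")
    scale = np.mean(np.abs(y_train[season_length:] - y_train[:-season_length]))
    if scale == 0:
        raise ValueError("seasonal naive scale is zero; MASE is undefined")
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(errors)) / scale)


# Toy example: the second week equals the first week plus 2, so the scale is 2
y_train_demo = [10, 20, 30, 40, 50, 60, 70, 12, 22, 32, 42, 52, 62, 72]
score = mase(y_true=[14, 24], y_pred=[10, 24], y_train=y_train_demo)
print(score)  # 1.0 (MAE of 2 divided by a scale of 2)
```

A candidate model would be scored with the same training window as the benchmark, so a MASE of, say, 0.8 reads as "20% smaller absolute error than in-sample seasonal naive".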