<a href="https://colab.research.google.com/github/john-d-noble/callcenter/blob/main/CB_Step_3_Classical_Time_Series_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [13]:
!pip install numpy pmdarima prophet statsmodels


In [2]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error
from sklearn.model_selection import TimeSeriesSplit
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Load the updated dataset
df = pd.read_csv('updated_final_merged_data.csv', index_col='Date', parse_dates=True)

# Assume 'Calls' is the target column
target = 'Calls'

# Prepare data: Sort by date if not already
df = df.sort_index()

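# Forecast horizon (7 days for weekly planning).
# Note: this value is not passed to the splitter below, so it does not
# affect the evaluation as written.
horizon = 7

# Time series cross-validation: 5 expanding-window splits.
# With the default test_size, each test fold holds n_samples // (n_splits + 1)
# rows, and the loops below forecast each whole fold in one multi-step pass.
tscv = TimeSeriesSplit(n_splits=5)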

# Function to calculate metrics
def calculate_metrics(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mape = mean_absolute_percentage_error(y_true, y_pred) * 100  # As percentage
    return {'MAE': mae, 'RMSE': rmse, 'MAPE': mape}

# Dictionary to store average metrics for each model
model_metrics = {}

# 1. ARIMA (using SARIMAX with fixed order (1,1,1), no seasonality)
arima_preds = []
arima_trues = []
for train_idx, test_idx in tscv.split(df):
    train = df.iloc[train_idx][target]
    test = df.iloc[test_idx][target]

    # Fit ARIMA (1,1,1)
    model = SARIMAX(train, order=(1,1,1))
    fit = model.fit(disp=False)

    # Forecast
    pred = fit.forecast(steps=len(test))

    arima_preds.extend(pred)
    arima_trues.extend(test)

arima_metrics = calculate_metrics(arima_trues, arima_preds)
model_metrics['ARIMA'] = arima_metrics

# 2. SARIMA (using SARIMAX with order (1,1,1) and seasonal_order (1,1,1,7))
sarima_preds = []
sarima_trues = []
for train_idx, test_idx in tscv.split(df):
    train = df.iloc[train_idx][target]
    test = df.iloc[test_idx][target]

    # Fit SARIMA (1,1,1)(1,1,1)[7]
    model = SARIMAX(train, order=(1,1,1), seasonal_order=(1,1,1,7))
    fit = model.fit(disp=False)

    # Forecast
    pred = fit.forecast(steps=len(test))

    sarima_preds.extend(pred)
    sarima_trues.extend(test)

sarima_metrics = calculate_metrics(sarima_trues, sarima_preds)
model_metrics['SARIMA'] = sarima_metrics

# 3. Exponential Smoothing (Holt-Winters, additive seasonality)
ets_preds = []
ets_trues = []
for train_idx, test_idx in tscv.split(df):
    train = df.iloc[train_idx][target]
    test = df.iloc[test_idx][target]

    # Fit ETS with additive trend and seasonality (period=7)
    model = ExponentialSmoothing(train, trend='add', seasonal='add', seasonal_periods=7)
    fit = model.fit(optimized=True)

    # Forecast
    pred = fit.forecast(steps=len(test))

    ets_preds.extend(pred)
    ets_trues.extend(test)

ets_metrics = calculate_metrics(ets_trues, ets_preds)
model_metrics['ETS'] = ets_metrics

# Summarize performance
print("\nModel Performance Summary:")
metrics_df = pd.DataFrame(model_metrics).T
print(metrics_df)

# Pick winner: Lowest MAE (primary metric)
winner = metrics_df['MAE'].idxmin()
print(f"\nChampion Classical Model: {winner}")
print(f"Metrics: {metrics_df.loc[winner].to_dict()}")

  df = pd.read_csv('updated_final_merged_data.csv', index_col='Date', parse_dates=True)
  self._init_dates(dates, freq)
  warn('Non-invertible starting MA parameters found.'



Model Performance Summary:
                MAE         RMSE       MAPE
ARIMA   2268.081615  2860.607690  24.432655
SARIMA  2560.832224  3163.069470  28.560471
ETS     2233.644482  2882.916658  22.573863

Champion Classical Model: ETS
Metrics: {'MAE': 2233.6444824990926, 'RMSE': 2882.9166581078257, 'MAPE': 22.573863404704717}


### Combined Performance Summary: Baseline vs. Classical Models

To give a complete picture, the tables below report performance for both the baseline models and the classical time series models. All were evaluated with the same three metrics, using time-series cross-validation on the filled dataset: Mean Absolute Error (MAE, in calls), Root Mean Squared Error (RMSE, in calls, which penalizes large errors more heavily), and Mean Absolute Percentage Error (MAPE, a relative measure in percent). The baseline champion (Seasonal Naive) sets a strong benchmark; the classical models aim to improve on it by modeling trend and seasonality explicitly.
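
For reference, the three metrics can be written out directly. The sketch below is a minimal NumPy version run on made-up numbers (`y_true` and `y_pred` are illustrative and not taken from the dataset); the notebook itself relies on the scikit-learn implementations, which should agree with these formulas whenever no actual value is zero.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean of absolute errors, in the same unit as the target (calls)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    # Square root of the mean squared error; large misses weigh more
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    # Mean absolute error relative to the actual value, in percent.
    # Undefined when any actual value is zero.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Illustrative values only
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Because RMSE squares each error before averaging, a model can rank first on MAE and still rank second on RMSE if it makes a few large misses.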

#### Baseline Models Performance
| Model          | MAE (calls) | RMSE (calls) | MAPE   |
|----------------|-------------|--------------|--------|
| Naive          | 2351.46     | 2942.38      | 24.84% |
| Mean           | 1634.56     | 2154.49      | 18.23% |
| Median         | 1613.91     | 2177.89      | 17.38% |
| Seasonal Naive | 907.70      | 1359.05      | 9.67%  |

**Baseline Champion**: Seasonal Naive (lowest MAE of 908 calls and MAPE under 10%, exploiting the strong weekly seasonality identified in the EDA).

#### Classical Models Performance
| Model  | MAE (calls) | RMSE (calls) | MAPE   |
|--------|-------------|--------------|--------|
| ARIMA  | 2268.08     | 2860.61      | 24.43% |
| SARIMA | 2560.83     | 3163.07      | 28.56% |
| ETS    | 2233.64     | 2882.92      | 22.57% |

**Classical Champion**: ETS (lowest MAE of 2,234 calls and lowest MAPE of 23%; ARIMA has a marginally lower RMSE, 2,861 vs. 2,883). ETS outperforms ARIMA and SARIMA on the primary metric but trails every baseline except Naive.

### Full Narrative Analysis
The baseline models serve as simple yet effective benchmarks, capturing the main patterns in the call volume data without any parameter tuning. The Naive approach, which repeats the last observed value, struggles with daily variability (MAE ~2,351, MAPE 25%), while the Mean and Median use central tendency for moderate improvements (MAEs of roughly 1,614 to 1,635, MAPEs of 17-18%). Seasonal Naive dominates the baselines by exploiting the weekly cycle identified in the EDA, reaching an MAE of 908 and a MAPE under 10%. Simply repeating the value from the same day of the previous week matches the dataset's recurring 7-day pattern well, even after weekends and holidays were imputed.
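
The seasonal naive rule described above can be sketched in a few lines. This is an illustrative version, not the code from the baseline step: the function name `seasonal_naive_forecast` and the toy `history` array are assumptions made for the example.

```python
import numpy as np

def seasonal_naive_forecast(history, steps, season_length=7):
    """Repeat the last full season of `history` to cover `steps` periods."""
    history = np.asarray(history, dtype=float)
    if len(history) < season_length:
        raise ValueError("history must contain at least one full season")
    last_season = history[-season_length:]
    repeats = -(-steps // season_length)  # ceiling division
    return np.tile(last_season, repeats)[:steps]

# Two toy weeks of daily values (hypothetical, not from the dataset)
history = np.array([10, 20, 30, 40, 50, 5, 5,
                    12, 22, 32, 42, 52, 6, 6], dtype=float)

# A 9-day forecast repeats the most recent week, then wraps around
forecast = seasonal_naive_forecast(history, steps=9)
print(forecast)
```

For horizons longer than one week the forecast keeps cycling through the same seven values, so the method never drifts away from the most recent weekly profile.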

Turning to the classical models, which add differencing for stationarity (per the EDA's ADF test) and explicit seasonality, the results are mixed. ARIMA handles non-seasonal dynamics adequately but ignores weekly effects, giving an MAE of 2,268 and a MAPE of 24%: slightly better than Naive but worse than the Mean, Median, and Seasonal Naive baselines. SARIMA, which models seasonality with a period of 7, surprisingly performs worst (MAE 2,561, MAPE 29%), even though the EDA's decomposition shows strong periodic components. Possible causes include overfitting to outliers or noise in the filled data. The evaluation setup may also contribute: each fold is forecast in a single multi-step pass over the entire test fold (`steps=len(test)`) rather than over a fixed 7-day horizon, and the orders are fixed rather than tuned. ETS (Holt-Winters) fares best among the classical models, with an MAE of 2,234 and a MAPE of 23%; its smoothed additive trend and seasonality appear to give some stability amid the rolling volatility seen in the EDA plots.
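
The loops in the code cell call `fit.forecast(steps=len(test))`, so the effective horizon is the length of each test fold. One way to evaluate on a fixed 7-day horizon instead is to build expanding-window folds whose test blocks are exactly one week long. The dependency-free sketch below illustrates the idea; the helper name `rolling_origin_splits` and the sample size of 100 are assumptions for the example. scikit-learn's `TimeSeriesSplit` also accepts a `test_size` argument for the same purpose. The results reported here were not produced this way.

```python
def rolling_origin_splits(n_samples, horizon=7, n_splits=5):
    """Yield (train_indices, test_indices) for expanding-window folds.

    Each test block has exactly `horizon` rows, and the last block
    ends at the final observation.
    """
    first_test_start = n_samples - n_splits * horizon
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * horizon
        yield range(0, test_start), range(test_start, test_start + horizon)

# Hypothetical series length, for illustration only
splits = list(rolling_origin_splits(n_samples=100, horizon=7, n_splits=5))

for train_idx, test_idx in splits:
    print(f"train: 0..{train_idx[-1]}, test: {test_idx[0]}..{test_idx[-1]}")
```

Each model would then be refit on `train_idx` and asked for `horizon` steps, which keeps the comparison with a weekly Seasonal Naive forecast on equal footing.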

Overall, the classical models underperform the baseline champion: ETS's MAE is roughly 2.5 times that of Seasonal Naive (2,234 vs. 908), which suggests that simple seasonal persistence captures the data's patterns better than these more sophisticated univariate methods as configured here. This reinforces the EDA's emphasis on dominant weekly seasonality and points to possible limitations such as sensitivity to imputed values or the absence of market covariates (e.g., VIX/CVOL). As a next step, we should prioritize multivariate extensions (e.g., adding regressors to Prophet or SARIMAX) or machine learning hybrids in the next tier, with the goal of beating the baseline's MAPE of about 10%. If those approaches do not beat it, the inexpensive Seasonal Naive remains a practical choice for call center forecasting.
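
The size of the gap can be checked with a quick calculation from the rounded values in the two tables above. The dictionary names are chosen for this illustration only.

```python
# Rounded metrics copied from the performance tables
seasonal_naive = {"MAE": 907.70, "RMSE": 1359.05, "MAPE": 9.67}
ets = {"MAE": 2233.64, "RMSE": 2882.92, "MAPE": 22.57}

# Ratio of the ETS error to the Seasonal Naive error for each metric
ratios = {metric: ets[metric] / seasonal_naive[metric] for metric in ets}

for metric, ratio in ratios.items():
    print(f"{metric}: ETS error is {ratio:.2f}x the Seasonal Naive error")
```

The ratio is largest for MAE and smallest for RMSE, so the gap is somewhat narrower on the metric that emphasizes large misses.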