<div style="display: flex; justify-content: flex-start; align-items: center;">
    <a href="https://colab.research.google.com/github/msfasha/307304-Data-Mining/blob/main/20251/Module%206-Time%20Series%20Analysis/sarima_real_data_bakery_sales.ipynb" target="_blank">    
        <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" style="height: 25px; margin-right: 20px;">
    </a>
</div>

# SARIMA Model on Real Data: French Bakery Sales

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">SARIMA: Seasonal ARIMA Modeling</h3>
</div>

This notebook demonstrates an end-to-end **SARIMA (Seasonal ARIMA)** workflow on a **real dataset**: daily sales from a French bakery.

**SARIMA** extends ARIMA to handle **seasonal patterns** in time series data, making it ideal for:
- Daily/weekly/monthly sales data
- Temperature and weather patterns
- Energy consumption
- Any data with repeating seasonal cycles

## What We'll Cover:
1. Loading and exploring real bakery sales data
2. Identifying seasonal patterns through decomposition
3. Stationarity testing and differencing
4. ACF/PACF analysis for seasonal components
5. SARIMA model selection and fitting
6. Forecasting and evaluation
7. Comparing SARIMA vs. non-seasonal ARIMA

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">1. Setup and Imports</h3>
</div>

In [None]:
# Install required packages if needed
# !pip install pandas numpy matplotlib statsmodels scikit-learn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

from scipy import stats
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_absolute_error, mean_squared_error

plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 4)

print("✓ Libraries imported successfully")

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">2. Load and Explore the Data</h3>
</div>

We're using daily sales data from a French bakery. The dataset contains sales for various products over approximately 21 months (2021-2022).

We'll focus on **Traditional Baguette** sales, a product whose daily totals show a clear weekly pattern.
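
A seasonal period of 7 only means "one week" if the series has exactly one observation per calendar day. Whether this dataset has gaps is not stated here, so the sketch below uses a small made-up series to show one way to check with `asfreq`. The toy values and the zero-fill treatment are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy series with one calendar day (2021-01-03) missing
dates = pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-04', '2021-01-05'])
toy = pd.Series([120.0, 95.0, 130.0, 110.0], index=dates)

# Reindex to a strict daily frequency; absent days become NaN
toy_daily = toy.asfreq('D')
missing_days = toy_daily[toy_daily.isna()].index

print(f"Missing days: {[d.date().isoformat() for d in missing_days]}")

# One possible treatment (an assumption): treat missing days as closed, i.e. zero sales
toy_filled = toy_daily.fillna(0)
print(toy_filled)
```

Interpolating or forward-filling would be equally defensible choices; which one fits depends on why the days are missing.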

In [None]:
# Load the dataset
url = 'https://raw.githubusercontent.com/msfasha/307304-Data-Mining/main/datasets/daily_sales_french_bakery.csv'
df = pd.read_csv(url)

print("Dataset Overview:")
print(f"Total records: {len(df):,}")
print(f"Number of products: {df['unique_id'].nunique()}")
print(f"\nColumns: {list(df.columns)}")
print(f"\nFirst few rows:")
df.head()

In [None]:
# Filter for Traditional Baguette (most popular product)
baguette_df = df[df['unique_id'] == 'TRADITIONAL BAGUETTE'].copy()
baguette_df['ds'] = pd.to_datetime(baguette_df['ds'])
baguette_df = baguette_df.sort_values('ds').set_index('ds')

# Create time series
ts = baguette_df['y']

print("Traditional Baguette Sales:")
print(f"Period: {ts.index.min().date()} to {ts.index.max().date()}")
print(f"Number of days: {len(ts)}")
print(f"Mean daily sales: {ts.mean():.2f} units")
print(f"Std deviation: {ts.std():.2f}")
print(f"Min: {ts.min():.2f}, Max: {ts.max():.2f}")

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">3. Visualize the Time Series</h3>
</div>

In [None]:
# Plot the complete time series
fig, ax = plt.subplots(figsize=(14, 5))
ts.plot(ax=ax, linewidth=1.5, color='#8F0177')
ax.set_title('Traditional Baguette - Daily Sales (2021-2022)', fontsize=14, fontweight='bold')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Sales (units)', fontsize=12)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Plot a zoomed-in view (first 90 days)
fig, ax = plt.subplots(figsize=(14, 5))
ts.iloc[:90].plot(ax=ax, linewidth=2, marker='o', markersize=4, color='#8F0177')
ax.set_title('First 90 Days - Weekly Pattern Visible', fontsize=14, fontweight='bold')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Sales (units)', fontsize=12)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Observation: The sales show clear weekly patterns (likely lower on certain days of the week)")

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">4. Seasonal Decomposition</h3>
</div>

Decompose the time series into:
- **Trend**: Long-term progression
- **Seasonal**: Repeating short-term cycle
- **Residual**: Irregular variation

In [None]:
# Perform seasonal decomposition with weekly seasonality (period=7)
decomposition = seasonal_decompose(ts, model='additive', period=7)

# Plot decomposition
fig, axes = plt.subplots(4, 1, figsize=(14, 10))

# Original
ts.plot(ax=axes[0], color='#8F0177')
axes[0].set_ylabel('Original', fontsize=11)
axes[0].set_title('Time Series Decomposition (Weekly Seasonality)', fontsize=14, fontweight='bold')

# Trend
decomposition.trend.plot(ax=axes[1], color='#2E86AB')
axes[1].set_ylabel('Trend', fontsize=11)

# Seasonal
decomposition.seasonal.plot(ax=axes[2], color='#A23B72')
axes[2].set_ylabel('Seasonal', fontsize=11)

# Residual
decomposition.resid.plot(ax=axes[3], color='#F18F01')
axes[3].set_ylabel('Residual', fontsize=11)
axes[3].set_xlabel('Date', fontsize=11)

plt.tight_layout()
plt.show()

print("Key Insights:")
print("- Trend: Shows overall sales patterns over time")
print("- Seasonal: Clear 7-day (weekly) repeating pattern")
print("- Residual: Irregular fluctuations after removing trend and seasonality")

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">5. Stationarity Tests</h3>
</div>

Before modeling, we need to check if the series is stationary.

**Stationary series** = constant mean, variance, and autocorrelation over time

In [None]:
def check_stationarity(timeseries, title=''):
    """
    Perform ADF and KPSS tests for stationarity
    """
    print(f"\n{'='*60}")
    print(f"Stationarity Tests: {title}")
    print('='*60)
    
    # Augmented Dickey-Fuller Test
    adf_result = adfuller(timeseries.dropna())
    print("\n1. ADF Test (Null Hypothesis: Series is non-stationary)")
    print(f"   ADF Statistic: {adf_result[0]:.4f}")
    print(f"   p-value: {adf_result[1]:.4f}")
    print(f"   Critical Values:")
    for key, value in adf_result[4].items():
        print(f"      {key}: {value:.4f}")
    
    if adf_result[1] < 0.05:
        print("   ✓ Result: Series is STATIONARY (reject null hypothesis)")
    else:
        print("   ✗ Result: Series is NON-STATIONARY (fail to reject null hypothesis)")
    
    # KPSS Test
    kpss_result = kpss(timeseries.dropna(), regression='c')
    print("\n2. KPSS Test (Null Hypothesis: Series is stationary)")
    print(f"   KPSS Statistic: {kpss_result[0]:.4f}")
    print(f"   p-value: {kpss_result[1]:.4f}")
    print(f"   Critical Values:")
    for key, value in kpss_result[3].items():
        print(f"      {key}: {value:.4f}")
    
    if kpss_result[1] > 0.05:
        print("   ✓ Result: Series is STATIONARY (fail to reject null hypothesis)")
    else:
        print("   ✗ Result: Series is NON-STATIONARY (reject null hypothesis)")

# Test original series
check_stationarity(ts, 'Original Series')

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">6. ACF and PACF Analysis</h3>
</div>

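**ACF (Autocorrelation)** and **PACF (Partial Autocorrelation)** plots help identify:
- Non-seasonal orders: p (from the PACF) and q (from the ACF)
- Seasonal orders: P and Q (from spikes at multiples of the seasonal lag)
- Seasonal period: m (7 for weekly data)

The differencing orders d and D are suggested by a slowly decaying ACF and by the stationarity tests, rather than read directly off the plots.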

In [None]:
# Plot ACF and PACF
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# ACF - shows correlation at different lags
plot_acf(ts, lags=50, ax=axes[0])
axes[0].set_title('Autocorrelation Function (ACF)', fontsize=12, fontweight='bold')
axes[0].axvline(x=7, color='red', linestyle='--', alpha=0.5, label='Weekly lag')
axes[0].axvline(x=14, color='red', linestyle='--', alpha=0.5)
axes[0].axvline(x=21, color='red', linestyle='--', alpha=0.5)
axes[0].legend()

# PACF - shows direct correlation after removing intermediate correlations
plot_pacf(ts, lags=50, ax=axes[1])
axes[1].set_title('Partial Autocorrelation Function (PACF)', fontsize=12, fontweight='bold')
axes[1].axvline(x=7, color='red', linestyle='--', alpha=0.5, label='Weekly lag')
axes[1].axvline(x=14, color='red', linestyle='--', alpha=0.5)
axes[1].axvline(x=21, color='red', linestyle='--', alpha=0.5)
axes[1].legend()

plt.tight_layout()
plt.show()

print("Interpretation:")
print("- Strong spikes at lags 7, 14, 21 confirm weekly seasonality (period = 7)")
print("- ACF shows slow decay → suggests differencing may be needed")
print("- This pattern indicates SARIMA with seasonal period m=7 would be appropriate")

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">7. Train-Test Split</h3>
</div>

In [None]:
# Split data: 80% train, 20% test
train_size = int(len(ts) * 0.8)
train, test = ts.iloc[:train_size], ts.iloc[train_size:]

print(f"Training set: {len(train)} days ({train.index.min().date()} to {train.index.max().date()})")
print(f"Test set: {len(test)} days ({test.index.min().date()} to {test.index.max().date()})")

# Visualize split
fig, ax = plt.subplots(figsize=(14, 5))
train.plot(ax=ax, label='Training', color='#8F0177', linewidth=1.5)
test.plot(ax=ax, label='Test', color='#F18F01', linewidth=1.5)
ax.set_title('Train-Test Split', fontsize=14, fontweight='bold')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Sales (units)', fontsize=12)
ax.legend(loc='best', fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">8. SARIMA Model Notation</h3>
</div>

**SARIMA(p, d, q)(P, D, Q)m** has two sets of parameters:

### Non-Seasonal Parameters:
- **p**: AR (AutoRegressive) order
- **d**: Differencing order
- **q**: MA (Moving Average) order

### Seasonal Parameters:
- **P**: Seasonal AR order
- **D**: Seasonal differencing order
- **Q**: Seasonal MA order
- **m**: Seasonal period (7 for weekly, 12 for monthly, etc.)

### Example:
**SARIMA(1, 1, 1)(1, 1, 1)7** means:
- Non-seasonal: AR(1), one difference, MA(1)
- Seasonal: AR(1), one seasonal difference, MA(1), with period 7
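
The two differencing orders are the easiest part of the notation to see in action. This sketch applies D=1 (with m=7) and then d=1 to a noise-free synthetic series, assumed for illustration, made of a linear trend and a fixed weekly pattern.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (assumed for illustration): linear trend + fixed weekly pattern
idx = pd.date_range('2021-01-01', periods=28, freq='D')
weekly_pattern = np.array([0, 5, 10, 5, 0, -10, -10])
toy = pd.Series(2.0 * np.arange(28) + np.tile(weekly_pattern, 4), index=idx)

# D=1 with m=7: subtract the value from the same weekday one week earlier
seasonal_diff = toy.diff(7)

# d=1: subtract the previous day's value
both_diffs = seasonal_diff.diff(1).dropna()

print(seasonal_diff.dropna().unique())  # weekly pattern cancels, leaving the trend step
print(both_diffs.unique())              # trend step cancels as well
```

On real data the result is not exactly constant, but the same two operations are what the `d` and `D` entries ask the model to perform internally.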

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">9. Model Selection - Grid Search</h3>
</div>

We'll test multiple SARIMA configurations and select the best one based on AIC (Akaike Information Criterion).
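
AIC rewards fit but charges a penalty of 2 per estimated parameter, so a slightly better likelihood does not automatically win. The sketch below shows the formula on made-up log-likelihoods and, separately, how a grid of candidate orders can be enumerated with `itertools.product`. The `aic` helper and the numbers are hypothetical, for illustration only.

```python
from itertools import product

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical log-likelihoods for two candidate models (made-up numbers)
print(aic(log_likelihood=-1500.0, n_params=3))
print(aic(log_likelihood=-1498.5, n_params=6))

# Enumerate a grid of candidate orders (ranges assumed for illustration)
p_values, d_values, q_values = [0, 1, 2], [0, 1], [0, 1, 2]
P_values, D_values, Q_values = [0, 1], [1], [0, 1]
candidates = [
    ((p, d, q), (P, D, Q, 7))
    for p, d, q, P, D, Q in product(p_values, d_values, q_values,
                                    P_values, D_values, Q_values)
]
print(f"{len(candidates)} candidate models")
```

In the made-up comparison, the larger model fits marginally better but ends up with the higher AIC, so the smaller one would be preferred.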

In [None]:
# Define parameter ranges for grid search
# Using smaller ranges for faster computation
p_range = [0, 1, 2]
d_range = [0, 1]
q_range = [0, 1, 2]
P_range = [0, 1]
D_range = [1]
Q_range = [0, 1]
m = 7  # Weekly seasonality

print("Starting SARIMA grid search...")
print("This may take a few minutes...\n")

best_aic = np.inf
best_order = None
best_seasonal_order = None
results = []

total_models = len(p_range) * len(d_range) * len(q_range) * len(P_range) * len(D_range) * len(Q_range)
model_count = 0

for p in p_range:
    for d in d_range:
        for q in q_range:
            for P in P_range:
                for D in D_range:
                    for Q in Q_range:
                        try:
                            model_count += 1
                            order = (p, d, q)
                            seasonal_order = (P, D, Q, m)
                            
                            model = SARIMAX(train, 
                                          order=order,
                                          seasonal_order=seasonal_order,
                                          enforce_stationarity=False,
                                          enforce_invertibility=False)
                            
                            fitted_model = model.fit(disp=False)
                            aic = fitted_model.aic
                            
                            results.append({
                                'order': order,
                                'seasonal_order': seasonal_order,
                                'AIC': aic
                            })
                            
                            if aic < best_aic:
                                best_aic = aic
                                best_order = order
                                best_seasonal_order = seasonal_order
                            
                            if model_count % 10 == 0:
                                print(f"Progress: {model_count}/{total_models} models tested...")
                                
                        except Exception:
                            # Skip configurations that fail to fit
                            continue

print(f"\n{'='*60}")
print("Grid Search Complete!")
print('='*60)
print(f"Best Model: SARIMA{best_order}{best_seasonal_order}")
print(f"Best AIC: {best_aic:.2f}")
print(f"\nTotal models evaluated: {len(results)}")

# Show top 10 models
results_df = pd.DataFrame(results).sort_values('AIC').head(10)
print("\nTop 10 Models:")
print(results_df.to_string(index=False))

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">10. Fit Best SARIMA Model</h3>
</div>

In [None]:
# Fit the best model
best_sarima_model = SARIMAX(train,
                            order=best_order,
                            seasonal_order=best_seasonal_order,
                            enforce_stationarity=False,
                            enforce_invertibility=False)

best_sarima_fit = best_sarima_model.fit(disp=False)

# Display model summary
print(best_sarima_fit.summary())

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">11. Forecasting</h3>
</div>

In [None]:
# Generate forecast for test period
sarima_forecast = best_sarima_fit.forecast(steps=len(test))
sarima_forecast.index = test.index

# Calculate confidence intervals
forecast_df = best_sarima_fit.get_forecast(steps=len(test))
forecast_ci = forecast_df.conf_int()
forecast_ci.index = test.index

# Visualize forecast
fig, ax = plt.subplots(figsize=(14, 6))

# Plot training data
train.plot(ax=ax, label='Training Data', color='#8F0177', linewidth=1.5)

# Plot test data
test.plot(ax=ax, label='Actual Test Data', color='#2E86AB', linewidth=2)

# Plot forecast
sarima_forecast.plot(ax=ax, label='SARIMA Forecast', color='#F18F01', linewidth=2, linestyle='--')

# Plot confidence interval
ax.fill_between(forecast_ci.index,
                forecast_ci.iloc[:, 0],
                forecast_ci.iloc[:, 1],
                color='#F18F01', alpha=0.2, label='95% Confidence Interval')

ax.set_title(f'SARIMA{best_order}{best_seasonal_order} Forecast vs Actual', 
             fontsize=14, fontweight='bold')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Sales (units)', fontsize=12)
ax.legend(loc='best', fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Zoom in on test period only
fig, ax = plt.subplots(figsize=(14, 6))
test.plot(ax=ax, label='Actual', color='#2E86AB', linewidth=2, marker='o', markersize=4)
sarima_forecast.plot(ax=ax, label='SARIMA Forecast', color='#F18F01', linewidth=2, marker='s', markersize=4)
ax.fill_between(forecast_ci.index,
                forecast_ci.iloc[:, 0],
                forecast_ci.iloc[:, 1],
                color='#F18F01', alpha=0.2)
ax.set_title('Test Period: SARIMA Forecast vs Actual', fontsize=14, fontweight='bold')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Sales (units)', fontsize=12)
ax.legend(loc='best', fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">12. Model Evaluation</h3>
</div>

In [None]:
def evaluate_forecast(actual, predicted, model_name=''):
    """
    Calculate forecasting metrics
    """
    mae = mean_absolute_error(actual, predicted)
    rmse = np.sqrt(mean_squared_error(actual, predicted))
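    # MAPE is undefined where actual sales are zero, so those days are excluded
    actual_arr = np.asarray(actual, dtype=float)
    predicted_arr = np.asarray(predicted, dtype=float)
    nonzero = actual_arr != 0
    mape = np.mean(np.abs((actual_arr[nonzero] - predicted_arr[nonzero]) / actual_arr[nonzero])) * 100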
    
    return pd.Series({
        'Model': model_name,
        'MAE': mae,
        'RMSE': rmse,
        'MAPE (%)': mape
    })

# Evaluate SARIMA
sarima_metrics = evaluate_forecast(test, sarima_forecast, 
                                   f'SARIMA{best_order}{best_seasonal_order}')

print("\n" + "="*60)
print("SARIMA Model Performance")
print("="*60)
print(f"MAE:  {sarima_metrics['MAE']:.2f} units")
print(f"RMSE: {sarima_metrics['RMSE']:.2f} units")
print(f"MAPE: {sarima_metrics['MAPE (%)']:.2f}%")
print("\nInterpretation:")
print(f"- On average, predictions are off by {sarima_metrics['MAE']:.2f} units")
print(f"- RMSE of {sarima_metrics['RMSE']:.2f} (penalizes larger errors more)")
print(f"- MAPE of {sarima_metrics['MAPE (%)']:.2f}% shows percentage error")

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">13. Residual Diagnostics</h3>
</div>

Check whether the residuals behave like white noise (zero mean, no autocorrelation) and are approximately normally distributed.

In [None]:
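# Get residuals, skipping the first d + D*m values, which are start-up artifacts of differencing
burn_in = best_order[1] + best_seasonal_order[1] * best_seasonal_order[3]
residuals = best_sarima_fit.resid.iloc[burn_in:]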

# Plot residuals
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Residuals over time
residuals.plot(ax=axes[0, 0], color='#8F0177')
axes[0, 0].axhline(y=0, color='red', linestyle='--', alpha=0.7)
axes[0, 0].set_title('Residuals Over Time', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('Residual', fontsize=11)
axes[0, 0].grid(True, alpha=0.3)

# 2. Histogram of residuals
axes[0, 1].hist(residuals.dropna(), bins=30, color='#8F0177', edgecolor='black', alpha=0.7)
axes[0, 1].set_title('Residual Distribution', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Residual', fontsize=11)
axes[0, 1].set_ylabel('Frequency', fontsize=11)
axes[0, 1].grid(True, alpha=0.3, axis='y')

# 3. ACF of residuals
plot_acf(residuals.dropna(), lags=40, ax=axes[1, 0])
axes[1, 0].set_title('ACF of Residuals', fontsize=12, fontweight='bold')

# 4. Q-Q plot
stats.probplot(residuals.dropna(), dist="norm", plot=axes[1, 1])
axes[1, 1].set_title('Q-Q Plot', fontsize=12, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Statistical tests on residuals
print("\nResidual Statistics:")
print("="*60)
print(f"Mean: {residuals.mean():.4f} (should be ≈ 0)")
print(f"Std Dev: {residuals.std():.4f}")
print(f"Skewness: {residuals.skew():.4f} (should be ≈ 0)")
print(f"Excess kurtosis: {residuals.kurtosis():.4f} (should be ≈ 0)")

print("\nInterpretation:")
print("✓ Residuals should be randomly scattered around zero")
print("✓ Histogram should look approximately normal")
print("✓ ACF should show no significant autocorrelation (white noise)")
print("✓ Q-Q plot points should follow the diagonal line")

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">14. Comparison: SARIMA vs. ARIMA</h3>
</div>

Let's compare SARIMA (with seasonality) against a non-seasonal ARIMA model to demonstrate the importance of modeling seasonality.
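
Another simple reference point, not used in this notebook, is a seasonal naive forecast that just repeats the last observed week. The helper below is a hypothetical sketch of that idea on made-up values; a seasonal model that cannot beat it is probably not worth its complexity.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(history, horizon, m=7):
    """Repeat the last m observed values until `horizon` steps are filled.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    last_cycle = np.asarray(history)[-m:]
    repeats = int(np.ceil(horizon / m))
    return np.tile(last_cycle, repeats)[:horizon]

# Toy history: two weeks of made-up daily values
history = pd.Series(np.arange(1, 15, dtype=float))
print(seasonal_naive_forecast(history, horizon=10))
```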

In [None]:
# Fit a non-seasonal ARIMA model for comparison
print("Fitting non-seasonal ARIMA model...\n")

# Use simple ARIMA(1,1,1) - no seasonal component
arima_model = ARIMA(train, order=(1, 1, 1))
arima_fit = arima_model.fit()

# Forecast with ARIMA
arima_forecast = arima_fit.forecast(steps=len(test))
arima_forecast.index = test.index

# Evaluate ARIMA
arima_metrics = evaluate_forecast(test, arima_forecast, 'ARIMA(1,1,1)')

# Compare models
comparison_df = pd.DataFrame([arima_metrics, sarima_metrics])
comparison_df = comparison_df.set_index('Model')

print("="*60)
print("Model Comparison: ARIMA vs SARIMA")
print("="*60)
print(comparison_df.to_string())
print("\n")

# Calculate improvement
mae_improvement = ((arima_metrics['MAE'] - sarima_metrics['MAE']) / arima_metrics['MAE']) * 100
rmse_improvement = ((arima_metrics['RMSE'] - sarima_metrics['RMSE']) / arima_metrics['RMSE']) * 100

print("Performance Improvement (SARIMA vs ARIMA):")
print(f"  MAE improved by: {mae_improvement:.1f}%")
print(f"  RMSE improved by: {rmse_improvement:.1f}%")
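if mae_improvement > 0 and rmse_improvement > 0:
    print("\n✓ SARIMA outperforms ARIMA on this test set, consistent with it capturing the weekly pattern")
else:
    print("\n✗ SARIMA did not outperform ARIMA on both metrics for this test set")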

In [None]:
# Visual comparison
fig, ax = plt.subplots(figsize=(14, 6))

# Plot actual test data
test.plot(ax=ax, label='Actual', color='#2E86AB', linewidth=2.5, marker='o', markersize=5)

# Plot ARIMA forecast
arima_forecast.plot(ax=ax, label='ARIMA(1,1,1) Forecast', 
                    color='#A23B72', linewidth=2, linestyle=':', marker='x', markersize=5)

# Plot SARIMA forecast
sarima_forecast.plot(ax=ax, label=f'SARIMA{best_order}{best_seasonal_order} Forecast',
                     color='#F18F01', linewidth=2, linestyle='--', marker='s', markersize=4)

ax.set_title('Comparison: ARIMA vs SARIMA Forecasts', fontsize=14, fontweight='bold')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Sales (units)', fontsize=12)
ax.legend(loc='best', fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Observation: SARIMA follows the weekly pattern much more closely than ARIMA!")

<div style="
    background-color:#8F0177;
    padding:15px;
    border-radius:8px;
    color:white;
    display:flex;
    align-items:center;
">
    <h3 style="margin:0;">15. Summary and Key Takeaways</h3>
</div>

## What We Learned:

### 1. **SARIMA vs ARIMA**
- **ARIMA**: Good for non-seasonal time series
- **SARIMA**: Essential when data has repeating seasonal patterns
- SARIMA captured the weekly bakery sales pattern that ARIMA missed

### 2. **SARIMA Model Structure**
- **SARIMA(p,d,q)(P,D,Q)m**
- Non-seasonal parameters (p,d,q) + Seasonal parameters (P,D,Q,m)
- For daily data with weekly patterns: m=7

### 3. **Modeling Workflow**
1. ✓ Visualize and understand your data
2. ✓ Identify seasonal patterns (decomposition, ACF/PACF)
3. ✓ Test for stationarity
4. ✓ Grid search to find optimal parameters
5. ✓ Validate with residual diagnostics
6. ✓ Compare with simpler models

### 4. **When to Use SARIMA**
- Daily sales with weekly patterns
- Monthly data with yearly seasonality
- Hourly data with daily patterns
- Any time series with regular, repeating cycles

### 5. **Performance Metrics**
- **MAE**: Average absolute error (same units as data)
- **RMSE**: Penalizes large errors more heavily
- **MAPE**: Percentage error (scale-independent, but undefined when an actual value is zero)
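
A small worked example, on made-up numbers, shows how the three metrics differ. The single large miss (40 units) pulls RMSE above MAE, while MAPE weighs each error relative to the actual value.

```python
import numpy as np

# Made-up values for illustration
actual = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 190.0, 360.0])

errors = actual - predicted
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))
mape = np.mean(np.abs(errors / actual)) * 100  # undefined if any actual value is 0

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
```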

## Next Steps:
- Try different seasonal periods (m values)
- Experiment with exogenous variables (SARIMAX)
- Compare with machine learning approaches
- Apply to your own seasonal time series data!

---

**Important Resources:**
- Statsmodels documentation: https://www.statsmodels.org/
- Time series analysis guide: https://otexts.com/fpp2/

*End of SARIMA Tutorial*