# Lab 09: Planning for Growth at Washington Cafe

Congratulations! You've just gotten your dream student work-study job as a data analyst for Washington Cafe. As you can guess from the name, Washington Cafe was started by a group of students at Washington University in St. Louis three years ago. At the time, they were ambitious sophomores. Because putting so much of their passion into Washington Cafe took time away from their studies, they barely graduated. They love the restaurant, and it's been successful enough that they want to grow it. And they know enough from their studies to realize that the best way to do this will involve gathering insights from the data they've been collecting for the past three years.

**That's where you fit in!**

## Chapter 1: The Small Diner

Your first order of business is to get some plans in place for managing the labor demand for the small diner that Washington Cafe currently is. You've got weekly labor data from the past two years of business. Each month, it seems like the management team is scrambling to figure out the shift schedule and, more often than not, the dining room is either short on staff or staff are sitting around doing nothing. Not the best experience for diners or for the staff.

Start with a simple Auto-Regressive (AR) model to forecast the "typical" labor needs. There really isn't much to go on at this point other than previous labor usage, so that's where we'll start.
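
As a rough illustration of what an AR(1) model does, the sketch below simulates a series in which each week's value depends on the previous week's value plus noise, then rolls the same recursion forward to forecast. The parameters `c`, `phi`, and `sigma` and the helper name `ar1_forecast` are made up for illustration; they are not fitted to the cafe data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(1) parameters (assumptions, not fitted to the cafe data)
c, phi, sigma = 80.0, 0.5, 5.0   # intercept, lag-1 coefficient, noise std
n_weeks = 104

# Simulate y_t = c + phi * y_{t-1} + noise
y = np.empty(n_weeks)
y[0] = c / (1 - phi)             # start at the long-run mean
for t in range(1, n_weeks):
    y[t] = c + phi * y[t - 1] + rng.normal(0, sigma)

def ar1_forecast(last_value, c, phi, steps):
    """Roll the AR(1) recursion forward without noise."""
    preds = []
    prev = last_value
    for _ in range(steps):
        prev = c + phi * prev
        preds.append(prev)
    return np.array(preds)

forecast = ar1_forecast(y[-1], c, phi, steps=10)
long_run_mean = c / (1 - phi)
print(forecast.round(1))
print(f'Long-run mean: {long_run_mean:.1f}')
```

When `phi` is between -1 and 1, each forecast step moves closer to the long-run mean, so a multi-week AR(1) forecast flattens out rather than tracking week-to-week swings.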

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math

# These ACF/PACF plotters from statsmodels are helpful
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# From statsmodels, we can load in useful time series analysis tools
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

# And scikit-learn has useful tools for evaluating the usefulness of a model
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [None]:
# Since we have to read and setup data for each phase...
def read_data(file):
    df = pd.read_csv(file, parse_dates=['week_start_date'])
    # Index by the week start date and specify that the frequency is
    # "weekly, starting on Monday"
    df = df.set_index('week_start_date').asfreq('W-MON')
    df.drop(columns='restaurant', inplace=True)
    return df

labor1 = read_data('washington_cafe_stage1_diner_2018_2020.csv')
labor1.head()

In [None]:
plt.rcParams["figure.figsize"] = (10, 3)

# Create a function to do all the time series plots
def ts_plots(df, title):
    # Visualize time series raw
    df.plot(title=title)
    plt.show()
    
    # ACF/PACF
    plot_acf(df, lags=52)
    plt.show()
    
    plot_pacf(df, lags=52, method='ywm')
    plt.show()

# Do timeline, acf, and pacf
ts_plots(labor1, 'Stage 1 - Diner')

**Main Plot** - There doesn't seem to be much of a pattern in the labor hours worked each week. That's consistent with what we've been told: There's limited planning going on and people just work whatever shifts they want. Lots of shortage and lots of excess relative to the actual demand.

**Auto Correlation Function** - The ACF tells us how closely correlated each data point is with the data point that lags by 1, 2, 3, ... 52 time periods. In this case:
* The ACF drops off immediately
* There doesn't appear to be any seasonality (which would show up as strong bumps at certain lags)

**Partial Auto Correlation Function** - The PACF tells us how much each data point is correlated with the data point at a given lag after accounting for the lags in between.
* The PACF also drops off immediately


So, we can conclude exactly what we expected: there aren't any hidden patterns in the data. We can simply use an auto-regressive model for our planning purposes.
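
To make the "drops off immediately" reading concrete, here is a minimal sketch that computes a sample ACF by hand on synthetic white noise and counts how many lags fall outside the approximate 95% band of ±1.96/√n. The helper `sample_acf` and the synthetic series are assumptions for illustration; `plot_acf` draws its own confidence region, which may be computed slightly differently.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D array for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[lag:] * x[:len(x) - lag]) / denom
                     for lag in range(max_lag + 1)])

rng = np.random.default_rng(42)
noise = rng.normal(100, 10, size=104)   # synthetic stand-in, not the cafe data

acf = sample_acf(noise, max_lag=20)
band = 1.96 / np.sqrt(len(noise))       # approximate 95% band under white noise

outside = int(np.sum(np.abs(acf[1:]) > band))
print(f'Lag 0: {acf[0]:.2f}')
print(f'Lags outside +/-{band:.2f}: {outside} of 20')
```

For a series with no structure, lag 0 is always 1 and only the occasional later lag should poke outside the band by chance.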

In [None]:
# Let's reserve our last 10 weeks for the testing period and 
# the other 94 weeks before that for training
labor1_train = labor1.iloc[:-10]
labor1_test = labor1.iloc[-10:]

ar_model = AutoReg(labor1_train, lags=1).fit()
ar_model.summary()

In [None]:
pred1 = ar_model.predict(start=labor1_test.index[0], end=labor1_test.index[-1])

mae = mean_absolute_error(labor1_test, pred1)
rmse = math.sqrt(mean_squared_error(labor1_test, pred1))

print(f'Mean Absolute Error:     {mae}')
print(f'Root Mean Squared Error: {rmse}')

In [None]:
# Plot forecast vs actual
def plot_predict(df_train, df_test, df_pred, title):
    plt.plot(df_train.index, df_train.values, label='train')
    plt.plot(df_test.index, df_test.values, label='test')
    plt.plot(df_pred.index, df_pred.values, label='forecast')
    plt.title(title)
    plt.legend(); plt.show()

plot_predict(labor1_train, labor1_test, pred1, 'Stage 1 - AR Forecast')

The actual test data observations (orange) bounce all over the place, just like the training data observations. Our model has little to go on beyond the previous week's value, so it can't anticipate those swings. So, you tell the management team to just look at the average over the past year and plan on that being the go-forward plan... for now.
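
The "plan on the average" recommendation can be written down as a simple baseline forecast. The sketch below is one way to do that, assuming a 52-week window; the helpers `mean_baseline` and `mae` and the synthetic `hours` series are hypothetical stand-ins, not the lab's data or API.

```python
import numpy as np

def mean_baseline(train, horizon, window=52):
    """Forecast every future week as the mean of the last `window` weeks."""
    train = np.asarray(train, dtype=float)
    return np.full(horizon, train[-window:].mean())

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

# Synthetic stand-in for weekly labor hours (not the cafe data)
rng = np.random.default_rng(1)
hours = rng.normal(300, 25, size=104)
train, test = hours[:-10], hours[-10:]

baseline = mean_baseline(train, horizon=len(test))
print(f'Planned weekly hours: {baseline[0]:.1f}')
print(f'Baseline MAE:         {mae(test, baseline):.1f}')
```

Comparing a fitted model's MAE against a baseline like this is a quick check on whether the model adds anything over the plain average.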

## Chapter 2: Washington Cafe Is Becoming a Local Favorite

After another year, you decide to reevaluate the labor trends. You've had lots of other projects going on around favorite dishes, where patrons are coming from, and the cost of ingredients. But now it's time to look back at the labor trends again. You decide to look at the ACF and PACF again as a starting point.
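
This chapter goes on to fit a first-order moving average model, ARIMA order (0,0,1). As background, the sketch below shows the mechanics of an MA(1) forecast under the assumption that the mean `mu` and coefficient `theta` are already known; the helper `ma1_forecast` and the toy numbers are hypothetical. A fitted `ARIMA` model estimates these parameters and handles the initial error more carefully.

```python
import numpy as np

def ma1_forecast(y, mu, theta, steps):
    """Forecast an MA(1) process y_t = mu + e_t + theta * e_{t-1}."""
    # Recover the in-sample errors recursively, assuming the error before
    # the first observation is 0
    e_prev = 0.0
    for value in y:
        e_prev = value - mu - theta * e_prev
    # One step ahead uses the last error; beyond that the errors are
    # unknown (expected value 0), so the forecast is just the mean
    first = mu + theta * e_prev
    return np.array([first] + [mu] * (steps - 1))

# Toy numbers chosen for easy arithmetic (not the cafe data)
forecast = ma1_forecast([102, 97, 105], mu=100, theta=0.5, steps=3)
print(forecast)
```

Only the first forecast week carries a correction from the most recent error, so an MA(1) forecast over a 10-week test window is essentially flat at the mean.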

In [None]:
labor2 = read_data('washington_cafe_stage2_local_favorite_2018_2022.csv')

# Keep only the most recent 2 years (104 weeks) of data
labor2 = labor2.iloc[-104:]
labor2.head()

In [None]:
ts_plots(labor2, 'Stage 2 - Local Favorite')

In [None]:
# Let's reserve our last 10 weeks for the testing period.
# labor2 already holds only the past 2 years (104 weeks), where we can see a clear change
labor2_train = labor2.iloc[-104:-10]
labor2_test = labor2.iloc[-10:]

# This time, we'll use a moving average model
# (0,0,1) means:
# p = auto-regressive - 0 means none
# d = differences - 0 means none
# q = moving average - 
#     1 means use the error from the 1 previous forecast to correct the next
ma_model = ARIMA(labor2_train, order=(0,0,1)).fit()

ma_model.summary()

In [None]:
pred2 = ma_model.predict(start=labor2_test.index[0], end=labor2_test.index[-1])

mae = mean_absolute_error(labor2_test, pred2)
rmse = math.sqrt(mean_squared_error(labor2_test, pred2))

print(f'Mean Absolute Error:     {mae}')
print(f'Root Mean Squared Error: {rmse}')

In [None]:
plot_predict(labor2_train, labor2_test, pred2, 'Stage 2 - MA Forecast')

## Chapter 3: Business Is Booming!

Finally, the business starts to take off and there's a major influx of new customers. The hard work, some marketing genius, and great food have paid off. Let's use the data to see if we can predict the growth rather than just react to it after a few stressful shifts.

In [None]:
labor3 = read_data('washington_cafe_stage3_boom_2023.csv')
labor3.head()

In [None]:
# Redefine ts_plots with 20 lags instead of 52 for this stage
def ts_plots(df, title):
    # Visualize time series raw
    df.plot(title=title)
    plt.show()
    
    # ACF/PACF
    plot_acf(df, lags=20)
    plt.show()
    
    plot_pacf(df, lags=20, method='ywm')
    plt.show()

ts_plots(labor3, 'Stage 3 - Boom')

There is a clear trend in our data: the ACF tapers off slowly as the lags increase.

In the PACF, there is a clear correlation with the value immediately before it, but it drops off quickly after that.
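
A slowly tapering ACF is the usual sign that differencing (the `d` in an ARIMA order) is worth trying. The sketch below shows the effect on a synthetic trending series: the raw series has a high lag-1 autocorrelation, while the differenced series does not. The series and the helper `lag1_autocorr` are assumptions for illustration, not the cafe data.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a 1-D array."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x ** 2))

rng = np.random.default_rng(7)
weeks = np.arange(52)
# Synthetic upward trend plus noise (not the cafe data)
trending = 300 + 8 * weeks + rng.normal(0, 10, size=52)

first_diff = np.diff(trending)          # corresponds to d = 1
second_diff = np.diff(trending, n=2)    # corresponds to d = 2

print(f'Lag-1 autocorrelation, raw:         {lag1_autocorr(trending):.2f}')
print(f'Lag-1 autocorrelation, differenced: {lag1_autocorr(first_diff):.2f}')
print(f'Mean weekly change:                 {first_diff.mean():.1f}')
```

Each round of differencing costs one observation, and the mean of the first difference recovers the average weekly growth.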

In [None]:
y3_train = labor3[:-10]
y3_test = labor3[-10:]


plt.rcParams['figure.figsize'] = (10,6)

params = {
    'White noise': (0,0,0),
    'Random walk': (0,1,0),
    'Constant': (0,2,0),
    '1st-order regression': (1,0,0),
    '2nd-order regression': (2,0,0),
    'Differenced 1st-order': (1,2,0),
    'Simple exponential smoothing': (0,1,1),
    '1st-order moving average': (0,0,1),
    '2nd-order moving average': (0,0,2),
    'ARMA': (1,0,1),
    'ARIMA': (1,1,1),
    'Damped-trend linear exponential smoothing': (1,1,2),
    'Linear exponential smoothing 1': (0,2,1),
    'Linear exponential smoothing 2': (0,2,2)
}

preds = {}

for label, param in params.items():
    model = ARIMA(y3_train, order=param).fit()
    pred = model.predict(start=y3_test.index[0], end=y3_test.index[-1])
    preds[label] = pred

plt.plot(y3_train.index, y3_train.values, label='train')
plt.plot(y3_test.index, y3_test.values, label='test')

for label, pred in preds.items():
    plt.plot(pred.index, pred.values, linestyle='dashed', label=(label + ' ' + str(params[label])))

plt.title('Stage 3 - ARIMA Forecasts')
plt.legend()
plt.show()

In [None]:
model3 = ARIMA(y3_train, order=(0,2,1)).fit()

pred3 = model3.predict(start=y3_test.index[0], end=y3_test.index[-1])

mae = mean_absolute_error(y3_test, pred3)
rmse = math.sqrt(mean_squared_error(y3_test, pred3))

print(f'Mean Absolute Error:     {mae}')
print(f'Root Mean Squared Error: {rmse}')

In [None]:
plot_predict(y3_train, y3_test, pred3, 'Stage 3 - Linear Exponential Smoothing Forecast')

## Chapter 4: Into Fine Dining

With that huge growth in business last year, Washington Cafe has decided to transform into a fine dining establishment. As a result, there's more seasonal fluctuation in business (e.g., parents' weekend and holidays). Let's take a look and see if we can build a seasonal model, too.
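
Before fitting a seasonal model, it can help to have a seasonal baseline to compare against. One common choice is a seasonal-naive forecast, which predicts each future week with the value from the same week one year earlier. The sketch below assumes a 52-week season; the helper `seasonal_naive` and the synthetic `hours` series are hypothetical, not part of the lab's data.

```python
import numpy as np

def seasonal_naive(train, horizon, season=52):
    """Forecast each future week with the value from one season earlier."""
    train = np.asarray(train, dtype=float)
    # Assumes horizon <= season so every forecast maps to an observed week
    return train[-season:][:horizon].copy()

# Synthetic weekly series with a yearly cycle (not the cafe data)
rng = np.random.default_rng(3)
weeks = np.arange(130)
hours = 400 + 60 * np.sin(2 * np.pi * weeks / 52) + rng.normal(0, 8, size=130)
train, test = hours[:-10], hours[-10:]

forecast = seasonal_naive(train, horizon=len(test))
mae = float(np.mean(np.abs(test - forecast)))
print(f'Seasonal-naive MAE: {mae:.1f}')
```

If a seasonal model cannot beat this baseline on the test weeks, its extra parameters are probably not earning their keep.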

In [None]:
labor4 = read_data('washington_cafe_stage4_fine_dining_2024_2026.csv')
labor4.head()

In [None]:
# Redefine ts_plots with 52 lags again so a full year of seasonality is visible
def ts_plots(df, title):
    # Visualize time series raw
    df.plot(title=title)
    plt.show()
    
    # ACF/PACF
    plot_acf(df, lags=52)
    plt.show()
    
    plot_pacf(df, lags=52, method='ywm')
    plt.show()

ts_plots(labor4, 'Stage 4 - Fine Dining')

In [None]:
# The seasonality shows up in how high the ACF stays in the first chart
# So, we need to difference over the past 52 weeks
labor4_diff = labor4.diff(52).dropna()

ts_plots(labor4_diff, 'Stage 4 (Differenced) - Fine Dining')

In [None]:
labor4.shape

In [None]:
y4_train = labor4[:-10]
y4_test = labor4[-10:]

# SARIMAX(p,d,q)(P,D,Q,s): a common starting point is (1,1,1)(1,1,1,52);
# here we fit the simpler (1,1,0)(0,1,1,52)
model4 = SARIMAX(y4_train,
                order=(1,1,0),
                seasonal_order=(0,1,1,52),
                enforce_stationarity=False,
                enforce_invertibility=False).fit()
model4.summary()

In [None]:
pred4 = model4.predict(start=y4_test.index[0], end=y4_test.index[-1])

mae = mean_absolute_error(y4_test, pred4)
rmse = math.sqrt(mean_squared_error(y4_test, pred4))

print(f'Mean Absolute Error:     {mae}')
print(f'Root Mean Squared Error: {rmse}')

In [None]:
plot_predict(y4_train, y4_test, pred4, 'Stage 4 - Fine Dining')