The purpose of this notebook is to practice time series forecasting with neural networks.

This is a [Kaggle dataset / competition](https://www.kaggle.com/datasets/kannanaikkal/food-demand-forecasting).

We have 145 weeks' worth of order data for a meal delivery service with 77 centers and 51 unique meal offerings. The goal is to forecast how many of each type of meal each center will order in the next 10 weeks.

First, import some standard libraries, including the `ZipFile` class from `zipfile` for extracting data from a file within a zip archive.

In [1]:
from zipfile import ZipFile

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

Next, load the data as a Pandas dataframe.

In [2]:
with ZipFile('Data/meal_delivery_archive.zip') as zipArchive:
    with zipArchive.open('train.csv') as f:
        raw = pd.read_csv(f)
        
raw.head()

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders
0,1379560,1,55,1885,136.83,152.29,0,0,177
1,1466964,1,55,1993,136.83,135.83,0,0,270
2,1346989,1,55,2539,134.86,135.86,0,0,189
3,1338232,1,55,2139,339.5,437.53,0,0,54
4,1448490,1,55,2631,243.5,242.5,0,0,40


Next, let's check how many meal delivery centers (represented by center_id), unique meals, and weeks are in the dataset.

In [12]:
len(raw.center_id.unique())

77

In [13]:
len(raw.meal_id.unique())

51

In [45]:
len(raw.week.unique())

145

There are 77 centers, 51 meals and 145 weeks in the dataset.
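
The three counts can also be read off in one call, and multiplying them gives the size of a complete (week, center, meal) grid. The sketch below uses a small made-up frame (`toy` is hypothetical, not the Kaggle data) to show the idea. For the real data the full grid would be 77 × 51 × 145 = 569,415 combinations.

```python
import pandas as pd

# Toy stand-in for `raw`; only the three key columns matter here.
toy = pd.DataFrame({
    "week":      [1, 1, 1, 2, 2],
    "center_id": [10, 10, 11, 10, 11],
    "meal_id":   [100, 101, 100, 100, 101],
})

keys = ["week", "center_id", "meal_id"]

# Distinct values per column, computed in one call.
n_unique = toy[keys].nunique()

# Number of rows a complete (week, center, meal) grid would contain.
full_grid = int(n_unique.prod())

# Fraction of that grid that actually appears in the data.
coverage = len(toy.drop_duplicates(keys)) / full_grid

print(n_unique.to_dict(), full_grid, coverage)
```

A coverage below 1.0 means some combinations have no row at all, which matters once the data has to be laid out in a fixed order for the network.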

In [3]:
data = raw.loc[:, ['week', 'center_id', 'meal_id', 'num_orders']].copy()

In [4]:
data.head()

Unnamed: 0,week,center_id,meal_id,num_orders
0,1,55,1885,177
1,1,55,1993,270
2,1,55,2539,189
3,1,55,2139,54
4,1,55,2631,40


In [5]:
145*0.2

29.0

We will reserve the last 31 weeks (weeks 115 to 145) for testing and the 30 weeks before that (weeks 85 to 114) for validation, which leaves weeks 1 to 84 for training.

In [8]:
145 - 60

85

In [9]:
145 - 30

115

In [10]:
train, val, test = data.loc[data.week<85], data.loc[(data.week>=85) & (data.week<115)], data.loc[data.week>=115]

In [11]:
print(train.shape, val.shape, test.shape)

(257086, 4) (97701, 4) (101761, 4)
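
As a sanity check on the boundaries, here is a minimal sketch of the same chronological split wrapped in a helper, run on a made-up frame with one row per week. `split_by_week` and `toy` are hypothetical names introduced for illustration. Note that `week >= 115` covers weeks 115 through 145, which is 31 distinct weeks.

```python
import pandas as pd

def split_by_week(df, val_start, test_start):
    """Chronological split: train < val_start <= val < test_start <= test."""
    train = df.loc[df.week < val_start]
    val = df.loc[(df.week >= val_start) & (df.week < test_start)]
    test = df.loc[df.week >= test_start]
    return train, val, test

# Toy frame with one row per week, weeks 1..145 as in the dataset.
toy = pd.DataFrame({"week": list(range(1, 146)), "num_orders": 0})
train_t, val_t, test_t = split_by_week(toy, val_start=85, test_start=115)

# Count how many distinct weeks land in each split.
weeks_per_split = [part.week.nunique() for part in (train_t, val_t, test_t)]
print(weeks_per_split)  # [84, 30, 31]
```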


In [32]:
train.head()

Unnamed: 0,week,center_id,meal_id,num_orders
0,1,55,1885,177
1,1,55,1993,270
2,1,55,2539,189
3,1,55,2139,54
4,1,55,2631,40


When preparing the data for modeling with an LSTM (long short-term memory) neural network in Keras, it is important that we have a consistent order to the values for "num_orders" with no gaps. For example, if center 12 didn't order any of meal 1198 in week 3, we need to have a "num_orders" value of 0 at that index, not a missing row.

The cells below group the training data by week, center id and meal id, build a list of expected indices covering every combination of week, center id and meal id in order, and then use both to construct a gap-free series.

In [39]:
grouped_train = train.groupby(['week', 'center_id', 'meal_id']).sum()

In [52]:
grouped_train.loc[grouped_train.index.get_level_values('week') == 1]

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,num_orders
week,center_id,meal_id,Unnamed: 3_level_1
1,10,1062,865
1,10,1109,2672
1,10,1198,269
1,10,1207,769
1,10,1216,54
1,...,...,...
1,186,2631,40
1,186,2640,121
1,186,2707,161
1,186,2760,69


Here, I am building the index for a series with every combination of week number, center id and meal id (in tuple form, in order) as the index.

In [147]:
expected_indices = []

for week_no in range(1, 85):
    for center_id in sorted(data.center_id.unique()):
        for meal_id in sorted(data.meal_id.unique()):
            expected_indices.append((week_no, center_id, meal_id))

Then, I am iterating over every (week, center_id, meal_id) tuple in the expected index and filling in the number of orders from the dataset, if it is given, and otherwise, assuming that there were 0 orders for that meal to that center for that week.

In [68]:
filled_in_values = []

for index in expected_indices:
    try:
        num_orders = grouped_train.loc[index, 'num_orders']
    except KeyError:  # no row for this (week, center_id, meal_id) combination
        num_orders = 0
    filled_in_values.append(num_orders)

In [79]:
reindexed_train = pd.Series(filled_in_values, index=expected_indices, name='num_orders')

In [80]:
reindexed_train.head()

(1, 10, 1062)     865
(1, 10, 1109)    2672
(1, 10, 1198)     269
(1, 10, 1207)     769
(1, 10, 1216)      54
Name: num_orders, dtype: int64
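
The two loops above can be slow on a grid of this size. A possible vectorized alternative, sketched here on made-up data, builds the full index with `pd.MultiIndex.from_product` and lets `reindex(fill_value=0)` insert the zeros. The `toy` frame and the variable names are assumptions for illustration. The result should match the loop-based series as long as the levels are listed in the same sorted order.

```python
import pandas as pd

# Toy data: center 11 has no order for meal 100 in week 2.
toy = pd.DataFrame({
    "week":       [1, 1, 2],
    "center_id":  [10, 11, 10],
    "meal_id":    [100, 100, 100],
    "num_orders": [5, 7, 3],
})

# Same grouping as above, keeping only the column we need.
grouped = toy.groupby(["week", "center_id", "meal_id"])["num_orders"].sum()

# Every (week, center_id, meal_id) combination, in sorted order.
full_index = pd.MultiIndex.from_product(
    [range(1, 3), sorted(toy.center_id.unique()), sorted(toy.meal_id.unique())],
    names=["week", "center_id", "meal_id"],
)

# Missing combinations are inserted with 0 orders.
filled = grouped.reindex(full_index, fill_value=0)
print(filled.tolist())  # [5, 7, 3, 0]
```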

The next step in preparing the data for modeling is to put it into the three-dimensional numpy array format expected by a Keras LSTM.

The expected shape is (sample_size, window_size, num_features), where window_size refers to how far back the model looks when making its prediction for the next week.

So, for example, if we select a window_size of 5, our input array will look something like this:

\[ \[ list of week 1 orders \], \[ list of week 2 orders \], ..., \[ list of week 5 orders \] \]
    
And the corresponding output will be the list of week 6 orders:

\[ list of week 6 orders \]

Note that each `list of week i orders` is very long: there are 77 centers and 51 meals that each center can order, so there are $77 \times 51 = 3927$ values of `num_orders` in a given week. Thanks to the preprocessing above, these are always in the same order, so the model can learn the relationship for each center and meal combination.

The code below iterates over the training examples for a given window size and uses the `filled_in_values` list, which holds the number of orders for each week, center_id and meal_id *in order*, to create the numpy arrays for X and y described above.

Depending on the window size, we will need to create a different input array, so specify that initially.

In [96]:
window_size = 5

# constant for this dataset:
num_center_meal_combos = 77*51

X = []
y = []
for i in range(1, 85-window_size):
    window = []
    for week_no in range(i, i + window_size):
        window_start = (week_no - 1)*num_center_meal_combos
        window_stop = week_no*num_center_meal_combos
        window.append(filled_in_values[window_start:window_stop])
    X.append(window)
    label_stop = (week_no + 1)*num_center_meal_combos
    label = filled_in_values[window_stop:label_stop]
    y.append(label)
X = np.array(X)
y = np.array(y)
print(X.shape, y.shape)

(79, 5, 3927) (79, 3927)


Check results:

In [97]:
X[:2, :, :10]

array([[[ 865, 2672,  269,  769,   54,  324,   53,  177,  595,   95],
        [ 782, 1864,  136,  458,   94,  458,   67,  285,  458,   81],
        [ 851, 1161,  418,  418,   96,  337,   27,  756,  445,   40],
        [1202, 1376,  243,  459,  230,  393,   69,  339,  366,   41],
        [ 958, 1511,  150,  312,  189,  162,   26,  501,  432,   15]],

       [[ 782, 1864,  136,  458,   94,  458,   67,  285,  458,   81],
        [ 851, 1161,  418,  418,   96,  337,   27,  756,  445,   40],
        [1202, 1376,  243,  459,  230,  393,   69,  339,  366,   41],
        [ 958, 1511,  150,  312,  189,  162,   26,  501,  432,   15],
        [1094, 2105,  176,  296,  148,   80,   82,   69,  743,   53]]])

The array above shows the first two "windows" of our time series data. Each sub-list is a single time step (so with a window size of 5 there are 5 of them), and only the first 10 meal-center combinations are displayed. In the output array, we should see the corresponding *next* week of orders for those same 10 combinations:

In [98]:
y[:2, :10]

array([[1094, 2105,  176,  296,  148,   80,   82,   69,  743,   53],
       [1513, 1916,  405,  150,  149,  148,   15,   55,  431,   53]])
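
The windowing logic can be checked on a tiny example where the values are easy to follow. This is a minimal sketch, not the code used above: `make_windows` is a hypothetical helper that reshapes the flat list into one row per week and then slices consecutive rows. With 84 weeks and a window of 5 it would produce 79 samples, matching the shapes printed earlier.

```python
import numpy as np

def make_windows(flat_values, n_features, window_size):
    """Turn a flat, week-ordered list into (X, y) arrays for next-week prediction."""
    # One row per week, one column per center-meal combination.
    weeks = np.asarray(flat_values).reshape(-1, n_features)
    n_samples = weeks.shape[0] - window_size
    # Sample i holds weeks i .. i + window_size - 1 (0-based rows).
    X = np.stack([weeks[i:i + window_size] for i in range(n_samples)])
    # The label for sample i is the week right after its window.
    y = weeks[window_size:]
    return X, y

# 8 toy "weeks" with 3 features each; values 0..23 make the layout easy to read.
flat = list(range(8 * 3))
X_toy, y_toy = make_windows(flat, n_features=3, window_size=5)
print(X_toy.shape, y_toy.shape)  # (3, 5, 3) (3, 3)
```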

Now, we can build a predictive model. Out of curiosity, the first model I build uses an LSTM layer, but allows the state to be reset after each batch of the training data.

In [113]:
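# Keras classes used by the model cells (assumes the TensorFlow-bundled Keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, LSTM, Dense

model1 = Sequential()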
model1.add(InputLayer((5, 3927))) # specify window size, number of features for input layer
model1.add(LSTM(20, activation='relu'))
model1.add(Dense(y.shape[-1])) # make 77*51 predictions for next week
model1.compile(loss='mean_absolute_percentage_error', optimizer='adam')

In [114]:
model1.fit(X, y)



<keras.callbacks.History at 0x17fe798d0>

In [115]:
model1.fit(X, y, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x18723b550>

With the state being reset after every batch, we still see some improvement with each epoch (particularly from the first to the second).

For a second model, we will make the LSTM stateful, meaning that Keras carries the cell state over from one batch to the next instead of resetting it automatically. The state is reset only when we call `reset_states()` ourselves.

When we do so, we have to specify the batch size when we create the LSTM layer. The number of training examples is 79, which is prime, so I am leaving off the last 3 examples so that the remaining 76 are evenly divisible by a batch size of 4.

Then we reset the model's state manually after each epoch. But out of curiosity, if we don't reset the state, what is the effect after 3 epochs of training?

In [130]:
model2 = Sequential()
model2.add(LSTM(20, activation='relu', batch_input_shape=(4, 5, 3927), stateful=True))
model2.add(Dense(y.shape[-1]))
model2.compile(loss='mean_absolute_percentage_error', optimizer='adam')

In [127]:
X.shape[0] - 3

76

In [139]:
X[:76].shape

(76, 5, 3927)
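
The trimming step generalizes to any sample count and batch size with integer division. A small sketch, where `trim_to_batch_multiple` is a hypothetical helper name:

```python
def trim_to_batch_multiple(n_samples, batch_size):
    """Largest number of samples <= n_samples that is divisible by batch_size."""
    return (n_samples // batch_size) * batch_size

usable = trim_to_batch_multiple(79, 4)
print(usable)  # 76
```

Slicing with `X[:usable]` drops the most recent windows. Slicing with `X[-usable:]` would drop the oldest ones instead, which may be preferable when the latest weeks are the most informative.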

In [138]:
model2.fit(X[:76], y[:76], epochs=3, batch_size=4)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x1877bb640>

Without resetting the state after each epoch, the model understandably doesn't do very well. (It carries information from the *end* of the 84-week training period into the *beginning* of that period in the next epoch, which would only ever help by coincidence.)

Run the model properly, resetting the LSTM's state after each epoch:

In [140]:
num_epochs = 5

model3 = Sequential()
model3.add(LSTM(20, activation='relu', batch_input_shape=(4, 5, 3927), stateful=True))
model3.add(Dense(y.shape[-1]))
model3.compile(loss='mean_absolute_percentage_error', optimizer='adam')

for i in range(num_epochs):
    model3.fit(X[:76], y[:76], epochs=1, batch_size=4)
    model3.reset_states()
    
model3.history.history



{'loss': [74325.421875]}
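
A loss in the tens of thousands is far above 100 percent. One plausible contributor (an assumption, not something verified against this run) is the zero-filled targets: to my understanding, Keras computes this loss by dividing the absolute error by the absolute true value, clipped from below by a small epsilon, so any target of 0 with a nonzero prediction produces an enormous term. The NumPy sketch below mimics that formula; the function name and the epsilon value of 1e-7 are assumptions.

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-7):
    """Mean absolute percentage error with the denominator clipped at eps."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.maximum(np.abs(y_true), eps)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

# Both targets nonzero: each prediction is off by 10 percent.
mape_nonzero = mape([100, 200], [110, 180])

# One target is 0: predicting 1 instead of 0 dominates the average.
mape_with_zero = mape([0, 100], [1, 110])

print(mape_nonzero, mape_with_zero)
```

If this is what is happening, a scale-based loss such as mean absolute error may give more interpretable numbers on data with many zeros.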

All of this is good and fun, but we need validation data to properly tune our model.

Let's preprocess our validation data similarly to how we preprocessed the training data:

In [141]:
grouped_val = val.groupby(['week', 'center_id', 'meal_id']).sum()

In [143]:
val.week.unique()

array([ 85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,  96,  97,
        98,  99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
       111, 112, 113, 114])

In [146]:
val_expected_indices = []

for week_no in range(85, 115):
    for center_id in sorted(data.center_id.unique()):
        for meal_id in sorted(data.meal_id.unique()):
            val_expected_indices.append((week_no, center_id, meal_id))

In [149]:
val_filled_in_values = []

for index in val_expected_indices:
    try:
        num_orders = grouped_val.loc[index, 'num_orders']
    except KeyError:  # no row for this (week, center_id, meal_id) combination
        num_orders = 0
    val_filled_in_values.append(num_orders)

In [158]:
len(val_filled_in_values)

117810

In [156]:
window_size

5

In [157]:
num_center_meal_combos

3927

In [159]:
val.week.unique()

array([ 85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,  96,  97,
        98,  99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
       111, 112, 113, 114])

In [154]:
X_val = []
y_val = []
for i in range(85, 115-window_size):
    window = []
    for week_no in range(i, i + window_size):
        window_start = (week_no - 1)*num_center_meal_combos
        window_stop = week_no*num_center_meal_combos
        window.append(val_filled_in_values[window_start:window_stop])
    X_val.append(window)
    label_stop = (week_no + 1)*num_center_meal_combos
    label = val_filled_in_values[window_stop:label_stop]
    y_val.append(label)
X_val = np.array(X_val)
y_val = np.array(y_val)
print(X_val.shape, y_val.shape)

(25, 5, 0) (25, 0)


In [155]:
X_val

array([], shape=(25, 5, 0), dtype=float64)
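
The empty third dimension is what you would expect if every slice falls outside the list. `val_filled_in_values` holds 30 weeks starting at week 85 (117,810 values), but the loop computes offsets as if the list began at week 1, so the first slice starts at 84 × 3927 = 329,868, which is past the end. Python returns an empty list for such a slice rather than raising an error. One possible fix, sketched on toy data, is to subtract the first week of the split before computing offsets. `make_val_windows` and its parameters are hypothetical names.

```python
import numpy as np

def make_val_windows(values, first_week, last_week, window_size, n_features):
    """Build (X, y) from a flat list whose first entry belongs to first_week."""
    X, y = [], []
    for i in range(first_week, last_week + 1 - window_size):
        offset = i - first_week  # position of week i inside `values`
        window = [
            values[(offset + k) * n_features:(offset + k + 1) * n_features]
            for k in range(window_size)
        ]
        X.append(window)
        # The label is the week immediately after the window.
        label_start = (offset + window_size) * n_features
        y.append(values[label_start:label_start + n_features])
    return np.array(X), np.array(y)

# Toy data: 30 weeks (85..114) with 2 features per week.
toy_values = list(range(30 * 2))
X_v, y_v = make_val_windows(toy_values, 85, 114, window_size=5, n_features=2)
print(X_v.shape, y_v.shape)  # (25, 5, 2) (25, 2)
```

With `n_features=3927` and the real `val_filled_in_values`, the same call should give shapes of (25, 5, 3927) and (25, 3927).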