The purpose of this notebook was to practice using time series forecasting with neural networks.

Unfortunately, using a neural network was not an effective method for forecasting with this dataset.

This is a [Kaggle dataset / competition](https://www.kaggle.com/datasets/kannanaikkal/food-demand-forecasting).

We have 145 weeks' worth of order data for a meal delivery service with 77 centers and 51 unique meal offerings. The goal is to forecast how many of each type of meal each center will order in the next 10 weeks.

First, import some standard libraries, including the ZipFile class from zipfile, which extracts data from a file within a zip archive.

In [1]:
from zipfile import ZipFile

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

Next, load the data as a Pandas dataframe.

In [2]:
with ZipFile('Data/meal_delivery_archive.zip') as zipArchive:
    with zipArchive.open('train.csv') as f:
        raw = pd.read_csv(f)
        
raw.head()

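| | id | week | center_id | meal_id | checkout_price | base_price | emailer_for_promotion | homepage_featured | num_orders |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1379560 | 1 | 55 | 1885 | 136.83 | 152.29 | 0 | 0 | 177 |
| 1 | 1466964 | 1 | 55 | 1993 | 136.83 | 135.83 | 0 | 0 | 270 |
| 2 | 1346989 | 1 | 55 | 2539 | 134.86 | 135.86 | 0 | 0 | 189 |
| 3 | 1338232 | 1 | 55 | 2139 | 339.5 | 437.53 | 0 | 0 | 54 |
| 4 | 1448490 | 1 | 55 | 2631 | 243.5 | 242.5 | 0 | 0 | 40 |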


Next, let's check how many meal delivery centers (represented by center_id), unique meals (meal_id) and weeks are in the dataset.

In [3]:
len(raw.center_id.unique())

77

In [4]:
len(raw.meal_id.unique())

51

In [5]:
len(raw.week.unique())

145

There are 77 centers, 51 meals and 145 weeks in the dataset.
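As a side note, the three counts above can also be read off in a single call with `DataFrame.nunique`. The sketch below uses a small made-up frame (`toy`, not the real data) so that it runs on its own.

```python
import pandas as pd

# Toy stand-in for `raw`; the values are made up for illustration only.
toy = pd.DataFrame({
    "week":      [1, 1, 2, 2, 3],
    "center_id": [55, 55, 55, 10, 10],
    "meal_id":   [1885, 1993, 1885, 1885, 2539],
})

# nunique() returns the number of distinct values in each selected column.
counts = toy[["center_id", "meal_id", "week"]].nunique()
print(counts)
```

On the real dataframe, the same call on `raw` should report the 77, 51 and 145 found above.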

In [6]:
data = raw.loc[:, ['week', 'center_id', 'meal_id', 'num_orders']].copy()

In [7]:
data.head()

| | week | center_id | meal_id | num_orders |
|---|---|---|---|---|
| 0 | 1 | 55 | 1885 | 177 |
| 1 | 1 | 55 | 1993 | 270 |
| 2 | 1 | 55 | 2539 | 189 |
| 3 | 1 | 55 | 2139 | 54 |
| 4 | 1 | 55 | 2631 | 40 |


When preparing the data for modeling with an LSTM (long short-term memory) neural network in Keras, it is important that we have a consistent order to the values for "num_orders" with no gaps. For example, if center 12 didn't order any of meal 1198 in week 3, we need to have a "num_orders" value of 0 at that index, not a missing row.

The cells below build such a series in two steps. First, they group the training data by week, center id and meal id. Then they create a list of expected indices containing every combination of week, center id and meal id, in order, and look up the number of orders for each one.

In [8]:
grouped_data = data.groupby(['week', 'center_id', 'meal_id']).sum()

In [9]:
grouped_data

| week | center_id | meal_id | num_orders |
|---|---|---|---|
| 1 | 10 | 1062 | 865 |
| 1 | 10 | 1109 | 2672 |
| 1 | 10 | 1198 | 269 |
| 1 | 10 | 1207 | 769 |
| 1 | 10 | 1216 | 54 |
| ... | ... | ... | ... |
| 145 | 186 | 2704 | 67 |
| 145 | 186 | 2707 | 175 |
| 145 | 186 | 2760 | 96 |
| 145 | 186 | 2826 | 162 |


Here, I am building the index for a series with every combination of week number, center id and meal id (in tuple form, in order) as the index.

In [10]:
%%time
# ⏰ this cell may take seconds to run

expected_indices = []

for week_no in range(1, 146):
    for center_id in sorted(data.center_id.unique()):
        for meal_id in sorted(data.meal_id.unique()):
            expected_indices.append((week_no, center_id, meal_id))

Then, I am iterating over every (week, center_id, meal_id) tuple in the expected index and filling in the number of orders from the dataset, if it is given, and otherwise, assuming that there were 0 orders for that meal to that center for that week.

In [11]:
# ⏰ this cell may take 20 seconds to run
filled_in_values = []

for index in expected_indices:
    try:
        num_orders = grouped_data.loc[index, 'num_orders']
    except KeyError:
        # no row for this (week, center_id, meal_id): assume 0 orders
        num_orders = 0
    filled_in_values.append(num_orders)

CPU times: user 19.7 s, sys: 160 ms, total: 19.9 s
Wall time: 19.9 s


In [12]:
pd.Series(filled_in_values, index=expected_indices, name='num_orders')

(1, 10, 1062)        865
(1, 10, 1109)       2672
(1, 10, 1198)        269
(1, 10, 1207)        769
(1, 10, 1216)         54
                    ... 
(145, 186, 2707)     175
(145, 186, 2760)      96
(145, 186, 2826)     162
(145, 186, 2867)      28
(145, 186, 2956)       0
Name: num_orders, Length: 569415, dtype: int64
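The nested loops and per-index lookup above can probably be replaced by a single `reindex` call, which tends to be much faster. The following is a minimal sketch on a made-up frame (`toy`), not the notebook's actual code; the names `full_index`, `filled` and `filled_values` are introduced here for illustration.

```python
import pandas as pd

# Toy stand-in for `data` (made-up values). Two combinations have no row:
# (week 1, center 20, meal 2) and (week 2, center 10, meal 2).
toy = pd.DataFrame({
    "week":       [1, 1, 1, 2, 2, 2],
    "center_id":  [10, 10, 20, 10, 20, 20],
    "meal_id":    [1, 2, 1, 1, 1, 2],
    "num_orders": [5, 7, 3, 4, 6, 8],
})

grouped = toy.groupby(["week", "center_id", "meal_id"])["num_orders"].sum()

# Every (week, center_id, meal_id) combination, week-major and sorted,
# matching the order produced by the triple loop.
full_index = pd.MultiIndex.from_product(
    [
        range(1, 3),  # week numbers; range(1, 146) for the real data
        sorted(toy["center_id"].unique()),
        sorted(toy["meal_id"].unique()),
    ],
    names=["week", "center_id", "meal_id"],
)

# Missing combinations become 0 instead of raising a KeyError.
filled = grouped.reindex(full_index, fill_value=0)
filled_values = filled.tolist()
print(filled)
```

Applied to the real `data`, the resulting list should match `filled_in_values`, assuming the same week range and sort order.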

The next step in preparing the data for modeling is to put it into the three-dimensional numpy array format expected by a Keras LSTM.

The expected shape is (sample_size, window_size, num_features), where window_size is how many weeks back the model looks when making its prediction for the next week and num_features is the number of values per week (here, one per center-meal combination).

So, for example, if we select a window_size of 5, our input array will look something like this:  

\[ \[ list of week 1 orders \], \[ list of week 2 orders \], ..., \[ list of week 5 orders \] \]
    
And the corresponding output will be the list of week 6 orders:

\[ list of week 6 orders \]

Note that each `list of week i orders` is very long: there are 77 centers and 51 meals that can be ordered by each center, so there are $77 \times 51 = 3927$ values for `num_orders` in a given week. These are in a predictable order due to our data preprocessing, so the model can learn the relationship for each center and meal combination.
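To make the "predictable order" concrete, here is a small sketch of where a given combination sits in the flat, week-major list. The helper `flat_position` is hypothetical (it is not part of the notebook), and it assumes 0-based ranks of the center and meal ids in sorted order.

```python
def flat_position(week_no, center_pos, meal_pos, num_centers=77, num_meals=51):
    """Index into the week-major flat list of order counts.

    week_no is 1-based; center_pos and meal_pos are 0-based ranks of the
    ids in sorted order (an assumption of this sketch).
    """
    per_week = num_centers * num_meals  # 3927 values per week
    return (week_no - 1) * per_week + center_pos * num_meals + meal_pos


# First value of week 1, first value of week 2, and the very last value.
print(flat_position(1, 0, 0), flat_position(2, 0, 0), flat_position(145, 76, 50))
```

The last position plus one equals the series length of 569415 shown earlier.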

The code below iterates over every training example for a given window size. It slices the "filled_in_values" list, which holds the number of orders for each week, center id and meal id *in order*, to create the numpy arrays for X and y described above.

The input array depends on the window size, so we set that first.

In [14]:
window_size = 5

# constant for this dataset:
num_center_meal_combos = 77*51

X = []
y = []
for i in range(1, 146-window_size):
    window = []
    for week_no in range(i, i + window_size):
        window_start = (week_no - 1)*num_center_meal_combos
        window_stop = week_no*num_center_meal_combos
        window.append(filled_in_values[window_start:window_stop])
    X.append(window)
    label_stop = (week_no + 1)*num_center_meal_combos
    label = filled_in_values[window_stop:label_stop]
    y.append(label)
X = np.array(X)
y = np.array(y)
print(X.shape, y.shape)

(140, 5, 3927) (140, 3927)
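Since different window sizes come up later, it may help to wrap the windowing in a function. The sketch below is one possible vectorized version, assuming NumPy 1.20 or newer for `sliding_window_view`; the function name `make_windows` and the toy input are made up for illustration.

```python
import numpy as np


def make_windows(values, num_weeks, combos_per_week, window_size):
    """Build (X, y) from a week-major flat list of order counts."""
    weekly = np.asarray(values).reshape(num_weeks, combos_per_week)
    # Result shape: (num_weeks - window_size + 1, combos_per_week, window_size);
    # the window axis is appended last.
    windows = np.lib.stride_tricks.sliding_window_view(weekly, window_size, axis=0)
    # Drop the final window (it has no following week) and move the
    # window axis to the middle: (samples, window_size, combos_per_week).
    X = windows[:-1].transpose(0, 2, 1).copy()
    y = weekly[window_size:]
    return X, y


# Toy input: 7 "weeks" with 3 values each.
X_toy, y_toy = make_windows(np.arange(21), num_weeks=7, combos_per_week=3, window_size=2)
print(X_toy.shape, y_toy.shape)
```

With `filled_in_values`, `num_weeks=145`, `combos_per_week=3927` and `window_size=5`, this should give the same shapes as the loop above.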


Check results:

In [15]:
X[:2, :, :10]

array([[[ 865, 2672,  269,  769,   54,  324,   53,  177,  595,   95],
        [ 782, 1864,  136,  458,   94,  458,   67,  285,  458,   81],
        [ 851, 1161,  418,  418,   96,  337,   27,  756,  445,   40],
        [1202, 1376,  243,  459,  230,  393,   69,  339,  366,   41],
        [ 958, 1511,  150,  312,  189,  162,   26,  501,  432,   15]],

       [[ 782, 1864,  136,  458,   94,  458,   67,  285,  458,   81],
        [ 851, 1161,  418,  418,   96,  337,   27,  756,  445,   40],
        [1202, 1376,  243,  459,  230,  393,   69,  339,  366,   41],
        [ 958, 1511,  150,  312,  189,  162,   26,  501,  432,   15],
        [1094, 2105,  176,  296,  148,   80,   82,   69,  743,   53]]])

The array above shows us the first two "windows" of our time series data, where each sub-list is a single time step (and, as such, we have 5 such lists with a window size of 5), and we see the number of orders for the first 10 meal-center combinations.

In the output array, we should see the corresponding *next* week of orders for the first 10 meal-center combinations:

In [16]:
y[:2, :10]

array([[1094, 2105,  176,  296,  148,   80,   82,   69,  743,   53],
       [1513, 1916,  405,  150,  149,  148,   15,   55,  431,   53]])

Now let's split the data into training and validation data. (There is a separate file with "test" data that we can use as a holdout test set.)

Because our goal is to predict 10 weeks out on the test set (the data in "test.csv"), let's use the last 10 weeks of "train.csv" data as validation data. To truly evaluate our model's performance, we would incorporate each new data point as training data after we test the model's prediction on that time step, but making predictions for all 10 time-steps at once without re-training will suffice initially.

In [17]:
X_train, X_val, y_train, y_val = X[:-10], X[-10:], y[:-10], y[-10:]

In [18]:
print(X_train.shape, X_val.shape, y_train.shape, y_val.shape)

(130, 5, 3927) (10, 5, 3927) (130, 3927) (10, 3927)
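The stricter evaluation mentioned above, where each validated time step is folded back into the training data, is often called walk-forward validation. The sketch below only generates the index splits; the helper name `walk_forward_splits` is hypothetical, and the re-training step itself is left out.

```python
def walk_forward_splits(num_samples, num_val):
    """Yield (train_indices, val_index) pairs with an expanding training set."""
    first_val = num_samples - num_val
    for val_index in range(first_val, num_samples):
        # Train on everything before the sample being validated.
        yield range(0, val_index), val_index


splits = list(walk_forward_splits(num_samples=140, num_val=10))
print(len(splits), splits[0], splits[-1])
```

Each step would then fit on `X[list(train_indices)]` and score the prediction for `X[val_index]`, at the cost of ten training runs instead of one.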


Now, we can build a predictive model.

Import the necessary libraries and packages.

In [19]:
from keras.models import Sequential
from keras.layers import LSTM, Dense

2023-04-10 13:19:27.485296: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


I want to use an LSTM layer, and I don't want the LSTM cell state to reset automatically after each batch, so I am going to do the following:

- Specify `stateful=True` when initializing the LSTM layer
- Specify the batch size when initializing the LSTM layer, which `stateful=True` requires (choose a batch size of 10, which divides evenly into both the training and validation sample sizes)
- Manually run the desired number of epochs and reset the cell state at the end of each one
- Manually log training and validation loss and accuracy at the end of each epoch
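Because the batch size is fixed at layer creation, it has to divide both sample counts evenly, as noted in the list above. A quick way to see which batch sizes qualify is sketched below; `valid_batch_sizes` is a hypothetical helper, not a Keras function.

```python
def valid_batch_sizes(*sample_counts):
    """Return every batch size that divides all given sample counts evenly."""
    smallest = min(sample_counts)
    return [
        b for b in range(1, smallest + 1)
        if all(n % b == 0 for n in sample_counts)
    ]


# 130 training samples and 10 validation samples, as in the split above.
print(valid_batch_sizes(130, 10))
```

A batch size of 10 is the largest option for this split.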

In [20]:
# specify batch size (for lstm layer), the width of the lstm layer and output size (centers*meals)
batch_size = 10
lstm_width = 20
output_size = y.shape[-1]

# specify network architecture: Sequential, with one LSTM stateful layer & one Dense layer with linear activation
baseline_model = Sequential()
baseline_model.add(LSTM(lstm_width,
                        activation='relu',
                        batch_input_shape=(batch_size, window_size, output_size),
                        stateful=True))
baseline_model.add(Dense(output_size))

# compile the model
baseline_model.compile(loss='mean_absolute_percentage_error', optimizer='adam', metrics=['acc'])



In [21]:
# number of epochs to train
num_epochs = 5

# record history from each epoch
history = {'loss': [], 'acc': [], 'val_loss': [], 'val_acc': []}

# manually train for num_epochs epochs
for i in range(num_epochs):
    baseline_model.fit(X_train,
                       y_train,
                       batch_size=batch_size,
                       validation_data=(X_val, y_val),
                       validation_batch_size=batch_size,
                       epochs=1)
    # record loss and accuracy for training and val this epoch (one value per key)
    for key in baseline_model.history.history.keys():
        history[key].extend(baseline_model.history.history[key])
    # reset LSTM cell state
    baseline_model.reset_states()



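Loss does decrease (for the training and validation sets), but there is still virtually no accuracy by the end of training. Note that accuracy is a classification metric, so it says little about a regression output like this one; the loss is the more informative number.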
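One thing worth keeping in mind when reading the loss values: the targets contain many zeros (from the filled-in missing rows), and a percentage error is very sensitive to them. The sketch below is a plain NumPy version of mean absolute percentage error, assuming the denominator is clipped to a small epsilon; that clipping and the value of `eps` are assumptions of this sketch rather than a statement about Keras internals.

```python
import numpy as np


def mape(y_true, y_pred, eps=1e-7):
    """Mean absolute percentage error with the denominator clipped to eps."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))


# Both predictions are 10% off: the loss is about 10.
no_zero = mape([100, 200], [110, 180])

# A prediction of 1 for a true value of 0 dominates everything else.
with_zero = mape([100, 0], [110, 1])

print(no_zero, with_zero)
```

A single off-by-one prediction on a zero target outweighs every other term, which may make this loss hard to optimize on data like this.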

In [22]:
# number of epochs to train
num_epochs = 20

# record history from each epoch
history = {'loss': [], 'acc': [], 'val_loss': [], 'val_acc': []}

# manually train for num_epochs epochs
for i in range(num_epochs):
    baseline_model.fit(X_train,
                       y_train,
                       batch_size=batch_size,
                       validation_data=(X_val, y_val),
                       validation_batch_size=batch_size,
                       epochs=1)
    # record loss and accuracy for training and val this epoch (one value per key)
    for key in baseline_model.history.history.keys():
        history[key].extend(baseline_model.history.history[key])
    # reset LSTM cell state
    baseline_model.reset_states()



Let's try:

- different window sizes
- different widths for the lstm layer
- an additional hidden layer

In [26]:
def get_lstm_model(lstm_width, batch_size):
    # reads the module-level output_size and window_size defined above
    model = Sequential()
    model.add(LSTM(lstm_width,
                   activation='relu',
                   batch_input_shape=(batch_size, window_size, output_size),
                   stateful=True))
    model.add(Dense(output_size))
    model.compile(loss='mean_absolute_percentage_error', optimizer='adam', metrics=['acc'])
    return model

In [27]:
wider_model = get_lstm_model(40, 10)

In [31]:
def train_and_evaluate(model, num_epochs, batch_size):
    # record history from each epoch
    history = {'loss': [], 'acc': [], 'val_loss': [], 'val_acc': []}
    
    for i in range(num_epochs):
        model.fit(X_train,
                  y_train,
                  batch_size=batch_size,
                  validation_data=(X_val, y_val),
                  validation_batch_size=batch_size,
                  epochs=1)
        # record loss and accuracy for training and val this epoch (one value per key)
        for key in model.history.history.keys():
            history[key].extend(model.history.history[key])
        # reset LSTM cell state
        model.reset_states()
    
    return history

In [33]:
history = train_and_evaluate(wider_model, num_epochs=10, batch_size=10)



In [35]:
X_train.shape

(130, 5, 3927)

In [34]:
output_size

3927

In [36]:
much_wider_model = get_lstm_model(3927, 10)

In [39]:
history = train_and_evaluate(much_wider_model, num_epochs=5, batch_size=10)



In [None]:
deeper_model = Sequential()
deeper_model.add(LSTM(3927,
                      activation='relu',
                      batch_input_shape=(batch_size, window_size, output_size),
                      stateful=True))
deeper_model.add(Dense(3927))