# Bonus assignment

**Joris LIMONIER**

---

In this assignment, we try to predict the number of passengers through time. We will use the airline dataset.


## Data Preprocessing

In [1]:
from pathlib import Path

import airline_passengers as ap
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import torch
import torch.nn as nn
import torch.optim as optim

pio.templates.default = "plotly_white"

In [2]:
%reload_ext autoreload
%autoreload 2

### Load the dataset


In [3]:
filepath = Path("airline_passenger.txt")
passengers = pd.read_csv(
  filepath,
  parse_dates=["date"],
  names=["date", "passengers"],
  index_col="date",
  header=0,
  dtype={"passengers": "float32"},
)
passengers

Unnamed: 0_level_0,passengers
date,Unnamed: 1_level_1
1949-01-01,112.0
1949-02-01,118.0
1949-03-01,132.0
1949-04-01,129.0
1949-05-01,121.0
...,...
1960-08-01,606.0
1960-09-01,508.0
1960-10-01,461.0
1960-11-01,390.0


### Split the dataset into train and test
We use 1/3 of the dataset for testing. The remaining 2/3 is further split into training and validation.

We also scale the data with respect to the training data as the validation and test data should not be used except for model evaluation and testing.

In [4]:
val_size = 0.1  # proportion of the training set is used for validation
test_size = 1 / 3  # proportion of the data is used for testing

train, val, test = ap.ttv_split(df=passengers, val_size=val_size, test_size=test_size)

# Scale the data
train, val, test = ap.scale_wrt(train, val, test, wrt=train, feature_range=(0, 1))
print(f"{len(train) = }, {len(val) = }, {len(test) = }")


len(train) = 81, len(val) = 15, len(test) = 48


We plot the train, validation and test sets with different colors.

In [5]:
ap.plot_tts(train=train, val=val, test=test)

## Model

We define the model and start training.

We define constants to use for training:
- `seq_length`: the number of time steps to use for training, *i.e.* the number of previous months to use to predict the next month
- `n_epochs`: the maximum number of epochs to train for
- `batch_size`: the batch size to use for training

In [6]:
seq_length = 1
n_epochs = 200
batch_size = 2
patience = 30
n_features = train.shape[1]
device = "cuda"

data_module = ap.PassengerDataModule(
  train=train,
  val=val,
  test=test,
  seq_length=seq_length,
  batch_size=batch_size,
  device=device,
)


# Create the model, optimizer and loss function
lstm = ap.PassengerLSTM(
  input_size=n_features, lstm_hidden_size=50, num_layers=4, output_size=1, device=device
)
optimizer = optim.AdamW(lstm.parameters(), lr=0.0002)
loss_fn = nn.MSELoss()

early_stopping = ap.EarlyStopping(patience=patience, verbose=True)

# Train the model
predictor = ap.PassengerPredictor(
  data_module=data_module, model=lstm, optimizer=optimizer, loss_fn=loss_fn
)
train_losses, val_losses = predictor.train(
  model=lstm,
  optimizer=optimizer,
  loss_fn=loss_fn,
  n_epochs=n_epochs,
  early_stopping=early_stopping,
  verbose=2
)


# Plot the losses
losses = pd.DataFrame({"train": train_losses, "val": val_losses})
px.line(losses, y=["train", "val"], title="Losses", log_y=True)


Epoch 0: train loss 0.1154, val loss 0.5431
Epoch 10: train loss 0.0497, val loss 0.2349
Epoch 20: train loss 0.0469, val loss 0.2116
Epoch 30: train loss 0.0410, val loss 0.1625
Epoch 40: train loss 0.0268, val loss 0.0637
Epoch 50: train loss 0.0106, val loss 0.0245
Epoch 60: train loss 0.0089, val loss 0.0288
Epoch 70: train loss 0.0084, val loss 0.0266
EarlyStopping counter: 25 / 30
EarlyStopping counter: 26 / 30
EarlyStopping counter: 27 / 30
EarlyStopping counter: 28 / 30
EarlyStopping counter: 29 / 30
EarlyStopping counter: 30 / 30
Restoring model from epoch 47 ...
Early stopping at epoch 77, best val loss 0.020216 (epoch 47)


In [7]:
y_pred = predictor.predict(model=lstm, dataloader=data_module.test_dataloader)
pred = pd.DataFrame(y_pred.flatten(), index=test.index[seq_length:], columns=["passengers"])

y_pred_val = predictor.predict(model=lstm, dataloader=data_module.val_dataloader)
pred_val = pd.DataFrame(y_pred_val.flatten(), index=val.index[seq_length:], columns=["passengers"])

fig = ap.plot_tts(train=train, val=val, test=test)
fig.add_trace(go.Scatter(x=pred.index, y=pred["passengers"], name="pred", mode="lines"))
fig.add_trace(go.Scatter(x=pred_val.index, y=pred_val["passengers"], name="pred_val", mode="lines"))

test_pred_error = ap.compute_pred_error(
  y_pred=y_pred,
  y_true=test["passengers"].values,
  loss_fn=loss_fn,
  seq_length=seq_length,
)
val_pred_error = ap.compute_pred_error(
  y_pred=y_pred_val,
  y_true=val["passengers"].values,
  loss_fn=loss_fn,
  seq_length=seq_length,
)

fig.layout.title.text += f". Val error: {val_pred_error:.4f}, Test error: {test_pred_error:.4f}"


We see that the predictions of the model for time $t+1$ pretty much consist of the value that is passed to it at time $t$. This is because the model does not have any data besides the value at time $t$ to predict the value at time $t+1$. Also, for a given value $v_t$ at time $t$, the model will always predict the same value $v_{t+1}$ at time $t+1$.

We can change this behavior by feeding more data into the model, such as the current year, the current month, the current season, etc. We will perform these changes in the next section.

## Feature engineering

### Add more features

We create extra features to feed into the model.


In [8]:
passengers_augmented = passengers.copy()
passengers_augmented["month"] = passengers_augmented.index.month
passengers_augmented["year"] = passengers_augmented.index.year
passengers_augmented["season"] = passengers_augmented.index.month % 12 // 3 + 1
passengers_augmented.head()

Unnamed: 0_level_0,passengers,month,year,season
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1949-01-01,112.0,1,1949,1
1949-02-01,118.0,2,1949,1
1949-03-01,132.0,3,1949,2
1949-04-01,129.0,4,1949,2
1949-05-01,121.0,5,1949,2


### Visualize the data
#### Number of passengers per month for each year

Let us visualize the number of passengers per month for each year.

In [9]:
px.line(
  passengers_augmented,
  x="month",
  y="passengers",
  color="year",
  title="Passengers by month",
).show()


We see that the number of passengers per year tends to increase for any given month (except on some rare occasions, _e.g._ february 1953, which had more passengers than february 1954). We expect that the model will be able to learn this behavior.

#### Number of passengers per month

Let us now visualize the number of passengers per month.


In [10]:
# Make a df with the month and number of passengers
passengers_month = (
  passengers_augmented.groupby("month").mean().reset_index().drop(columns="year")
)
px.bar(passengers_month, x="month", y="passengers", title="Average passengers by month").show()


We see that the number of passengers per month is not constant. Summer months tend to have more passengers than winter months. We expect that the model will be able to learn this behavior too.

#### Number of passengers per season

Let us now visualize the number of passengers per season. We encode the seasons as follows:
- 1 : winter
- 2 : spring
- 3 : summer
- 4 : fall

In [11]:
passengers_season = (
  passengers_augmented.groupby("season").mean().reset_index()[["season", "passengers"]]
)
fig = px.bar(
  passengers_season, x="season", y="passengers", title="Average passengers by season"
)
# Add season number and name to the x-axis
fig.update_xaxes(
  ticktext=["Winter", "Spring", "Summer", "Autumn"],
  tickvals=[1, 2, 3, 4],
  title_text="Season",
)

fig.show()


We see that as mentioned before, the number of passengers is higher in summer than in winter. Although the granularity of the `month` feature is higher, this `season` feature may give the model more general information.

### One-hot encoding

We replace the `month` column by a one-hot encoding of the month. We don't reproduce this procedure for the `year` column as months are taken from a closed and cyclic set. Years on the other hand will take values that were not present in the training set. Furthermore, years are properly ordered so if two years are close in our dataset, they are also close in the real world. Contrarily, months 12 (December) and 1 (January) are close in the real world but far in our dataset.

We also produce a one-hot encoding of the season for the same reasons. Note that this produces a column that corresponds to a `is_summer` feature, which could be useful and which we would have computed, had the one-hot encoding not produced it.


In [12]:
passengers_augmented = ap.ohe(df=passengers_augmented, columns=["month", "season"])
passengers_augmented.head()


Unnamed: 0_level_0,passengers,year,month_2,month_3,month_4,month_5,month_6,month_7,month_8,month_9,month_10,month_11,month_12,season_2,season_3,season_4
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
1949-01-01,112.0,1949,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1949-02-01,118.0,1949,1,0,0,0,0,0,0,0,0,0,0,0,0,0
1949-03-01,132.0,1949,0,1,0,0,0,0,0,0,0,0,0,1,0,0
1949-04-01,129.0,1949,0,0,1,0,0,0,0,0,0,0,0,1,0,0
1949-05-01,121.0,1949,0,0,0,1,0,0,0,0,0,0,0,1,0,0



#### Correlation between features

Let us now visualize the correlation between the features (dark colors mean high correlation).

In [13]:
# Plot the correlation between the new features and the target
corr = passengers_augmented.corr().round(4)
# Use blue red color scale
fig = px.imshow(
  corr.values,
  color_continuous_scale="blues",
  color_continuous_midpoint=0,
  title="Correlation between features",
)

fig.update_xaxes(
  title="Features", ticktext=corr.columns, tickvals=np.arange(len(corr.columns))
)
fig.update_yaxes(
  title="Features", ticktext=corr.columns, tickvals=np.arange(len(corr.columns))
)
fig.show()

We see that the target (`passengers`) is highly correlated with the `year` feature. This is expected as the number of passengers tends to increase over time. We also see that the target is highly correlated with the `season_3` feature. This is also expected as this feature represents the summer season, which is the season with the most passengers. Finally, we see that the target is correlated with the `month_7` and `month_8` features. This is also expected as these features represent the months of July and August, which are the months with the most passengers.


### Split the dataset into train, validation and test

We split the dataset into train, validation and test sets

In [14]:
val_size = 0.1  # proportion of the training set is used for validation
test_size = 1 / 3  # proportion of the data is used for testing

# Split the data
train, val, test = ap.ttv_split(df=passengers_augmented, val_size=val_size, test_size=test_size)

# Scale the data
train, val, test = ap.scale_wrt(train, val, test, wrt=train, feature_range=(0, 1))
print(f"{len(train) = }, {len(val) = }, {len(test) = }")


len(train) = 81, len(val) = 15, len(test) = 48


We now make predictions using all features computed above.

In [15]:
seq_length = 1
n_epochs = 2000
batch_size = 8
patience = 120
n_features = train.shape[1]
device = "cuda"

data_module = ap.PassengerDataModule(
  train=train,
  val=val,
  test=test,
  seq_length=seq_length,
  batch_size=batch_size,
  device=device,
  target_col="passengers",
)


# Create the model, optimizer and loss function
lstm = ap.PassengerLSTM(
  input_size=n_features,
  lstm_hidden_size=100,
  lstm_dropout=0.1,
  num_layers=3,
  output_size=1,
  device=device,
  fc_sizes=[50, 10],
)
optimizer = optim.AdamW(lstm.parameters(), lr=0.0001)
loss_fn = nn.MSELoss()
early_stopping = ap.EarlyStopping(patience=patience, verbose=2)

# Train the model
predictor = ap.PassengerPredictor(
  data_module=data_module, model=lstm, optimizer=optimizer, loss_fn=loss_fn
)
train_losses, val_losses = predictor.train(
  model=lstm,
  optimizer=optimizer,
  loss_fn=loss_fn,
  n_epochs=n_epochs,
  early_stopping=early_stopping,
)


# Plot the losses
losses = pd.DataFrame({"train": train_losses, "val": val_losses})
px.line(losses, y=["train", "val"], title="Losses", log_y=True)


Epoch 0: train loss 0.3208, val loss 1.0299
Epoch 10: train loss 0.1645, val loss 0.6934
Epoch 20: train loss 0.0417, val loss 0.2518
Epoch 30: train loss 0.0391, val loss 0.2055
Epoch 40: train loss 0.0351, val loss 0.1872
Epoch 50: train loss 0.0300, val loss 0.1681
Epoch 60: train loss 0.0266, val loss 0.1488
Epoch 70: train loss 0.0221, val loss 0.1264
Epoch 80: train loss 0.0191, val loss 0.1097
Epoch 90: train loss 0.0159, val loss 0.0872
Epoch 100: train loss 0.0126, val loss 0.0708
Epoch 110: train loss 0.0105, val loss 0.0536
EarlyStopping counter: 1 / 120
Epoch 120: train loss 0.0086, val loss 0.0402
EarlyStopping counter: 1 / 120
Epoch 130: train loss 0.0060, val loss 0.0271
EarlyStopping counter: 2 / 120
Epoch 140: train loss 0.0049, val loss 0.0157
EarlyStopping counter: 1 / 120
EarlyStopping counter: 2 / 120
EarlyStopping counter: 1 / 120
Epoch 150: train loss 0.0029, val loss 0.0083
EarlyStopping counter: 2 / 120
EarlyStopping counter: 1 / 120
EarlyStopping counter: 2 / 

In [16]:
y_pred = predictor.predict(model=lstm, dataloader=data_module.test_dataloader)
pred = pd.DataFrame(
  y_pred.flatten(), index=test.index[seq_length:], columns=["passengers"]
)

y_pred_val = predictor.predict(model=lstm, dataloader=data_module.val_dataloader)
pred_val = pd.DataFrame(
  y_pred_val.flatten(), index=val.index[seq_length:], columns=["passengers"]
)

test_pred_error = ap.compute_pred_error(
  y_pred=y_pred,
  y_true=test["passengers"].values,
  loss_fn=loss_fn,
  seq_length=seq_length,
)
val_pred_error = ap.compute_pred_error(
  y_pred=y_pred_val,
  y_true=val["passengers"].values,
  loss_fn=loss_fn,
  seq_length=seq_length,
)

fig = ap.plot_tts(train=train, val=val, test=test)
fig.add_trace(go.Scatter(x=pred.index, y=pred["passengers"], name="pred", mode="lines"))
fig.add_trace(
  go.Scatter(x=pred_val.index, y=pred_val["passengers"], name="pred_val", mode="lines")
)
# fig.update_layout(title="Passengers prediction")
fig.layout.title.text += f". Val error: {val_pred_error:.4f}, Test error: {test_pred_error:.4f}"
fig


### Hyperparameter search
We search for hyperparameters in a random search fashion. We use the validation set to evaluate the model.

From previous search, we discarded SGD as it repeatedly yielded the worst results.

In [17]:
hyperparameters_res = ap.hyperparameter_random_search(
  train=train,
  val=val,
  test=test,
  seq_length=1,
  n_epochs=1500,
  batch_sizes=[4, 8, 16, 32],
  patience=80,
  n_experiments=10,
  n_trials=1,
  lr_list=[0.00005, 0.0001, 0.001],
  optimizer_list=[optim.AdamW, optim.RMSprop, optim.Adagrad, optim.Adam],
  lstm_list_hidden_sizes=[50, 100, 200],
  lstm_list_dropout=[0.1, 0.2, 0.3],
  lstm_list_num_layers=[1, 2, 3],
  fc_list_sizes=[[50, 10], [50, 20, 10], [100, 50, 10], [200, 100, 50, 10]],
)


--> Experiment 0, trial 0 Epoch 0: train loss 0.0234, val loss 0.0915
Restoring model from epoch 18 ...
Early stopping at epoch 98, best val loss 0.001351 (epoch 18)
train error: 0.0020, val error: 0.0020, test error: 0.0138
--> Experiment 1, trial 0 Epoch 0: train loss 0.0772, val loss 0.3585
Restoring model from epoch 102 ...
Early stopping at epoch 182, best val loss 0.000380 (epoch 102)
train error: 0.0013, val error: 0.0007, test error: 0.0044
--> Experiment 2, trial 0 Epoch 0: train loss 0.0871, val loss 0.4935
Epoch 250: train loss 0.0013, val loss 0.0029
Epoch 500: train loss 0.0009, val loss 0.0015
Restoring model from epoch 592 ...
Early stopping at epoch 672, best val loss 0.000876 (epoch 592)
train error: 0.0012, val error: 0.0014, test error: 0.0069
--> Experiment 3, trial 0 Epoch 0: train loss 0.1025, val loss 0.4322
Epoch 250: train loss 0.0924, val loss 0.4018
Epoch 500: train loss 0.0887, val loss 0.3894
Epoch 750: train loss 0.0860, val loss 0.3801
Epoch 1000: train l

Here are the best hyperparameters found with respect to the validation loss:

In [18]:
hyperparameters_res.sort_values(by="val_error")

Unnamed: 0,test_error,val_error,train_error,experiment,trial,lr,opt,batch_size,lstm_hidden_size,lstm_dropout,lstm_num_layers,fc_sizes
1,0.00438,0.000709,0.001347,1,0,0.001,<class 'torch.optim.adam.Adam'>,8,200,0.2,2,"[50, 10]"
6,0.006914,0.00111,0.000979,6,0,0.0001,<class 'torch.optim.rmsprop.RMSprop'>,16,200,0.2,2,"[50, 20, 10]"
2,0.006855,0.001388,0.001155,2,0,5e-05,<class 'torch.optim.adam.Adam'>,8,200,0.2,2,"[100, 50, 10]"
9,0.011575,0.001689,0.001209,9,0,0.0001,<class 'torch.optim.rmsprop.RMSprop'>,4,50,0.1,3,"[200, 100, 50, 10]"
0,0.013811,0.001953,0.001974,0,0,0.001,<class 'torch.optim.rmsprop.RMSprop'>,4,200,0.2,2,"[50, 10]"
7,0.008579,0.00208,0.001802,7,0,0.001,<class 'torch.optim.adagrad.Adagrad'>,8,200,0.3,3,"[200, 100, 50, 10]"
4,0.007901,0.002486,0.000734,4,0,5e-05,<class 'torch.optim.rmsprop.RMSprop'>,32,50,0.0,1,"[50, 10]"
8,0.014078,0.004056,0.004696,8,0,0.001,<class 'torch.optim.adam.Adam'>,32,50,0.3,3,"[50, 10]"
5,0.009827,0.011909,0.008454,5,0,0.001,<class 'torch.optim.adagrad.Adagrad'>,32,100,0.3,3,"[50, 20, 10]"
3,0.978642,0.360367,0.08058,3,0,5e-05,<class 'torch.optim.adagrad.Adagrad'>,32,50,0.2,3,"[50, 10]"


In [19]:
seq_length = 1
n_epochs = 2000
batch_size = 8
patience = 120
lr = 0.001
n_features = train.shape[1]
device = "cuda"

data_module = ap.PassengerDataModule(
  train=train,
  val=val,
  test=test,
  seq_length=seq_length,
  batch_size=batch_size,
  device=device,
  target_col="passengers",
)


# Create the model, optimizer and loss function
lstm = ap.PassengerLSTM(
  input_size=n_features,
  lstm_hidden_size=200,
  lstm_dropout=0.2,
  num_layers=2,
  output_size=1,
  device=device,
  fc_sizes=[50, 10],
)
optimizer = optim.Adam(lstm.parameters(), lr=0.0001)
loss_fn = nn.MSELoss()
early_stopping = ap.EarlyStopping(patience=patience, verbose=2)

# Train the model
predictor = ap.PassengerPredictor(
  data_module=data_module, model=lstm, optimizer=optimizer, loss_fn=loss_fn
)
train_losses, val_losses = predictor.train(
  model=lstm,
  optimizer=optimizer,
  loss_fn=loss_fn,
  n_epochs=n_epochs,
  early_stopping=early_stopping,
)


# Plot the losses
losses = pd.DataFrame({"train": train_losses, "val": val_losses})
px.line(losses, y=["train", "val"], title="Losses", log_y=True)


Epoch 0: train loss 0.0771, val loss 0.4555
Epoch 10: train loss 0.0512, val loss 0.3278
Epoch 20: train loss 0.0397, val loss 0.2203
Epoch 30: train loss 0.0324, val loss 0.1727
Epoch 40: train loss 0.0249, val loss 0.1332
Epoch 50: train loss 0.0169, val loss 0.0943
Epoch 60: train loss 0.0108, val loss 0.0569
Epoch 70: train loss 0.0054, val loss 0.0248
Epoch 80: train loss 0.0025, val loss 0.0085
EarlyStopping counter: 1 / 120
Epoch 90: train loss 0.0015, val loss 0.0035
EarlyStopping counter: 2 / 120
EarlyStopping counter: 3 / 120
EarlyStopping counter: 4 / 120
EarlyStopping counter: 5 / 120
EarlyStopping counter: 1 / 120
EarlyStopping counter: 2 / 120
EarlyStopping counter: 3 / 120
EarlyStopping counter: 4 / 120
Epoch 100: train loss 0.0016, val loss 0.0027
EarlyStopping counter: 1 / 120
EarlyStopping counter: 2 / 120
EarlyStopping counter: 3 / 120
EarlyStopping counter: 4 / 120
EarlyStopping counter: 5 / 120
EarlyStopping counter: 1 / 120
EarlyStopping counter: 1 / 120
Epoch 110

Here are the predictions plotted with the train, validation and test sets.

In [20]:
y_pred = predictor.predict(model=lstm, dataloader=data_module.test_dataloader)
pred = pd.DataFrame(
  y_pred.flatten(), index=test.index[seq_length:], columns=["passengers"]
)

y_pred_val = predictor.predict(model=lstm, dataloader=data_module.val_dataloader)
pred_val = pd.DataFrame(
  y_pred_val.flatten(), index=val.index[seq_length:], columns=["passengers"]
)

test_pred_error = ap.compute_pred_error(
  y_pred=y_pred,
  y_true=test["passengers"].values,
  loss_fn=loss_fn,
  seq_length=seq_length,
)
val_pred_error = ap.compute_pred_error(
  y_pred=y_pred_val,
  y_true=val["passengers"].values,
  loss_fn=loss_fn,
  seq_length=seq_length,
)

fig = ap.plot_tts(train=train, val=val, test=test)
fig.add_trace(go.Scatter(x=pred.index, y=pred["passengers"], name="pred", mode="lines"))
fig.add_trace(
  go.Scatter(x=pred_val.index, y=pred_val["passengers"], name="pred_val", mode="lines")
)
# fig.update_layout(title="Passengers prediction")
fig.layout.title.text += f". Val error: {val_pred_error:.4f}, Test error: {test_pred_error:.4f}"
fig
