# Bonus assignment

**Joris LIMONIER**

---

In this assignment, we try to predict the monthly number of airline passengers over time, using the airline passengers dataset.


## Data Preprocessing

```python
from pathlib import Path

import airline_passengers as ap
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from torch.utils.data import DataLoader, Dataset, TensorDataset
from tqdm import tqdm

pio.templates.default = "plotly_white"
```

```python
%reload_ext autoreload
%autoreload 2
```

### Load the dataset


```python
filepath = Path("airline_passenger.txt")
passengers = pd.read_csv(
    filepath,
    parse_dates=["month"],
    names=["month", "passengers"],
    index_col="month",
    header=0,
    dtype={"passengers": "float32"},
)
passengers
```

| month | passengers |
|---|---|
| 1949-01-01 | 112.0 |
| 1949-02-01 | 118.0 |
| 1949-03-01 | 132.0 |
| 1949-04-01 | 129.0 |
| 1949-05-01 | 121.0 |
| ... | ... |
| 1960-08-01 | 606.0 |
| 1960-09-01 | 508.0 |
| 1960-10-01 | 461.0 |
| 1960-11-01 | 390.0 |


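### Split the dataset into train, validation and test sets
We use the last 1/3 of the dataset for testing. The remaining 2/3 is further split into a training set and a validation set, with a fraction `val_size` (10% in the code below) held out for validation.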

```python
val_size = 0.1  # fraction of the training portion held out for validation
test_size = 1 / 3  # fraction of the whole dataset used for testing

train, val, test = ap.ttv_split(df=passengers, val_size=val_size, test_size=test_size)

# Scale all three splits with respect to the training set
train, val, test = ap.scale_wrt(train, val, test, wrt=train)
```
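`ap.ttv_split` and `ap.scale_wrt` come from the `airline_passengers` module, whose code is not shown here. Purely as an illustration, a chronological split followed by min-max scaling fitted on the training split only could be sketched as below. The helper names `chronological_split` and `minmax_scale_wrt`, and the choice of min-max scaling, are assumptions and not necessarily what `ap` does.

```python
import numpy as np
import pandas as pd


def chronological_split(df: pd.DataFrame, val_size: float, test_size: float):
    """Hypothetical helper: split a time-indexed frame into contiguous blocks.

    The last `test_size` fraction of rows is the test set, and the last
    `val_size` fraction of the remaining rows is the validation set.
    Nothing is shuffled, so later months never leak into earlier splits.
    """
    n_test = int(round(len(df) * test_size))
    rest = df.iloc[: len(df) - n_test]
    n_val = int(round(len(rest) * val_size))
    train = rest.iloc[: len(rest) - n_val]
    val = rest.iloc[len(rest) - n_val :]
    test = df.iloc[len(df) - n_test :]
    return train, val, test


def minmax_scale_wrt(train, val, test, column="passengers"):
    """Hypothetical helper: min-max scale every split with the training min/max."""
    lo, hi = train[column].min(), train[column].max()

    def scale(part):
        return part.assign(**{column: (part[column] - lo) / (hi - lo)})

    return scale(train), scale(val), scale(test)


# Toy usage: 144 monthly points with a steady upward trend
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
df = pd.DataFrame({"passengers": np.arange(144, dtype="float32")}, index=idx)
train, val, test = chronological_split(df, val_size=0.1, test_size=1 / 3)
train_s, val_s, test_s = minmax_scale_wrt(train, val, test)
print(len(train), len(val), len(test))  # 86 10 48
```

With an upward-trending series like this toy one, the scaled test values land above 1, outside the range the model sees during training.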


We plot the train, validation and test sets with different colors.

```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=train.index, y=train.passengers, name="train", mode="lines"))
fig.add_trace(go.Scatter(x=val.index, y=val.passengers, name="val", mode="lines"))
fig.add_trace(go.Scatter(x=test.index, y=test.passengers, name="test", mode="lines"))
fig.update_layout(title="Airline passengers")
fig.show()
```


We define constants to use for training:
- `SEQ_LENGTH`: the length of each input window, *i.e.* the number of previous months used to predict the next month
- `N_EPOCHS`: the maximum number of epochs to train for
- `BATCH_SIZE`: the batch size to use for training
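The role of `SEQ_LENGTH` can be illustrated with a small sliding-window function. This is a sketch of the general technique, assuming `ap.PassengerDataModule` builds its samples this way; `make_windows` is a hypothetical name. Each input holds `SEQ_LENGTH` consecutive months and the target is the month right after them.

```python
import numpy as np


def make_windows(series: np.ndarray, seq_length: int):
    """Turn a 1-D series into (inputs, targets) pairs for next-step prediction.

    X has shape (n, seq_length, 1), matching an LSTM with input_size=1 and
    batch_first=True; y has shape (n, 1).
    """
    n = len(series) - seq_length
    X = np.stack([series[i : i + seq_length] for i in range(n)])
    y = series[seq_length:]
    return X[..., np.newaxis], y[:, np.newaxis]


series = np.array([112.0, 118.0, 132.0, 129.0, 121.0])  # first five months of the dataset
X, y = make_windows(series, seq_length=2)
# Windows: [112, 118], [118, 132], [132, 129]; targets: 132, 129, 121
print(X.shape, y.shape)  # (3, 2, 1) (3, 1)
```

This also explains why predictions are indexed by `test.index[SEQ_LENGTH:]` later on: the first `SEQ_LENGTH` months of a split only serve as inputs and never get a prediction of their own.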

```python
SEQ_LENGTH = 2
N_EPOCHS = 150
BATCH_SIZE = 4

data_module = ap.PassengerDataModule(
    train=train,
    val=val,
    test=test,
    seq_length=SEQ_LENGTH,
    batch_size=BATCH_SIZE,
)

# Create the model, optimizer and loss function
lstm = ap.PassengerLSTM(input_size=1, hidden_size=50, num_layers=3, output_size=1)
optimizer = optim.Adam(lstm.parameters(), lr=0.0002)
loss_fn = nn.MSELoss()

# Train the model
predictor = ap.PassengerPredictor(
    data_module=data_module, model=lstm, optimizer=optimizer, loss_fn=loss_fn
)
train_losses, val_losses = predictor.train(
    model=lstm,
    optimizer=optimizer,
    loss_fn=loss_fn,
    n_epochs=N_EPOCHS,
)

# Plot the losses
losses = pd.DataFrame({"train": train_losses, "val": val_losses})
px.line(losses, y=["train", "val"], title="Losses")
```


```text
Epoch 0: train loss 0.1104, val loss 0.5112
Epoch 1: train loss 0.0906, val loss 0.4546
Epoch 2: train loss 0.0755, val loss 0.4031
Epoch 3: train loss 0.0642, val loss 0.3566
Epoch 4: train loss 0.0566, val loss 0.3170
Epoch 5: train loss 0.0525, val loss 0.2863
Epoch 6: train loss 0.0509, val loss 0.2647
Epoch 7: train loss 0.0506, val loss 0.2509
Epoch 8: train loss 0.0507, val loss 0.2425
Epoch 9: train loss 0.0508, val loss 0.2373
Epoch 10: train loss 0.0507, val loss 0.2338
Epoch 11: train loss 0.0506, val loss 0.2312
Epoch 12: train loss 0.0504, val loss 0.2290
Epoch 13: train loss 0.0502, val loss 0.2270
Epoch 14: train loss 0.0499, val loss 0.2250
Epoch 15: train loss 0.0497, val loss 0.2229
Epoch 16: train loss 0.0494, val loss 0.2207
Epoch 17: train loss 0.0491, val loss 0.2184
Epoch 18: train loss 0.0487, val loss 0.2159
Epoch 19: train loss 0.0483, val loss 0.2132
Epoch 20: train loss 0.0479, val loss 0.2103
Epoch 21: train loss 0.0475, val loss 0.2071
Epoch 22: train loss
```
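The loop inside `ap.PassengerPredictor.train` is not shown. A minimal sketch of a loop that prints a log in the same format and keeps the weights with the lowest validation loss (consistent with loading "the best model" in the evaluation) might look as follows. `run_epoch` and `fit` are hypothetical names, and the toy usage trains a linear model instead of the LSTM so that it runs in a second.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, loss_fn, optimizer=None):
    """Return the mean per-sample loss over `loader`.

    With an optimizer this is a training pass (weights are updated); without
    one the model runs in eval mode with gradients disabled (validation pass).
    """
    training = optimizer is not None
    model.train(training)
    total, count = 0.0, 0
    with torch.set_grad_enabled(training):
        for X, y in loader:
            loss = loss_fn(model(X), y)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item() * len(X)
            count += len(X)
    return total / count


def fit(model, train_loader, val_loader, loss_fn, optimizer, n_epochs):
    """Train for `n_epochs`, then restore the weights with the lowest validation loss."""
    best_val, best_state = float("inf"), copy.deepcopy(model.state_dict())
    train_losses, val_losses = [], []
    for epoch in range(n_epochs):
        train_losses.append(run_epoch(model, train_loader, loss_fn, optimizer))
        val_losses.append(run_epoch(model, val_loader, loss_fn))
        if val_losses[-1] < best_val:
            best_val = val_losses[-1]
            best_state = copy.deepcopy(model.state_dict())
        print(f"Epoch {epoch}: train loss {train_losses[-1]:.4f}, val loss {val_losses[-1]:.4f}")
    model.load_state_dict(best_state)
    return train_losses, val_losses


# Toy usage: learn y = x1 + x2 with a linear model (stand-in for the LSTM)
torch.manual_seed(0)
X = torch.randn(64, 2)
y = X.sum(dim=1, keepdim=True)
train_loader = DataLoader(TensorDataset(X[:48], y[:48]), batch_size=4, shuffle=True)
val_loader = DataLoader(TensorDataset(X[48:], y[48:]), batch_size=4)
model = nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
train_losses, val_losses = fit(model, train_loader, val_loader, nn.MSELoss(), optimizer, n_epochs=50)
```

Restoring the best state at the end, rather than keeping the last one, is a simple form of early stopping: the 150 epochs act as an upper bound and overfitting late in training does not affect the returned model.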

```python
y_pred = predictor.predict(lstm, data_module.test_dataloader)
pred = pd.DataFrame(y_pred.flatten(), index=test.index[SEQ_LENGTH:], columns=["passengers"])

y_pred_val = predictor.predict(lstm, data_module.val_dataloader)
pred_val = pd.DataFrame(y_pred_val.flatten(), index=val.index[SEQ_LENGTH:], columns=["passengers"])
```


```text
DatetimeIndex(['1957-01-01', '1957-02-01', '1957-03-01', '1957-04-01',
               '1957-05-01', '1957-06-01', '1957-07-01', '1957-08-01',
               '1957-09-01', '1957-10-01', '1957-11-01', '1957-12-01',
               '1958-01-01', '1958-02-01', '1958-03-01', '1958-04-01',
               '1958-05-01', '1958-06-01', '1958-07-01', '1958-08-01',
               '1958-09-01', '1958-10-01', '1958-11-01', '1958-12-01',
               '1959-01-01', '1959-02-01', '1959-03-01', '1959-04-01',
               '1959-05-01', '1959-06-01', '1959-07-01', '1959-08-01',
               '1959-09-01', '1959-10-01', '1959-11-01', '1959-12-01',
               '1960-01-01', '1960-02-01', '1960-03-01', '1960-04-01',
               '1960-05-01', '1960-06-01', '1960-07-01', '1960-08-01',
               '1960-09-01', '1960-10-01', '1960-11-01', '1960-12-01'],
              dtype='datetime64[ns]', name='month', freq=None)
```
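`predictor.predict` is also part of the `airline_passengers` module and is not shown. A minimal sketch, assuming it simply runs the model over a dataloader and stacks the outputs, is given below; `predict` here is a hypothetical stand-in. The important detail is that the loader must not shuffle, otherwise the flattened predictions no longer line up with the dates from `test.index[SEQ_LENGTH:]` they are assigned to.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def predict(model: nn.Module, loader: DataLoader) -> np.ndarray:
    """Run `model` over every batch of `loader` and stack the outputs in order."""
    model.eval()  # disable dropout for inference
    return torch.cat([model(X) for X, _ in loader]).numpy()


# Toy usage with a non-shuffled loader so outputs keep the input order
X = torch.arange(10, dtype=torch.float32).reshape(5, 2)
loader = DataLoader(TensorDataset(X, torch.zeros(5, 1)), batch_size=2, shuffle=False)
model = nn.Linear(2, 1)
preds = predict(model, loader)
print(preds.shape)  # (5, 1)
```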

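```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=train.index, y=train["passengers"], name="train", mode="lines"))
fig.add_trace(go.Scatter(x=val.index, y=val["passengers"], name="val", mode="lines"))
fig.add_trace(go.Scatter(x=test.index, y=test["passengers"], name="test", mode="lines"))
fig.add_trace(go.Scatter(x=pred_val.index, y=pred_val["passengers"], name="val pred", mode="lines"))
fig.add_trace(go.Scatter(x=pred.index, y=pred["passengers"], name="test pred", mode="lines"))
fig.update_layout(title="Airline passengers")
fig.show()
```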


## Model

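We define the model, an `ap.PassengerLSTM` with 3 stacked LSTM layers of hidden size 50, and train it with Adam (learning rate 0.0002) on the mean squared error loss in the training cell above.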
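The internals of `ap.PassengerLSTM` are not shown. A plausible minimal version, assuming it is a stacked `nn.LSTM` followed by a linear layer applied to the last time step, could look like this; `StackedLSTMRegressor` is a hypothetical name, and the `dropout` argument mirrors one of the settings the evaluation says was tried.

```python
import torch
import torch.nn as nn


class StackedLSTMRegressor(nn.Module):
    """Hypothetical stand-in for ap.PassengerLSTM: stacked LSTM + linear head."""

    def __init__(self, input_size=1, hidden_size=50, num_layers=3, output_size=1, dropout=0.0):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,  # inputs are (batch, seq_length, input_size)
            dropout=dropout,  # applied between stacked layers, not after the last one
        )
        self.head = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)  # out: (batch, seq_length, hidden_size)
        return self.head(out[:, -1])  # predict from the last time step only


model = StackedLSTMRegressor()
x = torch.randn(4, 2, 1)  # BATCH_SIZE=4 windows of SEQ_LENGTH=2 months
print(model(x).shape)  # torch.Size([4, 1])
```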

## Evaluation

We load the best model and evaluate it on the test set. However, the model does not work: it outputs the same value at every time step. We tried different architectures and training settings (hidden size, number of stacked layers, dropout, learning rate, train/val/test splits), but always obtained the same result. One possible explanation is that the data should be normalized before training. I doubt it, however: the splits are already rescaled with respect to the training set by `ap.scale_wrt`, and in my opinion normalization is mostly necessary when several features have different scales. In our case, there is only one feature (the number of passengers), so the network should be able to learn its scale.
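One way to quantify this failure is to measure the spread of the predictions and compare the test error with a naive persistence forecast, which predicts each month with the previous month's true value. The helper `diagnose` below is hypothetical; in the notebook it could be called as `diagnose(test["passengers"].values[SEQ_LENGTH:], y_pred)`, assuming `y_pred` is a NumPy array on the same scaled axis and aligned with `test.index[SEQ_LENGTH:]`, as in the construction of `pred`.

```python
import numpy as np


def diagnose(y_true, y_pred, tol=1e-3):
    """Check for constant predictions and compare with a persistence forecast.

    `y_true` and `y_pred` are aligned 1-D arrays on the same (scaled) axis.
    The persistence forecast needs the previous month, so both MSEs are
    computed on y_true[1:] to cover exactly the same months.
    """
    y_true = np.ravel(np.asarray(y_true, dtype=float))
    y_pred = np.ravel(np.asarray(y_pred, dtype=float))
    return {
        "pred_std": float(np.std(y_pred)),
        "is_constant": bool(np.std(y_pred) < tol),
        "model_mse": float(np.mean((y_true[1:] - y_pred[1:]) ** 2)),
        "naive_mse": float(np.mean((y_true[1:] - y_true[:-1]) ** 2)),
    }


# Toy usage: a rising series and a model stuck at a single value
report = diagnose([1.0, 1.2, 1.1, 1.4, 1.6], np.full(5, 0.9))
print(report["is_constant"])  # True
```

A model whose MSE stays above the persistence baseline has not learned anything useful about the month-to-month dynamics, whatever its training loss looks like.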

I also emailed abid.ali@inria.fr and francois.bremond@inria.fr to ask for help, but did not receive a response. This is a pity: I would have liked to know what I did wrong, and I believe only small changes are needed to make the model work.