[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nkeriven/ensta-mt12/blob/main/notebooks/07_RNN_LSTM/N2_LSTM_example.ipynb)

LSTM example on forecasting a scalar temperature value from a vector of past values. 

This notebook is largely inspired by
https://www.tensorflow.org/tutorials/structured_data/time_series
which runs under TensorFlow 2.



In [None]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow.keras as keras
from tensorflow import random

# fix the seeds to ensure reproducibility across runs
np.random.seed(1)
random.set_seed(2)  # TensorFlow 2 API (TensorFlow 1 used set_random_seed)

This tutorial uses a weather time series dataset recorded by the Max Planck Institute for Biogeochemistry.

This dataset contains 14 different features such as air temperature, atmospheric pressure, and humidity. These were collected every 10 minutes, beginning in 2003. For efficiency, you will use only the data collected between 2009 and 2016. This section of the dataset was prepared by François Chollet for his book Deep Learning with Python.

In [None]:
zip_path = keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip', extract= True)
csv_path, _ = os.path.splitext(zip_path)
# csv_path
df = pd.read_csv(csv_path)
#filename = "jena_climate_2009_2016.csv"
#df = pd.read_csv(filename) #This is a pandas DataFrame
df.head()

An observation is recorded every 10 minutes. This means that, for a single hour, you will have 6 observations. Similarly, a single day will contain 144 (6x24) observations.

Given a specific time, let's say you want to predict the temperature 6 hours in the future. To make this prediction, you choose to use a set of N preceding observations. Thus, you would create a window containing the last N points. For example, 5 days of observations correspond to a window of 720 (5x144) points. Many such configurations are possible, making this dataset a good one to experiment with.
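
The window arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names below are hypothetical and are not used elsewhere in the notebook.

```python
# Sampling period of the dataset: one observation every 10 minutes
minutes_per_obs = 10

obs_per_hour = 60 // minutes_per_obs          # 6 observations per hour
obs_per_day = 24 * obs_per_hour               # 144 observations per day

history_days = 5                              # length of the history window, in days
window_size = history_days * obs_per_day      # 720 observations

horizon_hours = 6                             # how far ahead we want to predict
horizon_steps = horizon_hours * obs_per_hour  # 36 observations ahead

print(obs_per_hour, obs_per_day, window_size, horizon_steps)
```

A 6-hour horizon therefore corresponds to a target located 36 samples ahead.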


##  Forecast a univariate time series

Here, you will train a model using only a single feature (temperature), and use it to make predictions for that value in the future.

Let's first extract only the temperature from the dataset.


In [None]:
# Extract univariate time series
Temp = df["T (degC)"]
Temp.index = df["Date Time"]
print(f"the time series contains {Temp.shape[0]} samples")
print("\n First five rows:")
Temp.head()

In [None]:
# plot time series
Temp.plot(subplots=True, figsize=(12, 6))

### Rescaling

LSTMs are sensitive to the scale of the input data, in particular because of the saturating sigmoid and tanh activation functions they use (in Keras, the LSTM layer defaults to tanh for the activation and to sigmoid for the recurrent activation). It is good practice to rescale the data to the range 0 to 1 (if sigmoid activation) or -1 to +1 (if tanh); this is also called normalizing. We can easily normalize the dataset using the MinMaxScaler preprocessing class from the scikit-learn library.
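
For reference, min-max scaling to a target range $[a, b]$ is the affine map written below, together with its inverse. This is the standard definition; the symbols $a$, $b$, $x_{\min}$ and $x_{\max}$ (the minimum and maximum of the data used to fit the scaler) are notation introduced here and do not appear in the code.

$$
x' = a + \frac{(x - x_{\min})\,(b - a)}{x_{\max} - x_{\min}},
\qquad
x = x_{\min} + \frac{(x' - a)\,(x_{\max} - x_{\min})}{b - a}
$$

With the range $[-1, 1]$ this reduces to $x' = 2\,\dfrac{x - x_{\min}}{x_{\max} - x_{\min}} - 1$.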


In [None]:
data = np.array(Temp.values, dtype='float32')[:,np.newaxis]
data.shape

In [None]:
from sklearn.preprocessing import MinMaxScaler

data = np.array(Temp.values, dtype='float32')[:,np.newaxis]

#data = np.reshape(Temp.values, (420551, 1))
#data = data.astype("float32")

scaler = MinMaxScaler(feature_range=(-1, 1))  # compatibility with the choice activation=tanh
data_sc = scaler.fit_transform(data)

print(data[:5,])
scaler.inverse_transform(data_sc)[:5]

#### Exercise
Implement the min-max scaling yourself and verify your result against the output of MinMaxScaler.

### Define training and testing sets

In [None]:
# split into train and test sets
train_size = 300000
test_size = len(data_sc) - train_size
train, test = data_sc[0:train_size, :], data_sc[train_size : len(data_sc), :]
print(len(train), len(test))

### Question
Why don't we use the train/test splitting function from sklearn?

### Rearrange the data into matrix form

We need to create a dataset of $p$-dimensional samples and labels. Each time index $i$ corresponds to one sample: the $p$ preceding values $[x_{i-p}, \ldots, x_{i-2}, x_{i-1}]$ are used as the sample, from which we want to predict the future value $x_{i+\ell}$.

The following function creates such a data matrix. In the code, $p$ is called look_back and $\ell$ is called forecast.

In [None]:
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1, forecast=0):
    dataX, dataY = [], []
    for i in range(look_back, len(dataset) - forecast):  # current time index is i
        a = dataset[i-look_back : i, 0]
        dataX.append(a)
        dataY.append(
            dataset[i + forecast, 0]
        )  # from a (the values before index i), predict the value at index i + forecast
    return np.array(dataX), np.array(dataY)
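
To see which values end up in each window, here is a minimal sketch on a toy series. The helper `make_windows` is a hypothetical name introduced for this illustration; it follows the same indexing as `create_dataset` but works on a 1-D array so that the block is self-contained.

```python
import numpy as np


def make_windows(series, look_back=1, forecast=0):
    """Hypothetical helper mirroring create_dataset on a 1-D array."""
    X, y = [], []
    for i in range(look_back, len(series) - forecast):
        X.append(series[i - look_back : i])  # the look_back values before index i
        y.append(series[i + forecast])       # the value at index i + forecast
    return np.array(X), np.array(y)


series = np.arange(10)  # toy series 0, 1, ..., 9
X, y = make_windows(series, look_back=3, forecast=2)

print(X.shape, y.shape)  # (5, 3) (5,)
print(X[0], y[0])        # [0 1 2] 5
print(X[-1], y[-1])      # [4 5 6] 9
```

A series of length n yields n - look_back - forecast samples, and the target of the first sample sits at index look_back + forecast.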

In the example below, only 20 past observations are used to predict the temperature value 6 samples ahead of the current time index (this corresponds to 1 hour).

In [None]:
# build the input windows (X) and the targets (Y) located `forecast` steps ahead
look_back = 20
forecast = 6
trainX, trainY = create_dataset(train, look_back, forecast)
testX, testY = create_dataset(test, look_back, forecast)
trainX.shape
trainY = trainY.reshape((-1, 1))
testY = testY.reshape((-1, 1))
print(f'Number of training samples:{trainX.shape[0]}, number of test samples:{testX.shape[0]}')

### Reshape the input for the LSTM

The LSTM network expects the input data (X) to be provided with a specific array structure in the form of: [samples, time steps, features].

Currently, our data is in the form [samples, look_back]: each sample is a sequence of look_back time steps, with a single feature (the temperature) per time step. We can transform the prepared train and test input data into the expected structure by adding a trailing axis of size 1 with np.newaxis, as follows:
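
As a toy illustration of this change of shape (the array below is hypothetical and much smaller than the real data), adding a trailing axis with `np.newaxis` gives the same result as an explicit `reshape`:

```python
import numpy as np

# hypothetical toy input: 4 samples, each a window of 3 past values
X2d = np.arange(12, dtype="float32").reshape(4, 3)

# add a trailing "features" axis of size 1
X3d = X2d[:, :, np.newaxis]
# equivalent formulation with reshape
X3d_alt = X2d.reshape(X2d.shape[0], X2d.shape[1], 1)

print(X2d.shape, X3d.shape)          # (4, 3) (4, 3, 1)
print(np.array_equal(X3d, X3d_alt))  # True
```
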

In [None]:
# reshape input to be [samples, time steps, features]
trainX = trainX[:,:,np.newaxis]
testX = testX[:,:,np.newaxis]
trainX.shape

In [None]:
# May be useful to test the code on shorter data set...

# trainX=trainX[:200,:,:]
# trainY=trainY[:200]
# testX=testX[:200,:,:]
# testY=testY[:200]
# trainY.shape

In [None]:
# Save the original datasets
trainX_orig, trainY_orig, testX_orig, testY_orig = trainX, trainY, testX, testY

## Learn an LSTM model

A mini-batch method is used during the optimization phase. The batch size is set here to 256 samples, i.e. 256 consecutive (in time) vectors X of look_back (=20) numerical values, and the corresponding outputs Y.

In Keras, the hidden and cell states are initialized to zero by default. The parameter shuffle has to be set to shuffle=False, as shuffling the data would destroy the time dependencies between successive batches that the stateful LSTM is meant to exploit.

With stateful=True, Keras does not reset the states by itself at the end of an epoch, so we reset them explicitly after each epoch.
Only the hidden and cell states are reset: the weights of the internal networks of the LSTM (recall that there are 4 of them) keep their current values.
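
Since the batch size is fixed, the code in this notebook truncates the training set to a whole number of batches. The arithmetic can be sketched as follows, using the values chosen in this notebook; the names `n_samples`, `n_batches` and `n_kept` are introduced here for illustration only.

```python
# Values taken from this notebook's settings
train_size = 300000
look_back = 20
forecast = 6
size_of_batch = 256

n_samples = train_size - look_back - forecast  # windows produced by create_dataset
n_batches = n_samples // size_of_batch         # number of complete batches
n_kept = n_batches * size_of_batch             # samples actually used for training

print(n_samples, n_batches, n_kept, n_samples - n_kept)
```

With these settings, 198 of the 299974 training windows are dropped, which is negligible.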




In [None]:
#import math

from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

size_of_batch = 256

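# truncate trainX and trainY so that their length is a multiple of size_of_batch
L = (len(trainX_orig) // size_of_batch) * size_of_batch
trainX = trainX_orig[:L, :, :]
trainY = trainY_orig[:L]

# the target of sample k sits at index k + look_back + forecast of the series
offset = look_back + forecast
trainDate = Temp.index[offset : offset + L]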
df_train_scaled = pd.DataFrame(index=trainDate)
df_train_scaled["Actual"] = trainY

In [None]:
trainDate

In [None]:
nb_epochs = 10
neurons = 8  # size of both the hidden state (h) and the cell state (C)
# create and fit the LSTM network
model = Sequential()
model.add(
    LSTM(
        neurons,
        activation="tanh",
        recurrent_activation="sigmoid",
        batch_input_shape=(size_of_batch, trainX.shape[1], 1),
        stateful=True,
    )
)
model.add(Dense(1))
# use the Adam optimizer (a stochastic gradient method)
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss="mean_squared_error", optimizer=opt)
model.summary()

It is usual to reset the hidden states of an RNN between epochs (technically, we should even do it between batches). See https://adgefficiency.com/tf2-lstm-hidden/

In [None]:
nb_epochs = 5
# Train the network
for i in range(nb_epochs):
    model.fit(
        trainX, trainY, epochs=1, batch_size=size_of_batch, verbose=1, shuffle=False
    )
    model.reset_states()

In [None]:
# make predictions
trainPredict = model.predict(trainX, batch_size=size_of_batch)
df_train_scaled["Predict"] = trainPredict

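# truncate testX and testY so that their length is a multiple of size_of_batch
L_test = (len(testX_orig) // size_of_batch) * size_of_batch
testX = testX_orig[:L_test, :, :]
testY = testY_orig[:L_test]

# the target of test sample k sits at index train_size + k + look_back + forecast
test_start = train_size + look_back + forecast
testDate = Temp.index[test_start : test_start + L_test]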
testPredict = model.predict(testX, batch_size=size_of_batch)
df_test_scaled = pd.DataFrame(index=testDate)
df_test_scaled["Actual"] = testY
df_test_scaled["Predict"] = testPredict

In [None]:
print(testDate)

In [None]:
import matplotlib.pyplot as plt

df_test_scaled.plot(subplots=False, figsize=(18, 6))

## Back in the original space
To evaluate RMSE performance and visualize the results in the original space, one needs to rescale back the data to their original scale. 

In [None]:
# invert predictions
trainPredict = scaler.inverse_transform(df_train_scaled["Predict"].values.reshape(-1, 1))
trainY = scaler.inverse_transform(df_train_scaled["Actual"].values.reshape(-1, 1))
testPredict = scaler.inverse_transform(df_test_scaled["Predict"].values.reshape(-1, 1))
testY = scaler.inverse_transform(df_test_scaled["Actual"].values.reshape(-1, 1))

df_test = pd.DataFrame(index=testDate)
df_test["Actual"] = testY
df_test["Predict"] = testPredict
df_test.plot(subplots=False, figsize=(15, 6))

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

# calculate root mean squared error
trainScore = np.sqrt(mean_squared_error(trainY, trainPredict))
r2_trainScore = r2_score(trainY, trainPredict)
print(
    "Train Score:\tRMSE is {:.2f} (°C),\t R^2 is {:2.1f}%".format(
        trainScore, 100 * r2_trainScore
    )
)
testScore = np.sqrt(mean_squared_error(testY, testPredict))
r2_testScore = r2_score(testY, testPredict)
print(
    "Test Score:\tRMSE is {:.2f} (°C),\t R^2 is {:2.1f}%".format(testScore, 100 * r2_testScore)
)
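
For readers who want to see what these two scores compute, here is a minimal NumPy sketch on hypothetical toy values (not results from the model). It follows the usual definitions of the RMSE and of the coefficient of determination $R^2$.

```python
import numpy as np

# hypothetical toy values, in °C
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

residuals = y_true - y_pred
rmse = np.sqrt(np.mean(residuals**2))           # root mean squared error
ss_res = np.sum(residuals**2)                   # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination

print(f"RMSE = {rmse:.3f}, R^2 = {100 * r2:.1f}%")  # RMSE = 0.354, R^2 = 90.0%
```

The RMSE is expressed in the unit of the data (here °C), which is why it is evaluated after the inverse transform, whereas $R^2$ is unitless.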