# LSTM (tuning)
**This is the secondary notebook for the LSTM model.** Please refer to *LSTM.ipynb* for the main notebook.

In this notebook, we perform a grid search for the combination of hyperparameters (```timestep``` and number of LSTM nodes) that gives the lowest ```val_loss```. This took a long time because we not only tested many combinations but also trained 10 separate models per configuration and averaged their validation loss. Averaging was necessary because LSTM models built and trained with the same configuration give noticeably different results from run to run.

**We created a separate notebook as the process took about 7 hours to run.**

The sections are as follows:
1. [Data Importing and Normalization](#1.-Data-Importing-and-Normalization)
2. [Functions](#2.-Functions): Helper functions for other sections
3. [Early Stopping](#3.-Effect-of-Early-Stopping): Investigation of the effect of ```EarlyStopping```
4. [Grid Search](#4.-Grid-Search): Actual grid search

In [17]:
import numpy as np
import pandas as pd
import time
from tensorflow import keras
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

# 1. Data Importing and Normalization

In [3]:
data = pd.read_csv("aapl.us.txt")
data.head()

|   | Date | Open | High | Low | Close | Volume | OpenInt |
|---|------|------|------|-----|-------|--------|---------|
| 0 | 1984-09-07 | 0.42388 | 0.42902 | 0.41874 | 0.42388 | 23220030 | 0 |
| 1 | 1984-09-10 | 0.42388 | 0.42516 | 0.41366 | 0.42134 | 18022532 | 0 |
| 2 | 1984-09-11 | 0.42516 | 0.43668 | 0.42516 | 0.42902 | 42498199 | 0 |
| 3 | 1984-09-12 | 0.42902 | 0.43157 | 0.41618 | 0.41618 | 37125801 | 0 |
| 4 | 1984-09-13 | 0.43927 | 0.44052 | 0.43927 | 0.43927 | 57822062 | 0 |


In [6]:
raw_data = data['Close'] # raw closing prices (non-stationary)
diff_data = data['Close'].diff() # day-to-day differences (stationary); the first value is NaN
raw_data = raw_data.values.reshape(-1,1)

# Perform train:val:test split of 0.6 : 0.2 : 0.2
# Input data is the stationary (differenced) series
split1, split2 = int(len(data)*0.60), int(len(data)*0.80)
train_data, val_data, test_data = diff_data[:split1], diff_data[split1:split2], diff_data[split2:]
training_set, val_set = train_data.values.reshape(-1,1), val_data.values.reshape(-1,1) # reshape into column vectors for MinMaxScaler

# Normalization: fit the scaler on the training set only, then apply it to the validation set
# (MinMaxScaler ignores the leading NaN when fitting and keeps it as NaN when transforming)
scaler = MinMaxScaler(feature_range = (0,1))
scaled_training_set = scaler.fit_transform(training_set)
scaled_val_set = scaler.transform(val_set)
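A minimal NumPy sketch of the two preprocessing steps above, on a made-up price series (the values are illustrative, not taken from `aapl.us.txt`): first-order differencing, then min-max scaling whose minimum and maximum come from the training split only. It also shows how both steps invert, which is what turning scaled predictions back into prices would require. The helpers `scale` and `unscale` are hypothetical; the notebook itself uses `pandas.Series.diff` and `MinMaxScaler`.

```python
import numpy as np

# Illustrative closing prices (not from the dataset)
prices = np.array([10.0, 11.0, 10.5, 12.0, 11.5, 13.0, 12.5, 14.0, 13.0, 15.0])

# First-order differencing; pandas' .diff() would put NaN in slot 0,
# whereas np.diff simply returns one element fewer
diffs = np.diff(prices)

# 60:20:20 split by position, as in the notebook
split1, split2 = int(len(diffs) * 0.6), int(len(diffs) * 0.8)
train, val, test = diffs[:split1], diffs[split1:split2], diffs[split2:]

# Min-max scaling to [0, 1] using statistics from the training split only
lo, hi = train.min(), train.max()

def scale(x):
    return (x - lo) / (hi - lo)

def unscale(x):
    return x * (hi - lo) + lo

scaled_train, scaled_val = scale(train), scale(val)

# Values outside the training range fall outside [0, 1]
print(scale(test).tolist())  # [-0.25, 1.25]

# Undo both steps: unscale the differences, then accumulate them onto the first price
restored_diffs = unscale(scale(diffs))
restored_prices = prices[0] + np.concatenate([[0.0], np.cumsum(restored_diffs)])
print(np.allclose(restored_prices, prices))  # True
```

The test-split values landing at -0.25 and 1.25 mirror what `MinMaxScaler` does by default (its `clip` option is off), so scaled validation or test data is not guaranteed to stay inside `feature_range`.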

# 2. Functions

## Functions for Data Processing & Model Building

In [7]:
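# Prepare data for autoregression: each sample is the previous `timestep` values,
# and its target is the value that immediately follows them
def prepare_autoreg(data, timestep, train=True):
    x, y = [], []
    # The training set starts one step later so no window includes index 0 (the NaN from .diff())
    start = timestep+1 if train else timestep
    for i in range(start,len(data)):
        x.append(data[i-timestep:i,0])
        y.append(data[i,0])
    x, y = np.array(x), np.array(y)
    x = x.reshape(x.shape[0],x.shape[1],1) # expand 1 dim for LSTM training: (samples, timesteps, features)
    return x, y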
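A small worked example of `prepare_autoreg` on a toy column vector, to make the window layout concrete. The function is copied verbatim from the cell above so the example runs on its own; the toy data is illustrative. With `train=True`, the first window starts at index 1, so index 0 (the NaN left by `.diff()` in the real data) is never used.

```python
import numpy as np

# Copy of prepare_autoreg from the cell above, repeated so this example is self-contained
def prepare_autoreg(data, timestep, train=True):
    x, y = [], []
    start = timestep + 1 if train else timestep
    for i in range(start, len(data)):
        x.append(data[i - timestep:i, 0])
        y.append(data[i, 0])
    x, y = np.array(x), np.array(y)
    x = x.reshape(x.shape[0], x.shape[1], 1)
    return x, y

toy = np.arange(7, dtype=float).reshape(-1, 1)  # column vector 0.0 ... 6.0

x_tr, y_tr = prepare_autoreg(toy, timestep=3, train=True)
print(x_tr.shape, y_tr)  # (3, 3, 1) [4. 5. 6.]
print(x_tr[0, :, 0])     # [1. 2. 3.]  (index 0 is skipped)

x_va, y_va = prepare_autoreg(toy, timestep=3, train=False)
print(x_va.shape, y_va)  # (4, 3, 1) [3. 4. 5. 6.]
print(x_va[0, :, 0])     # [0. 1. 2.]
```

Note that the first `timestep` points of each set only ever appear as inputs, never as targets.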

In [8]:
# Build and compile a model of two stacked LSTM layers followed by two Dense layers,
# for the given timestep and number of nodes
def build_lstm(timestep, nodes):
    lstm = Sequential()
    lstm.add(LSTM(nodes, return_sequences=True, input_shape=(timestep,1)))
    lstm.add(LSTM(nodes))
    lstm.add(Dense(nodes))
    lstm.add(Dense(1))
    lstm.compile(loss='mean_squared_error', optimizer='adam')
    return lstm
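To see how model size scales with the grid-search parameters, here is a sketch that counts the trainable parameters of the `build_lstm` architecture by hand. It assumes Keras defaults (biases enabled); each LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias. The totals should match `model.summary()`, but that is worth confirming by running it. The helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * input_dim + units * units + units)

def dense_params(units, input_dim):
    return units * input_dim + units  # weights + bias

def build_lstm_param_count(nodes, n_features=1):
    # LSTM(nodes) -> LSTM(nodes) -> Dense(nodes) -> Dense(1), as in build_lstm
    return (lstm_params(nodes, n_features)
            + lstm_params(nodes, nodes)
            + dense_params(nodes, nodes)
            + dense_params(1, nodes))

for n in (20, 35, 45):
    print(n, build_lstm_param_count(n))
# 20 5481
# 35 16416
# 45 26956
```

`timestep` does not appear in the count: it only sets the sequence length the LSTM layers unroll over, so the node count alone determines the number of weights.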

## Functions for Early Stopping

In [9]:
# Train a 35-node model for the full 40 epochs (no early stopping)
def LSTM_model(data, step):
    (x_train, y_train, x_val, y_val) = data
    model = build_lstm(step, 35)
    history = model.fit(x_train, y_train, epochs=40, batch_size=32, validation_data=(x_val,y_val), verbose = 0)
    return model, history

# Same model, but stop once val_loss has not improved for `tolerant` consecutive epochs
def LSTM_model_early(data, step, tolerant):
    (x_train, y_train, x_val, y_val) = data
    model = build_lstm(step, 35)
    callback = keras.callbacks.EarlyStopping(monitor='val_loss', patience=tolerant)
    history = model.fit(x_train, y_train, epochs=40, batch_size=32, validation_data=(x_val,y_val), callbacks=[callback], verbose = 0)
    return model, history
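A simplified re-implementation of the stopping rule used with `monitor='val_loss'` (lower is better), to show what `patience` means in practice. This is a sketch, not Keras's actual code: it assumes the default `min_delta=0` and ignores options such as `baseline`. Keras also leaves `restore_best_weights=False` by default, so the model and `history.history['val_loss'][-1]` come from the last epoch run, not the best one.

```python
def epochs_run(val_losses, patience, min_delta=0.0):
    """Return how many epochs train before a patience-based stop fires."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                        # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch + 1     # number of epochs completed
    return len(val_losses)           # reached the epoch limit

# Hypothetical validation-loss curve
curve = [0.50, 0.40, 0.45, 0.42, 0.41, 0.39, 0.40]
for p in (0, 2, 5):
    print(p, epochs_run(curve, p))
# 0 3
# 2 4
# 5 7
```

With `patience=0`, training stops at the first epoch that fails to improve, which is why small patience values save the most time but can also cut off a curve that would have improved later.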

In [19]:
def trial_early_stopping(data, step):
    loss_full, loss_early, val_loss_full, val_loss_early = [],[],[],[]
    time_full, time_early = [], []
    
    # 10 trials; the loop index also serves as the EarlyStopping patience (0 to 9)
    for tolerant in range(10):
        print(tolerant, end=' ')
        
        # No EarlyStopping
        start = time.time()
        model, history = LSTM_model(data, step)
        end = time.time()
        loss_full.append(history.history['loss'][-1]) # final training loss
        val_loss_full.append(history.history['val_loss'][-1]) # final val_loss
        time_full.append(end - start)
    
        # With EarlyStopping
        start = time.time()
        model_early, history_early = LSTM_model_early(data, step, tolerant)
        end = time.time()
        loss_early.append(history_early.history['loss'][-1])
        val_loss_early.append(history_early.history['val_loss'][-1])
        time_early.append(end - start)
        
    avg_loss_full, avg_loss_early = np.mean(loss_full), np.mean(loss_early)
    avg_val_loss_full, avg_val_loss_early = np.mean(val_loss_full), np.mean(val_loss_early)
    avg_time_full, avg_time_early = np.mean(time_full), np.mean(time_early)
    
    print('\nval_loss (no EarlyStopping):',avg_val_loss_full, ', val_loss (with EarlyStopping):',avg_val_loss_early)
    print('loss (no EarlyStopping):',avg_loss_full, ', loss (with EarlyStopping):',avg_loss_early)
    print('Time (no EarlyStopping):',avg_time_full, ', Time (with EarlyStopping):',avg_time_early)

## Functions for Grid Search

In [11]:
# Build and train an LSTM model for the given config [timestep, nodes], with EarlyStopping (patience=5)
def model_fit(config):
    timestep, n_nodes = config # unpack config
    x_train, y_train = prepare_autoreg(scaled_training_set, timestep, train=True)
    x_val, y_val = prepare_autoreg(scaled_val_set, timestep, train=False)
    
    model = keras.models.Sequential()
    model.add(keras.layers.LSTM(units=n_nodes, return_sequences = True, input_shape = (x_train.shape[1],1)))
    model.add(keras.layers.LSTM(units=n_nodes))
    model.add(keras.layers.Dense(units=n_nodes))
    model.add(keras.layers.Dense(1))
    model.compile(optimizer = 'adam', loss = 'mean_squared_error')
    callback = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
    history = model.fit(x_train, y_train, epochs=40, batch_size=256, verbose=0, callbacks=[callback], validation_data=(x_val,y_val))
    
    return model, history

In [12]:
# Build a model with the given config and return the final validation loss (MSE)
def model_predict(config):
    model, history = model_fit(config)
    val_loss = history.history['val_loss'][-1]
    return val_loss

# Train n_repeats (default 10) models with the same config and return the mean of their final val_loss
def repeat_evaluate(config, n_repeats=10):
    key = config
    scores = [model_predict(config) for _ in range(n_repeats)]
    result = np.mean(scores)
    print('> Model%s %.9f' % (key, result))
    return (key, result)

In [13]:
# Create the list of all (timestep, nodes) combinations
def model_configs():
    # define scope of configs
    timestep = [30,35,40,45,50,55,60]
    n_nodes = [20,25,30,35,40,45]
    
    # create configs
    configs = [[k,j] for k in timestep for j in n_nodes] 
    print('Total configs: %d' % len(configs))
    return configs

# Perform grid search over the given list of configurations
def grid_search(cfg_list):
    scores = [repeat_evaluate(cfg) for cfg in cfg_list]
    scores.sort(key=lambda tup: tup[1]) # sort configs by error, asc
    return scores
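The grid search above is a Cartesian product of the two hyperparameter lists, with each configuration scored by the mean of several training runs. The sketch below reproduces only that control flow. `fake_val_loss` is a hypothetical stand-in for `model_fit` plus reading the final `val_loss`: it returns made-up noisy numbers, not real results. Reporting the standard deviation is an addition the notebook does not make.

```python
import itertools
import random
import statistics

def fake_val_loss(timestep, n_nodes, rng):
    """Hypothetical stand-in for training a model; returns a noisy made-up loss."""
    trend = 1e-5 * (timestep - 30) / 5 + 1e-5 * (n_nodes - 20) / 5
    return 0.034 + trend + rng.gauss(0, 1e-4)

def grid_search_sketch(timesteps, node_counts, n_repeats=10, seed=0):
    rng = random.Random(seed)
    results = []
    for timestep, n_nodes in itertools.product(timesteps, node_counts):
        scores = [fake_val_loss(timestep, n_nodes, rng) for _ in range(n_repeats)]
        # Mean as in repeat_evaluate; the spread shows how noisy each estimate is
        results.append(((timestep, n_nodes), statistics.mean(scores), statistics.stdev(scores)))
    results.sort(key=lambda r: r[1])  # ascending mean loss, best first
    return results

results = grid_search_sketch([30, 35, 40, 45, 50, 55, 60], [20, 25, 30, 35, 40, 45])
print(len(results))  # 42
best_cfg, best_mean, best_std = results[0]
print(best_cfg, f"{best_mean:.6f} +/- {best_std:.6f}")
```

In this notebook's log, neighbouring configurations often differ by around 1e-4 or less, so comparing those gaps with the per-configuration spread indicates whether a ranking is meaningful or within run-to-run noise.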

# 3. Effect of Early Stopping

In [21]:
step = 60 # 60 trading days (roughly 3 months)
xtr, ytr = prepare_autoreg(scaled_training_set, step, train=True)
xval, yval = prepare_autoreg(scaled_val_set, step, train=False)
trial_early_stopping((xtr, ytr, xval, yval), step)

0 1 2 3 4 5 6 7 8 9 
val_loss (no EarlyStopping): 0.03417550846934318 , val_loss (with EarlyStopping): 0.03443170674145222
loss (no EarlyStopping): 0.0006339070736430585 , loss (with EarlyStopping): 0.0006309737800620496
Time (no EarlyStopping): 147.19956872463226 , Time (with EarlyStopping): 52.568574619293216


The differences in ```loss``` and ```val_loss``` between training with and without ```EarlyStopping``` are very small (average ```val_loss``` of 0.03443 vs 0.03418, with patience varying from 0 to 9 across the 10 trials), but ```EarlyStopping``` cut the average training time per model from about 147 s to about 53 s. Since the grid search trains hundreds of models, we use ```EarlyStopping``` for all subsequent models to save time.
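A quick check of the trade-off using the averages printed by `trial_early_stopping` for `step = 60`. The numbers are copied from that output; only the arithmetic is new.

```python
# Averages printed by trial_early_stopping (10 runs, step = 60)
val_loss_full, val_loss_early = 0.03417550846934318, 0.03443170674145222
time_full, time_early = 147.19956872463226, 52.568574619293216

val_loss_change = (val_loss_early - val_loss_full) / val_loss_full
time_saving = 1 - time_early / time_full
speedup = time_full / time_early

print(f"val_loss change: {val_loss_change:+.2%}")  # val_loss change: +0.75%
print(f"time saved: {time_saving:.1%}")            # time saved: 64.3%
print(f"speed-up: {speedup:.2f}x")                 # speed-up: 2.80x
```

A relative increase in `val_loss` of under 1% in exchange for roughly 2.8 times faster training is the trade-off behind using `EarlyStopping` in the grid search.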

# 4. Grid Search
In this section, we perform a grid search for the combination of (timestep, nodes) that gives us the lowest final ```val_loss``` (MSE), averaged over 10 runs. With 7 timesteps × 6 node counts = 42 configurations, we train 42 × 10 = 420 models in total.

**The following cell took 6 hours to run.**

In [25]:
cfg_list = model_configs()
scores = grid_search(cfg_list) # grid search

Total configs: 42
> Model[[30, 20]] 0.033783221
> Model[[30, 25]] 0.033915004
> Model[[30, 30]] 0.033942702
> Model[[30, 35]] 0.034053292
> Model[[30, 40]] 0.034091226
> Model[[30, 45]] 0.034064770
> Model[[35, 20]] 0.033987318
> Model[[35, 25]] 0.033959962
> Model[[35, 30]] 0.034036282
> Model[[35, 35]] 0.034094409
> Model[[35, 40]] 0.034176888
> Model[[35, 45]] 0.034173197
> Model[[40, 20]] 0.033969628
> Model[[40, 25]] 0.034078548
> Model[[40, 30]] 0.034147567
> Model[[40, 35]] 0.034317990
> Model[[40, 40]] 0.034291252
> Model[[40, 45]] 0.034310182
> Model[[45, 20]] 0.034099816
> Model[[45, 25]] 0.034271905
> Model[[45, 30]] 0.034289877
> Model[[45, 35]] 0.034294938
> Model[[45, 40]] 0.034368554
> Model[[45, 45]] 0.034424908
> Model[[50, 20]] 0.034210833
> Model[[50, 25]] 0.034273899
> Model[[50, 30]] 0.034368247
> Model[[50, 35]] 0.034412097
> Model[[50, 40]] 0.034471647
> Model[[50, 45]] 0.034488676
> Model[[55, 20]] 0.034223896
> Model[[55, 25]] 0.034367745
> Model[[55, 30]] 0.03

```grid_search``` sorts the configurations in ascending order of average validation loss, so the first entry in ```scores``` is the best configuration:
```timestep=55```, ```nodes=45```.

In [14]:
scores

[([55, 45], 1.8594361154100626e-05),
 ([40, 45], 1.908317608467769e-05),
 ([45, 45], 2.3069215058058034e-05),
 ([30, 30], 2.4106969158310675e-05),
 ([55, 40], 2.57116237662558e-05),
 ([40, 30], 2.623627524371841e-05),
 ([30, 20], 2.6778622213896597e-05),
 ([30, 40], 2.7507651520863873e-05),
 ([30, 45], 2.773977239485248e-05),
 ([30, 35], 2.7765418781200425e-05),
 ([45, 30], 3.0388454069907313e-05),
 ([55, 25], 3.0440280079346847e-05),
 ([35, 40], 3.111043806711677e-05),
 ([45, 40], 3.134401686111232e-05),
 ([60, 35], 3.172740334775881e-05),
 ([50, 40], 3.178510323778028e-05),
 ([30, 25], 3.223334015274304e-05),
 ([35, 20], 3.2875748911465055e-05),
 ([40, 20], 3.321805406812928e-05),
 ([45, 35], 3.36800759214384e-05),
 ([35, 45], 3.3881416857184374e-05),
 ([60, 45], 3.4214837523904865e-05),
 ([50, 30], 3.464116753093549e-05),
 ([40, 35], 3.4682147543207975e-05),
 ([60, 40], 3.49417562574672e-05),
 ([50, 45], 3.5890101844415764e-05),
 ([35, 30], 3.7502321356441824e-05),
 ([55, 20], 3.809