# Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are designed to learn from sequence data with temporal dependencies, such as speech and other time series. They take the time dimension into account through a recurrent connection with a delay of one time step: the hidden state computed at step t-1 is fed back as an input at step t.


The idea behind RNNs is to use sequential information. In a feedforward neural network we assume that all inputs (and outputs) are independent of each other, but in finance especially that is a bad assumption. If you want to predict future returns, it probably helps to know the past returns of the same security. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. In other words, they have a *memory* that captures information about what has been calculated so far. An RNN module is presented in the figure below.

<img src="images/unrolledRNN.png" width="500">

One can implement this module with either `tensorflow` or `pytorch`. The unit can be thought of as a replacement for a single neuron that additionally has a feedback loop. *This way the model is able to take time into account*.
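
To make the feedback loop concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. It assumes the common update rule h_t = tanh(W_x x_t + W_h h_{t-1} + b); the function name `rnn_forward` and the random weights are illustrative only and are not part of the notebook's models.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Run a vanilla RNN over one sequence and return all hidden states."""
    hidden_size = W_h.shape[0]
    h = np.zeros(hidden_size)          # initial memory: all zeros
    states = []
    for x_t in x_seq:                  # the same weights are reused at every step
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 5, 1, 3
x_seq = rng.normal(size=(seq_len, input_size))
W_x = rng.normal(size=(hidden_size, input_size))
W_h = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

states = rnn_forward(x_seq, W_x, W_h, b)
print(states.shape)  # (5, 3): one hidden state per time step
```

The hidden state `h` is the only thing carried from one step to the next, which is what the text above calls the memory of the network.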

The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: sequences in the input, the output, or, in the most general case, both. The figure below shows examples of such RNN configurations:

<img src="images/rnntypes.png" width="500">


However, there are other recurrent units that are known to perform better in practice. Why is that? A detailed answer about the drawbacks of the basic RNN module and its extensions can be found [here](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).

Further references on Recurrent Neural Networks can be found in Goodfellow's book, which has a chapter dedicated to this family of neural networks.

Let's now dive into the code to see how we need to prepare data to be ingested by an RNN and what its pros and cons are.

# Time series forecasting using PyTorch

In [None]:
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt

import random,os
import torch
import torch.optim as optim
import torch.nn as nn
from torch.utils.data import Dataset, TensorDataset, DataLoader, Subset
from collections import OrderedDict


from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
def create_input_data(series, n_lags=1, n_leads=1):
    '''
    Function for transforming time series into input acceptable by a multilayer perceptron.
    
    Parameters
    ----------
    series : np.array
        The time series to be transformed
    n_lags : int
        The number of lagged observations to consider as features
    n_leads : int
        The number of future periods we want to forecast for
        
    Returns
    -------
    X : np.array
        Array of features
    y : np.array
        Array of target
    '''
    X = []
    y = []
    for step in range(len(series) - n_lags - n_leads + 1):
        end_step = step + n_lags
        forward_end = end_step + n_leads
        X.append(series[step:end_step])
        y.append(series[end_step:forward_end])
    return np.array(X), np.array(y)

# custom function for reproducibility


def custom_set_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)


Define the parameters:

In [None]:
# data
TICKER = 'AAPL'
START_DATE = '2010-01-02'
END_DATE = '2019-12-31'
VALID_START = '2019-07-01'
N_LAGS = 12

# neural network 
BATCH_SIZE = 16
N_EPOCHS = 100

Download and prepare the data:

In [None]:
df = yf.download(TICKER, 
                 start=START_DATE, 
                 end=END_DATE,
                 auto_adjust=False,  # keep the 'Adj Close' column used below
                 progress=False)

df = df.resample('W-MON').last() # weekly frequency from Monday
valid_size = df.loc[VALID_START:END_DATE].shape[0]
prices = df['Adj Close'].values.reshape(-1, 1)

In [None]:
fig, ax = plt.subplots()

ax.plot(df.index, prices)
ax.set(title=f"{TICKER}'s Stock price", 
       xlabel='Time', 
       ylabel='Price ($)');

Scale the time series of prices:

In [None]:
valid_ind = len(prices) - valid_size

minmax = MinMaxScaler(feature_range=(0, 1))

prices_train = prices[:valid_ind]
prices_valid = prices[valid_ind:]

minmax.fit(prices_train)

prices_train = minmax.transform(prices_train)
prices_valid = minmax.transform(prices_valid)

prices_scaled = np.concatenate((prices_train, 
                                prices_valid)).flatten()
#plt.plot(prices_scaled)

In [None]:
prices_scaled.shape
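
The scaler above is fitted on the training window only, so that no information from the validation period leaks into training. A hand-rolled sketch of the same min-max logic on made-up numbers (the arrays and the helper names `scale` and `unscale` are illustrative assumptions) shows one consequence: validation values can fall outside the [0, 1] range.

```python
import numpy as np

train = np.array([10.0, 12.0, 14.0, 20.0])
valid = np.array([18.0, 25.0])

# Statistics come from the training window only.
lo, hi = train.min(), train.max()

def scale(x):
    return (x - lo) / (hi - lo)

def unscale(z):
    return z * (hi - lo) + lo

print(scale(train))  # stays inside [0, 1]
print(scale(valid))  # 25.0 maps to 1.5, above the training maximum
```

For a trending price series this is the normal case, which is one reason to consider modelling returns instead of price levels.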

Transform the time series into input for the RNN:

In [None]:
X, y = create_input_data(prices_scaled, N_LAGS)

In [None]:
X.shape

In [None]:
X[0]
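
To see what the windowing does, the sketch below restates the sliding-window logic of `create_input_data` on a toy series. The name `make_windows` is a stand-in chosen for this example so that the block runs on its own.

```python
import numpy as np

def make_windows(series, n_lags, n_leads=1):
    """Toy restatement of the sliding-window logic in create_input_data."""
    X, y = [], []
    for start in range(len(series) - n_lags - n_leads + 1):
        end = start + n_lags
        X.append(series[start:end])          # n_lags past values as features
        y.append(series[end:end + n_leads])  # the next n_leads values as target
    return np.array(X), np.array(y)

toy = np.arange(6)
X_toy, y_toy = make_windows(toy, n_lags=3)
print(X_toy)  # rows: [0 1 2], [1 2 3], [2 3 4]
print(y_toy)  # rows: [3], [4], [5]
```

A series of length 6 with 3 lags yields 3 samples; in general the number of samples is `len(series) - n_lags - n_leads + 1`.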

Obtain the naïve forecast:

In [None]:
naive_pred = prices[len(prices)-valid_size-1:-1]
y_valid = prices[len(prices)-valid_size:]

naive_mse = mean_squared_error(y_valid, naive_pred)
naive_rmse = np.sqrt(naive_mse)
print(f"Naive forecast - MSE: {naive_mse:.4f}, RMSE: {naive_rmse:.4f}")
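
The naïve forecast simply predicts that the next price equals the last observed price. A small sketch on invented numbers (not the AAPL data) shows how the two slices line up:

```python
import numpy as np

series = np.array([100.0, 102.0, 101.0, 105.0, 104.0])
n_valid = 3

actual = series[-n_valid:]        # [101, 105, 104]
naive = series[-n_valid - 1:-1]   # [102, 101, 105]: the same values shifted by one step

rmse = np.sqrt(np.mean((actual - naive) ** 2))
print(round(rmse, 4))  # 2.4495
```

Any model worth keeping should beat this baseline on the validation period.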

Prepare the `DataLoader` objects:

In [None]:
# set seed for reproducibility
custom_set_seed(42)

valid_ind = len(X) - valid_size

X_tensor = torch.from_numpy(X).float().reshape(X.shape[0], 
                                               X.shape[1], 
                                               1)
y_tensor = torch.from_numpy(y).float().reshape(X.shape[0], 1)

dataset = TensorDataset(X_tensor, y_tensor)

train_dataset = Subset(dataset, list(range(valid_ind)))
valid_dataset = Subset(dataset, list(range(valid_ind, len(X))))

train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=BATCH_SIZE, 
                          shuffle=True)
valid_loader = DataLoader(dataset=valid_dataset, 
                          batch_size=BATCH_SIZE)

In [None]:
X_tensor.shape

Check the size of the datasets:

In [None]:
print(f'Size of datasets - training: {len(train_loader.dataset)} | validation: {len(valid_loader.dataset)}')

Define the model:

In [None]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers, output_size):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, 
                          n_layers, batch_first=True,
                          nonlinearity='relu')
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        output, _ = self.rnn(x)
        output = self.fc(output[:,-1,:]) 
        return output
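
The indexing `output[:, -1, :]` in `forward` keeps only the hidden state after the last lag, which makes this a many-to-one model. The shapes can be checked with plain NumPy; the random array and the unit weights below are stand-ins for the tensors PyTorch would produce, not the real layers.

```python
import numpy as np

batch_size, seq_len, hidden_size = 16, 12, 6

# Stand-in for the first return value of nn.RNN with batch_first=True.
rnn_output = np.random.default_rng(0).normal(size=(batch_size, seq_len, hidden_size))

last_step = rnn_output[:, -1, :]          # hidden state after the final lag
weights = np.ones((hidden_size, 1))       # stand-in for the nn.Linear weights
bias = np.zeros(1)
prediction = last_step @ weights + bias   # one forecast per sequence

print(last_step.shape, prediction.shape)  # (16, 6) (16, 1)
```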

Instantiate the model, the loss function and the optimizer:

In [None]:
model = RNN(input_size=1, hidden_size=6, 
            n_layers=1, output_size=1).to(device)
loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Train the network:

In [None]:
PRINT_EVERY = 10
train_losses, valid_losses = [], []

for epoch in range(N_EPOCHS):
    running_loss_train = 0
    running_loss_valid = 0

    model.train()
    
    for x_batch, y_batch in train_loader:
        
        optimizer.zero_grad()
        
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)
        y_hat = model(x_batch)
        loss = torch.sqrt(loss_fn(y_batch, y_hat))
        loss.backward()
        optimizer.step()
        running_loss_train += loss.item() * x_batch.size(0)
        
    epoch_loss_train = running_loss_train / len(train_loader.dataset)
    train_losses.append(epoch_loss_train)

    with torch.no_grad():
        model.eval()
        for x_val, y_val in valid_loader:
            x_val = x_val.to(device)
            y_val = y_val.to(device)
            y_hat = model(x_val)
            loss = torch.sqrt(loss_fn(y_val, y_hat))
            running_loss_valid += loss.item() * x_val.size(0)
            
        epoch_loss_valid = running_loss_valid / len(valid_loader.dataset)
            
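        # save on the first epoch too, so that a checkpoint and best_epoch always exist
        if epoch == 0 or epoch_loss_valid < min(valid_losses):
            best_epoch = epoch
            os.makedirs('./outputs', exist_ok=True)
            torch.save(model.state_dict(), './outputs/rnn_checkpoint.pth')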
            
        valid_losses.append(epoch_loss_valid)

    if epoch % PRINT_EVERY == 0:
        print(f"<{epoch}> - Train. loss: {epoch_loss_train:.4f} \t Valid. loss: {epoch_loss_valid:.4f}")
        
print(f'Lowest loss recorded in epoch: {best_epoch}')
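
One detail of the loop above is worth keeping in mind: it takes the square root per batch and then averages the batch values weighted by batch size. The sketch below, on invented error values, suggests that this average is generally not the RMSE over the whole dataset (it cannot exceed it).

```python
import numpy as np

errors = np.array([1.0, 1.0, 1.0, 5.0])   # prediction errors of four samples
batches = [errors[:2], errors[2:]]

# What the loop computes: batch-size-weighted mean of per-batch RMSE.
batch_rmse = [np.sqrt(np.mean(b ** 2)) for b in batches]
weighted_mean = sum(r * len(b) for r, b in zip(batch_rmse, batches)) / len(errors)

# RMSE over all samples at once.
overall_rmse = np.sqrt(np.mean(errors ** 2))

print(round(weighted_mean, 4), round(overall_rmse, 4))  # 2.3028 2.6458
```

The logged losses are also computed on scaled prices, so they are useful for monitoring training but should not be compared directly with the naïve RMSE in dollars.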

Plot the losses over epochs:

In [None]:
train_losses = np.array(train_losses)
valid_losses = np.array(valid_losses)

fig, ax = plt.subplots()

ax.plot(train_losses, color='blue', label='Training loss')
ax.plot(valid_losses, color='red', label='Validation loss')

ax.set(title="Loss over epochs", 
       xlabel='Epoch', 
       ylabel='Loss')
ax.legend()

# plt.tight_layout()
# plt.savefig('images/ch10_im14.png')
plt.show()

Load the best model (with the lowest validation loss):

In [None]:
state_dict = torch.load('outputs/rnn_checkpoint.pth', map_location=device)
model.load_state_dict(state_dict)

Obtain the predictions:

In [None]:
y_pred = []

with torch.no_grad():
    
    model.eval()
    
    for x_val, y_val in valid_loader:
        x_val = x_val.to(device)
        y_hat = model(x_val)
        y_pred.append(y_hat)
        
y_pred = torch.cat(y_pred).cpu().numpy()  # move to the CPU first; required when running on a GPU
y_pred = minmax.inverse_transform(y_pred).flatten()

Evaluate the predictions:

In [None]:
rnn_mse = mean_squared_error(y_valid, y_pred)
rnn_rmse = np.sqrt(rnn_mse)
print(f"RNN's forecast - MSE: {rnn_mse:.4f}, RMSE: {rnn_rmse:.4f}")

fig, ax = plt.subplots(figsize=(10,8))

ax.plot(y_valid, color='blue', label='Actual')
ax.plot(y_pred, color='red', label='RNN')
ax.plot(naive_pred, color='green', label='Naïve')

ax.set(title="RNN's Forecasts", 
       xlabel='Time', 
       ylabel='Price ($)')
ax.legend()

# plt.tight_layout()
plt.show()

# Time series forecasting using TensorFlow

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
import sys, os


import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler, StandardScaler, QuantileTransformer
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer

In [None]:
def to_tensor(data,
              date_features=None,
              add_cyclic_date=False,
              lookback=30,
              transformer_x=None,
              use_transformer=False,
              # return_time_idx=False,
              rolling_split=False,
              verbose=False):
    """
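    Transform inputs to 3-D tensors
    and y as the one time step ahead.


    Args:
        * data: 2-D array of shape (time steps, no of features) used to
        create time series targets and features for the recurrent model.
        * date_features: optional date features, concatenated to `data`
        when `add_cyclic_date` is True.
        * lookback: number of past time steps in each input window.
        * transformer_x: optional scaler for the features; a `MinMaxScaler`
        is used when it is None and `use_transformer` is True.

    ---
    Shape of data:
        features: (total trading days, history for regression, no of features)
        labels: (total trading days, no of features)

    Return:
        xtrain, xval, xtest, ytrain, yval, ytest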
    """

    if add_cyclic_date:
        x = np.concatenate((data, date_features), axis=1)
    # just do a copy of same data
    else:
        x = data
    y = data

    # repeat this for train-val-test
    xtrain, xval, ytrain, yval = train_test_split(
        x, y, shuffle=False, random_state=42, test_size=0.25)

    xval, xtest, yval, ytest = train_test_split(
        xval, yval, shuffle=False, random_state=42, test_size=0.5)

    imputer_x = SimpleImputer(strategy='median')
    xtrain = imputer_x.fit_transform(xtrain)
    xval = imputer_x.transform(xval)
    xtest = imputer_x.transform(xtest)

    imputer_y = SimpleImputer(strategy='median')
    ytrain = imputer_y.fit_transform(ytrain)
    yval = imputer_y.transform(yval)
    ytest = imputer_y.transform(ytest)

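    if use_transformer:
        if transformer_x is None:
            transformer_x = MinMaxScaler()
        # fit on the training split only, then apply to all splits;
        # this now also runs when a transformer is passed in
        xtrain = transformer_x.fit_transform(xtrain)
        xval = transformer_x.transform(xval)
        xtest = transformer_x.transform(xtest)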

    xtrain, ytrain = helper_train_test(xtrain, ytrain, lookback)
    xval, yval = helper_train_test(xval, yval, lookback)
    xtest, ytest = helper_train_test(xtest, ytest, lookback)

    return xtrain, xval, xtest, ytrain, yval, ytest


def data_pipe(df,
              transformer_x=None,
              use_transformer=False,
              # return_time_idx=True,
              use_tf_data=False,
              add_cyclic_date=False):
    """Data pipe splits data in train-val-test, then
    it does preprocessing on it. This logic might be implemented
    in the layers themselves.

    Args:
        * df: dataframe with data to train.
        * transformer: use a data transformer for
        preprocessing.
        * use_tf_data: if True use the tf-data-set class.
    ---


    Return
    ---
    A dictionary with each of train, val and test sets.

    """


    """ split should be done in the to_tensor function """
    # df_train, df_val, df_test = time_series_split(df)
    # TODO: get time indexes here
    # concat those inside the to_tensor function
    # transform to sine
    if add_cyclic_date:
        data_datefeatures = add_cyclic_datepart(add_datefeatures(df))

        xtrain, xval, xtest, ytrain, yval, ytest = to_tensor(
            df, data_datefeatures, use_transformer=use_transformer)
    else:
        xtrain, xval, xtest, ytrain, yval, ytest = to_tensor(
            df, use_transformer=use_transformer)

    if use_tf_data:
        data_train, data_val, data_test = train_val_tf(
            xtrain, ytrain, xval, yval, xtest, ytest)

        return dict(data_train=data_train,
                    data_val=data_val,
                    data_test=data_test)

    return dict(xtrain=xtrain, ytrain=ytrain,
                xval=xval, yval=yval,
                xtest=xtest, ytest=ytest)

def helper_train_test(data_x, data_y, lookback):
    """Helper function for the creation of time series data"""
    x, y = [], []

    time_length = len(data_x)

    for i in range(time_length - lookback):
        x.append(data_x[i: i + lookback])
        y.append(data_y[i + lookback])

    return np.array(x), np.array(y)
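
The shape convention stated in the `to_tensor` docstring can be checked on a toy array. The sketch below mirrors the loop of `helper_train_test` under a different name (`window_2d`, chosen here for illustration) so that it runs on its own.

```python
import numpy as np

def window_2d(data, lookback):
    """Build (samples, lookback, features) inputs and one-step-ahead targets."""
    n_samples = len(data) - lookback
    x = np.array([data[i:i + lookback] for i in range(n_samples)])
    y = np.array([data[i + lookback] for i in range(n_samples)])
    return x, y

data = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 features
x, y = window_2d(data, lookback=3)
print(x.shape, y.shape)  # (7, 3, 2) (7, 2)
```

Each target row is the observation that directly follows its input window, so `y[0]` equals `data[3]` here.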

In [None]:
df = pd.read_csv('data/closing_prices.csv')
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)


# df.index.rename('Date', inplace=True)
# df.rename(columns={'Unnamed: 0', 'Date'}, inplace=True)
df.head()

In [None]:
df.iloc[:, :10].plot(figsize=(15, 6));

Train model on 10 stocks.

## RNN

In [None]:
def two_layered_rnn(
    units=20, 
    input_shape=1, 
    output_shape=1, 
    learning_rate=0.01
):
    model = tf.keras.models.Sequential()
    
    model.add(tf.keras.layers.SimpleRNN(
        units,
        return_sequences=True,
        input_shape=[None, input_shape])
             )
    
    model.add(tf.keras.layers.SimpleRNN(units))
    model.add(tf.keras.layers.Dense(output_shape))
    
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate),
        loss='mse')
    
    return model

In [None]:
def training_loop(model):
    metrics_df = pd.DataFrame()

    optim_param_dict = {}

    for c in df.columns:
        optim_param = pd.DataFrame()
        if df.loc[:, c].isnull().sum()/len(df) < 0.5:
            df.loc[:, c].plot(title=f'{c}');
            plt.show();
            print(c)
            print(df[c].shape)
            
            first_valid = df.loc[:, c].first_valid_index()
            
            data_dict = data_pipe(
                df.loc[first_valid:, c].values.reshape(-1, 1), 
                use_tf_data=False,
                use_transformer=True
            )

            xtrain, ytrain, xval, yval, xtest, ytest = (
                data_dict['xtrain'], data_dict['ytrain'], 
                data_dict['xval'], data_dict['yval'],  
                data_dict['xtest'], data_dict['ytest']
            )

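            num_outputs = ytrain.shape[-1]
            # Train a fresh copy of the model that was passed in (same
            # architecture, newly initialised weights) instead of always
            # rebuilding the RNN, which would ignore an LSTM passed in.
            template = model
            model = tf.keras.models.clone_model(template)
            model.compile(
                optimizer=type(template.optimizer).from_config(
                    template.optimizer.get_config()),
                loss=template.loss)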

            history = model.fit(xtrain,
                            ytrain,
                            batch_size=128,
                            epochs=20,
                            validation_data=(xval, yval),
                            verbose=1)
            
            pd.DataFrame(history.history).plot(figsize=(8, 5))
            # plt.gca().set_ylim(0, 500)
            plt.show();

            print('#' * 50)




In [None]:
model = two_layered_rnn()
training_loop(model)

## Alternative to standard RNNs 

RNNs suffer from exploding or vanishing gradients, so they can have a hard time learning long-term dependencies.

**Solutions**:
* Exploding gradients can be addressed by gradient clipping
* Vanishing gradients can be addressed by gated recurrent units
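
Both problems can be illustrated with a deliberately simplified case. The sketch assumes a scalar linear recurrence h_t = w * h_{t-1}, for which the gradient of h_T with respect to h_0 is w^T; real RNNs are more complicated, but the same multiplicative effect is at work. The second part shows clipping by L2 norm; `clip_by_norm` is a hand-written helper for illustration, not a library function.

```python
import numpy as np

def gradient_factor(weight, steps):
    """Size of d h_T / d h_0 for the scalar recurrence h_t = weight * h_{t-1}."""
    return abs(weight) ** steps

print(gradient_factor(0.5, 50))   # about 8.9e-16: the signal vanishes
print(gradient_factor(1.5, 50))   # about 6.4e+08: the signal explodes

def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])          # L2 norm 5
print(clip_by_norm(g, 1.0))       # [0.6 0.8]: same direction, norm 1
```

Clipping caps the size of an update but cannot restore a gradient that has already shrunk to zero, which is why vanishing gradients call for a change of architecture instead.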



**Examples of gated recurrent units**:
* Long Short Term Memory Networks (LSTM)

<img src="images/lstm.jpg" width="600">

* Gated Recurrent Unit (GRU)

<img src="images/gru.png" width="650">
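
As a complement to the figures, here is a minimal NumPy sketch of a single LSTM step using the standard gate equations. Stacking the four gates into one weight matrix and the gate order (forget, input, candidate, output) are conventions chosen for this example; Keras and PyTorch lay out their weights differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U and b stack the parameters of the four gates."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # shape (4 * n,)
    f = sigmoid(z[0:n])               # forget gate
    i = sigmoid(z[n:2 * n])           # input gate
    g = np.tanh(z[2 * n:3 * n])       # candidate cell values
    o = sigmoid(z[3 * n:4 * n])       # output gate
    c = f * c_prev + i * g            # additive update of the cell state
    h = o * np.tanh(c)                # hidden state exposed to the next layer
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(10, n_in)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The additive cell-state update `c = f * c_prev + i * g` is the part that lets information, and gradients, pass through many time steps when the forget gate stays close to one.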

In [None]:
def two_layered_lstm(units=20, input_shape=1, output_shape=1, learning_rate=0.01):
    model = tf.keras.models.Sequential()
    
    model.add(tf.keras.layers.LSTM(
        units,
        return_sequences=True,
        input_shape=[None, input_shape])
             )
    
    model.add(tf.keras.layers.LSTM(units))
    model.add(tf.keras.layers.Dense(output_shape))
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate),
        loss=tf.keras.losses.Huber(),
        metrics=['mae', 'mse'])
    
    return model

In [None]:
model = two_layered_lstm()
training_loop(model)

# Optional: Working on another dataset...

In [None]:
import pandas as pd
import numpy as np
import os
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
import keras
from sklearn.metrics import mean_squared_error
from keras import regularizers
from scipy.ndimage import shift  # the scipy.ndimage.interpolation namespace is deprecated


In [None]:
cls = pd.read_csv('data/closing_prices_tiingo.csv')
cls.set_index('date', inplace=True)


In [None]:
display(cls.head())
print(cls.iloc[0:1, 1].name)

## Preprocessing
Functions to preprocess data

In [None]:
def transform_data(x_train, x_dev, x_test, normalize=False):
    """ Do imputing and scaling in two steps. If done in
    pipeline then it is not possible to inverse-transform.
    No need to inverse transform imputing"""

    imputer = SimpleImputer(strategy='median')
    x_train = imputer.fit_transform(x_train)
    x_dev = imputer.transform(x_dev)
    x_test = imputer.transform(x_test)

    if normalize:
        scaler = MinMaxScaler(feature_range=(-1, 1))
        train_x = scaler.fit_transform(x_train)
        dev_x = scaler.transform(x_dev)
        test_x = scaler.transform(x_test)
    else:
        train_x = x_train
        dev_x = x_dev
        test_x = x_test
        scaler = None

    return train_x, dev_x, test_x, scaler

In [None]:
def split_train_dev_test(seq, dev=0.85,
                         timesteps=30,
                         normalize=False,
                         to_ret=False,
                         differenced=False):

    x_train, x_dev, x_test, names = get_train_dev_test(seq, dev=dev)
    print('shape of train, dev and test sets:',
          x_train.shape, x_dev.shape, x_test.shape)

    # code added 2018-03-23
    if differenced:
        x_train = x_train.diff()
        x_dev = x_dev.diff()
        x_test = x_test.diff()

    if to_ret:
        x_train = x_train.apply(to_return)
        x_dev = x_dev.apply(to_return)
        x_test = x_test.apply(to_return)

    train_x, dev_x, test_x, scaler = transform_data(x_train,
                                                    x_dev, x_test,
                                                    normalize=normalize)

    train_x = to_tensor(train_x, timesteps=timesteps)
    x_train = train_x[:, :-1, :]
    y_train = train_x[:, -1]

    dev_x = to_tensor(dev_x, timesteps=timesteps)
    x_dev = dev_x[:, :-1, :]
    y_dev = dev_x[:, -1]

    test_x = to_tensor(test_x, timesteps=timesteps)
    x_test = test_x[:, :-1, :]
    y_test = test_x[:, -1]
    print('printing from the split-train-dev-test function:')
    print('y_dev raw data:')
#    print(scaler.inverse_transform(y_dev))

    return x_train, y_train, x_dev, y_dev, x_test, y_test, scaler, names


In [None]:
def to_return(x, period=1):
    """ This function supposes that the input is a
    dataframe"""
    x_shifted = x.shift(periods=period, axis='index')
    return (x - x_shifted)/x_shifted
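
`to_return` computes simple returns, (p_t - p_{t-1}) / p_{t-1}. The same arithmetic on a made-up NumPy array:

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0])
returns = (prices[1:] - prices[:-1]) / prices[:-1]
print(returns)  # [ 0.1 -0.1]
```

The pandas version in `to_return` keeps the original length, so its first row is NaN; in this notebook that value is later filled by the median imputer in `transform_data`.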

In [None]:
def to_tensor(data, timesteps=30):
    x = np.array([data[i:i + timesteps]
                  for i in range(len(data) - timesteps)], dtype=float)
    return x

In [None]:
def get_train_dev_test(data_x, dev=0.85, drop_col=0.05):
    """ Split data in
    - training,
    - development
    - test data (also called live data)
    """
    """ We choose to keep the last 10 % of the data
    as test data (live trading, live data). The tests
    are performed when the model has finished training"""

    # Drop stocks whose share of NAs exceeds drop_col (default 5 %)
    # and report how many are dropped and how many are left.
    dropped_stocks = []
    for col in data_x.columns:
        if data_x[col].isnull().sum()/len(data_x) > drop_col:
            dropped_stocks.append(col)
            data_x = data_x.drop(col, axis=1)
            print('Stock {} has been dropped as it had more than {} % Nas'.\
                  format(col, drop_col * 100))

    print('Number of stocks dropped:', len(dropped_stocks))
    print('Number of stocks that are kept: ', len(data_x.columns))

    test_idx = int(0.9 * len(data_x))
    x_test = data_x.iloc[test_idx:, :]

    """Get the first 90% of the data for train-dev"""
    train_dev_x = data_x.iloc[:test_idx]

    dev_idx = int(dev * len(train_dev_x))
    x_train = train_dev_x.iloc[:dev_idx, :]
    x_dev = train_dev_x.iloc[dev_idx:, :]

    return x_train, x_dev, x_test, data_x.columns
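
The resulting split sizes follow from two integer truncations. The helper below (`split_sizes`, a name introduced only for this illustration) repeats the index arithmetic of `get_train_dev_test` for a hypothetical dataset of 100 rows:

```python
def split_sizes(n_rows, dev=0.85):
    """Return (train, dev, test) row counts for a chronological split."""
    test_idx = int(0.9 * n_rows)     # the last 10 % is held out as test data
    dev_idx = int(dev * test_idx)    # share of the remaining rows used for training
    return dev_idx, test_idx - dev_idx, n_rows - test_idx

print(split_sizes(100))  # (76, 14, 10)
```

Because the rows are never shuffled, the dev set always lies after the training set in time and the test set after both.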

## Functions for plotting

In [None]:
def plot_compare(y_dev, dev_predict):
    k = y_dev.shape[1]
    step = 5
    # plot the columns in consecutive groups of `step`
    for start in range(0, k, step):
        end = start + step
        ax = dev_predict.iloc[:, start:end].plot(
                subplots=True,
                figsize=(15, 20),
                label='Predictions on dev-set', color='DarkBlue')

        y_dev.iloc[:, start:end].plot(ax=ax, subplots=True, figsize=(15, 20),
                  title='Real data together with predictions', label='real data',
                  color='DarkGreen')

In [None]:
def plot_error_curves(history):
    f1, axarr1 = plt.subplots(2, 1, sharex=True, figsize=(8, 10))
    axarr1[0].plot(history.history['loss'])
    axarr1[0].set_title('Training Loss')
    axarr1[1].plot(history.history['val_loss'])
    axarr1[1].set_title('Dev Loss')
    axarr1[1].set_xlabel('Epochs')
#    f1.suptitle('MSE for stock: {}'.format(st_name))
    plt.show()

In [None]:
import math
import time
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, BatchNormalization
from sklearn.svm import SVR
import tensorflow as tf

## Build the LSTM model

In [None]:
def build_model(drop_rate, lr, units, decay,
                look_back,
                no_features=1,
                no_outputs=1):
    """ Arguments
    drop_rate -- drop_rate in dropout
    lr -- learning rate
    look_back -- time steps for the sequence
    units -- number of hidden units or neurons
    decay -- proportion of decay for learning rate
    no_features -- number of features, default 1 for a 1-dim time series
    no_outputs -- number of targets or outputs from the model
    """
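    # `learning_rate` replaces the deprecated `lr` keyword, and `decay` is
    # passed through instead of being hard-coded. Newer Keras releases may
    # reject `decay`; a learning-rate schedule is the alternative there.
    optim = tf.keras.optimizers.Adam(learning_rate=lr,
                            beta_1=0.9,
                            beta_2=0.999,
                            decay=decay,
                            clipnorm=1.0)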

    # TODO: how to implement many layers in keras?
    ret_seq = False
    model = Sequential()
    model.add(BatchNormalization(input_shape=(look_back, no_features)))
    model.add(LSTM(units,input_shape=(look_back, no_features),
                   return_sequences=ret_seq,
                   kernel_regularizer=regularizers.l2(0.01),
                   recurrent_regularizer=regularizers.l2(0.01),
                   activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(drop_rate))
    model.add(BatchNormalization())
    model.add(Dense(no_outputs,
                    kernel_regularizer=regularizers.l2(0.01)))
    model.compile(loss='mean_squared_error',
                  optimizer=optim,
                  metrics=['mse'])
    return model

## Function to fit an LSTM
This function returns the fitted model as well as the history. The history object can be used to plot the loss during training for the train and dev sets.

In [None]:
def train_lstm(X_train, Y_train, X_dev, Y_dev,
               epochs, drop_rate, batch_size, decay,
               look_back, lr, units):

    num_features = X_train.shape[-1]
    num_outputs = Y_train.shape[-1]
    model = build_model(drop_rate,
                        lr,
                        units,
                        decay,
                        look_back,
                        no_features=num_features,
                        no_outputs=num_outputs)

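    # pass epochs and batch_size by keyword: positionally, Keras' fit
    # expects batch_size before epochs
    history = model.fit(X_train,
                        Y_train,
                        epochs=epochs,
                        batch_size=batch_size,
                        verbose=0,
                        validation_data=(X_dev, Y_dev),
                        shuffle=False)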
    return model, history

## Train function
Function to do training with hyperparameter optimization.

In [None]:
def performance(model, x, y, scaler):
    pred = model.predict(x)
    dim = len(pred.shape)

    if dim < 2:
        pred = pred.reshape(-1, 1)

    pred_reversed = get_inv(pred, scaler)
    ytrue_reversed = get_inv(y, scaler)
    ratio = hit_ratio(ytrue_reversed, pred_reversed)

    return pred_reversed, ytrue_reversed, ratio
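
The helpers `get_inv` and `hit_ratio` are called here and in `train` but are not defined in this notebook. As a purely hypothetical stand-in, the sketch below assumes that a hit ratio means the share of observations where the predicted and the realised return have the same sign. The name `hit_ratio_sketch` is invented for this example, and the `returns=True` keyword used in `train` is not reproduced.

```python
import numpy as np

def hit_ratio_sketch(y_true, y_pred):
    """Share of observations where prediction and target have the same sign."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.sign(y_true) == np.sign(y_pred)))

true_returns = [0.01, -0.02, 0.03, -0.01]
pred_returns = [0.02, 0.01, 0.01, -0.03]
print(hit_ratio_sketch(true_returns, pred_returns))  # 0.75
```

A value near 0.5 on returns would suggest that the model predicts direction no better than a coin flip, whatever its MSE.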

In [None]:
def train(x_train, y_train, x_dev, y_dev,
          scaler, patience=2, normalize=True):
    """ Training function, calls train_lstm and optimizes
    hyperparameters"""
    count = 0
    optimal_parameters = {}

    old_hit = 0
    accepted_hit = 0
    accepted_mse = 0
    old_mse = 1e6

    old_mse_train = 1e6
    old_hit_train = 0
    accepted_hit_train = 0
    accepted_mse_train = 0
    """ Maybe good to save for each new better result?"""
    while count < patience:  # can be made more robust
        print('')
        print('.' * 50)
        print('While iteration:', count)
        params = {'drop_rate': np.random.uniform(0.1, 0.5),
                  'lr': 10 ** np.random.uniform(-4, -2),
                  'units': np.random.randint(50, 200),
                  'epochs': np.random.randint(200, 450),
                  'decay': 1e-6,
                  'look_back': x_train.shape[1],
                  'batch_size': np.random.choice(
                          np.array([2**5, 2**6, 2**7, 2**8]))}
        print('')
        print(params)
        print('')
        model, history = train_lstm(x_train, y_train, x_dev, y_dev, **params)
        train_predict = model.predict(x_train)
        dev_predict = model.predict(x_dev)

        if np.any(np.isnan(train_predict)) or np.any(np.isinf(train_predict))\
        or np.any(np.isnan(y_train)) or np.any(np.isinf(y_train)):
            print('')
            print('check if predictions for training dataset are too big!')
            print('are there any Nas or inf? Yes')
            print('train_predict =', train_predict)
            print('y_train =', y_train)
        if np.any(np.isnan(dev_predict)) or np.any(np.isinf(dev_predict))\
        or np.any(np.isnan(y_dev)) or np.any(np.isinf(y_dev)):
            print('')
            print('check if predictions for dev dataset are too big!')
            print('are there any Nas or inf? Yes')
            print('dev_predict =', dev_predict)
            print('y_dev =', y_dev)

        mse_dev = mean_squared_error(y_dev, dev_predict)
        mse_train = mean_squared_error(y_train, train_predict)

        if normalize:
            hit_train = hit_ratio(scaler.inverse_transform(y_train),
                                  scaler.inverse_transform(train_predict),
                                  returns=True)
            hit_dev = hit_ratio(scaler.inverse_transform(y_dev),
                                scaler.inverse_transform(dev_predict),
                                returns=True)

        else:
            hit_train = hit_ratio(y_train, train_predict)
            hit_dev = hit_ratio(y_dev, dev_predict)

        # if the difference between train mse and dev mse
        # is small, there is not much overfitting
        print('mse for train set:', mse_train)
        print('mse for dev set:', mse_dev)
        print('' )
        print('hit-ratio train:', hit_train)
        print('hit-ratio dev:', hit_dev)

        # TODO: add sequence length as hyper parameter
        # and then pop from optimal parameters
        if (np.abs(mse_train - mse_dev) < .4) and (mse_train < 1. and mse_dev < 1.)\
        and (np.abs(hit_dev - hit_train) < 0.2):
            old_hit = hit_dev
            old_mse = mse_dev
            old_hit_train = hit_train
            old_mse_train = mse_train
            # new code 2018-03-27
            # added if cond and put parameter update under condition
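            # accept the first candidate, then only candidates with a higher
            # hit ratio and a lower dev MSE than the accepted one
            if accepted_hit < old_hit and (
                    not optimal_parameters or old_mse < accepted_mse):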
                """Swap accepted and old value"""
                accepted_hit, old_hit = old_hit, accepted_hit
                accepted_mse, old_mse = old_mse, accepted_mse
                accepted_mse_train, old_mse_train = old_mse_train, accepted_mse_train
                accepted_hit_train, old_hit_train = old_hit_train, accepted_hit_train
                print('accepted_hit:', accepted_hit)
                optimal_parameters.update(params)
        del model, history
        model, history = None, None
        if len(optimal_parameters) > 0:
            print('')
            print('The optimal parameters found so far', optimal_parameters)
            print('and the results based on optimal parameters are:')
            print('mse_train: {}, hit_train: {}, mse_dev: {}, hit_dev: {}'.format(
                    accepted_mse_train,
                    accepted_hit_train,
                    accepted_mse,
                    accepted_hit))
        else:
            print('')
            print('No optimal parameters found yet')
        del model
        del history
        count += 1
        
    print('')
    print('The optimal parameters are:', optimal_parameters)
    keras.backend.clear_session()
    """ Has to retrain as the model is deleted during the
    while loop"""

    """ It can happen that the model did not find any optimal
    parameters in that case the dict is empty. You should return
    a message that an error happen"""
    try:
        assert len(optimal_parameters) > 0
        model, history = train_lstm(x_train,
                                    y_train,
                                    x_dev,
                                    y_dev,
                                    **optimal_parameters)
        """" Save model to disk in JSON form """
        # Serialize to JSON
        model_json = model.to_json()
        with open('model.json', 'w') as json_file:
            json_file.write(model_json)
        # serialize weights to HDF5
        model.save_weights('model.h5')


        return optimal_parameters, model, history
    except AssertionError:
        print('No optimal parameters were found, try again but train longer.')
        print(' ')
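
Stripped of the Keras details, `train` is a random search: sample hyperparameters, evaluate them, and keep the best candidate. The sketch below shows that skeleton with a made-up objective. `toy_score` is a hypothetical stand-in for the dev-set loss and has nothing to do with the real model; only the sampling ranges are borrowed from `train`.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_params():
    return {
        'lr': 10 ** rng.uniform(-4, -2),        # log-uniform between 1e-4 and 1e-2
        'units': int(rng.integers(50, 200)),    # 50..199, like np.random.randint
        'drop_rate': rng.uniform(0.1, 0.5),
    }

def toy_score(params):
    """Hypothetical stand-in for a validation loss; lower is better."""
    return (np.log10(params['lr']) + 3) ** 2 + abs(params['units'] - 100) / 100

best_params, best_score = None, np.inf
for _ in range(20):
    params = sample_params()
    score = toy_score(params)
    if score < best_score:                      # keep the best candidate so far
        best_params, best_score = params, score

print(best_params, best_score)
```

Sampling the exponent uniformly, as in `10 ** uniform(-4, -2)`, spreads the trials evenly across orders of magnitude, which is usually what one wants for a learning rate.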

In [None]:
def main():
    count = 1
    NORMALIZE = True
    (x_train, y_train, x_dev, y_dev,
     x_test, y_test, scaler, stock_names) = split_train_dev_test(cls,
                                                                 normalize=NORMALIZE,
                                                                 to_ret=True,
                                                                 dev=0.9,
                                                                 differenced=False)
    print('')
    print('training algorithm:')
    print('shape train set:', x_train.shape, y_train.shape)
    print('type of train set:', type(x_train))
    print('shape dev set:', x_dev.shape, y_dev.shape)
    print('shape test set:', x_test.shape, y_test.shape)

    optimal_parameters, model, history = train(x_train,
                                               y_train,
                                               x_dev,
                                               y_dev,
                                               scaler,
                                               patience=20,
                                               normalize=NORMALIZE)

    """Both targets and predictions have to be transformed"""
    dev_predict = model.predict(x_dev)
    arg_pred_dev = dev_predict
    arg_y_dev = y_dev

    if NORMALIZE:
        arg_pred_dev = scaler.inverse_transform(dev_predict)
        arg_y_dev = scaler.inverse_transform(y_dev)

    dev_predict = pd.DataFrame(arg_pred_dev, columns=stock_names)
    y_dev = pd.DataFrame(arg_y_dev, columns=stock_names)

    plot_error_curves(history)

    dev_predict.plot(figsize=(10, 10))

    dev_predict.iloc[:, 0:5]

    y_dev.plot(figsize=(10, 10))

    ax = dev_predict.iloc[:, 0:5].plot(
            subplots=True,
            figsize=(15, 20),
            label='Predictions on dev-set',
            color='DarkBlue')

    y_dev.iloc[:, 0:5].plot(ax=ax,
              subplots=True,
               figsize=(15, 20),
               title='Real data together with predictions',
               label='real data',
               color='DarkGreen')

if __name__ == '__main__':
    main()
