# LSTM tuning drop out

In this notebook we tune the drop out rate of the LSTM in order to improve the prediction of the stock prices.
We build on the findings in `3.2.2-lstm_prediction_tuning_batch_size.ipynb`.

In [1]:
import matplotlib

from sklearn.model_selection import train_test_split
from IPython.display import display

from data.get_50_highest_weights import get_sp_50_highest_weights_symbols
from data_preparation.ochlva_data import OCHLVAData
from utils.column_modifiers import target_generator
from utils.column_modifiers import keep_columns
from utils.scorers import normalized_root_mean_square_error
from utils.transformations import StockMinMax
from estimators import lstm
from estimators.predictions import calculate_normal_prediction

Using TensorFlow backend.


In [2]:
matplotlib.use('nbAgg')

In [3]:
import matplotlib.pyplot as plt
from utils.visualizations import plot_scores
from utils.visualizations import plot_true_and_prediction

Load the S&P 500 (as `^GSPC`) data

In [4]:
ochlva_data = OCHLVAData()

Load three other stocks: the one with the highest weight, the one in the middle of the ranking and the one with the lowest weight (out of the 50 downloaded).
We do this to get a better feel for how the model behaves on different stocks.

In [5]:
symbols = get_sp_50_highest_weights_symbols()

# Select symbols with high, medium and low weights
selected_symbols = (symbols.iloc[0], 
                    symbols.iloc[len(symbols)//2], 
                    symbols.iloc[-1])

for s in selected_symbols:
    ochlva_data.load_data(s)

For now, we will only be interested in training using the adjusted close values.

In [6]:
# Keep only 'Adj. Close' column
ochlva_data.transform(keep_columns, ['Adj. Close'], copy=False)

Next, we create the target values for the data.
The target columns will be shifted 7, 14 and 28 days with respect to 'Adj. Close'.

In [7]:
days = [7, 14, 28]
ochlva_data.transform(target_generator, 'Adj. Close', days, copy=False)
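
To make the target construction concrete, here is a minimal sketch of how a shifted target column can be built. It assumes that `target_generator` pairs each row with the value `n` rows later, which is consistent with the trailing NaN rows mentioned in the tuning cell further down, but the real implementation lives in `utils.column_modifiers` and may differ. The helper `shift_forward` and the toy prices are made up for illustration.

```python
import math


def shift_forward(values, n):
    """Return a list where position i holds values[i + n], padded with NaN.

    Hypothetical helper: it only mimics a shift of n rows towards the past.
    """
    n = min(n, len(values))  # never pad beyond the original length
    return values[n:] + [math.nan] * n


# Toy prices, not real market data
adj_close = [10.0, 11.0, 12.0, 13.0, 14.0]

# Target column for a horizon of 2 rows
target_2 = shift_forward(adj_close, 2)

for price, target in zip(adj_close, target_2):
    print(price, target)
```

The last `n` rows have no known future value, so they end up as NaN. Counting those NaN values per column is one way to recover the prediction horizon.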

## Tuning the drop out on the validation set

To avoid leaking information from the unseen test data into the tuning, we tune the drop out rate on a validation set that is split off from the training data.
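
The split used in the next cell can be pictured as follows. This is a sketch in plain Python rather than the `train_test_split(..., shuffle=False)` call the notebook actually uses; `chronological_split` is a hypothetical helper, and its rounding may differ slightly from scikit-learn's.

```python
import math


def chronological_split(rows, tail_fraction):
    """Split rows into (head, tail) without shuffling.

    The tail holds the most recent samples, so nothing from the future
    ends up in the head.
    """
    n_tail = math.ceil(len(rows) * tail_fraction)
    n_head = len(rows) - n_tail
    return rows[:n_head], rows[n_head:]


rows = list(range(100))  # stand-in for 100 time-ordered samples

# First hold out the test set, then split a validation set off the rest
train, test = chronological_split(rows, 0.2)
train_for_validate, validate = chronological_split(train, 0.2)

print(len(train_for_validate), len(validate), len(test))
```

Applying a 20 % split twice leaves roughly 64 % of the rows for fitting, 16 % for validation and 20 % for the final test.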

In [8]:
optimal_epochs = 160
optimal_batch_size = 128
drop_out_list = [0.0, 0.1, 0.2, 0.4, 0.8]

validation_scores = dict()
train_scores = dict()

for key in ochlva_data.transformed_data.keys():
    print(f'Processing {key}')
    # Extract the features and targets
    # (the last len(days) columns are the targets)
    data = ochlva_data.transformed_data[key]
    x = data.loc[:, data.columns[:-len(days)]]
    y = data.loc[:, data.columns[-len(days):]]
       
    # Append the stock to the scores
    validation_scores[key] = dict()
    train_scores[key] = dict()
    
    for drop_out in drop_out_list:        
        reg = lstm.LSTMRegressor(drop_out=drop_out,
                                 recurrent_drop_out=drop_out,
                                 batch_size=optimal_batch_size,
                                 epochs=optimal_epochs)
        
        x_train, _, y_train, y_test = \
            train_test_split(x, y, shuffle=False, test_size=.2)    
        x_train_for_validate, x_validate, y_train_for_validate, y_validate = \
            train_test_split(x_train, y_train, shuffle=False, test_size=.2)

        # Obtain the day of prediction
        # I.e. for a column named x + 2 days, we would expect the two last rows
        # to contain nan
        prediction_days = y_test.isnull().sum()

        # Scale the features
        # NOTE: The scaler is fitted on x_train, which also contains the
        #       validation rows
        scaler = StockMinMax()
        scaler.fit(x_train)
        x_train_for_validate = scaler.transform(x_train_for_validate)
        x_validate = scaler.transform(x_validate)
        y_train_for_validate = scaler.transform(y_train_for_validate)
        y_validate = scaler.transform(y_validate)  
        
        # Calculate validation scores (note that the neural network 
        # supports multi prediction "out of the box")
        y_pred, y_pred_train = \
            calculate_normal_prediction(reg,
                                        x_train_for_validate,
                                        x_validate,
                                        y_train_for_validate,
                                        y_validate, 
                                        prediction_days,
                                        training_prediction=True,
                                        use_multi_output_regressor=False)
            
        # Restore the scaling
        y_pred = scaler.inverse_transform(y_pred)
        y_pred_train = scaler.inverse_transform(y_pred_train) 
        y_validate = scaler.inverse_transform(y_validate)
            
        validation_scores[key][drop_out] = \
            normalized_root_mean_square_error(y_validate, y_pred)
        
        # The true training targets are the rows of the original y_train that
        # share their index with the training predictions
        y_train_true = y_train.loc[y_pred_train.index, :]
        train_scores[key][drop_out] = \
            normalized_root_mean_square_error(y_train_true, y_pred_train)  

Processing ^GSPC
Processing AAPL
Processing CMCSA
Processing GILD
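
The scaling pattern in the cell above (fit once, transform every split, invert the predictions before scoring) can be sketched as follows. `SimpleMinMax` is a hypothetical stand-in; how `StockMinMax` scales internally is not shown in this notebook and may well differ.

```python
class SimpleMinMax:
    """Hypothetical min-max scaler for a single series of prices.

    Assumes the fitted values are not all equal (otherwise the span is 0).
    """

    def fit(self, values):
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        return [(v - self.min_) / span for v in values]

    def inverse_transform(self, scaled):
        span = self.max_ - self.min_
        return [s * span + self.min_ for s in scaled]


fit_rows = [10.0, 12.0, 14.0, 20.0]  # toy prices used for fitting
later_rows = [22.0, 18.0]            # toy prices seen after fitting

scaler = SimpleMinMax().fit(fit_rows)
scaled = scaler.transform(later_rows)        # values may fall outside [0, 1]
restored = scaler.inverse_transform(scaled)  # back to the original units
```

Because the minimum and maximum are taken from the fitted rows only, later prices can map outside the unit interval. Scores are easier to interpret when they are computed after `inverse_transform`, in the original price units.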


In [9]:
_ = plot_scores(train_scores, validation_scores, x_label='Drop out')
plt.show()

[Figure: training and validation scores as a function of the drop out rate]

Interestingly, it appears that any non-zero drop out rate only makes the predictions worse in all cases.
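
Reading the optimum off the plot can also be done programmatically. The sketch below averages the validation score over all symbols for each rate and picks the lowest one (a lower error is better). The nested dictionary mirrors the layout of `validation_scores`, but the symbol names and numbers are invented for illustration and are not the results of this notebook; `best_rate` is a hypothetical helper.

```python
# Made-up scores laid out as {symbol: {drop_out: score}}
example_scores = {
    'SYM_A': {0.0: 0.50, 0.1: 0.60, 0.2: 0.75},
    'SYM_B': {0.0: 0.40, 0.1: 0.45, 0.2: 0.55},
}


def best_rate(scores):
    """Return the rate with the lowest score averaged over all symbols."""
    rates = next(iter(scores.values())).keys()
    mean_score = {
        rate: sum(per_symbol[rate] for per_symbol in scores.values())
        / len(scores)
        for rate in rates
    }
    return min(mean_score, key=mean_score.get)


print(best_rate(example_scores))
```

Averaging across symbols is only one possible choice; if the symbols disagree, inspecting the per-symbol curves may be more informative than a single number.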

## Test on the unseen test set

We will now test how well the model with the optimal drop out rate generalizes to the unseen test set.
This will also act as a sanity check in order to see that what we have found so far is reasonable.
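
The score reported below is a normalized root mean squared error. Several normalizations are in use; the sketch here divides the RMSE by the range of the true values, which is one common choice. This is an assumption: the definition in `utils.scorers.normalized_root_mean_square_error` is not shown in this notebook and may use a different normalizer.

```python
import math


def nrmse_by_range(y_true, y_pred):
    """RMSE divided by the range (max - min) of the true values.

    Hypothetical helper; assumes y_true is not constant.
    """
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return rmse / (max(y_true) - min(y_true))


y_true = [10.0, 12.0, 14.0]  # toy values
y_pred = [11.0, 12.0, 13.0]

print(nrmse_by_range(y_true, y_pred))
```

Under this definition a value of 1 means that the typical error is as large as the whole range of the true values, so scores well above 1 point to predictions that miss the level of the series rather than just its details.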

In [10]:
optimal_drop_out = 0.0

for key in ochlva_data.transformed_data.keys():
    
    # Extract the features and targets
    # (the last len(days) columns are the targets)
    data = ochlva_data.transformed_data[key]
    x = data.loc[:, data.columns[:-len(days)]]
    y = data.loc[:, data.columns[-len(days):]]
        
    x_train, x_test, y_train, y_test = train_test_split(x, y, shuffle=False,
                                                        test_size=.2)

    # Obtain the day of prediction
    # I.e. for a column named x + 2 days, we would expect the two last rows
    # to contain nan
    prediction_days = y_test.isnull().sum()
    
    # Make the regressor with the optimal drop out rate
    reg = lstm.LSTMRegressor(drop_out=optimal_drop_out,
                             recurrent_drop_out=optimal_drop_out,
                             batch_size=optimal_batch_size,
                             epochs=optimal_epochs)

    # Scale the features
    scaler = StockMinMax()
    scaler.fit(x_train)
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)
    y_train = scaler.transform(y_train)
    y_test = scaler.transform(y_test)  
    
    # NOTE: We refit the model here with the same architecture as we used in the
    #       predictions above
    y_pred = calculate_normal_prediction(reg,
                                         x_train,
                                         x_test,
                                         y_train,
                                         y_test, 
                                         prediction_days,
                                         use_multi_output_regressor=False)

    # Restore the scaling
    x_test = scaler.inverse_transform(x_test)
    y_pred = scaler.inverse_transform(y_pred)
    y_test = scaler.inverse_transform(y_test) 

    # Plot the results
    _ = plot_true_and_prediction(x_test, y_pred, 
                                 columns=['Adj. Close'], y_label='USD')
    plt.show()
    
    # Calculate the normalized root mean squared error
    nrmse = normalized_root_mean_square_error(y_test, y_pred)
    
    print((f'Normalized root mean squared error (averaged for the three '
           f'predictions): {nrmse}'))
    
    print('-'*80)
    print('\n'*5)

[Figure: true and predicted 'Adj. Close' (USD) on the test set]

Normalized root mean squared error (averaged for the three predictions): 6.7810211479983
--------------------------------------------------------------------------------

[Figure: true and predicted 'Adj. Close' (USD) on the test set]

Normalized root mean squared error (averaged for the three predictions): 1.5889691488038842
--------------------------------------------------------------------------------

[Figure: true and predicted 'Adj. Close' (USD) on the test set]

Normalized root mean squared error (averaged for the three predictions): 0.5697901708352041
--------------------------------------------------------------------------------

[Figure: true and predicted 'Adj. Close' (USD) on the test set]

Normalized root mean squared error (averaged for the three predictions): 0.7619718431783955
--------------------------------------------------------------------------------