# Last day prediction

In this notebook we test the last-day prediction: how good is the estimate if we use the closing value from x days ago as the prediction of the stock price?
We will use this as a benchmark for more complex models.
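
As a rough illustration of the idea (not the implementation in `estimators.latest_day`, which is not shown in this notebook), a last-day prediction can be sketched with plain pandas. The helper name `last_day_prediction` and the toy prices are assumptions made for this example, and the horizon is assumed to be counted in rows.

```python
import pandas as pd


def last_day_prediction(close: pd.Series, horizon: int) -> pd.Series:
    """Use the close observed `horizon` rows earlier as the prediction.

    Hypothetical helper for illustration only. The result is aligned so
    that row t holds the prediction *for* row t.
    """
    # shift(horizon) moves each value `horizon` rows forward in time,
    # so the first `horizon` rows have no prediction (NaN)
    return close.shift(horizon)


# Toy prices, made up for the example
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
pred = last_day_prediction(close, horizon=2)
```

Comparing `pred` with `close` row by row then gives the error of the benchmark for that horizon.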

In [1]:
import matplotlib

from sklearn.model_selection import train_test_split
from IPython.display import display

from data.get_50_highest_weights import get_sp_50_highest_weights_symbols
from data_preparation.ochlva_data import OCHLVAData
from utils.column_modifiers import target_generator
from utils.column_modifiers import keep_columns
from utils.scorers import normalized_root_mean_square_error
from estimators.predictions import calculate_rolling_prediction
from estimators import latest_day



In [2]:
matplotlib.use('nbAgg')
from utils.visualizations import plot_true_and_prediction

In [3]:
import matplotlib.pyplot as plt

Load the S&P 500 data (symbol `^GSPC`).

In [4]:
ochlva_data = OCHLVAData()

Load three other stocks: the one with the highest weight, the one in the middle of the weight ranking and the one with the lowest weight (out of the 50 downloaded).
We do this to get a better feel for how the model behaves.

In [5]:
symbols = get_sp_50_highest_weights_symbols()

# Select symbols with high, medium and low weights
selected_symbols = (symbols.iloc[0], 
                    symbols.iloc[len(symbols)//2], 
                    symbols.iloc[-1])

for s in selected_symbols:
    ochlva_data.load_data(s)

In [6]:
ax = ochlva_data.plot(['Adj. Close'])
plt.show()

For now, we will only be interested in training using the adjusted close values.

In [7]:
# Keep only 'Adj. Close' column
ochlva_data.transform(keep_columns, ['Adj. Close'], copy=False)

Next, we create the target values for the data.
The target columns will be shifted 7, 14 and 28 days with respect to 'Adj. Close'.
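
A minimal sketch of what such shifted target columns could look like, assuming the shift is counted in rows and the column names follow the pattern seen in the output further down. `make_targets` is a hypothetical stand-in, not the `target_generator` from `utils.column_modifiers`.

```python
import pandas as pd


def make_targets(df: pd.DataFrame, column: str, days: list) -> pd.DataFrame:
    """Append one target column per horizon (hypothetical helper)."""
    out = df.copy()
    for d in days:
        # shift(-d) pulls the value from d rows ahead onto the current row,
        # which leaves NaN in the last d rows
        out[f'{column} + {d} days'] = out[column].shift(-d)
    return out


# Toy data, made up for the example
df = pd.DataFrame({'Adj. Close': [1.0, 2.0, 3.0, 4.0, 5.0]})
targets = make_targets(df, 'Adj. Close', [1, 2])
```

With this convention, a column shifted by `d` rows ends in `d` missing values, which is what the loop below relies on when it counts the NaNs in `y_test`.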

In [8]:
days = [7, 14, 28]
ochlva_data.transform(target_generator, 'Adj. Close', days, copy=False)

We initialize the regressor.
Note that only one instance is needed.

In [9]:
reg = latest_day.LatestDay()

We will now loop over the symbols in our `ochlva_data`.
Specifically we will:

1. Split the data into a training set and a test set, and visualize the two sets
2. Make a "rolling" prediction. That is, we will
	1. Make a prediction based on the training set.
	2. Add the earliest observation from the test set to the training set, retrain the model and make a new prediction.
	3. Add the two earliest observations from the test set to the training set, retrain the model and make a new prediction, and so on until predictions have been made for the entire test set.
3. Visualize the result
4. Report the averaged normalized root mean squared error (see `scorers.py` for details)
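
The rolling prediction and the error measure in the steps above can be sketched as follows for a single horizon. This is an illustration under stated assumptions, not the code in `estimators.predictions` or `scorers.py`: the function names are hypothetical, the model is a plain last-value model, and the error is normalized by the mean of the true values, which may differ from the normalization used in `scorers.py`.

```python
import numpy as np


def rolling_prediction(train, test, horizon):
    """Expanding-window forecast with a last-value model (hypothetical).

    Step k "fits" on the training data plus the first k test observations
    and forecasts the value `horizon` rows after the last seen row.
    Only forecasts whose target lies inside the test set are returned.
    """
    series = np.concatenate([train, test])
    n_train = len(train)
    forecasts, actuals = [], []
    for k in range(len(test)):
        history = series[:n_train + k]          # training set grown by k test rows
        target_idx = n_train + k - 1 + horizon  # row being forecast
        if target_idx >= len(series):
            break  # no ground truth beyond the end of the test set
        forecasts.append(history[-1])           # "retraining" is taking the last value
        actuals.append(series[target_idx])
    return np.array(forecasts), np.array(actuals)


def nrmse(actuals, forecasts):
    """RMSE divided by the mean of the actual values (assumed convention)."""
    rmse = np.sqrt(np.mean((actuals - forecasts) ** 2))
    return rmse / np.mean(actuals)


# Toy series, made up for the example
train = np.array([1.0, 2.0, 3.0])
test = np.array([4.0, 5.0, 6.0, 7.0])
forecasts, actuals = rolling_prediction(train, test, horizon=2)
score = nrmse(actuals, forecasts)
```

For a model that really learns from the data, the line that takes the last value would be replaced by a fit on `history` followed by a prediction; the expanding window itself stays the same.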

In [10]:
# Looping through the stocks
for key in ochlva_data.transformed_data.keys():
    print(f'Processing {key}')
    # Extract the features and targets
    # NOTE: We have multiple targets
    x = ochlva_data.transformed_data[key].\
        loc[:, ochlva_data.transformed_data[key].columns[:-len(days)]] 
    y = ochlva_data.transformed_data[key].\
        loc[:, ochlva_data.transformed_data[key].columns[-len(days):]]

    print('Head of features')
    display(x.head())
    print('Head of targets')
    display(y.head())
    
    # NOTE: We could use sklearn.model_selection.TimeSeriesSplit for splitting 
    # the data
    # However, as we are not doing any form of cross-validation, it is here 
    # more convenient to utilize train_test_split
    x_train, x_test, y_train, y_test = \
        train_test_split(x, y, shuffle=False, test_size=.2)
    
    print(f'Train shape: {x_train.shape}')
    print(f'Test shape: {x_test.shape}')
    
    # Plot the train and test set
    ax = x_train.plot()
    _ = x_test.plot(ax=ax)
    ax.legend([f'{key} Train', f'{key} Test'])
    ax.grid()
    _ = ax.set_ylabel('USD')

    plt.show()
    
    # Obtain the day of prediction
    # I.e. for a column named x + 2 days, we would expect the two last rows
    # to contain nan
    prediction_days = y_test.isnull().sum()
    
    # Make predictions
    y_pred = \
        calculate_rolling_prediction(reg, x_train, x_test, y_train, y_test,
                                     prediction_days)

    # Plot the results
    _ = plot_true_and_prediction(x_test, y_pred, y_label='USD')
    plt.show()
    
    # Calculate the normalized root mean squared error
    nrmse = normalized_root_mean_square_error(y_test, y_pred)
    
    print((f'Normalized root mean squared error (averaged for the three '
           f'predictions): {nrmse}'))
    
    print('-'*80)
    print('\n'*5)

Processing ^GSPC
Head of features


| Date | Adj. Close |
|---|---|
| 2013-03-07 | 1544.26001 |
| 2013-03-08 | 1551.180054 |
| 2013-03-11 | 1556.219971 |
| 2013-03-12 | 1552.47998 |
| 2013-03-13 | 1554.52002 |


Head of targets


| Date | Adj. Close + 7 days | Adj. Close + 14 days | Adj. Close + 28 days |
|---|---|---|---|
| 2013-03-07 | 1552.099976 | 1562.849976 | 1552.01001 |
| 2013-03-08 | 1548.339966 | 1569.189941 | 1541.609985 |
| 2013-03-11 | 1558.709961 | 1562.170044 | 1555.25 |
| 2013-03-12 | 1545.800049 | 1570.25 | 1562.5 |
| 2013-03-13 | 1556.890015 | 1553.689941 | 1578.780029 |


Train shape: (1008, 1)
Test shape: (252, 1)


Normalized root mean squared error (averaged for the three predictions): 7.01011187317512
--------------------------------------------------------------------------------






Processing AAPL
Head of features


| Date | Adj. Close |
|---|---|
| 2013-03-07 | 56.151959 |
| 2013-03-08 | 56.300365 |
| 2013-03-11 | 57.102383 |
| 2013-03-12 | 55.871318 |
| 2013-03-13 | 55.860885 |


Head of targets


| Date | Adj. Close + 7 days | Adj. Close + 14 days | Adj. Close + 28 days |
|---|---|---|---|
| 2013-03-07 | 59.430192 | 58.955501 | 52.528924 |
| 2013-03-08 | 59.269788 | 57.727044 | 51.127022 |
| 2013-03-11 | 58.955501 | 55.933914 | 50.9288 |
| 2013-03-12 | 59.040267 | 56.048935 | 51.990333 |
| 2013-03-13 | 60.237426 | 56.335575 | 52.963187 |


Train shape: (1002, 1)
Test shape: (251, 1)


Normalized root mean squared error (averaged for the three predictions): 1.4970844050834327
--------------------------------------------------------------------------------






Processing CMCSA
Head of features


| Date | Adj. Close |
|---|---|
| 2013-03-07 | 18.735292 |
| 2013-03-08 | 18.91056 |
| 2013-03-11 | 18.972827 |
| 2013-03-12 | 18.781415 |
| 2013-03-13 | 18.721455 |


Head of targets


| Date | Adj. Close + 7 days | Adj. Close + 14 days | Adj. Close + 28 days |
|---|---|---|---|
| 2013-03-07 | 18.619983 | 19.256485 | 18.821263 |
| 2013-03-08 | 18.509287 | 19.362569 | 18.531634 |
| 2013-03-11 | 18.901335 | 19.175769 | 18.795776 |
| 2013-03-12 | 18.63382 | 19.60025 | 18.754069 |
| 2013-03-13 | 19.012031 | 19.379668 | 18.953334 |


Train shape: (1005, 1)
Test shape: (252, 1)


Normalized root mean squared error (averaged for the three predictions): 0.5089917884266323
--------------------------------------------------------------------------------






Processing GILD
Head of features


| Date | Adj. Close |
|---|---|
| 2013-03-07 | 42.484876 |
| 2013-03-08 | 42.938552 |
| 2013-03-11 | 43.420583 |
| 2013-03-12 | 43.647421 |
| 2013-03-13 | 43.354422 |


Head of targets


| Date | Adj. Close + 7 days | Adj. Close + 14 days | Adj. Close + 28 days |
|---|---|---|---|
| 2013-03-07 | 42.343102 | 45.102965 | 49.091534 |
| 2013-03-08 | 41.946136 | 46.256059 | 48.070763 |
| 2013-03-11 | 42.078458 | 45.348706 | 50.254079 |
| 2013-03-12 | 42.097361 | 45.556641 | 51.085819 |
| 2013-03-13 | 43.014165 | 45.017901 | 50.405305 |


Train shape: (1006, 1)
Test shape: (252, 1)


Normalized root mean squared error (averaged for the three predictions): 0.9364518171904758
--------------------------------------------------------------------------------






