## Neural network to predict when to buy, sell or hold a stock

Here we are going to build a prediction algorithm that tells us when to buy, sell or hold a stock. The task is set up so that the algorithm makes one buy/sell/hold prediction per trading day. The data are the historical prices of *Netflix* stock dating back to 2002. We'll build a sequential neural network using *Keras* and *Tensorflow*, with *Pandas* and *Numpy* for handling the data and doing some computations.

### Import historical price data

Begin by loading all required packages.

In [1]:
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral3
from bokeh.transform import factor_cmap
import bokeh.models as bmo
#
import pandas as pd
import numpy as np
#
from keras.layers import Input, Embedding, Dense, concatenate, Dropout, Flatten
from keras.models import Model, load_model
from keras import metrics, initializers, optimizers, regularizers

Using TensorFlow backend.


Import data using pandas and convert the columns to the appropriate data types.

In [2]:

pd.set_option("display.max_rows",10)
pd.set_option("display.max_columns",100)
NtflxData = pd.read_csv("NFLX.csv")
print("The data types in our table are:")
display(NtflxData.dtypes)
print ("### Changing index to 'date' ###")
NtflxData['Date'] = pd.to_datetime(NtflxData['Date'])
NtflxData = NtflxData.set_index('Date')
display(NtflxData)

The data types in our table are:


Date          object
Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume       float64
dtype: object

### Changing index to 'date' ###


Date,Open,High,Low,Close,Adj Close,Volume
2002-05-23,1.156429,1.242857,1.145714,1.196429,1.196429,104790000.0
2002-05-24,1.214286,1.225000,1.197143,1.210000,1.210000,11104800.0
2002-05-28,1.213571,1.232143,1.157143,1.157143,1.157143,6609400.0
2002-05-29,1.164286,1.164286,1.085714,1.103571,1.103571,6757800.0
2002-05-30,1.107857,1.107857,1.071429,1.071429,1.071429,10154200.0
...,...,...,...,...,...,...
2018-02-01,266.410004,271.950012,263.380005,265.070007,265.070007,9669000.0
2018-02-02,263.000000,270.619995,262.709991,267.429993,267.429993,9123600.0
2018-02-05,262.000000,267.899994,250.029999,254.259995,254.259995,11896100.0
2018-02-06,247.699997,266.700012,245.000000,265.720001,265.720001,12595800.0


We'll also import *S&P500* data to use as a feature. The *S&P500* index should tell us the overall market trend, which might affect our stock of interest.

*Note: We are actually using the IVV index fund here because it mimics the actual S&P500 index and was easily obtainable.*

In [3]:
### import S&P 500 data
IVVData = pd.read_csv('IVV.csv')
IVVData['Date'] = pd.to_datetime(IVVData['Date'])
IVVData = IVVData.set_index('Date')
display(IVVData.head())

Date,Open,High,Low,Close,Adj Close,Volume
2000-05-19,142.65625,142.65625,140.25,140.6875,100.833214,775500
2000-05-22,140.59375,140.59375,136.8125,139.8125,100.206131,1850600
2000-05-23,140.21875,140.21875,137.6875,137.6875,98.683067,373900
2000-05-24,137.75,140.0625,136.65625,139.75,100.161354,400300
2000-05-25,140.03125,140.9375,137.875,138.46875,99.243027,69600


#### Add some features

Some features extracted from the *date* will be added here. We'll also add the *close price of the next day*, from which the actual *target* will be derived.

In [4]:

# .copy() gives an independent frame, so the column assignments below do not act on a slice
NtflxData = NtflxData[pd.notnull(NtflxData['Close'])].copy()
# Next row's close; the last row has no successor and stays NaN
NtflxData['NextDayClose'] = NtflxData['Close'].shift(-1)
NtflxData['Day'] = NtflxData.index.day
NtflxData['Month'] = NtflxData.index.month
NtflxData['Year'] = NtflxData.index.year
NtflxData['DayofWeek'] = NtflxData.index.dayofweek
NtflxData['DayofYear'] = NtflxData.index.dayofyear
display(NtflxData)

#### Now for S&P 500

# Same treatment for the index fund: independent copy, then next row's close
IVVData = IVVData[pd.notnull(IVVData['Close'])].copy()
IVVData['NextDayClose'] = IVVData['Close'].shift(-1)
IVVData['Day'] = IVVData.index.day
IVVData['Month'] = IVVData.index.month
IVVData['Year'] = IVVData.index.year
IVVData['DayofWeek'] = IVVData.index.dayofweek
IVVData['DayofYear'] = IVVData.index.dayofyear
display(IVVData)

Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear
2002-05-23,1.156429,1.242857,1.145714,1.196429,1.196429,104790000.0,1.210000,23,5,2002,3,143
2002-05-24,1.214286,1.225000,1.197143,1.210000,1.210000,11104800.0,1.157143,24,5,2002,4,144
2002-05-28,1.213571,1.232143,1.157143,1.157143,1.157143,6609400.0,1.103571,28,5,2002,1,148
2002-05-29,1.164286,1.164286,1.085714,1.103571,1.103571,6757800.0,1.071429,29,5,2002,2,149
2002-05-30,1.107857,1.107857,1.071429,1.071429,1.071429,10154200.0,1.076429,30,5,2002,3,150
...,...,...,...,...,...,...,...,...,...,...,...,...
2018-02-01,266.410004,271.950012,263.380005,265.070007,265.070007,9669000.0,267.429993,1,2,2018,3,32
2018-02-02,263.000000,270.619995,262.709991,267.429993,267.429993,9123600.0,254.259995,2,2,2018,4,33
2018-02-05,262.000000,267.899994,250.029999,254.259995,254.259995,11896100.0,265.720001,5,2,2018,0,36
2018-02-06,247.699997,266.700012,245.000000,265.720001,265.720001,12595800.0,264.559998,6,2,2018,1,37




Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear
2000-05-19,142.656250,142.656250,140.250000,140.687500,100.833214,775500,139.812500,19,5,2000,4,140
2000-05-22,140.593750,140.593750,136.812500,139.812500,100.206131,1850600,137.687500,22,5,2000,0,143
2000-05-23,140.218750,140.218750,137.687500,137.687500,98.683067,373900,139.750000,23,5,2000,1,144
2000-05-24,137.750000,140.062500,136.656250,139.750000,100.161354,400300,138.468750,24,5,2000,2,145
2000-05-25,140.031250,140.937500,137.875000,138.468750,99.243027,69600,137.843750,25,5,2000,3,146
...,...,...,...,...,...,...,...,...,...,...,...,...
2018-02-12,265.670013,268.929993,263.820007,267.179993,267.179993,6801400,267.959991,12,2,2018,0,43
2018-02-13,265.899994,268.559998,265.239990,267.959991,267.959991,4210500,271.649994,13,2,2018,1,44
2018-02-14,266.359985,271.980011,266.220001,271.649994,271.649994,6380400,275.019989,14,2,2018,2,45
2018-02-15,273.519989,275.029999,270.779999,275.019989,275.019989,5461700,275.089996,15,2,2018,3,46
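
The *NextDayClose* column pairs each row with the close of the row that follows it. As a minimal sketch on made-up prices (not the Netflix data), `Series.shift(-1)` is one way to express that pairing; the names `close` and `toy` are only for illustration.

```python
import pandas as pd

# Toy closing prices (made-up values, not the Netflix data)
close = pd.Series(
    [10.0, 11.0, 9.0, 12.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"]),
    name="Close",
)

toy = close.to_frame()
# shift(-1) moves every value up one row, so each date sees the next row's close
toy["NextDayClose"] = toy["Close"].shift(-1)

print(toy)
```

The last row has no following row, so its *NextDayClose* is NaN. Note that "next day" here means the next row in the table, which is the next trading day rather than the next calendar day.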


Here the actual *target* feature will be extracted. Essentially, if the next day's *Percent Change in Closing Price* is greater than +1%, it is considered a strong *buy* signal. If it is less than -1%, it is considered a strong *sell* signal. Anything between -1% and +1% is considered a weak price movement, for which the signal is *hold*.

In [5]:
# Fractional change from today's close to the next close (0.01 means +1%)
NtflxData['NextDayPrcntChange'] = (NtflxData['NextDayClose'] - NtflxData['Close']) / NtflxData['Close']
# Target: 1 = buy (> +1%), -1 = sell (< -1%), 0 = hold; .loc avoids chained assignment
NtflxData['NextDayPrcntChangeBinned'] = 0.0
NtflxData.loc[NtflxData['NextDayPrcntChange'] > 0.01, 'NextDayPrcntChangeBinned'] = 1
NtflxData.loc[NtflxData['NextDayPrcntChange'] < -0.01, 'NextDayPrcntChangeBinned'] = -1
NtflxData['S.No.'] = range(len(NtflxData))
display(NtflxData)





Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear,NextDayPrcntChange,NextDayPrcntChangeBinned,S.No.
2002-05-23,1.156429,1.242857,1.145714,1.196429,1.196429,104790000.0,1.210000,23,5,2002,3,143,0.011343,1.0,0
2002-05-24,1.214286,1.225000,1.197143,1.210000,1.210000,11104800.0,1.157143,24,5,2002,4,144,-0.043683,-1.0,1
2002-05-28,1.213571,1.232143,1.157143,1.157143,1.157143,6609400.0,1.103571,28,5,2002,1,148,-0.046297,-1.0,2
2002-05-29,1.164286,1.164286,1.085714,1.103571,1.103571,6757800.0,1.071429,29,5,2002,2,149,-0.029125,-1.0,3
2002-05-30,1.107857,1.107857,1.071429,1.071429,1.071429,10154200.0,1.076429,30,5,2002,3,150,0.004667,0.0,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2018-02-01,266.410004,271.950012,263.380005,265.070007,265.070007,9669000.0,267.429993,1,2,2018,3,32,0.008903,0.0,3951
2018-02-02,263.000000,270.619995,262.709991,267.429993,267.429993,9123600.0,254.259995,2,2,2018,4,33,-0.049247,-1.0,3952
2018-02-05,262.000000,267.899994,250.029999,254.259995,254.259995,11896100.0,265.720001,5,2,2018,0,36,0.045072,1.0,3953
2018-02-06,247.699997,266.700012,245.000000,265.720001,265.720001,12595800.0,264.559998,6,2,2018,1,37,-0.004366,0.0,3954
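
The thresholding that produces the target can be sketched as a small function. This is an illustrative version, not the notebook's own code: `label_moves` and its `threshold` parameter are hypothetical names, and `np.select` is swapped in for the boolean-mask assignments. The input values are taken from the *NextDayPrcntChange* column shown above.

```python
import numpy as np
import pandas as pd

def label_moves(next_day_change, threshold=0.01):
    """Map fractional next-day changes to 1 (buy), -1 (sell) or 0 (hold)."""
    change = pd.Series(next_day_change, dtype="float64")
    labels = np.select(
        [change > threshold, change < -threshold],  # conditions checked in order
        [1, -1],                                    # label for each condition
        default=0,                                  # everything else (incl. NaN) is hold
    )
    return pd.Series(labels, index=change.index)

toy_changes = [0.011343, -0.043683, 0.004667, -0.004366, 0.045072]
toy_labels = label_moves(toy_changes)
print(toy_labels.tolist())
```

Keeping the threshold as a parameter makes it easy to see how the class balance shifts if the ±1% band is widened or narrowed.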


We'll now plot the *target* variable to make sure it is evenly spread throughout the table, and to check that all three possible target classes are equally represented.


In [6]:
output_notebook()
rolling_window_sum = NtflxData['NextDayPrcntChangeBinned'].rolling(7).sum()
line_data = {'x_values': rolling_window_sum.index,
        'y_values': rolling_window_sum.values}

source = ColumnDataSource(data=line_data)

line_plot = figure(plot_width=950, x_axis_type="datetime")
line_plot.line(x='x_values', y='y_values', source=source, line_width=3)
line_plot.xaxis.axis_label = 'Time'
line_plot.yaxis.axis_label = 'Window Sum of the Target Label'
show(line_plot)

The above plot shows that all three target classes are evenly distributed across the table. They do not trend up or down with time (along the x-axis).

It is also important to make sure the classes are balanced. The following plot shows that all three target classes are approximately equally represented. This keeps the model from simply predicting the most frequent target class more often.

In [7]:
# Rows per target class (-1, 0, 1); groupby sorts the keys, so the order matches the labels below
hist_data = NtflxData.groupby(by='NextDayPrcntChangeBinned')['NextDayPrcntChange'].count()
hist_plot_data = {'Gain/Loss': ['Down', 'Neutral', 'High'], 'Count': hist_data.values}

source = ColumnDataSource(data=hist_plot_data)

hist_plot = figure(x_range=['Down', 'Neutral', 'High'], plot_width=550)
hist_plot.vbar(x='Gain/Loss', top='Count', source=source, width=0.3)

show(hist_plot)

The plot clearly shows that all three target classes are almost equally represented.
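
Class balance can also be checked numerically rather than by eye. The sketch below uses made-up labels in place of the real *NextDayPrcntChangeBinned* column; `toy_target`, `class_share` and `balance_ratio` are illustrative names only.

```python
import pandas as pd

# Toy labels standing in for NextDayPrcntChangeBinned (made-up values)
toy_target = pd.Series([1, -1, 0, 0, 1, -1, 0, 1, -1], name="target")

# Share of rows in each class, ordered -1, 0, 1
class_share = toy_target.value_counts(normalize=True).sort_index()
print(class_share)

# Ratio of the rarest class to the most common one; 1.0 means perfectly balanced
balance_ratio = class_share.min() / class_share.max()
print(balance_ratio)
```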

Now we'll add *trend* features that capture which way the prices and the market are heading, in the short term as well as the long term.

Add the percent change over a given window.

In [8]:
ChangeDays = [1,2,3,5,7,14,30,60,90,120]
for CurrentDay in ChangeDays:
    NewColName = 'PCntChng{0}Days'.format(CurrentDay)
    NtflxData[NewColName] = NtflxData['Close'].pct_change(periods=CurrentDay)
    IVVData[NewColName] = IVVData['Close'].pct_change(periods=CurrentDay)
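
To make the `periods` argument concrete, here is a small sketch on made-up prices. It compares `pct_change(periods=2)` with the same quantity written out by hand; the series and variable names are only for illustration.

```python
import pandas as pd

toy_close = pd.Series([100.0, 110.0, 99.0, 120.0])

# periods=n compares each value with the one n rows earlier: x[t] / x[t-n] - 1
one_day = toy_close.pct_change(periods=1)
two_day = toy_close.pct_change(periods=2)

# The same two-row change written out by hand
two_day_manual = toy_close / toy_close.shift(2) - 1

print(pd.DataFrame({"1d": one_day, "2d": two_day, "2d_manual": two_day_manual}))
```

The first `n` rows have no value `n` rows earlier, so they come out as NaN. The windows count rows (trading days), not calendar days.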

Add the percent above/below the rolling-window mean of the close.

In [9]:
ChangeDays = [2,3,5,7,14,30,60,90,120]
for CurrentDay in ChangeDays:
    NewColName = 'CloseRolled{0}Day_DiffPer'.format(CurrentDay)
    NtflxData[NewColName] = (NtflxData['Close'] - NtflxData['Close'].rolling(window=CurrentDay).mean()) / NtflxData['Close'].rolling(window=CurrentDay).mean()
    IVVData[NewColName] = (IVVData['Close'] - IVVData['Close'].rolling(window=CurrentDay).mean()) / IVVData['Close'].rolling(window=CurrentDay).mean()
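
As a worked example of this feature on made-up prices, the sketch below computes the rolling mean once and then the fractional distance of the close from it. The names `toy_close`, `window` and `diff_from_mean` are illustrative.

```python
import pandas as pd

toy_close = pd.Series([10.0, 12.0, 14.0, 16.0])
window = 3

# Mean of the current row and the window-1 rows before it; NaN until the window is full
rolling_mean = toy_close.rolling(window=window).mean()

# Fractional distance of the close from its rolling mean
diff_from_mean = (toy_close - rolling_mean) / rolling_mean
print(diff_from_mean)
```

The first `window - 1` rows are NaN because the window is not yet full. Storing the rolling mean in a variable also avoids computing it twice per column.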

Add the percent above/below the rolling-window mean of the volume.

In [10]:
ChangeDays = [2,3,5,7,14,30]
for CurrentDay in ChangeDays:
    NewColName = 'VolumeRolled{0}Day_DiffPer'.format(CurrentDay)
    NtflxData[NewColName] = (NtflxData['Volume'] - NtflxData['Volume'].rolling(window=CurrentDay).mean()) / NtflxData['Volume'].rolling(window=CurrentDay).mean()
    IVVData[NewColName] = (IVVData['Volume'] - IVVData['Volume'].rolling(window=CurrentDay).mean()) / IVVData['Volume'].rolling(window=CurrentDay).mean()

In [11]:
display(NtflxData)
display(IVVData)

Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear,NextDayPrcntChange,NextDayPrcntChangeBinned,S.No.,PCntChng1Days,PCntChng2Days,PCntChng3Days,PCntChng5Days,PCntChng7Days,PCntChng14Days,PCntChng30Days,PCntChng60Days,PCntChng90Days,PCntChng120Days,CloseRolled2Day_DiffPer,CloseRolled3Day_DiffPer,CloseRolled5Day_DiffPer,CloseRolled7Day_DiffPer,CloseRolled14Day_DiffPer,CloseRolled30Day_DiffPer,CloseRolled60Day_DiffPer,CloseRolled90Day_DiffPer,CloseRolled120Day_DiffPer,VolumeRolled2Day_DiffPer,VolumeRolled3Day_DiffPer,VolumeRolled5Day_DiffPer,VolumeRolled7Day_DiffPer,VolumeRolled14Day_DiffPer,VolumeRolled30Day_DiffPer
2002-05-23,1.156429,1.242857,1.145714,1.196429,1.196429,104790000.0,1.210000,23,5,2002,3,143,0.011343,1.0,0,,,,,,,,,,,,,,,,,,,,,,,,,
2002-05-24,1.214286,1.225000,1.197143,1.210000,1.210000,11104800.0,1.157143,24,5,2002,4,144,-0.043683,-1.0,1,0.011343,,,,,,,,,,0.005639,,,,,,,,,-0.808364,,,,,
2002-05-28,1.213571,1.232143,1.157143,1.157143,1.157143,6609400.0,1.103571,28,5,2002,1,148,-0.046297,-1.0,2,-0.043683,-0.032836,,,,,,,,,-0.022329,-0.025857,,,,,,,,-0.253774,-0.838143,,,,
2002-05-29,1.164286,1.164286,1.085714,1.103571,1.103571,6757800.0,1.071429,29,5,2002,2,149,-0.029125,-1.0,3,-0.046297,-0.087958,-0.077613,,,,,,,,-0.023697,-0.046100,,,,,,,,0.011102,-0.171568,,,,
2002-05-30,1.107857,1.107857,1.071429,1.071429,1.071429,10154200.0,1.076429,30,5,2002,3,150,0.004667,0.0,4,-0.029125,-0.074074,-0.114521,,,,,,,,-0.014778,-0.035369,-0.066467,,,,,,,0.200828,0.295101,-0.635831,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2018-02-01,266.410004,271.950012,263.380005,265.070007,265.070007,9669000.0,267.429993,1,2,2018,3,32,0.008903,0.0,3951,-0.019349,-0.049247,-0.068590,-0.017167,0.059052,0.220171,0.392028,0.325284,0.414839,0.567163,-0.009769,-0.023287,-0.034958,-0.025662,0.065369,0.192360,0.280986,0.311580,0.355947,-0.094837,-0.142996,-0.225222,-0.288201,-0.287954,-0.008567
2018-02-02,263.000000,270.619995,262.709991,267.429993,267.429993,9123600.0,254.259995,2,2,2018,4,33,-0.049247,-1.0,3952,0.008903,-0.010618,-0.040782,-0.026111,0.023460,0.208832,0.429954,0.336281,0.497788,0.560268,0.004432,-0.000635,-0.021256,-0.020141,0.060785,0.188645,0.285423,0.316823,0.362442,-0.029022,-0.102235,-0.245987,-0.264721,-0.331369,-0.071128
2018-02-05,262.000000,267.899994,250.029999,254.259995,254.259995,11896100.0,265.720001,5,2,2018,0,36,0.045072,1.0,3953,-0.049247,-0.040782,-0.059341,-0.106574,-0.057249,0.147745,0.346573,0.297973,0.417438,0.486901,-0.025245,-0.030479,-0.048328,-0.060806,-0.000721,0.119256,0.216433,0.246866,0.290784,0.131900,0.162913,0.084091,-0.001744,-0.120727,0.189539
2018-02-06,247.699997,266.700012,245.000000,265.720001,265.720001,12595800.0,264.559998,6,2,2018,1,37,-0.004366,0.0,3954,0.045072,-0.006394,0.002452,-0.046915,-0.032338,0.221701,0.408758,0.352678,0.460241,0.576973,0.022039,0.012382,0.004400,-0.013853,0.030370,0.156619,0.264276,0.297145,0.343437,0.028569,0.124106,0.145498,0.037396,-0.085772,0.227327


Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear,PCntChng1Days,PCntChng2Days,PCntChng3Days,PCntChng5Days,PCntChng7Days,PCntChng14Days,PCntChng30Days,PCntChng60Days,PCntChng90Days,PCntChng120Days,CloseRolled2Day_DiffPer,CloseRolled3Day_DiffPer,CloseRolled5Day_DiffPer,CloseRolled7Day_DiffPer,CloseRolled14Day_DiffPer,CloseRolled30Day_DiffPer,CloseRolled60Day_DiffPer,CloseRolled90Day_DiffPer,CloseRolled120Day_DiffPer,VolumeRolled2Day_DiffPer,VolumeRolled3Day_DiffPer,VolumeRolled5Day_DiffPer,VolumeRolled7Day_DiffPer,VolumeRolled14Day_DiffPer,VolumeRolled30Day_DiffPer
2000-05-19,142.656250,142.656250,140.250000,140.687500,100.833214,775500,139.812500,19,5,2000,4,140,,,,,,,,,,,,,,,,,,,,,,,,,
2000-05-22,140.593750,140.593750,136.812500,139.812500,100.206131,1850600,137.687500,22,5,2000,0,143,-0.006219,,,,,,,,,,-0.003119,,,,,,,,,0.409390,,,,,
2000-05-23,140.218750,140.218750,137.687500,137.687500,98.683067,373900,139.750000,23,5,2000,1,144,-0.015199,-0.021324,,,,,,,,,-0.007658,-0.012255,,,,,,,,-0.663835,-0.626100,,,,
2000-05-24,137.750000,140.062500,136.656250,139.750000,100.161354,400300,138.468750,24,5,2000,2,145,0.014980,-0.000447,-0.006664,,,,,,,,0.007434,0.004793,,,,,,,,0.034100,-0.542479,,,,
2000-05-25,140.031250,140.937500,137.875000,138.468750,99.243027,69600,137.843750,25,5,2000,3,146,-0.009168,0.005674,-0.009611,,,,,,,,-0.004605,-0.001202,-0.005834,,,,,,,-0.703767,-0.752548,-0.899709,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2018-02-12,265.670013,268.929993,263.820007,267.179993,267.179993,6801400,267.959991,12,2,2018,0,43,0.013312,0.029119,-0.008940,0.004247,-0.058529,-0.063807,-0.009674,0.029278,0.049534,0.092359,0.006612,0.014004,0.003440,-0.002475,-0.034079,-0.035988,-0.015982,0.000430,0.016435,-0.378359,-0.350970,-0.401763,-0.392786,-0.170687,0.116992
2018-02-13,265.899994,268.559998,265.239990,267.959991,267.959991,4210500,271.649994,13,2,2018,1,44,0.002919,0.016270,0.032124,-0.012166,-0.034483,-0.060581,-0.003310,0.037881,0.051236,0.084858,0.001458,0.006347,0.008870,0.005570,-0.026917,-0.033071,-0.013701,0.002805,0.018725,-0.235282,-0.515897,-0.531829,-0.610980,-0.482368,-0.311683
2018-02-14,266.359985,271.980011,266.220001,271.649994,271.649994,6380400,275.019989,14,2,2018,2,45,0.013771,0.016730,0.030265,0.007641,0.021049,-0.048011,0.002954,0.043443,0.059560,0.103595,0.006838,0.010114,0.021179,0.016366,-0.009999,-0.019850,-0.000812,0.015970,0.031920,0.204883,0.100556,-0.240951,-0.337704,-0.231569,0.055495
2018-02-15,273.519989,275.029999,270.779999,275.019989,275.019989,5461700,275.089996,15,2,2018,3,46,0.012406,0.026347,0.029343,0.059317,0.013861,-0.047253,0.009544,0.059277,0.073835,0.119378,0.006165,0.012803,0.022014,0.026911,0.005854,-0.008000,0.010630,0.027766,0.043753,-0.077579,0.020713,-0.280118,-0.327017,-0.353788,-0.104842


Here we'll scale and normalize the appropriate features to prepare them as inputs to our model.

In [12]:
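# Columns 8, 9 and 10 of the table, named explicitly so the selection survives column reordering
CategoricalFeatures = ['Month', 'Year', 'DayofWeek']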
for CurrentColumn in CategoricalFeatures:
    NtflxData[CurrentColumn] = NtflxData[CurrentColumn].astype('category')
    NtflxData[CurrentColumn] = NtflxData[CurrentColumn].cat.codes
    IVVData[CurrentColumn] = IVVData[CurrentColumn].astype('category')
    IVVData[CurrentColumn] = IVVData[CurrentColumn].cat.codes


### Normalize Volume #####

NtflxData['Volume'] = np.log1p(NtflxData['Volume'])
NtflxData['Volume'] = (NtflxData['Volume'] - np.min(NtflxData['Volume'])) / (np.max(NtflxData['Volume'] - np.min(NtflxData['Volume'])))

IVVData['Volume'] = np.log1p(IVVData['Volume'])
IVVData['Volume'] = (IVVData['Volume'] - np.min(IVVData['Volume'])) / (np.max(IVVData['Volume'] - np.min(IVVData['Volume'])))

display (NtflxData)
display (IVVData)

Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear,NextDayPrcntChange,NextDayPrcntChangeBinned,S.No.,PCntChng1Days,PCntChng2Days,PCntChng3Days,PCntChng5Days,PCntChng7Days,PCntChng14Days,PCntChng30Days,PCntChng60Days,PCntChng90Days,PCntChng120Days,CloseRolled2Day_DiffPer,CloseRolled3Day_DiffPer,CloseRolled5Day_DiffPer,CloseRolled7Day_DiffPer,CloseRolled14Day_DiffPer,CloseRolled30Day_DiffPer,CloseRolled60Day_DiffPer,CloseRolled90Day_DiffPer,CloseRolled120Day_DiffPer,VolumeRolled2Day_DiffPer,VolumeRolled3Day_DiffPer,VolumeRolled5Day_DiffPer,VolumeRolled7Day_DiffPer,VolumeRolled14Day_DiffPer,VolumeRolled30Day_DiffPer
2002-05-23,1.156429,1.242857,1.145714,1.196429,1.196429,0.839738,1.210000,23,4,0,3,143,0.011343,1.0,0,,,,,,,,,,,,,,,,,,,,,,,,,
2002-05-24,1.214286,1.225000,1.197143,1.210000,1.210000,0.520547,1.157143,24,4,0,4,144,-0.043683,-1.0,1,0.011343,,,,,,,,,,0.005639,,,,,,,,,-0.808364,,,,,
2002-05-28,1.213571,1.232143,1.157143,1.157143,1.157143,0.446759,1.103571,28,4,0,1,148,-0.046297,-1.0,2,-0.043683,-0.032836,,,,,,,,,-0.022329,-0.025857,,,,,,,,-0.253774,-0.838143,,,,
2002-05-29,1.164286,1.164286,1.085714,1.103571,1.103571,0.449917,1.071429,29,4,0,2,149,-0.029125,-1.0,3,-0.046297,-0.087958,-0.077613,,,,,,,,-0.023697,-0.046100,,,,,,,,0.011102,-0.171568,,,,
2002-05-30,1.107857,1.107857,1.071429,1.071429,1.071429,0.507821,1.076429,30,4,0,3,150,0.004667,0.0,4,-0.029125,-0.074074,-0.114521,,,,,,,,-0.014778,-0.035369,-0.066467,,,,,,,0.200828,0.295101,-0.635831,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2018-02-01,266.410004,271.950012,263.380005,265.070007,265.070007,0.500859,267.429993,1,1,16,3,32,0.008903,0.0,3951,-0.019349,-0.049247,-0.068590,-0.017167,0.059052,0.220171,0.392028,0.325284,0.414839,0.567163,-0.009769,-0.023287,-0.034958,-0.025662,0.065369,0.192360,0.280986,0.311580,0.355947,-0.094837,-0.142996,-0.225222,-0.288201,-0.287954,-0.008567
2018-02-02,263.000000,270.619995,262.709991,267.429993,267.429993,0.492602,254.259995,2,1,16,4,33,-0.049247,-1.0,3952,0.008903,-0.010618,-0.040782,-0.026111,0.023460,0.208832,0.429954,0.336281,0.497788,0.560268,0.004432,-0.000635,-0.021256,-0.020141,0.060785,0.188645,0.285423,0.316823,0.362442,-0.029022,-0.102235,-0.245987,-0.264721,-0.331369,-0.071128
2018-02-05,262.000000,267.899994,250.029999,254.259995,254.259995,0.530336,265.720001,5,1,16,0,36,0.045072,1.0,3953,-0.049247,-0.040782,-0.059341,-0.106574,-0.057249,0.147745,0.346573,0.297973,0.417438,0.486901,-0.025245,-0.030479,-0.048328,-0.060806,-0.000721,0.119256,0.216433,0.246866,0.290784,0.131900,0.162913,0.084091,-0.001744,-0.120727,0.189539
2018-02-06,247.699997,266.700012,245.000000,265.720001,265.720001,0.538463,264.559998,6,1,16,1,37,-0.004366,0.0,3954,0.045072,-0.006394,0.002452,-0.046915,-0.032338,0.221701,0.408758,0.352678,0.460241,0.576973,0.022039,0.012382,0.004400,-0.013853,0.030370,0.156619,0.264276,0.297145,0.343437,0.028569,0.124106,0.145498,0.037396,-0.085772,0.227327


Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear,PCntChng1Days,PCntChng2Days,PCntChng3Days,PCntChng5Days,PCntChng7Days,PCntChng14Days,PCntChng30Days,PCntChng60Days,PCntChng90Days,PCntChng120Days,CloseRolled2Day_DiffPer,CloseRolled3Day_DiffPer,CloseRolled5Day_DiffPer,CloseRolled7Day_DiffPer,CloseRolled14Day_DiffPer,CloseRolled30Day_DiffPer,CloseRolled60Day_DiffPer,CloseRolled90Day_DiffPer,CloseRolled120Day_DiffPer,VolumeRolled2Day_DiffPer,VolumeRolled3Day_DiffPer,VolumeRolled5Day_DiffPer,VolumeRolled7Day_DiffPer,VolumeRolled14Day_DiffPer,VolumeRolled30Day_DiffPer
2000-05-19,142.656250,142.656250,140.250000,140.687500,100.833214,0.550053,139.812500,19,4,0,4,140,,,,,,,,,,,,,,,,,,,,,,,,,
2000-05-22,140.593750,140.593750,136.812500,139.812500,100.206131,0.650107,137.687500,22,4,0,0,143,-0.006219,,,,,,,,,,-0.003119,,,,,,,,,0.409390,,,,,
2000-05-23,140.218750,140.218750,137.687500,137.687500,98.683067,0.466132,139.750000,23,4,0,1,144,-0.015199,-0.021324,,,,,,,,,-0.007658,-0.012255,,,,,,,,-0.663835,-0.626100,,,,
2000-05-24,137.750000,140.062500,136.656250,139.750000,100.161354,0.473980,138.468750,24,4,0,2,145,0.014980,-0.000447,-0.006664,,,,,,,,0.007434,0.004793,,,,,,,,0.034100,-0.542479,,,,
2000-05-25,140.031250,140.937500,137.875000,138.468750,99.243027,0.272731,137.843750,25,4,0,3,146,-0.009168,0.005674,-0.009611,,,,,,,,-0.004605,-0.001202,-0.005834,,,,,,,-0.703767,-0.752548,-0.899709,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2018-02-12,265.670013,268.929993,263.820007,267.179993,267.179993,0.799841,267.959991,12,1,18,0,43,0.013312,0.029119,-0.008940,0.004247,-0.058529,-0.063807,-0.009674,0.029278,0.049534,0.092359,0.006612,0.014004,0.003440,-0.002475,-0.034079,-0.035988,-0.015982,0.000430,0.016435,-0.378359,-0.350970,-0.401763,-0.392786,-0.170687,0.116992
2018-02-13,265.899994,268.559998,265.239990,267.959991,267.959991,0.744675,271.649994,13,1,18,1,44,0.002919,0.016270,0.032124,-0.012166,-0.034483,-0.060581,-0.003310,0.037881,0.051236,0.084858,0.001458,0.006347,0.008870,0.005570,-0.026917,-0.033071,-0.013701,0.002805,0.018725,-0.235282,-0.515897,-0.531829,-0.610980,-0.482368,-0.311683
2018-02-14,266.359985,271.980011,266.220001,271.649994,271.649994,0.792490,275.019989,14,1,18,2,45,0.013771,0.016730,0.030265,0.007641,0.021049,-0.048011,0.002954,0.043443,0.059560,0.103595,0.006838,0.010114,0.021179,0.016366,-0.009999,-0.019850,-0.000812,0.015970,0.031920,0.204883,0.100556,-0.240951,-0.337704,-0.231569,0.055495
2018-02-15,273.519989,275.029999,270.779999,275.019989,275.019989,0.774605,275.089996,15,1,18,3,46,0.012406,0.026347,0.029343,0.059317,0.013861,-0.047253,0.009544,0.059277,0.073835,0.119378,0.006165,0.012803,0.022014,0.026911,0.005854,-0.008000,0.010630,0.027766,0.043753,-0.077579,0.020713,-0.280118,-0.327017,-0.353788,-0.104842
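
The volume scaling used in this step is a log transform followed by min-max rescaling. A minimal sketch of that idea as a reusable function is below; `log_minmax` is a hypothetical helper, and the input is just the first five Netflix volumes from the table above, so the scaled values differ from the notebook's (which use the minimum and maximum over the whole history).

```python
import numpy as np
import pandas as pd

def log_minmax(values):
    """log1p-transform a series, then rescale it linearly to [0, 1]."""
    logged = np.log1p(pd.Series(values, dtype="float64"))
    return (logged - logged.min()) / (logged.max() - logged.min())

# Volumes taken from the first rows of the Netflix table above
toy_volume = [104790000.0, 11104800.0, 6609400.0, 6757800.0, 10154200.0]
scaled = log_minmax(toy_volume)
print(scaled)
```

The log step compresses the very large first-day volume so that it does not squash every other value towards zero after rescaling.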


Now we need to merge the two dataframes, *Netflix* and *S&P500*.

In [13]:
CombinedDF = NtflxData.join(IVVData[['PCntChng1Days','PCntChng2Days','PCntChng3Days','PCntChng5Days',
                                     'PCntChng7Days','PCntChng14Days','PCntChng30Days','PCntChng60Days',
                                     'PCntChng90Days','PCntChng120Days','CloseRolled2Day_DiffPer',
                                     'CloseRolled3Day_DiffPer','CloseRolled5Day_DiffPer','CloseRolled7Day_DiffPer',
                                     'CloseRolled14Day_DiffPer','CloseRolled30Day_DiffPer']],
                            lsuffix='_Ntflx', rsuffix='_SP500')
display(CombinedDF)

Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear,NextDayPrcntChange,NextDayPrcntChangeBinned,S.No.,PCntChng1Days_Ntflx,PCntChng2Days_Ntflx,PCntChng3Days_Ntflx,PCntChng5Days_Ntflx,PCntChng7Days_Ntflx,PCntChng14Days_Ntflx,PCntChng30Days_Ntflx,PCntChng60Days_Ntflx,PCntChng90Days_Ntflx,PCntChng120Days_Ntflx,CloseRolled2Day_DiffPer_Ntflx,CloseRolled3Day_DiffPer_Ntflx,CloseRolled5Day_DiffPer_Ntflx,CloseRolled7Day_DiffPer_Ntflx,CloseRolled14Day_DiffPer_Ntflx,CloseRolled30Day_DiffPer_Ntflx,CloseRolled60Day_DiffPer,CloseRolled90Day_DiffPer,CloseRolled120Day_DiffPer,VolumeRolled2Day_DiffPer,VolumeRolled3Day_DiffPer,VolumeRolled5Day_DiffPer,VolumeRolled7Day_DiffPer,VolumeRolled14Day_DiffPer,VolumeRolled30Day_DiffPer,PCntChng1Days_SP500,PCntChng2Days_SP500,PCntChng3Days_SP500,PCntChng5Days_SP500,PCntChng7Days_SP500,PCntChng14Days_SP500,PCntChng30Days_SP500,PCntChng60Days_SP500,PCntChng90Days_SP500,PCntChng120Days_SP500,CloseRolled2Day_DiffPer_SP500,CloseRolled3Day_DiffPer_SP500,CloseRolled5Day_DiffPer_SP500,CloseRolled7Day_DiffPer_SP500,CloseRolled14Day_DiffPer_SP500,CloseRolled30Day_DiffPer_SP500
2002-05-23,1.156429,1.242857,1.145714,1.196429,1.196429,0.839738,1.210000,23,4,0,3,143,0.011343,1.0,0,,,,,,,,,,,,,,,,,,,,,,,,,,0.009995,0.014461,0.003554,-0.001632,-0.000454,0.023321,-0.005328,-0.013348,-0.035721,-0.036648,0.004973,0.008116,0.004066,0.003527,0.014835,0.006844
2002-05-24,1.214286,1.225000,1.197143,1.210000,1.210000,0.520547,1.157143,24,4,0,4,144,-0.043683,-1.0,1,0.011343,,,,,,,,,,0.005639,,,,,,,,,-0.808364,,,,,,-0.012620,-0.002751,0.001658,-0.019917,-0.006668,0.030220,-0.024051,-0.020623,-0.055169,-0.046722,-0.006350,-0.005153,-0.004595,-0.008195,-0.000072,-0.005050
2002-05-28,1.213571,1.232143,1.157143,1.157143,1.157143,0.446759,1.103571,28,4,0,1,148,-0.046297,-1.0,2,-0.043683,-0.032836,,,,,,,,,-0.022329,-0.025857,,,,,,,,-0.253774,-0.838143,,,,,-0.006989,-0.019521,-0.009720,-0.016036,-0.021120,0.027791,-0.024040,-0.050721,-0.042642,-0.047035,-0.003507,-0.008902,-0.008356,-0.012127,-0.008961,-0.011201
2002-05-29,1.164286,1.164286,1.085714,1.103571,1.103571,0.449917,1.071429,29,4,0,2,149,-0.029125,-1.0,3,-0.046297,-0.087958,-0.077613,,,,,,,,-0.023697,-0.046100,,,,,,,,0.011102,-0.171568,,,,,-0.007316,-0.014253,-0.026693,-0.012619,-0.033886,-0.016514,-0.052669,-0.073226,-0.056919,-0.069202,-0.003671,-0.007224,-0.013128,-0.014512,-0.015049,-0.016646
2002-05-30,1.107857,1.107857,1.071429,1.071429,1.071429,0.507821,1.076429,30,4,0,3,150,0.004667,0.0,4,-0.029125,-0.074074,-0.114521,,,,,,,,-0.014778,-0.035369,-0.066467,,,,,,,0.200828,0.295101,-0.635831,,,,-0.000840,-0.008149,-0.015080,-0.017790,-0.024055,-0.005478,-0.052375,-0.071837,-0.052627,-0.088037,-0.000420,-0.003010,-0.010421,-0.011913,-0.015494,-0.015690
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2018-02-01,266.410004,271.950012,263.380005,265.070007,265.070007,0.500859,267.429993,1,1,16,3,32,0.008903,0.0,3951,-0.019349,-0.049247,-0.068590,-0.017167,0.059052,0.220171,0.392028,0.325284,0.414839,0.567163,-0.009769,-0.023287,-0.034958,-0.025662,0.065369,0.192360,0.280986,0.311580,0.355947,-0.094837,-0.142996,-0.225222,-0.288201,-0.287954,-0.008567,-0.001267,0.000494,-0.010323,-0.005467,-0.005606,0.020497,0.045383,0.090745,0.124500,0.156109,-0.000634,-0.000258,-0.005641,-0.005537,0.000279,0.022662
2018-02-02,263.000000,270.619995,262.709991,267.429993,267.429993,0.492602,254.259995,2,1,16,4,33,-0.049247,-1.0,3952,0.008903,-0.010618,-0.040782,-0.026111,0.023460,0.208832,0.429954,0.336281,0.497788,0.560268,0.004432,-0.000635,-0.021256,-0.020141,0.060785,0.188645,0.285423,0.316823,0.362442,-0.029022,-0.102235,-0.245987,-0.264721,-0.331369,-0.071128,-0.022059,-0.023298,-0.021576,-0.038557,-0.027030,-0.008503,0.031212,0.064884,0.101790,0.129456,-0.011152,-0.015234,-0.019931,-0.023705,-0.021199,-0.000905
2018-02-05,262.000000,267.899994,250.029999,254.259995,254.259995,0.530336,265.720001,5,1,16,0,36,0.045072,1.0,3953,-0.049247,-0.040782,-0.059341,-0.106574,-0.057249,0.147745,0.346573,0.297973,0.417438,0.486901,-0.025245,-0.030479,-0.048328,-0.060806,-0.000721,0.119256,0.216433,0.246866,0.290784,0.131900,0.162913,0.084091,-0.001744,-0.120727,0.189539,-0.041365,-0.062511,-0.063699,-0.072188,-0.067636,-0.046074,-0.010930,0.021344,0.061017,0.072047,-0.021119,-0.035317,-0.046532,-0.054923,-0.058640,-0.041894
2018-02-06,247.699997,266.700012,245.000000,265.720001,265.720001,0.538463,264.559998,6,1,16,1,37,-0.004366,0.0,3954,0.045072,-0.006394,0.002452,-0.046915,-0.032338,0.221701,0.408758,0.352678,0.460241,0.576973,0.022039,0.012382,0.004400,-0.013853,0.030370,0.156619,0.264276,0.297145,0.343437,0.028569,0.124106,0.145498,0.037396,-0.085772,0.227327,0.019583,-0.022592,-0.044152,-0.043681,-0.060279,-0.036992,0.006419,0.039549,0.077583,0.093129,0.009696,-0.001301,-0.019150,-0.027831,-0.037671,-0.023335


### Build model

Now we are ready to start building our model.

But first, we'll drop every row that contains a *NaN* so that missing values don't cause difficulties later. Then we'll randomly split the data into *Training* and *Testing* sets.

In [14]:
# Keep only the rows that have no NaN in any column
for CurrentColumn in CombinedDF.columns:
    RowsWithoutNAN = ~np.isnan(CombinedDF[CurrentColumn])
    CombinedDF = CombinedDF[RowsWithoutNAN]

# Draw row positions for the test set. randint samples with replacement,
# so duplicate draws leave the test set somewhat smaller than 20% of the rows.
TestIndices = np.random.randint(low=0, high=len(CombinedDF), size=round(0.2*len(CombinedDF)))
TestIndicesBoolean = pd.Series(range(len(CombinedDF)), copy=True).isin(TestIndices)
MyTest = pd.DataFrame(CombinedDF[TestIndicesBoolean.values], copy=True)
# Every row that was not drawn for the test set goes to the training set
MyTrain = pd.DataFrame(CombinedDF[~TestIndicesBoolean.values], copy=True)
pd.options.display.max_columns = None
display(MyTrain)
display(MyTest)

Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear,NextDayPrcntChange,NextDayPrcntChangeBinned,S.No.,PCntChng1Days_Ntflx,PCntChng2Days_Ntflx,PCntChng3Days_Ntflx,PCntChng5Days_Ntflx,PCntChng7Days_Ntflx,PCntChng14Days_Ntflx,PCntChng30Days_Ntflx,PCntChng60Days_Ntflx,PCntChng90Days_Ntflx,PCntChng120Days_Ntflx,CloseRolled2Day_DiffPer_Ntflx,CloseRolled3Day_DiffPer_Ntflx,CloseRolled5Day_DiffPer_Ntflx,CloseRolled7Day_DiffPer_Ntflx,CloseRolled14Day_DiffPer_Ntflx,CloseRolled30Day_DiffPer_Ntflx,CloseRolled60Day_DiffPer,CloseRolled90Day_DiffPer,CloseRolled120Day_DiffPer,VolumeRolled2Day_DiffPer,VolumeRolled3Day_DiffPer,VolumeRolled5Day_DiffPer,VolumeRolled7Day_DiffPer,VolumeRolled14Day_DiffPer,VolumeRolled30Day_DiffPer,PCntChng1Days_SP500,PCntChng2Days_SP500,PCntChng3Days_SP500,PCntChng5Days_SP500,PCntChng7Days_SP500,PCntChng14Days_SP500,PCntChng30Days_SP500,PCntChng60Days_SP500,PCntChng90Days_SP500,PCntChng120Days_SP500,CloseRolled2Day_DiffPer_SP500,CloseRolled3Day_DiffPer_SP500,CloseRolled5Day_DiffPer_SP500,CloseRolled7Day_DiffPer_SP500,CloseRolled14Day_DiffPer_SP500,CloseRolled30Day_DiffPer_SP500
2002-11-12,0.600000,0.614286,0.572143,0.577857,0.577857,0.289746,0.641429,12,10,0,1,316,0.110013,1.0,120,-0.038050,-0.053801,-0.025301,-0.112939,-0.095078,-0.019395,-0.148421,-0.411637,-0.508505,-0.517015,-0.019394,-0.031138,-0.040560,-0.066744,-0.082767,0.009105,-0.191000,-0.279016,-0.335674,0.710383,-0.121773,-0.359447,-0.294455,-0.294500,-0.652521,0.006231,-0.009148,-0.019971,-0.033304,-0.017043,-0.015736,0.035802,-0.069070,-0.093951,-0.193572,0.003106,-0.001012,-0.013550,-0.018130,-0.011684,0.019701
2002-11-13,0.579286,0.642857,0.555714,0.641429,0.641429,0.344992,0.618571,13,10,0,2,317,-0.035636,-1.0,121,0.110013,0.067778,0.050294,0.019296,-0.044681,0.107275,0.108644,-0.322264,-0.447384,-0.469893,0.052139,0.057301,0.060714,0.043147,0.011018,0.116037,-0.095563,-0.193895,-0.258550,0.191841,0.673272,0.214609,0.057953,0.021659,-0.470525,-0.002139,0.004078,-0.011267,-0.045347,-0.027540,0.003624,0.066546,-0.059928,-0.071354,-0.185012,-0.001071,0.000640,-0.006368,-0.016331,-0.014048,0.015370
2002-11-14,0.642857,0.645000,0.618571,0.618571,0.618571,0.168039,0.650000,14,10,0,3,318,0.050809,1.0,122,-0.035636,0.070457,0.029726,0.043373,-0.050440,-0.037778,0.194482,-0.361357,-0.468386,-0.465433,-0.018141,0.009716,0.014289,0.013712,-0.022338,0.070016,-0.120560,-0.216660,-0.281244,-0.552640,-0.560379,-0.600313,-0.693528,-0.663741,-0.839709,0.024371,0.022180,0.028549,0.001765,-0.011863,0.007658,0.104501,-0.051306,-0.015293,-0.159274,0.012039,0.015397,0.017483,0.009387,0.009427,0.036715
2002-11-15,0.642143,0.653571,0.589286,0.650000,0.650000,0.279967,0.778571,15,10,0,4,319,0.197802,1.0,123,0.050809,0.013362,0.124846,0.064328,0.032917,-0.052083,0.387196,-0.339143,-0.487901,-0.411003,0.024775,0.020942,0.052267,0.060077,0.031495,0.112741,-0.068514,-0.169624,-0.241393,0.374412,-0.011502,0.166188,-0.120409,-0.185554,-0.634439,0.007380,0.031931,0.029723,0.020303,-0.014864,0.020873,0.130531,-0.054579,-0.017088,-0.146828,0.003676,0.012921,0.020827,0.019069,0.015368,0.040185
2002-11-18,0.648571,0.780714,0.625000,0.778571,0.778571,0.506521,0.664286,18,10,0,0,322,-0.146788,-1.0,124,0.197802,0.258661,0.213807,0.296076,0.313253,0.184782,0.889080,-0.216391,-0.355411,-0.273334,0.090000,0.140963,0.191777,0.217100,0.218752,0.305545,0.121495,0.000724,-0.088744,0.662118,1.315151,1.725239,1.965109,2.259516,0.861065,-0.009840,-0.002533,0.021776,0.025943,-0.000772,0.023161,0.149822,-0.041795,-0.016614,-0.154514,-0.004945,-0.004142,0.005641,0.009154,0.003745,0.025362
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2018-01-31,281.940002,282.290009,269.579987,270.299988,270.299988,0.527912,265.070007,31,0,16,2,31,-0.019349,-1.0,3950,-0.030488,-0.050213,-0.015659,0.034443,0.187714,0.271880,0.421734,0.356111,0.431825,0.537718,-0.015480,-0.027336,-0.019224,0.001334,0.101515,0.229650,0.313142,0.343197,0.388378,-0.032583,-0.158781,-0.140899,-0.276316,-0.129546,0.218584,0.001763,-0.009067,-0.015624,-0.003821,-0.002037,0.029044,0.053227,0.095539,0.125971,0.141531,0.000881,-0.002457,-0.005467,-0.005072,0.002988,0.025477
2018-02-01,266.410004,271.950012,263.380005,265.070007,265.070007,0.500859,267.429993,1,1,16,3,32,0.008903,0.0,3951,-0.019349,-0.049247,-0.068590,-0.017167,0.059052,0.220171,0.392028,0.325284,0.414839,0.567163,-0.009769,-0.023287,-0.034958,-0.025662,0.065369,0.192360,0.280986,0.311580,0.355947,-0.094837,-0.142996,-0.225222,-0.288201,-0.287954,-0.008567,-0.001267,0.000494,-0.010323,-0.005467,-0.005606,0.020497,0.045383,0.090745,0.124500,0.156109,-0.000634,-0.000258,-0.005641,-0.005537,0.000279,0.022662
2018-02-02,263.000000,270.619995,262.709991,267.429993,267.429993,0.492602,254.259995,2,1,16,4,33,-0.049247,-1.0,3952,0.008903,-0.010618,-0.040782,-0.026111,0.023460,0.208832,0.429954,0.336281,0.497788,0.560268,0.004432,-0.000635,-0.021256,-0.020141,0.060785,0.188645,0.285423,0.316823,0.362442,-0.029022,-0.102235,-0.245987,-0.264721,-0.331369,-0.071128,-0.022059,-0.023298,-0.021576,-0.038557,-0.027030,-0.008503,0.031212,0.064884,0.101790,0.129456,-0.011152,-0.015234,-0.019931,-0.023705,-0.021199,-0.000905
2018-02-05,262.000000,267.899994,250.029999,254.259995,254.259995,0.530336,265.720001,5,1,16,0,36,0.045072,1.0,3953,-0.049247,-0.040782,-0.059341,-0.106574,-0.057249,0.147745,0.346573,0.297973,0.417438,0.486901,-0.025245,-0.030479,-0.048328,-0.060806,-0.000721,0.119256,0.216433,0.246866,0.290784,0.131900,0.162913,0.084091,-0.001744,-0.120727,0.189539,-0.041365,-0.062511,-0.063699,-0.072188,-0.067636,-0.046074,-0.010930,0.021344,0.061017,0.072047,-0.021119,-0.035317,-0.046532,-0.054923,-0.058640,-0.041894


Date,Open,High,Low,Close,Adj Close,Volume,NextDayClose,Day,Month,Year,DayofWeek,DayofYear,NextDayPrcntChange,NextDayPrcntChangeBinned,S.No.,PCntChng1Days_Ntflx,PCntChng2Days_Ntflx,PCntChng3Days_Ntflx,PCntChng5Days_Ntflx,PCntChng7Days_Ntflx,PCntChng14Days_Ntflx,PCntChng30Days_Ntflx,PCntChng60Days_Ntflx,PCntChng90Days_Ntflx,PCntChng120Days_Ntflx,CloseRolled2Day_DiffPer_Ntflx,CloseRolled3Day_DiffPer_Ntflx,CloseRolled5Day_DiffPer_Ntflx,CloseRolled7Day_DiffPer_Ntflx,CloseRolled14Day_DiffPer_Ntflx,CloseRolled30Day_DiffPer_Ntflx,CloseRolled60Day_DiffPer,CloseRolled90Day_DiffPer,CloseRolled120Day_DiffPer,VolumeRolled2Day_DiffPer,VolumeRolled3Day_DiffPer,VolumeRolled5Day_DiffPer,VolumeRolled7Day_DiffPer,VolumeRolled14Day_DiffPer,VolumeRolled30Day_DiffPer,PCntChng1Days_SP500,PCntChng2Days_SP500,PCntChng3Days_SP500,PCntChng5Days_SP500,PCntChng7Days_SP500,PCntChng14Days_SP500,PCntChng30Days_SP500,PCntChng60Days_SP500,PCntChng90Days_SP500,PCntChng120Days_SP500,CloseRolled2Day_DiffPer_SP500,CloseRolled3Day_DiffPer_SP500,CloseRolled5Day_DiffPer_SP500,CloseRolled7Day_DiffPer_SP500,CloseRolled14Day_DiffPer_SP500,CloseRolled30Day_DiffPer_SP500
2002-11-20,0.674286,0.727857,0.660714,0.717143,0.717143,0.506857,0.785714,20,10,0,2,324,0.095617,1.0,126,0.079570,-0.078898,0.103297,0.118040,0.193818,0.119287,0.923373,-0.179067,-0.363348,-0.364557,0.038263,-0.003968,0.045834,0.080068,0.110269,0.163294,0.044546,-0.066394,-0.153843,-0.473064,-0.374109,-0.017351,0.244245,0.800678,0.592795,0.012286,0.009938,0.000000,0.031931,0.036139,0.031116,0.170463,-0.026089,0.011726,-0.123022,0.006105,7.379675e-03,0.005895,0.012894,0.010743,0.026460
2002-11-25,0.792857,0.796429,0.753571,0.792857,0.792857,0.400415,0.749286,25,10,0,0,329,-0.054954,-1.0,129,0.008174,0.009091,0.105577,0.018349,0.281756,0.217104,0.483956,-0.153319,-0.316503,-0.329305,0.004070,0.005738,0.058150,0.072464,0.175047,0.229372,0.164417,0.047840,-0.054448,0.030074,-0.276382,-0.589120,-0.524022,-0.269442,-0.194098,0.003206,-0.002974,0.026460,0.036661,0.034035,0.021768,0.110086,0.023104,0.107075,-0.092245,0.001600,7.097507e-05,0.012882,0.018142,0.030783,0.040994
2002-12-02,0.742857,0.796429,0.724286,0.761429,0.761429,0.277413,0.770714,2,11,0,0,336,0.012194,1.0,133,-0.017511,0.013308,0.016206,-0.031789,0.061753,0.267540,0.115064,-0.167187,-0.265334,-0.303267,-0.008833,-0.001561,-0.005968,-0.013354,0.060697,0.134043,0.136300,0.027900,-0.078333,0.366016,0.502618,-0.048156,-0.449860,-0.658264,-0.572794,0.003732,-0.001803,0.025605,0.005877,0.029193,0.066387,0.060620,0.049036,0.118997,-0.082464,0.001863,6.377946e-04,0.005942,0.005033,0.023406,0.035534
2002-12-03,0.750000,0.789286,0.739286,0.770714,0.770714,0.199899,0.710000,3,11,0,1,337,-0.078776,-1.0,134,0.012194,-0.005530,0.025664,-0.027928,-0.019091,0.333745,0.210998,-0.145009,-0.115575,-0.288259,0.006060,0.002167,0.012005,0.001458,0.053417,0.140270,0.153907,0.042008,-0.064147,-0.265990,-0.148994,-0.161628,-0.497020,-0.799363,-0.736386,-0.013067,-0.009384,-0.014846,-0.010439,-0.013381,0.045936,0.030048,0.022452,0.083382,-0.084278,-0.006576,-7.513949e-03,-0.005119,-0.006189,0.006843,0.020988
2002-12-12,0.905714,0.914286,0.885714,0.900714,0.900714,0.310428,0.892857,12,11,0,3,346,-0.008723,0.0,141,0.003980,0.033606,0.126898,0.146364,0.168675,0.146364,0.404230,0.143246,0.184038,-0.079562,0.001986,0.012309,0.059485,0.097203,0.132610,0.250826,0.363464,0.221491,0.104203,-0.317882,-0.287308,-0.319498,-0.473946,-0.289480,-0.459007,-0.001759,0.001765,0.015206,-0.005259,-0.022605,-0.035684,0.015774,0.050926,0.052998,-0.090727,-0.000880,2.936858e-08,0.000264,-0.003152,-0.016069,-0.006438
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2017-12-29,192.509995,193.949997,191.220001,191.960007,191.960007,0.412314,201.070007,29,11,15,4,363,0.047458,1.0,3929,-0.003892,0.030713,0.022369,0.017708,0.026414,0.018139,-0.000833,0.040716,0.133577,0.243828,-0.001950,0.008705,0.011796,0.013325,0.016822,0.010642,-0.006267,0.019095,0.036646,-0.321661,-0.193516,-0.010819,-0.031568,-0.079441,-0.097197,-0.003484,-0.001634,-0.001003,-0.002523,-0.001040,0.005799,0.041328,0.054727,0.088462,0.102658,-0.001745,-1.708076e-03,-0.001678,-0.001634,-0.001186,0.009803
2018-01-11,214.289993,217.750000,213.350006,217.240005,217.240005,0.467728,221.229996,11,0,16,3,11,0.018367,1.0,3937,0.022210,0.037886,0.024475,0.056461,0.080420,0.150514,0.090672,0.071837,0.243218,0.152222,0.010983,0.019794,0.023645,0.033218,0.082300,0.122556,0.115409,0.132616,0.157873,0.125487,0.164241,0.183838,0.141474,0.206360,0.185224,0.007098,0.005496,0.007792,0.016411,0.026731,0.033830,0.050784,0.081852,0.114277,0.118849,0.003536,4.188760e-03,0.006071,0.009621,0.020522,0.030714
2018-01-19,222.750000,223.490005,218.500000,220.460007,220.460007,0.513240,227.580002,19,0,16,4,19,0.032296,1.0,3942,0.000590,0.013609,-0.004830,0.014822,0.053270,0.143999,0.196786,0.145425,0.213052,0.197892,0.000295,0.004694,0.001135,0.008107,0.040653,0.106740,0.120031,0.135581,0.166211,0.123752,0.134380,0.063099,0.167923,0.309940,0.569877,0.004373,0.002876,0.012872,0.015822,0.021405,0.047074,0.066040,0.095517,0.125907,0.136506,0.002182,2.412946e-03,0.005847,0.009686,0.021394,0.037328
2018-01-22,222.000000,227.789993,221.199997,227.580002,227.580002,0.586868,250.289993,22,0,16,0,22,0.099789,1.0,3943,0.032296,0.032905,0.046345,0.028703,0.070864,0.185559,0.228171,0.161004,0.229166,0.252780,0.015891,0.021500,0.027542,0.030527,0.061513,0.134457,0.153122,0.169416,0.201443,0.253247,0.455975,0.497326,0.652852,0.978749,1.484155,0.007930,0.012337,0.010828,0.017220,0.031145,0.059066,0.074250,0.102536,0.130778,0.145979,0.003949,6.729707e-03,0.010354,0.013243,0.025289,0.043041
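
One caveat about the split above: `np.random.randint` draws positions with replacement, so some rows are drawn more than once and the test set ends up somewhat smaller than 20% of the data. The sketch below shows one possible alternative that drops the NaN rows and samples the requested fraction without replacement. The helper name `split_train_test`, the `seed` argument and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def split_train_test(df, test_fraction=0.2, seed=0):
    """Drop rows containing NaN, then split randomly without replacement."""
    clean = df.dropna()
    # sample() draws each row at most once, so the fraction is respected
    test = clean.sample(frac=test_fraction, random_state=seed).sort_index()
    # everything that was not sampled becomes training data
    train = clean.drop(test.index)
    return train, test


# Toy frame standing in for CombinedDF (hypothetical values)
toy = pd.DataFrame(
    {"a": np.arange(10, dtype=float), "b": np.arange(10, dtype=float)},
    index=pd.date_range("2002-05-23", periods=10),
)
toy.iloc[0, 1] = np.nan  # one row with a missing value

train, test = split_train_test(toy)
print(len(train), len(test))
```

Like the original cell, this shuffles rows in time, so the training set can contain days that come after some test days. A chronological split is an option if that matters for the evaluation.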


Print the column names alongside their positions, because the continuous features are selected by position further down.

In [15]:
pd.set_option("display.max_rows",100)
pd.DataFrame(MyTrain.columns,range(len(MyTrain.columns)))

,0
0,Open
1,High
2,Low
3,Close
4,Adj Close
5,Volume
6,NextDayClose
7,Day
8,Month
9,Year
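
As a small illustration of why this cross-check is useful, the sketch below converts between positions and names for the first twelve columns of the table. The variable names are illustrative, and selecting by name instead of position is only a suggestion, not what the notebook does.

```python
import pandas as pd

# First twelve column names, in the order shown in the data frame above
columns = pd.Index([
    "Open", "High", "Low", "Close", "Adj Close", "Volume",
    "NextDayClose", "Day", "Month", "Year", "DayofWeek", "DayofYear",
])

# Positions -> names (same indexing the notebook uses on MyTrain.columns)
positions = [5, 7, 11]
names = list(columns[positions])
print(names)

# Names -> positions, useful for checking a hand-written position list
recovered = list(columns.get_indexer(names))
print(recovered)
```

Selecting features by name is less fragile than selecting by position if columns are later added or reordered.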


Next comes the important part: preparing the inputs and making sure each input array has the shape that *Keras* and *Tensorflow* expect for our model.

In [80]:
MonthCodeInput = MyTrain['Month']
MonthCodeInput = np.array(MonthCodeInput.astype('int32'))
MonthCodeInput = np.reshape (MonthCodeInput,(-1,1))
print ('Month Code Input')
print (MonthCodeInput[:10])

YearCodeInput = MyTrain['Year']
YearCodeInput = np.array(YearCodeInput.astype('int32'))
YearCodeInput = np.reshape (YearCodeInput,(-1,1))
print ('Year Code Input')
print (YearCodeInput)
print (YearCodeInput[:10])

DayOfWeekCodeInput = MyTrain['DayofWeek']
DayOfWeekCodeInput = np.array(DayOfWeekCodeInput.astype('int32'))
DayOfWeekCodeInput = np.reshape (DayOfWeekCodeInput,(-1,1))
print ('Day of Week')
print (DayOfWeekCodeInput[:10])

#########################

ContFeaturesColumns = list([5,7,11]) + list(range(15, 18)) + list(range(25,35)) + list(range(35,37))+ list(range(40,43)) + list(range(50,53))
ContinuousFeatures = MyTrain.columns[ContFeaturesColumns]
ContinuousInput = MyTrain[ContinuousFeatures]
ContinuousInput = np.array(ContinuousInput)
ContinuousInput = np.asarray(ContinuousInput, dtype=np.float32)
ContinuousInput = np.reshape(ContinuousInput,(-1,len(ContinuousFeatures)))
print ('Continuous Input')
print(ContinuousInput[:3])



##### Gain/Loss One Hot Encode #####
print("Gain-Loss binned")
#print(MyTrain['NextDayPrcntChangeBinned'][:10])
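# Use the raw values so that the Series gets a fresh 0..n-1 index
GainLossOutput = pd.Series(MyTrain['NextDayPrcntChangeBinned'].values)
# get_dummies sorts the classes, so the columns are ordered -1, 0, +1
GainLossOutputCodes = pd.get_dummies(GainLossOutput)
GainLossOutputCodes = np.array(GainLossOutputCodes)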
print ('Gain loss one hot codes')
print (GainLossOutputCodes[:10])


Month Code Input
[[10]
 [10]
 [10]
 [10]
 [10]
 [10]
 [10]
 [10]
 [10]
 [10]]
Year Code Input
[[ 0]
 [ 0]
 [ 0]
 ...
 [16]
 [16]
 [16]]
[[0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]]
Day of Week
[[1]
 [2]
 [3]
 [4]
 [0]
 [1]
 [3]
 [4]
 [1]
 [2]]
Continuous Input
[[ 2.89745629e-01  1.20000000e+01  3.16000000e+02 -3.80497202e-02
  -5.38009591e-02 -2.53012106e-02 -1.93938259e-02 -3.11375782e-02
  -4.05598283e-02 -6.67438656e-02 -8.27666670e-02  9.10538435e-03
  -1.91000178e-01 -2.79016286e-01 -3.35673839e-01  7.10382521e-01
  -1.21773288e-01 -3.59446615e-01  6.23091683e-03 -9.14769061e-03
  -1.99712794e-02  3.10578244e-03 -1.01224461e-03 -1.35495095e-02]
 [ 3.44991684e-01  1.30000000e+01  3.17000000e+02  1.10013381e-01
   6.77776784e-02  5.02935909e-02  5.21387123e-02  5.73005490e-02
   6.07143007e-02  4.31469940e-02  1.10179493e-02  1.16036959e-01
  -9.55628008e-02 -1.93895340e-01 -2.58550316e-01  1.91840947e-01
   6.73272133e-01  2.14608982e-01 -2.13919161e-03  4.07839613e-03
  -1.1
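
The order of the one-hot columns matters later, because the softmax outputs of the model follow the same order. As a small check on made-up labels (not the notebook's data), `pd.get_dummies` orders the columns by sorted class value:

```python
import numpy as np
import pandas as pd

# Hypothetical binned labels: -1 = loss, 0 = flat, +1 = gain
labels = pd.Series([1.0, -1.0, 0.0, 1.0, -1.0])

one_hot = pd.get_dummies(labels)
print(list(one_hot.columns))  # classes in sorted order

# Convert to a float array, one row per sample and one column per class
codes = one_hot.to_numpy(dtype=np.float32)
print(codes)
```

Each row contains a single 1 in the column of its class, so the first label (`1.0`) ends up in the last column.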

Now we'll build our model.

In [43]:
month_input = Input(shape=(1,), dtype='int32', name='month_input')
year_input = Input(shape = (1,), dtype='int32', name = 'year_input')
dayofweek_input = Input(shape = (1,), dtype='int32', name = 'dayofweek_input')


Month_x = Embedding(output_dim=2, input_dim=(MonthCodeInput.max()+1), input_length=1)(month_input)
Month_x = Flatten()(Month_x)
Year_x = Embedding(output_dim=2, input_dim=(YearCodeInput.max()+1), input_length=1)(year_input)
Year_x = Flatten()(Year_x)
DayOfWeek_x = Embedding(output_dim=(DayOfWeekCodeInput.max()+1), input_dim=(DayOfWeekCodeInput.max()+1), 
                        input_length=1)(dayofweek_input)
DayOfWeek_x = Flatten()(DayOfWeek_x)

continuous_input = Input(shape=(len(ContinuousFeatures),), dtype='float32', name='cont_input')



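# Note: only the day-of-week embedding is concatenated with the continuous features.
# Month_x and Year_x are built above but are not connected to the output.
x_Concat = concatenate([DayOfWeek_x, continuous_input])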


x_Concat = Dense(10, activation='sigmoid')(x_Concat)

x_Concat = Dense(10, activation='sigmoid')(x_Concat)


main_output = Dense(3, name='main_output', activation = 'softmax')(x_Concat)


model = Model(inputs=[month_input, year_input, dayofweek_input, continuous_input], outputs = [main_output])   


In [44]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
dayofweek_input (InputLayer)    (None, 1)            0                                            
__________________________________________________________________________________________________
embedding_15 (Embedding)        (None, 1, 5)         25          dayofweek_input[0][0]            
__________________________________________________________________________________________________
flatten_15 (Flatten)            (None, 5)            0           embedding_15[0][0]               
__________________________________________________________________________________________________
cont_input (InputLayer)         (None, 24)           0                                            
__________________________________________________________________________________________________
concatenat

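
To make the shapes in the summary concrete, here is a NumPy sketch of what the day-of-week branch does: an embedding is a lookup table with one row per category (5 x 5 = 25 parameters here), the lookup result is flattened, and the result is joined with the 24 continuous features. The random table stands in for learned weights, and all names are illustrative rather than taken from Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

n_days, emb_dim, n_cont = 5, 5, 24

# Lookup table: one row of emb_dim numbers per day-of-week code
embedding_table = rng.normal(size=(n_days, emb_dim))

# Integer codes shaped (batch, 1), like DayOfWeekCodeInput
day_codes = np.array([[1], [2], [3], [4], [0]])

embedded = embedding_table[day_codes]              # (batch, 1, emb_dim)
flattened = embedded.reshape(len(day_codes), -1)   # (batch, emb_dim)

# Stand-in for the continuous feature matrix
continuous = rng.normal(size=(len(day_codes), n_cont))

# Same role as concatenate([DayOfWeek_x, continuous_input])
concatenated = np.concatenate([flattened, continuous], axis=1)
print(embedded.shape, flattened.shape, concatenated.shape)
```

The concatenated width, 5 + 24 = 29, is what the first `Dense` layer receives.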

### Train the model

In [45]:
model.compile(optimizer='adadelta',
              loss={'main_output': 'categorical_crossentropy'}, metrics=['accuracy'])
history = model.fit({'month_input': MonthCodeInput, 'year_input': YearCodeInput,
                     'dayofweek_input': DayOfWeekCodeInput, 'cont_input': ContinuousInput},
                    {'main_output': GainLossOutputCodes},
                    epochs=5, batch_size=1, verbose=1, validation_split=0.2, shuffle=True)

Train on 2510 samples, validate on 628 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
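
For reference, the loss being minimised compares the one-hot target with the softmax output. The following is a NumPy sketch of the standard categorical cross-entropy definition, not Keras's own implementation. The function name, the `eps` clipping value and the example probabilities are assumptions made for illustration.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    # Clip to avoid log(0) for probabilities that are exactly zero
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


# One sample whose true class is the last one (+1)
y_true = np.array([[0.0, 0.0, 1.0]])

# A prediction leaning towards the true class, and a uniform guess
leaning = np.array([[0.2, 0.3, 0.5]])
uniform = np.full((1, 3), 1.0 / 3.0)

loss_leaning = categorical_crossentropy(y_true, leaning)   # equals -log(0.5)
loss_uniform = categorical_crossentropy(y_true, uniform)   # equals log(3)
print(loss_leaning, loss_uniform)
```

A loss close to log(3) therefore means the model is doing little better than assigning equal probability to all three classes.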


### Generate Test features

In [79]:
MonthCodeInput_Test = MyTest['Month']
MonthCodeInput_Test = np.array(MonthCodeInput_Test.astype('int32'))
MonthCodeInput_Test = np.reshape (MonthCodeInput_Test,(-1,1))
print ('Month Code Input')
print (MonthCodeInput_Test[:10])

YearCodeInput_Test = MyTest['Year']
YearCodeInput_Test = np.array(YearCodeInput_Test.astype('int32'))
YearCodeInput_Test = np.reshape (YearCodeInput_Test,(-1,1))
print ('Year Code Input')
print (YearCodeInput_Test[:10])

DayOfWeekCodeInput_Test = MyTest['DayofWeek']
DayOfWeekCodeInput_Test = np.array(DayOfWeekCodeInput_Test.astype('int32'))
DayOfWeekCodeInput_Test = np.reshape (DayOfWeekCodeInput_Test,(-1,1))
print ('Day of Week')
print (DayOfWeekCodeInput_Test[:10])

#########################

ContinuousFeatures_Test = MyTest.columns[ContFeaturesColumns]
ContinuousInput_Test = MyTest[ContinuousFeatures_Test]
ContinuousInput_Test = np.array(ContinuousInput_Test)
ContinuousInput_Test = np.asarray(ContinuousInput_Test, dtype=np.float32)
ContinuousInput_Test = np.reshape(ContinuousInput_Test,(-1,len(ContinuousFeatures_Test)))
print ('Continuous Input')
print(ContinuousInput_Test[:3])


Month Code Input
[[10]
 [10]
 [11]
 [11]
 [11]
 [11]
 [ 0]
 [ 0]
 [ 1]
 [ 1]]
Year Code Input
[[0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]]
Day of Week
[[2]
 [0]
 [0]
 [1]
 [3]
 [2]
 [4]
 [4]
 [2]
 [0]]
Continuous Input
[[ 5.06857395e-01  2.00000000e+01  3.24000000e+02  7.95696452e-02
  -7.88983926e-02  1.03296921e-01  3.82625535e-02 -3.96805536e-03
   4.58336733e-02  8.00678656e-02  1.10268801e-01  1.63293839e-01
   4.45459783e-02 -6.63937256e-02 -1.53843284e-01 -4.73064393e-01
  -3.74109477e-01 -1.73505023e-02  1.22855678e-02  9.93817393e-03
   0.00000000e+00  6.10528048e-03  7.37967482e-03  5.89503348e-03]
 [ 4.00415301e-01  2.50000000e+01  3.29000000e+02  8.17365572e-03
   9.09109414e-03  1.05577268e-01  4.07019397e-03  5.73826628e-03
   5.81503063e-02  7.24635720e-02  1.75047114e-01  2.29371741e-01
   1.64416879e-01  4.78396192e-02 -5.44475354e-02  3.00740525e-02
  -2.76381910e-01 -5.89120388e-01  3.20575968e-03 -2.97373603e-03
   2.64596324e-02  1.60031475e-03  7.09750748e-0
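
The test features are prepared with the same steps as the training features. One way to avoid repeating them is a small helper that builds the input dictionary from any data frame. This is a sketch only: `build_inputs` is a hypothetical name, the toy frame uses made-up rows, and the continuous columns are passed by name (as produced by `MyTest.columns[ContFeaturesColumns]`).

```python
import numpy as np
import pandas as pd


def build_inputs(df, continuous_columns):
    """Return the arrays keyed by the model's input layer names."""

    def as_codes(column):
        # Integer codes shaped (n_samples, 1), as the embedding inputs expect
        return df[column].to_numpy(dtype="int32").reshape(-1, 1)

    return {
        "month_input": as_codes("Month"),
        "year_input": as_codes("Year"),
        "dayofweek_input": as_codes("DayofWeek"),
        "cont_input": df[continuous_columns].to_numpy(dtype="float32"),
    }


# Toy frame with a few of the notebook's column names (hypothetical rows)
toy = pd.DataFrame({
    "Month": [10, 10, 11],
    "Year": [0, 0, 0],
    "DayofWeek": [2, 0, 0],
    "Volume": [0.51, 0.40, 0.28],
    "Day": [20, 25, 2],
})

inputs = build_inputs(toy, ["Volume", "Day"])
for name, array in inputs.items():
    print(name, array.shape, array.dtype)
```

The same call could then serve both `MyTrain` and `MyTest`, which keeps the two sets of arrays consistent.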

The model will now be used to make predictions on the *Test* features.

In [47]:
PredictedValues = model.predict({'month_input': MonthCodeInput_Test, 'year_input': YearCodeInput_Test,
                                 'dayofweek_input': DayOfWeekCodeInput_Test, 'cont_input': ContinuousInput_Test},
                                batch_size=100)
print(PredictedValues)

[[0.3210102  0.31155545 0.36743438]
 [0.3210102  0.31155545 0.36743438]
 [0.32101014 0.31155542 0.36743438]
 ...
 [0.31705022 0.2916461  0.39130366]
 [0.3166138  0.29038483 0.39300138]
 [0.31633314 0.28956643 0.3941004 ]]


Create a new dataframe to store the predicted and actual values.

In [75]:
PredictedValues_DF = pd.DataFrame(PredictedValues)
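# The softmax columns follow the sorted class order of the one-hot targets: -1, 0, +1
PredictedValues_DF.columns = ['PredictedValue_-1', 'PredictedValue_0', 'PredictedValue_+1']
PredictedValues_DF['S.No'] = range(len(MyTest))
PredictedValues_DF['ActualValues'] = np.array(MyTest['NextDayPrcntChangeBinned'])
# Absolute next-day change scaled by 50; used below only as the marker size in the plot
PredictedValues_DF['NextDayPrcntChange'] = np.array(50*(abs(MyTest['NextDayPrcntChange'])))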
display(PredictedValues_DF.head())

,PredictedValue_-1,PredictedValue_0,PredictedValue_+1,S.No,ActualValues,NextDayPrcntChange
0,0.32101,0.311555,0.367434,0,1.0,4.780846
1,0.32101,0.311555,0.367434,1,-1.0,2.747721
2,0.32101,0.311555,0.367434,2,1.0,0.609709
3,0.32101,0.311555,0.367434,3,-1.0,3.938815
4,0.32101,0.311555,0.367434,4,0.0,0.436154


In [76]:
print (PredictedValues_DF['ActualValues'].values[:10])

[ 1. -1.  1. -1.  0. -1.  1. -1. -1.  1.]
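
To turn the three probabilities into a single prediction, one option is to take the most probable class. The sketch below applies this to the first five rows printed above, which all carry the same probabilities, and compares the result with the actual classes. The variable names are illustrative.

```python
import numpy as np

# Class for each probability column: -1 = loss, 0 = flat, +1 = gain
class_labels = np.array([-1.0, 0.0, 1.0])

# First five rows of PredictedValues_DF as displayed above
probabilities = np.array([[0.32101, 0.311555, 0.367434]] * 5)
actual = np.array([1.0, -1.0, 1.0, -1.0, 0.0])

# Most probable class and its probability for every row
predicted = class_labels[probabilities.argmax(axis=1)]
confidence = probabilities.max(axis=1)

# Share of rows where the predicted class matches the actual class
accuracy = (predicted == actual).mean()
print(predicted, confidence, accuracy)
```

On these five rows the prediction is always +1, which matches two of the five actual values.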


In [74]:
output_notebook()
circle_plot_data = {'x_values': PredictedValues_DF['S.No'],
                    'y_values': PredictedValues_DF['PredictedValue_+1'],
                    'size': PredictedValues_DF['NextDayPrcntChange']+3,
                    'circle_color': [str(x) for x in PredictedValues_DF['ActualValues'].values]}
source = ColumnDataSource(data=circle_plot_data)

circle_plot = figure(plot_width=750)
# x: test sample number, y: predicted probability of the +1 class,
# marker size: magnitude of the actual next-day change,
# color: actual class (0.0 red, -1.0 green, 1.0 blue)
circle_plot.circle(x='x_values', y='y_values', size='size',
                   color=factor_cmap('circle_color', palette=['red','green','blue'], factors=['0.0', '-1.0','1.0']),
                   legend='circle_color', source=source)

show(circle_plot)


Make plots to check how good the predictions are.

The plot of the predicted values shows that most of the time the model isn't very sure which way the price will move on the next day. But when it *is* sure, it predicts the right target class most of the time. Most importantly, it almost never misses large price movements.
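
The observation that the model is usually right when it is sure can also be checked numerically, by keeping only the predictions whose top probability exceeds a threshold. The sketch below uses made-up probabilities and labels. The helper name `accuracy_above_threshold` and the 0.5 threshold are hypothetical choices.

```python
import numpy as np


def accuracy_above_threshold(probabilities, actual, class_labels, threshold):
    """Return (accuracy, coverage) for predictions with confidence >= threshold."""
    predicted = class_labels[probabilities.argmax(axis=1)]
    confident = probabilities.max(axis=1) >= threshold
    coverage = float(confident.mean())
    if not confident.any():
        # No prediction is confident enough, so accuracy is undefined
        return float("nan"), coverage
    accuracy = float((predicted[confident] == actual[confident]).mean())
    return accuracy, coverage


class_labels = np.array([-1.0, 0.0, 1.0])

# Hypothetical softmax outputs and actual classes
probabilities = np.array([
    [0.20, 0.20, 0.60],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
    [0.30, 0.30, 0.40],
    [0.10, 0.10, 0.80],
])
actual = np.array([1.0, -1.0, 0.0, -1.0, -1.0])

accuracy, coverage = accuracy_above_threshold(
    probabilities, actual, class_labels, threshold=0.5
)
print(accuracy, coverage)
```

Reporting the coverage next to the accuracy shows how often the model is confident at all, which matters for a model that is unsure most of the time.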