## LSTM model ##

- This exploratory notebook searches for hyperparameters under which the model neither overfits nor underfits.

#### Fourier transforms ####

- Fourier transforms help eliminate noise and create approximations of the real stock movement.
- A Fourier transform decomposes a series into sine waves with different amplitudes and frequencies. When combined, these sine waves approximate the original function.
- We use them to extract global and local trends in the stock price.
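
As a rough illustration of the idea above, the sketch below keeps only the lowest-frequency Fourier terms of a series and transforms back. The function name `fourier_approximation`, the synthetic `prices` series, and the component counts are assumptions made for this example; they are not taken from the notebook's data.

```python
import numpy as np

def fourier_approximation(series, n_components):
    """Approximate a 1-D series using only its n_components lowest-frequency terms.

    Index 0 of the real FFT is the constant term, index k is the wave with
    k full cycles over the series, so slicing keeps the slowest movements.
    """
    coeffs = np.fft.rfft(series)
    coeffs[n_components:] = 0          # discard all higher-frequency terms
    return np.fft.irfft(coeffs, n=len(series))

# Synthetic stand-in for a price series: two slow waves plus noise (hypothetical data)
rng = np.random.default_rng(42)
t = np.arange(256)
trend = np.sin(2 * np.pi * t / 256) + 0.5 * np.sin(2 * np.pi * 3 * t / 256)
prices = trend + rng.normal(scale=0.2, size=t.size)

global_trend = fourier_approximation(prices, 4)    # few terms: long-term movement
local_trend = fourier_approximation(prices, 20)    # more terms: follows shorter swings
```

Fewer components give a smoother curve (the global trend), while more components track the series more closely (local trends), so several approximations can be added as separate features.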

#### Stacked autoencoders ####

- To find missing correlations, we can learn new types of features that affect stock movements.
- An autoencoder is a neural network model that seeks to learn a compressed representation of an input.
    - Autoencoders are unsupervised but are trained with supervised learning methods. They are typically trained as part of a broader model that attempts to recreate the input.
    - Their design has a bottleneck at the midpoint, from which the reconstruction of the input data is performed.
    - The most common use is as a learned or automatic feature extraction model.
    - We can take the model up to the bottleneck and use its output as a fixed-length vector that provides a compressed representation of the input data. This vector can be used for a supervised learning model, for visualization, or more generally for dimensionality reduction.
#### LSTM Autoencoder ####

- An implementation of an autoencoder for sequence data using an Encoder-Decoder LSTM architecture.
- An encoder-decoder LSTM is configured to read the input sequence, encode it, decode it, and recreate it. Performance is evaluated by the model's ability to recreate the input sequence.
- Once the model reaches the desired level of performance, the decoder part may be removed, leaving just the encoder.
- Composite model: two decoders, one to predict the next frames in the sequence and one to reconstruct the frames in the sequence.

##### steps:
- Perform statistical checks on the 'quality' of the data: make sure it does not show heteroskedasticity, multicollinearity, or serial correlation.
- Compute feature importance with XGBoost.
- Heteroskedasticity: when the error of a regression (the difference between the predicted and the real value) depends on the data, i.e. the error grows as the data point moves along the x-axis.
- Multicollinearity: when one feature is a formula of (or strongly correlated with) other features.
- Serial correlation: when the error terms are correlated with each other across time.
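
The checks named in the steps above can be screened roughly with NumPy alone. The sketch below is an illustration on synthetic data, not the notebook's own procedure: the features `x1`, `x2`, `x3` are made up, the Durbin-Watson statistic is used for serial correlation, and the correlation between fitted values and squared residuals is only a crude stand-in for a formal heteroskedasticity test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.01, size=n)   # almost an exact formula of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.5 * x1 - 0.5 * x3 + rng.normal(scale=0.1, size=n)

# Multicollinearity: off-diagonal feature correlations close to +/-1 are a warning sign
corr = np.corrcoef(X, rowvar=False)

# Ordinary least squares fit with an intercept, then residuals
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
residuals = y - fitted

# Serial correlation: Durbin-Watson statistic, a value near 2 suggests
# no first-order autocorrelation in the residuals
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Heteroskedasticity (crude screen): squared residuals should not trend with fitted values
het_corr = np.corrcoef(fitted, residuals ** 2)[0, 1]

print("corr(x1, x2):", round(corr[0, 1], 4))
print("Durbin-Watson:", round(durbin_watson, 3))
print("corr(fitted, residuals^2):", round(het_corr, 3))
```

On real price data the residuals would come from whichever regression is being checked, and a dedicated statistics package would give proper test statistics and p-values.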

##### PCA:
- Reduce the dimensionality of the features created by the autoencoders with PCA (eigen portfolios).
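
A minimal sketch of the PCA step, written with a plain NumPy SVD so that it is self-contained; the `PCA` class from `sklearn.decomposition` performs the same projection. The helper name `pca_reduce`, the 80% variance threshold, and the synthetic `encoded_features` array are assumptions for illustration only.

```python
import numpy as np

def pca_reduce(features, variance_to_keep=0.8):
    """Project features onto the fewest principal components that together
    explain at least `variance_to_keep` of the total variance."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions; s**2 is proportional to explained variance
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    k = min(k, len(s))
    return centered @ vt[:k].T, explained[:k]

# Hypothetical stand-in for autoencoder output: 10 features driven by 2 latent factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
encoded_features = latent @ mixing + rng.normal(scale=0.05, size=(300, 10))

reduced, explained = pca_reduce(encoded_features, variance_to_keep=0.8)
print(reduced.shape, explained.round(3))
```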



In [1]:
# To be crystal clear, the top of your code file must have the following 4 lines before any others;
from numpy.random import seed
seed(42)
from tensorflow import set_random_seed
set_random_seed(42)

In [2]:
!pip install python-decouple


In [3]:
!pip install alpha_vantage


In [4]:
!pip install intrinio-sdk


In [5]:
!pip install quandl


In [6]:
!pip install xgboost


In [7]:
import pandas as pd
import numpy as np
import quandl
import datetime
from decouple import config
import math
# sklearn imports
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error, accuracy_score
import xgboost as xgb
import matplotlib.pyplot as plt
# keras imports
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.callbacks import ModelCheckpoint
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.layers import RepeatVector, TimeDistributed
# notebook imports
from fin_data_fundamentals import find_fundamentals
from fin_data_fundamentals import get_fundamentals
from alpha_vantage.foreignexchange import ForeignExchange
from alpha_vantage.techindicators import TechIndicators
from alpha_vantage.timeseries import TimeSeries
from fin_data import DailyTimeSeries
from fracdiff import FractionalDifferentiation as fd

Using TensorFlow backend.


In [8]:
import os
import sys

class HiddenPrints:
    """Context manager that silences print output inside a `with` block."""
    def __enter__(self):
        # Remember the real stdout and redirect prints to the null device
        self._original_stdout = sys.stdout
        sys.stdout = open(os.devnull, 'w')

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Close the null device and restore the real stdout
        sys.stdout.close()
        sys.stdout = self._original_stdout


pd.options.display.max_rows = 999
pd.options.display.max_columns = 999

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

### Let's pull in the data:

In [9]:
# We will experiment with Visa (ticker V) stock

visa = DailyTimeSeries('V')

df = visa.initiate()

################################################################### 
 Ticker:  V 
 Last Refreshed:  2019-09-23 09:51:12 
 Data Retrieved:  Daily Time Series with Splits and Dividend Events 
 ###################################################################


In [10]:
df.head()

date,V_open,V_high,V_low,V_close,V_adjusted_close,V_volume,V_dividend_amount
2008-03-19,59.5,69.0,55.0,56.5,11.1997,708486000.0,0.0
2008-03-20,58.4,65.0,57.5,64.35,12.7558,198985200.0,0.0
2008-03-24,67.24,67.7,59.0,59.73,11.84,149566400.0,0.0
2008-03-25,60.58,64.25,59.82,63.25,12.5377,87092000.0,0.0
2008-03-26,62.73,64.48,61.57,63.96,12.6785,43111600.0,0.0


plt.figure(figsize=(20, 12))

# Plot the raw close against the split- and dividend-adjusted close
plt.plot(df['V_close'], color='teal')
plt.plot(df['V_adjusted_close'], color='red');

#### Adding Indicators ####

Add technical indicators, related securities, and macro indicators to the dataframe.

In [11]:
# add the fundamentals,

fund_list = ["operatingrevenue", "totalrevenue", "netincome", "totaloperatingexpenses", "totalgrossprofit", "totaloperatingincome", "totalpretaxincome", "weightedavebasicdilutedsharesos", "cashdividendspershare", "totalcostofrevenue"]

# not a complete list of fundamentals that I can pull in from the API, but this is good so far.

In [12]:
df_fund = visa.add_fundamentals(df, fund_list)

################################################################### 
 Ticker:  V 
 Fundamentals Retrieved:  ['V_open' 'V_high' 'V_low' 'V_close' 'V_adjusted_close' 'V_volume'
 'V_dividend_amount' 'V_operatingrevenue' 'V_totalrevenue' 'V_netincome'
 'V_totaloperatingexpenses' 'V_totalgrossprofit' 'V_totaloperatingincome'
 'V_totalpretaxincome' 'V_weightedavebasicdilutedsharesos'
 'V_cashdividendspershare' 'V_totalcostofrevenue'] 
 ###################################################################
################################################################### 
 Ticker:  V 
 Retrieved Data Start Date:  2009-07-30 
 Retrieved Data End Date:  2019-07-26 
 Data Retrieved:  ['V_operatingrevenue', 'V_totalrevenue', 'V_netincome', 'V_totaloperatingexpenses', 'V_totalgrossprofit', 'V_totaloperatingincome', 'V_totalpretaxincome', 'V_weightedavebasicdilutedsharesos', 'V_cashdividendspershare', 'V_totalcostofrevenue'] 
 ###################################################################


In [13]:
df_fund.head()

date,V_open,V_high,V_low,V_close,V_adjusted_close,V_volume,V_dividend_amount,V_operatingrevenue,V_totalrevenue,V_netincome,V_totaloperatingexpenses,V_totalgrossprofit,V_totaloperatingincome,V_totalpretaxincome,V_weightedavebasicdilutedsharesos,V_cashdividendspershare,V_totalcostofrevenue
2008-03-19,59.5,69.0,55.0,56.5,11.1997,708486000.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0
2008-03-20,58.4,65.0,57.5,64.35,12.7558,198985200.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0
2008-03-24,67.24,67.7,59.0,59.73,11.84,149566400.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0
2008-03-25,60.58,64.25,59.82,63.25,12.5377,87092000.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0
2008-03-26,62.73,64.48,61.57,63.96,12.6785,43111600.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0


In [14]:
# Add data with all the indicators:

def tech_company_data(ticker):
    """
    Daily time series for `ticker` plus the tech index (XLK), VIX, SPX,
    technical indicators, and macro indices.
    """
    dts = DailyTimeSeries(ticker)
    df = dts.initiate()
    with HiddenPrints():
        df = dts.add_securities(['XLK', 'vix', 'SPX'], primary_df=df)
        df = dts.add_technicals(['SMA', 'EMA', 'MACD', 'STOCH',
                                 'RSI', 'ADX', 'CCI', 'BBANDS',
                                 'AD', 'OBV'],
                                primary_df=df)
        df = dts.add_macro(primary_df=df,
                           indices=['housing_index', 'confidence_index',
                                    'trade_index', 'longterm_rates'])

    return df

In [15]:
df = tech_company_data('V')


################################################################### 
 Ticker:  V 
 Last Refreshed:  2019-09-23 09:51:16 
 Data Retrieved:  Daily Time Series with Splits and Dividend Events 
 ###################################################################




In [16]:
df.head()

date,V_open,V_high,V_low,V_close,V_adjusted_close,V_volume,V_dividend_amount,XLK_open,XLK_high,XLK_low,XLK_close,XLK_adjusted_close,XLK_volume,XLK_dividend_amount,XLK_split_coefficient,vix_open,vix_high,vix_low,vix_close,vix_adjusted_close,vix_volume,vix_dividend_amount,vix_split_coefficient,SPX_open,SPX_high,SPX_low,SPX_close,SPX_adjusted_close,SPX_volume,SPX_dividend_amount,SPX_split_coefficient,V_SMA,V_EMA,V_MACD,V_MACD_Hist,V_MACD_Signal,V_SlowD,V_SlowK,V_RSI,V_ADX,V_CCI,V_Real Lower Band,V_Real Upper Band,V_Real Middle Band,V_Chaikin A/D,V_OBV,housing_index,conf_index,conf_index_SE,trade_value,10 Yrs Rates,20-Yr Maturity Rate
2008-05-14,83.2949,84.35,81.45,82.23,16.3001,51400400.0,0.0,25.02,25.31,24.94,24.95,20.7378,4226000.0,0.0,1.0,17.98,17.98,16.74,17.66,17.66,0.0,0.0,1.0,1405.65,1420.1899,1405.65,1408.66,1408.66,3979370000.0,0.0,1.0,79.032,79.3058,4.8159,-0.3851,5.2011,35.8379,34.7584,65.1019,43.8072,40.6145,65.4179,92.6461,79.032,-319488300.0,1489803000.0,167.328,63.53,2.61,96.2716,4.49,4.63
2008-05-15,82.53,82.93,81.11,82.25,16.304,36688800.0,0.0,24.97,25.48,24.94,25.43,21.1368,2793000.0,0.0,1.0,17.65,17.84,16.25,16.3,16.3,0.0,0.0,1.0,1408.36,1424.4,1406.87,1423.5699,1423.5699,3836480000.0,0.0,1.0,79.7925,79.5862,4.4788,-0.5779,5.0566,35.1165,40.1106,65.1223,42.0522,28.9602,67.2889,92.2961,79.7925,-310215300.0,1526492000.0,167.328,63.53,2.61,96.2716,4.41,4.55
2008-05-16,82.71,82.85,81.26,82.37,16.3278,34066800.0,0.0,25.5,25.5,25.15,25.37,21.0869,3631700.0,0.0,1.0,16.3,17.92,16.3,16.47,16.47,0.0,0.0,1.0,1423.89,1425.8199,1414.35,1425.35,1425.35,3842590000.0,0.0,1.0,80.461,79.8513,4.1731,-0.7068,4.8799,41.7123,50.2678,65.2512,40.3849,24.491,68.9465,91.9755,80.461,-296717100.0,1560558000.0,167.328,63.53,2.61,96.2716,4.44,4.57
2008-05-19,82.27,84.74,82.15,83.32,16.5161,45473200.0,0.0,25.482,25.69,25.26,25.5,21.195,4212200.0,0.0,1.0,16.47,17.89,15.82,17.01,17.01,0.0,0.0,1.0,1425.28,1440.24,1421.63,1426.63,1426.63,3683970000.0,0.0,1.0,81.0785,80.1817,3.9619,-0.7344,4.6963,50.0598,59.8011,66.2894,39.0943,36.9569,70.3697,91.7873,81.0785,-301106400.0,1606032000.0,167.328,63.53,2.61,96.2716,4.42,4.55
2008-05-20,82.88,83.84,81.91,82.74,16.4012,40702000.0,0.0,25.25,25.25,24.84,25.1,20.8625,3115200.0,0.0,1.0,17.02,18.42,17.02,17.58,17.58,0.0,0.0,1.0,1424.49,1424.49,1409.09,1413.4,1413.4,3854320000.0,0.0,1.0,81.7015,80.4253,3.705,-0.7931,4.4981,56.2321,58.6274,65.0406,37.8194,22.8959,72.1959,91.2071,81.7015,-306800500.0,1565330000.0,167.328,63.53,2.61,96.2716,4.38,4.52


In [17]:

df.shape

(2698, 52)

In [18]:
df = tech_company_data('V')

################################################################### 
 Ticker:  V 
 Last Refreshed:  2019-09-23 09:51:32 
 Data Retrieved:  Daily Time Series with Splits and Dividend Events 
 ###################################################################




In [19]:
df.tail()

date,V_open,V_high,V_low,V_close,V_adjusted_close,V_volume,V_dividend_amount,XLK_open,XLK_high,XLK_low,XLK_close,XLK_adjusted_close,XLK_volume,XLK_dividend_amount,XLK_split_coefficient,vix_open,vix_high,vix_low,vix_close,vix_adjusted_close,vix_volume,vix_dividend_amount,vix_split_coefficient,SPX_open,SPX_high,SPX_low,SPX_close,SPX_adjusted_close,SPX_volume,SPX_dividend_amount,SPX_split_coefficient,V_SMA,V_EMA,V_MACD,V_MACD_Hist,V_MACD_Signal,V_SlowD,V_SlowK,V_RSI,V_ADX,V_CCI,V_Real Middle Band,V_Real Upper Band,V_Real Lower Band,V_Chaikin A/D,V_OBV,housing_index,conf_index,conf_index_SE,trade_value,10 Yrs Rates,20-Yr Maturity Rate
2019-01-25,139.0,139.9,137.97,138.67,138.0167,9756300.0,0.0,65.5,66.125,65.26,65.93,65.4863,10030800.0,0.0,1.0,18.43,18.46,17.31,17.42,17.42,0.0,0.0,1.0,2657.4399,2672.3799,2657.3301,2664.76,2664.76,3814080000.0,0.0,1.0,135.829,136.4451,0.9699,0.3524,0.6175,50.7652,46.269,53.7618,12.1218,79.0098,135.829,141.8311,129.8269,593209700.0,2768893000.0,205.073,54.48,4.3,126.8701,2.99,2.92
2019-01-28,137.51,137.51,134.81,135.99,135.3493,9857200.0,0.0,64.95,65.07,64.51,65.05,64.6122,9662400.0,0.0,1.0,18.56,20.42,18.42,18.87,18.87,0.0,0.0,1.0,2644.97,2644.97,2624.0601,2643.8501,2643.8501,3612810000.0,0.0,1.0,136.028,136.4017,0.8029,0.1483,0.6546,45.8219,45.4754,49.6072,11.8062,4.2198,136.028,141.7686,130.2874,591968400.0,2759036000.0,205.073,54.48,4.3,126.8701,2.99,2.92
2019-01-29,136.59,136.69,134.11,135.0,134.364,7457700.0,0.0,65.1863,65.19,64.25,64.35,63.9169,8573900.0,0.0,1.0,19.45,19.93,18.42,19.13,19.13,0.0,0.0,1.0,2644.8899,2650.9299,2631.05,2640.0,2640.0,3504200000.0,0.0,1.0,136.231,136.2682,0.584,-0.0564,0.6405,42.3339,35.2572,48.1601,11.6143,-28.8534,136.231,141.5058,130.9562,589656000.0,2751578000.0,205.073,54.48,4.3,126.8701,2.97,2.9
2019-01-30,136.1,137.895,135.51,137.6,136.9517,8078100.0,0.0,65.42,66.54,65.1298,66.35,65.9035,16995300.0,0.0,1.0,19.15,19.31,17.54,17.66,17.66,0.0,0.0,1.0,2653.6201,2690.4399,2648.3401,2681.05,2681.05,3857810000.0,0.0,1.0,136.514,136.3951,0.6133,-0.0218,0.635,37.892,32.9435,52.0288,11.2168,22.0824,136.514,141.4329,131.5951,595735700.0,2759656000.0,205.073,54.48,4.3,126.5041,2.98,2.9
2019-01-31,134.39,135.73,133.3,135.01,134.3739,20095700.0,0.0,66.01,66.64,65.8,66.28,65.8339,18250900.0,0.0,1.0,17.39,17.72,16.54,16.57,16.57,0.0,0.0,1.0,2685.49,2708.95,2678.6499,2704.1001,2704.1001,4917650000.0,0.0,1.0,136.6185,136.2631,0.4226,-0.17,0.5926,34.0176,33.8523,48.2528,11.1833,-79.8969,136.6185,141.3111,131.9259,603922900.0,2739561000.0,204.708,47.93,3.84,126.5041,2.91,2.83


In [20]:
# merge the df and df_fund data together

df_new = pd.merge(df, df_fund, on='date')

df_new.head()


date,V_open_x,V_high_x,V_low_x,V_close_x,V_adjusted_close_x,V_volume_x,V_dividend_amount_x,XLK_open,XLK_high,XLK_low,XLK_close,XLK_adjusted_close,XLK_volume,XLK_dividend_amount,XLK_split_coefficient,vix_open,vix_high,vix_low,vix_close,vix_adjusted_close,vix_volume,vix_dividend_amount,vix_split_coefficient,SPX_open,SPX_high,SPX_low,SPX_close,SPX_adjusted_close,SPX_volume,SPX_dividend_amount,SPX_split_coefficient,V_SMA,V_EMA,V_MACD,V_MACD_Hist,V_MACD_Signal,V_SlowD,V_SlowK,V_RSI,V_ADX,V_CCI,V_Real Middle Band,V_Real Upper Band,V_Real Lower Band,V_Chaikin A/D,V_OBV,housing_index,conf_index,conf_index_SE,trade_value,10 Yrs Rates,20-Yr Maturity Rate,V_open_y,V_high_y,V_low_y,V_close_y,V_adjusted_close_y,V_volume_y,V_dividend_amount_y,V_operatingrevenue,V_totalrevenue,V_netincome,V_totaloperatingexpenses,V_totalgrossprofit,V_totaloperatingincome,V_totalpretaxincome,V_weightedavebasicdilutedsharesos,V_cashdividendspershare,V_totalcostofrevenue
2008-05-14,83.2949,84.35,81.45,82.23,16.3001,51400400.0,0.0,25.02,25.31,24.94,24.95,20.7378,4226000.0,0.0,1.0,17.98,17.98,16.74,17.66,17.66,0.0,0.0,1.0,1405.65,1420.1899,1405.65,1408.66,1408.66,3979370000.0,0.0,1.0,79.032,79.3058,4.8159,-0.3851,5.2011,35.8379,34.7584,65.1019,43.8072,40.6145,79.032,92.6461,65.4179,-319488300.0,1489803000.0,167.328,63.53,2.61,96.2716,4.49,4.63,83.2949,84.35,81.45,82.23,16.3001,51400400.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0
2008-05-15,82.53,82.93,81.11,82.25,16.304,36688800.0,0.0,24.97,25.48,24.94,25.43,21.1368,2793000.0,0.0,1.0,17.65,17.84,16.25,16.3,16.3,0.0,0.0,1.0,1408.36,1424.4,1406.87,1423.5699,1423.5699,3836480000.0,0.0,1.0,79.7925,79.5862,4.4788,-0.5779,5.0566,35.1165,40.1106,65.1223,42.0522,28.9602,79.7925,92.2961,67.2889,-310215300.0,1526492000.0,167.328,63.53,2.61,96.2716,4.41,4.55,82.53,82.93,81.11,82.25,16.304,36688800.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0
2008-05-16,82.71,82.85,81.26,82.37,16.3278,34066800.0,0.0,25.5,25.5,25.15,25.37,21.0869,3631700.0,0.0,1.0,16.3,17.92,16.3,16.47,16.47,0.0,0.0,1.0,1423.89,1425.8199,1414.35,1425.35,1425.35,3842590000.0,0.0,1.0,80.461,79.8513,4.1731,-0.7068,4.8799,41.7123,50.2678,65.2512,40.3849,24.491,80.461,91.9755,68.9465,-296717100.0,1560558000.0,167.328,63.53,2.61,96.2716,4.44,4.57,82.71,82.85,81.26,82.37,16.3278,34066800.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0
2008-05-19,82.27,84.74,82.15,83.32,16.5161,45473200.0,0.0,25.482,25.69,25.26,25.5,21.195,4212200.0,0.0,1.0,16.47,17.89,15.82,17.01,17.01,0.0,0.0,1.0,1425.28,1440.24,1421.63,1426.63,1426.63,3683970000.0,0.0,1.0,81.0785,80.1817,3.9619,-0.7344,4.6963,50.0598,59.8011,66.2894,39.0943,36.9569,81.0785,91.7873,70.3697,-301106400.0,1606032000.0,167.328,63.53,2.61,96.2716,4.42,4.55,82.27,84.74,82.15,83.32,16.5161,45473200.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0
2008-05-20,82.88,83.84,81.91,82.74,16.4012,40702000.0,0.0,25.25,25.25,24.84,25.1,20.8625,3115200.0,0.0,1.0,17.02,18.42,17.02,17.58,17.58,0.0,0.0,1.0,1424.49,1424.49,1409.09,1413.4,1413.4,3854320000.0,0.0,1.0,81.7015,80.4253,3.705,-0.7931,4.4981,56.2321,58.6274,65.0406,37.8194,22.8959,81.7015,91.2071,72.1959,-306800500.0,1565330000.0,167.328,63.53,2.61,96.2716,4.38,4.52,82.88,83.84,81.91,82.74,16.4012,40702000.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0


Finally, the new dataframe has all the columns and indicators that we were initially planning on using. Let's move forward from here.
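
The merge above keeps both copies of the shared price columns, which is where the `_x` and `_y` suffixes come from. One way to avoid the duplicates, sketched here on two tiny hypothetical frames (`prices` and `fundamentals` hold a few values copied from the outputs above, not the full data), is to join only the columns that the first frame does not already have.

```python
import pandas as pd

dates = pd.Index(pd.to_datetime(["2008-05-14", "2008-05-15", "2008-05-16"]), name="date")

# Toy stand-ins for df and df_fund: both carry the same V_close column
prices = pd.DataFrame(
    {"V_close": [82.23, 82.25, 82.37], "V_SMA": [79.032, 79.7925, 80.461]},
    index=dates,
)
fundamentals = pd.DataFrame(
    {"V_close": [82.23, 82.25, 82.37], "V_netincome": [422e6, 422e6, 422e6]},
    index=dates,
)

# Keep only the columns that are new, then join on the shared date index
new_cols = fundamentals.columns.difference(prices.columns)
merged = prices.join(fundamentals[new_cols], how="inner")

print(merged.columns.tolist())
```

With this approach each price column appears once and keeps its original name, so later cells would not need to refer to `_x` variants.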

In [21]:
print('There are {} days in the dataset.'.format(df_new.shape[0]))

There are 2698 days in the dataset.


### Target & Split

For now, the target is the adjusted close shifted by one day, i.e. the next trading day's adjusted close.

We will then split the dataframe into X and y, and further into training and testing sets.

In [22]:
#df['percent_change'] = ((df['5. adjusted close_x'].shift(-1) - df['TSLA close']) / (df['TSLA close']))*100


In [23]:
#df['percent_change'].head()

In [29]:
# Target: the next trading day's adjusted close
# (a percent-change target relative to today's close was considered and left out)
df_new['target'] = df_new['V_adjusted_close_x'].shift(-1)


In [30]:
df_new['target'].head()

date
2008-05-14    16.3040
2008-05-15    16.3278
2008-05-16    16.5161
2008-05-19    16.4012
2008-05-20    16.0265
Name: target, dtype: float64
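
A minimal sketch of the target-and-split step on a toy frame. The helper `make_target_and_split`, the 20% test fraction, and the last date in the toy index are assumptions for illustration; the prices are the first adjusted closes shown above. The split is chronological, since shuffling a time series would leak future rows into training (`train_test_split` from scikit-learn shuffles by default unless `shuffle=False` is passed).

```python
import pandas as pd

def make_target_and_split(frame, price_col, test_fraction=0.2):
    """Add a next-day target, drop the final row (it has no next day),
    and split chronologically into train and test sets."""
    data = frame.copy()
    data["target"] = data[price_col].shift(-1)
    data = data.dropna(subset=["target"])
    n_test = max(1, int(round(len(data) * test_fraction)))
    split_at = len(data) - n_test
    X, y = data.drop(columns="target"), data["target"]
    return X.iloc[:split_at], X.iloc[split_at:], y.iloc[:split_at], y.iloc[split_at:]

toy = pd.DataFrame(
    {"V_adjusted_close_x": [16.3001, 16.3040, 16.3278, 16.5161, 16.4012, 16.0265]},
    index=pd.to_datetime(
        ["2008-05-14", "2008-05-15", "2008-05-16",
         "2008-05-19", "2008-05-20", "2008-05-21"]  # final date assumed for illustration
    ),
)

X_train, X_test, y_train, y_test = make_target_and_split(toy, "V_adjusted_close_x")
print(len(X_train), len(X_test))
```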

In [31]:
df_new.columns

Index(['V_open_x', 'V_high_x', 'V_low_x', 'V_close_x', 'V_adjusted_close_x',
       'V_volume_x', 'V_dividend_amount_x', 'XLK_open', 'XLK_high', 'XLK_low',
       'XLK_close', 'XLK_adjusted_close', 'XLK_volume', 'XLK_dividend_amount',
       'XLK_split_coefficient', 'vix_open', 'vix_high', 'vix_low', 'vix_close',
       'vix_adjusted_close', 'vix_volume', 'vix_dividend_amount',
       'vix_split_coefficient', 'SPX_open', 'SPX_high', 'SPX_low', 'SPX_close',
       'SPX_adjusted_close', 'SPX_volume', 'SPX_dividend_amount',
       'SPX_split_coefficient', 'V_SMA', 'V_EMA', 'V_MACD', 'V_MACD_Hist',
       'V_MACD_Signal', 'V_SlowD', 'V_SlowK', 'V_RSI', 'V_ADX', 'V_CCI',
       'V_Real Middle Band', 'V_Real Upper Band', 'V_Real Lower Band',
       'V_Chaikin A/D', 'V_OBV', 'housing_index', 'conf_index',
       'conf_index_SE', 'trade_value', '10 Yrs Rates', '20-Yr Maturity Rate',
       'V_open_y', 'V_high_y', 'V_low_y', 'V_close_y', 'V_adjusted_close_y

In [32]:
df_new.head()

date,V_open_x,V_high_x,V_low_x,V_close_x,V_adjusted_close_x,V_volume_x,V_dividend_amount_x,XLK_open,XLK_high,XLK_low,XLK_close,XLK_adjusted_close,XLK_volume,XLK_dividend_amount,XLK_split_coefficient,vix_open,vix_high,vix_low,vix_close,vix_adjusted_close,vix_volume,vix_dividend_amount,vix_split_coefficient,SPX_open,SPX_high,SPX_low,SPX_close,SPX_adjusted_close,SPX_volume,SPX_dividend_amount,SPX_split_coefficient,V_SMA,V_EMA,V_MACD,V_MACD_Hist,V_MACD_Signal,V_SlowD,V_SlowK,V_RSI,V_ADX,V_CCI,V_Real Middle Band,V_Real Upper Band,V_Real Lower Band,V_Chaikin A/D,V_OBV,housing_index,conf_index,conf_index_SE,trade_value,10 Yrs Rates,20-Yr Maturity Rate,V_open_y,V_high_y,V_low_y,V_close_y,V_adjusted_close_y,V_volume_y,V_dividend_amount_y,V_operatingrevenue,V_totalrevenue,V_netincome,V_totaloperatingexpenses,V_totalgrossprofit,V_totaloperatingincome,V_totalpretaxincome,V_weightedavebasicdilutedsharesos,V_cashdividendspershare,V_totalcostofrevenue,new,target
2008-05-14,83.2949,84.35,81.45,82.23,16.3001,51400400.0,0.0,25.02,25.31,24.94,24.95,20.7378,4226000.0,0.0,1.0,17.98,17.98,16.74,17.66,17.66,0.0,0.0,1.0,1405.65,1420.1899,1405.65,1408.66,1408.66,3979370000.0,0.0,1.0,79.032,79.3058,4.8159,-0.3851,5.2011,35.8379,34.7584,65.1019,43.8072,40.6145,79.032,92.6461,65.4179,-319488300.0,1489803000.0,167.328,63.53,2.61,96.2716,4.49,4.63,83.2949,84.35,81.45,82.23,16.3001,51400400.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.304,16.304
2008-05-15,82.53,82.93,81.11,82.25,16.304,36688800.0,0.0,24.97,25.48,24.94,25.43,21.1368,2793000.0,0.0,1.0,17.65,17.84,16.25,16.3,16.3,0.0,0.0,1.0,1408.36,1424.4,1406.87,1423.5699,1423.5699,3836480000.0,0.0,1.0,79.7925,79.5862,4.4788,-0.5779,5.0566,35.1165,40.1106,65.1223,42.0522,28.9602,79.7925,92.2961,67.2889,-310215300.0,1526492000.0,167.328,63.53,2.61,96.2716,4.41,4.55,82.53,82.93,81.11,82.25,16.304,36688800.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.3278,16.3278
2008-05-16,82.71,82.85,81.26,82.37,16.3278,34066800.0,0.0,25.5,25.5,25.15,25.37,21.0869,3631700.0,0.0,1.0,16.3,17.92,16.3,16.47,16.47,0.0,0.0,1.0,1423.89,1425.8199,1414.35,1425.35,1425.35,3842590000.0,0.0,1.0,80.461,79.8513,4.1731,-0.7068,4.8799,41.7123,50.2678,65.2512,40.3849,24.491,80.461,91.9755,68.9465,-296717100.0,1560558000.0,167.328,63.53,2.61,96.2716,4.44,4.57,82.71,82.85,81.26,82.37,16.3278,34066800.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.5161,16.5161
2008-05-19,82.27,84.74,82.15,83.32,16.5161,45473200.0,0.0,25.482,25.69,25.26,25.5,21.195,4212200.0,0.0,1.0,16.47,17.89,15.82,17.01,17.01,0.0,0.0,1.0,1425.28,1440.24,1421.63,1426.63,1426.63,3683970000.0,0.0,1.0,81.0785,80.1817,3.9619,-0.7344,4.6963,50.0598,59.8011,66.2894,39.0943,36.9569,81.0785,91.7873,70.3697,-301106400.0,1606032000.0,167.328,63.53,2.61,96.2716,4.42,4.55,82.27,84.74,82.15,83.32,16.5161,45473200.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.4012,16.4012
2008-05-20,82.88,83.84,81.91,82.74,16.4012,40702000.0,0.0,25.25,25.25,24.84,25.1,20.8625,3115200.0,0.0,1.0,17.02,18.42,17.02,17.58,17.58,0.0,0.0,1.0,1424.49,1424.49,1409.09,1413.4,1413.4,3854320000.0,0.0,1.0,81.7015,80.4253,3.705,-0.7931,4.4981,56.2321,58.6274,65.0406,37.8194,22.8959,81.7015,91.2071,72.1959,-306800500.0,1565330000.0,167.328,63.53,2.61,96.2716,4.38,4.52,82.88,83.84,81.91,82.74,16.4012,40702000.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.0265,16.0265


df.columns

In [33]:
df_new.columns = ['VISA open', 'VISA high', 'VISA low', 'VISA close',
       'VISA adj_close', 'VISA_vol', 'VISA_div',
       'VISA_coef', 'XLK open', 'XLK high', 'XLK low',
       'XLK close', 'XLK adj_close', 'XLK_vol',
       'XLK_div', 'XLK_coef', 'vix open',
       'vix high', 'vix low', 'vix close', 'vix adj_close',
       'vix_vol', 'vix_div', 'vix_coef',
       'SPX open', 'SPX high', 'SPX low', 'SPX close', 'SPX adj_close',
       'SPX vol', 'SPX_div', 'SPX_coef',
       'VISA_SMA', 'VISA_EMA', 'VISA_MACD_Signal', 'VISA_MACD_Hist',
       'VISA_MACD', 'VISA_SlowD', 'VISA_SlowK', 'VISA_RSI', 'VISA_ADX',
       'VISA_CCI', 'VISA_Real Middle Band', 'VISA_Real Upper Band',
       'VISA_Real Lower Band', 'VISA_Chaikin A/D', 'VISA_OBV', 'housing_index',
       'conf_index', 'conf_index_SE', 'trade_value', '10 Yrs Rates',
       '20-Yr Maturity Rate', 'VISA open_y', 'VISA high_y', 'VISA low_y',
       'VISA close_y', '5. adjusted close', '6. volume', '7. dividend amount',
       '8. split coefficient', 'VISA_operatingrevenue', 'VISA_totalrevenue',
       'VISA_netincome', 'VISA_totaloperatingexpenses',
       'VISA_totalgrossprofit', 'VISA_totaloperatingincome', "totalpretaxincome", "weightedavebasicdilutedsharesos", "cashdividendspershare", "totalcostofrevenue"]
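
Assigning a full list to `df_new.columns` is positional, so it only labels correctly when the list matches the existing column order exactly. The list above includes a `'VISA_coef'` entry although the merged frame appears to have no split-coefficient column for V, which would shift every later label by one. A more defensive alternative, sketched here on a one-row hypothetical frame with an illustrative subset of names, is to rename through an explicit old-to-new mapping.

```python
import pandas as pd

# One-row stand-in for df_new with a handful of its columns (hypothetical subset)
frame = pd.DataFrame(
    [[82.23, 16.3001, 51400400.0, 16.3040]],
    columns=["V_close_x", "V_adjusted_close_x", "V_volume_x", "target"],
)

# Explicit mapping: each new label is tied to the column it replaces
rename_map = {
    "V_close_x": "VISA close",
    "V_adjusted_close_x": "VISA adj_close",
    "V_volume_x": "VISA_vol",
}

# Fail loudly if a mapped column is missing instead of mislabeling silently
missing = set(rename_map) - set(frame.columns)
if missing:
    raise KeyError(f"columns not found: {sorted(missing)}")

renamed = frame.rename(columns=rename_map)
print(renamed.columns.tolist())
```

Columns that are not in the mapping, such as `target`, keep their names, so the result does not depend on the column order.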

In [34]:
df_new.head()

date,VISA open,VISA high,VISA low,VISA close,VISA adj_close,VISA_vol,VISA_div,VISA_coef,XLK open,XLK high,XLK low,XLK close,XLK adj_close,XLK_vol,XLK_div,XLK_coef,vix open,vix high,vix low,vix close,vix adj_close,vix_vol,vix_div,vix_coef,SPX open,SPX high,SPX low,SPX close,SPX adj_close,SPX vol,SPX_div,SPX_coef,VISA_SMA,VISA_EMA,VISA_MACD_Signal,VISA_MACD_Hist,VISA_MACD,VISA_SlowD,VISA_SlowK,VISA_RSI,VISA_ADX,VISA_CCI,VISA_Real Middle Band,VISA_Real Upper Band,VISA_Real Lower Band,VISA_Chaikin A/D,VISA_OBV,housing_index,conf_index,conf_index_SE,trade_value,10 Yrs Rates,20-Yr Maturity Rate,VISA open_y,VISA high_y,VISA low_y,VISA close_y,5. adjusted close,6. volume,7. dividend amount,8. split coefficient,VISA_operatingrevenue,VISA_totalrevenue,VISA_netincome,VISA_totaloperatingexpenses,VISA_totalgrossprofit,VISA_totaloperatingincome,totalpretaxincome,weightedavebasicdilutedsharesos,cashdividendspershare,totalcostofrevenue
2008-05-14,83.2949,84.35,81.45,82.23,16.3001,51400400.0,0.0,25.02,25.31,24.94,24.95,20.7378,4226000.0,0.0,1.0,17.98,17.98,16.74,17.66,17.66,0.0,0.0,1.0,1405.65,1420.1899,1405.65,1408.66,1408.66,3979370000.0,0.0,1.0,79.032,79.3058,4.8159,-0.3851,5.2011,35.8379,34.7584,65.1019,43.8072,40.6145,79.032,92.6461,65.4179,-319488300.0,1489803000.0,167.328,63.53,2.61,96.2716,4.49,4.63,83.2949,84.35,81.45,82.23,16.3001,51400400.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.304,16.304
2008-05-15,82.53,82.93,81.11,82.25,16.304,36688800.0,0.0,24.97,25.48,24.94,25.43,21.1368,2793000.0,0.0,1.0,17.65,17.84,16.25,16.3,16.3,0.0,0.0,1.0,1408.36,1424.4,1406.87,1423.5699,1423.5699,3836480000.0,0.0,1.0,79.7925,79.5862,4.4788,-0.5779,5.0566,35.1165,40.1106,65.1223,42.0522,28.9602,79.7925,92.2961,67.2889,-310215300.0,1526492000.0,167.328,63.53,2.61,96.2716,4.41,4.55,82.53,82.93,81.11,82.25,16.304,36688800.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.3278,16.3278
2008-05-16,82.71,82.85,81.26,82.37,16.3278,34066800.0,0.0,25.5,25.5,25.15,25.37,21.0869,3631700.0,0.0,1.0,16.3,17.92,16.3,16.47,16.47,0.0,0.0,1.0,1423.89,1425.8199,1414.35,1425.35,1425.35,3842590000.0,0.0,1.0,80.461,79.8513,4.1731,-0.7068,4.8799,41.7123,50.2678,65.2512,40.3849,24.491,80.461,91.9755,68.9465,-296717100.0,1560558000.0,167.328,63.53,2.61,96.2716,4.44,4.57,82.71,82.85,81.26,82.37,16.3278,34066800.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.5161,16.5161
2008-05-19,82.27,84.74,82.15,83.32,16.5161,45473200.0,0.0,25.482,25.69,25.26,25.5,21.195,4212200.0,0.0,1.0,16.47,17.89,15.82,17.01,17.01,0.0,0.0,1.0,1425.28,1440.24,1421.63,1426.63,1426.63,3683970000.0,0.0,1.0,81.0785,80.1817,3.9619,-0.7344,4.6963,50.0598,59.8011,66.2894,39.0943,36.9569,81.0785,91.7873,70.3697,-301106400.0,1606032000.0,167.328,63.53,2.61,96.2716,4.42,4.55,82.27,84.74,82.15,83.32,16.5161,45473200.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.4012,16.4012
2008-05-20,82.88,83.84,81.91,82.74,16.4012,40702000.0,0.0,25.25,25.25,24.84,25.1,20.8625,3115200.0,0.0,1.0,17.02,18.42,17.02,17.58,17.58,0.0,0.0,1.0,1424.49,1424.49,1409.09,1413.4,1413.4,3854320000.0,0.0,1.0,81.7015,80.4253,3.705,-0.7931,4.4981,56.2321,58.6274,65.0406,37.8194,22.8959,81.7015,91.2071,72.1959,-306800500.0,1565330000.0,167.328,63.53,2.61,96.2716,4.38,4.52,82.88,83.84,81.91,82.74,16.4012,40702000.0,0.0,0.0,1613000000.0,422000000.0,0.0,0.0,0.0,714000000.0,3516700000.0,0.0,0.0,16.0265,16.0265


In [35]:
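# target: the next trading day's adjusted close (the last row has no next day, so its target is NaN)
df_new['target'] = df_new['VISA adj_close'].shift(-1)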

In [36]:
X = df_new.drop(columns=['target', 'VISA close', 'VISA adj_close', 'VISA open', 'VISA high', 'VISA low'])
y = df_new[['target']].values

In [37]:
X.shape, y.shape

((2698, 66), (2698, 1))
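
Because `shift(-1)` pulls each next-day value up one row, the final row has no following day and its `target` comes out as NaN. The sketch below reproduces that alignment with plain NumPy on a made-up price series and shows one possible way to drop the unlabeled last row before splitting. The array `prices` and the helper `next_step_target` are illustrative names only, not part of this notebook.

```python
import numpy as np

def next_step_target(values):
    """Return the values shifted up by one step, like Series.shift(-1)."""
    values = np.asarray(values, dtype=float)
    target = np.empty_like(values)
    target[:-1] = values[1:]   # row t receives the value from row t + 1
    target[-1] = np.nan        # the last row has no next day
    return target

prices = np.array([16.30, 16.33, 16.52, 16.40, 16.03])  # made-up values
target = next_step_target(prices)

# keep only the rows that actually have a label
mask = ~np.isnan(target)
features, labels = prices[mask], target[mask]
print(features.shape, labels.shape)
```

With a DataFrame the same effect would usually come from dropping the rows where `target` is missing before building `X` and `y`.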

In [38]:
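def prep_data(train_cut, val_cut, X=X, y=y):
    """Split chronologically into train/test/val sets and min-max scale the features."""

    # shuffle=False keeps the rows in time order
    X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                        train_size=train_cut, 
                                                        shuffle=False)

    # val_cut is the share of the held-out rows kept as the test set;
    # the remainder becomes the validation set
    X_test, X_val, y_test, y_val = train_test_split(X_test, y_test,
                                                    train_size=val_cut,
                                                    shuffle=False)

    # fit the scaler on the training rows only, so that nothing from the
    # test/validation period leaks into the scaling
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    X_val = scaler.transform(X_val)

    return X_train, X_test, X_val, y_train, y_test, y_val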

In [39]:
X_train, X_test, X_val, y_train, y_test, y_val = prep_data(train_cut=0.8, val_cut=0.6)

In [40]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape, X_val.shape, y_val.shape

((2158, 66), (324, 66), (2158, 1), (324, 1), (216, 66), (216, 1))
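
The shapes above follow from two chronological cuts: 80% of the rows go to training, and 60% of what is left goes to the test set. The sketch below recomputes those row counts and shows min-max scaling that takes its minimum and maximum from the training rows only. It uses plain NumPy rather than scikit-learn, and `chronological_split` and `minmax_from_train` are hypothetical helper names introduced for illustration.

```python
import numpy as np

def chronological_split(n_rows, train_cut, val_cut):
    """Row counts for train / test / val when splitting in time order."""
    n_train = int(n_rows * train_cut)   # first block of rows
    n_rest = n_rows - n_train           # held-out rows
    n_test = int(n_rest * val_cut)      # first part of the held-out rows
    n_val = n_rest - n_test             # whatever remains
    return n_train, n_test, n_val

def minmax_from_train(train, other):
    """Scale both arrays with the min and max of the training rows only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (train - lo) / span, (other - lo) / span

n_train, n_test, n_val = chronological_split(2698, 0.8, 0.6)
print(n_train, n_test, n_val)

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3)).cumsum(axis=0)   # made-up trending series
train_s, test_s = minmax_from_train(data[:80], data[80:])
print(train_s.min(), train_s.max())
```

With this kind of scaling, later rows that move outside the training range land outside [0, 1], which is expected for a trending price series.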

##### Time Series Generator

In [41]:
train_data_generator = TimeseriesGenerator(X_train, y_train, 
                                           length=15,
                                           sampling_rate=1, 
                                           stride=1, 
                                           batch_size=6)

test_data_generator = TimeseriesGenerator(X_test, y_test, 
                                          length=15, 
                                          sampling_rate=1,
                                          stride=1,
                                          batch_size=6)

val_data_generator = TimeseriesGenerator(X_val, y_val, 
                                         length=15, 
                                         sampling_rate=1,
                                         stride=1,
                                         batch_size=6)
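
To make the windowing concrete, here is a minimal NumPy sketch of how a generator with `length=15`, `sampling_rate=1` and `stride=1` pairs inputs with targets, as far as I understand its indexing: each sample is 15 consecutive rows, and its label is the target at the row index right after the window. `make_windows` and the stand-in arrays are assumptions for illustration, not the library's code.

```python
import numpy as np

def make_windows(data, targets, length):
    """Pair each run of `length` consecutive rows with the target at the next index."""
    data, targets = np.asarray(data), np.asarray(targets)
    rows = range(length, len(data))
    samples = np.stack([data[i - length:i] for i in rows])
    labels = np.stack([targets[i] for i in rows])
    return samples, labels

# stand-in arrays with the same shapes as X_test / y_test above
X_demo = np.zeros((324, 66))
y_demo = np.arange(324, dtype=float).reshape(-1, 1)

samples, labels = make_windows(X_demo, y_demo, length=15)
n_batches = -(-len(samples) // 6)   # ceiling division for batch_size=6
print(samples.shape, labels.shape, n_batches)
```

Under this indexing, a split with `n` rows yields `n - 15` samples. Since `target` already holds the next day's close, the label at the index after the window sits two trading days after the window's last input row, so it may be worth checking whether that extra step is intended.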

### Model 

In [42]:
# imports

from keras.utils import plot_model
from keras.models import Model, Sequential, load_model
from keras.layers import Input, Dense, LSTM, Dropout
from keras.callbacks import EarlyStopping, ModelCheckpoint
import h5py


In [47]:
model = Sequential()
model.add(LSTM(256, activation='relu', 
               return_sequences=True, 
               input_shape=(train_data_generator.length, 
                            X_train.shape[1])))
model.add(Dropout(0.2))
model.add(LSTM(144, return_sequences=True,
               activation='relu'))
model.add(Dropout(0.2))
model.add(LSTM(72, return_sequences=True,
               activation='relu'))
model.add(Dropout(0.2))
model.add(LSTM(64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1))
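
Since this notebook is about avoiding over- and underfitting, it helps to know how large the stacked model is relative to the data. The sketch below estimates the trainable weights per layer using the standard LSTM count (four gates, each with input weights, recurrent weights and a bias). The helper names are illustrative, and `model.summary()` remains the authoritative source for the real numbers.

```python
def lstm_params(units, input_dim):
    """Weights in one LSTM layer: 4 gates with input, recurrent and bias terms."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weights in a fully connected layer: one weight per input plus a bias, per unit."""
    return units * input_dim + units

n_features = 66                  # columns in X_train
layer_units = [256, 144, 72, 64]  # LSTM sizes from the model above

counts = []
input_dim = n_features
for units in layer_units:
    counts.append(lstm_params(units, input_dim))
    input_dim = units            # the next layer reads this layer's output
counts.append(dense_params(1, input_dim))   # final Dense(1); Dropout adds no weights

total = sum(counts)
print(counts, total)
```

By this estimate the model has several hundred thousand weights, while the training generator supplies roughly two thousand windows, so the dropout rates and layer sizes are natural hyperparameters to vary.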


In [48]:
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=100)
mc = ModelCheckpoint('update-weights-{epoch:02d}-{val_loss:.2f}.hdf5',
                      monitor='val_loss',
                      mode='min',
                      verbose=1,
                      save_best_only=True)
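
As a rough illustration of what the two callbacks do, the sketch below simulates patience-based stopping on a made-up list of validation losses and formats a checkpoint name with the same template. `early_stopping_epoch` is a hypothetical helper that approximates the idea; it is not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0      # improvement: reset the counter
        else:
            wait += 1                 # one more epoch without improvement
            if wait >= patience:
                return epoch
    return None

losses = [266.7, 300.0, 50.8, 60.0, 55.0, 70.0, 65.0]   # made-up values
stop_short = early_stopping_epoch(losses, patience=3)
stop_long = early_stopping_epoch(losses, patience=100)
print(stop_short, stop_long)

# the checkpoint template fills in the epoch and the rounded validation loss
name = 'update-weights-{epoch:02d}-{val_loss:.2f}.hdf5'.format(epoch=3, val_loss=50.76862)
print(name)
```

With `patience=100` and 200 epochs, stopping can only trigger after 100 consecutive epochs without a new best `val_loss`, so a smaller patience may be worth trying during exploration.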


In [49]:

model.compile(optimizer='adam', loss='mean_squared_error')


In [50]:
# note: the test generator is used as the validation data during training
history = model.fit_generator(train_data_generator,
                              epochs=200,
                              validation_data=test_data_generator,
                              verbose=1,
                              callbacks=[es, mc])


Epoch 1/200

Epoch 00001: val_loss improved from inf to 266.69756, saving model to update-weights-01-266.70.hdf5
Epoch 2/200

Epoch 00002: val_loss did not improve from 266.69756
Epoch 3/200

Epoch 00003: val_loss improved from 266.69756 to 50.76862, saving model to update-weights-03-50.77.hdf5
Epoch 4/200

Epoch 00004: val_loss did not improve from 50.76862
Epoch 5/200

Epoch 00005: val_loss did not improve from 50.76862
Epoch 6/200

Epoch 00006: val_loss did not improve from 50.76862
Epoch 7/200

Epoch 00007: val_loss did not improve from 50.76862
Epoch 8/200

Epoch 00008: val_loss did not improve from 50.76862
Epoch 9/200

Epoch 00009: val_loss did not improve from 50.76862
Epoch 10/200

Epoch 00010: val_loss did not improve from 50.76862
Epoch 11/200

Epoch 00011: val_loss did not improve from 50.76862
Epoch 12/200

Epoch 00012: val_loss did not improve from 50.76862
Epoch 13/200

Epoch 00013: val_loss did not improve from 50.76862
Epoch 14/200

Epoch 00014: val_loss did not improv

In [52]:
import pickle

In [53]:
# pickle the model:
with open('model_tesla.pickle', 'wb') as mod_f:
    pickle.dump(model, mod_f)
    
    

In [55]:
with open('model_tesla.pickle', 'rb') as mod_fr:
    model = pickle.load(mod_fr)
    

In [57]:
model.load_weights('model_tesla.pickle')

OSError: Unable to open file (file signature not found)
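
The error above is consistent with `load_weights` expecting an HDF5 file, whereas `model_tesla.pickle` was written by `pickle.dump` and so begins with different bytes. The sketch below checks whether a file starts with the 8-byte HDF5 signature, using only the standard library. `looks_like_hdf5` is a hypothetical helper, and it only inspects offset 0, which is where the signature normally sits.

```python
import os
import pickle
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"   # first 8 bytes of an HDF5 file

def looks_like_hdf5(path):
    """True if the file begins with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(8) == HDF5_SIGNATURE

# write a small pickle to a temporary file and inspect it
with tempfile.TemporaryDirectory() as tmp:
    pickle_path = os.path.join(tmp, "demo.pickle")
    with open(pickle_path, "wb") as fh:
        pickle.dump({"weights": [1, 2, 3]}, fh)
    is_hdf5 = looks_like_hdf5(pickle_path)

print(is_hdf5)
```

The `.hdf5` files written by the `ModelCheckpoint` callback are the kind of file such a check would accept, and `load_model` (imported earlier) is the usual way to read one of those back.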

In [None]:
# plot the training
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()


In [None]:
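plt.figure(figsize=(20,12))
y_pred = model.predict_generator(test_data_generator)

# the generator yields no prediction for the first `length` rows,
# so left-pad with NaN to line the predictions up with y_test
pad = np.full((test_data_generator.length, 1), np.nan)
y_pred = np.vstack((pad, y_pred))

plt.plot(y_pred, label='predicted')
plt.plot(y_test, label='actual')
plt.legend();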


## Shapley

In [None]:
from shap import DeepExplainer, summary_plot

In [None]:
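def extract_data(generator):
    """Stack every batch of a generator into one feature array and one target array."""
    features, targets = [], []
    for i in range(len(generator)):
        batch_x, batch_y = generator[i]
        features.append(batch_x)
        targets.append(batch_y)

    return np.vstack(features), np.vstack(targets)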

In [None]:
def individual_array(sample):
    """Average one sample's SHAP values over its time steps: one value per feature."""
    
    # sample has shape (time steps, features); take the mean of each feature column
    for i in np.arange(len(sample[0])):
        means = []
        for j in np.arange(len(sample)):
            means.append(sample[j][i])
        s_value = np.array(np.mean(means))

        if i == 0:
            final_array = s_value
        else:
            final_array = np.hstack((final_array, s_value))
            
    return final_array

In [None]:
def flatten_shap_values(shap_vals):
    
    #Pull the array out of the list. 
    sv = shap_vals[0]
    
    count = 0
    for sample in sv: 
        sample_array = individual_array(sample)
        if count == 0:
            final_array = sample_array
        else:
            final_array = np.vstack((final_array, sample_array))
        
        count +=1
        
    return final_array
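
The two helpers above reduce each sample's SHAP values from (time steps, features) to one mean per feature. Assuming `shap_vals` is a list whose first element is an array of shape (samples, time steps, features), which is how the code above treats it, the same result can be sketched as a single NumPy reduction. The function names and the random stand-in values below are illustrative only.

```python
import numpy as np

def flatten_shap_values_vectorized(shap_vals):
    """Average over the time axis: (samples, steps, features) -> (samples, features)."""
    return np.mean(shap_vals[0], axis=1)

def flatten_with_loops(shap_vals):
    """Loop version, kept only to check the vectorized result."""
    rows = []
    for sample in shap_vals[0]:
        rows.append([np.mean([step[j] for step in sample])
                     for j in range(len(sample[0]))])
    return np.array(rows)

rng = np.random.default_rng(0)
fake_shap = [rng.normal(size=(5, 15, 4))]   # made-up values, not real SHAP output

fast = flatten_shap_values_vectorized(fake_shap)
slow = flatten_with_loops(fake_shap)
print(fast.shape, np.allclose(fast, slow))
```

Averaging signed values lets positive and negative contributions cancel across time steps; taking the mean of absolute values is a possible alternative if overall magnitude matters more than direction.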

In [None]:
a, b = extract_data(test_data_generator)

In [None]:
a.shape

In [None]:
b.shape

In [None]:
de = DeepExplainer(model = model, data=a)

shaps_vals = de.shap_values(a)

In [None]:
print(shaps_vals[0].shape)
print(shaps_vals[0][0].shape)
print(shaps_vals[0][0][0].shape)


In [None]:
summary_plot([shaps_vals[0][100]], feature_names=df.columns)

In [None]:
individual_array(shaps_vals[0][0])

In [None]:
summary_plot([flatten_shap_values(shaps_vals)],
             feature_names = (df.drop(columns=['target'])
                              .columns
                              .tolist()))


In [None]:
shap_df = pd.DataFrame(data=flatten_shap_values(shaps_vals),columns = df.drop(columns='target').columns.tolist())

In [None]:
shap_df.head()

In [None]:

np.save('shaps_vals', shaps_vals)
np.save('feature_gen_extracted', a)
np.save('target_gen_extracted', b)

In [None]:
import pca_concordance as pcac

In [None]:
pcac.feature_pca_concordance(a, shap_df)


In [None]:
import dill

In [None]:
shaps_vals[0]