In [2]:
import pandas as pd
import quandl

df = quandl.get('WIKI/GOOGL')

print(df.head())

              Open    High     Low    Close      Volume  Ex-Dividend  \
Date                                                                   
2004-08-19  100.01  104.06   95.96  100.335  44659000.0          0.0   
2004-08-20  101.01  109.08  100.50  108.310  22834300.0          0.0   
2004-08-23  110.76  113.48  109.05  109.400  18256100.0          0.0   
2004-08-24  111.24  111.60  103.57  104.870  15247300.0          0.0   
2004-08-25  104.76  108.00  103.88  106.000   9188600.0          0.0   

            Split Ratio  Adj. Open  Adj. High   Adj. Low  Adj. Close  \
Date                                                                   
2004-08-19          1.0  50.159839  52.191109  48.128568   50.322842   
2004-08-20          1.0  50.661387  54.708881  50.405597   54.322689   
2004-08-23          1.0  55.551482  56.915693  54.693835   54.869377   
2004-08-24          1.0  55.792225  55.972783  51.945350   52.597363   
2004-08-25          1.0  52.542193  54.167209  52.100830   53.1

Each column represents a different feature; for example, the 'High', 'Low' and 'Close' prices are all features. It's important to use meaningful features in a project. If you are looking for patterns in equity prices, you don't necessarily need every column above; one or two well-chosen ones may be enough. It therefore becomes necessary to look into the relationships between some of these columns. Features are also known as attributes, inputs, or predictor variables, and they are used to predict a target such as a future stock price (called the label, predicted variable, or output variable).
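As a rough illustration of the feature/label split, the sketch below rebuilds the first five rows of adjusted prices from the notebook's printed output and separates them into features and a label. Using the next day's 'Adj. Close' as the label is an assumption made only for this example, and `FEATURE_COLUMNS` and `next_close` are hypothetical names.

```python
import pandas as pd

# First five rows of adjusted GOOGL prices, taken from the notebook's printed output.
df = pd.DataFrame(
    {
        "Adj. Open": [50.159839, 50.661387, 55.551482, 55.792225, 52.542193],
        "Adj. High": [52.191109, 54.708881, 56.915693, 55.972783, 54.167209],
        "Adj. Low": [48.128568, 50.405597, 54.693835, 51.945350, 52.100830],
        "Adj. Close": [50.322842, 54.322689, 54.869377, 52.597363, 53.164113],
    },
    index=pd.to_datetime(
        ["2004-08-19", "2004-08-20", "2004-08-23", "2004-08-24", "2004-08-25"]
    ),
)

# Hypothetical choice of input columns (features / attributes / predictors).
FEATURE_COLUMNS = ["Adj. Open", "Adj. High", "Adj. Low", "Adj. Close"]

# Assumption for this sketch: the label is the NEXT row's adjusted close.
# shift(-1) moves each value up one row, so the last row has no label and is dropped.
features = df[FEATURE_COLUMNS].iloc[:-1]
label = df["Adj. Close"].shift(-1).iloc[:-1].rename("next_close")

print(features)
print(label)
```

Each row of `features` is paired with the label at the same date, which is the shape most supervised learning routines expect.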

Deep learning and some other algorithms explore the relationships between these attributes on their own, but regression does not, so any relationship you want the model to use has to be supplied as an explicit feature.

In [15]:
import pandas as pd
import quandl

df = quandl.get('WIKI/GOOGL')

df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]

# creating a new attribute by combining features: the percentage gap between
# the adjusted high and the adjusted close (named HL_PCT, but it uses Close, not Low)
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Close']) / df['Adj. Close'] * 100.0

# creating a daily percentage change variable
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

# select the columns of interest into a new DataFrame (df itself is left unchanged)
df_new = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]

print(df.head())

            Adj. Open  Adj. High   Adj. Low  Adj. Close  Adj. Volume  \
Date                                                                   
2004-08-19  50.159839  52.191109  48.128568   50.322842   44659000.0   
2004-08-20  50.661387  54.708881  50.405597   54.322689   22834300.0   
2004-08-23  55.551482  56.915693  54.693835   54.869377   18256100.0   
2004-08-24  55.792225  55.972783  51.945350   52.597363   15247300.0   
2004-08-25  52.542193  54.167209  52.100830   53.164113    9188600.0   

              HL_PCT  PCT_change  
Date                              
2004-08-19  3.712563    0.324968  
2004-08-20  0.710922    7.227007  
2004-08-23  3.729433   -1.227880  
2004-08-24  6.417469   -5.726357  
2004-08-25  1.886792    1.183658  
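
To check the arithmetic behind the two derived columns, the short sketch below recomputes them by hand for the first row (2004-08-19), using the rounded values printed above. Because the inputs are rounded to six decimals, the results should land close to, not exactly on, the table's 3.712563 and 0.324968.

```python
# Rounded first-row values (2004-08-19) copied from the printed output.
adj_open = 50.159839
adj_high = 52.191109
adj_close = 50.322842

# Same formulas as in the cell above, applied to a single row.
hl_pct = (adj_high - adj_close) / adj_close * 100.0
pct_change = (adj_close - adj_open) / adj_open * 100.0

print(f"HL_PCT:     {hl_pct:.6f}")
print(f"PCT_change: {pct_change:.6f}")
```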


It is more useful to examine the relationship between 'High' and 'Low', or between 'Open' and 'Close', than simply to list all the price data for each. Look at whether each pair has a positive or negative relationship and, if so, how closely the two move together.
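
One common way to quantify how two columns move together is the Pearson correlation coefficient, which pandas exposes through `DataFrame.corr()`. The sketch below applies it to the five rows printed earlier; this is only an illustration, since five observations are far too few to draw real conclusions, and the variable name `high_low` is hypothetical.

```python
import pandas as pd

# First five rows of adjusted prices, copied from the printed output above.
prices = pd.DataFrame(
    {
        "Adj. Open": [50.159839, 50.661387, 55.551482, 55.792225, 52.542193],
        "Adj. High": [52.191109, 54.708881, 56.915693, 55.972783, 54.167209],
        "Adj. Low": [48.128568, 50.405597, 54.693835, 51.945350, 52.100830],
        "Adj. Close": [50.322842, 54.322689, 54.869377, 52.597363, 53.164113],
    }
)

# Pairwise Pearson correlation: +1 means the columns rise and fall together,
# -1 means they move in opposite directions, 0 means no linear relationship.
corr = prices.corr()
print(corr.round(3))

# Pull out a single pair of interest.
high_low = corr.loc["Adj. High", "Adj. Low"]
print("High vs Low:", round(high_low, 3))
```

The sign of each entry answers the positive-or-negative question, and its magnitude indicates how closely the pair moves together.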

In [17]:
df.tail()

            Adj. Open  Adj. High  Adj. Low  Adj. Close  Adj. Volume    HL_PCT  PCT_change
Date
2018-03-21    1092.57     1108.7   1087.21      1094.0    1990515.0  1.343693    0.130884
2018-03-22    1080.01    1083.92   1049.64     1053.15    3418154.0  2.921711   -2.487014
2018-03-23    1051.37    1066.78   1024.87     1026.55    2413517.0  3.918952   -2.360729
2018-03-26     1050.6    1059.27   1010.58     1054.09    3272409.0  0.491419    0.332191
2018-03-27     1063.9    1064.54    997.62     1006.94    2940957.0  5.720301   -5.353887
