# **Stock Price Predictor**

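```python
import pandas as pd
import math, datetime, quandl
import numpy as np
from sklearn import preprocessing, svm
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from matplotlib import style
import matplotlib.pyplot as plt
import matplotlib
import warnings
warnings.filterwarnings("ignore")
```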


## Preparing the data

1. Load the [WIKI/GOOGL](https://www.quandl.com) price data from Quandl

2. Feature engineering

3. Handling Null values

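```python
# The WIKI Prices dataset is no longer updated, so the data ends on 2018-03-27
# Depending on your account, Quandl may require an API key: quandl.ApiConfig.api_key = "YOUR_KEY"
df = quandl.get('WIKI/GOOGL')
print(df.shape)
df.head(5)
```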

(3424, 12)


| Date | Open | High | Low | Close | Volume | Ex-Dividend | Split Ratio | Adj. Open | Adj. High | Adj. Low | Adj. Close | Adj. Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2004-08-19 | 100.01 | 104.06 | 95.96 | 100.335 | 44659000.0 | 0.0 | 1.0 | 50.159839 | 52.191109 | 48.128568 | 50.322842 | 44659000.0 |
| 2004-08-20 | 101.01 | 109.08 | 100.5 | 108.31 | 22834300.0 | 0.0 | 1.0 | 50.661387 | 54.708881 | 50.405597 | 54.322689 | 22834300.0 |
| 2004-08-23 | 110.76 | 113.48 | 109.05 | 109.4 | 18256100.0 | 0.0 | 1.0 | 55.551482 | 56.915693 | 54.693835 | 54.869377 | 18256100.0 |
| 2004-08-24 | 111.24 | 111.6 | 103.57 | 104.87 | 15247300.0 | 0.0 | 1.0 | 55.792225 | 55.972783 | 51.94535 | 52.597363 | 15247300.0 |
| 2004-08-25 | 104.76 | 108.0 | 103.88 | 106.0 | 9188600.0 | 0.0 | 1.0 | 52.542193 | 54.167209 | 52.10083 | 53.164113 | 9188600.0 |


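### Feature Engineering
* Keep only Adj. Open, Adj. High, Adj. Low, Adj. Close and Adj. Volume.
* Create a column `HL_pct` that stores the daily high-low range as a percentage of Adj. Low.
* Create a column `pct_change` that stores the daily percentage change from Adj. Open to Adj. Close.
* Remove Adj. High, Adj. Low and Adj. Open (Adj. Close is kept, since it is the column we forecast).
* Create a column `label` that stores Adj. Close shifted `forecast_out` rows into the future, where `forecast_out` is 1% of the number of rows, rounded up.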
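In formula form, the engineered columns described by the bullets above are as follows (all prices are the adjusted columns, $N$ is the number of rows and $h$ is `forecast_out`):

$$
\begin{aligned}
\mathrm{HL\_pct}_t &= \frac{\mathrm{High}_t - \mathrm{Low}_t}{\mathrm{Low}_t} \times 100 \\
\mathrm{pct\_change}_t &= \frac{\mathrm{Close}_t - \mathrm{Open}_t}{\mathrm{Open}_t} \times 100 \\
\mathrm{label}_t &= \mathrm{Close}_{t+h}, \qquad h = \lceil 0.01\,N \rceil
\end{aligned}
$$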

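```python
# Keep only the adjusted columns; .copy() avoids pandas' SettingWithCopyWarning on the assignments below
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']].copy()
# Daily high-low range as a percentage of the low
df['HL_pct'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Low'] * 100
# Daily open-to-close change as a percentage of the open
df['pct_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100
df = df[['Adj. Close', 'HL_pct', 'pct_change', 'Adj. Volume']]
forecast_col = 'Adj. Close'
# Replace any missing feature values with a sentinel so those rows are kept
df = df.fillna(-99999)
```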
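As a quick sanity check of the two percentage features, here is a minimal sketch that rebuilds the first two rows of adjusted prices printed by `df.head(5)` above and recomputes `HL_pct` and `pct_change` by hand. The small hard-coded frame is only a stand-in for the Quandl download; the formulas are the same as in the cell above.

```python
import pandas as pd

# Stand-in for the Quandl frame: first two rows of the adjusted columns shown above
df = pd.DataFrame(
    {
        "Adj. Open": [50.159839, 50.661387],
        "Adj. High": [52.191109, 54.708881],
        "Adj. Low": [48.128568, 50.405597],
        "Adj. Close": [50.322842, 54.322689],
        "Adj. Volume": [44659000.0, 22834300.0],
    },
    index=pd.to_datetime(["2004-08-19", "2004-08-20"]),
)

# Same feature definitions as the notebook cell
df["HL_pct"] = (df["Adj. High"] - df["Adj. Low"]) / df["Adj. Low"] * 100
df["pct_change"] = (df["Adj. Close"] - df["Adj. Open"]) / df["Adj. Open"] * 100

print(df[["HL_pct", "pct_change"]].round(3))
```

On 2004-08-19, for instance, the day's range was about 8.44% of the low and the stock closed about 0.32% above its open.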

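```python
# Forecast horizon: 1% of the number of rows, rounded up
forecast_out = int(math.ceil(0.01 * len(df)))
# label = Adj. Close forecast_out rows later; the last forecast_out rows become NaN
df['label'] = df[forecast_col].shift(-forecast_out)
df.tail(10)
```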

| Date | Adj. Close | HL_pct | pct_change | Adj. Volume | label |
|---|---|---|---|---|---|
| 2018-03-14 | 1148.89 | 1.524051 | 0.269681 | 2033697.0 | NaN |
| 2018-03-15 | 1150.61 | 2.363383 | 0.090469 | 1623868.0 | NaN |
| 2018-03-16 | 1134.42 | 2.249505 | -1.811572 | 2654602.0 | NaN |
| 2018-03-19 | 1100.07 | 2.796349 | -1.58263 | 3076349.0 | NaN |
| 2018-03-20 | 1095.8 | 2.136878 | -0.236708 | 2709310.0 | NaN |
| 2018-03-21 | 1094.0 | 1.976619 | 0.130884 | 1990515.0 | NaN |
| 2018-03-22 | 1053.15 | 3.265882 | -2.487014 | 3418154.0 | NaN |
| 2018-03-23 | 1026.55 | 4.089299 | -2.360729 | 2413517.0 | NaN |
| 2018-03-26 | 1054.09 | 4.818025 | 0.332191 | 3272409.0 | NaN |
| 2018-03-27 | 1006.94 | 6.707965 | -5.353887 | 2940957.0 | NaN |
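The NaN labels at the end follow directly from how `shift` works. A minimal sketch, using the row count printed earlier and a toy five-value series (the series and its horizon of 2 are made up for illustration):

```python
import math
import pandas as pd

# Horizon used in the notebook: 1% of the 3424 rows, rounded up
n_rows = 3424
forecast_out = int(math.ceil(0.01 * n_rows))
print(forecast_out)  # 35

# Toy series: shift(-h) moves each value h rows up, so row i receives row i + h
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
label = close.shift(-2)
print(label.tolist())  # [12.0, 13.0, 14.0, nan, nan]
```

So each row's label is the Adj. Close 35 rows later, and the last 35 rows have no future value to receive.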


### Handling Null values
As we can see, the last `forecast_out` values (the final 1% of rows) of the `label` column are NaN, because there is no future Adj. Close to shift into them. We simply drop these rows, since removing 1% of the data won't do much harm.

```python
# Drop the rows whose label is NaN (the last forecast_out rows)
df.dropna(inplace=True)
df.tail()
```

| Date | Adj. Close | HL_pct | pct_change | Adj. Volume | label |
|---|---|---|---|---|---|
| 2018-01-30 | 1177.37 | 1.142604 | -0.029718 | 1792602.0 | 1094.0 |
| 2018-01-31 | 1182.22 | 1.213207 | -0.134312 | 1643877.0 | 1053.15 |
| 2018-02-01 | 1181.59 | 1.547 | 0.476195 | 2774967.0 | 1026.55 |
| 2018-02-02 | 1119.2 | 1.811604 | -0.729098 | 5798880.0 | 1054.09 |
| 2018-02-05 | 1068.76 | 5.512236 | -2.89385 | 3742469.0 | 1006.94 |
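Because `fillna(-99999)` ran before the `label` column was created, the only NaNs left at this point are the shifted-out labels, so `dropna` removes exactly `forecast_out` rows (3424 - 35 = 3389 here). A small sketch that checks this ordering on a synthetic frame (random data made up for illustration, with one feature value deliberately missing):

```python
import math
import numpy as np
import pandas as pd

# Synthetic stand-in: 250 rows of a random-walk "Adj. Close" plus one feature
rng = np.random.default_rng(0)
df = pd.DataFrame({"Adj. Close": 100 + rng.normal(size=250).cumsum()})
df["HL_pct"] = rng.uniform(0, 5, size=250)
df.loc[df.index[10], "HL_pct"] = np.nan  # one missing feature value

# Same order as the notebook: fill feature NaNs first, then create the label
df = df.fillna(-99999)
forecast_out = int(math.ceil(0.01 * len(df)))  # ceil(2.5) = 3
df["label"] = df["Adj. Close"].shift(-forecast_out)

rows_before = len(df)
df = df.dropna()
print(rows_before - len(df), forecast_out)  # 3 3
```

The row with the sentinel value survives, and only the tail rows without a label are dropped.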


Now our data looks pretty good; time to do some machine learning.

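```python
from sklearn.model_selection import train_test_split  # replaces the removed sklearn.cross_validation

# Features are every column except the target; labels are the future Adj. Close
features = np.array(df.drop(columns='label'))
labels = np.array(df['label'])
# Standardize each feature column to zero mean and unit variance
features = preprocessing.scale(features)
# Hold out 10% of the rows for testing (rows are shuffled by default)
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.1)

print(len(x_train), len(y_train), len(x_test), len(y_test))
```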

3050 3050 339 339
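Two caveats about the split above. `train_test_split` shuffles rows by default, so test days end up sandwiched between training days, and `preprocessing.scale` is applied to the whole array, so the test rows also influence the mean and standard deviation used for scaling. A common alternative for time series is to hold out the most recent rows and fit the scaling on the training part only. The sketch below does that with plain NumPy on random stand-in data of the same shape (3389 rows after `dropna`); it is an alternative, not what this notebook does.

```python
import math
import numpy as np

# Hypothetical stand-in data: 3389 rows x 4 features, assumed to be in date order
rng = np.random.default_rng(42)
features = rng.normal(loc=50.0, scale=10.0, size=(3389, 4))
labels = rng.normal(size=3389)

# Chronological split: the last 10% of rows (the most recent days) form the test set
n_test = math.ceil(0.1 * len(features))
x_train, x_test = features[:-n_test], features[-n_test:]
y_train, y_test = labels[:-n_test], labels[-n_test:]

# Standardize with statistics from the training rows only, then apply them to both sets
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std

print(len(x_train), len(x_test))  # 3050 339
```

The set sizes match the ones printed above because `test_size=0.1` rounds the test count up. In scikit-learn the same two ideas are available as `train_test_split(..., shuffle=False)` and `StandardScaler().fit(x_train)`.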


### Using the LinearRegression Model

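```python
clf = LinearRegression()
clf.fit(x_train, y_train)
# For regressors, score() returns the coefficient of determination R^2, not classification accuracy
accuracy = clf.score(x_test, y_test)
accuracy
```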

0.97713266593538461

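Whoa! The model scores about 0.977 on the test set. Note that for a regressor, `score` returns the coefficient of determination (R²), not a percentage of correct predictions. A high R² also doesn't by itself show that Google's stock price grows almost linearly: the label is the Adj. Close `forecast_out` trading days ahead, which is strongly correlated with today's Adj. Close (one of the features), and the shuffled split puts test days right next to training days.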
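`LinearRegression.score` reports R² = 1 - SS_res / SS_tot, which compares the model's squared errors with those of always predicting the mean. For prices, simply predicting that the price h steps from now equals today's price can already score high, because prices move slowly relative to their overall range. The sketch below shows this on a made-up sine-wave "price" that has no linear trend at all; the series, its period and the 30-step horizon are arbitrary illustrative choices.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up "price": oscillates around 500 with a period of 1000 steps, no trend
t = np.arange(3000)
price = 500 + 100 * np.sin(2 * np.pi * t / 1000)

horizon = 30
today = price[:-horizon]   # what we know at step t
future = price[horizon:]   # the value `horizon` steps later (the "label")

# Naive persistence forecast: future price = today's price
print(round(r2_manual(future, today), 3))
```

This naive forecast scores well above 0.9 on a series that is clearly not linear, so comparing the model's R² against such a persistence baseline on the real test set would show how much the regression actually adds.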