# Nb-20180308-2137-BigGainPredict

This is a continuation of the previous BigGainPredict notebook, after implementing Bollinger Bands.

First, get the list of mega-, large-, and mid-cap companies on the NYSE. These are the candidates for predicting a 1% price gain using machine learning.

- https://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NYSE&marketcap=Mega-cap
- https://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NYSE&marketcap=Large-cap
- https://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NYSE&marketcap=Mid-cap
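
The notebook does not show how the three screener lists became a single file. A minimal sketch of one way to do it, assuming each list has been saved as a CSV with `Symbol` and `MarketCap` columns; the helper name `combine_company_lists` and the placeholder rows are hypothetical, not taken from the real data:

```python
import pandas as pd

def combine_company_lists(frames):
    """Stack several company lists, drop repeated symbols, sort by market cap."""
    combined = pd.concat(frames, ignore_index=True)
    combined = combined.drop_duplicates(subset='Symbol')
    return combined.sort_values('MarketCap', ascending=False).reset_index(drop=True)

# Placeholder rows standing in for the downloaded lists (values are made up).
mega = pd.DataFrame({'Symbol': ['AAA', 'BBB'], 'MarketCap': [300e9, 250e9]})
large = pd.DataFrame({'Symbol': ['CCC', 'BBB'], 'MarketCap': [50e9, 250e9]})
mid = pd.DataFrame({'Symbol': ['DDD'], 'MarketCap': [5e9]})

companies = combine_company_lists([mega, large, mid])
print(companies)
```

With real files, each placeholder frame would be replaced by a `pd.read_csv(...)` call on the corresponding download.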

In [1]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.colors as colors
import matplotlib.dates as mdates
import matplotlib.ticker as mticker
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager

In [3]:
import finance as fat

In [4]:
# Change the plot size.
plt.rcParams['figure.figsize'] = [18.0, 10.0]

### Load Company Data

In [17]:
c = pd.read_csv('data/NYSE-Companies-Mega-Large-Mid-Cap.csv')

In [18]:
c.head()

Unnamed: 0,Symbol,Name,LastSale,MarketCap,ADR TSO,IPOyear,Sector,Industry,Summary Quote,Unnamed: 9
0,BABA,Alibaba Group Holding Limited,187.18,480272900000.0,,2014.0,Miscellaneous,Business Services,https://www.nasdaq.com/symbol/baba,
1,T,AT&T Inc.,37.11,227912300000.0,,,Public Utilities,Telecommunications Equipment,https://www.nasdaq.com/symbol/t,
2,BAC,Bank of America Corporation,32.2,329846800000.0,,,Finance,Major Banks,https://www.nasdaq.com/symbol/bac,
3,BA,Boeing Company (The),348.73,205224200000.0,,,Capital Goods,Aerospace,https://www.nasdaq.com/symbol/ba,
4,CVX,Chevron Corporation,113.35,216527200000.0,,,Energy,Integrated oil Companies,https://www.nasdaq.com/symbol/cvx,


In [19]:
c = c.sort_values('MarketCap',ascending=False)

In [20]:
c.head()

Unnamed: 0,Symbol,Name,LastSale,MarketCap,ADR TSO,IPOyear,Sector,Industry,Summary Quote,Unnamed: 9
0,BABA,Alibaba Group Holding Limited,187.18,480272900000.0,,2014.0,Miscellaneous,Business Services,https://www.nasdaq.com/symbol/baba,
7,JPM,J P Morgan Chase & Co,114.74,393782900000.0,,,Finance,Major Banks,https://www.nasdaq.com/symbol/jpm,
8,JNJ,Johnson & Johnson,132.06,354304000000.0,,,Health Care,Major Pharmaceuticals,https://www.nasdaq.com/symbol/jnj,
2,BAC,Bank of America Corporation,32.2,329846800000.0,,,Finance,Major Banks,https://www.nasdaq.com/symbol/bac,
5,XOM,Exxon Mobil Corporation,74.12,314080700000.0,,,Energy,Integrated oil Companies,https://www.nasdaq.com/symbol/xom,


### Create features

These are some ideas from the paper _A Feature Fusion Based Forecasting Model for Financial Time Series_ by Zhiqiang Guo, Huaiqing Wang, Quan Liu, and Jie Yang:
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0101113

![PredictionParameters.png](attachment:PredictionParameters.png)

We'll start with a subset of these.

### Create scalable features

First, create features that can be linearly scaled (within a row).

**Hypothesis**: scaling all price data to the range 0.0-1.0 on a per-row basis will make training easier and improve the accuracy of predictions.
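
To make the per-row scaling concrete, here is a minimal sketch using min-max scaling across the price-like columns of each row. The function name `scale_rows` is hypothetical, and min-max is only one possible reading of "linearly scaled"; the sample values come from the SMA output later in this notebook.

```python
import pandas as pd

def scale_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Min-max scale each row so its smallest value is 0.0 and its largest is 1.0."""
    row_min = df.min(axis=1)
    row_range = df.max(axis=1) - row_min
    # A row whose values are all equal has zero range; mark it NaN to avoid
    # dividing by zero, then map those rows to 0.0 below.
    row_range = row_range.replace(0.0, float('nan'))
    scaled = df.sub(row_min, axis=0).div(row_range, axis=0)
    return scaled.fillna(0.0)

prices = pd.DataFrame(
    {
        'Adj Close': [113.43, 113.32],
        'Adj Close SMA6': [116.225, 115.948333],
        'Adj Close SMA12': [115.408333, 115.4825],
    },
    index=pd.to_datetime(['2018-03-01', '2018-03-02']),
)
scaled = scale_rows(prices)
print(scaled)
```

Only columns that share the same unit (prices and price averages) should be passed in together; a column such as `Gain` would distort the row range.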

In [71]:
ticker = c.iloc[1]['Symbol']

In [86]:
data = fat.get_price_data(ticker)

Loaded data for JPM: 1980-03-17 to 2018-03-07.


In [87]:
# Just use 'Adj Close'
data = pd.DataFrame(data, columns=['Adj Close'])
data.head()

Date,Adj Close
1980-03-17,1.284682
1980-03-18,1.294128
1980-03-19,1.31302
1980-03-20,1.303574
1980-03-21,1.331913


In [88]:
# Gain: day-over-day change in 'Adj Close' (absolute, not percent)
data['Gain'] = data['Adj Close'].diff()
data.head()

Date,Adj Close,Gain
1980-03-17,1.284682,
1980-03-18,1.294128,0.009446
1980-03-19,1.31302,0.018892
1980-03-20,1.303574,-0.009446
1980-03-21,1.331913,0.028339
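
`Gain` above is an absolute price difference, while the stated goal is a 1% gain. A minimal sketch of a percentage gain and a possible target label follows. The column names `Gain %` and `BigGainNext`, and the choice to label a row by the next day's gain, are assumptions rather than something this notebook defines.

```python
import pandas as pd

frame = pd.DataFrame(
    {'Adj Close': [1.284682, 1.294128, 1.31302, 1.303574, 1.331913]},
    index=pd.to_datetime(
        ['1980-03-17', '1980-03-18', '1980-03-19', '1980-03-20', '1980-03-21']
    ),
)

# Absolute day-over-day change, as in the cell above.
frame['Gain'] = frame['Adj Close'].diff()

# Percentage change relative to the previous day's close.
frame['Gain %'] = frame['Gain'] / frame['Adj Close'].shift(1) * 100.0

# Hypothetical target: does the NEXT day's close rise by at least 1%?
# The last row has no next day, so its comparison against NaN is False.
frame['BigGainNext'] = frame['Gain %'].shift(-1) >= 1.0

print(frame)
```

In practice the last row would be dropped before training, since its label is unknown rather than truly negative.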


In [89]:
# SMA: 6 and 12 for now (20 and 200 are not computed here)
sma_df = fat.create_sma_df(data, 'Adj Close', [6,12])
del sma_df['Adj Close']
data = data.join(sma_df)
data.tail()

Date,Adj Close,Gain,Adj Close SMA6,Adj Close SMA12
2018-03-01,113.43,-2.07,116.225,115.408333
2018-03-02,113.32,-0.11,115.948333,115.4825
2018-03-05,115.059998,1.739998,115.573333,115.485
2018-03-06,115.160004,0.100006,114.971667,115.455833
2018-03-07,114.730003,-0.430001,114.533334,115.46
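
The source of `fat.create_sma_df` is not shown here. As a rough sketch of what a simple moving average helper could look like in plain pandas, assuming an unweighted rolling mean and the `<column> SMA<n>` naming seen in the output above (the function name `sma_columns` is hypothetical):

```python
import pandas as pd

def sma_columns(df: pd.DataFrame, column: str, windows) -> pd.DataFrame:
    """Return one simple-moving-average column per window length."""
    out = pd.DataFrame(index=df.index)
    for window in windows:
        # The first (window - 1) rows are NaN because the window is not yet full.
        out[f'{column} SMA{window}'] = df[column].rolling(window=window).mean()
    return out

prices = pd.DataFrame({'Adj Close': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]})
sma = sma_columns(prices, 'Adj Close', [3, 6])
print(prices.join(sma))
```

Because each window starts with NaN rows, a later `dropna()` on the joined frame removes the earliest dates.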


In [90]:
# EMA: 12, 26
ema_df = fat.create_ema_df(data, 'Adj Close', [12,26])
del ema_df['Adj Close']
data = data.join(ema_df).dropna()
data.tail()

Date,Adj Close,Gain,Adj Close SMA6,Adj Close SMA12,Adj Close EMA12,Adj Close EMA26
2018-03-01,113.43,-2.07,116.225,115.408333,115.164644,114.012405
2018-03-02,113.32,-0.11,115.948333,115.4825,114.880853,113.961115
2018-03-05,115.059998,1.739998,115.573333,115.485,114.908414,114.042514
2018-03-06,115.160004,0.100006,114.971667,115.455833,114.94712,114.125291
2018-03-07,114.730003,-0.430001,114.533334,115.46,114.913717,114.170085
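
As with the SMA helper, the internals of `fat.create_ema_df` are not shown. A minimal sketch of an exponential moving average in plain pandas, assuming the common span-based smoothing factor `alpha = 2 / (span + 1)` and the recursive form (`adjust=False`); the real helper may seed or weight the average differently, and the name `ema_columns` is hypothetical:

```python
import pandas as pd

def ema_columns(df: pd.DataFrame, column: str, spans) -> pd.DataFrame:
    """Return one exponential-moving-average column per span."""
    out = pd.DataFrame(index=df.index)
    for span in spans:
        # adjust=False gives the recursive form:
        # ema[t] = alpha * price[t] + (1 - alpha) * ema[t - 1]
        out[f'{column} EMA{span}'] = df[column].ewm(span=span, adjust=False).mean()
    return out

prices = pd.DataFrame({'Adj Close': [1.0, 2.0, 3.0]})
ema = ema_columns(prices, 'Adj Close', [3])
print(prices.join(ema))
```

With `span=3`, alpha is 0.5, so the three values work out to 1.0, 1.5, and 2.25.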


Moving to new notebook to implement Bollinger Bands...