# Stock Price Prediction

### On the problem

* Stock prediction is the problem of forecasting a *weakly non-stationary time series* so that one can **sell** an asset at a higher price than the one paid to **buy** it. In market terms, a buyer pays the **ask price** and a seller receives the **bid price**, so a profitable trade needs a later bid above the earlier ask.


### On the nature of the data

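* Stocks, ETFs, and hedge funds can each be modeled, at any given instant, as two weakly non-stationary time series, one for *bid* prices and one for *ask* prices. By a weakly non-stationary time series $X(t)$, it is meant a series for which the following weak-stationarity conditions hold only locally, over short time spans: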

<center>First moment invariance.</center>
$$E[X(t)] = E[X(t + \epsilon)]$$ <br/>

<center>Second moment invariance (the autocovariance depends only on the lag $\epsilon$, not on $t$).</center>
$$C(t, t + \epsilon) = E[(X(t) - E[X(t)])(X(t + \epsilon) - E[X(t + \epsilon)])] = C(t + \alpha, t + \alpha + \epsilon)$$ <br/>
<center>Bounded energy (finite second moment).</center>
$$E[|X(t)|^2] < \infty $$ <br/>
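
The first two conditions can be probed empirically by comparing the mean and variance over consecutive windows of a sample. The sketch below is only an illustration under stated assumptions: the helper `window_moments`, the window size of 250, and the synthetic series (Gaussian noise and its cumulative sum, a random walk) are hypothetical choices, not part of the dataset used later.

```python
import random
import statistics

def window_moments(series, window):
    """Split `series` into consecutive non-overlapping windows
    and return one (mean, variance) pair per window."""
    moments = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        moments.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return moments

random.seed(0)  # fixed seed so the run is reproducible
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stationary by construction

# A random walk (cumulative sum of the noise) is a standard non-stationary example.
walk = []
level = 0.0
for step in noise:
    level += step
    walk.append(level)

noise_means = [mean for mean, _ in window_moments(noise, 250)]
walk_means = [mean for mean, _ in window_moments(walk, 250)]

# Spread of the window means: small for the noise, typically much larger for the walk.
noise_spread = max(noise_means) - min(noise_means)
walk_spread = max(walk_means) - min(walk_means)
print(noise_spread, walk_spread)
```

Window means that stay close together are consistent with first moment invariance; means that drift from window to window, as in the random walk, indicate that the condition fails over that horizon.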




* So, this data should be treated as unpredictable over long horizons. Among several options, two techniques make these series more amenable to prediction, namely:

    - Short-term predictions: over a short window, a non-stationary series behaves as a locally stationary one and is therefore predictable.
    - Using the first-order difference ($r(t) = x(t) - x(t - 1)$): it erases the 'memory' of a non-stationary series.
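
As a minimal sketch of the first-order difference $r(t) = x(t) - x(t - 1)$, the snippet below applies it to a short list of made-up prices. The function name `first_difference` and the price values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def first_difference(series):
    """Return r(t) = x(t) - x(t - 1) for t = 1, ..., len(series) - 1."""
    return [current - previous for previous, current in zip(series, series[1:])]

# Hypothetical prices that climb steadily, so the level itself keeps drifting.
prices = [100.0, 102.5, 104.0, 106.5, 108.0, 110.5]
returns = first_difference(prices)
print(returns)  # [2.5, 1.5, 2.5, 1.5, 2.5]
```

The prices keep rising, but the differences stay within a narrow band around 2, which is the sense in which differencing removes the trend. On a pandas `Series`, `Series.diff()` computes the same quantity and leaves `NaN` in the first position.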


### On the data

In [16]:
import requests
import quandl
import os

import pandas as pd

## Loading data

In [29]:
api_key = os.environ['QUANDL_API_KEY']
metadata_filename = 'WIKI_metadata.csv'
dataset_filename = 'WIKI_dataset.csv'

### Downloading and extracting the metadata

In [51]:
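import zipfile

def get_data(url, filename, extract=True):
    """Download `url` to `filename`; if `extract`, treat it as a zip and unpack it here."""
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # fail loudly on HTTP errors instead of saving an error page
    # Zipped downloads get a '.zip' suffix so the extracted file can keep `filename`.
    filename = filename + '.zip' if extract else filename
    with open(filename, 'wb') as f:
        f.write(r.content)
    if extract:
        with zipfile.ZipFile(filename, 'r') as archive:
            archive.extractall('./')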

In [52]:
get_data('https://www.quandl.com/api/v3/databases/WIKI/metadata?api_key=%s' % api_key, 
         metadata_filename)

get_data('https://www.quandl.com/api/v3/datatables/WIKI/PRICES/delta.json?api_key=%s' % api_key, 
         dataset_filename, extract=False)

In [53]:
metadata_df = pd.read_csv(metadata_filename)
dataset_df = pd.read_csv(dataset_filename)

In [63]:
metadata_df.head(20)

| | code | name | description | refreshed_at | from_date | to_date |
|---|---|---|---|---|---|---|
| 0 | A | Agilent Technologies Inc. (A) Prices, Dividend... | End of day open, high, low, close and volume, ... | 2018-03-27 21:46:10 | 1999-11-18 | 2018-03-27 |
| 1 | AA | Alcoa Inc. (AA) Prices, Dividends, Splits and ... | End of day open, high, low, close and volume, ... | 2018-03-27 21:46:10 | 2016-11-01 | 2018-03-27 |
| 2 | AAL | American Airlines Group Inc. (AAL) Prices, Div... | \<p>End of day open, high, low, close and volum... | 2018-03-27 21:46:10 | 2005-09-27 | 2018-03-27 |
| 3 | AAMC | Altisource Asset Management (AAMC) Prices, Div... | \<p>End of day open, high, low, close and volum... | 2018-03-27 21:46:07 | 2012-12-13 | 2018-03-27 |
| 4 | AAN | Aaron's Inc. (AAN) Prices, Dividends, Splits a... | \<p>End of day open, high, low, close and volum... | 2018-03-27 21:45:53 | 1984-09-07 | 2018-03-27 |
| 5 | AAOI | Applied Optoelectronics Inc (AAOI) Prices, Div... | \<p>End of day open, high, low, close and volum... | 2018-03-27 21:46:07 | 2013-09-26 | 2018-03-27 |
| 6 | AAON | AAON Inc. (AAON) Prices, Dividends, Splits and... | \<p>End of day open, high, low, close and volum... | 2018-03-27 21:45:53 | 1992-12-16 | 2018-03-27 |
| 7 | AAP | Advance Auto Parts Inc (AAP) Prices, Dividends... | \<p>End of day open, high, low, close and volum... | 2018-03-27 21:45:53 | 2001-11-29 | 2018-03-27 |
| 8 | AAPL | Apple Inc (AAPL) Prices, Dividends, Splits and... | End of day open, high, low, close and volume, ... | 2018-03-27 21:46:10 | 1980-12-12 | 2018-03-27 |
| 9 | AAT | American Assets Trust Inc. (AAT) Prices, Divid... | \<p>End of day open, high, low, close and volum... | 2018-03-27 21:46:07 | 2011-01-13 | 2018-03-27 |


In [None]:
dataset_df

* Since downloading the dataset through a plain `requests.get` call does not seem to work, we can use the **quandl** module to download it instead.
* The WIKI dataset has a huge variety of 


In [61]:
quandl.ApiConfig.api_key = api_key

dataset_df = quandl.get_table('WIKI/PRICES', ticker = ['AAPL'], 
                        date = { 'gte': '2015-12-31', 'lte': '2016-12-31' }, 
                        paginate=True)
dataset_df.head()

| | ticker | date | open | high | low | close | volume | ex-dividend | split_ratio | adj_open | adj_high | adj_low | adj_close | adj_volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAPL | 2016-12-30 | 116.65 | 117.2 | 115.43 | 115.82 | 30586265.0 | 0.0 | 1.0 | 115.209202 | 115.752409 | 114.004271 | 114.389454 | 30586265.0 |
| 1 | AAPL | 2016-12-29 | 116.45 | 117.1095 | 116.4 | 116.73 | 15039519.0 | 0.0 | 1.0 | 115.011672 | 115.663027 | 114.96229 | 115.288214 | 15039519.0 |
| 2 | AAPL | 2016-12-28 | 117.52 | 118.0166 | 116.2 | 116.76 | 20905892.0 | 0.0 | 1.0 | 116.068456 | 116.558923 | 114.76476 | 115.317843 | 20905892.0 |
| 3 | AAPL | 2016-12-27 | 116.52 | 117.8 | 116.49 | 117.26 | 18296855.0 | 0.0 | 1.0 | 115.080808 | 116.344998 | 115.051178 | 115.811668 | 18296855.0 |
| 4 | AAPL | 2016-12-23 | 115.59 | 116.52 | 115.59 | 116.52 | 14249484.0 | 0.0 | 1.0 | 114.162295 | 115.080808 | 114.162295 | 115.080808 | 14249484.0 |
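
The rows come back newest first, while $r(t) = x(t) - x(t - 1)$ assumes chronological order. The sketch below hard-codes the five `adj_close` values displayed above instead of calling the API, sorts them by date, and then takes the difference between consecutive trading days. The variable names are hypothetical, and using `adj_close` rather than another price column is an assumption.

```python
# (date, adj_close) pairs copied from the five rows displayed above, newest first.
rows = [
    ("2016-12-30", 114.389454),
    ("2016-12-29", 115.288214),
    ("2016-12-28", 115.317843),
    ("2016-12-27", 115.811668),
    ("2016-12-23", 115.080808),
]

# ISO-formatted dates sort correctly as plain strings.
chronological = sorted(rows, key=lambda row: row[0])
dates = [date for date, _ in chronological]
closes = [close for _, close in chronological]

# Change in adjusted close between consecutive trading days, oldest to newest.
changes = [round(later - earlier, 6) for earlier, later in zip(closes, closes[1:])]
for date, change in zip(dates[1:], changes):
    print(date, change)
```

With the DataFrame itself, the same result should be obtainable with `dataset_df.sort_values('date')['adj_close'].diff()`, which leaves `NaN` for the oldest row.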


In [35]:
r = requests.get(url='https://www.quandl.com/api/v3/datatables/WIKI/PRICES/delta.json?api_key=%s' % api_key, allow_redirects=True)
open(dataset_filename + '.zip', 'wb').write(r.content)

45