# Data Wrangler for Neural Net Stock Price Problem

This notebook contains scripts that retrieve stock price data, compute a data frame with the necessary transformations, and write the data frame to a `.csv` file.
This `.csv` file is the tidy data set that we'll use in the Stock Price Neural Network problem.

## Getting started

Quandl supplies financial and economic data in several easy-to-consume formats for free. We'll get our stock price data from Quandl. To do so, you'll first need the `quandl` Python package, which you can install with:
```pip install quandl```

In [1]:
import pandas as pd
import quandl
import numpy as np

Next, you'll need a Quandl API key, which you can obtain from [Quandl](https://docs.quandl.com/docs#section-authentication). Once you have your key, either store it in a YAML file under a `Quandl` section with the key `apikey`, or replace the `'YourQuandlAPIKey'` string with your key and comment out the YAML code.

In [2]:
# get my Quandl API key
import yaml

# comment out the next three lines if you supply your API key directly
with open('./databases.yaml', 'r') as f:
    dbparams = yaml.safe_load(f)  # safe_load parses plain data without needing an explicit Loader
apikey = dbparams['Quandl']['apikey']

# comment out the three lines above if you use this
#apikey = 'YourQuandlAPIKey'

quandl.ApiConfig.api_key = apikey
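
As a third option, the key can be kept out of the notebook and out of any file by reading it from an environment variable. The sketch below is only an illustration: the helper `load_api_key` and the variable name `QUANDL_API_KEY` are assumptions made here, not part of the `quandl` package.

```python
import os

def load_api_key(env_var="QUANDL_API_KEY", default=None):
    """Return the API key stored in the environment variable `env_var`.

    Falls back to `default` when the variable is unset or empty.
    (Both the helper and the variable name are hypothetical.)
    """
    key = os.environ.get(env_var, "").strip()
    return key if key else default

# Hypothetical usage in place of the YAML code above:
# apikey = load_api_key(default='YourQuandlAPIKey')
```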

## Data from Quandl

In [3]:
# 2018 data for Cisco
data = quandl.get("WIKI/CSCO", start_date = '2018-01-01')

In [4]:
data.tail(10)


Date,Open,High,Low,Close,Volume,Ex-Dividend,Split Ratio,Adj. Open,Adj. High,Adj. Low,Adj. Close,Adj. Volume
2018-03-14,45.34,45.7587,45.09,45.28,20923845.0,0.0,1.0,45.34,45.7587,45.09,45.28,20923845.0
2018-03-15,45.3,45.735,45.12,45.33,23338222.0,0.0,1.0,45.3,45.735,45.12,45.33,23338222.0
2018-03-16,45.33,45.6,44.97,45.01,52355707.0,0.0,1.0,45.33,45.6,44.97,45.01,52355707.0
2018-03-19,44.59,44.82,43.9,44.27,24524286.0,0.0,1.0,44.59,44.82,43.9,44.27,24524286.0
2018-03-20,44.49,44.64,44.18,44.37,22385001.0,0.0,1.0,44.49,44.64,44.18,44.37,22385001.0
2018-03-21,44.24,44.9,44.1331,44.31,20616375.0,0.0,1.0,44.24,44.9,44.1331,44.31,20616375.0
2018-03-22,43.76,44.02,43.02,43.07,29374734.0,0.0,1.0,43.76,44.02,43.02,43.07,29374734.0
2018-03-23,43.71,43.84,42.42,42.42,30674112.0,0.0,1.0,43.71,43.84,42.42,42.42,30674112.0
2018-03-26,43.25,44.16,42.83,44.06,28454954.0,0.0,1.0,43.25,44.16,42.83,44.06,28454954.0
2018-03-27,44.49,44.52,42.24,42.68,30088447.0,0.0,1.0,44.49,44.52,42.24,42.68,30088447.0


In [5]:
#list of dow companies from https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average 
dow_companies = ['MMM','AXP', 'AAPL','BA','CAT',
                 'CVX','CSCO','KO','DWDP','XOM',
                 'GS','HD','IBM','INTC','JNJ','JPM',
                 'MCD','MRK','MSFT','NKE','PFE',
                 'PG','TRV','UNH','UTX','VZ','V',
                 'WMT','WBA','DIS']

In [6]:
len(dow_companies)

30

## Calculate Moving Averages and Buy and Sell Signals

In [7]:
# get the data for a single Dow company to start with:

cols = ['Open','High','Low','Close','Volume']
company = 'CSCO' #Cisco Systems Inc
dow_data = quandl.get('WIKI/'+company, start_date = '2018-01-01')[cols]

In [8]:
dow_data.keys()


Index(['Open', 'High', 'Low', 'Close', 'Volume'], dtype='object')

In [9]:
dow_data.tail()

Date,Open,High,Low,Close,Volume
2018-03-21,44.24,44.9,44.1331,44.31,20616375.0
2018-03-22,43.76,44.02,43.02,43.07,29374734.0
2018-03-23,43.71,43.84,42.42,42.42,30674112.0
2018-03-26,43.25,44.16,42.83,44.06,28454954.0
2018-03-27,44.49,44.52,42.24,42.68,30088447.0


In [10]:
# calculate trailing moving averages on the closing price only:
# a 9-row (fast) and an 18-row (slow) window, where each row is one trading day
MA9 =  dow_data[['Close']].rolling(window= 9, center=False).mean().rename(columns={'Close':'MA9'})
MA18 = dow_data[['Close']].rolling(window=18, center=False).mean().rename(columns={'Close':'MA18'})

In [11]:
# paste the moving averages onto the right side of the dataframe
dow_data = dow_data.merge(MA9,  how='left', left_index=True, right_index=True)
dow_data = dow_data.merge(MA18, how='left', left_index=True, right_index=True)
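
To see why the first rows of `MA9` and `MA18` come out empty, here is a small sketch of a trailing mean on made-up prices with a window of 3. The numbers are invented for illustration; only `rolling(...).mean()` is taken from the cells above.

```python
import pandas as pd

# Made-up closing prices, for illustration only
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="Close")

# Trailing 3-observation mean: the first window - 1 = 2 entries
# have no complete window behind them, so they are NaN
ma3 = close.rolling(window=3).mean()
print(ma3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```

By the same logic, a 9-row window leaves the first 8 rows empty and an 18-row window the first 17.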



In [12]:
dow_data

Date,Open,High,Low,Close,Volume,MA9,MA18
2018-01-02,38.67,38.95,38.43,38.86,19972431.0,,
2018-01-03,38.72,39.285,38.53,39.17,29066090.0,,
2018-01-04,39.05,39.54,38.93,38.99,20606344.0,,
2018-01-05,39.55,39.88,39.365,39.53,24369510.0,,
2018-01-08,39.52,39.96,39.35,39.94,16511704.0,,
2018-01-09,39.79,39.96,39.54,39.69,21339760.0,,
2018-01-10,39.65,40.24,39.63,39.91,19110146.0,,
2018-01-11,40.14,40.21,39.75,40.1,20178596.0,,
2018-01-12,40.22,40.93,40.05,40.87,22962700.0,39.673333,
2018-01-16,40.9,41.16,40.32,40.54,32273879.0,39.86,


In [13]:
# is the fast moving average (MA9) greater than the slow moving average (MA18)?
dow_data['f_gtr_s'] = (dow_data.MA9 > dow_data.MA18)*1 # need these as ints
# restore NaN where either moving average is missing, so an invalid comparison is not read as 0
dow_data['f_gtr_s'] = np.where(np.logical_or(np.isnan(dow_data.MA9), np.isnan(dow_data.MA18)), np.nan, dow_data.f_gtr_s)

In [14]:
dow_data

Date,Open,High,Low,Close,Volume,MA9,MA18,f_gtr_s
2018-01-02,38.67,38.95,38.43,38.86,19972431.0,,,
2018-01-03,38.72,39.285,38.53,39.17,29066090.0,,,
2018-01-04,39.05,39.54,38.93,38.99,20606344.0,,,
2018-01-05,39.55,39.88,39.365,39.53,24369510.0,,,
2018-01-08,39.52,39.96,39.35,39.94,16511704.0,,,
2018-01-09,39.79,39.96,39.54,39.69,21339760.0,,,
2018-01-10,39.65,40.24,39.63,39.91,19110146.0,,,
2018-01-11,40.14,40.21,39.75,40.1,20178596.0,,,
2018-01-12,40.22,40.93,40.05,40.87,22962700.0,39.673333,,
2018-01-16,40.9,41.16,40.32,40.54,32273879.0,39.86,,


In [15]:
# find the transition points by comparing consecutive days
# first, add column that is the lagged MA comparison (lagged by one day)
dow_data['f_gtr_s_lag1'] = dow_data.f_gtr_s.shift()
#use boolean arithmetic to see the change points
dow_data['Crossover'] = dow_data.f_gtr_s - dow_data.f_gtr_s_lag1

In the table below, a Crossover value of 1 marks a row (a date) on which the 9-day moving average moved from at or below the 18-day moving average to above it.
Conversely, a Crossover value of -1 marks a date on which the 9-day moving average moved from above the 18-day moving average to at or below it.
***A Crossover value of 1 is a BUY signal; a Crossover value of -1 is a SELL signal.***
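
The signal logic can be checked on a toy example. The sketch below uses made-up moving averages and an equivalent `where`-based way of keeping the NaNs; it illustrates the idea and is not the exact code used in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up moving averages, for illustration only
fast = pd.Series([np.nan, 1.0, 3.0, 3.0, 1.0])
slow = pd.Series([np.nan, 2.0, 2.0, 2.0, 2.0])

# 1.0 where fast > slow, 0.0 otherwise, NaN where either average is missing
flag = (fast > slow).astype(float).where(fast.notna() & slow.notna())

# Today's flag minus yesterday's: +1 = BUY, -1 = SELL, 0 = no change
crossover = flag - flag.shift()
print(crossover.tolist())  # [nan, nan, 1.0, 0.0, -1.0]
```

The second NaN appears because the first valid flag has no previous day to compare against.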

In [16]:
dow_data

Date,Open,High,Low,Close,Volume,MA9,MA18,f_gtr_s,f_gtr_s_lag1,Crossover
2018-01-02,38.67,38.95,38.43,38.86,19972431.0,,,,,
2018-01-03,38.72,39.285,38.53,39.17,29066090.0,,,,,
2018-01-04,39.05,39.54,38.93,38.99,20606344.0,,,,,
2018-01-05,39.55,39.88,39.365,39.53,24369510.0,,,,,
2018-01-08,39.52,39.96,39.35,39.94,16511704.0,,,,,
2018-01-09,39.79,39.96,39.54,39.69,21339760.0,,,,,
2018-01-10,39.65,40.24,39.63,39.91,19110146.0,,,,,
2018-01-11,40.14,40.21,39.75,40.1,20178596.0,,,,,
2018-01-12,40.22,40.93,40.05,40.87,22962700.0,39.673333,,,,
2018-01-16,40.9,41.16,40.32,40.54,32273879.0,39.86,,,,


In [17]:
#ditch the rows with NAs due to the trailing moving averages
dow_data.dropna(axis=0,inplace=True)
dow_data

Date,Open,High,Low,Close,Volume,MA9,MA18,f_gtr_s,f_gtr_s_lag1,Crossover
2018-01-29,42.3,42.98,42.3,42.85,23053104.0,41.892222,40.876111,1.0,1.0,0.0
2018-01-30,42.685,42.86,41.97,42.25,25671649.0,42.008889,41.047222,1.0,1.0,0.0
2018-01-31,41.98,42.01,41.35,41.54,34571282.0,42.035556,41.188889,1.0,1.0,0.0
2018-02-01,41.09,42.11,40.67,41.7,26001606.0,42.081111,41.309444,1.0,1.0,0.0
2018-02-02,41.5,41.95,40.87,40.93,25920378.0,42.0,41.364444,1.0,1.0,0.0
2018-02-05,40.87,41.405,39.08,39.09,39491216.0,41.665556,41.331111,1.0,1.0,0.0
2018-02-06,38.33,40.305,37.35,40.17,52940550.0,41.443333,41.345556,1.0,1.0,0.0
2018-02-07,40.31,41.206,40.03,40.34,31841942.0,41.27,41.358889,0.0,1.0,-1.0
2018-02-08,40.44,40.76,38.72,38.73,39300118.0,40.844444,41.24,0.0,0.0,0.0
2018-02-09,39.0,39.92,38.23,39.51,47947702.0,40.473333,41.182778,0.0,0.0,0.0


In [18]:
# augments the stock history dataframe with a Crossover column that indicates whether the fast moving average
# changed position relative to the slow moving average, and a few intermediate column calculations

def computeCrossover(history, fast_win=9, slow_win=18):
    
    #compute moving averages based on close price
    fast = history[['Close']].rolling(window = fast_win, center=False).mean().rename(columns={'Close':'Fast'})
    slow = history[['Close']].rolling(window = slow_win, center=False).mean().rename(columns={'Close':'Slow'})
    
    #paste them on the right
    history = history.merge(fast, how='left', left_index=True, right_index=True)
    history = history.merge(slow, how='left', left_index=True, right_index=True)
    
    #compute whether the fast MA exceeds the slow MA
    history['f_gtr_s'] = (history.Fast > history.Slow)*1 # need these as ints, not booleans
    # Restore NaNs where the comparison wasn't valid because either side was NaN
    history['f_gtr_s'] = np.where(np.logical_or(np.isnan(history.Fast), np.isnan(history.Slow)), np.nan, history.f_gtr_s)
    
    # find the transition points by comparing consecutive days
    # first, add column that is the lagged MA comparison (lagged by one day)
    history['f_gtr_s_lag1'] = history.f_gtr_s.shift()
    #subtract consecutive observations to see if current observation differs from previous observation
    history['Crossover'] = history.f_gtr_s - history.f_gtr_s_lag1
    #Crossover == 1 => fast moved above slow; Crossover == -1 => fast moved below slow;
    #Crossover == 0 => fast and slow maintained their previous positions.
    
    #ditch the rows with NAs due to the trailing moving averages
    history.dropna(axis=0,inplace=True)
    
    return history


In [20]:
def getBuy_Dates(history):
    buy_dates = history.index[history.Crossover == 1]
    return buy_dates

In [21]:
cols = ['Open','High','Low','Close','Volume']
company = 'CSCO' #Cisco Systems Inc
CSCO = quandl.get('WIKI/'+company, start_date = '2010-01-01')[cols]

In [22]:
CSCO.tail()

Date,Open,High,Low,Close,Volume
2018-03-21,44.24,44.9,44.1331,44.31,20616375.0
2018-03-22,43.76,44.02,43.02,43.07,29374734.0
2018-03-23,43.71,43.84,42.42,42.42,30674112.0
2018-03-26,43.25,44.16,42.83,44.06,28454954.0
2018-03-27,44.49,44.52,42.24,42.68,30088447.0


In [23]:
CSCO = computeCrossover(CSCO)
buy_dates = getBuy_Dates(CSCO)

In [24]:
buy_dates

DatetimeIndex(['2010-02-11', '2010-04-15', '2010-06-23', '2010-07-15',
               '2010-09-16', '2010-12-14', '2011-04-07', '2011-05-03',
               '2011-06-30', '2011-08-22', '2011-09-08', '2011-10-11',
               '2011-12-08', '2012-01-04', '2012-03-20', '2012-06-14',
               '2012-08-08', '2012-09-13', '2012-11-20', '2013-01-08',
               '2013-02-11', '2013-03-08', '2013-04-12', '2013-05-08',
               '2013-07-08', '2013-08-06', '2013-09-13', '2013-11-07',
               '2013-12-26', '2014-02-12', '2014-03-26', '2014-05-16',
               '2014-07-01', '2014-09-03', '2014-10-30', '2014-12-26',
               '2015-01-21', '2015-02-13', '2015-04-15', '2015-06-23',
               '2015-07-20', '2015-09-15', '2015-10-05', '2015-12-04',
               '2015-12-28', '2016-02-12', '2016-04-21', '2016-05-23',
               '2016-07-11', '2016-08-25', '2016-08-30', '2016-09-29',
               '2016-11-03', '2016-12-14', '2017-01-26', '2017-04-27',
      

## Get the OHLCV for the Purchase Date and 4 Previous Days

In [18]:
# first purchase date
bd = dow_data.query('Crossover==1').index[0]

In [19]:
# row in the data frame of the purchase date
indx = dow_data.index.get_loc(bd)

In [20]:
indx

15

In [21]:
cols = ['Open','High','Low','Close','Volume']

In [22]:
# these will eventually be our predictors (25 for each purchase)
dow_data.iloc[indx-4:indx+1][cols]

Date,Open,High,Low,Close,Volume
2018-02-13,40.5,41.31,40.22,41.23,29220084.0
2018-02-14,41.04,42.26,40.99,42.09,39880261.0
2018-02-15,45.065,45.13,43.26,44.08,71490591.0
2018-02-16,43.885,45.09,43.79,44.33,37915915.0
2018-02-20,44.0,44.69,43.81,44.06,26258211.0


In [23]:
# turn the above into a one-row data frame.
#First, we need some column names: XX_n will stand for variable XX at n days before purchase date
#e.g., Column 'Open_4' represents open price 4 days before purchase date
colnames = [c+'_'+str(p) for p in [4,3,2,1]  for c in cols ] + cols

In [24]:
str(colnames)

"['Open_4', 'High_4', 'Low_4', 'Close_4', 'Volume_4', 'Open_3', 'High_3', 'Low_3', 'Close_3', 'Volume_3', 'Open_2', 'High_2', 'Low_2', 'Close_2', 'Volume_2', 'Open_1', 'High_1', 'Low_1', 'Close_1', 'Volume_1', 'Open', 'High', 'Low', 'Close', 'Volume']"

In [25]:
pd.DataFrame(dow_data.iloc[indx-4:indx+1][cols].values.reshape(1,-1),columns = colnames, index=[bd])

,Open_4,High_4,Low_4,Close_4,Volume_4,Open_3,High_3,Low_3,Close_3,Volume_3,...,Open_1,High_1,Low_1,Close_1,Volume_1,Open,High,Low,Close,Volume
2018-02-20,40.5,41.31,40.22,41.23,29220084.0,41.04,42.26,40.99,42.09,39880261.0,...,43.885,45.09,43.79,44.33,37915915.0,44.0,44.69,43.81,44.06,26258211.0
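
The flattening step relies on `reshape(1, -1)` reading the window row by row, oldest day first, which is why the column names are generated in that same order. A scaled-down sketch with made-up values, two columns and three days makes the alignment easy to verify by eye:

```python
import pandas as pd

cols = ['Open', 'Close']  # two columns keep the example small
# Made-up values for three consecutive trading days, oldest first
window = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], columns=cols)

# Lagged names first (oldest day), then the unsuffixed purchase-day names
colnames = [c + '_' + str(p) for p in [2, 1] for c in cols] + cols

# reshape(1, -1) flattens row by row, so the values line up with colnames
wide = pd.DataFrame(window.values.reshape(1, -1), columns=colnames)
print(wide.to_dict('records')[0])
```

Here `Open_2` and `Close_2` receive the oldest row, and `Open` and `Close` receive the last row, matching the layout of the 25-column row above.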


In [53]:
# function to get the OHLCV info for the buy date (bd) and the n-1 trading days before it
# returns a wide (one-row) data frame
def getPredictors(bd, history, n=5):
    
    # row in the data frame of the buy date
    indx = history.index.get_loc(bd)
    
    # column names of interest in the history dataframe
    cols = ['Open','High','Low','Close','Volume']
    # column names for the df to be returned:
    colnames = [c+'_'+str(abs(p)) for p in range(-1*(n-1),0,1)  for c in cols ] + cols
    
    #make a row vector out of the data to be returned:
    rv = history.iloc[(indx-(n-1)):indx+1][cols].values.reshape(1,-1)
    
    #handle case where there's insufficient history - just return df with all NaNs.
    if rv.size != len(colnames):
        rv = np.full((1,len(colnames)), np.nan)
    
    ldf = pd.DataFrame(rv, columns = colnames, index = [bd])
    ldf.index.name = 'Buy_Date'
    #sort the column names for aesthetics
    colnames.sort()
    
    return ldf[colnames]

In [54]:
getPredictors(bd, dow_data)

Buy_Date,Close,Close_1,Close_2,Close_3,Close_4,High,High_1,High_2,High_3,High_4,...,Open,Open_1,Open_2,Open_3,Open_4,Volume,Volume_1,Volume_2,Volume_3,Volume_4
2018-02-20,44.06,44.33,44.08,42.09,41.23,44.69,45.09,45.13,42.26,41.31,...,44.0,43.885,45.065,41.04,40.5,26258211.0,37915915.0,71490591.0,39880261.0,29220084.0


In [55]:
dow_data.iloc[indx-4:indx+1][cols]

Date,Open,High,Low,Close,Volume
2018-02-13,40.5,41.31,40.22,41.23,29220084.0
2018-02-14,41.04,42.26,40.99,42.09,39880261.0
2018-02-15,45.065,45.13,43.26,44.08,71490591.0
2018-02-16,43.885,45.09,43.79,44.33,37915915.0
2018-02-20,44.0,44.69,43.81,44.06,26258211.0


In [28]:
xx=dow_data.index[0]
xx

Timestamp('2018-01-29 00:00:00')

In [29]:
getPredictors(xx,dow_data)

,Open_4,High_4,Low_4,Close_4,Volume_4,Open_3,High_3,Low_3,Close_3,Volume_3,...,Open_1,High_1,Low_1,Close_1,Volume_1,Open,High,Low,Close,Volume
2018-01-29,,,,,,,,,,,...,,,,,,,,,,


## Sale Results

Stocks are purchased on a 'buy' signal. Once purchased, stocks are sold on the earliest of three dates:

1. 'Sell' signal (a Crossover value of -1, indicating that the fast moving average moved from above the slow moving average to below it)
1. the maximum number of trading days to hold the stock has been reached
1. the closing price has fallen to or below a threshold fraction of the purchase price
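
The "earliest of three dates" rule can be sketched on its own before looking at the full function. The helper `earliest_exit` below is hypothetical and written only for illustration; it uses `None` for an exit that never happens, and on a tie it keeps the first reason in the listed order.

```python
import pandas as pd

def earliest_exit(sell_signal, max_hold, loss_thresh):
    """Return (date, reason) for the earliest of the three candidate exits.

    A candidate that never occurs is passed as None.
    """
    candidates = [('SellSig', sell_signal),
                  ('MaxHold', max_hold),
                  ('LossThresh', loss_thresh)]
    live = [(date, reason) for reason, date in candidates if date is not None]
    if not live:
        return None, None
    # min() keeps the first of several equal dates, so ties follow list order
    return min(live, key=lambda pair: pair[0])

# Sell signal on 2018-03-23, hold limit on 2018-03-20, loss threshold never hit
print(earliest_exit(pd.Timestamp('2018-03-23'), pd.Timestamp('2018-03-20'), None))
```

With these inputs the hold limit comes first, so the reason returned is `'MaxHold'`.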

In [43]:
# function to compute the result of a buy.
# given a buy date and the stock history, loss threshold, max trading days hold time,
# returns the date of sale, reason for sale and the price at sale date

def saleResult(bd,history, loss_thresh = 0.2, maxhold = 20):
    
    reasons = ['SellSig', 'MaxHold', 'LossThresh']
    future_date = pd.to_datetime('2200-12-31') # way after we're all gone
    
    # not interested in the history before the buy date
    df = history.loc[bd:] # this makes the record for the buy date the 0'th record, very handy!
    
    # first sell signal after the buy date; Crossover value of -1 is a sell signal
    ss = future_date
    sell_signals = df.index[df.Crossover == -1]
    if sell_signals.size > 0:
        ss = sell_signals[0]
    
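    # max trading days: sell on the maxhold'th trading day after the buy date
    md = future_date
    if df.index.size > maxhold:  # df.index[maxhold] only exists when size > maxhold
        md = df.index[maxhold]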
        
    # loss_threshold
    lt = future_date
    maxloss = 1-loss_thresh
    buy_price = df.Close.iloc[0]  # positional access, since the index holds dates
    min_price = buy_price*maxloss
    lossdates = df.index[df.Close <= min_price]
    if lossdates.size > 0:
        lt = lossdates[0]
    
    # figure out which is the earliest date
    exit_dates = np.array([ss, md, lt]) # need to be in same order as reasons list
    exit_date_i = exit_dates.argmin()
    
    exit_date = exit_dates[exit_date_i]
    reason = reasons[exit_date_i]
    
    # get the exit price (closing price on the exit date)
    rn = df.index.get_loc(exit_date)
    exit_price = df.Close.iloc[rn]  # positional access, since the index holds dates
    
    rdf = pd.DataFrame({'Sell_Date':exit_date, 'Sell_Reason': reason, 'Sell_Price':exit_price},index=[bd])
    rdf.index.name = 'Buy_Date'
    
    return rdf

In [47]:
saleResult(bd, dow_data)

Buy_Date,Exit_Date,Exit_Reason,Exit_price
2018-02-20,2018-03-20,MaxHold,44.37


In [32]:
bd

Timestamp('2018-02-20 00:00:00')

In [33]:
dow_data

Date,Open,High,Low,Close,Volume,MA9,MA18,f_gtr_s,f_gtr_s_lag1,Crossover
2018-01-29,42.3,42.98,42.3,42.85,23053104.0,41.892222,40.876111,1.0,1.0,0.0
2018-01-30,42.685,42.86,41.97,42.25,25671649.0,42.008889,41.047222,1.0,1.0,0.0
2018-01-31,41.98,42.01,41.35,41.54,34571282.0,42.035556,41.188889,1.0,1.0,0.0
2018-02-01,41.09,42.11,40.67,41.7,26001606.0,42.081111,41.309444,1.0,1.0,0.0
2018-02-02,41.5,41.95,40.87,40.93,25920378.0,42.0,41.364444,1.0,1.0,0.0
2018-02-05,40.87,41.405,39.08,39.09,39491216.0,41.665556,41.331111,1.0,1.0,0.0
2018-02-06,38.33,40.305,37.35,40.17,52940550.0,41.443333,41.345556,1.0,1.0,0.0
2018-02-07,40.31,41.206,40.03,40.34,31841942.0,41.27,41.358889,0.0,1.0,-1.0
2018-02-08,40.44,40.76,38.72,38.73,39300118.0,40.844444,41.24,0.0,0.0,0.0
2018-02-09,39.0,39.92,38.23,39.51,47947702.0,40.473333,41.182778,0.0,0.0,0.0


In [44]:
saleResult(bd, dow_data,loss_thresh=0.01)

,Exit_Date,Exit_Reason,Exit_price
2018-02-20,2018-02-21,LossThresh,43.31


In [45]:
saleResult(bd, dow_data, maxhold=200)

,Exit_Date,Exit_Reason,Exit_price
2018-02-20,2018-03-23,SellSig,42.42


In [44]:
CSCO_Purchases = pd.concat([saleResult(bd,CSCO) for bd in buy_dates] )

In [45]:
CSCO_Purchases

Buy_Date,Sell_Date,Sell_Price,Sell_Reason
2010-02-11,2010-03-12,25.8800,MaxHold
2010-04-15,2010-05-06,25.4880,SellSig
2010-06-23,2010-06-28,22.4200,SellSig
2010-07-15,2010-08-12,21.3600,MaxHold
2010-09-16,2010-10-14,23.0700,MaxHold
2010-12-14,2011-01-12,21.1200,MaxHold
2011-04-07,2011-04-19,16.6100,SellSig
2011-05-03,2011-05-17,16.6400,SellSig
2011-06-30,2011-07-29,15.9700,MaxHold
2011-08-22,2011-09-06,15.2800,SellSig


In [46]:
CSCO.loc[buy_dates,['Close']].merge(CSCO_Purchases, how='left', left_index=True, right_index=True).reset_index()

,Date,Close,Sell_Date,Sell_Price,Sell_Reason
0,2010-02-11,23.930,2010-03-12,25.8800,MaxHold
1,2010-04-15,27.210,2010-05-06,25.4880,SellSig
2,2010-06-23,22.860,2010-06-28,22.4200,SellSig
3,2010-07-15,23.920,2010-08-12,21.3600,MaxHold
4,2010-09-16,21.930,2010-10-14,23.0700,MaxHold
5,2010-12-14,19.540,2011-01-12,21.1200,MaxHold
6,2011-04-07,17.910,2011-04-19,16.6100,SellSig
7,2011-05-03,17.410,2011-05-17,16.6400,SellSig
8,2011-06-30,15.610,2011-07-29,15.9700,MaxHold
9,2011-08-22,15.010,2011-09-06,15.2800,SellSig


In [61]:
def getPurchases(tkr, start_date='2010-01-01',
                loss_thresh = 0.2, maxhold = 20,
                fast_win=9, slow_win=18,
                covar_n=5):
    
    #get the trading data
    hist = quandl.get('WIKI/' + tkr,  start_date = start_date)
    
    #calculate the crossover points and purchase dates:
    hist = computeCrossover(hist, fast_win=fast_win, slow_win=slow_win)
    buy_dates = getBuy_Dates(hist)
    
    #get the results
    results = pd.concat([saleResult(bd,hist,loss_thresh=loss_thresh, maxhold=maxhold) for bd in buy_dates] )
    
    #get the covariates
    # not yet
    
    #get the purchase price
    purchase_prices = hist.loc[buy_dates,['Close']].reset_index()
    purchase_prices.rename(columns = {'Close':'Purchase_Price', 'Date':'Purchase_Date'},inplace=True)
    purchase_prices['Ticker'] = tkr
    purchase_prices = purchase_prices[ ['Ticker','Purchase_Date','Purchase_Price'] ]
    
    #paste it all together
    
    df = purchase_prices.merge(results, how='left', left_on='Purchase_Date', right_index=True)
    df['Gain_Pct'] = 100.0*(df.Sell_Price/df.Purchase_Price-1.0)
    
    return df

In [62]:
getPurchases('CSCO',loss_thresh=0.05)


,Ticker,Purchase_Date,Purchase_Price,Sell_Date,Sell_Price,Sell_Reason,Gain_Pct
0,CSCO,2010-02-11,23.930,2010-03-12,25.8800,MaxHold,8.148767
1,CSCO,2010-04-15,27.210,2010-05-06,25.4880,SellSig,-6.328556
2,CSCO,2010-06-23,22.860,2010-06-28,22.4200,SellSig,-1.924759
3,CSCO,2010-07-15,23.920,2010-07-21,22.5600,LossThresh,-5.685619
4,CSCO,2010-09-16,21.930,2010-10-14,23.0700,MaxHold,5.198358
5,CSCO,2010-12-14,19.540,2011-01-12,21.1200,MaxHold,8.085977
6,CSCO,2011-04-07,17.910,2011-04-18,16.7300,LossThresh,-6.588498
7,CSCO,2011-05-03,17.410,2011-05-17,16.6400,SellSig,-4.422746
8,CSCO,2011-06-30,15.610,2011-07-29,15.9700,MaxHold,2.306214
9,CSCO,2011-08-22,15.010,2011-09-06,15.2800,SellSig,1.798801
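
As a quick check of the `Gain_Pct` formula, the arithmetic for the first CSCO purchase in the table can be reproduced by hand from its two prices:

```python
purchase_price = 23.93  # Purchase_Price for the 2010-02-11 CSCO buy
sell_price = 25.88      # Sell_Price on 2010-03-12

# percentage gain relative to the purchase price
gain_pct = 100.0 * (sell_price / purchase_price - 1.0)
print(round(gain_pct, 6))  # 8.148767
```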


In [63]:
getPurchases('MSFT',loss_thresh=0.05)

,Ticker,Purchase_Date,Purchase_Price,Sell_Date,Sell_Price,Sell_Reason,Gain_Pct
0,MSFT,2010-02-23,28.330,2010-03-23,29.8800,MaxHold,5.471232
1,MSFT,2010-04-13,30.450,2010-05-05,29.8500,SellSig,-1.970443
2,MSFT,2010-06-21,25.950,2010-06-25,24.5325,LossThresh,-5.462428
3,MSFT,2010-07-15,25.510,2010-08-11,24.8600,SellSig,-2.548020
4,MSFT,2010-09-14,25.030,2010-10-01,24.3800,SellSig,-2.596884
5,MSFT,2010-10-14,25.230,2010-11-11,26.6800,MaxHold,5.747126
6,MSFT,2010-12-07,26.870,2011-01-05,28.0000,MaxHold,4.205434
7,MSFT,2011-03-31,25.390,2011-04-20,25.7600,SellSig,1.457267
8,MSFT,2011-04-29,25.920,2011-05-12,25.3200,SellSig,-2.314815
9,MSFT,2011-06-23,24.630,2011-07-22,27.5300,MaxHold,11.774259
