# TL-CNN: Tea Leaf - CNN
**Thesis:** Use a CNN to read and predict time series represented as images.

**To Test**
- switch back to the HKEX data to check whether this is a code problem or something else


**Tried:** 
+ RP-Price/FFD/Volume: 0 recall
+ RP-Returns/FFD/Volume: 0 recall
+ fixed fundamental error on 31st May (dataframes not syncing properly)
+ fixed another fundamental error on 1st June (series were being clipped due to improper range)

## 0. Loading Libraries and Initialising Parameters

In [1]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import numpy as np
import math
import pandas as pd
pd.options.mode.chained_assignment = None #removes warning https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
import matplotlib.pyplot as plt
import random
import string
import os
import matplotlib
import glob #for wildcard combining of files
from datetime import datetime
from dateutil.relativedelta import relativedelta

from pyts.image import RecurrencePlot
from pyts.image import GramianAngularField 
from pyts.image import MarkovTransitionField

from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import coint

from sklearn import preprocessing #to normalize data


In [3]:
import quandl
#define API key
quandl.ApiConfig.api_key = "YOURQUANDLAPI"

In [4]:
# Time Series Transformation initialization with default parameters
rp = RecurrencePlot()
mtf = MarkovTransitionField()
gaf = GramianAngularField()
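
To make the Gramian angular field less of a black box, here is a minimal NumPy sketch of the summation variant: rescale the series to [-1, 1], map each value to an angle with arccos, and take the cosine of every pairwise angle sum. The helper name `gasf` and the toy series are assumptions for illustration; this is not the `pyts` implementation, and whether it matches the `pyts` defaults exactly is not checked here.

```python
import numpy as np

def gasf(series):
    """Gramian angular summation field of a 1-D series (illustrative helper)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    # Rescale to [-1, 1] so that arccos is defined (assumes hi > lo)
    x_scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    x_scaled = np.clip(x_scaled, -1.0, 1.0)  # guard against float round-off
    phi = np.arccos(x_scaled)
    # Entry (i, j) is cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

toy = np.array([1.0, 2.0, 4.0, 3.0])
img = gasf(toy)
print(img.shape)  # a series of length n becomes an n x n image
```

The image is symmetric, and its diagonal carries the rescaled series itself (as `2 * x**2 - 1`), which is why a CNN can in principle read the original values back from it.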

## 1. Get Price, Volume and derivatives from WSE

In [5]:
start_date='31-Dec-1991'
end_date='31-Dec-2018'

In [6]:
#How to get all tickers: https://help.quandl.com/article/92-how-do-i-download-the-quandl-codes-of-all-the-timeseries-in-a-given-data-feed
df_idx = pd.read_csv('WSE_metadata.csv', encoding = "ISO-8859-1")
df_idx['ticker'] = 'WSE/' + df_idx.code #add actual quandl ticker
idx = df_idx.ticker.tolist()
df_idx.head(3)

| | code | name | description | refreshed_at | from_date | to_date | ticker |
|---|---|---|---|---|---|---|---|
| 0 | 01NFI | Stock Prices for 01NFI, ISIN PLNFI0100011, Cur... | Stock prices from Warsaw Stock Exchange (GPW)<... | 12/9/16 19:32 | 12/6/97 | 6/2/09 | WSE/01NFI |
| 1 | 04PRO | Stock Prices for 04PRO, ISIN PLNFI0400015, Cur... | Stock prices from Warsaw Stock Exchange (GPW)<... | 12/9/16 19:32 | 12/6/97 | 6/12/07 | WSE/04PRO |
| 2 | 05VICT | Stock Prices for 05VICT, ISIN PLNFI0500012, Cu... | Stock prices from Warsaw Stock Exchange (GPW)<... | 12/9/16 19:32 | 12/6/97 | 6/12/07 | WSE/05VICT |


In [7]:
#1: Open, 2: High, 3: Low, 4: Close, 5: %Change, 6: Volume 7: # no of trades, 8: Turnover
#df_vol = quandl.get(idx, column_index=6, start_date=start_date, end_date=end_date)
#df_vol.to_csv(start_date + '_' + end_date + '_WSEvol.csv',index_label='Date')
#df_vol.head(2)

In [8]:
df_vol = pd.read_csv(start_date + '_' + end_date + '_WSEvol.csv', index_col='Date')
df_vol.index = pd.to_datetime(df_vol.index)
print(df_vol.shape)
df_vol.head(1)

(6444, 880)


| Date | WSE/01NFI - Volume | WSE/04PRO - Volume | WSE/05VICT - Volume | WSE/06MAGNA - Volume | WSE/08OCTAVA - Volume | WSE/10FOKSAL - Volume | WSE/11BIT - Volume | WSE/11_NFI - Volume | WSE/13FORTUNA - Volume | WSE/14ZACH - Volume | ... | WSE/ZETKAMA - Volume | WSE/ZEW - Volume | WSE/ZNTKLAPY - Volume | WSE/ZPUE - Volume | WSE/ZREMB - Volume | WSE/ZREW - Volume | WSE/ZTSERG - Volume | WSE/ZUE - Volume | WSE/ZURAWIE - Volume | WSE/ZYWIEC - Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1991-12-31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13129.0 |


In [9]:
#1: Open, 2: High, 3: Low, 4: Close, 5: %Change, 6: Volume 7: # no of trades, 8: Turnover
#df_px = quandl.get(idx, column_index=4, start_date=start_date, end_date=end_date)
#df_px.to_csv(start_date + '_' + end_date + '_WSEpx.csv',index_label='Date')

In [10]:
df_px = pd.read_csv(start_date + '_' + end_date + '_WSEpx.csv', index_col='Date')
df_px.index = pd.to_datetime(df_px.index)
print(df_px.shape)
df_px.head(1)

(6444, 880)


| Date | WSE/01NFI - Close | WSE/04PRO - Close | WSE/05VICT - Close | WSE/06MAGNA - Close | WSE/08OCTAVA - Close | WSE/10FOKSAL - Close | WSE/11BIT - Close | WSE/11_NFI - Close | WSE/13FORTUNA - Close | WSE/14ZACH - Close | ... | WSE/ZETKAMA - Close | WSE/ZEW - Close | WSE/ZNTKLAPY - Close | WSE/ZPUE - Close | WSE/ZREMB - Close | WSE/ZREW - Close | WSE/ZTSERG - Close | WSE/ZUE - Close | WSE/ZURAWIE - Close | WSE/ZYWIEC - Close |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1991-12-31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 14.0 |
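
The column labels above combine the Quandl code with the field name (for example `WSE/ZYWIEC - Close`). A small sketch of two ways to recover the code from such a label: slicing a fixed number of characters, which is only exact when the code has one particular length, and splitting on the `' - '` separator. The three labels are taken from the header above; the variable names are assumptions for illustration.

```python
labels = ["WSE/ZYWIEC - Close", "WSE/01NFI - Close", "WSE/13FORTUNA - Close"]

# Fixed-width slice: exact only when 'WSE/' + code is 10 characters long
truncated = [x[:10] for x in labels]

# Split on the separator: independent of the code length
split = [x.split(" - ")[0] for x in labels]

for raw, a, b in zip(labels, truncated, split):
    print(repr(raw), repr(a), repr(b))
```

With the slice, `WSE/01NFI` keeps a trailing space and `WSE/13FORTUNA` is cut to `WSE/13FORT`; the split returns the full code in all three cases.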


In [11]:
#1: Open, 2: High, 3: Low, 4: Close, 5: %Change, 6: Volume 7: # no of trades, 8: Turnover
#df_open = quandl.get(idx, column_index=1, start_date=start_date, end_date=end_date)
#df_open.to_csv(start_date + '_' + end_date + '_WSEopen.csv',index_label='Date')
#df_open.head(2)

In [12]:
df_open = pd.read_csv(start_date + '_' + end_date + '_WSEopen.csv', index_col='Date')
df_open.index = pd.to_datetime(df_open.index)
print(df_open.shape)
df_open.head(1)

(6444, 880)


| Date | WSE/01NFI - Open | WSE/04PRO - Open | WSE/05VICT - Open | WSE/06MAGNA - Open | WSE/08OCTAVA - Open | WSE/10FOKSAL - Open | WSE/11BIT - Open | WSE/11_NFI - Open | WSE/13FORTUNA - Open | WSE/14ZACH - Open | ... | WSE/ZETKAMA - Open | WSE/ZEW - Open | WSE/ZNTKLAPY - Open | WSE/ZPUE - Open | WSE/ZREMB - Open | WSE/ZREW - Open | WSE/ZTSERG - Open | WSE/ZUE - Open | WSE/ZURAWIE - Open | WSE/ZYWIEC - Open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1991-12-31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 |


In [13]:
#From Advances in Financial Machine Learning by Marcos López de Prado
def getWeights_FFD(d,thres):
    w,k=[1.],1
    while True:
        w_=-w[-1]/k*(d-k+1)
        if abs(w_)<thres:break
        w.append(w_);k+=1
    return np.array(w[::-1]).reshape(-1,1)

#-------------------------------------------------------------------------------------------------------------------
def fracDiff_FFD(df_px,d,thres=1e-5):
    # Constant width window
    w = getWeights_FFD(d,thres)
    width = len(w)-1
    df={}
    for stock in df_px.columns:
        seriesF, df_=df_px[[stock]].ffill().dropna(), pd.Series(dtype=float) #ffill() replaces the deprecated fillna(method='ffill')
        for iloc1 in range(width,seriesF.shape[0]):
            loc0, loc1 = seriesF.index[iloc1-width], seriesF.index[iloc1]
            df_[loc1]=np.dot(w.T, seriesF.loc[loc0:loc1])[0,0]
        df[stock]=df_.copy(deep=True)
    df=pd.concat(df,axis=1)
    return df
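
To see what the weight recursion produces, here is a small self-contained sketch that recomputes the weights for `d=0.3` and applies them to a toy series with a fixed-width window. The helper `ffd_weights` mirrors the recursion in `getWeights_FFD` but returns the weights most-recent-first; the name, the toy `prices` array and the list-comprehension window are assumptions for illustration, not the notebook's code.

```python
import numpy as np

def ffd_weights(d, thres):
    """Weights w_k = -w_{k-1} * (d - k + 1) / k, cut off once |w_k| < thres."""
    w, k = [1.0], 1
    while True:
        w_next = -w[-1] / k * (d - k + 1)
        if abs(w_next) < thres:
            break
        w.append(w_next)
        k += 1
    return np.array(w)  # index 0 applies to the most recent observation

w = ffd_weights(d=0.3, thres=1e-3)
print(w[:4])   # first weights: 1, -0.3, -0.105, -0.0595
print(len(w))  # window length implied by the threshold

# With d = 1 the recursion collapses to an ordinary first difference
w1 = ffd_weights(d=1.0, thres=1e-3)
print(w1)

# Apply the d = 0.3 weights to a toy series using a fixed-width window
prices = np.linspace(10.0, 12.0, 200)
width = len(w)
ffd = np.array([
    np.dot(w, prices[i - width + 1:i + 1][::-1])  # newest value first
    for i in range(width - 1, len(prices))
])
print(len(prices), len(ffd))
```

The output is shorter than the input by `len(w) - 1` observations, which is why `gen_img` later trims every table to a common length.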

In [16]:
def gen_img(start_date=start_date, end_date=end_date, d=0.3, yr_jmp=1, df_px=df_px, df_vol=df_vol, df_open=df_open):
    ''' generate images from df_px, df_vol & df_open after converting them to ffd series
    
    Parameters
    ----------
    start_date: string
        start date of period
    
    end_date: string
        end date of period
        
    d: float
        order of fractional differentiation
    
    yr_jmp: integer
        length, in years, of each window that is turned into images
    
    df_px: dataframe
        contains price data for all stocks; columns are stock names, rows are timestamps
        
    df_vol: dataframe
        contains volume data for all stocks; columns are stock names, rows are timestamps

    df_open: dataframe
        contains open data for all stocks; columns are stock names, rows are timestamps
    
    Outputs
    -------
    Creates images and saves them in "/Users/jolenechua/Google Drive/fastai/cnn/cnn_input"
    
    '''
    #Convert from string to datetime
    end_date = datetime.strptime(end_date, '%d-%b-%Y')
    p_start = datetime.strptime(start_date, '%d-%b-%Y')
    
    min_max_scaler = preprocessing.MinMaxScaler()
    
    while end_date >= p_start + relativedelta(years=yr_jmp):
        p_end = p_start + relativedelta(years=yr_jmp)
        print(p_start)
        print(p_end)
        
        px = df_px[p_start:p_end]
        px.dropna(axis=1,inplace=True)
        px.dropna(axis=0,inplace=True)
        px.columns = [x[:10] for x in px.columns] # keep first 10 chars to drop the ' - Close' suffix (exact only for 6-character codes)
        
        op = df_open[p_start:p_end]
        op.dropna(axis=1,inplace=True)
        op.dropna(axis=0,inplace=True)
        op.columns = [x[:10] for x in op.columns] # keep first 10 chars to drop the ' - Open' suffix (exact only for 6-character codes)
        
        vol = df_vol[p_start:p_end]
        vol.dropna(axis=1,inplace=True)
        vol.dropna(axis=0,inplace=True)
        vol.columns = [x[:10] for x in vol.columns] # keep first 10 chars to drop the ' - Volume' suffix (exact only for 6-character codes)
        
        ret = px.pct_change()
        ret.fillna(0,inplace=True)
        ret.replace(np.inf, 0, inplace=True)
        
        val_px = fracDiff_FFD(px, d, thres=1e-3)
        val_vol = fracDiff_FFD(vol, d, thres=1e-3)
        val_op = fracDiff_FFD(op, d, thres=1e-3)
        
        #making sure columns of each df are common
        com_cols = list(set(px.columns).intersection(vol.columns, op.columns)) #include op, which is also indexed by com_cols below
        px = px[com_cols].T
        vol = vol[com_cols].T
        op = op[com_cols].T
        ffd_p = val_px[com_cols].T
        ffd_v = val_vol[com_cols].T
        ffd_o = val_op[com_cols].T
        ret = ret[com_cols].T
        
        #differentiated series is shorter, making all tables have same length
        min_len = min(ffd_p.shape[1], ffd_v.shape[1], ffd_o.shape[1])
        px  = px.iloc[:,-min_len:]
        ret = ret.iloc[:,-min_len:]
        vol = vol.iloc[:,-min_len:]
        ffd_p = ffd_p.iloc[:,-min_len:]
        ffd_v = ffd_v.iloc[:,-min_len:]
        ffd_o = ffd_o.iloc[:,-min_len:]
                
        #create Gramian angular fields (gaf), one per input series; the rp* names are kept as-is
        rp1 = gaf.fit_transform(ffd_o)
        rp2 = gaf.fit_transform(ffd_p)
        rp3 = gaf.fit_transform(ffd_v)
        
        %cd "/Users/jolenechua/Google Drive/fastai/cnn/cnn_input"
        start = p_start.strftime("%Y")
        end = p_end.strftime("%Y")
        for i in range(rp2.shape[0]):
            #values are not normalised after rp, which affects how it looks when converted to an image
            #https://stackoverflow.com/questions/26414913/normalize-columns-of-pandas-data-frame/48651066
            rp1[i] = min_max_scaler.fit_transform(rp1[i])
            rp2[i] = min_max_scaler.fit_transform(rp2[i])
            rp3[i] = min_max_scaler.fit_transform(rp3[i])
            arr = np.dstack((rp1[i], rp2[i], rp3[i]))
            
            matplotlib.image.imsave(start + '_' + end + '_' + px.index[i].replace('/','_') + '.png', arr)

        p_start = p_end + relativedelta(days=1)
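
The inner loop of `gen_img` stacks three single-channel images into one RGB array before saving. A minimal NumPy sketch of that step follows. Note the swapped-in choice: it rescales each channel with its global minimum and maximum, whereas `MinMaxScaler.fit_transform` on a 2-D array rescales column by column. The helper `to_unit_range` and the random channels are assumptions for illustration.

```python
import numpy as np

def to_unit_range(img):
    """Rescale a 2-D array to [0, 1] with its global min and max (hypothetical helper)."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: avoid dividing by zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

rng = np.random.default_rng(0)
# Three stand-in channels with values in [-1, 1]
channels = [rng.uniform(-1.0, 1.0, size=(5, 5)) for _ in range(3)]

# Stack along a new last axis to get (height, width, 3); clip as a round-off guard
rgb = np.clip(np.dstack([to_unit_range(c) for c in channels]), 0.0, 1.0)
print(rgb.shape)  # (5, 5, 3)
```

Global scaling keeps the relative intensities within a channel intact; per-column scaling stretches every column to the full range independently, which changes how the image looks.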

In [17]:
start_date='31-Dec-2013'
end_date='31-Dec-2018'

In [18]:
gen_img(start_date=start_date, end_date=end_date, d=0.3, yr_jmp=1, df_px=df_px, df_vol=df_vol, df_open=df_open)

2013-12-31 00:00:00
2014-12-31 00:00:00


Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).


/Users/jolenechua/Google Drive/fastai/cnn/cnn_input



2015-01-01 00:00:00
2016-01-01 00:00:00



2016-01-02 00:00:00
2017-01-02 00:00:00



2017-01-03 00:00:00
2018-01-03 00:00:00
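
The window dates printed by `gen_img` follow from its loop: each window spans `yr_jmp` years, and the next window starts one day after the previous one ends. The sketch below reproduces that stepping with the standard library only. `add_years` is a hypothetical stand-in for `relativedelta(years=...)`, and `yearly_windows` is an illustrative name, not a function from the notebook.

```python
from datetime import datetime, timedelta

def add_years(d, years):
    """Stand-in for `d + relativedelta(years=years)` (hypothetical helper)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 Feb mapped into a non-leap year
        return d.replace(year=d.year + years, day=28)

def yearly_windows(start, end, yr_jmp=1):
    """Return (p_start, p_end) pairs using the same stepping as the loop in gen_img."""
    windows = []
    p_start = start
    while end >= add_years(p_start, yr_jmp):
        p_end = add_years(p_start, yr_jmp)
        windows.append((p_start, p_end))
        p_start = p_end + timedelta(days=1)
    return windows

windows = yearly_windows(datetime(2013, 12, 31), datetime(2018, 12, 31))
for p_start, p_end in windows:
    print(p_start.date(), p_end.date())
```

Because every window starts a day later than the last one ended, the start dates drift forward, and for this date range the loop yields four windows, the last ending on 2018-01-03. The rest of 2018 does not fill a full window and produces no images.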



## 2. Getting next day returns
Label each time series by its next-day return: 1 if the return is > 0, otherwise 0.

In [49]:
def gen_labels(start_date=start_date, end_date=end_date, yr_jmp=1, df_px=df_px, df_vol=df_vol):
    ''' Generate next-day return labels (1 if > 0, else 0) for the images built from
    df_px and df_vol, one window of yr_jmp years at a time, and save them to WSE_labels.csv.
    '''
    end_date = datetime.strptime(end_date, '%d-%b-%Y')
    p_start = datetime.strptime(start_date, '%d-%b-%Y')
    
    df = pd.DataFrame()
    
    while end_date >= p_start + relativedelta(years=yr_jmp):
        p_end = p_start +  relativedelta(years=yr_jmp)
        p_end_nxt = p_end +  relativedelta(days=1)
        print(p_start)
        print(p_end)
        # dropna returns a new frame, so df_px and df_vol are left untouched
        px = df_px[p_start:p_end].dropna(axis=1).dropna(axis=0)
        px.columns = [x[:10] for x in px.columns] # keep the first 10 characters so names match the volume columns
        vol = df_vol[p_start:p_end].dropna(axis=1).dropna(axis=0)
        vol.columns = [x[:10] for x in vol.columns] # same trimming as for the price columns
        
        # tickers present in both frames, sorted so the order is reproducible
        com_cols = sorted(set(px.columns).intersection(vol.columns))
        
        label_history = quandl.get(com_cols, column_index=4, start_date=p_end, end_date=p_end_nxt)
        # the next calendar day may have no data (e.g. a weekend or holiday), so extend the window one day at a time until two rows come back
        while label_history.shape[0] != 2:
            p_end_nxt = p_end_nxt + relativedelta(days=1)
            label_history = quandl.get(com_cols, column_index=4, start_date=p_end, end_date=p_end_nxt)
        returns = label_history.pct_change()
        returns.fillna(0,inplace=True)
        df_lab = returns.transpose()
        assert label_history.shape[0] == 2 
        df_lab['label'] = df_lab.iloc[:,1].apply(lambda x: 1 if x>0 else 0)
        labels=df_lab[['label']]
        
        start = p_start.strftime("%Y")
        end = p_end.strftime("%Y")
        new_labels =  [start + '_' + end + '_' + com_cols[x].replace('/','_') + '.png' for x in range(len(com_cols))]
        labels.index = new_labels
        
        df = pd.concat([df, labels])
        
        #Add 1 day and continue loop
        p_start = p_end + relativedelta(days=1)
    
    %cd "/Users/jolenechua/Google Drive/fastai/cnn"
    df.to_csv('WSE_labels.csv',index_label='name',header=['label'])

In [50]:
gen_labels(start_date=start_date, end_date=end_date, yr_jmp=1, df_px=df_px, df_vol=df_vol)

2013-12-31 00:00:00
2014-12-31 00:00:00
2015-01-01 00:00:00
2016-01-01 00:00:00
2016-01-02 00:00:00
2017-01-02 00:00:00
2017-01-03 00:00:00
2018-01-03 00:00:00
/Users/jolenechua/Google Drive/fastai/cnn
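
The labelling rule can be checked offline on a toy frame, without calling Quandl. The sketch below is only an illustration: `label_from_closes` and the tickers `WSE_AAA`, `WSE_BBB` and `WSE_CCC` are made-up names, and the two rows stand in for the two closing prices that `gen_labels` downloads for each window.

```python
import pandas as pd

def label_from_closes(closes: pd.DataFrame) -> pd.Series:
    """Return 1 where the second row is above the first row, else 0.

    `closes` is a hypothetical two-row frame: one column per ticker,
    rows are two consecutive trading days.
    """
    if closes.shape[0] != 2:
        raise ValueError("expected exactly two rows of closing prices")
    # pct_change leaves NaN in the first row; only the second row is a return
    next_day_return = closes.pct_change().iloc[1]
    # a missing or zero return falls into class 0, as in gen_labels
    return (next_day_return.fillna(0) > 0).astype(int).rename("label")

# Toy data: one ticker rises, one falls, one is unchanged
closes = pd.DataFrame(
    {"WSE_AAA": [10.0, 10.5], "WSE_BBB": [20.0, 19.0], "WSE_CCC": [5.0, 5.0]},
    index=pd.to_datetime(["2014-12-31", "2015-01-02"]),
)
labels = label_from_closes(closes)
print(labels)
```

Note that an unchanged price lands in class 0, so the two classes are "up" and "not up" rather than "up" and "down".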


## 3. (Hacking) Aligning labels with images
I probably messed up somewhere, but the number of images generated doesn't exactly match the number of labels. Hacking it for now.

In [19]:
%cd "/Users/jolenechua/Google Drive/fastai/cnn"
df = pd.read_csv('WSE_labels.csv', index_col='name')

/Users/jolenechua/Google Drive/fastai/cnn


In [20]:
%cd "/Users/jolenechua/Google Drive/fastai/cnn/cnn_input"
mylist = [f for f in glob.glob("*.png")]

/Users/jolenechua/Google Drive/fastai/cnn/cnn_input


In [21]:
# Keep only the labels that have a matching image. reindex() tolerates file
# names with no label (they become NaN rows, which are dropped right after).
final = df.reindex(mylist)
final.dropna(inplace=True)
final['label']= final['label'].astype(int)
%cd "/Users/jolenechua/Google Drive/fastai/cnn"
final.to_csv('WSE_labels.csv',index_label='name')


/Users/jolenechua/Google Drive/fastai/cnn
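
To see where the mismatch comes from rather than just dropping rows, the two sets of names can be compared directly. This is a minimal sketch under assumptions: `compare_names` is a hypothetical helper, and the toy lists stand in for the output of `glob.glob("*.png")` and the index of `WSE_labels.csv`.

```python
import pandas as pd

def compare_names(image_names, label_index):
    """Split names into matched, images without a label, labels without an image."""
    images, labels = set(image_names), set(label_index)
    return {
        "matched": sorted(images & labels),
        "image_only": sorted(images - labels),
        "label_only": sorted(labels - images),
    }

# Toy stand-ins for the real file list and label index
image_names = ["2014_2015_WSE_AAA.png", "2014_2015_WSE_BBB.png"]
label_df = pd.DataFrame(
    {"label": [1, 0]},
    index=["2014_2015_WSE_AAA.png", "2015_2016_WSE_AAA.png"],
)
report = compare_names(image_names, label_df.index)
for key, names in report.items():
    print(key, len(names), names)
```

If the `image_only` or `label_only` names cluster around one window or one ticker, that points at the step where the two pipelines diverge.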


## 4. Further development

1. Troubleshoot section 3: find out why the number of images and the number of labels differ
1. Tweak parameters of RP, GAF, MTF and test
1. Try other 