In [9]:
from src.seanmod import *
from vega_datasets import data
alt.renderers.enable('default') #otherwise, the chart will not render in the browser.
from math import fabs
import math

RendererRegistry.enable('default')

For this simple notebook, we investigate simple candlestick patterns to see if investor advice holds true. This investigation could easily become open-ended, and it risks objectifying subjective claims that were never meant to be taken strictly. Because of this, only well-defined, simple claims will be tested. It is also useful to list the confounding variables at this point:

1) Time Window for Bar Generation: Depending on the window chosen to generate a bar, bars can look quite different as you zoom in and out of the dataset. We will describe this as a pseudo-anti-fractal nature for now.

2) Time Lengths of Signal Influence: It is assumed that believers of candlechart signals think that groupings of bars have meaning, and portend future events. A reasonable believer would probably not think that the signal correlates with its predicted event 100 percent of the time.

3) Market Whales and Insider Trading Groups: These can throw off the signals, and we can't control for them unless an extreme swing event is witnessed.
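
To make the first confounder concrete, here is a minimal sketch of how the same daily data can be re-binned into coarser bars. The helper name `resample_ohlc` and the sample prices are made up for illustration; only the lowercase column names follow the convention used later in this notebook.

```python
import pandas as pd

# Hypothetical daily bars (two business weeks), not taken from any of the datasets.
daily = pd.DataFrame({
    "date": pd.date_range("2020-01-06", periods=10, freq="B"),
    "open":  [10.0, 10.2, 10.1, 10.4, 10.3, 10.5, 10.6, 10.4, 10.7, 10.8],
    "high":  [10.3, 10.4, 10.5, 10.6, 10.6, 10.8, 10.7, 10.8, 10.9, 11.0],
    "low":   [9.9, 10.0, 10.0, 10.2, 10.1, 10.4, 10.3, 10.3, 10.6, 10.7],
    "close": [10.2, 10.1, 10.4, 10.3, 10.5, 10.6, 10.4, 10.7, 10.8, 10.9],
})

def resample_ohlc(df, rule):
    """Merge bars into a coarser time window given by a pandas offset alias."""
    # A merged bar opens at the first open, closes at the last close,
    # and spans the highest high and the lowest low of its window.
    agg = {"open": "first", "high": "max", "low": "min", "close": "last"}
    return df.set_index("date").resample(rule).agg(agg).dropna().reset_index()

weekly = resample_ohlc(daily, "W")  # ten daily bars collapse into two weekly bars
```

A bar that looks like a Doji at the daily scale can disappear entirely once it is merged into a weekly bar, which is why the window has to be fixed before any pattern is tested.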

Definition of a Reliable Signal: It has to work **more than 50% of the time.**
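
One possible way to turn this definition into a number is sketched below. It assumes a signal "works" when the close has moved in the predicted direction a fixed number of bars later; the function name `hit_rate`, the `horizon` parameter and the toy prices are illustrative assumptions, not something defined elsewhere in this notebook.

```python
import pandas as pd

def hit_rate(close, signal_idx, horizon=5, direction=1):
    """Fraction of signals followed by a move in the predicted direction.

    direction=1 means the signal predicts a rise, direction=-1 a fall.
    """
    hits = 0
    total = 0
    for i in signal_idx:
        j = i + horizon
        if j >= len(close):
            continue  # not enough future data to judge this signal
        total += 1
        change = close.iloc[j] - close.iloc[i]
        if change * direction > 0:
            hits += 1
    return hits / total if total else float("nan")

# Toy closing prices and hand-picked signal positions.
close = pd.Series([10, 11, 12, 11, 10, 9, 10, 11])
rate = hit_rate(close, signal_idx=[0, 2, 4, 7], horizon=2)
is_reliable = rate > 0.5  # the reliability threshold stated above
```

The choice of `horizon` is exactly the second confounder listed above, so any reported hit rate only makes sense together with the horizon used.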

First, let us import our datasets, and take a look at them:
    

In [2]:
neoDF = pd.read_csv("./data/NEO.TO.csv")

In [3]:
neoDF.head(5)
neoDF.tail(5)
theChart = boxplotblast(neoDF)
theChart



Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2017-12-08,17.469999,17.6,16.99,17.6,15.745424,500200
1,2017-12-11,17.85,17.85,17.0,17.700001,15.83489,155000
2,2017-12-12,17.77,17.9,17.5,17.75,15.879621,67100
3,2017-12-13,17.75,17.799999,17.700001,17.700001,15.83489,55300
4,2017-12-14,17.700001,17.709999,17.15,17.6,15.745424,40300


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
707,2020-10-05,10.8,10.825,10.78,10.79,10.79,10400
708,2020-10-06,10.99,10.99,10.8,10.89,10.89,39500
709,2020-10-07,10.99,11.0,10.73,10.85,10.85,13600
710,2020-10-08,10.86,10.935,10.81,10.89,10.89,8400
711,2020-10-09,10.59,10.85,10.59,10.85,10.85,3200


Guide: Blue lines indicate max/min value. Red Lines indicate cutoff of outliers.


Boxplot blast did not work. Why? OK, so it is not implemented. Can we even get Altair to work? Let's do a sample plot:

In [4]:
alt.renderers.enable('default')
cars = pd.read_csv("./data/mtcars.csv")
cars.head(5)

chart = alt.Chart(cars).mark_point().encode(
    x='hp:Q',
    y='mpg:Q',
    color='cyl:N',
)

chart

RendererRegistry.enable('default')

Unnamed: 0,model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2


So Vega and Altair are working correctly. Good.

In [5]:
#Load all the DFs:
neoDF = pd.read_csv("./data/NEO.TO.csv",usecols=[0,1,2,3,4]) #Rare earth metals
neoDF.rename(columns={'Date': 'date',"Open":"open","High":"high",'Low':'low',"Close":"close"},inplace=True)
amznDF = pd.read_csv("./data/AMZN.csv",usecols=[0,1,2,3,4]) #Amazon
amznDF.rename(columns={'Date': 'date',"Open":"open","High":"high",'Low':'low',"Close":"close"},inplace=True)
aspsDF = pd.read_csv("./data/ASPS.csv",usecols=[0,1,2,3,4]) #Foreclosure and Property company
aspsDF.rename(columns={'Date': 'date',"Open":"open","High":"high",'Low':'low',"Close":"close"},inplace=True)
klDF = pd.read_csv("./data/KL.TO.csv",usecols=[0,1,2,3,4]) #Kirkland Lake Gold
klDF.rename(columns={'Date': 'date',"Open":"open","High":"high",'Low':'low',"Close":"close"},inplace=True)
neoDF.head()

Unnamed: 0,date,open,high,low,close
0,2017-12-08,17.469999,17.6,16.99,17.6
1,2017-12-11,17.85,17.85,17.0,17.700001
2,2017-12-12,17.77,17.9,17.5,17.75
3,2017-12-13,17.75,17.799999,17.700001,17.700001
4,2017-12-14,17.700001,17.709999,17.15,17.6


In [6]:
#Let's make an example of a bar chart, using a subset of data.
#We have boiled down the DF to the bare minimum. Now we can make a function with this code, and start exploring
#our dataframes.

source = data.ohlc() #open high low close columns are subsetted.
source.drop(['signal','ret'],axis = 1, inplace=True)
source.head()
open_close_color = alt.condition("datum.open <= datum.close",
                                 alt.value("#FF8C00"),
                                 alt.value("#6666FF"))

source.dtypes

blueColor = "#99cfff"
backgroundColor = '#000022'

base = alt.Chart(source).encode(
    alt.X('date:T',
          axis=alt.Axis(
              format='%m/%d',
              labelAngle=-45,
              title='Timeline',
              gridColor=blueColor,
              labelColor=blueColor,
              tickColor=blueColor,
              titleColor=blueColor
          )
    ),
    color=open_close_color
)

rule = base.mark_rule().encode(
    alt.Y(
        'low:Q',
        axis=alt.Axis(
              title='Price',
              gridColor=blueColor,
              labelColor=blueColor,
              tickColor=blueColor,
              titleColor=blueColor
          ),
              scale=alt.Scale(zero=False),
        
    ),
    alt.Y2('high:Q')
)

bar = base.mark_bar().encode(
    alt.Y('open:Q'),
    alt.Y2('close:Q')
)

(rule + bar).configure(background=backgroundColor)

Unnamed: 0,date,open,high,low,close
0,2009-06-01,28.7,30.05,28.45,30.04
1,2009-06-02,30.04,30.13,28.3,29.63
2,2009-06-03,29.62,31.79,29.62,31.02
3,2009-06-04,31.02,31.02,29.92,30.18
4,2009-06-05,29.39,30.81,28.85,29.62


date     datetime64[ns]
open            float64
high            float64
low             float64
close           float64
dtype: object

In [10]:
#Generating our general candlestick chart function
#
#Signature: [Formatted] DF -> [Altair Chart]
#(tCenter, tWindow and daySumWidth are planned arguments that are not implemented yet.)
#Purpose: Given chart data, an Altair candlestick chart is generated for inspection.
#Notes: This function expects columns ordered as ['date','open','high','low','close'].
#Note: I use object dates...and somehow they get interpreted by Altair. Magic is in there somewhere, but this
#isn't the time to investigate.
#
def candlechartview(tickerDF): # tCenter, tWindow, daySumWidth):
    #PreDefinitions:
    blueColor = "#99cfff"
    backgroundColor = '#000022'
    midpoint = math.floor(tickerDF.shape[0]/2)

    #lets check the data integrity, and format accordingly:
    #tests:
    #DF not of zero length or columns.
    if (tickerDF.shape[0] < 1) or (tickerDF.shape[1] < 1):
        raise ValueError("ERROR: One or more Data Frame Dimensions is zero.")
    
    #first column is a valid date option...I don't check this (just needs to be a str obj)
    
    #last four columns (open, high, low, close) are real numbers.
    #.iloc is used because positional access via dtypes[1] is deprecated in recent pandas.
    if not all(tickerDF.dtypes.iloc[i] == 'float64' for i in range(1, 5)):
        raise ValueError("ERROR: Columns 1 to 4 are not of float64 type")
        
    #dates are in monotonic sequence (increasing)...check later. For now we know they are as data is clean.
    
    #Tests have passed. Now let's generate a temp dataframe, where we consolidate based
    #on the time window (not implemented yet).
    
    
    #we are now ready to generate our plot, and output. Code credits to the Altair Team:
    #https://altair-viz.github.io/gallery/candlestick_chart.html?highlight=candlestick
    
    source = tickerDF
    base = alt.Chart(source).encode(
        alt.X('date:T',
              axis=alt.Axis(
                  format='%m/%d',
                  labelAngle=-45,
                  title='Timeline',
                  gridColor=blueColor,
                  labelColor=blueColor,
                  tickColor=blueColor,
                  titleColor=blueColor
              )
        ),
        color=open_close_color
    ).properties(height=400, width=400)
        
    rule = base.mark_rule().encode(
        alt.Y(
            'low:Q',
            axis=alt.Axis(
                  title='Price',
                  gridColor=blueColor,
                  labelColor=blueColor,
                  tickColor=blueColor,
                  titleColor=blueColor
              ),
                  scale=alt.Scale(zero=False),

        ),
        alt.Y2('high:Q')
    )

    bar = base.mark_bar(width=10).encode(
        alt.Y('open:Q'),
        alt.Y2('close:Q')
    )
    
    #make a dataframe on the spot
    #redLine = alt.Chart(pd.DataFrame({'x': [source.at[midpoint,'date']]})).mark_rule().encode(
     #   x='x')
  
    
    return (rule + bar).configure(background=backgroundColor)
    
    
candlechartview(aspsDF.head(1000))
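
The monotonic-date check that the function postpones could look roughly like the sketch below. The helper name `dates_are_increasing` is hypothetical and it assumes the `date` column can be parsed by `pd.to_datetime`.

```python
import pandas as pd

def dates_are_increasing(df, column="date"):
    """True if the date column is strictly increasing (sorted, no duplicates)."""
    parsed = pd.to_datetime(df[column])
    # is_monotonic_increasing allows repeated values, so uniqueness is checked separately.
    return bool(parsed.is_monotonic_increasing and parsed.is_unique)

# Toy frames with string dates, like the ones read from the CSV files.
ordered = pd.DataFrame({"date": ["2017-12-08", "2017-12-11", "2017-12-12"]})
shuffled = pd.DataFrame({"date": ["2017-12-11", "2017-12-08", "2017-12-12"]})

ordered_ok = dates_are_increasing(ordered)
shuffled_ok = dates_are_increasing(shuffled)
```

Inside `candlechartview`, a failed check could raise a `ValueError` in the same way as the other integrity tests.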

In [None]:
neoDF.shape[0]
neoDF.head()
neoDF.dtypes[1] == 'float64'

Now it is time to start searching the data frames for events to inspect. One single candlebar of particular interest is the long-tailed Doji. It has a nearly identical open and close, and swings to high and low extremes during the day. Let us identify these Dojis with the following criterion:

Let $O, C, H, L$ represent the Open, Close, High and Low prices, respectively. Then our criterion is:

$$ \vert O - C \vert < \epsilon_{1} $$ AND

$$ \vert O - H \vert > 0.05 \times O $$ AND

$$ \vert O - L \vert > 0.05 \times O $$ AND

$$ \big\vert \, \vert O - H \vert - \vert O - L \vert \, \big\vert < \epsilon_{2} $$

Where the $\epsilon$'s are hand-selected to get the right number of examples in a dataset. Let us make a test function, and then get a truth column by applying our compound test to the dataframe in question. We will then gather up all the "True" indices from the vector, and this will tell us which rows are candidate Dojis.
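
A minimal sketch of such a test function follows, with all four conditions included. It reads the two tail conditions as "each tail exceeds a fraction of the open", which matches the description of a bar that swings to extremes. The function name `long_tailed_doji_mask`, the parameter `tail_frac` and the sample rows are assumptions made for illustration.

```python
import pandas as pd

def long_tailed_doji_mask(df, eps1, tail_frac, eps2):
    """Boolean Series marking rows that satisfy all four Doji conditions."""
    body = (df["open"] - df["close"]).abs()   # |O - C|
    upper = (df["open"] - df["high"]).abs()   # |O - H|
    lower = (df["open"] - df["low"]).abs()    # |O - L|
    return (
        (body < eps1)                          # open and close nearly equal
        & (upper > tail_frac * df["open"])     # long upper tail
        & (lower > tail_frac * df["open"])     # long lower tail
        & ((upper - lower).abs() < eps2)       # tails of similar length
    )

# Hypothetical bars: only the first row meets every condition.
bars = pd.DataFrame({
    "open":  [10.0, 10.0, 10.0, 10.0],
    "high":  [10.6, 10.1, 10.6, 10.9],
    "low":   [9.4, 9.9, 9.4, 9.4],
    "close": [10.002, 10.001, 10.5, 10.0],
})
mask = long_tailed_doji_mask(bars, eps1=0.005, tail_frac=0.05, eps2=0.05)
doji_rows = bars.index[mask]  # labels of the rows flagged as Dojis
```

Because the mask keeps the index of the input frame, the resulting labels can be used directly to look the flagged rows up in the original dataframe.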

In [11]:
#It looks like we need to use map (the Series counterpart of applymap) to get this to work. Let's do a simple test:
dojiSer = (neoDF.open - neoDF.close).map(math.fabs) 
hiSer = (neoDF.open - neoDF.high).map(math.fabs)
loSer = (neoDF.open - neoDF.low).map(math.fabs)
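ep1 = 0.005 #maximum allowed |open - close|
ep2 = 0.0005 #minimum tail length as a fraction of the open (the text above uses 0.05)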
constraintDF = neoDF[(dojiSer < ep1) & (hiSer > neoDF.open*ep2) & (loSer > neoDF.open*ep2)]

#I ignore the fourth condition for now...too many fine constraints leads to unstable parameterization.



So let's use example 187 to figure out what is going on. We will pick a window around the given data point, and assume the day binning is just one for simplicity.

Learning Point: To access individual DF cells, use: constraintDF.at[187,'date'].
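
As a small illustration of this learning point, the sketch below contrasts label-based access with `.at` and positional access with `.iloc` on a filtered frame. The toy frame and variable names are made up; the point is that a boolean filter keeps the row labels of the parent frame.

```python
import pandas as pd

# Toy frame with the default integer labels 0..3.
df = pd.DataFrame({
    "date": ["2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"],
    "close": [10.0, 10.8, 10.2, 10.9],
})

subset = df[df["close"] > 10.5]   # keeps the original labels 1 and 3

by_label = subset.at[3, "date"]     # label 3, the same label as in df
by_position = subset.iloc[1, 0]     # second row, first column of the subset
same_in_parent = df.at[3, "date"]   # the label also works on the parent frame
```

So a label read from the filtered frame can be reused on the full dataframe, while `.iloc` counts rows from zero within whichever frame it is called on.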


In [12]:
tWindow = 15
testDF = neoDF.iloc[(386-tWindow):(386+tWindow),:]
hold = candlechartview(testDF)
hold

In [13]:
#Do interactive cells for Vega work?
import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source).mark_circle().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
).interactive()