This notebook is a loose continuation of inside_bar_explore.ipynb.

In [1]:
import pandas as pd
import plotly.graph_objects as go
import utils
import datetime as dt
from dateutil.parser import parse

In [2]:
df_trades = pd.read_csv("./Data/Pairs/USD_JPY_H4_trades.csv")

In [3]:
pair = "USD_JPY"
granularity = "M5"

In [4]:
df_raw = pd.read_csv(utils.get_hist_data_filename(pair, granularity))
df_raw.shape

(223553, 15)

In [5]:
non_nums = ['ticker', 'time', 'volume']
num_cols = [x for x in df_raw.columns if x not in non_nums]
df_raw[num_cols] = df_raw[num_cols].apply(pd.to_numeric)

In [6]:
df_trades["time"] = [parse(x) for x in df_trades.time]
df_raw["time"] = [parse(x) for x in df_raw.time]

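For each trade we need the time of the next trade (via shift) and the start time of the last 5-minute candle in that next trade's 4-hour window (the "H4" granularity data is what we're digging into). That's why we add 03:55 to the start time of the next 4-hour candle to get trade_end. A trade can only start once its own 4-hour signal candle has closed, so trade_start is the signal time plus 4 hours.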
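
As a quick sanity check on the 03:55 offset, here is a small sketch using only the standard library. It lists the start times of the 5-minute candles inside one 4-hour candle; the start time is the `next` value from the first row of the output further down, and the variable names are mine, not part of the notebook.

```python
import datetime as dt

# Start of a 4-hour (H4) candle, taken from the "next" column of row 0.
h4_start = dt.datetime(2020, 1, 6, 22, 0, tzinfo=dt.timezone.utc)

# A 4-hour candle contains 48 five-minute (M5) candles; list their start times.
m5_starts = [h4_start + dt.timedelta(minutes=5 * i) for i in range(48)]

# The last M5 candle starts 3 hours 55 minutes after the H4 candle opens.
last_m5_start = m5_starts[-1]
print(last_m5_start)
```

The last start time equals `h4_start + dt.timedelta(hours=3, minutes=55)`, which is the `trade_end` value in that same row.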

In [7]:
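# Time of the following trade signal (NaT for the last row).
df_trades["next"] = df_trades["time"].shift(-1)
# Start of the last M5 candle inside the next signal's H4 candle.
df_trades["trade_end"] = df_trades["next"] + dt.timedelta(hours=3, minutes=55)
# First M5 candle after this trade's own H4 signal candle has closed.
df_trades["trade_start"] = df_trades["time"] + dt.timedelta(hours=4)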
df_trades[["time", "next", "trade_end", "trade_start"]].head()

,time,next,trade_end,trade_start
0,2020-01-05 22:00:00+00:00,2020-01-06 22:00:00+00:00,2020-01-07 01:55:00+00:00,2020-01-06 02:00:00+00:00
1,2020-01-06 22:00:00+00:00,2020-01-07 06:00:00+00:00,2020-01-07 09:55:00+00:00,2020-01-07 02:00:00+00:00
2,2020-01-07 06:00:00+00:00,2020-01-09 18:00:00+00:00,2020-01-09 21:55:00+00:00,2020-01-07 10:00:00+00:00
3,2020-01-09 18:00:00+00:00,2020-01-16 10:00:00+00:00,2020-01-16 13:55:00+00:00,2020-01-09 22:00:00+00:00
4,2020-01-16 10:00:00+00:00,2020-01-20 14:00:00+00:00,2020-01-20 17:55:00+00:00,2020-01-16 14:00:00+00:00


In [8]:
# Debug: info() prints its summary directly and returns None, so call it on its own.
df_trades.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 401 entries, 0 to 400
Data columns (total 31 columns):
 #   Column          Non-Null Count  Dtype                  
---  ------          --------------  -----                  
 0   Unnamed: 0.1    401 non-null    int64                  
 1   Unnamed: 0      401 non-null    int64                  
 2   time            401 non-null    datetime64[ns, tzutc()]
 3   volume          401 non-null    int64                  
 4   bid_o           401 non-null    float64                
 5   bid_h           401 non-null    float64                
 6   bid_l           401 non-null    float64                
 7   bid_c           401 non-null    float64                
 8   mid_o           401 non-null    float64                
 9   mid_h           401 non-null    float64                
 10  mid_l           401 non-null    float64                
 11  mid_c           401 non-null    float64                
 12  ask_o           401 non-null    floa

In [9]:
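# The last row has no "next" trade after shift(-1); dropna removes it
# (along with any other row that has a missing value).
df_trades.dropna(inplace=True)
df_trades.reset_index(drop=True, inplace=True)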

In [10]:
def signal_text(signal):
    if signal == 1:
        return "BUY"
    elif signal == -1:
        return "SELL"
    else:
        return "NONE"
    


#### triggered function: given the Buy/Sell direction, compare the current price from the finer granularity (M5) against the entry (signal) price from the higher granularity (H4).

Simplified example:
* We have an entry price of 1.00, identified in a 4-hour candle.
* The first twelve 5-minute candles are all below 1.00.
* On candle 13 (an hour in) we get a price of 2.99.
* We buy at 2.99, but our take profit is 3.00, so this is a worthless trade.
* Alternatively, if candle 13 had a price of 1.01 and we then hit our take profit, we have done well.

In [11]:
def triggered(direction, current_price, signal_price):
    if direction == 1 and current_price > signal_price:
        return True
    elif direction == -1 and current_price < signal_price:
        return True
    else:
        return False
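
To tie this back to the simplified example, here is a rough sketch that finds the first 5-minute candle whose price triggers an entry. `first_trigger_index` is a hypothetical helper written for this illustration (it repeats the same comparison as `triggered` so the block runs on its own), and the prices are the made-up ones from the example.

```python
def first_trigger_index(direction, prices, signal_price):
    """Hypothetical helper: index of the first triggering price, or None."""
    for index, price in enumerate(prices):
        is_buy = direction == 1 and price > signal_price
        is_sell = direction == -1 and price < signal_price
        if is_buy or is_sell:
            return index
    return None


entry = 1.00

# Twelve M5 candles below the entry price, then candle 13 at 1.01.
good_case = [0.99] * 12 + [1.01]
# Same again, but candle 13 gaps all the way up to 2.99.
late_case = [0.99] * 12 + [2.99]

print(first_trigger_index(1, good_case, entry))
print(first_trigger_index(1, late_case, entry))
```

Both cases trigger on the same candle (index 12, the 13th candle). The comparison on its own does not tell the 1.01 entry apart from the 2.99 one, so how close the price already is to the take profit has to be judged separately.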

In [12]:
def end_hit_calc(direction, SL, price, start_price):
    """Return the open trade's result as a signed fraction of the entry-to-stop-loss distance."""
    if direction not in (1, -1):
        raise ValueError(f"end_hit_calc: direction must be 1 or -1, got {direction}")

    delta = price - start_price
    full_delta = start_price - SL
    fraction = abs(delta / full_delta)

    # Positive when the price has moved in the trade's favour, negative otherwise.
    if direction == 1:
        return fraction if price >= start_price else -fraction
    return fraction if price <= start_price else -fraction
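
A worked example of the fraction may help. This is just the arithmetic from `end_hit_calc` written out for one long trade. The prices are invented for illustration and do not come from the data.

```python
# Illustrative long trade; these numbers are made up.
entry, stop_loss = 109.00, 108.50
last_price = 109.25  # price when the window ends, with TP and SL both untouched

risk = entry - stop_loss      # full entry-to-stop-loss distance
move = last_price - entry     # how far the price has moved since entry
fraction = abs(move / risk)   # the move as a share of the risked distance

# For a long trade, finishing above entry is a gain; below entry is a loss.
result = fraction if last_price >= entry else -fraction
print(result)  # 0.5
```

So a trade that ends halfway between entry and a level one stop-loss distance above it scores 0.5, on the same scale where a stop-loss hit scores -1.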

In [13]:
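def process_buy(TP, SL, ask_prices, bid_prices, entry_price):
    """Score a long trade: enter on the ask, exit on the bid."""
    for index, price in enumerate(ask_prices):
        if triggered(1, price, entry_price):
            # Entry triggered: walk the remaining bid prices looking for an exit.
            for live_price in bid_prices[index:]:
                if live_price >= TP:
                    return 2.0   # take profit hit
                elif live_price <= SL:
                    return -1.0  # stop loss hit
            # Neither level was hit before the window ended: score the last price.
            return end_hit_calc(1, SL, live_price, entry_price)
    return 0.0  # entry never triggered

def process_sell(TP, SL, ask_prices, bid_prices, entry_price):
    """Score a short trade: enter on the bid, exit on the ask."""
    for index, price in enumerate(bid_prices):
        if triggered(-1, price, entry_price):
            # Entry triggered: walk the remaining ask prices looking for an exit.
            for live_price in ask_prices[index:]:
                if live_price <= TP:
                    return 2.0   # take profit hit
                elif live_price >= SL:
                    return -1.0  # stop loss hit
            # Neither level was hit before the window ended: score the last price.
            return end_hit_calc(-1, SL, live_price, entry_price)
    return 0.0  # entry never triggered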

We've used iterrows to loop through DataFrames until now, but this can be slow.  The process_m5 function presents a faster alternative.

Note: a later re-engineering pass (to factor in spread) makes this routine redundant. I've commented it out rather than deleting it, because I still think it's a useful reference.

In [14]:
# def process_m5(m5_df, row):
#     """  Process 5 minute (M5) candle data.  """
#     result = 0.0
#     for index, price in enumerate(m5_df.mid_c.values):
#         if triggered(row.SIGNAL, price, row.ENTRY) == True:
#             # print(f"   {signal_text(row.SIGNAL)} Signal at (index {index}) {price: 2f} from Entry: {row.ENTRY: 2f} ") 
#             result = process_trade(index, row.SIGNAL, row.TAKEPROFIT, row.STOPLOSS, m5_df.mid_c.values, row.ENTRY)          
#             break
    
#     return  result
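
To make the iterrows comparison concrete, here is a minimal sketch on a toy DataFrame. The `mid_c` column name mirrors the notebook, but the values and variable names are made up. It finds the first close above an entry price twice: once with iterrows, once by looping over the column's underlying array as the commented-out process_m5 does.

```python
import pandas as pd

# Toy M5 closes; values are invented for illustration.
m5_df = pd.DataFrame({"mid_c": [0.98, 0.99, 1.01, 1.02]})
entry = 1.00

# iterrows builds a Series for every row, which is where the overhead comes from.
hit_iterrows = None
for index, row in m5_df.iterrows():
    if row.mid_c > entry:
        hit_iterrows = index
        break

# Looping over the column's NumPy array avoids creating a Series per row.
hit_values = None
for index, price in enumerate(m5_df.mid_c.values):
    if price > entry:
        hit_values = index
        break

print(hit_iterrows, hit_values)
```

Both loops stop at the same candle. On a frame this small the speed difference is not noticeable; I would expect it to matter more on something the size of df_raw, though I have not timed it here.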


### Important 

Remember that at this point in the notebook:
1. df_trades is built on H4 granularity (4-hour candles)
2. df_raw is built on M5 granularity (5-minute candles)

This is really obvious while working through a tutorial, but consider putting the granularity in DataFrame names in future.

In [15]:
total = 0
for index, row in df_trades.iterrows():
    m5_data = df_raw[(df_raw.time >= row.trade_start) & (df_raw.time <= row.trade_end)]
    if row.SIGNAL == 1:
        r = process_buy(row.TAKEPROFIT, row.STOPLOSS, m5_data.ask_c.values, m5_data.bid_c.values, row.ENTRY)
        total += r
    elif row.SIGNAL == -1:
        r = process_sell(row.TAKEPROFIT, row.STOPLOSS, m5_data.ask_c.values, m5_data.bid_c.values, row.ENTRY)
        total += r            
    # if index > 10:
    #     break
print(f"Aggregated total on 2 vs -1 basis, now factoring bid/ask spread (end_hit_calc):  {total}")

Aggregated total on 2 vs -1 basis, now factoring bid/ask spread (end_hit_calc):  110.22222222222229


In [16]:
m5_data.shape

(384, 15)