# Triple Barrier Method

This notebook covers partial answers to the following exercises:
* Exercise 3.1
* Exercise 3.2
* Exercise 3.3

Explanations are given along the way.

More importantly, this method is not limited to mean-reversion strategies; it can be applied to other strategies as well. Most of the functions below can be found under research/Labels.

Contact: boyboi86@gmail.com

In [1]:
import numpy as np
import pandas as pd
from numpy.ma.core import cumsum

import research as rs
print(rs.__file__)


%matplotlib inline

p = print

# Please take note of the package versions
#numpy 1.17.3
#pandas 1.0.3
#sklearn 0.21.3

dollar = pd.read_csv('./Sample_data/dollar_bars.txt',
                 sep=',', 
                 header=0, 
                 parse_dates = True, 
                 index_col=['date_time'])


Num of CPU core:  128
Machine info:  Linux-6.8.0-51-generic-x86_64-with-glibc2.39
Python 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0]
Numpy 2.1.3
Pandas 2.2.3
/home/mmx/PycharmProjects/AFML/research.py


In [2]:
d_vol = rs.vol(dollar['close'],  50)
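The source of `rs.vol` is not shown in this notebook. As a rough illustration of the idea, here is a minimal sketch that assumes the function returns an exponentially weighted standard deviation of returns over a given span. The name `ewm_volatility` and the toy data are hypothetical, and the real function may compute returns differently (for example over a one-day lookback).

```python
import numpy as np
import pandas as pd


def ewm_volatility(close: pd.Series, span: int = 50) -> pd.Series:
    """Exponentially weighted std of bar-to-bar returns.

    Hypothetical stand-in for rs.vol; the real function may differ.
    """
    returns = close.pct_change().dropna()
    return returns.ewm(span=span).std().dropna()


# Toy random-walk prices, for illustration only (not the dollar-bar data)
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=200, freq="min")
toy_close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.001, 200))), index=idx)

toy_vol = ewm_volatility(toy_close, span=50)
print(toy_vol.tail())
```

The resulting series plays two roles in the cells that follow: its mean is used as the filter threshold, and its per-bar values set the barrier widths.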

In [3]:

events = rs.cs_filter(dollar['close'],
                    limit = d_vol.mean())

events

DatetimeIndex(['2015-01-02 07:07:35.156000', '2015-01-02 09:35:57.204000',
               '2015-01-02 12:59:42.176000', '2015-01-02 14:19:33.847000',
               '2015-01-02 14:33:39.311000', '2015-01-02 14:42:28.315000',
               '2015-01-02 14:51:59.300000', '2015-01-02 15:01:45.497000',
               '2015-01-02 15:14:31.569000', '2015-01-02 15:22:54.187000',
               ...
               '2016-12-30 20:57:19.151000', '2016-12-30 20:58:34.724000',
               '2016-12-30 20:59:16.663000', '2016-12-30 20:59:34.157000',
               '2016-12-30 20:59:50.345000', '2016-12-30 20:59:58.848000',
               '2016-12-30 21:00:00.352000', '2016-12-30 21:00:24.294000',
               '2016-12-30 21:03:03.027000', '2016-12-30 21:13:31.990000'],
              dtype='datetime64[ns]', length=22890, freq=None)
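The event timestamps above come from `rs.cs_filter`, whose source is not shown here. The sketch below assumes it behaves like a symmetric CUSUM filter: it accumulates upward and downward price changes separately and emits an event whenever either sum exceeds the limit, then resets that sum. Applying it to log-price changes is an assumption, and `cusum_events` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def cusum_events(close: pd.Series, limit: float) -> pd.DatetimeIndex:
    """Symmetric CUSUM filter on log-price changes (hypothetical stand-in for rs.cs_filter)."""
    events = []
    s_pos, s_neg = 0.0, 0.0
    diff = np.log(close).diff().dropna()
    for timestamp, change in diff.items():
        # Running sums are floored/capped at zero so they only track one direction each
        s_pos = max(0.0, s_pos + change)
        s_neg = min(0.0, s_neg + change)
        if s_neg < -limit:
            s_neg = 0.0
            events.append(timestamp)
        elif s_pos > limit:
            s_pos = 0.0
            events.append(timestamp)
    return pd.DatetimeIndex(events)


# Toy path: three up-moves of 1% followed by three down-moves of 1% (log scale)
idx = pd.date_range("2020-01-01", periods=7, freq="min")
log_path = [0.00, 0.01, 0.02, 0.03, 0.02, 0.01, 0.00]
toy_close = pd.Series(100 * np.exp(log_path), index=idx)

toy_events = cusum_events(toy_close, limit=0.015)
print(toy_events)
```

With a limit of 0.015, the toy path triggers one event on the way up (after two up-moves) and one on the way down (after two down-moves).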

In [4]:
vb = rs.vert_barrier(data = dollar['close'],
                 events = events, 
                 period = 'days', 
                 freq = 1)

vb # Show some example output

2015-01-02 07:07:35.156   2015-01-04 23:20:12.567
2015-01-02 09:35:57.204   2015-01-04 23:20:12.567
2015-01-02 12:59:42.176   2015-01-04 23:20:12.567
2015-01-02 14:19:33.847   2015-01-04 23:20:12.567
2015-01-02 14:33:39.311   2015-01-04 23:20:12.567
                                    ...          
2016-12-29 19:50:32.702   2016-12-30 19:55:31.030
2016-12-29 20:43:20.886   2016-12-30 20:44:21.481
2016-12-29 20:56:54.013   2016-12-30 20:57:19.151
2016-12-29 21:00:00.349   2016-12-30 21:00:00.352
2016-12-29 21:13:14.022   2016-12-30 21:13:31.990
Name: date_time, Length: 22850, dtype: datetime64[ns]
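A vertical barrier maps each event to the first bar at or after a fixed holding period. The sketch below assumes a `searchsorted` lookup on the price index, dropping events whose barrier would fall past the end of the data; that would be consistent with the output above having fewer rows (22850) than there are events (22890), though the actual `rs.vert_barrier` code may differ. The name `vertical_barrier` is hypothetical.

```python
import pandas as pd


def vertical_barrier(close: pd.Series, events: pd.DatetimeIndex, days: int = 1) -> pd.Series:
    """For each event, the first bar at or after event + `days` (hypothetical stand-in)."""
    pos = close.index.searchsorted(events + pd.Timedelta(days=days))
    pos = pos[pos < len(close)]  # drop events whose barrier lies past the data
    # Assumes `events` is sorted, so the dropped events are the last ones
    return pd.Series(close.index[pos], index=events[: len(pos)])


# Toy hourly bars covering three days
idx = pd.date_range("2020-01-01", periods=72, freq=pd.Timedelta(hours=1))
toy_close = pd.Series(range(72), index=idx, dtype=float)
toy_events = idx[[0, 10, 60]]

toy_t1 = vertical_barrier(toy_close, toy_events, days=1)
print(toy_t1)
```

The third toy event is dropped because one day after it is beyond the last available bar.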

In [5]:
tb = rs.tri_barrier(data = dollar['close'], 
                events = events, 
                trgt = d_vol, 
                min_req = 0.002, 
                num_threads = 3, 
                ptSl = [1,1], 
                t1 = vb, 
                side = None)

tb # Show some example

# With num_threads > 1, the pandas-object multiprocessing function splits the data into chunks,
# processes each chunk, then stitches the results back into one piece (see the intermediate output below).

# If you scroll all the way to the bottom, the last table is the final DataFrame output.



[                                             t1                      sl  \
2016-04-28 08:11:31.935 2016-04-29 10:02:20.933 2016-04-28 19:53:44.370   
2016-04-28 08:58:32.457 2016-04-29 10:02:20.933                     NaT   
2016-04-28 10:52:03.623 2016-04-29 11:47:49.541 2016-04-28 19:31:00.850   
2016-04-28 12:01:23.295 2016-04-29 12:37:34.150 2016-04-28 19:25:12.672   
2016-04-28 13:01:28.025 2016-04-29 13:28:30.173 2016-04-28 19:25:12.672   
...                                         ...                     ...   
2016-12-30 20:59:58.848                     NaT                     NaT   
2016-12-30 21:00:00.352                     NaT                     NaT   
2016-12-30 21:00:24.294                     NaT                     NaT   
2016-12-30 21:03:03.027                     NaT                     NaT   
2016-12-30 21:13:31.990                     NaT                     NaT   

                                             pt  
2016-04-28 08:11:31.935 2016-04-28 13:30:00.579 

2025-01-28 10:01:01.253272 100.0% _pt_sl_t1 done after 0.07 mins. Remaining 0.0 mins..


                                             t1      trgt
2015-01-05 14:54:26.286 2015-01-05 15:40:45.114  0.002244
2015-01-05 14:57:13.616 2015-01-05 15:40:45.114  0.002469
2015-01-05 15:01:57.494 2015-01-05 16:21:16.062  0.002787
2015-01-05 15:07:29.012 2015-01-05 15:40:45.114  0.002827
2015-01-05 15:13:09.655 2015-01-05 16:10:05.172  0.002882
...                                         ...       ...
2016-12-30 18:02:22.880 2016-12-30 19:55:31.030  0.002839
2016-12-30 18:36:03.267 2016-12-30 19:47:05.557  0.002786
2016-12-30 19:02:57.783 2016-12-30 19:55:31.030  0.002732
2016-12-30 19:55:31.030 2016-12-30 20:59:16.663  0.002775
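The intermediate output has three timestamp columns per event: `t1` (vertical barrier), `sl` (stop-loss) and `pt` (profit-taking). As an illustration of how such columns can be computed for a single event, here is a minimal sketch that assumes the horizontal barriers are placed at `ptSl` multiples of the target volatility around the event price. The helper `first_touch` is hypothetical and is not the `rs.tri_barrier` implementation.

```python
import pandas as pd


def first_touch(close: pd.Series, start, t1, trgt: float, pt_sl=(1, 1)) -> pd.Series:
    """Earliest times at which one event's price path crosses each barrier (hypothetical helper)."""
    path = close.loc[start:t1] / close.loc[start] - 1  # returns relative to the event bar
    upper = pt_sl[0] * trgt    # profit-taking level
    lower = -pt_sl[1] * trgt   # stop-loss level
    return pd.Series({
        "t1": t1,
        "sl": path[path < lower].index.min(),  # NaT if never touched
        "pt": path[path > upper].index.min(),  # NaT if never touched
    })


# Toy path: rises 1.5% first, then falls below -1%
idx = pd.date_range("2020-01-01", periods=5, freq="min")
toy_close = pd.Series([100.0, 100.5, 101.5, 98.5, 98.0], index=idx)

touches = first_touch(toy_close, start=idx[0], t1=idx[4], trgt=0.01)
print(touches)
print("first barrier touched:", touches.dropna().idxmin())
```

The earliest of the three timestamps decides which barrier ends the trade; in the toy path that is the profit-taking barrier.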


In [6]:
m_label = rs.meta_label(data = dollar['close'],
                      events = tb,
                      drop = False)

m_label # Show some example

# The earlier tri_barrier output contains NaT values. Once the events are passed to the labelling function, those NaT rows are dropped.
# meta_label also has a built-in drop option that triggers the drop_label function used below.
# To enable it, change drop = False to a float value, e.g. 0.05.

                              ret  bin
2015-01-05 14:54:26.286 -0.003448 -1.0
2015-01-05 14:57:13.616 -0.002957 -1.0
2015-01-05 15:01:57.494 -0.003701 -1.0
2015-01-05 15:07:29.012 -0.002957 -1.0
2015-01-05 15:13:09.655 -0.003451 -1.0
...                           ...  ...
2016-12-30 18:02:22.880 -0.003242 -1.0
2016-12-30 18:36:03.267 -0.002904 -1.0
2016-12-30 19:02:57.783 -0.002908 -1.0
2016-12-30 19:55:31.030  0.003028  1.0
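The `ret` and `bin` columns can be read as the return realised between the event and its first barrier touch, and the sign of that return. The sketch below assumes exactly that rule and nothing more; `label_by_sign` is a hypothetical name, and `rs.meta_label` may assign `bin = 0` by a different rule (for example when the vertical barrier is hit first).

```python
import numpy as np
import pandas as pd


def label_by_sign(close: pd.Series, t1: pd.Series) -> pd.DataFrame:
    """Return and sign label between each event and its end time (hypothetical stand-in)."""
    t1 = t1.dropna()  # events with no end time cannot be labelled
    start_px = close.loc[t1.index].to_numpy()
    end_px = close.loc[pd.DatetimeIndex(t1)].to_numpy()
    out = pd.DataFrame(index=t1.index)
    out["ret"] = end_px / start_px - 1
    out["bin"] = np.sign(out["ret"])
    return out


# Toy data: one up-move, one down-move, one flat move, one event without an end time
idx = pd.date_range("2020-01-01", periods=4, freq="min")
toy_close = pd.Series([100.0, 101.0, 99.0, 99.0], index=idx)
toy_t1 = pd.Series([idx[1], idx[2], idx[3], pd.NaT], index=idx)

toy_labels = label_by_sign(toy_close, toy_t1)
print(toy_labels)
```

The event whose end time is NaT disappears from the result, which mirrors the NaT rows being dropped between the barrier output and the label output.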


> AFML page 54 section 3.9
>
> "Some ML classifiers do not perform well when data samples are too imbalanced. 
>  In those circumstances, it is preferable to drop those rare labels and focus on more common outcomes."

In [7]:
m_label['bin'].value_counts()

# Here is a quick look at our 'bin' values.
# Apparently we have a rare label, bin = 0

 1.0    11343
-1.0    10784
 0.0       80
Name: bin, dtype: int64

In [8]:
m_label['bin'].value_counts(normalize = True)

# bin = 0 accounts for only 0.003602 (about 0.36%) of all our labels; the proportions sum to 1.

 1.0    0.510785
-1.0    0.485613
 0.0    0.003602
Name: bin, dtype: float64

In [9]:
drop_meta_label = rs.drop_label(events = m_label, 
                                min_pct = 0.05)

drop_meta_label # Show some example

# In this case all bin = 0 rows were dropped, keeping only 1 and -1.

                              ret  bin
2015-01-05 14:54:26.286 -0.003448 -1.0
2015-01-05 14:57:13.616 -0.002957 -1.0
2015-01-05 15:01:57.494 -0.003701 -1.0
2015-01-05 15:07:29.012 -0.002957 -1.0
2015-01-05 15:13:09.655 -0.003451 -1.0
...                           ...  ...
2016-12-30 18:02:22.880 -0.003242 -1.0
2016-12-30 18:36:03.267 -0.002904 -1.0
2016-12-30 19:02:57.783 -0.002908 -1.0
2016-12-30 19:55:31.030  0.003028  1.0
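Dropping rare labels can be sketched as a loop that removes the least frequent class until every remaining class exceeds the minimum share, or until fewer than three classes remain. This is a minimal sketch under that assumption; `drop_rare_labels` is a hypothetical name, not the `rs.drop_label` source.

```python
import pandas as pd


def drop_rare_labels(events: pd.DataFrame, min_pct: float = 0.05) -> pd.DataFrame:
    """Repeatedly drop the rarest class in 'bin' until all classes exceed min_pct (hypothetical stand-in)."""
    while True:
        freq = events["bin"].value_counts(normalize=True)
        if freq.min() > min_pct or len(freq) < 3:
            break
        events = events[events["bin"] != freq.idxmin()]
    return events


# Toy labels: bin = 0 makes up only 2% of the rows
toy = pd.DataFrame({"bin": [1.0] * 50 + [-1.0] * 48 + [0.0] * 2})

kept = drop_rare_labels(toy, min_pct=0.05)
print(kept["bin"].value_counts())
```

Stopping once fewer than three classes remain keeps the loop from removing one side of a binary problem.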
