## Setup and Dependencies

In [1]:
# --- Extension Setup ---
%load_ext line_profiler

# --- Module Imports ---
import sys
sys.path.append("..")  # Adjust if your afml repo is nested differently

# --- Environment Diagnostics ---
from pathlib import Path
print(f"Working Dir: {Path.cwd()}")


Working Dir: c:\Users\JoeN\Documents\GitHub\Machine-Learning-Blueprint\notebooks


In [2]:
import time
import re
import warnings
import winsound
from pathlib import Path
from pprint import pprint
from tqdm import tqdm

import matplotlib.pyplot as plt
from sklearn.base import clone
from sklearn.ensemble import (
    BaggingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    log_loss,
    precision_score,
    recall_score,
)
from sklearn.tree import DecisionTreeClassifier

from afml.cross_validation import (
    PurgedKFold,
    PurgedSplit,
    analyze_cross_val_scores,
    probability_weighted_accuracy,
    analyze_cross_val_scores,
)
from afml.data_structures.bars import *
from afml.ensemble.sb_bagging import (
    SequentiallyBootstrappedBaggingClassifier,
    compute_custom_oob_metrics,
    estimate_ensemble_size,
)
from afml.labeling.triple_barrier import (
    add_vertical_barrier,
    get_event_weights,
    triple_barrier_labels,
)
from afml.sample_weights.optimized_attribution import (
    get_weights_by_time_decay_optimized,
)

# from afml.sampling import get_ind_mat_average_uniqueness, get_ind_matrix, seq_bootstrap
from afml.strategies import (
    BollingerStrategy,
    MACrossoverStrategy,
    create_bollinger_features,
    get_entries,
    ForexFeatureEngine,
)
from afml.util import get_daily_vol, value_counts_data
from afml.cache import get_cache_efficiency_report, print_cache_health, clear_afml_cache, clear_cv_cache

warnings.filterwarnings("ignore")
# plt.style.use("seaborn-v0_8-whitegrid")
plt.style.use("dark_background")

[32m2025-11-09 00:28:33.968[0m | [34m[1mDEBUG   [0m | [36mafml.cache[0m:[36m<module>[0m:[36m619[0m - [34m[1mEnhanced cache features available:[0m
[32m2025-11-09 00:28:33.970[0m | [34m[1mDEBUG   [0m | [36mafml.cache[0m:[36m<module>[0m:[36m620[0m - [34m[1m  - Robust cache keys for NumPy/Pandas[0m
[32m2025-11-09 00:28:33.972[0m | [34m[1mDEBUG   [0m | [36mafml.cache[0m:[36m<module>[0m:[36m621[0m - [34m[1m  - MLflow integration: ✓[0m
[32m2025-11-09 00:28:33.975[0m | [34m[1mDEBUG   [0m | [36mafml.cache[0m:[36m<module>[0m:[36m622[0m - [34m[1m  - Backtest caching: ✓[0m
[32m2025-11-09 00:28:33.979[0m | [34m[1mDEBUG   [0m | [36mafml.cache[0m:[36m<module>[0m:[36m623[0m - [34m[1m  - Cache monitoring: ✓[0m
[32m2025-11-09 00:28:34.661[0m | [34m[1mDEBUG   [0m | [36mafml.cache[0m:[36m_configure_numba[0m:[36m59[0m - [34m[1mNumba cache configured: C:\Users\JoeN\AppData\Local\afml\afml\Cache\numba_cache[0m
[32m2025-11-09

In [3]:
# clear_afml_cache()
# clear_cv_cache()

In [4]:
# Check cache health anytime
print_cache_health()

# Find functions with low hit rates or high call counts
df = get_cache_efficiency_report()
try:
    pprint(df.sort_values('calls', ascending=False).head(10))
except:
    pass


CACHE HEALTH REPORT

Overall Statistics:
  Total Functions:     11
  Total Calls:         193
  Overall Hit Rate:    45.6%
  Total Cache Size:    0.00 MB

Top Performers (by hit rate):
  1. train_rf: 78.0% (50 calls)
  2. analyze_cross_val_scores: 60.9% (64 calls)
  3. triple_barrier_labels: 30.8% (13 calls)
  4. add_vertical_barrier: 26.7% (15 calls)
  5. create_bollinger_features: 25.0% (4 calls)

Worst Performers (by hit rate):
  1. get_event_weights: 0.0% (25 calls)
  2. trend_scanning_labels: 0.0% (2 calls)
  3. get_bins: 0.0% (6 calls)
  4. drop_labels: 0.0% (6 calls)
  5. calculate_all_features: 0.0% (2 calls)

Recommendations:
  1. Overall hit rate is low (<50%). Consider reviewing cache key generation or function parameter patterns.
  2. Functions with low hit rate: get_event_weights. Review cache key generation for these functions.


                                            function  calls  hits  misses  \
4  afml.cross_validation.cross_validation.analyze...     64    39 

## 1. Data Preparation

In [5]:
symbol = "EURUSD"
start_date, end_date = "2018-01-01", "2024-12-31"
sample_start, sample_end = start_date, "2023-12-31"

## 2. Bollinger Band Strategy

In [6]:
bb_timeframe = "M5"
file = Path(fr"..\data\EURUSD_{bb_timeframe}_time_2018-01-01-2024-12-31.parq")
bb_time_bars = pd.read_parquet(file)

In [7]:
bb_period, bb_std = 20, 2 # Bollinger Band parameters
bb_strategy = BollingerStrategy(window=bb_period, num_std=bb_std)
bb_lookback = 10
bb_pt_barrier, bb_sl_barrier, bb_time_horizon = (1, 2, dict(days=1))
min_ret = 5e-5
bb_vol_multiplier = 1

### Time-Bars

In [8]:
bb_side = bb_strategy.generate_signals(bb_time_bars)
bb_df = bb_time_bars.loc[sample_start : sample_end]

print(f"{bb_strategy.get_strategy_name()} Signals:")
value_counts_data(bb_side.reindex(bb_df.index), verbose=True)

# Volatility target for barriers
vol_lookback = 100
vol_target = get_daily_vol(bb_df.close, vol_lookback) * bb_vol_multiplier
close = bb_df.close
_, t_events = get_entries(bb_strategy, bb_df, filter_threshold=vol_target)

vertical_barriers = add_vertical_barrier(t_events, close, **bb_time_horizon)

Bollinger_w20_std2 Signals:

        count  proportion
side                     
 0    373,536    0.842213
-1     35,095    0.079129
 1     34,886    0.078658



[32m2025-11-09 00:28:44.708[0m | [1mINFO    [0m | [36mafml.filters.filters[0m:[36mcusum_filter[0m:[36m151[0m - [1m19,458 CUSUM-filtered events[0m
[32m2025-11-09 00:28:44.790[0m | [1mINFO    [0m | [36mafml.strategies.signal_processing[0m:[36mget_entries[0m:[36m105[0m - [1mBollinger_w20_std2 | 10,384 (14.84%) trade events selected by CUSUM filter using series.[0m


#### Feature Engineering

In [9]:
bb_feat = create_bollinger_features(bb_time_bars, bb_period, bb_std)
bb_feat_time = bb_feat.copy()
bb_feat_time.info()
# not_stationary = is_stationary(bb_feat_time)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 516825 entries, 2018-01-02 23:20:00 to 2024-12-31 00:00:00
Data columns (total 59 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   spread               516825 non-null  float32
 1   vol                  516825 non-null  float32
 2   h1_vol               516825 non-null  float32
 3   h4_vol               516825 non-null  float32
 4   d1_vol               516825 non-null  float32
 5   ret                  516825 non-null  float32
 6   ret_5                516825 non-null  float32
 7   ret_10               516825 non-null  float32
 8   ret_1_lag_1          516825 non-null  float32
 9   ret_5_lag_1          516825 non-null  float32
 10  ret_10_lag_1         516825 non-null  float32
 11  ret_1_lag_2          516825 non-null  float32
 12  ret_5_lag_2          516825 non-null  float32
 13  ret_10_lag_2         516825 non-null  float32
 14  ret_1_lag_3          516825 non-nu

#### Triple-Barrier Method

In [10]:
bb_events_tb = triple_barrier_labels(
    close,
    vol_target,
    t_events,
    pt_sl=[bb_pt_barrier, bb_sl_barrier],
    min_ret=min_ret,
    vertical_barrier_times=vertical_barriers,
    side_prediction=bb_side,
    vertical_barrier_zero=True,
    verbose=False,
)

bb_events_tb_time = bb_events_tb.copy()
# bb_events_tb_time_meta = bb_events_tb.copy()
print(f"Triple-Barrier (pt={bb_pt_barrier}, sl={bb_sl_barrier}, h={bb_time_horizon}):")
value_counts_data(bb_events_tb['bin'], verbose=True)

weights = get_event_weights(bb_events_tb, close)
av_uniqueness = weights['tW'].mean()
print(f"Average Uniqueness: {av_uniqueness:.4f}")

Triple-Barrier (pt=1, sl=2, h={'days': 1}):

     count  proportion
bin                   
1    6,506    0.626601
0    3,877    0.373399

Average Uniqueness: 0.5488


#### CV of Weighting Methods

In [11]:
from os import cpu_count

# Reserve 1 CPU if you want to do something else during training, otherwise set to -1
N_JOBS = cpu_count() - 1
N_ESTIMATORS = 100
seed = 7
min_w_leaf = 0.05
max_depth = 4
n_splits = 3
pct_embargo = 0.01
test_size = 0.2

In [12]:
cont = bb_events_tb_time.copy()
X = bb_feat_time.reindex(cont.index)
y = cont["bin"]
t1 = cont["t1"]

train, test = PurgedSplit(t1, test_size).split(X)
X_train, X_test, y_train, y_test = (
        X.iloc[train],
        X.iloc[test],
        y.iloc[train],
        y.iloc[test],
    )

cont_train = cont.iloc[train]
cont_train = get_event_weights(cont_train, bb_df.close)
bb_cont_train = cont_train.copy()

cv_gen = PurgedKFold(n_splits, cont_train["t1"], pct_embargo)

In [13]:
avg_u = cont_train.tW.mean()
print(f"Average Uniqueness in Training Set: {avg_u:.4f}")

weighting_schemes = {
    "unweighted": pd.Series(1., index=cont_train.index),
    "uniqueness": cont_train["tW"],
    "return": cont_train["w"],
    }

decay_factors = [0.0, 0.25, 0.5, 0.75]
time_decay_weights = {}
for time_decay in decay_factors:
    decay_w = get_weights_by_time_decay_optimized(
                triple_barrier_events=cont_train,
                close_index=close.index,
                last_weight=time_decay,
                linear=True,
                av_uniqueness=cont_train["tW"],
            )
    time_decay_weights[f"decay_{time_decay}"] = decay_w
        
weighting_schemes.keys()

Average Uniqueness in Training Set: 0.5473


dict_keys(['unweighted', 'uniqueness', 'return'])

##### Selection of Best Model

In [14]:
from sklearn.ensemble import RandomForestClassifier


# Initialize Random Forest

clf = RandomForestClassifier(
    criterion='entropy',
    n_estimators=N_ESTIMATORS,
    class_weight="balanced_subsample",
    max_samples=avg_u,
    min_weight_fraction_leaf=min_w_leaf,
    max_depth=max_depth,
    random_state=seed,
    n_jobs=N_JOBS,  # Use all available cores
    )



- Analyze all CV scores for all weighting schemes to find the best scheme

In [15]:
all_cv_scores_df = pd.DataFrame()
all_cv_scores_d = {}
all_cms = {}
best_score, best_scheme = None, None

if set(y_train.values) == {0, 1}:
    scoring = "f1"  # f1 for meta-labeling
else:
    scoring = "neg_log_loss"  # symmetric towards all cases

for scheme, w in tqdm(weighting_schemes.items()):
    cv_scores, cv_scores_df, cms = analyze_cross_val_scores(
        clf, X_train, y_train, cv_gen, 
        sample_weight_train=w, 
        sample_weight_score=w,
    )
    all_cms[scheme] = cms
    all_cv_scores_d[scheme] = cv_scores
    score = cv_scores[scoring].mean()
    recall = cv_scores_df.loc["recall", "mean"]
    recall_std = cv_scores_df.loc["recall", "std"]
    for idx, row in cv_scores_df.iterrows():
        all_cv_scores_df.loc[idx, scheme] = f"{row['mean']:.4f} ± {row['std']:.4f}"
    if scoring == "f1" and (np.allclose([recall + recall_std], [1.0], 0.025) or np.allclose([recall - recall_std], [0.0], 0.025)):
        print(f"Recall score ({all_cv_scores_df.loc['recall', scheme]}) collapses for {scheme} weighting scheme")
        continue
    best_score = max(best_score, score) if best_score is not None else score
    if score == best_score:
        best_scheme = scheme

print(f"{best_scheme.title()} is the best weighting scheme with {scoring} = {best_score:.4f}")
print("\nWeighting Scheme CV:")
all_cv_scores_df

  0%|          | 0/3 [00:00<?, ?it/s][32m2025-11-09 00:28:47.005[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
[32m2025-11-09 00:28:47.027[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m


[32m2025-11-09 00:28:47.075[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
100%|██████████| 3/3 [00:00<00:00, 32.97it/s]

Recall score (0.0000 ± 0.0000) collapses for return weighting scheme
Unweighted is the best weighting scheme with f1 = 0.5982

Weighting Scheme CV:





Unnamed: 0,unweighted,uniqueness,return
accuracy,0.5253 ± 0.0134,0.5112 ± 0.0042,0.6243 ± 0.0036
pwa,0.5420 ± 0.0180,0.5113 ± 0.0091,0.6287 ± 0.0037
neg_log_loss,-0.6911 ± 0.0017,-0.6949 ± 0.0016,-0.6841 ± 0.0035
precision,0.6361 ± 0.0166,0.6357 ± 0.0149,0.0000 ± 0.0000
recall,0.5652 ± 0.0250,0.4930 ± 0.0443,0.0000 ± 0.0000
f1,0.5982 ± 0.0182,0.5537 ± 0.0221,0.0000 ± 0.0000


- Test if time-decay improves performance of best model

In [16]:
best_model_decay_cv_scores = pd.DataFrame()

for scheme, decay_factor in tqdm(time_decay_weights.items()):
    best_scheme_o = best_scheme.split("_decay")[0]
    sample_weight = weighting_schemes[best_scheme_o] * decay_factor
    cv_scores, cv_scores_df, cms = analyze_cross_val_scores(
        clf, X_train, y_train, cv_gen, 
        sample_weight_train=sample_weight, 
        sample_weight_score=sample_weight,
    )
    score = cv_scores[scoring].mean()
    best_score = max(best_score, score) if best_score is not None else score
    scheme = f"{best_scheme_o}_{scheme}"
    all_cv_scores_d[scheme] = cv_scores
    all_cms[scheme] = cms
    for idx, row in cv_scores_df.iterrows():
        best_model_decay_cv_scores.loc[idx, scheme] = f"{row['mean']:.4f} ± {row['std']:.4f}"
    if score == best_score:
        best_scheme = scheme
        weighting_schemes[best_scheme] = sample_weight
    all_cv_scores_df[scheme] = best_model_decay_cv_scores[scheme]
best_model_decay_cv_scores[f"{best_scheme_o}_decay_1.0"] = all_cv_scores_df[best_scheme_o]
        
print(f"\n{best_scheme.title()} model achieved the best {scoring} score of {best_score:.4f}")
best_model_decay_cv_scores

  0%|          | 0/4 [00:00<?, ?it/s][32m2025-11-09 00:28:47.290[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
[32m2025-11-09 00:28:47.313[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
[32m2025-11-09 00:28:47.336[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
[32m2025-11-09 00:28:47.363[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
100%|██████████| 4/4 [00:00<00:00, 39.99it/s]


Unweighted_Decay_0.5 model achieved the best f1 score of 0.6017





Unnamed: 0,unweighted_decay_0.0,unweighted_decay_0.25,unweighted_decay_0.5,unweighted_decay_0.75,unweighted_decay_1.0
accuracy,0.5241 ± 0.0270,0.5269 ± 0.0169,0.5301 ± 0.0118,0.5277 ± 0.0187,0.5253 ± 0.0134
pwa,0.5347 ± 0.0340,0.5370 ± 0.0249,0.5404 ± 0.0196,0.5399 ± 0.0182,0.5420 ± 0.0180
neg_log_loss,-0.6920 ± 0.0031,-0.6917 ± 0.0024,-0.6913 ± 0.0018,-0.6914 ± 0.0017,-0.6911 ± 0.0017
precision,0.6336 ± 0.0251,0.6378 ± 0.0176,0.6411 ± 0.0160,0.6374 ± 0.0178,0.6361 ± 0.0166
recall,0.5729 ± 0.0651,0.5667 ± 0.0464,0.5687 ± 0.0413,0.5691 ± 0.0410,0.5652 ± 0.0250
f1,0.6002 ± 0.0435,0.5992 ± 0.0297,0.6017 ± 0.0238,0.6007 ± 0.0279,0.5982 ± 0.0182


##### Sequential Bootstrap

In [17]:
# Random Forest default of max_features is sqrt, 
# which means I don't have to calculate it.
base_rf = clone(clf).set_params(
    n_estimators=1,
    bootstrap=False,
    n_jobs=None,
    max_samples=None,
    random_state=None,
    )

seq_rf = SequentiallyBootstrappedBaggingClassifier(
    samples_info_sets=cont_train.t1,
    price_bars_index=bb_df.index,
    estimator=base_rf,
    n_estimators=N_ESTIMATORS, # set low to save time
    # max_samples=avg_u, # Set to average uniqueness
    # bootstrap_features=False,
    oob_score=True,
    n_jobs=N_JOBS,
    random_state=seed,
    verbose=False,
)
seq_rf

In [18]:
from afml.cache.cv_cache import cv_cacheable


@cv_cacheable
def train_rf(classifier, X, y, sample_weight=None):
    time0 = time.time()
    clf = clone(classifier).set_params(oob_score=True).fit(X, y, sample_weight)
    time1 = pd.Timedelta(seconds=time.time() - time0).round('1s')
    print(f"{clf.__class__.__name__} trained in {time1}.")
    return clf


In [19]:
w = weighting_schemes[best_scheme]
rf = clone(clf).set_params(oob_score=True)
seq_rf1 = clone(seq_rf).set_params(max_samples=1.0)

ensembles = {
    "standard_rf": train_rf(rf, X_train, y_train, w),
    "sequential_rf": train_rf(seq_rf, X_train, y_train, w),  # max_samples=avg_u
    "sequential_rf_all": train_rf(seq_rf1, X_train, y_train, w),  # max_samples=1.0
}

if not best_scheme.startswith("unweighted"):
    print(f"Training: Sequential Bootstrap (max_samples=avg_u) - Unweighted...")
    seq_rfu = train_rf(clone(seq_rf), X_train, y_train)  # max_samples=avg_u
    ensembles["sequential_rf_unweighted"] = seq_rfu


scoring_methods = {
            "accuracy": accuracy_score,
            "pwa": probability_weighted_accuracy,
            "neg_log_loss": log_loss,
            "precision": precision_score,
            "recall": recall_score,
            "f1": f1_score,
        }
all_scores_oos = pd.DataFrame()

for name, classifier in ensembles.items():
    prob = classifier.predict_proba(X_test)[:, 1]
    pred = (prob > 0.5).astype(int)
    for method, scoring in scoring_methods.items():
        y_pred = prob if scoring in (probability_weighted_accuracy, log_loss) else pred
        score = scoring(y_test, y_pred)
        if method == "neg_log_loss":
            score *= -1
        all_scores_oos.loc[method, name] = score
    all_scores_oos.loc["oob", name] = classifier.oob_score_
    all_scores_oos.loc["oob_test_gap", name] = abs(all_scores_oos.loc["accuracy", name] - classifier.oob_score_)

print(f"\nAverage uniqueness = {avg_u:.4f}\n")
bb_all_scores_oos = all_scores_oos.copy()

winsound.Beep(1000, 1000) # Alert

all_scores_oos.round(4)

[32m2025-11-09 00:28:47.872[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for train_rf[0m
[32m2025-11-09 00:28:48.139[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for train_rf[0m
[32m2025-11-09 00:28:48.364[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for train_rf[0m



Average uniqueness = 0.5473



Unnamed: 0,standard_rf,sequential_rf,sequential_rf_all
accuracy,0.5058,0.5159,0.5159
pwa,0.5087,0.5098,0.5098
neg_log_loss,-0.6939,-0.6937,-0.6937
precision,0.6377,0.6437,0.6437
recall,0.495,0.5149,0.5149
f1,0.5574,0.5722,0.5722
oob,0.5388,0.5335,0.5335
oob_test_gap,0.033,0.0176,0.0176


**NOTE**: From the results above we can see that setting max_samples to 1.0 (_all) rather than average uniqueness <u>does not improve</u> any of the performance metrics.

## 3. Moving Average Crossover Strategy

In [20]:
from afml.strategies.ma_crossover_feature_engine import ForexFeatureEngine

ma_timeframe = "M15"
file = Path(fr"..\data\EURUSD_{ma_timeframe}_time_2018-01-01-2024-12-31.parq")
ma_time_bars = pd.read_parquet(file)

fast_window, slow_window = 50, 200
ma_strategy = MACrossoverStrategy(fast_window, slow_window)
ma_pt_barrier, ma_sl_barrier, ma_time_horizon = (0, 2, dict(days=3))
ma_vol_multiplier = 1

### Time-Bars

In [21]:
ma_side = ma_strategy.generate_signals(ma_time_bars)
ma_df = ma_time_bars.loc[sample_start : sample_end]


print(f"{ma_strategy.get_strategy_name()} Signals:")
value_counts_data(ma_side.reindex(ma_df.index), verbose=True)

# Volatility target for barriers
vol_lookback = fast_window
vol_target = get_daily_vol(ma_df.close, vol_lookback) * ma_vol_multiplier
close = ma_df.close

thres = vol_target.mean()
_, t_events = get_entries(ma_strategy, ma_df, filter_threshold=vol_target.mean())

vertical_barriers = add_vertical_barrier(t_events, close, **ma_time_horizon)

MACrossover_50_200 Signals:

       count  proportion
side                    
-1    75,984    0.513940
 1    71,663    0.484714
 0       199    0.001346



[32m2025-11-09 00:29:01.133[0m | [1mINFO    [0m | [36mafml.filters.filters[0m:[36mcusum_filter[0m:[36m151[0m - [1m8,034 CUSUM-filtered events[0m
[32m2025-11-09 00:29:01.208[0m | [1mINFO    [0m | [36mafml.strategies.signal_processing[0m:[36mget_entries[0m:[36m105[0m - [1mMACrossover_50_200 | 8,026 (5.44%) trade events selected by CUSUM filter (threshold = 0.1977%).[0m


#### Feature Engineering

In [22]:
ma_feat_engine = ForexFeatureEngine(pair_name=symbol)
ma_feat_time = ma_feat_engine.calculate_all_features(ma_time_bars, ma_timeframe, lr_period=(5, 20))
ma_feat_time.info()

Memory usage reduced from 106.62 MB to 55.49 MB (48.0% reduction)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 172386 entries, 2018-01-01 23:15:00 to 2024-12-31 00:00:00
Data columns (total 94 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   ma_10                           172386 non-null  float32
 1   ma_20                           172386 non-null  float32
 2   ma_50                           172386 non-null  float32
 3   ma_100                          172386 non-null  float32
 4   ma_200                          172386 non-null  float32
 5   ma_10_20_cross                  172386 non-null  float64
 6   ma_20_50_cross                  172386 non-null  float64
 7   ma_50_200_cross                 172386 non-null  float64
 8   ma_spread_10_20                 172386 non-null  float32
 9   ma_spread_20_50                 172386 non-null  float32
 10  ma_spread_50_200                172386 n

In [23]:
for i, col in enumerate(ma_feat_time):
    print(f"{i:>3}. {col}")

  0. ma_10
  1. ma_20
  2. ma_50
  3. ma_100
  4. ma_200
  5. ma_10_20_cross
  6. ma_20_50_cross
  7. ma_50_200_cross
  8. ma_spread_10_20
  9. ma_spread_20_50
 10. ma_spread_50_200
 11. ma_20_slope
 12. ma_50_slope
 13. price_above_ma_20
 14. price_above_ma_50
 15. ma_ribbon_aligned
 16. atr_14
 17. atr_21
 18. atr_regime
 19. realized_vol_10
 20. realized_vol_20
 21. realized_vol_50
 22. vol_of_vol
 23. hl_range
 24. hl_range_ma
 25. hl_range_regime
 26. bb_upper
 27. bb_lower
 28. bb_percent
 29. bb_bandwidth
 30. bb_squeeze
 31. efficiency_ratio_14
 32. efficiency_ratio_30
 33. adx_14
 34. dmp_14
 35. dmn_14
 36. adx_trend_strength
 37. adx_trend_direction
 38. trend_window
 39. trend_slope
 40. trend_t_value
 41. trend_rsquared
 42. trend_ret
 43. roc_10
 44. roc_20
 45. momentum_14
 46. hh_ll_20
 47. trend_persistence
 48. return_skew_20
 49. return_kurtosis_20
 50. var_95
 51. cvar_95
 52. market_stress
 53. current_drawdown
 54. days_since_high
 55. hour_sin_h1
 56. hour_cos_h1

#### Triple-Barrier Method

In [24]:
ma_events_tb = triple_barrier_labels(
    close=close,
    target=vol_target,
    t_events=t_events,
    pt_sl=[ma_pt_barrier, ma_sl_barrier],
    min_ret=min_ret,
    vertical_barrier_times=vertical_barriers,
    side_prediction=ma_side,
    vertical_barrier_zero=False,
    verbose=False,
)
ma_events_tb_time = ma_events_tb.copy()
ma_events_tb.info()

print(f"Triple-Barrier (pt={ma_pt_barrier}, sl={ma_sl_barrier}, h={ma_time_horizon}):")
value_counts_data(ma_events_tb.bin, verbose=True)

weights = get_event_weights(ma_events_tb, close)
av_uniqueness = weights['tW'].mean()
print(f"Average Uniqueness: {av_uniqueness:.4f}")

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 8016 entries, 2018-01-04 01:15:00 to 2023-12-28 16:00:00
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   t1      8016 non-null   datetime64[ns]
 1   trgt    8016 non-null   float64       
 2   ret     8016 non-null   float32       
 3   bin     8016 non-null   int8          
 4   side    8016 non-null   int8          
dtypes: datetime64[ns](1), float32(1), float64(1), int8(2)
memory usage: 234.8 KB
Triple-Barrier (pt=0, sl=2, h={'days': 3}):

     count  proportion
bin                   
0    4,846    0.604541
1    3,170    0.395459

Average Uniqueness: 0.1472


#### CV of Weighting Methods

In [25]:
from os import cpu_count

# Reserve 1 CPU if you want to do something else during training, otherwise set to -1
N_JOBS = cpu_count() - 1
N_ESTIMATORS = 100
seed = 7
min_w_leaf = 0.05
max_depth = 4
n_splits = 3
pct_embargo = 0.01
test_size = 0.2

In [26]:
cont = ma_events_tb_time.copy()
X = ma_feat_time.reindex(cont.index)
y = cont["bin"]
t1 = cont["t1"]

train, test = PurgedSplit(t1, test_size).split(X)
X_train, X_test, y_train, y_test = (
        X.iloc[train],
        X.iloc[test],
        y.iloc[train],
        y.iloc[test],
    )

cont_train = cont.iloc[train]
cont_train = get_event_weights(cont_train, bb_df.close)
bb_cont_train = cont_train.copy()

cv_gen = PurgedKFold(n_splits, cont_train["t1"], pct_embargo)

In [27]:
avg_u = cont_train.tW.mean()
print(f"Average Uniqueness in Training Set: {avg_u:.4f}")

weighting_schemes = {
    "unweighted": pd.Series(1., index=cont_train.index),
    "uniqueness": cont_train["tW"],
    "return": cont_train["w"],
    }

decay_factors = [0.0, 0.25, 0.5, 0.75]
time_decay_weights = {}
for time_decay in decay_factors:
    decay_w = get_weights_by_time_decay_optimized(
                triple_barrier_events=cont_train,
                close_index=close.index,
                last_weight=time_decay,
                linear=True,
                av_uniqueness=cont_train["tW"],
            )
    time_decay_weights[f"decay_{time_decay}"] = decay_w
        
weighting_schemes.keys()

Average Uniqueness in Training Set: 0.1510


dict_keys(['unweighted', 'uniqueness', 'return'])

##### Selection of Best Model

In [28]:
from sklearn.ensemble import RandomForestClassifier


# Initialize Random Forest

clf = RandomForestClassifier(
    criterion='entropy',
    n_estimators=N_ESTIMATORS,
    class_weight="balanced_subsample",
    max_samples=avg_u,
    min_weight_fraction_leaf=min_w_leaf,
    max_depth=max_depth,
    random_state=seed,
    n_jobs=N_JOBS,  # Use all available cores
    )



- Analyze all CV scores for all weighting schemes to find the best scheme

In [29]:
all_cv_scores_df = pd.DataFrame()
all_cv_scores_d = {}
all_cms = {}
best_score, best_scheme = None, None

if set(y_train.values) == {0, 1}:
    scoring = "f1"  # f1 for meta-labeling
else:
    scoring = "neg_log_loss"  # symmetric towards all cases

for scheme, w in tqdm(weighting_schemes.items()):
    cv_scores, cv_scores_df, cms = analyze_cross_val_scores(
        clf, X_train, y_train, cv_gen, 
        sample_weight_train=w, 
        sample_weight_score=w,
    )
    all_cms[scheme] = cms
    all_cv_scores_d[scheme] = cv_scores
    score = cv_scores[scoring].mean()
    recall = cv_scores_df.loc["recall", "mean"]
    recall_std = cv_scores_df.loc["recall", "std"]
    for idx, row in cv_scores_df.iterrows():
        all_cv_scores_df.loc[idx, scheme] = f"{row['mean']:.4f} ± {row['std']:.4f}"
    if scoring == "f1" and (np.allclose([recall + recall_std], [1.0], 0.025) or np.allclose([recall - recall_std], [0.0], 0.025)):
        print(f"Recall score ({all_cv_scores_df.loc['recall', scheme]}) collapses for {scheme} weighting scheme")
        continue
    best_score = max(best_score, score) if best_score is not None else score
    if score == best_score:
        best_scheme = scheme

print(f"{best_scheme.title()} is the best weighting scheme with {scoring} = {best_score:.4f}")
print("\nWeighting Scheme CV:")
all_cv_scores_df

  0%|          | 0/3 [00:00<?, ?it/s][32m2025-11-09 00:29:14.522[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
[32m2025-11-09 00:29:14.552[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
[32m2025-11-09 00:29:14.577[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for analyze_cross_val_scores[0m
100%|██████████| 3/3 [00:00<00:00, 38.96it/s]

Recall score (0.9964 ± 0.0013) collapses for return weighting scheme
Unweighted is the best weighting scheme with f1 = 0.4744

Weighting Scheme CV:





Unnamed: 0,unweighted,uniqueness,return
accuracy,0.4887 ± 0.0432,0.6304 ± 0.0087,0.5150 ± 0.0075
pwa,0.4870 ± 0.0496,0.6809 ± 0.0167,0.5006 ± 0.0097
neg_log_loss,-0.7051 ± 0.0166,-0.6589 ± 0.0041,-0.7127 ± 0.0057
precision,0.4174 ± 0.0240,0.4320 ± 0.0410,0.5142 ± 0.0073
recall,0.5894 ± 0.1829,0.1954 ± 0.0775,0.9964 ± 0.0013
f1,0.4744 ± 0.0410,0.2629 ± 0.0835,0.6783 ± 0.0066


- Test if time-decay improves performance of best model

In [31]:
best_model_decay_cv_scores = pd.DataFrame()

for scheme, decay_factor in tqdm(time_decay_weights.items()):
    best_scheme_o = best_scheme.split("_decay")[0]
    sample_weight = weighting_schemes[best_scheme_o] * decay_factor
    cv_scores, cv_scores_df, cms = analyze_cross_val_scores(
        clf, X_train, y_train, cv_gen, 
        sample_weight_train=sample_weight, 
        sample_weight_score=sample_weight,
    )
    score = cv_scores[scoring].mean()
    best_score = max(best_score, score) if best_score is not None else score
    scheme = f"{best_scheme_o}_{scheme}"
    all_cv_scores_d[scheme] = cv_scores
    all_cms[scheme] = cms
    for idx, row in cv_scores_df.iterrows():
        best_model_decay_cv_scores.loc[idx, scheme] = f"{row['mean']:.4f} ± {row['std']:.4f}"
    if score == best_score:
        best_scheme = scheme
        weighting_schemes[best_scheme] = sample_weight
    all_cv_scores_df[scheme] = best_model_decay_cv_scores[scheme]
best_model_decay_cv_scores[f"{best_scheme_o}_decay_1.0"] = all_cv_scores_df[best_scheme_o]
        
print(f"\n{best_scheme.title()} model achieved the best {scoring} score of {best_score:.4f}")
best_model_decay_cv_scores

  0%|          | 0/4 [00:00<?, ?it/s][32m2025-11-09 00:29:14.843[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m238[0m - [1mCV cache miss for analyze_cross_val_scores - computing...[0m
[32m2025-11-09 00:29:18.217[0m | [34m[1mDEBUG   [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m245[0m - [34m[1mCached CV result: 8ba41a32b8862f4ec782f842b26bce1c[0m
 25%|██▌       | 1/4 [00:03<00:10,  3.39s/it][32m2025-11-09 00:29:18.232[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m238[0m - [1mCV cache miss for analyze_cross_val_scores - computing...[0m
[32m2025-11-09 00:29:21.670[0m | [34m[1mDEBUG   [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m245[0m - [34m[1mCached CV result: 41a35d31acf1e04aa2fd235518cf7c86[0m
 50%|█████     | 2/4 [00:06<00:06,  3.43s/it][32m2025-11-09 00:29:21.691[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m238[0m - [1mCV cache miss for analyz


Unweighted_Decay_0.75 model achieved the best f1 score of 0.4825





Unnamed: 0,unweighted_decay_0.0,unweighted_decay_0.25,unweighted_decay_0.5,unweighted_decay_0.75,unweighted_decay_1.0
accuracy,0.5012 ± 0.0414,0.4929 ± 0.0450,0.4914 ± 0.0446,0.4896 ± 0.0393,0.4887 ± 0.0432
pwa,0.4981 ± 0.0658,0.4837 ± 0.0569,0.4847 ± 0.0532,0.4823 ± 0.0463,0.4870 ± 0.0496
neg_log_loss,-0.7025 ± 0.0167,-0.7051 ± 0.0167,-0.7058 ± 0.0176,-0.7056 ± 0.0161,-0.7051 ± 0.0166
precision,0.4148 ± 0.0247,0.4171 ± 0.0242,0.4177 ± 0.0267,0.4187 ± 0.0231,0.4174 ± 0.0240
recall,0.5118 ± 0.2299,0.5645 ± 0.1771,0.5775 ± 0.1739,0.6023 ± 0.1615,0.5894 ± 0.1829
f1,0.4336 ± 0.0834,0.4650 ± 0.0438,0.4710 ± 0.0402,0.4825 ± 0.0329,0.4744 ± 0.0410


##### Sequential Bootstrap

In [32]:
# Random Forest default of max_features is sqrt, 
# which means I don't have to calculate it.
base_rf = clone(clf).set_params(
    n_estimators=1,
    bootstrap=False,
    n_jobs=None,
    max_samples=None,
    random_state=None,
    )

seq_rf = SequentiallyBootstrappedBaggingClassifier(
    samples_info_sets=cont_train.t1,
    price_bars_index=bb_df.index,
    estimator=base_rf,
    n_estimators=N_ESTIMATORS, # set low to save time
    max_samples=avg_u, # Set to average uniqueness
    # bootstrap_features=False,
    oob_score=True,
    n_jobs=N_JOBS,
    random_state=seed,
    verbose=False,
)
seq_rf

In [33]:
from afml.cache.cv_cache import cv_cacheable


@cv_cacheable
def train_rf(classifier, X, y, sample_weight=None):
    time0 = time.time()
    clf = clone(classifier).set_params(oob_score=True).fit(X, y, sample_weight)
    time1 = pd.Timedelta(seconds=time.time() - time0).round('1s')
    print(f"{clf.__class__.__name__} trained in {time1}.")
    return clf


In [None]:
w = weighting_schemes[best_scheme]
rf = clone(clf).set_params(oob_score=True)
seq_rf1 = clone(seq_rf).set_params(max_samples=1.0)
seq_rf_bf = clone(seq_rf).set_params(bootstrap_features=False)
ensembles = {
    "standard_rf": train_rf(rf, X_train, y_train, w),
    "sequential_rf": train_rf(seq_rf, X_train, y_train, w),  # max_samples=avg_u
    # "sequential_rf_no_bootstrap": train_rf(
    #     seq_rf_bf, X_train, y_train, w
    #     ),  # max_samples=avg_u, bootstrap_features=False
    "sequential_rf_all": train_rf(seq_rf1, X_train, y_train, w),  # max_samples=1.0
}

if best_scheme != "unweighted":
    print(f"Training: Sequential Bootstrap (max_samples=avg_u) - Unweighted...")
    seq_rfu = train_rf(clone(seq_rf), X_train, y_train)  # max_samples=avg_u
    ensembles["sequential_rf_unweighted"] = seq_rfu

    print(f"Training: Sequential Bootstrap (max_samples=1.0) - Unweighted...")
    seq_rfu1 = train_rf(clone(seq_rf1), X_train, y_train)  # max_samples=avg_u
    ensembles["sequential_rf_unweighted_all"] = seq_rfu1

scoring_methods = {
            "accuracy": accuracy_score,
            "pwa": probability_weighted_accuracy,
            "neg_log_loss": log_loss,
            "precision": precision_score,
            "recall": recall_score,
            "f1": f1_score,
        }

all_scores_oos = pd.DataFrame()

for name, classifier in ensembles.items():
    prob = classifier.predict_proba(X_test)[:, 1]
    pred = (prob > 0.5).astype(int)
    for method, scoring in scoring_methods.items():
        y_pred = prob if scoring in (probability_weighted_accuracy, log_loss) else pred
        score = scoring(y_test, y_pred)
        if method == "neg_log_loss":
            score *= -1
        all_scores_oos.loc[method, name] = score
    all_scores_oos.loc["oob", name] = classifier.oob_score_
    all_scores_oos.loc["oob_test_gap", name] = abs(all_scores_oos.loc["accuracy", name] - classifier.oob_score_)

print(f"\nAverage uniqueness = {avg_u:.4f}\n")
ma_all_scores_oos = all_scores_oos.copy()

# winsound.Beep(1000, 1000) # Alert

all_scores_oos.round(4)

[32m2025-11-09 05:08:33.366[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for train_rf[0m
[32m2025-11-09 05:08:33.667[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for train_rf[0m
[32m2025-11-09 05:08:33.953[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for train_rf[0m


Training: Sequential Bootstrap (max_samples=avg_u) - Unweighted...


[32m2025-11-09 05:08:35.498[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for train_rf[0m


Training: Sequential Bootstrap (max_samples=1.0) - Unweighted...


[32m2025-11-09 05:08:36.635[0m | [1mINFO    [0m | [36mafml.cache.cv_cache[0m:[36mwrapper[0m:[36m230[0m - [1mCV cache hit for train_rf[0m



Average uniqueness = 0.1510



Unnamed: 0,standard_rf,sequential_rf,sequential_rf_all,sequential_rf_unweighted,sequential_rf_unweighted_all
accuracy,0.587,0.5852,0.592,0.5758,0.587
pwa,0.6185,0.6086,0.6114,0.5982,0.6181
neg_log_loss,-0.6797,-0.6814,-0.681,-0.6831,-0.6805
precision,0.3861,0.3954,0.4104,0.3915,0.4043
recall,0.3565,0.4077,0.4479,0.4388,0.4442
f1,0.3707,0.4014,0.4283,0.4138,0.4233
oob,0.5895,0.579,0.5724,0.5802,0.5721
oob_test_gap,0.0024,0.0062,0.0196,0.0044,0.0149


### Conclusion

```
From the results above we can see that setting max_samples to 1.0 (_all) rather than average uniqueness does not justify the increase in training time from 7 min -> 62 min (8.6x) for the F1 score improvements—[('weighted', 6.7%), ('unweighted', 2.3%)], which comes with concomitant increases in the OOB score test gap (>2x for both).
```

Let's analyze the reasoning behind my statement above:

1. **Diminishing Returns**: You are paying an 860% cost in compute time for a 6.7% gain in performance. In a production environment or during research iteration, this is a very poor trade-off. The time saved could be used for feature engineering, hyperparameter tuning on other models, or analyzing more data.

2. **Increased Overfitting Risk**: The performance "gain" is highly suspect. The dramatically larger OOB test gap is a classic warning sign of overfitting. The improved test metrics might be a result of the model memorizing time-dependent patterns in the training set that will not recur in the future. A model that overfits less (like the standard sequential_rf) is often more reliable in live trading.

3. **Practical Viability**: The sequential_rf_all model is the most complex and costly, yet offers the worst generalization signal. In finance, a simpler, faster, and more robust model is almost always preferable to a complex, expensive, and potentially overfit one.

##### Additional Observations and Nuances
- **The "Winner"**: While sequential_rf_all has the highest F1, the most practical model from this set appears to be the standard sequential_rf. It maintains a good balance of performance, a minimal overfitting gap, and reasonable compute time.

- **Role of Weighting**: Compare sequential_rf (weighted) to sequential_rf_unweighted. The weighted version has a better F1 (0.4014 vs 0.4138 is a ~3% drop for unweighted) and a much smaller overfitting gap (0.0062 vs 0.0044). This suggests the sample weighting in sequential bootstrapping is providing value.

- **Surprising Performer**: The standard_rf is a very strong contender. It has the highest accuracy and PWA, a low overfitting gap, and is likely the fastest to train. Its lower F1 suggests it might be worse at handling the minority class, but its overall robustness is excellent.