## Regime Detection

Description:  
We detect market regimes (up, down, sideways) with a simple moving-average rule: if the short moving average is above the long one, the label is +1 (“up” regime); if it is below, the label is -1 (“down” regime); otherwise it is 0 (“sideways”). Identifying regimes lets us adapt a trading strategy to the current market state, or trade only under selected conditions.
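
The rule above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's `create_labels_regime_detection`, whose source is not shown in this notebook. The function name `label_regimes` and the `band` tolerance parameter are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def label_regimes(close: pd.Series, short_window: int = 20,
                  long_window: int = 50, band: float = 0.0) -> pd.DataFrame:
    """Label each bar +1 / -1 / 0 from a moving-average comparison (hypothetical helper)."""
    ma_short = close.rolling(short_window).mean()
    ma_long = close.rolling(long_window).mean()
    # Relative gap between the two averages; NaN until both windows are full.
    gap = (ma_short - ma_long) / ma_long
    label = pd.Series(0, index=close.index, dtype=int)
    label[gap > band] = 1     # short average above long average: "up"
    label[gap < -band] = -1   # short average below long average: "down"
    out = pd.DataFrame({"ma_short": ma_short, "ma_long": ma_long, "regime_label": label})
    # Drop the warm-up rows where the long average is still undefined.
    return out.dropna(subset=["ma_long"])


prices = pd.Series(np.linspace(100.0, 200.0, 120))  # synthetic, steadily rising prices
labels = label_regimes(prices, short_window=5, long_window=20)
print(labels["regime_label"].value_counts())
```

With `band=0.0`, a 0 label appears only when the two averages are exactly equal, which is rare on real prices. That may be why the tuning run in Part 2 reports only the classes -1 and +1. A small positive `band` would be one way to make the sideways regime non-empty.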

#### 📌 Important Note:
This notebook contains interactive charts generated using Vectorbt.  
GitHub does not display interactive Plotly charts, so the graphs will not be visible here.  

✅ To view the charts, please download this notebook and run it on your local machine.  
Make sure you have Vectorbt and its dependencies installed to regenerate the visualizations.


## Part 1: Data & Feature Engineering

**Objective:**  
Load raw price data (MetaTrader 5 or CSV) and transform it into a feature-rich dataset.

**Tasks:**
- Fetch historical bars  
- Apply `ta.add_all_ta_features` or custom features  
- (Optionally) create specific labels (multi-bar, double-barrier, regime, etc.)  
- Clean/prepare the final feature matrix **X** and target **y**  
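
The last task, preparing **X** and **y**, might look like the sketch below. It assumes the labeled frame carries the columns `regime_label`, `ma_short` and `ma_long` used later in this notebook. The helper name `prepare_xy` and the cleaning policy (drop all-NaN columns, then drop incomplete rows) are illustrative assumptions, not the project's actual code.

```python
import numpy as np
import pandas as pd


def prepare_xy(df: pd.DataFrame, label_col: str = "regime_label",
               helper_cols: tuple = ("ma_short", "ma_long")):
    """Split a labeled frame into a clean feature matrix X and target y (hypothetical helper)."""
    # Indicators can produce +/-inf; treat those values as missing.
    clean = df.replace([np.inf, -np.inf], np.nan)
    # Remove columns that are entirely missing, then rows with any gap left.
    clean = clean.dropna(axis=1, how="all").dropna()
    # Exclude the label and the columns it was derived from, so the target does not leak into X.
    drop_cols = [label_col] + [c for c in helper_cols if c in clean.columns]
    X = clean.drop(columns=drop_cols)
    y = clean[label_col].astype(int)
    return X, y
```

Because `X` and `y` come from the same filtered frame, they share one index and stay aligned row for row.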

In [None]:
# REGIME DETECTION + LabelEncoder approach for SHIFTED labels
import sys
import os
import warnings
from pathlib import Path

# ---------------------------------------------------------------------------
# 1) SET PROJECT ROOT AND UPDATE PATH/WORKING DIRECTORY
# ---------------------------------------------------------------------------
project_root = Path.cwd().parent.parent  # Adjust if your notebook is in notebooks/time_series
sys.path.append(str(project_root))
os.chdir(str(project_root))
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import MetaTrader5 as mt5
import vectorbt as vbt
import joblib

# Our modules
from data.data_loader import get_data_mt5
from features.feature_engineering import add_all_ta_features
from models.model_training import (
    select_features_rf_reg,
    walk_forward_splits
)
from backtests.simple_backtest import simulate_trading, calculate_sharpe_ratio
from features.labeling_schemes import create_labels_regime_detection  # or define inline




# Sklearn / Models
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.naive_bayes import GaussianNB, BernoulliNB


###########################################################
# 1) DATA LOADING & FEATURE ENGINEERING
###########################################################
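if not mt5.initialize():
    # Stop here: without a connection, `data` would be undefined below
    raise RuntimeError(f"Failed to initialize MT5: {mt5.last_error()}")
try:
    data = get_data_mt5(symbol="BTCUSD", timeframe=mt5.TIMEFRAME_H4, n_bars=2000, start_pos=2000)
finally:
    mt5.shutdown()

df = add_all_ta_features(data)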

###########################################################
# 2) Regime Detection Labeling
###########################################################


df_lbl = create_labels_regime_detection(df, short_window=20, long_window=50)

# X => all features except the label columns
X = df_lbl.drop(columns=["regime_label", "ma_short", "ma_long"])
y = df_lbl["regime_label"]  # in {-1,0,+1}

###########################################################
# SHIFT LABELS? 
# We'll SHIFT them to [0,1,2] in each fold with LabelEncoder
# so we can handle folds with only two classes, e.g. {0,2}.
###########################################################

###########################################################
# 3) WALK-FORWARD SPLITS
###########################################################
folds = walk_forward_splits(X, y, n_splits=3)
print(f"Number of folds created: {len(folds)}")

###########################################################
# 4) DEFINE CLASSIFICATION MODELS
###########################################################
models = {
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=42),
    "GradientBoostingClassifier": GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=5, random_state=42),
    "SVC": SVC(C=1.0, kernel='rbf', probability=True),
    "XGBClassifier": XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42),
    "LGBMClassifier": LGBMClassifier(n_estimators=100, learning_rate=0.1, random_state=42),
    "GaussianNB": GaussianNB(),   # Gaussian naive Bayes
    "BernoulliNB": BernoulliNB()  # Bernoulli naive Bayes (binarizes each feature at 0 by default)
}

###########################################################
# 5) LOOP OVER FOLDS + SIMPLE BACKTEST (with LabelEncoder)
###########################################################
fold_results = {}

for fold_i, (X_train_fold, y_train_fold, X_test_fold, y_test_fold) in enumerate(folds, start=1):
    print(f"\n===== Fold {fold_i} =====")
    print(f"Train size: {len(X_train_fold)}, Test size: {len(X_test_fold)}")
    
    # 5.1 SHIFT labels with LabelEncoder so classes become consecutive integers
    le = LabelEncoder()
    
    # Fit on the training labels only. Whatever classes are present
    # (e.g. {-1,+1}, or {0,2} after a shift) are mapped to consecutive integers {0,1}.
    le.fit(y_train_fold)
    y_train_enc = le.transform(y_train_fold)
    
    # 5.2 Feature selection
    # We can pass the encoded y to select_features_rf_reg
    # because it just needs an estimator with .fit(X, y).
    # SHIFT isn't strictly necessary for random forest's .feature_importances_,
    # but let's be consistent.
    rf_for_fs = RandomForestClassifier(n_estimators=100, random_state=42)
    X_train_sel, selected_idx = select_features_rf_reg(
        X_train_fold, y_train_enc, estimator=rf_for_fs, max_features=20
    )
    feats = X_train_fold.columns[selected_idx]
    print(f"Selected features for Fold {fold_i}: {feats.tolist()}")

    X_test_sel = X_test_fold[feats]

    # 5.3 Scale
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train_sel)
    X_test_scaled  = scaler.transform(X_test_sel)

    fold_results[fold_i] = {}

    for model_name, model in models.items():
        # 5.4 Train on the encoded training labels
        model.fit(X_train_scaled, y_train_enc)

        # 5.5 Predict encoded classes on test
        preds_enc = model.predict(X_test_scaled)
        
        # 5.6 Decode them back to the original label set (which might be e.g. [-1,0,+1])
        preds_fold = le.inverse_transform(preds_enc)

        # 5.7 Evaluate accuracy on the unencoded test labels
        acc = accuracy_score(y_test_fold, preds_fold)

        # 5.8 Convert classification => signals
        # If your original labels are: +1 => up, -1 => down, 0 => sideways,
        # then we interpret them as: +1 => buy, -1 => sell, 0 => no trade
        signals = preds_fold

        # Align with test portion
        df_test_fold = df_lbl.loc[X_test_fold.index].copy()

        # 5.9 Simple backtest with cost
        daily_returns, total_return = simulate_trading(signals, df_test_fold, cost=0.0002)
        sr = calculate_sharpe_ratio(np.array(daily_returns))

        fold_results[fold_i][model_name] = {
            "Accuracy": acc,
            "TotalReturn": total_return,
            "Sharpe": sr
        }

###########################################################
# 6) PRINT RESULTS
###########################################################
for fold_i, model_dict in fold_results.items():
    print(f"\n=== Fold {fold_i} Results ===")
    for model_name, stats in model_dict.items():
        acc = stats["Accuracy"]
        ret = stats["TotalReturn"]
        sr = stats["Sharpe"]
        print(f"{model_name}: ACC={acc:.2f}, Return={ret:.2f}%, Sharpe={sr:.2f}")


Number of folds created: 3

===== Fold 1 =====
Train size: 237, Test size: 237
Selected features for Fold 1: ['trend_kst_sig', 'trend_macd_signal', 'momentum_ppo_signal', 'trend_visual_ichimoku_b', 'momentum_tsi', 'trend_kst', 'trend_trix', 'trend_aroon_up', 'trend_visual_ichimoku_a', 'trend_macd', 'momentum_ppo', 'momentum_stoch_rsi_k', 'volatility_ui', 'trend_ichimoku_b', 'momentum_stoch_rsi_d', 'trend_ichimoku_conv', 'trend_mass_index', 'trend_adx_neg', 'momentum_wr', 'momentum_stoch_rsi']




[LightGBM] [Info] Number of positive: 166, number of negative: 71
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000173 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1418
[LightGBM] [Info] Number of data points in the train set: 237, number of used features: 20
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.700422 -> initscore=0.849308
[LightGBM] [Info] Start training from score 0.849308

===== Fold 2 =====
Train size: 474, Test size: 237
Selected features for Fold 2: ['trend_kst_sig', 'trend_macd_signal', 'momentum_ppo_signal', 'trend_trix', 'trend_kst', 'momentum_tsi', 'momentum_ppo', 'trend_macd', 'momentum_stoch_rsi', 'momentum_ppo_hist', 'momentum_stoch_rsi_k', 'trend_macd_diff', 'trend_visual_ichimoku_a', 'trend_kst_diff', 'momentum_stoch_rsi_d', 'volume_nvi', 'trend_aroon_up', 'trend_aroon_ind', 'trend_aroon_down', 'trend_adx_neg']
[LightGBM] [Info] Number of positive: 347, number of ne

## Part 2: Model Training & Hyperparameter Tuning

**Objective:**  
Train an ML model (e.g., RandomForest, XGBoost) on the engineered features to predict the chosen labels.

**Tasks:**
- Perform time-based or walk-forward splits  
- Select top features if desired (e.g., using RandomForest feature importance)  
- Use `RandomizedSearchCV` or `GridSearchCV` to find optimal hyperparameters  
- Save the best model pipeline (e.g., `best_rf_pipeline.pkl`) 
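
To make the time-based split concrete, the sketch below runs scikit-learn's `TimeSeriesSplit` on 12 synthetic, time-ordered samples. The data is made up for illustration. Only the splitter and its `n_splits=3` setting correspond to the tuning code in this part.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 time-ordered samples (synthetic)
tscv = TimeSeriesSplit(n_splits=3)

sizes = []
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    # Every training index precedes every test index, so no future data leaks into training.
    assert train_idx.max() < test_idx.min()
    sizes.append((len(train_idx), len(test_idx)))
    print(f"fold {fold}: train rows {train_idx.min()}..{train_idx.max()}, "
          f"test rows {test_idx.min()}..{test_idx.max()}")
```

The training window grows (3, 6, then 9 rows) while the test window stays at 3 rows. The fold sizes printed in Part 1 (train 237 then 474, test 237) are consistent with the same expanding-window pattern, although `walk_forward_splits` itself is not shown here.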

In [None]:
# Code 2: Hyperparameter Tuning for a Classification Model (Regime Detection)

import sys
import os
import warnings
from pathlib import Path

# ---------------------------------------------------------------------------
# 1) SET PROJECT ROOT AND UPDATE PATH/WORKING DIRECTORY
# ---------------------------------------------------------------------------
project_root = Path.cwd().parent.parent  # Adjust if your notebook is in notebooks/time_series
sys.path.append(str(project_root))
os.chdir(str(project_root))
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import MetaTrader5 as mt5
import joblib

# Sklearn / Models
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, RandomizedSearchCV
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.pipeline import Pipeline

# Your modules
from data.data_loader import get_data_mt5
from features.feature_engineering import add_all_ta_features
from features.labeling_schemes import create_labels_regime_detection  # or define inline




###########################################################
# 1) DATA LOADING & FEATURE ENGINEERING
###########################################################
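if not mt5.initialize():
    # Stop here: without a connection, `data` would be undefined below
    raise RuntimeError(f"Failed to initialize MT5: {mt5.last_error()}")
try:
    data = get_data_mt5(symbol="BTCUSD", timeframe=mt5.TIMEFRAME_H4, n_bars=2000, start_pos=2000)
finally:
    mt5.shutdown()

df = add_all_ta_features(data)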

###########################################################
# 2) Regime Detection Labeling
###########################################################

# Create classification labels for regime detection
df_lbl = create_labels_regime_detection(df, short_window=20, long_window=50)

# Now we have columns: 'ma_short', 'ma_long', 'regime_label' in {-1,0,+1}
X_full = df_lbl.drop(columns=["regime_label", "ma_short", "ma_long"])
y_full = df_lbl["regime_label"]

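# SHIFT LABELS from [-1,0,+1] => [0,1,2]
# scikit-learn classifiers accept negative labels, so RandomForest does not need this;
# the shift keeps labels non-negative for libraries such as XGBoost.
# Note: if a class is absent (e.g. no 0 labels), the result is {0,2}, not consecutive.
y_full_shifted = y_full + 1  # -1->0, 0->1, +1->2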

print("Unique classes in y_full:", y_full.unique())
print("Unique classes in y_full_shifted:", y_full_shifted.unique())

# Ensure chronological order if needed
# X_full = X_full.sort_index()
# y_full_shifted = y_full_shifted.loc[X_full.index]

###########################################################
# 3) DEFINE YOUR TRAIN PORTION
###########################################################
# e.g., first 80% for hyperparam tuning
split_idx = int(len(X_full) * 0.8)
X_tune = X_full.iloc[:split_idx].copy()
y_tune_shifted = y_full_shifted.iloc[:split_idx].copy()

print(f"Tuning portion size: {len(X_tune)}")

###########################################################
# 4) TIME-BASED CV (TimeSeriesSplit)
###########################################################
tscv = TimeSeriesSplit(n_splits=3)

# We'll define a scoring for classification (accuracy)
scorer = make_scorer(accuracy_score)

###########################################################
# 5) BUILD A PIPELINE
###########################################################
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=42))
])

###########################################################
# 6) DEFINE PARAM DISTRIBUTIONS FOR RandomForestClassifier
###########################################################
param_distributions = {
    "clf__n_estimators": [100, 200, 300],
    "clf__max_depth": [None, 5, 10, 15],
    "clf__min_samples_split": [2, 5, 10],
    # "auto" was removed in scikit-learn 1.3; for classifiers it was equivalent to "sqrt"
    "clf__max_features": ["sqrt", 0.5]
}

###########################################################
# 7) SET UP RandomizedSearchCV
###########################################################
random_search = RandomizedSearchCV(
    estimator=pipeline,
    param_distributions=param_distributions,
    n_iter=10,               # how many random combos
    scoring=scorer,          # 'accuracy' metric
    cv=tscv,                 # time-based folds
    random_state=42,
    n_jobs=-1,
    verbose=2
)

###########################################################
# 8) FIT ON TUNING PORTION
###########################################################
random_search.fit(X_tune, y_tune_shifted)

print("Best params:", random_search.best_params_)
print("Best score (accuracy):", random_search.best_score_)

best_estimator = random_search.best_estimator_

###########################################################
# 9) SAVE THE BEST ESTIMATOR
###########################################################
os.makedirs("models/saved_models", exist_ok=True)  # make sure the target folder exists
joblib.dump(best_estimator, "models/saved_models/best_rf_rd_pipeline.pkl")
print("Saved best estimator to 'best_rf_rd_pipeline.pkl'")


Unique classes in y_full: [-1  1]
Unique classes in y_full_shifted: [0 2]
Tuning portion size: 1560
Fitting 3 folds for each of 10 candidates, totalling 30 fits
Best params: {'clf__n_estimators': 200, 'clf__min_samples_split': 5, 'clf__max_features': 'sqrt', 'clf__max_depth': 10}
Best score (accuracy): 0.9290598290598292
Saved best estimator to 'best_rf_rd_pipeline.pkl'


## Part 3: Backtesting & Performance Evaluation

**Objective:**  
Evaluate how well the trained model performs on unseen data, simulating real trades.

**Tasks:**
- Use walk-forward or expanding splits to mimic “live” conditions  
- Convert model predictions to signals ([-1, 0, +1] or buy/sell/hold)  
- Run a simple backtest script or VectorBT for performance metrics  
- Calculate returns, Sharpe ratio, drawdowns, confusion matrix, etc.  
- Visualize results (equity curve, trades, etc.) to judge strategy viability  
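
The step from signals to returns and a Sharpe ratio can be sketched without any backtesting library. This is a simplified illustration, not the project's `simulate_trading` or `calculate_sharpe_ratio`. The helper names, the one-bar execution lag, the cost model (charged per unit of position change) and the annualization factor of 6 × 365 four-hour bars per year for a market that trades around the clock are all assumptions.

```python
import numpy as np
import pandas as pd


def backtest_signals(close: pd.Series, signals: pd.Series,
                     cost: float = 0.0002, lag: int = 1) -> pd.Series:
    """Per-bar strategy returns for signals in {-1, 0, +1} (hypothetical helper)."""
    # Hold the position decided `lag` bars earlier, to avoid look-ahead.
    position = signals.shift(lag).fillna(0.0)
    bar_returns = close.pct_change().fillna(0.0)
    # Charge `cost` per unit of position change (a flip from +1 to -1 counts twice).
    turnover = position.diff().abs().fillna(position.abs())
    return position * bar_returns - cost * turnover


def sharpe_ratio(returns: pd.Series, bars_per_year: int = 6 * 365) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (hypothetical helper)."""
    std = returns.std()
    if std == 0 or np.isnan(std):
        return 0.0
    return float(returns.mean() / std * np.sqrt(bars_per_year))


close = pd.Series([100.0, 110.0, 121.0, 108.9])  # synthetic prices
signals = pd.Series([1, 1, -1, 0])               # synthetic model output
returns = backtest_signals(close, signals, cost=0.0002)
total_return = (1.0 + returns).prod() - 1.0      # a fraction, not a percentage
print(f"Total return: {total_return:.2%}, Sharpe: {sharpe_ratio(returns):.2f}")
```

Keeping the total return as a fraction and formatting it with `:.2%` avoids mixing fractional and percentage values in one report.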

In [None]:
import sys
import os
import warnings
from pathlib import Path

# ---------------------------------------------------------------------------
# 1) SET PROJECT ROOT AND UPDATE PATH/WORKING DIRECTORY
# ---------------------------------------------------------------------------
project_root = Path.cwd().parent.parent
sys.path.append(str(project_root))
os.chdir(str(project_root))
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import MetaTrader5 as mt5
import vectorbt as vbt
import joblib

# Our modules
from data.data_loader import get_data_mt5
from features.feature_engineering import add_all_ta_features
from features.labeling_schemes import create_labels_regime_detection

# Sklearn
from sklearn.metrics import accuracy_score

###########################################################
# 1) DATA LOADING & FEATURE ENGINEERING
###########################################################
if not mt5.initialize():
    # Stop here: without a connection, `data` would be undefined below
    raise RuntimeError(f"Failed to initialize MT5: {mt5.last_error()}")
try:
    # Fetch 2000 most recent bars for backtesting
    data = get_data_mt5(symbol="BTCUSD", timeframe=mt5.TIMEFRAME_H4, n_bars=2000, start_pos=0)
finally:
    mt5.shutdown()

# Add technical features
df = add_all_ta_features(data)

# Create regime detection labels
df_lbl = create_labels_regime_detection(df, short_window=20, long_window=50)

# Separate features and labels
X = df_lbl.drop(columns=["regime_label", "ma_short", "ma_long"])
y = df_lbl["regime_label"]          # in {-1, 0, +1}
y_shifted = y + 1                    # in {0,1,2} if your classifier used shifted labels

###########################################################
# 2) LOAD PRE-TRAINED CLASSIFICATION MODEL (NO RETRAINING)
###########################################################
best_pipeline = joblib.load("models/saved_models/best_rf_rd_pipeline.pkl")
print("Loaded best classification model from 'best_rf_rd_pipeline.pkl'")

# Identify the columns the pipeline was trained on
trained_columns = best_pipeline["scaler"].feature_names_in_  # Adjust if your pipeline step name differs

# Subset X to match these columns
X_test = X[trained_columns]

# Predict on the new dataset
preds_shifted = best_pipeline.predict(X_test)

# Shift predictions back to {-1, 0, +1}
preds = preds_shifted - 1

# Accuracy on the exact prediction rows
accuracy = accuracy_score(y.loc[X_test.index], preds)
print(f"\nOut-of-Sample Accuracy: {accuracy:.4f}")

###########################################################
# 3) BACKTEST via target exposure (-1, 0, +1)
###########################################################
# predictions already in {-1, 0, +1}
exposure = pd.Series(preds.astype(float), index=X_test.index)

# Align prices exactly to prediction rows (no padding)
close = df_lbl.loc[X_test.index, "close"]

# Optional: trade on next bar to avoid look-ahead (set to 0 for same-bar)
execution_lag = 1
if execution_lag > 0:
    exposure = exposure.shift(execution_lag).fillna(0.0)

fees = 0.0002  # 0.02% transaction cost per trade

pf = vbt.Portfolio.from_orders(
    close=close,
    size=exposure,              # -1 short, 0 flat, +1 long
    size_type='targetpercent',
    init_cash=10000,
    freq='4H',
    fees=fees
)

total_return = pf.total_return()
sharpe_ratio = pf.sharpe_ratio()

print("\nFull Backtest Results:")
# pf.total_return() returns a fraction, so format it as a percentage with ':.2%'
print(f"Accuracy={accuracy:.4f}, Return={total_return:.2%}, Sharpe={sharpe_ratio:.2f}")
print(pf.stats())

# Optional: Plot the backtest
fig = pf.plot()
fig.show()


Loaded best classification model from 'best_rf_rd_pipeline.pkl'

Out-of-Sample Accuracy: 0.9821

Running Full Backtest on the Last 2000 Bars...

Full Backtest Results:
Accuracy=0.98, Return=-0.17%, Sharpe=-0.39
Start                                2024-04-11 16:00:00
End                                  2025-03-02 16:00:00
Period                                 325 days 04:00:00
Start Value                                      10000.0
End Value                                     8340.15852
Total Return [%]                              -16.598415
Benchmark Return [%]                           21.623307
Max Gross Exposure [%]                             100.0
Total Fees Paid                                94.120141
Max Drawdown [%]                               29.083683
Max Drawdown Duration                  213 days 12:00:00
Total Trades                                          25
Total Closed Trades                                   25
Total Open Trades                               