# Model selection for Market Regime Prediction

In this notebook we determine which model gives the best performance for market regime prediction.

The three models we will try are Random Forest, Logistic Regression, and XGBoost.

In [1]:
import numpy as np
import pandas as pd

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    f1_score,
    classification_report,
)
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

## Dataset Preparation

We start by loading the dataset we are going to use for this analysis.

In [2]:
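# Load the dataset and drop the best_factor column, which is not used in this notebook.
df = pd.read_csv("dataset_factor_prediction.csv")
df = df.drop(columns=["best_factor"])

# Parse the dates, sort chronologically, and use the date as the index.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date").reset_index(drop=True)
df = df.set_index("date")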

We separate the target variable from the predictive variables.

In [3]:
target_regime_col = "market_regime"

stage1_feature_cols = [
    c for c in df.columns
    if c not in [target_regime_col]
]

stage2_feature_cols = stage1_feature_cols

In [4]:
df

date,market_regime,cpilfesl,1 yr core cpi,10 yr-core cpi,gdpc1,gdp yoy%,GDP YoY%,nfcirisk,nfcicredit,nfcileverage,...,healthcare,consumer_staples,consumer_discretionary,utilities,industrials,telecom,materials,gs1,gs2,gs10
1994-01-01,0,154.500,2.912621,25.457991,10939.116,3.316913,3.430707,-1.08379,-0.47922,-0.30827,...,-0.061834,-0.010436,-0.009915,-0.061372,-0.020356,-0.059392,-0.028812,0.0354,0.0414,0.0575
1994-02-01,0,154.800,2.971576,25.157233,10939.116,3.316913,3.430707,-0.96289,-0.52480,-0.25823,...,-0.061834,-0.010436,-0.009915,-0.061372,-0.020356,-0.059392,-0.028812,0.0387,0.0447,0.0597
1994-03-01,1,155.300,2.962009,25.094103,10939.116,3.316913,3.430707,-0.78907,-0.52248,-0.29539,...,-0.058534,-0.041859,-0.056363,-0.054472,-0.047056,-0.031477,-0.042874,0.0432,0.0500,0.0648
1994-04-01,1,155.500,3.151125,25.140713,11087.361,3.316913,4.225611,-0.68282,-0.44638,-0.43840,...,0.031335,0.024034,-0.003287,0.008703,-0.011440,0.022185,-0.000824,0.0482,0.0555,0.0697
1994-05-01,1,155.900,3.078897,24.812968,11087.361,3.316913,4.225611,-0.68497,-0.37455,-0.53126,...,0.057632,-0.014477,-0.004617,-0.054675,0.006034,0.015897,0.040915,0.0531,0.0597,0.0718
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-07-01,0,318.933,3.048603,35.848855,23478.570,2.783592,2.791390,-0.36354,0.00500,-0.09324,...,0.018446,0.006058,0.039492,0.012407,0.037648,0.007857,0.040583,0.0490,0.0450,0.0425
2024-08-01,1,319.839,3.112191,35.848855,23478.570,2.783592,2.791390,-0.38881,-0.00348,-0.24788,...,0.018446,0.006058,0.039492,0.012407,0.037648,0.007857,0.040583,0.0443,0.0397,0.0387
2024-09-01,0,320.835,3.025543,35.848855,23478.570,2.783592,2.791390,-0.41621,-0.03088,-0.33971,...,0.018446,0.006058,0.039492,0.012407,0.037648,0.007857,0.040583,0.0403,0.0362,0.0372
2024-10-01,0,321.688,3.025543,35.848855,23586.542,2.783592,2.399788,-0.42528,-0.06098,-0.31632,...,0.018446,0.006058,0.039492,0.012407,0.037648,0.007857,0.040583,0.0420,0.0397,0.0410


## Training of the Models

We define the **Out-of-Sample (OOS)** start date as February 2007.
* Training Data: January 1994 to January 2007 (all observations before the OOS start date)
* Test Data: February 2007 onward, capped at January 2025

In [5]:
oos_start = pd.Timestamp("2007-02-01")
oos_end   = pd.Timestamp("2025-01-31")

We split the dataset chronologically based on the `oos_start` date defined above. This ensures we strictly respect the time order, avoiding any look-ahead bias in our validation.

In [6]:
X_step1 = df[stage1_feature_cols].copy()
y_step1 = df[target_regime_col].astype(int)

train_mask = df.index < oos_start
test_mask = (df.index >= oos_start) & (df.index <= oos_end)

X_train_step1 = X_step1.loc[train_mask]
y_train_step1 = y_step1.loc[train_mask]

X_test_step1 = X_step1.loc[test_mask]
y_test_step1 = y_step1.loc[test_mask]

print("X_train_step1:", X_train_step1.shape)
print("y_train_step1:", y_train_step1.shape)
print("X_test_step1: ", X_test_step1.shape)
print("y_test_step1: ", y_test_step1.shape)

X_train_step1: (157, 25)
y_train_step1: (157,)
X_test_step1:  (214, 25)
y_test_step1:  (214,)
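
As a quick sanity check on the chronological split described above, the sketch below rebuilds the same two masks on a synthetic monthly index and asserts that every training date precedes every test date. The `demo` frame, its `feature` column, and the November 2024 end month are assumptions made for illustration; they are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic monthly index standing in for the real dataset (assumption).
dates = pd.date_range("1994-01-01", "2024-11-01", freq="MS")
demo = pd.DataFrame(
    {
        "feature": np.arange(len(dates)),          # placeholder predictor
        "market_regime": np.arange(len(dates)) % 2,  # placeholder target
    },
    index=dates,
)

oos_start = pd.Timestamp("2007-02-01")
oos_end = pd.Timestamp("2025-01-31")

# Same mask logic as the notebook cell above.
train_mask = demo.index < oos_start
test_mask = (demo.index >= oos_start) & (demo.index <= oos_end)

train, test = demo.loc[train_mask], demo.loc[test_mask]

# Every training date precedes every test date, and no row is in both sets.
assert train.index.max() < test.index.min()
assert not (train_mask & test_mask).any()

print("train rows:", len(train), "| test rows:", len(test))
```

Running the same two assertions on the real `df` would be a cheap guard against accidentally shuffled or duplicated dates.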


We define the three candidate classifiers with specific hyperparameters:
1.  Logistic Regression
2.  Random Forest
3.  XGBoost

We also define a TimeSeriesSplit (5 splits) for cross-validation, ensuring we test the models' stability over expanding time windows.

In [7]:
tscv_step1 = TimeSeriesSplit(n_splits=5)
RANDOM_STATE = 42

models_step1 = {
    "LogisticRegression": Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            (
                "clf",
                LogisticRegression(
                    class_weight="balanced",
                    max_iter=2000,
                    random_state=RANDOM_STATE,
                    n_jobs=-1,
                ),
            ),
        ]
    ),
    "RandomForest": RandomForestClassifier(
        n_estimators=500,
        max_depth=None,
        min_samples_split=5,
        min_samples_leaf=2,
        class_weight="balanced",
        n_jobs=-1,
        random_state=RANDOM_STATE,
    ),
    "XGBoost": XGBClassifier(
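        # num_class is inferred from the training labels. The classification
        # reports below show only regimes 0 and 1; for a two-class target,
        # "binary:logistic" is the more common objective.
        objective="multi:softmax",
        num_class=len(np.unique(y_train_step1)),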
        eval_metric="mlogloss",
        random_state=RANDOM_STATE,
        n_jobs=-1,
    ),
}
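
To see what "expanding time windows" means for a training set of this size, the sketch below lists the folds that `TimeSeriesSplit(n_splits=5)` produces for 157 rows (the training shape printed earlier). The `X_demo` array is a placeholder introduced for illustration; only its row count matters.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_train = 157                    # number of training rows printed above
X_demo = np.zeros((n_train, 1))  # placeholder features (assumption)

tscv = TimeSeriesSplit(n_splits=5)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(tscv.split(X_demo), start=1):
    # The training window always ends before the validation block starts.
    assert train_idx.max() < val_idx.min()
    fold_sizes.append((len(train_idx), len(val_idx)))
    print(
        f"fold {fold}: train rows 0-{train_idx.max()}, "
        f"validation rows {val_idx.min()}-{val_idx.max()}"
    )
```

Each validation block has 26 rows, and the first fold trains on only 27 rows, which may contribute to low and variable cross-validation scores on a dataset this small.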

We iterate through each model to:
1.  Fit the model on the training set (before February 2007).
2.  Predict market regimes on the test set (February 2007 onward).
3.  Calculate Metrics: Accuracy, Balanced Accuracy, and F1-Score (Macro).
4.  Run Cross-Validation: Perform Time-Series CV on the training data to check that the model isn't simply memorizing the past.
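
To make the three metrics in step 3 concrete, the sketch below computes them on a small set of toy labels and checks them against a hand calculation. The labels are hypothetical and are not taken from the dataset. Balanced accuracy is the mean of the per-class recalls, and macro F1 is the unweighted mean of the per-class F1 scores, so both give the minority class the same weight as the majority class.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Toy labels (hypothetical): six observations of regime 0, four of regime 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

toy_acc = accuracy_score(y_true, y_pred)
toy_bal_acc = balanced_accuracy_score(y_true, y_pred)
toy_f1_macro = f1_score(y_true, y_pred, average="macro")

# Hand calculation from the counts.
recall_0 = 5 / 6                 # 5 of the 6 true 0s are predicted as 0
recall_1 = 2 / 4                 # 2 of the 4 true 1s are predicted as 1
f1_0 = 2 * 5 / (2 * 5 + 2 + 1)   # class 0: TP=5, FP=2, FN=1
f1_1 = 2 * 2 / (2 * 2 + 1 + 2)   # class 1: TP=2, FP=1, FN=2

print(f"accuracy:          {toy_acc:.3f}")
print(f"balanced accuracy: {toy_bal_acc:.3f}")
print(f"F1 (macro):        {toy_f1_macro:.3f}")
```

On an imbalanced test set, plain accuracy can look acceptable even when one regime is rarely predicted correctly, which is why the two class-averaged metrics are reported alongside it.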

In [8]:
results = []
fitted_models_step1 = {}

for name, model in models_step1.items():
    print(f"\n===== {name} =====")

    model.fit(X_train_step1, y_train_step1)
    fitted_models_step1[name] = model

    y_pred = model.predict(X_test_step1)

    acc = accuracy_score(y_test_step1, y_pred)
    bal_acc = balanced_accuracy_score(y_test_step1, y_pred)
    f1_macro = f1_score(y_test_step1, y_pred, average="macro")

    print(f"Test accuracy:          {acc:.3f}")
    print(f"Test balanced accuracy: {bal_acc:.3f}")
    print(f"Test F1 (macro):        {f1_macro:.3f}")

    cv_scores = cross_val_score(
        model,
        X_train_step1,
        y_train_step1,
        cv=tscv_step1,
        scoring="f1_macro",
        n_jobs=-1,
    )
    print(f"CV F1 (macro), mean: {cv_scores.mean():.3f}, std: {cv_scores.std():.3f}")

    print("\nClassification report (test):")
    print(classification_report(y_test_step1, y_pred))

    results.append(
        {
            "model": name,
            "test_accuracy": acc,
            "test_balanced_accuracy": bal_acc,
            "test_f1_macro": f1_macro,
            "cv_f1_macro_mean": cv_scores.mean(),
            "cv_f1_macro_std": cv_scores.std(),
        }
    )


===== LogisticRegression =====
Test accuracy:          0.687
Test balanced accuracy: 0.644
Test F1 (macro):        0.648
CV F1 (macro), mean: 0.510, std: 0.113

Classification report (test):
              precision    recall  f1-score   support

           0       0.72      0.81      0.76       134
           1       0.60      0.47      0.53        80

    accuracy                           0.69       214
   macro avg       0.66      0.64      0.65       214
weighted avg       0.68      0.69      0.68       214


===== RandomForest =====
Test accuracy:          0.776
Test balanced accuracy: 0.803
Test F1 (macro):        0.774
CV F1 (macro), mean: 0.428, std: 0.090

Classification report (test):
              precision    recall  f1-score   support

           0       0.93      0.69      0.79       134
           1       0.64      0.91      0.75        80

    accuracy                           0.78       214
   macro avg       0.79      0.80      0.77       214
weighted avg       0.82

## Results

Based on the evaluation metrics above, we analyze the trade-offs to select our final Stage 1 model:

1.  **Random Forest:**
    * While it achieved the highest Test F1-Score (0.77), it performed poorly in Time-Series Cross-Validation (mean F1 0.43). This large gap suggests that the model may be overfitting and that its performance is not stable across time windows.

2.  **Logistic Regression:**
    * Its cross-validation F1 (mean 0.51) was closer to its test score than Random Forest's, but it had the lowest Test F1-Score (0.65). Being a linear model, it likely struggles to capture the complex, non-linear interactions between macroeconomic variables (e.g., the non-linear impact of inflation on market regimes).

3.  **XGBoost:**
    * XGBoost provided the best balance of the three. It outperformed Logistic Regression on the test set (Test F1 0.70) while showing more consistent generalization than Random Forest. It appears to capture non-linear market dynamics without the severe overfitting seen in the RF model.

Hence, we select **XGBoost** as our final Market Regime classifier for the rolling backtest.
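
The comparison can also be laid out as a small table. The sketch below is one way to do it, using only figures quoted in this notebook: the printed Logistic Regression and Random Forest metrics, and the XGBoost test F1 from the discussion. The XGBoost cross-validation score is left as NaN because no value for it is quoted in this section, and the `summary` and `test_minus_cv` names are introduced here for illustration. In the notebook itself, `pd.DataFrame(results)` would give the same layout directly from the loop.

```python
import pandas as pd

# Figures quoted in this notebook; NaN marks a value that is not quoted.
summary = pd.DataFrame(
    [
        {"model": "LogisticRegression", "test_f1_macro": 0.648, "cv_f1_macro_mean": 0.510},
        {"model": "RandomForest", "test_f1_macro": 0.774, "cv_f1_macro_mean": 0.428},
        {"model": "XGBoost", "test_f1_macro": 0.70, "cv_f1_macro_mean": float("nan")},
    ]
).set_index("model")

# Gap between out-of-sample test F1 and mean time-series CV F1.
summary["test_minus_cv"] = summary["test_f1_macro"] - summary["cv_f1_macro_mean"]

print(summary.sort_values("test_f1_macro", ascending=False).round(3))
```

A large positive gap flags a model whose test score is not corroborated by cross-validation on the training period.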