## Results Summary

This notebook consolidates outputs from feature engineering,
regime detection, trading strategy, machine learning models,
and outlier analysis into final result tables stored under
the `results/` directory.

No modeling or signal generation is performed here.


In [3]:
import pandas as pd
import numpy as np
import os

# Create results directory if it doesn't exist
RESULTS_DIR = "../results"
os.makedirs(RESULTS_DIR, exist_ok=True)


In [5]:
# Load outputs from previous stages
df_backtest = pd.read_csv("df_backtest.csv", parse_dates=["timestamp"])
df_regime = pd.read_csv("df_regime.csv", parse_dates=["timestamp"])
df_trades_ml = pd.read_csv("df_trades_ml.csv")


In [7]:
print("Backtest shape:", df_backtest.shape)
print("Regime shape:", df_regime.shape)
print("ML trades shape:", df_trades_ml.shape)

df_backtest.head(2)


Backtest shape: (99, 31)
Regime shape: (99, 23)
ML trades shape: (49, 12)


| | timestamp | open | high | low | close | volume | ema_5 | ema_15 | spot_return | futures_return | ... | market_regime_smooth | regime_group | ema_diff | cross_up | cross_down | signal | position | open_return | position_lag | strategy_return |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-08-25 | 226.48 | 229.3 | 226.23 | 227.16 | 30983133 | 227.56 | 227.685 | -0.002634 | 0.0 | ... | 1.0 | 1 | -0.125 | False | False | 0 | 0 | 0.0 | 0.0 | 0.0 |
| 1 | 2025-08-26 | 226.87 | 229.49 | 224.69 | 229.31 | 54575107 | 228.143333 | 227.888125 | 0.009465 | 0.0 | ... | 1.0 | 2 | 0.255208 | True | False | 1 | 0 | 0.001722 | 0.0 | 0.0 |


In [9]:
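# Sort chronologically before slicing so the 70/30 split is time-based
df_backtest = df_backtest.sort_values("timestamp").reset_index(drop=True)
split_idx = int(len(df_backtest) * 0.7)

train_df = df_backtest.iloc[:split_idx].copy()
test_df = df_backtest.iloc[split_idx:].copy()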

print("Train size:", train_df.shape)
print("Test size:", test_df.shape)


Train size: (69, 31)
Test size: (30, 31)


In [13]:
# Ensure chronological order 
df_backtest = df_backtest.sort_values("timestamp").reset_index(drop=True)
df_regime = df_regime.sort_values("timestamp").reset_index(drop=True)


In [15]:
# Add explicit train/test label for clarity
df_backtest["set"] = "train"
df_backtest.loc[df_backtest.index >= split_idx, "set"] = "test"

df_backtest["set"].value_counts()


set
train    69
test     30
Name: count, dtype: int64

## Data Integrity & Split Validation

- Data is sorted chronologically by `timestamp` so that the split is strictly time-based and no test-period rows fall inside the training window.
- A time-based 70/30 split is used:
  - First 70% → Training
  - Last 30% → Testing
- Train/Test labels are explicitly stored for transparent evaluation.
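
The split rule above can be checked mechanically. The following is a minimal sketch on a toy frame, not on the notebook's own data; the helper name `chronological_split` and the toy values are assumptions made for illustration. It sorts by `timestamp` before slicing, so the boundary check holds regardless of the input row order.

```python
import pandas as pd

def chronological_split(df, time_col="timestamp", train_frac=0.7):
    """Sort by time, then label the first train_frac of rows as train (hypothetical helper)."""
    out = df.sort_values(time_col).reset_index(drop=True)
    split_idx = int(len(out) * train_frac)
    out["set"] = "train"
    out.loc[out.index >= split_idx, "set"] = "test"
    return out

# Toy data in shuffled order to show that sorting happens before the split
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-01-05", "2025-01-01", "2025-01-03", "2025-01-02", "2025-01-04",
        "2025-01-10", "2025-01-08", "2025-01-06", "2025-01-09", "2025-01-07",
    ]),
    "strategy_return": [0.01, 0.0, -0.02, 0.005, 0.0, 0.01, -0.01, 0.0, 0.02, 0.003],
})

labeled = chronological_split(toy)
train_end = labeled.loc[labeled["set"] == "train", "timestamp"].max()
test_start = labeled.loc[labeled["set"] == "test", "timestamp"].min()

# No test observation may precede the last training observation
assert train_end < test_start
print(labeled["set"].value_counts().to_dict())
```

If the assertion fails on real data, the usual cause is that the frame was sliced before it was sorted.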


In [18]:
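# Performance Metric Functions
def sharpe_ratio(returns, freq=252*78):
    """Annualized Sharpe ratio (zero risk-free rate).

    freq is the number of bars per year. The default, 252*78, assumes
    5-minute bars (78 per 6.5-hour session); pass freq=252 for daily bars.
    """
    return np.sqrt(freq) * returns.mean() / returns.std()

def sortino_ratio(returns, freq=252*78):
    """Annualized Sortino ratio, using the std of negative returns as downside risk."""
    downside = returns[returns < 0]
    return np.sqrt(freq) * returns.mean() / downside.std()

def max_drawdown(equity_curve):
    """Maximum drawdown: the most negative peak-to-trough decline, as a fraction."""
    rolling_max = equity_curve.cummax()
    drawdown = (equity_curve - rolling_max) / rolling_max
    return drawdown.min()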


In [20]:
# Ensure equity curve exists
if "equity" not in df_backtest.columns:
    df_backtest["equity"] = (1 + df_backtest["strategy_return"]).cumprod()

# Split equity for train/test
train_equity = df_backtest.loc[df_backtest["set"] == "train", "equity"]
test_equity = df_backtest.loc[df_backtest["set"] == "test", "equity"]


### Performance Metrics

The following metrics are computed separately for training and testing periods:

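- **Total Return (%)**: arithmetic sum of per-bar strategy returns
- **Sharpe Ratio**: annualized with `freq=252*78` (5-minute bars)
- **Sortino Ratio**: same annualization, with downside risk taken from negative returns only
- **Maximum Drawdown**: largest peak-to-trough decline of the equity curve
- **Win Rate**: share of all rows with a positive strategy return (rows with zero return count in the denominator)
- **Total Trades**: number of rows with a non-zero strategy return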
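
As a hedged illustration of how much these definitions depend on convention, the sketch below computes total return and win rate two ways on a short made-up return series, alongside the drawdown of the compounded equity curve. All values and variable names are assumptions for illustration; nothing here is taken from the backtest data.

```python
import pandas as pd

# Made-up per-bar strategy returns; zeros are bars with no realized return
r = pd.Series([0.02, 0.0, -0.01, 0.0, 0.03, -0.02, 0.0, 0.01])

# Total return: arithmetic sum vs. compounded growth of one unit of capital
total_sum_pct = r.sum() * 100
total_compound_pct = ((1 + r).prod() - 1) * 100

# Win rate: share of all bars vs. share of active (non-zero) bars
active = r[r != 0]
win_rate_all_pct = (r > 0).mean() * 100
win_rate_active_pct = (active > 0).mean() * 100

# Maximum drawdown from the compounded equity curve
equity = (1 + r).cumprod()
max_dd_pct = ((equity - equity.cummax()) / equity.cummax()).min() * 100

print(f"Total return: sum {total_sum_pct:.3f}%, compounded {total_compound_pct:.3f}%")
print(f"Win rate: all bars {win_rate_all_pct:.1f}%, active bars {win_rate_active_pct:.1f}%")
print(f"Max drawdown: {max_dd_pct:.3f}%")
```

The two win-rate figures diverge whenever the strategy is flat for part of the period, so it helps to state the denominator whenever train and test win rates are compared.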


In [22]:
# Baseline Strategy Metrics

baseline_metrics = {
    "Total Return (%)": [
        train_df["strategy_return"].sum() * 100,
        test_df["strategy_return"].sum() * 100
    ],
    "Sharpe Ratio": [
        sharpe_ratio(train_df["strategy_return"]),
        sharpe_ratio(test_df["strategy_return"])
    ],
    "Sortino Ratio": [
        sortino_ratio(train_df["strategy_return"]),
        sortino_ratio(test_df["strategy_return"])
    ],
    "Max Drawdown (%)": [
        max_drawdown(train_equity) * 100,
        max_drawdown(test_equity) * 100
    ],
    "Win Rate (%)": [
        (train_df["strategy_return"] > 0).mean() * 100,
        (test_df["strategy_return"] > 0).mean() * 100
    ],
    "Total Trades": [
        (train_df["strategy_return"] != 0).sum(),
        (test_df["strategy_return"] != 0).sum()
    ]
}

baseline_metrics_df = pd.DataFrame(
    baseline_metrics,
    index=["Train", "Test"]
).T.round(3)

baseline_metrics_df


| Metric | Train | Test |
|---|---|---|
| Total Return (%) | 8.837 | -1.808 |
| Sharpe Ratio | 15.373 | -11.677 |
| Sortino Ratio | 25.384 | -9.665 |
| Max Drawdown (%) | -5.467 | -4.675 |
| Win Rate (%) | 52.174 | 16.667 |
| Total Trades | 59.0 | 11.0 |


### Interpretation Note

- The baseline strategy is profitable in-sample (8.837% total return on the
  training period) but loses money out-of-sample (-1.808% on the test period),
  indicating limited generalization.
- High Sharpe/Sortino values in training are driven by:
  - Small sample size (69 training rows, 30 test rows)
  - Annualization of 5-minute returns: `freq=252*78` multiplies the per-bar
    ratio by √19656 ≈ 140
  - Low in-sample volatility
- This behavior motivates regime filtering and ML-based trade selection
  explored in later sections.
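
To make the annualization point concrete, here is a small sketch that applies two different `periods_per_year` values to the same made-up return series. The series is randomly generated and the function name `sharpe` is an assumption for illustration; only the scaling behavior matters, not the specific values.

```python
import numpy as np
import pandas as pd

def sharpe(returns, periods_per_year):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()

# Made-up per-bar returns, used only to show the scaling effect
rng = np.random.default_rng(0)
r = pd.Series(rng.normal(loc=0.001, scale=0.01, size=69))

per_bar = r.mean() / r.std()
as_daily = sharpe(r, 252)        # bars treated as daily
as_5min = sharpe(r, 252 * 78)    # bars treated as 5-minute (78 per 6.5-hour session)

print(f"Per-bar ratio: {per_bar:.4f}")
print(f"Annualized with 252 periods: {as_daily:.3f}")
print(f"Annualized with 252*78 periods: {as_5min:.3f}")
print(f"Scaling from the frequency choice alone: {as_5min / as_daily:.3f}x")
```

The last line always prints the square root of 78 (about 8.83), whatever the data: the assumed bar frequency changes the reported ratio by that factor, so it should match the actual spacing of the timestamps.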


In [25]:
# Save baseline metrics to results directory
baseline_metrics_df.to_csv(
    os.path.join(RESULTS_DIR, "baseline_metrics.csv")
)

print("Saved: results/baseline_metrics.csv")


Saved: results/baseline_metrics.csv


## Baseline Trade Log

This table records all executed trades generated by the **baseline strategy**.

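Each row corresponds to a bar with a non-zero `strategy_return`, i.e. an interval
in which the strategy held a position and realized a gain or loss.

**Column description:**
- `timestamp` – Timestamp of the bar
- `signal` – Raw EMA-crossover signal for the bar
- `position` – Final position taken by the strategy
- `open_return` – Market return for the interval
- `strategy_return` – Strategy-adjusted return after signals and regime filtering
- `market_regime_smooth` – Detected market regime at trade time
- `set` – Train/Test split label
- `signal_type` – Label for `signal` (LONG / SHORT / FLAT)
- `trade_side` – Label for `position` (LONG / SHORT / FLAT)

`signal` can be 0 (FLAT) on bars where a previously opened position is still held,
so `trade_side` is the column that reflects the side actually traded.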

This trade log is exported as a CSV file for transparent inspection and validation.
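
Because the log holds one row per bar, a position held over several consecutive bars appears as several rows. If a per-trade view is wanted, consecutive bars with the same position can be collapsed into a single round trip. The sketch below shows one way to do that on made-up data; the column names `block`, `entry`, `exit`, `bars_held` and `trade_return` are assumptions, and this grouping is not part of the exported CSV.

```python
import pandas as pd

# Made-up bar-level log: position per bar and that bar's strategy return
bars = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=8, freq="D"),
    "position": [0, 1, 1, 0, -1, -1, -1, 0],
    "strategy_return": [0.0, 0.01, -0.004, 0.0, 0.006, 0.002, -0.003, 0.0],
})

# A new block starts whenever the position differs from the previous bar
bars["block"] = (bars["position"] != bars["position"].shift()).cumsum()

trades = (
    bars[bars["position"] != 0]
    .groupby("block")
    .agg(
        entry=("timestamp", "first"),
        exit=("timestamp", "last"),
        side=("position", "first"),
        bars_held=("position", "count"),
        # Compound the per-bar returns within each round trip
        trade_return=("strategy_return", lambda x: (1 + x).prod() - 1),
    )
    .reset_index(drop=True)
)

print(trades)
```

Counting rows of `trades` gives the number of round trips, which is generally smaller than the number of bars with a non-zero return.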


In [27]:
# Baseline Trade Log

trade_log = df_backtest.loc[
    df_backtest["strategy_return"] != 0,
    [
        "timestamp",
        "signal",
        "position",
        "open_return",
        "strategy_return",
        "market_regime_smooth",
        "set"
    ]
].copy()

trade_log.reset_index(drop=True, inplace=True)

trade_log.head()


| | timestamp | signal | position | open_return | strategy_return | market_regime_smooth | set |
|---|---|---|---|---|---|---|---|
| 0 | 2025-08-28 | 0 | 1 | 0.009667 | 0.009667 | 1.0 | train |
| 1 | 2025-08-29 | 0 | 1 | 0.007322 | 0.007322 | -1.0 | train |
| 2 | 2025-09-02 | 0 | 1 | -0.014021 | -0.014021 | 1.0 | train |
| 3 | 2025-09-03 | 0 | 1 | 0.034722 | 0.034722 | -1.0 | train |
| 4 | 2025-09-04 | 0 | 1 | 0.005227 | 0.005227 | 1.0 | train |


In [31]:
trade_log["signal_type"] = trade_log["signal"].map({
    1: "LONG",
    -1: "SHORT",
    0: "FLAT"
})

trade_log.head()


| | timestamp | signal | position | open_return | strategy_return | market_regime_smooth | set | signal_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2025-08-28 | 0 | 1 | 0.009667 | 0.009667 | 1.0 | train | FLAT |
| 1 | 2025-08-29 | 0 | 1 | 0.007322 | 0.007322 | -1.0 | train | FLAT |
| 2 | 2025-09-02 | 0 | 1 | -0.014021 | -0.014021 | 1.0 | train | FLAT |
| 3 | 2025-09-03 | 0 | 1 | 0.034722 | 0.034722 | -1.0 | train | FLAT |
| 4 | 2025-09-04 | 0 | 1 | 0.005227 | 0.005227 | 1.0 | train | FLAT |


In [35]:
# Human-readable trade side
trade_log["trade_side"] = trade_log["position"].map({
    1: "LONG",
    -1: "SHORT",
    0: "FLAT"
})

trade_log.head()


| | timestamp | signal | position | open_return | strategy_return | market_regime_smooth | set | signal_type | trade_side |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-08-28 | 0 | 1 | 0.009667 | 0.009667 | 1.0 | train | FLAT | LONG |
| 1 | 2025-08-29 | 0 | 1 | 0.007322 | 0.007322 | -1.0 | train | FLAT | LONG |
| 2 | 2025-09-02 | 0 | 1 | -0.014021 | -0.014021 | 1.0 | train | FLAT | LONG |
| 3 | 2025-09-03 | 0 | 1 | 0.034722 | 0.034722 | -1.0 | train | FLAT | LONG |
| 4 | 2025-09-04 | 0 | 1 | 0.005227 | 0.005227 | 1.0 | train | FLAT | LONG |


In [37]:
trade_log.to_csv(
    os.path.join(RESULTS_DIR, "trade_log_baseline.csv"),
    index=False
)

print("Saved: results/trade_log_baseline.csv")


Saved: results/trade_log_baseline.csv


## Regime Summary

In [40]:
# Map regime codes to names
regime_map = {
    1: "Uptrend",
    -1: "Downtrend",
    0: "Sideways"
}

df_backtest["regime_label"] = df_backtest["market_regime_smooth"].map(regime_map)


In [42]:
# Regime-wise Performance Summary
regime_summary = (
    df_backtest[df_backtest["strategy_return"] != 0]
    .groupby("regime_label")
    .agg(
        Total_Trades=("strategy_return", "count"),
        Win_Rate_pct=("strategy_return", lambda x: (x > 0).mean() * 100),
        Avg_Return_pct=("strategy_return", lambda x: x.mean() * 100),
        Total_Return_pct=("strategy_return", lambda x: x.sum() * 100)
    )
    .round(3)
)

regime_summary


| regime_label | Total_Trades | Win_Rate_pct | Avg_Return_pct | Total_Return_pct |
|---|---|---|---|---|
| Downtrend | 33 | 63.636 | 0.267 | 8.803 |
| Uptrend | 37 | 54.054 | -0.048 | -1.773 |


In [44]:
regime_summary.to_csv(
    os.path.join(RESULTS_DIR, "regime_summary.csv")
)

print("Saved: results/regime_summary.csv")


Saved: results/regime_summary.csv


## Regime-wise Performance Analysis

This section analyzes how the baseline strategy performs under different
market regimes detected by the HMM model.

Trades are grouped by regime label, and key performance statistics are
computed for each regime to understand regime sensitivity.
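
A useful sanity check on a grouped summary of this kind is that the per-regime rows reconcile with the overall totals, and that any regime with no active bars is noticed rather than silently dropped. The sketch below runs that check on a made-up frame; the data and the `missing` variable are assumptions for illustration.

```python
import pandas as pd

# Made-up rows: one regime ("Sideways") has no bar with a non-zero return
toy = pd.DataFrame({
    "regime_label": ["Uptrend", "Downtrend", "Uptrend", "Downtrend", "Uptrend", "Sideways"],
    "strategy_return": [0.01, 0.02, -0.015, 0.005, 0.0, 0.0],
})

active = toy[toy["strategy_return"] != 0]

summary = active.groupby("regime_label").agg(
    Total_Trades=("strategy_return", "count"),
    Win_Rate_pct=("strategy_return", lambda x: (x > 0).mean() * 100),
    Total_Return_pct=("strategy_return", lambda x: x.sum() * 100),
)

# Per-regime counts and returns should add up to the overall figures
assert summary["Total_Trades"].sum() == len(active)
assert abs(summary["Total_Return_pct"].sum() - active["strategy_return"].sum() * 100) < 1e-9

# Regimes that exist in the data but have no active bars drop out of the summary
missing = set(toy["regime_label"]) - set(summary.index)

print(summary)
print("Regimes without active bars:", sorted(missing))
```

A regime that is defined in the label map but absent from the summary table simply had no bars with a non-zero return in the sample.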


## Overall Findings & Conclusions

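- The baseline EMA strategy shows strong in-sample performance (8.837% total
  return) but fails to generalize out-of-sample (-1.808%).
- Strategy performance is highly regime-dependent: downtrend regimes contribute
  8.803% across 33 trades, while uptrend regimes contribute -1.773% across
  37 trades. No trades fall in a sideways regime.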
- These findings justify the use of:
  - Regime-aware filtering
  - Machine learning–based trade selection
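
As a rough sketch of what regime-aware filtering can look like, the example below keeps returns only in regimes whose average return was positive on the training window, then applies that rule to the test window. This is an assumed, deliberately simple rule and not the method used in the dedicated notebooks; the toy numbers are constructed so that the filter helps and say nothing about the real strategy.

```python
import pandas as pd

# Made-up labeled returns; values are illustrative only
toy = pd.DataFrame({
    "set": ["train"] * 6 + ["test"] * 4,
    "regime_label": [
        "Uptrend", "Downtrend", "Uptrend", "Downtrend", "Uptrend", "Downtrend",
        "Uptrend", "Downtrend", "Downtrend", "Uptrend",
    ],
    "strategy_return": [-0.01, 0.02, 0.004, 0.01, -0.006, 0.003,
                        -0.005, 0.008, -0.002, -0.004],
})

# Choose the allowed regimes from the training window only, to avoid look-ahead
train = toy[toy["set"] == "train"]
train_mean = train.groupby("regime_label")["strategy_return"].mean()
allowed = set(train_mean[train_mean > 0].index)

# Zero out returns in regimes that are not allowed
toy["filtered_return"] = toy["strategy_return"].where(
    toy["regime_label"].isin(allowed), 0.0
)

test = toy[toy["set"] == "test"]
print("Allowed regimes:", sorted(allowed))
print("Test return, unfiltered (%):", round(test["strategy_return"].sum() * 100, 3))
print("Test return, filtered (%):", round(test["filtered_return"].sum() * 100, 3))
```

The important design choice is that `allowed` is derived from training rows alone; selecting regimes using the full sample would leak test information into the rule.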


## Machine Learning & Outlier Analysis (Summary)

Machine learning models (XGBoost and LSTM) were trained separately to predict
trade profitability and filter baseline signals.

Detailed model training, validation, and outlier pattern analysis are documented
in dedicated notebooks.

This results notebook focuses exclusively on aggregated performance metrics
and final trade-level outputs.
