# Barrier Analysis Example

This notebook demonstrates how to use the `BarrierAnalysis` module to evaluate signal quality using triple barrier outcomes (take-profit, stop-loss, timeout) instead of raw returns.

**Key Features:**
- Hit rate analysis by signal quantile
- Profit factor analysis with monotonicity tests
- Precision/recall curves for signal quality
- Time-to-target analysis for trade timing
- Interactive Plotly visualizations
- HTML/JSON export

**Reference:** López de Prado (2018), *Advances in Financial Machine Learning*, Chapter 3

In [None]:
from datetime import date, timedelta

import numpy as np
import polars as pl

# ML4T Diagnostic imports
from ml4t.diagnostic.config import BarrierAnalysisConfig
from ml4t.diagnostic.evaluation import BarrierAnalysis
from ml4t.diagnostic.visualization import (
    plot_hit_rate_heatmap,
    plot_precision_recall_curve,
    plot_profit_factor_bar,
    plot_time_to_target_box,
)

## 1. Generate Synthetic Data

We'll create synthetic data where:
- Higher signal values predict take-profit (TP) outcomes
- Lower signal values predict stop-loss (SL) outcomes

This simulates a momentum-like signal with predictive power.

In [None]:
np.random.seed(42)

# Generate dates and assets
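n_dates = 252  # Same count as the trading days in a typical year
n_assets = 50
start_date = date(2023, 1, 1)

# Consecutive calendar days for simplicity; weekends and holidays are not skipped
dates = [start_date + timedelta(days=i) for i in range(n_dates)]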
assets = [f"ASSET_{i:03d}" for i in range(n_assets)]

print(f"Dataset: {n_dates} dates x {n_assets} assets = {n_dates * n_assets:,} observations")

In [None]:
# Create signal data
signal_rows = []
for d in dates:
    for asset in assets:
        # Signal is a momentum-like score from -1 to 1
        signal = np.random.uniform(-1, 1)
        signal_rows.append({"date": d, "asset": asset, "signal": signal})

signal_df = pl.DataFrame(signal_rows)
print("Signal DataFrame:")
signal_df.head()

In [None]:
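# Create barrier labels correlated with signal
barrier_rows = []

# Reuse the signals stored in signal_rows. Drawing a fresh random number here
# would make the labels independent of the signal that is being analyzed.
signal_lookup = {(row["date"], row["asset"]): row["signal"] for row in signal_rows}

for d in dates:
    for asset in assets:
        signal = signal_lookup[(d, asset)]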

        # Higher signals -> higher TP probability
        # Lower signals -> higher SL probability
        p_tp = 0.25 + 0.35 * (signal + 1) / 2  # Range: 0.25 to 0.60
        p_sl = 0.30 - 0.15 * (signal + 1) / 2  # Range: 0.30 to 0.15
        p_timeout = 1 - p_tp - p_sl

        # Sample outcome
        outcome = np.random.choice([1, -1, 0], p=[p_tp, p_sl, p_timeout])

        # Generate return based on outcome
        if outcome == 1:  # Take-profit
            ret = np.random.uniform(0.02, 0.05)  # 2-5% gain
            bars = np.random.randint(3, 15)  # 3-14 bars (upper bound is exclusive)
        elif outcome == -1:  # Stop-loss
            ret = np.random.uniform(-0.03, -0.01)  # 1-3% loss
            bars = np.random.randint(2, 10)  # 2-9 bars, so stops exit sooner than targets on average
        else:  # Timeout
            ret = np.random.uniform(-0.01, 0.01)  # Small return
            bars = 20  # Hit max holding period

        barrier_rows.append(
            {
                "date": d,
                "asset": asset,
                "label": outcome,
                "label_return": ret,
                "label_bars": bars,
            }
        )

barrier_df = pl.DataFrame(barrier_rows)
print("Barrier Labels DataFrame:")
barrier_df.head()

In [None]:
# Check outcome distribution
print("\nOutcome Distribution:")
# pl.len() replaces pl.count(), which is deprecated in recent Polars releases
outcome_counts = barrier_df.group_by("label").agg(pl.len().alias("count")).sort("label")
print(outcome_counts)
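
As a sanity check on the counts above, the expected outcome shares can be worked out from the generator's probability formulas. The sketch below copies the linear mapping used for `p_tp` and `p_sl`; the helper name `outcome_probs` is introduced here for illustration only.

```python
def outcome_probs(signal: float) -> tuple[float, float, float]:
    """Return (p_tp, p_sl, p_timeout) using the same linear mapping as the generator."""
    u = (signal + 1) / 2  # rescale the signal from [-1, 1] to [0, 1]
    p_tp = 0.25 + 0.35 * u
    p_sl = 0.30 - 0.15 * u
    return p_tp, p_sl, 1 - p_tp - p_sl


# The mapping is linear in the signal, so averaging over a uniform signal on
# [-1, 1] gives the same result as evaluating at the midpoint, signal = 0.
expected_shares = outcome_probs(0.0)
n_obs = 252 * 50  # dates x assets
for name, share in zip(("TP (1)", "SL (-1)", "Timeout (0)"), expected_shares):
    print(f"{name:<12} expected share {share:.3f} (~{share * n_obs:,.0f} rows)")
```

The observed counts should land close to these shares (about 42.5% TP, 22.5% SL, 35% timeout), with some sampling noise.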

## 2. Create BarrierAnalysis

Configure and instantiate the analysis with our data.

In [None]:
# Configure analysis
config = BarrierAnalysisConfig(
    n_quantiles=10,  # Use deciles (D1-D10)
    signal_name="momentum",  # Name for reports
    significance_level=0.05,  # Alpha for statistical tests
)

# Create analysis instance
analysis = BarrierAnalysis(signal_df, barrier_df, config=config)

print("Analysis created:")
print(f"  - Observations: {analysis.n_observations:,}")
print(f"  - Assets: {analysis.n_assets}")
print(f"  - Dates: {analysis.n_dates}")
print(f"  - Date range: {analysis.date_range[0]} to {analysis.date_range[1]}")
print(f"  - Quantile labels: {analysis.quantile_labels}")
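
To make the `n_quantiles=10` setting concrete, here is a minimal sketch of rank-based bucketing into deciles. It assumes all rows are pooled and ranked together; whether `BarrierAnalysis` buckets per date or across the whole sample is not shown in this notebook, and `assign_quantiles` is a hypothetical helper, not part of the library.

```python
from collections import Counter


def assign_quantiles(signals, n_quantiles):
    """Label each signal D1..Dn by its rank, producing near-equal bucket sizes."""
    order = sorted(range(len(signals)), key=lambda i: signals[i])
    labels = [""] * len(signals)
    for rank, idx in enumerate(order):
        bucket = rank * n_quantiles // len(signals)  # 0 .. n_quantiles - 1
        labels[idx] = f"D{bucket + 1}"
    return labels


# 20 distinct toy signals in [-0.95, 0.95], listed in a shuffled order
toy_signals = [((7 * i) % 20) / 10 - 0.95 for i in range(20)]
toy_buckets = assign_quantiles(toy_signals, n_quantiles=10)

print(Counter(toy_buckets))  # each decile receives two signals
```

The lowest signals fall in D1 and the highest in D10, which is the ordering the later sections rely on.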

## 3. Hit Rate Analysis

Analyze how hit rates (TP%, SL%, Timeout%) vary by signal quantile.
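
For intuition, the sketch below computes hit rates and a Pearson chi-square statistic by hand from a small table of outcome counts. The counts are made up for illustration, and the calculation is the textbook test of independence; it is not taken from the library's implementation.

```python
LABELS = (1, -1, 0)  # take-profit, stop-loss, timeout

# Hypothetical outcome counts per signal quantile (not output from the library)
toy_counts = {
    "D1": {1: 10, -1: 30, 0: 20},
    "D2": {1: 30, -1: 10, 0: 20},
}

# Hit rates: share of each outcome within a quantile
hit_rate_table = {
    q: {label: row[label] / sum(row.values()) for label in LABELS}
    for q, row in toy_counts.items()
}

# Pearson chi-square statistic for independence of quantile and outcome
row_totals = {q: sum(row.values()) for q, row in toy_counts.items()}
col_totals = {label: sum(row[label] for row in toy_counts.values()) for label in LABELS}
grand_total = sum(row_totals.values())

chi2 = 0.0
for q, row in toy_counts.items():
    for label in LABELS:
        expected_count = row_totals[q] * col_totals[label] / grand_total
        chi2 += (row[label] - expected_count) ** 2 / expected_count

dof = (len(toy_counts) - 1) * (len(LABELS) - 1)
print(f"TP rate in D2: {hit_rate_table['D2'][1]:.2f}")  # 0.50
print(f"chi2 = {chi2:.1f} with {dof} degrees of freedom")  # chi2 = 20.0 with 2 degrees of freedom
```

A large statistic relative to its degrees of freedom means the outcome mix differs across quantiles more than chance alone would explain.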

In [None]:
# Compute hit rates
hit_rates = analysis.compute_hit_rates()

print("Hit Rate Summary:")
print(f"  - Chi-square statistic: {hit_rates.chi2_statistic:.2f}")
print(f"  - P-value: {hit_rates.chi2_p_value:.6f}")
print(f"  - Significant: {hit_rates.is_significant}")
print(f"  - TP rate monotonic: {hit_rates.tp_rate_monotonic} ({hit_rates.tp_rate_direction})")
print(f"  - Spearman correlation: {hit_rates.tp_rate_spearman:.3f}")

In [None]:
# View hit rates as DataFrame
hit_rates.get_dataframe("hit_rates")

In [None]:
# Visualize hit rate heatmap
fig = plot_hit_rate_heatmap(hit_rates, show_counts=True, show_chi2=True)
fig.show()

## 4. Profit Factor Analysis

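Analyze profit factor by quantile. Profit factor is the sum of TP returns divided by the absolute value of the sum of SL returns, so timeout returns do not enter the ratio under this definition.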
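
The definition can be sketched in a few lines of plain Python. The helper `profit_factor_of` and the toy data are hypothetical, and the handling of a zero-loss bucket (returning infinity or NaN) is an assumption made here, not documented library behavior.

```python
def profit_factor_of(labels, returns):
    """Sum of take-profit returns divided by |sum of stop-loss returns|."""
    gross_profit = sum(r for lab, r in zip(labels, returns) if lab == 1)
    gross_loss = abs(sum(r for lab, r in zip(labels, returns) if lab == -1))
    if gross_loss == 0:
        # Assumption: no stop-loss exits means the ratio is unbounded or undefined
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


# Toy trades: label 1 = TP, -1 = SL, 0 = timeout (ignored by the ratio)
toy_labels = [1, 1, -1, 0, -1, 1]
toy_returns = [0.03, 0.04, -0.02, 0.005, -0.015, 0.02]

toy_pf = profit_factor_of(toy_labels, toy_returns)
print(f"Profit factor: {toy_pf:.2f}")  # 0.09 / 0.035, about 2.57
```

Applying the same function to the rows of each quantile gives the per-quantile values that the bar chart displays.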

In [None]:
# Compute profit factor
profit_factor = analysis.compute_profit_factor()

print("Profit Factor Summary:")
print(f"  - Overall PF: {profit_factor.overall_profit_factor:.2f}")
print(f"  - PF monotonic: {profit_factor.pf_monotonic} ({profit_factor.pf_direction})")
print(f"  - Spearman correlation: {profit_factor.pf_spearman:.3f}")
print(f"  - Overall avg return: {profit_factor.overall_avg_return:.4%}")

In [None]:
# View profit factor by quantile
profit_factor.get_dataframe("profit_factor")

In [None]:
# Visualize profit factor bar chart
fig = plot_profit_factor_bar(profit_factor, show_reference_line=True, show_average_return=True)
fig.show()

## 5. Precision/Recall Analysis

Analyze precision and recall for identifying TP outcomes when trading top signal quantiles.
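
As an illustration of the cumulative view, the sketch below starts from the top quantile and adds one quantile at a time, treating every trade in the selected quantiles as a predicted TP. The counts are invented, and defining lift as precision divided by the baseline TP rate is an assumption about how the metric is built.

```python
# Hypothetical (quantile, n_trades, n_tp) rows, ordered from the top signal quantile down
quantile_counts = [("D4", 100, 60), ("D3", 100, 45), ("D2", 100, 35), ("D1", 100, 20)]

total_trades = sum(n for _, n, _ in quantile_counts)
total_tp = sum(tp for _, _, tp in quantile_counts)
baseline = total_tp / total_trades  # TP rate when trading everything

pr_rows = []
cum_trades = cum_tp = 0
for name, n, tp in quantile_counts:
    cum_trades += n
    cum_tp += tp
    precision = cum_tp / cum_trades  # share of selected trades that hit TP
    recall = cum_tp / total_tp  # share of all TP outcomes captured
    f1 = 2 * precision * recall / (precision + recall) if cum_tp else 0.0
    pr_rows.append(
        {"through": name, "precision": precision, "recall": recall,
         "f1": f1, "lift": precision / baseline}
    )

best = max(pr_rows, key=lambda r: r["f1"])
for r in pr_rows:
    print(f"top..{r['through']}: P={r['precision']:.3f} R={r['recall']:.3f} "
          f"F1={r['f1']:.3f} lift={r['lift']:.2f}")
print(f"Best F1 when trading down through {best['through']}")
```

Precision falls and recall rises as more quantiles are included; the best F1 marks the cutoff where that trade-off is most balanced.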

In [None]:
# Compute precision/recall
precision_recall = analysis.compute_precision_recall()

print("Precision/Recall Summary:")
print(f"  - Baseline TP rate: {precision_recall.baseline_tp_rate:.2%}")
print(f"  - Total TP count: {precision_recall.total_tp_count:,}")
print(f"  - Best F1 quantile: {precision_recall.best_f1_quantile}")
print(f"  - Best F1 score: {precision_recall.best_f1_score:.4f}")

In [None]:
# View cumulative precision/recall
precision_recall.get_dataframe("cumulative")

In [None]:
# Visualize precision/recall curves
fig = plot_precision_recall_curve(precision_recall, show_f1_peak=True, show_lift=True)
fig.show()

## 6. Time-to-Target Analysis

Analyze how quickly trades exit (bars to exit) by quantile and outcome type.
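
A minimal sketch of the underlying aggregation, using only the standard library: group bars-to-exit by quantile and outcome, then summarize each group. The rows are toy data, and the grouping shown here is an assumption about what the summary contains.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (quantile, label, bars_to_exit) rows; label 1 = TP, -1 = SL, 0 = timeout
toy_exits = [
    ("D1", 1, 12), ("D1", -1, 3), ("D1", -1, 5), ("D1", 0, 20),
    ("D10", 1, 4), ("D10", 1, 6), ("D10", -1, 8), ("D10", 0, 20),
]

# Collect exit times for each (quantile, outcome) pair
bars_by_group = defaultdict(list)
for quantile, label, bars in toy_exits:
    bars_by_group[(quantile, label)].append(bars)

timing_summary = {
    key: {"n": len(values), "mean_bars": mean(values), "median_bars": median(values)}
    for key, values in bars_by_group.items()
}

for (quantile, label), stats in sorted(timing_summary.items()):
    print(f"{quantile:>3} label={label:>2}: n={stats['n']} "
          f"mean={stats['mean_bars']:.1f} median={stats['median_bars']:.1f}")
```

In this toy table, TP exits in D10 take 5 bars on average against 12 in D1, which is the kind of pattern to look for in the real output.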

In [None]:
# Compute time-to-target
time_to_target = analysis.compute_time_to_target()

print("Time-to-Target Summary:")
print(f"  - Overall mean bars: {time_to_target.overall_mean_bars:.1f}")
print(f"  - Overall median bars: {time_to_target.overall_median_bars:.1f}")
print(f"  - TP mean bars: {time_to_target.overall_mean_bars_tp:.1f}")
print(f"  - SL mean bars: {time_to_target.overall_mean_bars_sl:.1f}")

In [None]:
# View detailed time-to-target by quantile
time_to_target.get_dataframe("detailed")

In [None]:
# Visualize time-to-target (comparison mode: TP vs SL side by side)
fig = plot_time_to_target_box(time_to_target, outcome_type="comparison", show_median_line=True)
fig.show()

In [None]:
# Alternative: view all outcomes
fig = plot_time_to_target_box(time_to_target, outcome_type="all", show_median_line=True)
fig.show()

## 7. Complete Tear Sheet

Generate a tear sheet that combines all analyses, then optionally export it to HTML or JSON.

In [None]:
# Create complete tear sheet with visualizations
tear_sheet = analysis.create_tear_sheet(
    include_figures=True,
    include_time_to_target=True,
    theme="default",
)

print("Tear Sheet Summary:")
print(tear_sheet.summary())

In [None]:
# List available DataFrames
print("Available DataFrames:")
for name in tear_sheet.list_dataframes():
    print(f"  - {name}")

In [None]:
# Export to HTML (uncomment to save)
# tear_sheet.save_html("barrier_analysis_report.html")
# print("Report saved to barrier_analysis_report.html")

In [None]:
# Export to JSON (uncomment to save)
# tear_sheet.save_json("barrier_analysis_metrics.json", exclude_figures=True)
# print("Metrics saved to barrier_analysis_metrics.json")

## 8. Interpretation

### What to Look For

1. **Hit Rate Heatmap:**
   - Green cells (high TP rate) should cluster in high quantiles (D8-D10)
   - Red cells (high SL rate) should cluster in low quantiles (D1-D3)
   - A significant chi-square test indicates that the outcome mix differs across quantiles, i.e. the signal carries information about outcomes

2. **Profit Factor Bar:**
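   - Bars above the 1.0 reference line mean TP gains outweigh SL losses in that quantile
   - A rising pattern from D1 to D10 indicates that profit factor increases monotonically with the signal
   - As a rule of thumb, an overall PF above 1.5 suggests a strong signal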

3. **Precision/Recall Curves:**
   - Precision above the baseline TP rate indicates better-than-random selection
   - The best F1 quantile marks the cutoff that best balances precision and recall
   - Lift > 1 means the signal beats random selection

4. **Time-to-Target:**
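   - TP trades exiting faster than SL trades is a good sign ("winners run fast"). The synthetic data here draws SL exits from a shorter range (2-9 bars) than TP exits (3-14 bars), so this example will not show that pattern
   - Faster TP exits in the top quantiles than in the bottom quantiles support the signal's quality
   - A large spread in exit times may point to regime-dependent behavior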
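
The monotonicity checks mentioned above can be approximated with a Spearman rank correlation between quantile order and the metric. The sketch below implements it in plain Python on made-up TP rates; the helper names are hypothetical, and the library's own monotonicity test may differ.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving ties the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # assumes neither input is constant


def is_strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


quantile_index = [1, 2, 3, 4, 5]
toy_tp_rates = [0.26, 0.31, 0.30, 0.38, 0.45]  # hypothetical TP rate per quantile

rho = spearman(quantile_index, toy_tp_rates)
print(f"Spearman rho: {rho:.2f}")  # 0.90
print(f"Strictly increasing: {is_strictly_increasing(toy_tp_rates)}")  # False
```

A single out-of-order quantile breaks strict monotonicity while leaving the rank correlation high, which is why it helps to read both numbers together.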

## 9. Advanced: Theme Customization

All visualizations accept a `theme` argument. The cells below show the `dark`, `print`, and `presentation` themes.

In [None]:
# Dark theme (for dark mode presentations)
fig = plot_hit_rate_heatmap(hit_rates, theme="dark")
fig.show()

In [None]:
# Print theme (for papers and reports)
fig = plot_profit_factor_bar(profit_factor, theme="print")
fig.show()

In [None]:
# Presentation theme (larger fonts)
fig = plot_precision_recall_curve(precision_recall, theme="presentation")
fig.show()

---

## Summary

The `BarrierAnalysis` module provides a comprehensive framework for evaluating signal quality using triple barrier outcomes. Key methods:

| Method | Returns | Purpose |
|--------|---------|--------|
| `compute_hit_rates()` | `HitRateResult` | TP/SL/Timeout rates by quantile |
| `compute_profit_factor()` | `ProfitFactorResult` | Profit factor by quantile |
| `compute_precision_recall()` | `PrecisionRecallResult` | Precision/recall curves |
| `compute_time_to_target()` | `TimeToTargetResult` | Bars to exit analysis |
| `create_tear_sheet()` | `BarrierTearSheet` | Complete analysis + figures |

For more details, see the [module source](../src/ml4t/diagnostic/evaluation/barrier_analysis.py).