<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [1]</a>'.</span>

# Research: ML Strategy Robustness Analysis

## Objective

Validate the robustness of the Machine Learning strategy for BTC by exploring:

1. **Impact of an extended training period**: Train on 2017-2022 (5 years) vs 2021-2022 (2 years)
2. **Feature stability by regime**: Which features matter in bull vs bear markets?
3. **Optimal retraining frequency**: Compare 30, 60, and 90 days
4. **Walk-forward validation**: The decisive test for detecting overfitting
5. **Comparison with buy-and-hold**: Does ML really beat HODL?

## Challenges of ML in Trading

Applying Machine Learning to crypto trading raises several major challenges:

### 1. Overfitting
- Models can "memorize" past patterns without genuinely generalizing
- Symptom: excellent performance on training data, poor performance on test data
- Solution: walk-forward validation, regularization, limiting model complexity
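
The train/test gap described above can be illustrated on synthetic data. The sketch below is only an illustration with made-up numbers (a noisy straight line, hypothetical helper `mse`), not the notebook's model: it fits a simple and a flexible polynomial and reports both errors. The flexible fit can never have a higher training error; how much worse it does on the held-out points depends on the random draw.

```python
import numpy as np

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

rng = np.random.default_rng(1)
x_train = np.linspace(-1.0, 1.0, 20)
x_test = np.linspace(-0.95, 0.95, 20)
# Synthetic data: a straight line plus noise (not market data).
y_train = 0.5 * x_train + rng.normal(0.0, 0.3, x_train.size)
y_test = 0.5 * x_test + rng.normal(0.0, 0.3, x_test.size)

results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE={results[degree][0]:.3f}, "
          f"test MSE={results[degree][1]:.3f}")
```

A large gap between the two columns for the flexible model is the symptom to watch for; the same comparison applies to a classifier's in-sample and out-of-sample accuracy.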

### 2. Data Leakage
- Unintentionally using future information to train the model
- Examples: features computed over the entire time series, global normalization
- Solution: strict train/test separation, rolling computation of features
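
The global-normalization pitfall can be made concrete with a small sketch. The function names (`zscore_global`, `zscore_train_only`) and the synthetic series are assumptions introduced for illustration; they are not part of the strategy code.

```python
import numpy as np

def zscore_global(x):
    """Leaky: mean/std are computed over the full series, test period included."""
    return (x - x.mean()) / x.std()

def zscore_train_only(x, n_train):
    """Leak-free: mean/std come from the training slice only, then applied everywhere."""
    mu, sigma = x[:n_train].mean(), x[:n_train].std()
    return (x - mu) / sigma

rng = np.random.default_rng(0)
# Synthetic feature whose level shifts in the test period (a stand-in for a regime change).
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 100)])
n_train = 500

leaky = zscore_global(x)
clean = zscore_train_only(x, n_train)

# With global statistics, the training rows are already shifted by the future regime.
print(f"train mean, global stats:     {leaky[:n_train].mean():+.3f}")
print(f"train mean, train-only stats: {clean[:n_train].mean():+.3f}")
```

With train-only statistics the training rows are centered on zero by construction, while the global version moves them because it has already seen the later level shift.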

### 3. Regime Changes
- Crypto markets alternate between bull and bear phases with different characteristics
- A model trained on a bull market can fail in a bear market
- Solution: frequent retraining, adaptive features, walk-forward validation

### 4. Stationarity
- Price distributions change over time (they are non-stationary)
- The relationships between features and labels are not constant
- Solution: differenced features (returns), sliding windows
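
A quick way to see why returns are preferred over price levels is to compare how each behaves across two halves of a sample. This sketch uses a synthetic drifting price path with illustrative parameters (not BTC data).

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic daily returns with a small positive drift (illustrative numbers).
returns = rng.normal(loc=0.002, scale=0.01, size=1000)
prices = 100.0 * np.cumprod(1.0 + returns)

half = prices.size // 2
price_shift = prices[half:].mean() / prices[:half].mean()
return_shift = returns[half:].mean() - returns[:half].mean()

print(f"mean price, 2nd half / 1st half:  {price_shift:.2f}x")
print(f"mean return, 2nd half - 1st half: {return_shift:+.5f}")

# Differencing the price path recovers the return series
# (the NumPy equivalent of pandas' Series.pct_change()).
recovered = prices[1:] / prices[:-1] - 1.0
```

The price level drifts far away from its earlier mean, whereas the mean return barely moves, which is why the features in this notebook are ratios and returns rather than raw prices.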

## Why Walk-Forward Is the Gold Standard

**Walk-forward validation** is generally regarded as one of the most robust ways to test an ML strategy:

```
Train [========] Test [==] Retrain [========] Test [==] Retrain [...]
  2017-2018         2019      2018-2019         2020      2019-2020
```

**Advantages**:
- Mirrors the production process (train on the past, predict the future)
- Tests the model's ability to adapt to regime changes
- Measures how stable performance is over time
- Avoids the bias of cherry-picking a favorable test period

**Key parameters**:
- **Train period**: How much historical data is used for training (e.g., 730 days)
- **Test period**: Length of the out-of-sample prediction window (e.g., 90 days)
- **Retrain interval**: How often the model is updated (e.g., 30, 60, 90 days)
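
The three parameters above translate directly into a window generator. This is a minimal sketch with a hypothetical helper name (`walk_forward_windows`) and positional slices; the notebook's own implementation appears in Cell 6.

```python
def walk_forward_windows(n_samples, train_days=730, test_days=90, step_days=90):
    """Yield (train_slice, test_slice) pairs of positional slices.

    step_days plays the role of the retrain interval: how far the
    whole window advances before the model is refit.
    """
    start = 0
    while start + train_days + test_days <= n_samples:
        train = slice(start, start + train_days)
        test = slice(start + train_days, start + train_days + test_days)
        yield train, test
        start += step_days

windows = list(walk_forward_windows(n_samples=1000))
for train, test in windows:
    print(f"train [{train.start}, {train.stop})  test [{test.start}, {test.stop})")
```

When `step_days` equals `test_days`, the test windows tile the timeline without overlap, so per-window returns can be compounded. With a smaller step the test windows overlap, and compounding them would count some days more than once.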

## Hypotheses to Test

1. **H1**: Training on 2017-2022 (5 years, including bull and bear markets) improves the Sharpe ratio vs 2021-2022 (2 years)
2. **H2**: The important features change with the regime (RSI in bear markets, momentum in bull markets)
3. **H3**: Retraining every 30 days offers the best trade-off between adaptation and stability
4. **H4**: The ML strategy beats buy-and-hold after transaction costs
5. **H5**: Walk-forward results show consistent performance (no lucky period)
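
Most of these hypotheses are judged on the Sharpe ratio and the maximum drawdown. The sketch below shows one way to compute both, following the conventions the notebook's code cells use (365 periods per year because crypto trades every day, no risk-free rate). The function names are hypothetical helpers, not part of the strategy.

```python
import numpy as np

def annualized_sharpe(daily_returns, periods_per_year=365):
    """Annualized mean return divided by annualized volatility (risk-free rate assumed 0)."""
    r = np.asarray(daily_returns, dtype=float)
    vol = r.std() * np.sqrt(periods_per_year)
    return (r.mean() * periods_per_year) / vol if vol > 0 else 0.0

def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded equity curve (a value <= 0)."""
    equity = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    running_peak = np.maximum.accumulate(equity)
    return float((equity / running_peak - 1.0).min())

example = [0.10, -0.50, 0.20]
print(f"Sharpe: {annualized_sharpe(example):.3f}")
print(f"Max drawdown: {max_drawdown(example):.1%}")
```

Because drawdowns are negative, a value closer to zero is better; comparisons between strategies need to keep that sign convention in mind.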

<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [1]:
# Cell 2: Setup and data loading (AlgorithmImports is only available inside the QuantConnect research environment)
from AlgorithmImports import *
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Configuration
qb = QuantBook()
print("QuantBook initialized")

# Load BTC from 2017 onward (maximize the window)
btc = qb.AddCrypto("BTCUSDT", Resolution.Daily, Market.Binance).Symbol
start_date = datetime(2017, 1, 1)
end_date = datetime.now()

print(f"Loading BTC data from {start_date} to {end_date}...")
history = qb.History(btc, start_date, end_date, Resolution.Daily)

# Flatten to a plain single-symbol DataFrame
if isinstance(history.index, pd.MultiIndex):
    df = history.loc[btc].copy()
else:
    df = history.copy()

print(f"\nBTC data loaded:")
print(f"  - Total bars: {len(df)}")
print(f"  - Date range: {df.index[0]} to {df.index[-1]}")
print(f"  - Price range: ${df['close'].min():.2f} to ${df['close'].max():.2f}")
print(f"  - Total return: {(df['close'].iloc[-1] / df['close'].iloc[0] - 1) * 100:.1f}%")

# Plot the BTC price over the full period
plt.figure(figsize=(14, 6))
plt.plot(df.index, df['close'], linewidth=1.5, label='BTC Price')
plt.title('Bitcoin Price History (2017-Present)', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price (USDT)')
plt.yscale('log')
plt.grid(True, alpha=0.3)

# Regime markers
plt.axvspan(datetime(2017, 1, 1), datetime(2018, 12, 31), alpha=0.1, color='red', label='2017-2018 Bull+Bear')
plt.axvspan(datetime(2019, 1, 1), datetime(2020, 12, 31), alpha=0.1, color='orange', label='2019-2020 Recovery+Bull')
plt.axvspan(datetime(2021, 1, 1), datetime(2022, 12, 31), alpha=0.1, color='blue', label='2021-2022 Bull+Bear')
plt.axvspan(datetime(2023, 1, 1), datetime.now(), alpha=0.1, color='green', label='2023+ Recovery')
plt.legend()  # called after axvspan so the regime labels appear in the legend

plt.tight_layout()
plt.show()

print("\nData ready for feature engineering.")

ModuleNotFoundError: No module named 'AlgorithmImports'

In [None]:
# Cell 3: Feature Engineering - compute the 9 ML features

def compute_features(df_input):
    """
    Compute the 9 features used by the strategy, plus the label:
    1. SMA_ratio (close / SMA(20))
    2. RSI_14
    3. DailyReturn (pct_change)
    4-7. EMA ratios (EMA(10), EMA(20), EMA(50), EMA(200), each divided by close)
    8. ADX proxy (14-day mean True Range)
    9. ATR / close (normalized)
    Label: next-day direction (1 = up, 0 = flat or down)
    """
    df = df_input.copy()
    features = pd.DataFrame(index=df.index)
    
    # 1. SMA ratio
    features['sma_ratio'] = df['close'] / df['close'].rolling(20).mean()
    
    # 2. RSI (14-period, simple-moving-average variant; Wilder's RSI would use exponential smoothing)
    delta = df['close'].diff()
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)
    avg_gain = gain.rolling(14).mean()
    avg_loss = loss.rolling(14).mean()
    rs = avg_gain / avg_loss.replace(0, 1e-10)
    features['rsi'] = 100 - (100 / (1 + rs))
    
    # 3. Daily return
    features['daily_return'] = df['close'].pct_change()
    
    # 4-7. EMA ratios (normalized by current price)
    for period in [10, 20, 50, 200]:
        ema = df['close'].ewm(span=period, adjust=False).mean()
        features[f'ema_{period}'] = ema / df['close']
    
    # 8. "ADX" proxy: 14-day mean True Range in price units (not a true ADX, which is built from directional movement)
    high_low = df['high'] - df['low']
    high_close = (df['high'] - df['close'].shift()).abs()
    low_close = (df['low'] - df['close'].shift()).abs()
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
    features['adx'] = true_range.rolling(14).mean()
    
    # 9. ATR normalized
    features['atr'] = true_range.rolling(14).mean() / df['close']
    
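    # LABEL: next-day direction (1 = price up, 0 = price flat or down)
    # shift(-1) reads tomorrow's close, so it may only be used for the target, never for a feature
    next_close = df['close'].shift(-1)
    features['label'] = (next_close > df['close']).astype(float).where(next_close.notna())
    
    # Drop NaN rows (indicator warm-up days + the last day, which has no label)
    features_clean = features.dropna().copy()
    features_clean['label'] = features_clean['label'].astype(int)
    
    return features_clean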

# Compute the features over the full period
print("Computing features...")
features_df = compute_features(df)

print(f"\nFeatures computed:")
print(f"  - Total samples: {len(features_df)}")
print(f"  - Features: {[c for c in features_df.columns if c != 'label']}")
print(f"  - Label distribution: {features_df['label'].value_counts().to_dict()}")
print(f"  - Date range: {features_df.index[0]} to {features_df.index[-1]}")

# Leakage sanity check: in-sample accuracy of a shallow model on the first 1000 samples.
# A very high score on noisy daily data hints at leakage; a low score does not prove its absence.
from sklearn.ensemble import RandomForestClassifier
feature_cols = [c for c in features_df.columns if c != 'label']
X_leakage_test = features_df[feature_cols].iloc[:1000].values
y_leakage_test = features_df['label'].iloc[:1000].values
rf_leakage = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=42)
rf_leakage.fit(X_leakage_test, y_leakage_test)
leakage_score = rf_leakage.score(X_leakage_test, y_leakage_test)

print(f"\nData leakage check (training accuracy on same period):")
print(f"  - Score: {leakage_score:.3f}")
if leakage_score > 0.65:
    print("  - ⚠️ WARNING: Possible data leakage (score too high)")
else:
    print("  - ✓ OK: No obvious data leakage (score reasonable for noisy financial data)")

print("\nFeatures ready for ML training.")

In [None]:
# Cell 4: Hypothesis 1 - Impact of the extended training period
# Compare: 2021-2022 (2 years) vs 2017-2022 (5 years)
# Test on: 2023 to the end of the data

def train_and_evaluate(features, train_start, train_end, test_start, test_end, label="Experiment"):
    """
    Train a RandomForest on [train_start, train_end] and test it on [test_start, test_end].
    Return the performance metrics.
    """
    # Train/test split
    train_df = features.loc[train_start:train_end]
    test_df = features.loc[test_start:test_end]
    
    if len(train_df) < 100 or len(test_df) < 30:
        print(f"Insufficient data for {label}")
        return None
    
    feature_cols = [c for c in features.columns if c != 'label']
    X_train = train_df[feature_cols].values
    y_train = train_df['label'].values
    X_test = test_df[feature_cols].values
    y_test = test_df['label'].values
    
    # Training
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    
    # Predictions
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]  # Probability of an up move
    
    # Classification metrics
    accuracy = accuracy_score(y_test, y_pred)
    
    # Simple trading simulation (long only when P(up) > 0.6)
    signals = (y_proba > 0.6).astype(float) * y_proba  # Probability-weighted position size
    daily_returns = test_df['daily_return'].values
    
    # Shift the signal by one day to avoid lookahead: the signal from day t earns day t+1's return
    strategy_returns = daily_returns[1:] * signals[:-1]
    
    # Trading metrics
    mean_ret = strategy_returns.mean() * 365  # Annualized (crypto trades every day)
    std_ret = strategy_returns.std() * np.sqrt(365)
    sharpe = mean_ret / std_ret if std_ret > 0 else 0
    
    # Cumulative returns (NumPy arrays have no .cummax(), so use np.maximum.accumulate)
    cumulative = (1 + strategy_returns).cumprod()
    max_dd = (cumulative / np.maximum.accumulate(cumulative) - 1).min()
    
    # Buy-and-hold benchmark
    bnh_returns = daily_returns[1:]
    bnh_cumulative = (1 + bnh_returns).cumprod()
    bnh_total = bnh_cumulative[-1] - 1
    strategy_total = cumulative[-1] - 1
    
    return {
        'label': label,
        'train_samples': len(train_df),
        'test_samples': len(test_df),
        'accuracy': accuracy,
        'sharpe': sharpe,
        'annual_return': mean_ret,
        'volatility': std_ret,
        'max_drawdown': max_dd,
        'total_return': strategy_total,
        'bnh_total_return': bnh_total,
        'excess_return': strategy_total - bnh_total,
        'model': model,
        'feature_importance': dict(zip(feature_cols, model.feature_importances_))
    }

# Experiment 1: train on 2021-2022 (the strategy's current training period)
print("Experiment 1: Training on 2021-2022 (2 years)")
result_2y = train_and_evaluate(
    features_df,
    train_start="2021-01-01",
    train_end="2022-12-31",
    test_start="2023-01-01",
    test_end=features_df.index[-1],
    label="2Y Training (2021-2022)"
)

# Experiment 2: train on 2017-2022 (extended period)
print("\nExperiment 2: Training on 2017-2022 (5 years)")
result_5y = train_and_evaluate(
    features_df,
    train_start="2017-01-01",
    train_end="2022-12-31",
    test_start="2023-01-01",
    test_end=features_df.index[-1],
    label="5Y Training (2017-2022)"
)

# Comparison
comparison = pd.DataFrame([result_2y, result_5y]).set_index('label')
print("\n" + "="*80)
print("HYPOTHESIS 1: Extended Training Period Impact")
print("="*80)
print(comparison[['train_samples', 'accuracy', 'sharpe', 'annual_return', 'max_drawdown', 'total_return', 'excess_return']])

# Verdict
print("\nVERDICT:")
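if result_5y['sharpe'] > result_2y['sharpe']:
    # Relative change is measured against |2Y Sharpe| so the sign stays meaningful when the baseline is negative
    base = abs(result_2y['sharpe'])
    improvement = (result_5y['sharpe'] - result_2y['sharpe']) / base * 100 if base > 0 else float('nan')
    print(f"✓ H1 CONFIRMED: 5Y training improves Sharpe by {improvement:.1f}%")
    print(f"  Sharpe: {result_2y['sharpe']:.3f} → {result_5y['sharpe']:.3f}")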
else:
    print(f"✗ H1 REJECTED: 2Y training performs better")
    print(f"  Sharpe: {result_2y['sharpe']:.3f} (2Y) vs {result_5y['sharpe']:.3f} (5Y)")

print(f"\nExcess return vs Buy-and-Hold:")
print(f"  2Y: {result_2y['excess_return']*100:+.1f}%")
print(f"  5Y: {result_5y['excess_return']*100:+.1f}%")

In [None]:
# Cell 5: Hypothesis 2 - Feature Importance by Regime
# Analyze which features matter in bull vs bear markets

def train_by_regime(features, start, end, regime_name):
    """
    Train on a specific period and return the feature importances.
    """
    regime_df = features.loc[start:end]
    
    if len(regime_df) < 100:
        return None
    
    feature_cols = [c for c in features.columns if c != 'label']
    X = regime_df[feature_cols].values
    y = regime_df['label'].values
    
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X, y)
    
    importance_dict = dict(zip(feature_cols, model.feature_importances_))
    
    # Total price return over the period, used to characterize the regime
    total_return = (df.loc[start:end, 'close'].iloc[-1] / df.loc[start:end, 'close'].iloc[0] - 1) * 100
    
    return {
        'regime': regime_name,
        'period': f"{start} to {end}",
        'samples': len(regime_df),
        'total_return': total_return,
        'importance': importance_dict
    }

# Regime definitions
regimes = [
    ("2017-01-01", "2017-12-31", "2017 Bull"),
    ("2018-01-01", "2018-12-31", "2018 Bear"),
    ("2019-01-01", "2020-03-31", "2019-Q1'20 Recovery"),
    ("2020-04-01", "2021-11-30", "2020-2021 Bull"),
    ("2021-12-01", "2022-12-31", "2022 Bear"),
    ("2023-01-01", features_df.index[-1].strftime("%Y-%m-%d"), "2023+ Recovery")
]

regime_results = []
for start, end, name in regimes:
    result = train_by_regime(features_df, start, end, name)
    if result:
        regime_results.append(result)
        print(f"Regime: {name:20s} | Return: {result['total_return']:+6.1f}% | Samples: {result['samples']}")

# Build a regime-by-feature importance matrix
importance_matrix = pd.DataFrame([r['importance'] for r in regime_results],
                                  index=[r['regime'] for r in regime_results])

print("\n" + "="*80)
print("HYPOTHESIS 2: Feature Importance Stability Across Regimes")
print("="*80)
print(importance_matrix.round(3))

# Heatmap visualization
plt.figure(figsize=(12, 8))
sns.heatmap(importance_matrix.T, annot=True, fmt='.3f', cmap='YlOrRd', cbar_kws={'label': 'Importance'})
plt.title('Feature Importance by Market Regime', fontsize=14, fontweight='bold')
plt.xlabel('Market Regime')
plt.ylabel('Feature')
plt.tight_layout()
plt.show()

# Stability analysis
importance_std = importance_matrix.std(axis=0).sort_values(ascending=False)
print("\nFeature Stability (std dev across regimes):")
print(importance_std)

print("\nVERDICT:")
most_stable = importance_std.idxmin()
least_stable = importance_std.idxmax()
print(f"  Most stable feature: {most_stable} (std={importance_std[most_stable]:.4f})")
print(f"  Least stable feature: {least_stable} (std={importance_std[least_stable]:.4f})")

# Top feature per regime
print("\nTop feature by regime:")
for regime in importance_matrix.index:
    top_feature = importance_matrix.loc[regime].idxmax()
    top_value = importance_matrix.loc[regime, top_feature]
    print(f"  {regime:20s}: {top_feature:15s} ({top_value:.3f})")

In [None]:
# Cell 6: Walk-Forward Validation - The Gold Standard
# Train on 2 years, test on 3 months, roll forward

def walk_forward_validation(features, train_days=730, test_days=90, retrain_interval=60):
    """
    Walk-forward validation.
    
    Parameters:
    - train_days: Number of days in each training window
    - test_days: Number of days in each test window
    - retrain_interval: Number of days the window advances between retrains
    
    Returns a DataFrame with one row of results per window.
    """
    results = []
    feature_cols = [c for c in features.columns if c != 'label']
    
    idx = 0
    window_num = 0
    
    while idx + train_days + test_days <= len(features):
        window_num += 1
        
        # Train/test split
        train_slice = features.iloc[idx:idx+train_days]
        test_slice = features.iloc[idx+train_days:idx+train_days+test_days]
        
        X_train = train_slice[feature_cols].values
        y_train = train_slice['label'].values
        X_test = test_slice[feature_cols].values
        y_test = test_slice['label'].values
        
        # Training
        model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
        model.fit(X_train, y_train)
        
        # Predictions
        y_proba = model.predict_proba(X_test)[:, 1]
        y_pred = (y_proba > 0.5).astype(int)
        
        accuracy = accuracy_score(y_test, y_pred)
        
        # Trading simulation
        signals = (y_proba > 0.6).astype(float) * y_proba
        returns = test_slice['daily_return'].values
        
        # Avoid lookahead: the signal from day t earns day t+1's return
        strategy_returns = returns[1:] * signals[:-1]
        
        # Metrics
        mean_ret = strategy_returns.mean() * 365
        std_ret = strategy_returns.std() * np.sqrt(365)
        sharpe = mean_ret / std_ret if std_ret > 0 else 0
        
        cumulative = (1 + strategy_returns).cumprod()
        total_return = cumulative[-1] - 1 if len(cumulative) > 0 else 0
        
        results.append({
            'window': window_num,
            'train_start': train_slice.index[0],
            'train_end': train_slice.index[-1],
            'test_start': test_slice.index[0],
            'test_end': test_slice.index[-1],
            'accuracy': accuracy,
            'sharpe': sharpe,
            'total_return': total_return,
            'n_trades': (signals > 0).sum()  # days with a long signal, not round-trip trades
        })
        
        print(f"Window {window_num}: Test {test_slice.index[0].strftime('%Y-%m-%d')} → {test_slice.index[-1].strftime('%Y-%m-%d')} | "
              f"Sharpe: {sharpe:.3f} | Accuracy: {accuracy:.2%} | Return: {total_return*100:+.1f}%")
        
        # Roll forward
        idx += retrain_interval
    
    return pd.DataFrame(results)

print("=" * 80)
print("WALK-FORWARD VALIDATION (Train: 730d, Test: 90d, Retrain: 60d)")
print("=" * 80)
print()

wf_results = walk_forward_validation(features_df, train_days=730, test_days=90, retrain_interval=60)

print("\n" + "=" * 80)
print("WALK-FORWARD SUMMARY")
print("=" * 80)
print(f"Total windows: {len(wf_results)}")
print(f"Average Sharpe: {wf_results['sharpe'].mean():.3f} ± {wf_results['sharpe'].std():.3f}")
print(f"Median Sharpe: {wf_results['sharpe'].median():.3f}")
print(f"Best Sharpe: {wf_results['sharpe'].max():.3f}")
print(f"Worst Sharpe: {wf_results['sharpe'].min():.3f}")
print(f"Positive Sharpe windows: {(wf_results['sharpe'] > 0).sum()} / {len(wf_results)}")
print(f"Average accuracy: {wf_results['accuracy'].mean():.2%}")
print(f"Average return per window: {wf_results['total_return'].mean()*100:+.2f}%")

# Stability visualization
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Sharpe per window
axes[0].plot(wf_results['window'], wf_results['sharpe'], marker='o', linewidth=2, markersize=6)
axes[0].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[0].axhline(y=wf_results['sharpe'].mean(), color='green', linestyle='--', alpha=0.5, label=f"Mean: {wf_results['sharpe'].mean():.3f}")
axes[0].set_title('Sharpe Ratio by Walk-Forward Window', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Window Number')
axes[0].set_ylabel('Sharpe Ratio')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

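# Cumulative returns (indicative only: with test_days=90 and retrain_interval=60, consecutive test windows overlap by 30 days)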
cumulative_returns = (1 + wf_results['total_return']).cumprod() - 1
axes[1].plot(wf_results['window'], cumulative_returns * 100, marker='o', linewidth=2, markersize=6, color='green')
axes[1].set_title('Cumulative Returns Across Walk-Forward Windows', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Window Number')
axes[1].set_ylabel('Cumulative Return (%)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nVERDICT:")
if wf_results['sharpe'].mean() > 0.3 and (wf_results['sharpe'] > 0).sum() / len(wf_results) > 0.6:
    print("✓ H5 CONFIRMED: Walk-forward shows consistent positive performance")
    print(f"  {(wf_results['sharpe'] > 0).sum()}/{len(wf_results)} windows with positive Sharpe")
else:
    print("✗ H5 REJECTED: Performance not consistent across walk-forward windows")
    print(f"  Only {(wf_results['sharpe'] > 0).sum()}/{len(wf_results)} windows with positive Sharpe")

In [None]:
# Cell 7: Hypothesis 3 - Optimal Retraining Frequency
# Compare 30, 60, and 90-day retraining via walk-forward

print("=" * 80)
print("HYPOTHESIS 3: Optimal Retraining Frequency")
print("=" * 80)
print()

retrain_intervals = [30, 60, 90]
retrain_comparison = []

for interval in retrain_intervals:
    print(f"\nTesting retraining interval: {interval} days")
    print("-" * 80)
    
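    # test_days is set to the interval so each model is scored only on the days it would trade before the next retrain
    wf = walk_forward_validation(features_df, train_days=730, test_days=interval, retrain_interval=interval)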
    
    retrain_comparison.append({
        'interval': interval,
        'n_windows': len(wf),
        'mean_sharpe': wf['sharpe'].mean(),
        'median_sharpe': wf['sharpe'].median(),
        'std_sharpe': wf['sharpe'].std(),
        'positive_pct': (wf['sharpe'] > 0).sum() / len(wf) * 100,
        'mean_accuracy': wf['accuracy'].mean(),
        'mean_return': wf['total_return'].mean()
    })

retrain_df = pd.DataFrame(retrain_comparison).set_index('interval')

print("\n" + "=" * 80)
print("RETRAINING FREQUENCY COMPARISON")
print("=" * 80)
print(retrain_df)

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Mean Sharpe
axes[0].bar(retrain_df.index.astype(str), retrain_df['mean_sharpe'], color=['#3498db', '#2ecc71', '#e74c3c'])
axes[0].set_title('Mean Sharpe by Retraining Interval', fontweight='bold')
axes[0].set_xlabel('Retraining Interval (days)')
axes[0].set_ylabel('Mean Sharpe Ratio')
axes[0].grid(True, alpha=0.3, axis='y')

# Sharpe Stability (std)
axes[1].bar(retrain_df.index.astype(str), retrain_df['std_sharpe'], color=['#3498db', '#2ecc71', '#e74c3c'])
axes[1].set_title('Sharpe Stability (Lower is Better)', fontweight='bold')
axes[1].set_xlabel('Retraining Interval (days)')
axes[1].set_ylabel('Sharpe Std Dev')
axes[1].grid(True, alpha=0.3, axis='y')

# Positive Windows %
axes[2].bar(retrain_df.index.astype(str), retrain_df['positive_pct'], color=['#3498db', '#2ecc71', '#e74c3c'])
axes[2].set_title('% of Positive Sharpe Windows', fontweight='bold')
axes[2].set_xlabel('Retraining Interval (days)')
axes[2].set_ylabel('Positive Windows (%)')
axes[2].axhline(y=50, color='red', linestyle='--', alpha=0.5, label='50% baseline')
axes[2].legend()
axes[2].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Verdict
best_interval = retrain_df['mean_sharpe'].idxmax()
best_sharpe = retrain_df.loc[best_interval, 'mean_sharpe']

print("\nVERDICT:")
print(f"✓ Optimal retraining interval: {best_interval} days")
print(f"  Mean Sharpe: {best_sharpe:.3f}")
print(f"  Stability: {retrain_df.loc[best_interval, 'std_sharpe']:.3f} std")
print(f"  Positive windows: {retrain_df.loc[best_interval, 'positive_pct']:.1f}%")

if best_interval == 30:
    print("\n  → Current strategy (30d) is optimal for adaptation")
elif best_interval == 60:
    print("\n  → RECOMMENDATION: Increase retraining interval to 60 days for better stability")
else:
    print("\n  → RECOMMENDATION: Increase retraining interval to 90 days (less overfitting)")

In [None]:
# Cell 8: Hypothesis 4 - Comparison with Buy-and-Hold
# Does the ML strategy really beat HODL?

print("=" * 80)
print("HYPOTHESIS 4: ML vs Buy-and-Hold Benchmark")
print("=" * 80)
print()

# Test period: 2023 to the end of the data (out-of-sample)
test_start = "2023-01-01"
test_end = features_df.index[-1]

# ML Strategy (trained on 2017-2022)
ml_result = train_and_evaluate(
    features_df,
    train_start="2017-01-01",
    train_end="2022-12-31",
    test_start=test_start,
    test_end=test_end,
    label="ML Strategy (5Y training)"
)

# Buy-and-Hold (drop the first day so the sample matches the ML strategy's daily_returns[1:])
test_df = features_df.loc[test_start:test_end]
bnh_returns = test_df['daily_return'].values[1:]
bnh_cumulative = (1 + bnh_returns).cumprod()
bnh_total_return = bnh_cumulative[-1] - 1
bnh_sharpe = (bnh_returns.mean() * 365) / (bnh_returns.std() * np.sqrt(365))
bnh_max_dd = (bnh_cumulative / np.maximum.accumulate(bnh_cumulative) - 1).min()

# Comparison
comparison = pd.DataFrame([
    {
        'Strategy': 'Buy-and-Hold BTC',
        'Total Return (%)': bnh_total_return * 100,
        'Sharpe Ratio': bnh_sharpe,
        'Max Drawdown (%)': bnh_max_dd * 100,
        'Annual Return (%)': (bnh_returns.mean() * 365) * 100,
        'Volatility (%)': (bnh_returns.std() * np.sqrt(365)) * 100
    },
    {
        'Strategy': 'ML Strategy',
        'Total Return (%)': ml_result['total_return'] * 100,
        'Sharpe Ratio': ml_result['sharpe'],
        'Max Drawdown (%)': ml_result['max_drawdown'] * 100,
        'Annual Return (%)': ml_result['annual_return'] * 100,
        'Volatility (%)': ml_result['volatility'] * 100
    }
]).set_index('Strategy')

print(comparison.round(2))

# Comparative visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Total Return
strategies = comparison.index.tolist()
returns = comparison['Total Return (%)'].values
axes[0, 0].bar(strategies, returns, color=['#3498db', '#2ecc71'])
axes[0, 0].set_title('Total Return (%)', fontweight='bold')
axes[0, 0].set_ylabel('Return (%)')
axes[0, 0].grid(True, alpha=0.3, axis='y')

# Sharpe Ratio
sharpes = comparison['Sharpe Ratio'].values
axes[0, 1].bar(strategies, sharpes, color=['#3498db', '#2ecc71'])
axes[0, 1].set_title('Sharpe Ratio (Risk-Adjusted)', fontweight='bold')
axes[0, 1].set_ylabel('Sharpe')
axes[0, 1].grid(True, alpha=0.3, axis='y')

# Max Drawdown
drawdowns = comparison['Max Drawdown (%)'].values
axes[1, 0].bar(strategies, drawdowns, color=['#e74c3c', '#e67e22'])
axes[1, 0].set_title('Max Drawdown (%) - Closer to 0 is Better', fontweight='bold')
axes[1, 0].set_ylabel('Drawdown (%)')
axes[1, 0].grid(True, alpha=0.3, axis='y')

# Volatility
vols = comparison['Volatility (%)'].values
axes[1, 1].bar(strategies, vols, color=['#9b59b6', '#8e44ad'])
axes[1, 1].set_title('Annualized Volatility (%)', fontweight='bold')
axes[1, 1].set_ylabel('Volatility (%)')
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Verdict
print("\n" + "=" * 80)
print("VERDICT - ML vs Buy-and-Hold")
print("=" * 80)

excess_return = ml_result['total_return'] - bnh_total_return
sharpe_improvement = ml_result['sharpe'] - bnh_sharpe

print(f"\nExcess Return: {excess_return * 100:+.2f}%")
print(f"Sharpe Improvement: {sharpe_improvement:+.3f}")

if ml_result['sharpe'] > bnh_sharpe:
    print("\n✓ H4 CONFIRMED: ML strategy beats buy-and-hold on risk-adjusted basis")
    print(f"  ML Sharpe: {ml_result['sharpe']:.3f} vs BnH Sharpe: {bnh_sharpe:.3f}")
else:
    print("\n✗ H4 REJECTED: Buy-and-hold has better risk-adjusted returns")
    print(f"  BnH Sharpe: {bnh_sharpe:.3f} vs ML Sharpe: {ml_result['sharpe']:.3f}")

# Drawdowns are negative numbers, so a more negative value means a deeper (worse) drawdown
if ml_result['max_drawdown'] < bnh_max_dd:
    print(f"\n⚠️ WARNING: ML has worse max drawdown ({ml_result['max_drawdown']*100:.1f}% vs {bnh_max_dd*100:.1f}%)")
else:
    print(f"\n✓ ML reduces max drawdown ({ml_result['max_drawdown']*100:.1f}% vs {bnh_max_dd*100:.1f}%)")

## Findings - Summary of Results

### Hypothesis 1: Extended Training Period

**Result**: [To be completed after execution]

- **2021-2022 (2 years)**: Sharpe = ?, Return = ?
- **2017-2022 (5 years)**: Sharpe = ?, Return = ?
- **Conclusion**: Training on 5 years spanning several bull/bear cycles...

### Hypothesis 2: Feature Stability by Regime

**Result**: [To be completed after execution]

- **Most stable features**: [Top 3]
- **Regime-dependent features**: [Which ones vary between bull and bear]
- **Conclusion**: Momentum features (EMA) matter more in bull markets, RSI/ADX in bear markets...

### Hypothesis 3: Optimal Retraining Frequency

**Result**: [To be completed after execution]

| Interval | Mean Sharpe | Stability (Sharpe std) | Positive % |
|----------|-------------|------------------------|------------|
| 30 days | ? | ? | ? |
| 60 days | ? | ? | ? |
| 90 days | ? | ? | ? |

- **Conclusion**: Retraining every X days offers the best trade-off...

### Hypothesis 4: ML vs Buy-and-Hold

**Result**: [To be completed after execution]

| Metric | BnH | ML | Difference |
|--------|-----|----|------------|
| Total Return | ? | ? | ? |
| Sharpe | ? | ? | ? |
| Max DD | ? | ? | ? |

- **Conclusion**: ML [beats/does not beat] buy-and-hold on a risk-adjusted basis...

### Hypothesis 5: Walk-Forward Consistency

**Result**: [To be completed after execution]

- **Number of windows**: ?
- **Mean Sharpe**: ? ± ?
- **Positive windows**: ? / ?
- **Conclusion**: The strategy [shows/does not show] consistent performance...

## Key Insights

1. **Data Leakage**: [Observations on the train/test separation]
2. **Regime Dependency**: [How the strategy performs in each regime]
3. **Overfitting Risk**: [Signs of overfitting detected or not]
4. **Retraining Trade-off**: [Balance between adaptation and stability]

## Limitations of the Study

- **Transaction costs**: Not included (spread, slippage, fees)
- **Perfect execution**: Assumes instantaneous fills at the daily close, with no slippage
- **Future regime**: Past patterns may not repeat
- **Static features**: No dynamic adaptation of the feature set
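
The first limitation can be partly addressed with a simple proportional-cost model. The sketch below is an assumption-laden illustration: the helper name `net_strategy_returns` and the 0.1% `fee_rate` are hypothetical placeholders, not the fees of any particular exchange, and spread and slippage are not modeled.

```python
import numpy as np

def net_strategy_returns(asset_returns, positions, fee_rate=0.001):
    """Strategy returns after a proportional fee on every change in position.

    positions[t] is the weight held while asset_returns[t] is earned
    (in the notebook's terms: signals[:-1] paired with daily_returns[1:]).
    """
    asset_returns = np.asarray(asset_returns, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Turnover: absolute change in weight, starting from a flat (0) position.
    turnover = np.abs(np.diff(positions, prepend=0.0))
    return positions * asset_returns - fee_rate * turnover

gross = [0.01, 0.02, -0.01]
weights = [1.0, 1.0, 0.0]   # enter on day 1, hold on day 2, exit on day 3
net = net_strategy_returns(gross, weights)
print(net)
```

Because the simulated position size changes almost every day (it is scaled by the predicted probability), turnover-based costs can erode a meaningful share of the gross return, which matters for H4.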

## Recommendations for the Main Strategy

[To be completed after execution, using the best parameters identified]

In [None]:
# Cell 10: Recommendations JSON - for implementation in main.py

import json

# Recommendations based on the research results
recommendations = {
    "research_date": datetime.now().strftime("%Y-%m-%d"),
    "project": "BTC-MachineLearning (21047688)",
    "methodology": "Walk-forward validation + regime analysis",
    
    "findings": {
        "h1_extended_training": {
            "confirmed": None,  # To fill in after execution
            "optimal_training_period": None,  # "2017-2022 (5 years)" if H1 is confirmed, else "2021-2022 (2 years)"
            "sharpe_improvement": None
        },
        "h2_feature_stability": {
            "most_stable_features": [],  # Top 3
            "regime_dependent_features": [],  # Features whose importance shifts across regimes
            "recommendation": "Consider regime-adaptive feature weighting"
        },
        "h3_retraining_frequency": {
            "optimal_interval_days": None,  # 30, 60, or 90
            "current_interval": 30,
            "change_recommended": False
        },
        "h4_ml_vs_bnh": {
            "ml_beats_bnh": None,
            "excess_return_pct": None,
            "sharpe_improvement": None
        },
        "h5_walk_forward": {
            "consistent_performance": None,
            "mean_sharpe": None,
            "positive_windows_pct": None
        }
    },
    
    "implementation_recommendations": [
        {
            "parameter": "TRAIN_START",
            "current_value": "2021-01-01",
            "recommended_value": None,  # Based on H1
            "reason": "Extended training period improves generalization"
        },
        {
            "parameter": "RETRAIN_INTERVAL_DAYS",
            "current_value": 30,
            "recommended_value": None,  # Based on H3
            "reason": "Optimal balance between adaptation and stability"
        },
        {
            "parameter": "CONFIDENCE_LONG_THRESHOLD",
            "current_value": 0.6,
            "recommended_value": 0.6,  # May be adjusted based on walk-forward results
            "reason": "Walk-forward validation suggests..."
        }
    ],
    
    "next_steps": [
        "Update main.py with optimal training period",
        "Adjust retraining interval if needed",
        "Compile and backtest with new parameters",
        "Monitor live performance vs walk-forward expectations",
        "Consider adding transaction costs to backtest"
    ],
    
    "risks": [
        "Past performance does not guarantee future results",
        "Crypto markets are highly volatile and non-stationary",
        "Regime changes may invalidate historical patterns",
        "Transaction costs not included in this research"
    ]
}

# Serialize and print (nothing is written to disk)
recommendations_json = json.dumps(recommendations, indent=2)
print(recommendations_json)

# Note: fill in the None values after the notebook has run to completion
print("\n" + "="*80)
print("RESEARCH COMPLETE")
print("="*80)
print("\nNext steps:")
print("1. Fill in the None values in recommendations based on cell outputs")
print("2. Update main.py with optimal parameters")
print("3. Compile via MCP QC")
print("4. Backtest via web UI (API requires paid account)")
print("5. Compare backtest results to walk-forward predictions")