# QC-Py-15 - Parameter Optimization and Walk-Forward Analysis

> **Optimize trading parameters while avoiding overfitting**
> Duration: 75 minutes | Level: Advanced | Python + QuantConnect

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Understand the **risks of optimization** (overfitting, curve fitting)
2. Define **optimizable parameters** in QCAlgorithm
3. Implement a manual **Grid Search** to test parameter combinations
4. Perform a **Walk-Forward Analysis** to validate robustness
5. Detect and **avoid overfitting** with proven techniques
6. Use **multi-objective fitness functions**
7. Test the **robustness** of the optimal parameters

## Prerequisites

- Notebooks QC-Py-01 through QC-Py-12 completed
- Understanding of performance metrics (Sharpe, Drawdown)
- Basic knowledge of statistics and machine learning

## Notebook Structure

1. Introduction to Optimization (15 min)
2. Parameter Sets in QC (20 min)
3. Manual Grid Search (25 min)
4. Walk-Forward Analysis (25 min)
5. Avoiding Overfitting (15 min)
6. Advanced Optimization (20 min)
7. Complete Example (15 min)

---

## Setup and Imports

In [None]:
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from typing import Dict, List, Tuple, Optional, Callable
from itertools import product
import warnings
warnings.filterwarnings('ignore')

# Matplotlib configuration
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

# Add shared/ to the path
import sys
sys.path.append('../shared')

print("Imports succeeded")

In [None]:
# Import helpers from the repository
from backtest_helpers import calculate_metrics, format_backtest_summary
from plotting import plot_backtest_results

print("Helpers imported successfully")

---

## Part 1: Introduction to Optimization (15 min)

### 1.1 Why Optimize? (10 min)

Parameter optimization searches for the values that maximize a strategy's performance on historical data. It is a delicate process: the more freely the parameters are tuned, the greater the risk that they fit noise rather than a persistent effect.

#### Optimization Goals

| Goal | Description | Example |
|------|-------------|---------|
| **Find the best parameters** | Identify the optimal values | SMA fast=10, slow=50 |
| **Maximize risk-adjusted performance** | Sharpe ratio, Calmar ratio | Sharpe > 1.5 |
| **Balance return and risk** | Return/drawdown trade-off | Max DD < 15% |
| **Ensure robustness** | Parameters that remain stable over time | Consistent OOS performance |

In [None]:
# Illustration: impact of the parameters on performance

# Generate simulated price data (2 years)
np.random.seed(42)
n_days = 504
dates = pd.date_range('2022-01-01', periods=n_days, freq='B')

# Prices with trend and noise
trend = np.linspace(100, 130, n_days)
noise = np.cumsum(np.random.randn(n_days) * 0.5)
prices = pd.Series(trend + noise, index=dates)

def simulate_sma_crossover(prices: pd.Series, fast: int, slow: int) -> pd.Series:
    """
    Simulate a simple SMA crossover strategy.

    Returns:
        Equity curve as a Series
    """
    sma_fast = prices.rolling(fast).mean()
    sma_slow = prices.rolling(slow).mean()

    # Signal: 1 if fast > slow, 0 otherwise
    signal = (sma_fast > sma_slow).astype(int)
    signal = signal.shift(1).fillna(0)  # Trade on the next bar to avoid lookahead bias

    # Daily returns
    returns = prices.pct_change()

    # Strategy returns
    strategy_returns = signal * returns

    # Equity curve
    equity = 100000 * (1 + strategy_returns).cumprod()

    return equity

# Test several combinations
param_sets = [
    (5, 20),
    (10, 50),
    (20, 100),
    (30, 150)
]

plt.figure(figsize=(14, 6))

for fast, slow in param_sets:
    equity = simulate_sma_crossover(prices, fast, slow)
    plt.plot(equity.index, equity / 1000, label=f'SMA({fast}, {slow})', linewidth=2)

plt.xlabel('Date', fontsize=12)
plt.ylabel('Equity (k$)', fontsize=12)
plt.title('Impact of SMA Parameters on Performance', fontsize=14, fontweight='bold')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("The parameter choice has a visible impact on performance!")

#### Risks of Optimization

Optimization comes with significant pitfalls:

| Risk | Description | Consequence |
|------|-------------|-------------|
| **Overfitting** | Parameters over-adapted to historical data | Failure in live trading |
| **Curve Fitting** | Finding patterns that do not exist | False signals |
| **Data Mining Bias** | Testing too many combinations | Statistical false positives |
| **Lookahead Bias** | Using future information | Unrealistic performance |
| **Selection Bias** | Picking the most favorable period | Unrepresentative results |
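
The data mining bias in the table can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original notebook: it generates "strategies" whose daily returns are pure zero-mean noise (1% daily volatility, one year of data, both arbitrary choices) and reports the best annualized Sharpe ratio found among them. The function name `best_sharpe_by_chance` is hypothetical.

```python
import numpy as np

def best_sharpe_by_chance(n_strategies: int, n_days: int = 252, seed: int = 0) -> float:
    """Best annualized Sharpe among strategies whose daily returns are pure noise."""
    rng = np.random.default_rng(seed)
    # Each row is one "strategy": zero-mean daily returns, so the true Sharpe is 0.
    returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))
    sharpes = returns.mean(axis=1) / returns.std(axis=1, ddof=1) * np.sqrt(252)
    return float(sharpes.max())

for n in (1, 10, 100, 1000):
    best = best_sharpe_by_chance(n)
    print(f"{n:>5} strategies tested -> best Sharpe by chance: {best:.2f}")
```

Because every simulated strategy has a true Sharpe of zero, any high value reported here is a selection effect: the more combinations are tried, the better the best one looks.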

In [None]:
# Overfitting demonstration

# Training data (in-sample)
train_prices = prices[:300]
# Test data (out-of-sample)
test_prices = prices[300:]

# Optimize on the training set
best_sharpe_is = -np.inf
best_params_is = None

results_is = []

for fast in range(3, 30, 3):
    for slow in range(20, 120, 10):
        if fast >= slow:
            continue
        
        equity = simulate_sma_crossover(train_prices, fast, slow)
        returns = equity.pct_change().dropna()
        
        if len(returns) > 0 and returns.std() > 0:
            sharpe = (returns.mean() * 252) / (returns.std() * np.sqrt(252))
        else:
            sharpe = 0
        
        results_is.append({
            'fast': fast,
            'slow': slow,
            'sharpe': sharpe
        })
        
        if sharpe > best_sharpe_is:
            best_sharpe_is = sharpe
            best_params_is = (fast, slow)

# Evaluate the best in-sample parameters on the OOS data
equity_is = simulate_sma_crossover(train_prices, *best_params_is)
equity_oos = simulate_sma_crossover(test_prices, *best_params_is)

returns_is = equity_is.pct_change().dropna()
returns_oos = equity_oos.pct_change().dropna()

sharpe_is = (returns_is.mean() * 252) / (returns_is.std() * np.sqrt(252))
sharpe_oos = (returns_oos.mean() * 252) / (returns_oos.std() * np.sqrt(252))

print("="*60)
print("OVERFITTING DEMONSTRATION")
print("="*60)
print(f"\nBest parameters (optimized in-sample): SMA({best_params_is[0]}, {best_params_is[1]})")
print(f"\n{'Metric':<25} {'In-Sample':>15} {'Out-of-Sample':>15}")
print("-"*60)
print(f"{'Sharpe Ratio':<25} {sharpe_is:>15.2f} {sharpe_oos:>15.2f}")
print(f"{'Degradation':<25} {'-':>15} {((sharpe_oos-sharpe_is)/sharpe_is*100):>14.1f}%")
print("\n> A marked IS -> OOS degradation is a classic sign of overfitting!")
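
The Sharpe computation above is repeated several times and the degradation figure becomes misleading when the in-sample Sharpe is zero or negative. A possible refactoring is sketched below; the helper names `annualized_sharpe` and `degradation_pct` are hypothetical, and the risk-free rate is assumed to be zero, as in the cell above.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio (risk-free rate assumed 0); 0.0 when undefined."""
    returns = returns.dropna()
    if len(returns) < 2:
        return 0.0
    std = returns.std()
    if not std > 0:  # catches both 0 and NaN
        return 0.0
    # mean * 252 / (std * sqrt(252)) simplifies to mean / std * sqrt(252)
    return float(returns.mean() / std * np.sqrt(periods_per_year))

def degradation_pct(sharpe_is: float, sharpe_oos: float) -> float:
    """Relative IS -> OOS change in percent; NaN when the IS Sharpe is not positive."""
    if not sharpe_is > 0:
        return float("nan")
    return (sharpe_oos - sharpe_is) / sharpe_is * 100

example = pd.Series([0.01, -0.01, 0.02, 0.0])
print(f"Sharpe: {annualized_sharpe(example):.2f}")
print(f"Degradation from 2.0 to 1.0: {degradation_pct(2.0, 1.0):.1f}%")
```

Returning NaN for a non-positive in-sample Sharpe is a design choice: a percentage change relative to a negative baseline flips sign and is hard to interpret.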

### 1.2 Types of Optimization (5 min)

Several approaches can be used to optimize parameters:

| Method | Description | Advantages | Drawbacks |
|--------|-------------|------------|-----------|
| **Grid Search** | Tests every combination | Exhaustive, simple | Slow, cost grows exponentially with the number of parameters |
| **Random Search** | Samples combinations at random | Faster, good coverage | May miss the optimum |
| **Bayesian Optimization** | Probabilistic model of the objective | Efficient, few trials | Complex to implement |
| **Genetic Algorithms** | Evolves a population of parameter sets | Handles large spaces, parallelizable | No guarantee of finding the optimum |
In [None]:
# Comparing optimization methods

def count_combinations(param_ranges: Dict[str, List]) -> int:
    """Count the number of combinations for a Grid Search."""
    total = 1
    for values in param_ranges.values():
        total *= len(values)
    return total

# Example parameter space
param_space = {
    'fast_period': list(range(5, 50, 5)),       # 9 values
    'slow_period': list(range(20, 200, 20)),    # 9 values
    'stop_loss': [0.02, 0.03, 0.05, 0.07],      # 4 values
    'take_profit': [0.05, 0.10, 0.15, 0.20],    # 4 values
}

n_combinations = count_combinations(param_space)
time_per_backtest = 5  # seconds

print("="*50)
print("COMPARISON OF OPTIMIZATION METHODS")
print("="*50)
print(f"\nParameter space:")
for param, values in param_space.items():
    print(f"  - {param}: {len(values)} values")

print(f"\nTotal number of combinations: {n_combinations:,}")
print(f"Estimated time (Grid Search): {n_combinations * time_per_backtest / 60:.1f} minutes")
print(f"Estimated time (Random 100): {100 * time_per_backtest / 60:.1f} minutes")
print(f"Estimated time (Bayesian 50): {50 * time_per_backtest / 60:.1f} minutes")

---

## Part 2: Parameter Sets in QuantConnect (20 min)

### 2.1 Defining Optimizable Parameters (10 min)

QuantConnect lets you define parameters whose values can be changed without modifying the algorithm's code. This is essential for optimization, because the optimizer can then run the same algorithm with many different parameter values.
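
Parameter values supplied from outside the code often arrive as strings, so they need to be converted and validated before use. The standalone sketch below illustrates that idea in plain Python; it does not call the QuantConnect API, and `read_parameter` and `raw_params` are hypothetical names standing in for whatever the platform provides.

```python
from typing import Dict, TypeVar

T = TypeVar("T", int, float, str)

def read_parameter(raw: Dict[str, str], name: str, default: T) -> T:
    """Return raw[name] converted to the type of `default`, or `default` if missing."""
    if name not in raw:
        return default
    # int("10") -> 10, float("0.05") -> 0.05; raises ValueError on malformed input
    return type(default)(raw[name])

raw_params = {"fast_period": "10", "stop_loss": "0.05"}  # hypothetical optimizer input

fast = read_parameter(raw_params, "fast_period", 10)
slow = read_parameter(raw_params, "slow_period", 50)      # missing -> default value
sl = read_parameter(raw_params, "stop_loss", 0.05)

if fast >= slow:
    raise ValueError("fast_period must be < slow_period")

print(fast, slow, sl)
```

Using the default's type to drive the conversion keeps the type information in one place: changing a default from `10` to `10.0` changes how the parameter is parsed.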

In [None]:
# Example strategy with optimizable parameters

optimizable_strategy_code = '''
from AlgorithmImports import *

class OptimizableStrategy(QCAlgorithm):
    """
    SMA crossover strategy with optimizable parameters.

    Parameters:
        - fast_period: Period of the fast SMA
        - slow_period: Period of the slow SMA
        - stop_loss: Stop-loss percentage
        - take_profit: Take-profit percentage
    """
    
    def Initialize(self):
        self.SetStartDate(2022, 1, 1)
        self.SetEndDate(2023, 12, 31)
        self.SetCash(100000)
        
        # === OPTIMIZABLE PARAMETERS ===
        # GetParameter() reads the value supplied by the optimizer.
        # The second argument is the default used when no value is supplied.

        self.fast_period = self.GetParameter("fast_period", 10)
        self.slow_period = self.GetParameter("slow_period", 50)
        self.stop_loss = self.GetParameter("stop_loss", 0.05)
        self.take_profit = self.GetParameter("take_profit", 0.10)

        # IMPORTANT: cast to the expected types.
        # Without a default value, GetParameter returns a string; the explicit
        # casts below keep the types unambiguous in either case.
        self.fast_period = int(self.fast_period)
        self.slow_period = int(self.slow_period)
        self.stop_loss = float(self.stop_loss)
        self.take_profit = float(self.take_profit)

        # Parameter validation
        if self.fast_period >= self.slow_period:
            raise ValueError("fast_period must be < slow_period")

        # Trading setup
        self.symbol = self.AddEquity("SPY", Resolution.Daily).Symbol

        # Indicators built from the parameters
        self.sma_fast = self.SMA(self.symbol, self.fast_period, Resolution.Daily)
        self.sma_slow = self.SMA(self.symbol, self.slow_period, Resolution.Daily)

        # Tracking variables
        self.entry_price = 0

        # Warm-up
        self.SetWarmup(self.slow_period)

        # Log the parameters
        self.Debug(f"Parameters: fast={self.fast_period}, slow={self.slow_period}, "
                   f"SL={self.stop_loss:.2%}, TP={self.take_profit:.2%}")
    
    def OnData(self, data):
        if self.IsWarmingUp:
            return
        
        if not (self.sma_fast.IsReady and self.sma_slow.IsReady):
            return
        
        price = self.Securities[self.symbol].Price
        
        # Check stop loss / take profit if a position is open
        if self.Portfolio[self.symbol].Invested:
            pnl_pct = (price - self.entry_price) / self.entry_price
            
            if pnl_pct <= -self.stop_loss:
                self.Liquidate(self.symbol)
                self.Debug(f"{self.Time.date()}: STOP LOSS hit at {pnl_pct:.2%}")
                return
            
            if pnl_pct >= self.take_profit:
                self.Liquidate(self.symbol)
                self.Debug(f"{self.Time.date()}: TAKE PROFIT hit at {pnl_pct:.2%}")
                return
        
        # Trading logic
        if not self.Portfolio[self.symbol].Invested:
            # Entry condition: fast SMA above slow SMA (a level check, not a strict crossover)
            if self.sma_fast.Current.Value > self.sma_slow.Current.Value:
                self.SetHoldings(self.symbol, 1.0)
                self.entry_price = price
        else:
            # Exit condition: fast SMA below slow SMA
            if self.sma_fast.Current.Value < self.sma_slow.Current.Value:
                self.Liquidate(self.symbol)
'''

print("OptimizableStrategy source:")
print(optimizable_strategy_code)

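### 2.2 Optimization Configuration (JSON) (10 min)

In QuantConnect, an optimization is described by a set of parameter ranges (minimum, maximum, step), a target metric, and optional constraints. The JSON-style dictionary in this section is an illustrative representation of such a configuration; the exact key names expected by the platform may differ.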
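
A minimum/maximum/step range of this kind can be expanded into the explicit list of values, and several ranges into the full grid. The sketch below shows one way to do this while guarding against floating-point truncation; `expand_range`, `expand_grid`, and the two-parameter `example_parameters` list are hypothetical names introduced for illustration.

```python
import math
from itertools import product
from typing import Dict, Iterator, List

def expand_range(min_value: float, max_value: float, step: float) -> List[float]:
    """Values min, min + step, ... up to max (inclusive), robust to float rounding."""
    # Rounding to 9 decimals absorbs float noise such as 1.9999999999999998
    n_steps = math.floor(round((max_value - min_value) / step, 9))
    return [round(min_value + i * step, 10) for i in range(n_steps + 1)]

def expand_grid(parameters: List[Dict]) -> Iterator[Dict]:
    """Yield one dict per combination of the expanded parameter ranges."""
    names = [p["name"] for p in parameters]
    axes = [expand_range(p["min"], p["max"], p["step"]) for p in parameters]
    for combo in product(*axes):
        yield dict(zip(names, combo))

example_parameters = [
    {"name": "fast_period", "min": 5, "max": 15, "step": 5},
    {"name": "stop_loss", "min": 0.02, "max": 0.08, "step": 0.02},
]
grid = list(expand_grid(example_parameters))
print(f"{len(grid)} combinations, first: {grid[0]}")
```

Counting the steps with a plain `int((max - min) / step)` would silently drop the last value for ranges such as 0.1 to 0.3 in steps of 0.1, which is why the division is rounded first.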

In [None]:
import json

# QuantConnect-style optimization configuration (illustrative schema)
optimization_config = {
    "algorithm-id": "OptimizableStrategy",
    "parameters": [
        {
            "name": "fast_period",
            "min": 5,
            "max": 50,
            "step": 5
        },
        {
            "name": "slow_period",
            "min": 20,
            "max": 200,
            "step": 20
        },
        {
            "name": "stop_loss",
            "min": 0.02,
            "max": 0.10,
            "step": 0.02
        },
        {
            "name": "take_profit",
            "min": 0.05,
            "max": 0.20,
            "step": 0.05
        }
    ],
    "optimization": {
        "target": "SharpeRatio",
        "direction": "maximize",
        "constraint": {
            "MaximumDrawdown": {
                "max": 0.20
            }
        }
    },
    "backtest": {
        "start": "2022-01-01",
        "end": "2023-12-31"
    }
}

print("QuantConnect optimization configuration:")
print(json.dumps(optimization_config, indent=2))

# Count the combinations. round() guards against float truncation:
# for example, int((0.3 - 0.1) / 0.1) evaluates to 1, not 2.
n_values = [
    int(round((p['max'] - p['min']) / p['step'], 9)) + 1
    for p in optimization_config['parameters']
]

total = int(np.prod(n_values))
print(f"\nTotal number of combinations: {total}")

In [None]:
# Fitness metrics usable as optimization targets (names are illustrative)

fitness_functions = {
    'SharpeRatio': 'Risk-adjusted return (total volatility)',
    'SortinoRatio': 'Risk-adjusted return (downside volatility only)',
    'CalmarRatio': 'CAGR / Max Drawdown',
    'ProbabilisticSharpeRatio': 'Sharpe with a statistical correction',
    'CompoundingAnnualReturn': 'Pure CAGR',
    'TotalReturn': 'Total return',
    'MaxDrawdown': 'Maximum drawdown (to be minimized)',
    'WinRate': 'Percentage of winning trades',
    'ProfitFactor': 'Gross profits / gross losses'
}

print("Fitness metrics for optimization:")
print("="*60)
for name, description in fitness_functions.items():
    print(f"  - {name:<30} : {description}")
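
Two of the metrics listed above, the Sortino and Calmar ratios, can be computed directly from a return series or an equity curve. The sketch below is one possible implementation under stated assumptions: the target return for the downside deviation is 0, the risk-free rate is 0, and the function names are hypothetical. Other conventions for the downside deviation exist.

```python
import numpy as np
import pandas as pd

def sortino_ratio(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Mean return over downside deviation (target 0), annualized; NaN if undefined."""
    returns = returns.dropna()
    downside = np.minimum(returns, 0.0)            # keep losses, zero out gains
    downside_dev = np.sqrt((downside ** 2).mean())
    if not downside_dev > 0:                       # no losses (or no data)
        return float("nan")
    return float(returns.mean() / downside_dev * np.sqrt(periods_per_year))

def calmar_ratio(equity: pd.Series, periods_per_year: int = 252) -> float:
    """CAGR divided by the absolute maximum drawdown; NaN if undefined."""
    n_periods = len(equity) - 1
    if n_periods < 1:
        return float("nan")
    cagr = (equity.iloc[-1] / equity.iloc[0]) ** (periods_per_year / n_periods) - 1
    running_max = equity.cummax()
    max_dd = ((equity - running_max) / running_max).min()
    if max_dd == 0:                                # equity never declined
        return float("nan")
    return float(cagr / abs(max_dd))

toy_equity = pd.Series([100.0, 110.0, 99.0, 120.0])
toy_returns = toy_equity.pct_change()
print(f"Sortino: {sortino_ratio(toy_returns):.2f}")
print(f"Calmar:  {calmar_ratio(toy_equity):.2f}")
```

Returning NaN rather than 0 when a ratio is undefined makes it easy to exclude such runs from a ranking instead of treating them as mediocre results.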

---

## Part 3: Manual Grid Search (25 min)

### 3.1 Grid Search Implementation

We will implement a complete Grid Search to optimize an SMA crossover strategy.

In [None]:
def backtest_sma_strategy(prices: pd.Series, 
                          fast_period: int, 
                          slow_period: int,
                          stop_loss: float = 0.05,
                          take_profit: float = 0.10,
                          initial_capital: float = 100000) -> Dict:
    """
    Full backtest of an SMA crossover strategy.

    Args:
        prices: Price series
        fast_period: Fast SMA period
        slow_period: Slow SMA period
        stop_loss: Stop loss as a fraction (0.05 = 5%)
        take_profit: Take profit as a fraction (0.10 = 10%)
        initial_capital: Initial capital

    Returns:
        Dict with performance metrics
    """
    if fast_period >= slow_period:
        return {'valid': False}
    # Not enough history to warm up the slow SMA and trade at least one bar
    if len(prices) <= slow_period:
        return {'valid': False}
    
    # Compute the SMAs
    sma_fast = prices.rolling(fast_period).mean()
    sma_slow = prices.rolling(slow_period).mean()
    
    # Initialization
    capital = initial_capital
    position = 0
    entry_price = 0
    trades = []
    equity = [capital]
    
    for i in range(slow_period, len(prices)):
        current_price = prices.iloc[i]
        
        if position > 0:
            # Check SL/TP
            pnl_pct = (current_price - entry_price) / entry_price
            
            if pnl_pct <= -stop_loss or pnl_pct >= take_profit:
                # Close the position
                pnl = position * (current_price - entry_price)
                capital += position * current_price
                trades.append({
                    'pnl': pnl,
                    'pnl_pct': pnl_pct,
                    'exit_reason': 'SL' if pnl_pct <= -stop_loss else 'TP'
                })
                position = 0
                entry_price = 0
            
            # Sell signal (crossover down)
            elif sma_fast.iloc[i] < sma_slow.iloc[i] and sma_fast.iloc[i-1] >= sma_slow.iloc[i-1]:
                pnl = position * (current_price - entry_price)
                capital += position * current_price
                trades.append({
                    'pnl': pnl,
                    'pnl_pct': pnl_pct,
                    'exit_reason': 'SIGNAL'
                })
                position = 0
                entry_price = 0
        
        else:
            # Buy signal (crossover up)
            if sma_fast.iloc[i] > sma_slow.iloc[i] and sma_fast.iloc[i-1] <= sma_slow.iloc[i-1]:
                position = capital / current_price
                entry_price = current_price
                capital = 0
        
        # Compute equity
        current_equity = capital + position * current_price
        equity.append(current_equity)
    
    # Record the position still open at the end of the period
    if position > 0:
        pnl = position * (prices.iloc[-1] - entry_price)
        trades.append({
            'pnl': pnl,
            'pnl_pct': (prices.iloc[-1] - entry_price) / entry_price,
            'exit_reason': 'END'
        })
    
    # Compute metrics
    equity_series = pd.Series(equity, index=prices.index[slow_period-1:])
    returns = equity_series.pct_change().dropna()
    
    # Reject runs with too few observations or no variation (std is NaN or 0)
    if len(returns) < 2 or not returns.std() > 0:
        return {'valid': False}
    
    # Metrics
    total_return = (equity[-1] - initial_capital) / initial_capital
    sharpe = (returns.mean() * 252) / (returns.std() * np.sqrt(252))
    
    # Max Drawdown
    cummax = equity_series.cummax()
    drawdown = (equity_series - cummax) / cummax
    max_drawdown = drawdown.min()
    
    # Trade stats
    if len(trades) > 0:
        wins = [t for t in trades if t['pnl'] > 0]
        win_rate = len(wins) / len(trades)
        avg_win = np.mean([t['pnl_pct'] for t in wins]) if wins else 0
        losses = [t for t in trades if t['pnl'] <= 0]
        avg_loss = np.mean([t['pnl_pct'] for t in losses]) if losses else 0
    else:
        win_rate = 0
        avg_win = 0
        avg_loss = 0
    
    return {
        'valid': True,
        'total_return': total_return,
        'sharpe': sharpe,
        'max_drawdown': max_drawdown,
        'n_trades': len(trades),
        'win_rate': win_rate,
        'avg_win': avg_win,
        'avg_loss': avg_loss,
        'equity': equity_series
    }

print("Function backtest_sma_strategy() defined")

In [None]:
# Full Grid Search

# Define the parameter space
fast_periods = [5, 10, 15, 20, 25, 30]
slow_periods = [50, 75, 100, 125, 150, 175, 200]
stop_losses = [0.03, 0.05, 0.07, 0.10]
take_profits = [0.05, 0.10, 0.15, 0.20]

print("Running the Grid Search...")
print(f"Combinations to test: {len(fast_periods) * len(slow_periods) * len(stop_losses) * len(take_profits)}")

results = []
total_combinations = 0
valid_combinations = 0

for fast, slow, sl, tp in product(fast_periods, slow_periods, stop_losses, take_profits):
    total_combinations += 1
    
    if fast >= slow:
        continue  # Skip invalid combinations
    
    # Run the backtest
    result = backtest_sma_strategy(
        prices, 
        fast_period=fast, 
        slow_period=slow,
        stop_loss=sl,
        take_profit=tp
    )
    
    if not result['valid']:
        continue
    
    valid_combinations += 1
    
    results.append({
        'fast_period': fast,
        'slow_period': slow,
        'stop_loss': sl,
        'take_profit': tp,
        'sharpe': result['sharpe'],
        'total_return': result['total_return'],
        'max_drawdown': result['max_drawdown'],
        'n_trades': result['n_trades'],
        'win_rate': result['win_rate']
    })

print(f"\nGrid Search complete!")
print(f"Valid combinations: {valid_combinations}/{total_combinations}")

In [None]:
# Analyze the results
results_df = pd.DataFrame(results)

# Sort by Sharpe ratio
results_df_sorted = results_df.sort_values('sharpe', ascending=False)

print("="*70)
print("TOP 10 COMBINATIONS BY SHARPE RATIO")
print("="*70)
print(results_df_sorted.head(10).to_string(index=False))

# Best combination (periods are cast to int: a mixed-type row is stored as floats)
best = results_df_sorted.iloc[0]
print(f"\nBest combination:")
print(f"  Fast Period: {int(best['fast_period'])}")
print(f"  Slow Period: {int(best['slow_period'])}")
print(f"  Stop Loss: {best['stop_loss']:.2%}")
print(f"  Take Profit: {best['take_profit']:.2%}")
print(f"\n  Sharpe Ratio: {best['sharpe']:.2f}")
print(f"  Total Return: {best['total_return']:.2%}")
print(f"  Max Drawdown: {best['max_drawdown']:.2%}")

In [None]:
# Visualization: Sharpe heatmap by Fast/Slow period (averaged over SL/TP)

pivot = results_df.pivot_table(
    values='sharpe',
    index='slow_period',
    columns='fast_period',
    aggfunc='mean'
)

plt.figure(figsize=(12, 8))
sns.heatmap(pivot, annot=True, fmt='.2f', cmap='RdYlGn', center=0,
            linewidths=0.5, cbar_kws={'label': 'Sharpe Ratio'})
plt.title('Mean Sharpe Ratio by Fast/Slow Period Combination', fontsize=14, fontweight='bold')
plt.xlabel('Fast Period', fontsize=12)
plt.ylabel('Slow Period', fontsize=12)
plt.tight_layout()
plt.show()
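
A heatmap like this one is usually read for broad regions of good performance rather than for its single best cell, since an isolated peak is more likely to be noise. One way to make that reading systematic is to average each cell with its neighbours before picking a maximum. The sketch below illustrates the idea on a small hand-made table; `neighborhood_mean`, `argmax_label`, and the `toy` values are hypothetical and are not results from this notebook.

```python
import numpy as np
import pandas as pd

def neighborhood_mean(grid: pd.DataFrame) -> pd.DataFrame:
    """Average each cell with its available neighbours (3x3 window, NaN ignored)."""
    values = grid.to_numpy(dtype=float)
    n_rows, n_cols = values.shape
    smoothed = np.full_like(values, np.nan)
    for r in range(n_rows):
        for c in range(n_cols):
            window = values[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if np.isfinite(window).any():
                smoothed[r, c] = np.nanmean(window)
    return pd.DataFrame(smoothed, index=grid.index, columns=grid.columns)

def argmax_label(grid: pd.DataFrame) -> tuple:
    """(row label, column label) of the largest non-NaN cell."""
    r, c = np.unravel_index(np.nanargmax(grid.to_numpy(dtype=float)), grid.shape)
    return int(grid.index[r]), int(grid.columns[c])

# Hypothetical Sharpe surface: an isolated spike in the top-left corner
# and a broad plateau in the bottom-right corner.
toy = pd.DataFrame(
    [[2.0, 0.1, 0.0],
     [0.2, 1.2, 1.1],
     [0.1, 1.1, 1.3]],
    index=pd.Index([50, 100, 150], name="slow_period"),
    columns=pd.Index([5, 15, 25], name="fast_period"),
)
smooth = neighborhood_mean(toy)
print("Raw best (slow, fast):     ", argmax_label(toy))
print("Smoothed best (slow, fast):", argmax_label(smooth))
```

On this toy table the raw maximum sits on the isolated spike, while the smoothed maximum moves to the plateau, which is the region where a small change in parameters changes the result the least.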

In [None]:
# Visualization: distribution of Sharpe ratios

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
ax1 = axes[0]
ax1.hist(results_df['sharpe'], bins=30, color='steelblue', edgecolor='black', alpha=0.7)
ax1.axvline(x=results_df['sharpe'].mean(), color='red', linestyle='--', 
            label=f'Mean: {results_df["sharpe"].mean():.2f}')
ax1.axvline(x=results_df['sharpe'].median(), color='orange', linestyle='--',
            label=f'Median: {results_df["sharpe"].median():.2f}')
ax1.set_xlabel('Sharpe Ratio', fontsize=12)
ax1.set_ylabel('Frequency', fontsize=12)
ax1.set_title('Distribution of Sharpe Ratios', fontsize=14, fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Scatter: Sharpe vs Max Drawdown
ax2 = axes[1]
scatter = ax2.scatter(results_df['max_drawdown'] * 100, results_df['sharpe'],
                     c=results_df['total_return'] * 100, cmap='viridis',
                     alpha=0.6, s=50)
plt.colorbar(scatter, ax=ax2, label='Total Return (%)')
ax2.set_xlabel('Max Drawdown (%)', fontsize=12)
ax2.set_ylabel('Sharpe Ratio', fontsize=12)
ax2.set_title('Sharpe vs Max Drawdown', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## Part 4: Walk-Forward Analysis (25 min)

### 4.1 The Walk-Forward Concept (10 min)

**Walk-Forward Analysis** is a validation technique that simulates how a strategy would have been optimized and then traded in real time.

#### Principle

1. **In-Sample (IS)**: the optimization period, where the best parameters are found
2. **Out-of-Sample (OOS)**: the test period, where those parameters are applied to unseen data
3. **Rolling**: the procedure is repeated while moving the window forward in time

```
Timeline:
|-- IS_1 --|-- OOS_1 --|
           |-- IS_2 --|-- OOS_2 --|
                      |-- IS_3 --|-- OOS_3 --|
```

#### Advantages

- Avoids lookahead bias
- Simulates real-world use
- Reveals parameter instability
- Gives a realistic measure of OOS performance
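
The rolling scheme described above can be sketched as a generator of index windows. This is a minimal illustration, assuming fixed-length IS and OOS windows measured in observations and a step equal to the OOS length so that test windows never overlap; the name `walk_forward_windows` is hypothetical.

```python
from typing import Iterator, Tuple

def walk_forward_windows(n_obs: int, train_size: int, test_size: int
                         ) -> Iterator[Tuple[int, int, int]]:
    """Yield (train_start, train_end, test_end) positions; end positions are exclusive."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train_end = start + train_size
        yield start, train_end, train_end + test_size
        start += test_size  # advance by one OOS block: test windows are contiguous

windows = list(walk_forward_windows(n_obs=1260, train_size=252, test_size=63))
for fold, (a, b, c) in enumerate(windows[:3], start=1):
    print(f"Fold {fold}: IS = [{a}, {b})  OOS = [{b}, {c})")
print(f"Total folds: {len(windows)}")
```

With about five years of daily data (1260 observations), one year of training, and one quarter of testing, this scheme produces 16 folds whose OOS windows tile the data from observation 252 to the end.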

In [None]:
# Visualization of the Walk-Forward concept

fig, ax = plt.subplots(figsize=(14, 4))

train_months = 12
test_months = 3

n_folds = 5
colors_is = plt.cm.Blues(np.linspace(0.4, 0.8, n_folds))
colors_oos = plt.cm.Oranges(np.linspace(0.4, 0.8, n_folds))

for i in range(n_folds):
    start = i * test_months
    
    # In-Sample
    ax.barh(i, train_months, left=start, height=0.6, color=colors_is[i], 
            edgecolor='black', label='In-Sample' if i == 0 else '')
    ax.text(start + train_months/2, i, 'IS', ha='center', va='center', fontweight='bold')
    
    # Out-of-Sample
    ax.barh(i, test_months, left=start + train_months, height=0.6, color=colors_oos[i],
            edgecolor='black', label='Out-of-Sample' if i == 0 else '')
    ax.text(start + train_months + test_months/2, i, 'OOS', ha='center', va='center', fontweight='bold')

ax.set_yticks(range(n_folds))
ax.set_yticklabels([f'Fold {i+1}' for i in range(n_folds)])
ax.set_xlabel('Months', fontsize=12)
ax.set_title('Walk-Forward Analysis - Fold Layout', fontsize=14, fontweight='bold')
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

print(f"Configuration: {train_months} months IS + {test_months} months OOS")
print(f"Each fold moves forward by {test_months} months")

### 4.2 Walk-Forward Implementation (15 min)

In [None]:
def optimize_on_period(prices: pd.Series, 
                       fast_periods: List[int],
                       slow_periods: List[int],
                       fitness_func: str = 'sharpe') -> Tuple[Dict, float]:
    """
    Optimize the parameters over a given period.

    Args:
        prices: Price series
        fast_periods: List of fast periods to test
        slow_periods: List of slow periods to test
        fitness_func: 'sharpe', 'return', or 'calmar'

    Returns:
        Tuple (best_params, best_score)
    """
    best_score = -np.inf
    best_params = None
    
    for fast in fast_periods:
        for slow in slow_periods:
            if fast >= slow:
                continue
            
            result = backtest_sma_strategy(prices, fast, slow)
            
            if not result['valid']:
                continue
            
            # Select the fitness metric (unknown names fall back to Sharpe)
            if fitness_func == 'sharpe':
                score = result['sharpe']
            elif fitness_func == 'return':
                score = result['total_return']
            elif fitness_func == 'calmar':
                score = result['total_return'] / abs(result['max_drawdown']) if result['max_drawdown'] != 0 else 0
            else:
                score = result['sharpe']
            
            if score > best_score:
                best_score = score
                best_params = {'fast_period': fast, 'slow_period': slow}
    
    return best_params, best_score

print("Function optimize_on_period() defined")

In [None]:
def walk_forward_analysis(prices: pd.Series,
                          train_days: int = 252,
                          test_days: int = 63,
                          fast_periods: List[int] = None,
                          slow_periods: List[int] = None,
                          fitness_func: str = 'sharpe') -> pd.DataFrame:
    """
    Run a complete Walk-Forward Analysis.

    Args:
        prices: Full price series
        train_days: Number of training days (IS)
        test_days: Number of test days (OOS)
        fast_periods: Fast periods to test
        slow_periods: Slow periods to test
        fitness_func: Fitness function used for the optimization

    Returns:
        DataFrame with one row of results per fold
    """
    if fast_periods is None:
        fast_periods = [5, 10, 15, 20, 25]
    if slow_periods is None:
        slow_periods = [50, 75, 100, 125, 150]
    
    results = []
    fold = 0
    current_start = 0
    
    while current_start + train_days + test_days <= len(prices):
        # Define the window boundaries
        train_end = current_start + train_days
        test_end = train_end + test_days
        
        # Training data (in-sample)
        train_prices = prices.iloc[current_start:train_end]
        
        # Optimize on IS
        best_params, is_score = optimize_on_period(
            train_prices, fast_periods, slow_periods, fitness_func
        )
        
        if best_params is None:
            current_start += test_days
            continue
        
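        # Test data (out-of-sample). The slice also includes the slow_period
        # bars that precede the test window so that the SMAs are warmed up.
        # backtest_sma_strategy only trades after that warm-up, i.e. from the
        # first OOS bar, and the parameters were fitted before it.
        warmup_start = max(0, train_end - best_params['slow_period'])
        test_prices = prices.iloc[warmup_start:test_end]
        
        # Test with the best parameters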
        oos_result = backtest_sma_strategy(
            test_prices, 
            best_params['fast_period'], 
            best_params['slow_period']
        )
        
        if oos_result['valid']:
            results.append({
                'fold': fold,
                'train_start': prices.index[current_start],
                'train_end': prices.index[train_end - 1],
                'test_start': prices.index[train_end],
                'test_end': prices.index[test_end - 1],
                'fast_period': best_params['fast_period'],
                'slow_period': best_params['slow_period'],
                'is_sharpe': is_score,
                'oos_sharpe': oos_result['sharpe'],
                'oos_return': oos_result['total_return'],
                'oos_drawdown': oos_result['max_drawdown'],
                'oos_trades': oos_result['n_trades']
            })
        
        # Move the window forward
        current_start += test_days
        fold += 1
    
    return pd.DataFrame(results)

print("Function walk_forward_analysis() defined")

In [None]:
# Generate more data for the Walk-Forward analysis
np.random.seed(42)
n_days_wf = 1260  # ~5 years
dates_wf = pd.date_range('2019-01-01', periods=n_days_wf, freq='B')

# Prices with trend and cycles
trend_wf = np.linspace(100, 180, n_days_wf)
# Add seasonal cycles (5 full cycles over the sample)
cycle = 10 * np.sin(np.linspace(0, 10 * np.pi, n_days_wf))
noise_wf = np.cumsum(np.random.randn(n_days_wf) * 0.5)
prices_wf = pd.Series(trend_wf + cycle + noise_wf, index=dates_wf)

print("Running the Walk-Forward Analysis...")
print(f"Full period: {dates_wf[0].date()} - {dates_wf[-1].date()}")
print(f"Configuration: 252 days IS + 63 days OOS")

wf_results = walk_forward_analysis(
    prices_wf,
    train_days=252,   # 1 year of training
    test_days=63,     # 3 months of testing
    fast_periods=[5, 10, 15, 20, 25, 30],
    slow_periods=[50, 75, 100, 125, 150, 175, 200],
    fitness_func='sharpe'
)

print(f"\nNumber of folds: {len(wf_results)}")

In [None]:
# Display the Walk-Forward results
print("="*80)
print("WALK-FORWARD ANALYSIS RESULTS")
print("="*80)

# Display the table
display_cols = ['fold', 'test_start', 'test_end', 'fast_period', 'slow_period', 
                'is_sharpe', 'oos_sharpe', 'oos_return']
print(wf_results[display_cols].to_string(index=False))

# Aggregate statistics
print("\n" + "="*80)
print("AGGREGATE STATISTICS")
print("="*80)
print(f"\n{'Metric':<30} {'In-Sample':>15} {'Out-of-Sample':>15}")
print("-"*60)
print(f"{'Mean Sharpe':<30} {wf_results['is_sharpe'].mean():>15.2f} {wf_results['oos_sharpe'].mean():>15.2f}")
print(f"{'Median Sharpe':<30} {wf_results['is_sharpe'].median():>15.2f} {wf_results['oos_sharpe'].median():>15.2f}")
print(f"{'Sharpe Std':<30} {wf_results['is_sharpe'].std():>15.2f} {wf_results['oos_sharpe'].std():>15.2f}")
print(f"{'Mean OOS Return':<30} {'-':>15} {wf_results['oos_return'].mean()*100:>14.2f}%")

# OOS/IS ratio (a measure of overfitting)
efficiency_ratio = wf_results['oos_sharpe'].mean() / wf_results['is_sharpe'].mean()
print(f"\nWalk-Forward Efficiency Ratio: {efficiency_ratio:.2%}")
print("(OOS/IS ratio: the closer to 100%, the better)")
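
The efficiency ratio above is only meaningful when the mean in-sample score is positive: with a zero or negative denominator the ratio is undefined or changes sign. A guarded variant is sketched below, together with the share of folds that were profitable out-of-sample; the function names `walk_forward_efficiency` and `oos_hit_rate` and the sample values are hypothetical.

```python
import pandas as pd

def walk_forward_efficiency(is_scores: pd.Series, oos_scores: pd.Series) -> float:
    """Mean OOS score divided by mean IS score; NaN when the IS mean is not positive."""
    is_mean = is_scores.mean()
    if not is_mean > 0:  # also catches NaN, e.g. when there are no folds
        return float("nan")
    return float(oos_scores.mean() / is_mean)

def oos_hit_rate(oos_returns: pd.Series) -> float:
    """Fraction of folds with a strictly positive OOS return."""
    if len(oos_returns) == 0:
        return float("nan")
    return float((oos_returns > 0).mean())

# Hypothetical per-fold values, for illustration only
is_sharpe = pd.Series([2.0, 1.0, 1.5])
oos_sharpe = pd.Series([1.0, 0.5, 0.75])
oos_return = pd.Series([0.02, -0.01, 0.03, 0.01])

print(f"Efficiency: {walk_forward_efficiency(is_sharpe, oos_sharpe):.0%}")
print(f"OOS hit rate: {oos_hit_rate(oos_return):.0%}")
```

Reporting the hit rate next to the efficiency ratio helps distinguish a strategy that is consistently modest out-of-sample from one whose average is carried by a single fold.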

In [None]:
# Walk-Forward visualization

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: IS vs OOS Sharpe par fold
ax1 = axes[0, 0]
x = np.arange(len(wf_results))
width = 0.35
ax1.bar(x - width/2, wf_results['is_sharpe'], width, label='In-Sample', color='steelblue')
ax1.bar(x + width/2, wf_results['oos_sharpe'], width, label='Out-of-Sample', color='coral')
ax1.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax1.set_xlabel('Fold', fontsize=12)
ax1.set_ylabel('Sharpe Ratio', fontsize=12)
ax1.set_title('Sharpe Ratio: IS vs OOS per Fold', fontsize=14, fontweight='bold')
ax1.set_xticks(x)
ax1.set_xticklabels([f'F{i}' for i in range(len(wf_results))])
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Optimal parameters per fold
ax2 = axes[0, 1]
ax2.plot(x, wf_results['fast_period'], 'o-', label='Fast Period', color='green', markersize=8)
ax2.plot(x, wf_results['slow_period'], 's-', label='Slow Period', color='purple', markersize=8)
ax2.set_xlabel('Fold', fontsize=12)
ax2.set_ylabel('Period', fontsize=12)
ax2.set_title('Optimal Parameters per Fold', fontsize=14, fontweight='bold')
ax2.set_xticks(x)
ax2.set_xticklabels([f'F{i}' for i in range(len(wf_results))])
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: Cumulative OOS returns
ax3 = axes[1, 0]
cumulative_return = (1 + wf_results['oos_return']).cumprod() - 1
ax3.plot(wf_results['test_end'], cumulative_return * 100, 'o-', color='navy', linewidth=2, markersize=8)
ax3.axhline(y=0, color='red', linestyle='--', linewidth=1)
ax3.fill_between(wf_results['test_end'], 0, cumulative_return * 100,
                 where=cumulative_return >= 0, color='green', alpha=0.3)
ax3.fill_between(wf_results['test_end'], 0, cumulative_return * 100,
                 where=cumulative_return < 0, color='red', alpha=0.3)
ax3.set_xlabel('Date', fontsize=12)
ax3.set_ylabel('Cumulative Return (%)', fontsize=12)
ax3.set_title('Cumulative OOS Returns', fontsize=14, fontweight='bold')
ax3.grid(True, alpha=0.3)
plt.setp(ax3.xaxis.get_majorticklabels(), rotation=45, ha='right')

# Plot 4: Scatter IS vs OOS
ax4 = axes[1, 1]
ax4.scatter(wf_results['is_sharpe'], wf_results['oos_sharpe'], s=100, alpha=0.7, c='navy')
# Regression line
z = np.polyfit(wf_results['is_sharpe'], wf_results['oos_sharpe'], 1)
p = np.poly1d(z)
x_line = np.linspace(wf_results['is_sharpe'].min(), wf_results['is_sharpe'].max(), 100)
ax4.plot(x_line, p(x_line), 'r--', linewidth=2, label='Regression')
# Identity line
ax4.plot([0, wf_results['is_sharpe'].max()], [0, wf_results['is_sharpe'].max()], 
         'g--', linewidth=1, alpha=0.5, label='y=x (no overfitting)')
ax4.set_xlabel('In-Sample Sharpe', fontsize=12)
ax4.set_ylabel('Out-of-Sample Sharpe', fontsize=12)
ax4.set_title('IS vs OOS Sharpe Correlation', fontsize=14, fontweight='bold')
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# IS-OOS correlation
correlation = wf_results['is_sharpe'].corr(wf_results['oos_sharpe'])
print(f"IS-OOS correlation: {correlation:.2f}")
print("(High correlation = in-sample Sharpe is a useful predictor of out-of-sample Sharpe)")
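
With only a handful of folds, the Pearson correlation can be driven by a single outlier fold. As a complementary check, here is a sketch of a rank (Spearman) correlation, computed as the Pearson correlation of the ranks so that no extra dependency is needed. The helper name `rank_correlation` and the data are invented for illustration.

```python
import pandas as pd

def rank_correlation(a: pd.Series, b: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return float(a.rank().corr(b.rank()))

# Hypothetical folds: four ordinary folds plus one outlier with very high IS and OOS Sharpe
is_sharpe = pd.Series([0.8, 1.0, 1.2, 1.4, 5.0])
oos_sharpe = pd.Series([0.9, 0.7, 0.5, 0.3, 4.0])

print(f"Pearson:  {is_sharpe.corr(oos_sharpe):.2f}")
print(f"Spearman: {rank_correlation(is_sharpe, oos_sharpe):.2f}")
```

In this made-up example the outlier fold pushes the Pearson correlation close to 1, while the rank correlation is zero because the four ordinary folds are perfectly inversely ordered. A large gap between the two is a hint to inspect individual folds before trusting the aggregate number.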

---

## Part 5: Avoiding Overfitting (15 min)

### 5.1 Signs of Overfitting (8 min)

How can you tell whether your strategy is overfitted?

In [None]:
def detect_overfitting_signs(wf_results: pd.DataFrame) -> Tuple[Dict[str, bool], Dict[str, float]]:
    """
    Detect signs of overfitting in the Walk-Forward results.
    
    Returns:
        Tuple (signs, metrics): the boolean overfitting flags and the
        numeric metrics they are derived from
    """
    signs = {}
    
    # 1. IS performance far above OOS performance
    mean_is = wf_results['is_sharpe'].mean()
    mean_oos = wf_results['oos_sharpe'].mean()
    degradation = (mean_is - mean_oos) / mean_is if mean_is > 0 else 0
    signs['is_much_better_than_oos'] = degradation > 0.5  # More than 50% degradation
    
    # 2. Unstable parameters (high coefficient of variation across folds)
    fast_stability = wf_results['fast_period'].std() / wf_results['fast_period'].mean()
    slow_stability = wf_results['slow_period'].std() / wf_results['slow_period'].mean()
    signs['unstable_parameters'] = fast_stability > 0.3 or slow_stability > 0.3
    
    # 3. Weak IS-OOS correlation (an undefined correlation, NaN, is flagged too)
    correlation = wf_results['is_sharpe'].corr(wf_results['oos_sharpe'])
    signs['low_is_oos_correlation'] = not (correlation >= 0.3)
    
    # 4. Negative OOS performance
    signs['negative_oos_sharpe'] = mean_oos < 0
    
    # 5. High OOS dispersion (std relative to the absolute mean)
    oos_cv = wf_results['oos_sharpe'].std() / abs(mean_oos) if mean_oos != 0 else np.inf
    signs['high_oos_variance'] = oos_cv > 2.0
    
    return signs, {
        'is_oos_degradation': degradation,
        'fast_stability_cv': fast_stability,
        'slow_stability_cv': slow_stability,
        'is_oos_correlation': correlation,
        'mean_oos_sharpe': mean_oos,
        'oos_coefficient_variation': oos_cv
    }

# Analyze our results
overfitting_signs, metrics = detect_overfitting_signs(wf_results)

print("="*60)
print("OVERFITTING ANALYSIS")
print("="*60)

print("\nMetrics:")
for key, value in metrics.items():
    print(f"  - {key}: {value:.2f}")

print("\nOverfitting signs detected:")
n_warnings = 0
for sign, detected in overfitting_signs.items():
    status = "[WARNING]" if detected else "[OK]"
    print(f"  {status} {sign.replace('_', ' ').title()}")
    if detected:
        n_warnings += 1

print(f"\nConclusion: {n_warnings}/5 overfitting signs detected")
if n_warnings >= 3:
    print("WARNING: High risk of overfitting!")
elif n_warnings >= 1:
    print("Caution: Some signs of overfitting are present.")
else:
    print("Good: No signs of overfitting detected.")

### 5.2 Anti-Overfitting Techniques (7 min)

| Technique | Description | Implementation |
|-----------|-------------|----------------|
| **Reduce the parameters** | Fewer degrees of freedom | Fix some parameters |
| **Reasonable bounds** | Limit the search space | Realistic min/max values |
| **Temporal cross-validation** | Validate over several periods | Walk-Forward Analysis |
| **Penalize complexity** | Favor simplicity | Add regularization |
| **Monte Carlo** | Test robustness | Random permutations |
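
The first two rows of the table come down to controlling how many combinations the optimizer is allowed to try. Here is a small sketch that builds a bounded (fast, slow) grid and counts its size. The function name `bounded_grid` and the "fine" bounds are illustrative assumptions. The coarse grid reproduces the fast and slow values used in the walk-forward run of this notebook.

```python
from itertools import product
from typing import List, Tuple

def bounded_grid(fast_bounds: Tuple[int, int], slow_bounds: Tuple[int, int],
                 fast_step: int, slow_step: int) -> List[Tuple[int, int]]:
    """All (fast, slow) pairs inside the inclusive bounds that satisfy fast < slow."""
    fast_values = range(fast_bounds[0], fast_bounds[1] + 1, fast_step)
    slow_values = range(slow_bounds[0], slow_bounds[1] + 1, slow_step)
    return [(f, s) for f, s in product(fast_values, slow_values) if f < s]

# Coarse grid: fast in {5, 10, ..., 30}, slow in {50, 75, ..., 200}
coarse = bounded_grid((5, 30), (50, 200), fast_step=5, slow_step=25)
# Fine grid with wide bounds (illustrative): many more chances to fit noise
fine = bounded_grid((2, 60), (20, 400), fast_step=1, slow_step=1)

print(f"Coarse grid: {len(coarse)} combinations")
print(f"Fine grid:   {len(fine)} combinations")
```

Every extra combination is another chance for a parameter set to look good by luck, so the coarse grid (42 combinations here) is usually the safer starting point than an exhaustive one (more than 21,000 here).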

In [None]:
# Technique 1: Complexity penalty in the fitness function

def fitness_with_penalty(result: Dict, n_params: int) -> float:
    """
    Compute a fitness score penalized by the number of parameters.
    
    Loosely inspired by the Akaike information criterion (AIC): each
    parameter has a fixed cost. This is a heuristic, not the AIC itself.
    """
    if not result['valid']:
        return -np.inf
    
    sharpe = result['sharpe']
    n_trades = result['n_trades']
    
    # Penalty based on the number of parameters
    # More parameters = stronger penalty (0.1 Sharpe units each)
    penalty = 0.1 * n_params
    
    # Bonus for more trades (more statistical evidence)
    trade_bonus = 0.01 * np.log(n_trades + 1)
    
    fitness = sharpe - penalty + trade_bonus
    
    return fitness

# Demonstration
print("Fitness with complexity penalty:")
print("="*50)

# Simulate two strategies
strategy_simple = {'valid': True, 'sharpe': 1.2, 'n_trades': 50}
strategy_complex = {'valid': True, 'sharpe': 1.4, 'n_trades': 50}

fitness_simple = fitness_with_penalty(strategy_simple, n_params=2)   # 2 parameters
fitness_complex = fitness_with_penalty(strategy_complex, n_params=6)  # 6 parameters

print("\nSimple strategy (2 params):")
print(f"  Raw Sharpe: {strategy_simple['sharpe']:.2f}")
print(f"  Penalized fitness: {fitness_simple:.2f}")

print("\nComplex strategy (6 params):")
print(f"  Raw Sharpe: {strategy_complex['sharpe']:.2f}")
print(f"  Penalized fitness: {fitness_complex:.2f}")

print(f"\n-> The {'simple' if fitness_simple > fitness_complex else 'complex'} strategy is preferred after the penalty!")

In [None]:
# Technique 2: Robustness test by varying the parameters

def parameter_sensitivity_test(prices: pd.Series, 
                              base_params: Dict,
                              variation_pct: float = 0.10) -> pd.DataFrame:
    """
    Test how sensitive performance is to parameter variations.
    
    If performance drops sharply with +/-10% on the parameters,
    that is a sign of overfitting.
    """
    results = []
    
    fast = base_params['fast_period']
    slow = base_params['slow_period']
    
    # Variations
    variations = [-variation_pct, 0, variation_pct]
    
    for fast_var in variations:
        for slow_var in variations:
            # round() before int(): plain truncation turns 100 * 1.15
            # (114.999... in floating point) into 114 instead of 115
            test_fast = int(round(fast * (1 + fast_var)))
            test_slow = int(round(slow * (1 + slow_var)))
            
            if test_fast >= test_slow:
                continue
            
            result = backtest_sma_strategy(prices, test_fast, test_slow)
            
            if result['valid']:
                results.append({
                    'fast_var': f"{fast_var:+.0%}",
                    'slow_var': f"{slow_var:+.0%}",
                    'fast_period': test_fast,
                    'slow_period': test_slow,
                    'sharpe': result['sharpe'],
                    'return': result['total_return']
                })
    
    return pd.DataFrame(results)

# Test the sensitivity of the best parameters
best_params = {'fast_period': 15, 'slow_period': 100}

sensitivity_df = parameter_sensitivity_test(prices_wf, best_params, variation_pct=0.15)

print("Parameter sensitivity test (+/-15%):")
print("="*60)
print(sensitivity_df.to_string(index=False))

# Analyze stability
sharpe_std = sensitivity_df['sharpe'].std()
sharpe_mean = sensitivity_df['sharpe'].mean()
# Use |mean| so a negative average Sharpe cannot yield a negative CV that passes as "robust"
sharpe_cv = sharpe_std / abs(sharpe_mean) if sharpe_mean != 0 else np.inf

print("\nStability:")
print(f"  Mean Sharpe: {sharpe_mean:.2f}")
print(f"  Sharpe std: {sharpe_std:.2f}")
print(f"  Coefficient of variation: {sharpe_cv:.2%}")

if sharpe_cv < 0.20:
    print("\n-> ROBUST parameters (CV < 20%)")
elif sharpe_cv < 0.50:
    print("\n-> MODERATELY SENSITIVE parameters (20% < CV < 50%)")
else:
    print("\n-> VERY SENSITIVE parameters - Risk of overfitting (CV > 50%)")

---

## Part 6: Advanced Optimization (20 min)

### 6.1 Optimization Objectives (10 min)

Multi-objective fitness functions balance several criteria at once.

In [None]:
def multi_objective_fitness(result: Dict,
                           sharpe_weight: float = 0.5,
                           drawdown_weight: float = 0.3,
                           trades_weight: float = 0.2) -> float:
    """
    Multi-objective fitness function.
    
    Combines:
    - Sharpe Ratio (to maximize)
    - Max Drawdown (to minimize)
    - Number of trades (penalized if too few or too many)
    
    Args:
        result: Backtest results
        sharpe_weight: Weight of the Sharpe score
        drawdown_weight: Weight of the drawdown score
        trades_weight: Weight of the trade-count score
    
    Returns:
        Combined fitness score
    """
    if not result['valid']:
        return -np.inf
    
    sharpe = result['sharpe']
    max_dd = abs(result['max_drawdown'])
    n_trades = result['n_trades']
    
    # Scale Sharpe (typically between -2 and 3): sharpe / 2, clipped to [-1, 1.5]
    sharpe_score = np.clip(sharpe / 2, -1, 1.5)
    
    # Scale drawdown (0% = perfect, 30%+ = bad)
    dd_score = 1 - np.clip(max_dd / 0.30, 0, 1)  # 1 at 0%, 0 at 30%+
    
    # Trade score: full score for 10-200 trades, penalized below 10 or above 200
    if n_trades < 10:
        trade_score = n_trades / 10
    elif n_trades > 200:
        trade_score = max(0, 1 - (n_trades - 200) / 200)
    else:
        trade_score = 1.0
    
    # Combine
    fitness = (sharpe_weight * sharpe_score + 
               drawdown_weight * dd_score + 
               trades_weight * trade_score)
    
    return fitness

# Demonstration
print("Multi-Objective Fitness:")
print("="*60)

test_results = [
    {'name': 'High Sharpe, High DD', 'valid': True, 'sharpe': 2.0, 'max_drawdown': -0.35, 'n_trades': 50},
    {'name': 'Medium Sharpe, Low DD', 'valid': True, 'sharpe': 1.2, 'max_drawdown': -0.10, 'n_trades': 50},
    {'name': 'High Sharpe, Few Trades', 'valid': True, 'sharpe': 2.0, 'max_drawdown': -0.15, 'n_trades': 5},
    {'name': 'Balanced', 'valid': True, 'sharpe': 1.5, 'max_drawdown': -0.12, 'n_trades': 60},
]

for res in test_results:
    fitness = multi_objective_fitness(res)
    print(f"\n{res['name']}:")
    print(f"  Sharpe: {res['sharpe']:.2f}, DD: {res['max_drawdown']:.2%}, Trades: {res['n_trades']}")
    print(f"  Fitness Score: {fitness:.3f}")
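
To see how the three terms combine, the sketch below recomputes the component scores for two of the test cases with the default weights (0.5 / 0.3 / 0.2). It mirrors the scaling rules of `multi_objective_fitness` in a standalone helper. The name `score_components` is hypothetical and exists only for this illustration.

```python
import numpy as np

def score_components(sharpe: float, max_drawdown: float, n_trades: int) -> dict:
    """Component scores, using the same scaling rules as multi_objective_fitness."""
    sharpe_score = float(np.clip(sharpe / 2, -1, 1.5))
    dd_score = float(1 - np.clip(abs(max_drawdown) / 0.30, 0, 1))
    if n_trades < 10:
        trade_score = n_trades / 10
    elif n_trades > 200:
        trade_score = max(0.0, 1 - (n_trades - 200) / 200)
    else:
        trade_score = 1.0
    return {'sharpe': sharpe_score, 'drawdown': dd_score, 'trades': trade_score}

weights = {'sharpe': 0.5, 'drawdown': 0.3, 'trades': 0.2}

def weighted_total(parts: dict) -> float:
    return sum(weights[k] * parts[k] for k in weights)

balanced = score_components(sharpe=1.5, max_drawdown=-0.12, n_trades=60)
high_dd = score_components(sharpe=2.0, max_drawdown=-0.35, n_trades=50)

for label, parts in [('Balanced', balanced), ('High Sharpe, High DD', high_dd)]:
    detail = ", ".join(f"{k}={v:.2f}" for k, v in parts.items())
    print(f"{label}: {detail} -> fitness {weighted_total(parts):.3f}")
```

The "Balanced" case scores 0.5 × 0.75 + 0.3 × 0.60 + 0.2 × 1.0 = 0.755, while "High Sharpe, High DD" scores 0.5 × 1.0 + 0.3 × 0.0 + 0.2 × 1.0 = 0.700: the 35% drawdown zeroes the drawdown term and outweighs the higher Sharpe.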

In [None]:
# Calmar Ratio as fitness

def calmar_fitness(result: Dict, min_trades: int = 10) -> float:
    """
    Calmar-style ratio = Total Return / |Max Drawdown|
    
    The standard Calmar Ratio uses CAGR in the numerator; total return is
    used here as a simplification, reasonable when all candidates are
    backtested over the same period.
    Favors strategies with both good returns AND low drawdown.
    """
    if not result['valid'] or result['n_trades'] < min_trades:
        return -np.inf
    
    total_return = result['total_return']
    max_dd = abs(result['max_drawdown'])
    
    if max_dd == 0:
        return total_return * 100 if total_return > 0 else -np.inf
    
    calmar = total_return / max_dd
    
    return calmar

# Grid Search with Calmar as fitness
print("Grid Search with Calmar Ratio:")
print("="*50)

calmar_results = []

for fast in [10, 15, 20, 25]:
    for slow in [50, 75, 100, 150]:
        if fast >= slow:
            continue
        
        result = backtest_sma_strategy(prices_wf, fast, slow)
        
        if result['valid']:
            calmar_results.append({
                'fast': fast,
                'slow': slow,
                'sharpe': result['sharpe'],
                'calmar': calmar_fitness(result),
                'return': result['total_return'],
                'max_dd': result['max_drawdown']
            })

calmar_df = pd.DataFrame(calmar_results)
calmar_df_sorted = calmar_df.sort_values('calmar', ascending=False)

print("\nTop 5 by Calmar Ratio:")
print(calmar_df_sorted.head().to_string(index=False))

print("\nTop 5 by Sharpe Ratio:")
print(calmar_df.sort_values('sharpe', ascending=False).head().to_string(index=False))
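
`calmar_fitness` uses total return in the numerator, which is fine for ranking parameter sets over one shared period but makes values hard to compare across backtests of different lengths. Here is a sketch of an annualized variant based on CAGR, assuming daily data with 252 trading days per year. The function `annualized_calmar` and its zero-drawdown convention are illustrative choices, not the notebook's implementation.

```python
def annualized_calmar(total_return: float, max_drawdown: float,
                      n_days: int, periods_per_year: int = 252) -> float:
    """Calmar ratio with CAGR in the numerator: CAGR / |max drawdown|."""
    if n_days <= 0 or total_return <= -1:
        return float('-inf')
    # CAGR: annualize the total return over the length of the backtest
    cagr = (1 + total_return) ** (periods_per_year / n_days) - 1
    max_dd = abs(max_drawdown)
    if max_dd == 0:
        # Convention chosen for this sketch: no drawdown means an unbounded ratio
        return float('inf') if cagr > 0 else float('-inf')
    return cagr / max_dd

# Same 44% total return and 20% drawdown, earned over two years vs one year
two_years = annualized_calmar(0.44, -0.20, n_days=504)
one_year = annualized_calmar(0.44, -0.20, n_days=252)
print(f"Two years: {two_years:.2f}")
print(f"One year:  {one_year:.2f}")
```

A 44% return over two years corresponds to a 20% CAGR and a Calmar of 1.0, while the same return earned in a single year gives 2.2; the total-return version would score both at 2.2.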

### 6.2 Robustness Testing (10 min)

Robustness testing checks that the parameters still perform well under different market conditions.

In [None]:
def monte_carlo_robustness(prices: pd.Series,
                          params: Dict,
                          n_simulations: int = 100,
                          resample_block_size: int = 21) -> Dict:
    """
    Monte Carlo robustness test using a block bootstrap of returns.
    
    Draws blocks of consecutive returns with replacement to build
    alternative price series, then checks whether the strategy still performs well.
    
    Args:
        prices: Original price series
        params: Parameters to test
        n_simulations: Number of simulations
        resample_block_size: Block size for the bootstrap
    
    Returns:
        Dict with robustness statistics
    """
    np.random.seed(42)  # Fixed seed for reproducibility (resets NumPy's global RNG state)
    
    # Compute the original returns
    returns = prices.pct_change().dropna()
    n_returns = len(returns)
    
    sharpes = []
    drawdowns = []
    
    for _ in range(n_simulations):
        # Block bootstrap
        n_blocks = n_returns // resample_block_size + 1
        blocks = []
        
        for _ in range(n_blocks):
            # randint's upper bound is exclusive: +1 keeps the last full block eligible
            start = np.random.randint(0, n_returns - resample_block_size + 1)
            block = returns.iloc[start:start + resample_block_size].values
            blocks.append(block)
        
        # Concatenate and truncate
        resampled_returns = np.concatenate(blocks)[:n_returns]
        
        # Rebuild the prices
        resampled_prices = prices.iloc[0] * (1 + pd.Series(resampled_returns)).cumprod()
        resampled_prices = pd.Series(resampled_prices.values, index=prices.index[1:])
        
        # Backtest
        result = backtest_sma_strategy(
            resampled_prices,
            params['fast_period'],
            params['slow_period']
        )
        
        if result['valid']:
            sharpes.append(result['sharpe'])
            drawdowns.append(result['max_drawdown'])
    
    # Statistics
    sharpes = np.array(sharpes)
    drawdowns = np.array(drawdowns)
    
    return {
        'sharpe_mean': sharpes.mean(),
        'sharpe_std': sharpes.std(),
        'sharpe_5pct': np.percentile(sharpes, 5),
        'sharpe_95pct': np.percentile(sharpes, 95),
        'prob_positive_sharpe': (sharpes > 0).mean(),
        'dd_mean': drawdowns.mean(),
        'dd_5pct': np.percentile(drawdowns, 5),
        'dd_95pct': np.percentile(drawdowns, 95),
        'sharpes': sharpes,
        'drawdowns': drawdowns
    }

# Test the robustness of the best parameters
print("Monte Carlo robustness test (100 simulations)...")
best_params_test = {'fast_period': 15, 'slow_period': 100}

robustness = monte_carlo_robustness(prices_wf, best_params_test, n_simulations=100)

print("\nResults:")
print("="*50)
print("Sharpe Ratio:")
print(f"  Mean: {robustness['sharpe_mean']:.2f}")
print(f"  Std dev: {robustness['sharpe_std']:.2f}")
print(f"  90% interval: [{robustness['sharpe_5pct']:.2f}, {robustness['sharpe_95pct']:.2f}]")
print(f"  Probability Sharpe > 0: {robustness['prob_positive_sharpe']:.1%}")
print("\nMax Drawdown:")
print(f"  Mean: {robustness['dd_mean']:.2%}")
print(f"  90% interval: [{robustness['dd_5pct']:.2%}, {robustness['dd_95pct']:.2%}]")
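
The resampling step at the heart of this test can be isolated into a small function. The sketch below is one possible formulation, using a local `numpy.random.Generator` instead of the global seed so that calling it does not alter random state elsewhere in the notebook. The name `block_bootstrap` and the synthetic returns are assumptions made for illustration.

```python
import numpy as np

def block_bootstrap(returns: np.ndarray, block_size: int,
                    rng: np.random.Generator) -> np.ndarray:
    """Resample `returns` by concatenating randomly chosen blocks of consecutive values."""
    n = len(returns)
    if not 0 < block_size <= n:
        raise ValueError("block_size must be between 1 and len(returns)")
    n_blocks = -(-n // block_size)  # ceiling division
    # Every start index from 0 to n - block_size is eligible (upper bound is exclusive)
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [returns[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]

rng = np.random.default_rng(42)
original = rng.normal(0.0005, 0.01, size=500)   # synthetic daily returns
resampled = block_bootstrap(original, block_size=21, rng=rng)

print(f"Length preserved: {len(resampled) == len(original)}")
print(f"All values drawn from the original series: {np.isin(resampled, original).all()}")
```

Sampling whole blocks (about one trading month here) preserves short-term autocorrelation and volatility clustering inside each block, which a plain shuffle of individual returns would destroy.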

In [None]:
# Robustness visualization

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Sharpe distribution
ax1 = axes[0]
ax1.hist(robustness['sharpes'], bins=30, color='steelblue', edgecolor='black', alpha=0.7)
ax1.axvline(x=robustness['sharpe_mean'], color='red', linestyle='--', linewidth=2,
            label=f'Mean: {robustness["sharpe_mean"]:.2f}')
ax1.axvline(x=0, color='black', linestyle='-', linewidth=1)
ax1.axvline(x=robustness['sharpe_5pct'], color='orange', linestyle=':', linewidth=2,
            label=f'5th pct: {robustness["sharpe_5pct"]:.2f}')
ax1.axvline(x=robustness['sharpe_95pct'], color='orange', linestyle=':', linewidth=2,
            label=f'95th pct: {robustness["sharpe_95pct"]:.2f}')
ax1.set_xlabel('Sharpe Ratio', fontsize=12)
ax1.set_ylabel('Frequency', fontsize=12)
ax1.set_title('Monte Carlo Distribution of Sharpe Ratios', fontsize=14, fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Drawdown distribution
ax2 = axes[1]
ax2.hist(robustness['drawdowns'] * 100, bins=30, color='coral', edgecolor='black', alpha=0.7)
ax2.axvline(x=robustness['dd_mean'] * 100, color='red', linestyle='--', linewidth=2,
            label=f'Mean: {robustness["dd_mean"]:.2%}')
ax2.axvline(x=robustness['dd_5pct'] * 100, color='darkred', linestyle=':', linewidth=2,
            label=f'5th pct: {robustness["dd_5pct"]:.2%}')
ax2.set_xlabel('Max Drawdown (%)', fontsize=12)
ax2.set_ylabel('Frequency', fontsize=12)
ax2.set_title('Monte Carlo Distribution of Max Drawdowns', fontsize=14, fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## Part 7: Complete Example (15 min)

### 7.1 Optimized Strategy with Full Validation

In [None]:
def complete_optimization_pipeline(prices: pd.Series,
                                   fast_range: List[int],
                                   slow_range: List[int],
                                   train_pct: float = 0.7) -> Optional[Dict]:
    """
    Complete optimization pipeline with validation.
    
    Steps:
    1. Train/test split
    2. Grid Search on the training set
    3. Validation on the test set
    4. Walk-Forward Analysis
    5. Robustness test
    
    Returns:
        Dict with all results, or None if no valid parameters are found
    """
    results = {}
    
    # 1. Train/test split
    split_idx = int(len(prices) * train_pct)
    train_prices = prices.iloc[:split_idx]
    test_prices = prices.iloc[split_idx:]
    
    print("1. Data split:")
    print(f"   Train: {train_prices.index[0].date()} - {train_prices.index[-1].date()} ({len(train_prices)} days)")
    print(f"   Test: {test_prices.index[0].date()} - {test_prices.index[-1].date()} ({len(test_prices)} days)")
    
    # 2. Grid Search on the training set
    print("\n2. Grid Search on training data...")
    best_params, best_score = optimize_on_period(train_prices, fast_range, slow_range)
    
    if best_params is None:
        print("   ERROR: No valid parameters found!")
        return None
    
    print(f"   Best parameters: SMA({best_params['fast_period']}, {best_params['slow_period']})")
    print(f"   IS score: {best_score:.2f}")
    results['best_params'] = best_params
    results['is_score'] = best_score
    
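    # 3. Validation on the test set
    print("\n3. Out-of-Sample validation...")
    oos_result = backtest_sma_strategy(
        test_prices, 
        best_params['fast_period'], 
        best_params['slow_period']
    )
    
    if oos_result['valid']:
        print(f"   OOS Sharpe: {oos_result['sharpe']:.2f}")
        print(f"   OOS Return: {oos_result['total_return']:.2%}")
        print(f"   OOS Max DD: {oos_result['max_drawdown']:.2%}")
    else:
        # Assumption: an invalid backtest may omit the metric keys, so fall back
        # to neutral values that keep the scoring step below from failing
        print("   WARNING: OOS backtest invalid (test window too short?), using neutral values")
        oos_result = {'valid': False, 'sharpe': 0.0, 'total_return': 0.0, 'max_drawdown': 0.0}
    
    results['oos_sharpe'] = oos_result['sharpe']
    results['oos_return'] = oos_result['total_return']
    results['oos_drawdown'] = oos_result['max_drawdown']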
    
    # 4. Walk-Forward Analysis
    print("\n4. Walk-Forward Analysis...")
    wf_results = walk_forward_analysis(
        prices,
        train_days=180,
        test_days=60,
        fast_periods=fast_range,
        slow_periods=slow_range
    )
    
    # The OOS/IS ratio is only meaningful for a positive mean IS Sharpe
    mean_is_sharpe = wf_results['is_sharpe'].mean()
    wf_efficiency = wf_results['oos_sharpe'].mean() / mean_is_sharpe if mean_is_sharpe > 0 else 0.0
    print(f"   Folds: {len(wf_results)}")
    print(f"   WF Efficiency: {wf_efficiency:.2%}")
    results['wf_results'] = wf_results
    results['wf_efficiency'] = wf_efficiency
    
    # 5. Robustness test
    print("\n5. Monte Carlo robustness test (50 simulations)...")
    robustness = monte_carlo_robustness(prices, best_params, n_simulations=50)
    
    print(f"   Prob(Sharpe > 0): {robustness['prob_positive_sharpe']:.1%}")
    print(f"   Sharpe 5th percentile: {robustness['sharpe_5pct']:.2f}")
    results['robustness'] = robustness
    
    # Final score
    print("\n" + "="*60)
    print("FINAL VALIDATION SCORE")
    print("="*60)
    
    # Scoring (weighted criteria)
    oos_vs_is = oos_result['sharpe'] / best_score if best_score > 0 else 0
    validation_score = (
        0.30 * min(1, oos_vs_is) +                          # OOS not overly degraded
        0.25 * min(1, wf_efficiency) +                       # Walk-Forward efficiency
        0.25 * robustness['prob_positive_sharpe'] +          # Robustness
        0.20 * (1 if oos_result['sharpe'] > 0.5 else 0.5 if oos_result['sharpe'] > 0 else 0)  # Absolute OOS Sharpe level
    )
    
    print("\nCriteria:")
    print(f"  - OOS/IS Ratio: {oos_vs_is:.2%}")
    print(f"  - WF Efficiency: {wf_efficiency:.2%}")
    print(f"  - MC Prob(Sharpe>0): {robustness['prob_positive_sharpe']:.1%}")
    print(f"  - OOS Sharpe: {oos_result['sharpe']:.2f}")
    
    print(f"\nVALIDATION SCORE: {validation_score:.2%}")
    
    if validation_score >= 0.80:
        print("-> EXCELLENT: Strategy well validated")
    elif validation_score >= 0.60:
        print("-> GOOD: Strategy acceptable with some reservations")
    elif validation_score >= 0.40:
        print("-> FAIR: Risk of overfitting, caution required")
    else:
        print("-> POOR: Probable overfitting, do not use in production")
    
    results['validation_score'] = validation_score
    
    return results

print("Function complete_optimization_pipeline() defined")

In [None]:
# Run the complete pipeline
print("="*70)
print("COMPLETE OPTIMIZATION PIPELINE")
print("="*70)

final_results = complete_optimization_pipeline(
    prices_wf,
    fast_range=[5, 10, 15, 20, 25, 30],
    slow_range=[50, 75, 100, 125, 150, 175, 200],
    train_pct=0.70
)

In [None]:
# Final QCAlgorithm code with the optimized parameters

if final_results:
    best = final_results['best_params']
    
    final_algorithm_code = f'''
from AlgorithmImports import *

class OptimizedSMACrossover(QCAlgorithm):
    """
    Optimized and validated SMA Crossover strategy.
    
    Optimal parameters:
        - Fast Period: {best['fast_period']}
        - Slow Period: {best['slow_period']}
    
    Validation:
        - OOS Sharpe: {final_results['oos_sharpe']:.2f}
        - WF Efficiency: {final_results['wf_efficiency']:.2%}
        - Validation Score: {final_results['validation_score']:.2%}
    """
    
    def Initialize(self):
        self.SetStartDate(2024, 1, 1)
        self.SetEndDate(2024, 12, 31)
        self.SetCash(100000)
        
        # Optimized parameters (GetParameter allows overriding them)
        self.fast_period = int(self.GetParameter("fast_period", {best['fast_period']}))
        self.slow_period = int(self.GetParameter("slow_period", {best['slow_period']}))
        
        # Setup
        self.symbol = self.AddEquity("SPY", Resolution.Daily).Symbol
        
        # Indicators
        self.sma_fast = self.SMA(self.symbol, self.fast_period, Resolution.Daily)
        self.sma_slow = self.SMA(self.symbol, self.slow_period, Resolution.Daily)
        
        # Risk management
        self.stop_loss_pct = 0.05
        self.entry_price = 0
        
        self.SetWarmup(self.slow_period)
    
    def OnData(self, data):
        if self.IsWarmingUp or not data.ContainsKey(self.symbol):
            return
        
        if not (self.sma_fast.IsReady and self.sma_slow.IsReady):
            return
        
        price = self.Securities[self.symbol].Price
        
        # Stop loss check
        if self.Portfolio[self.symbol].Invested:
            if (price - self.entry_price) / self.entry_price <= -self.stop_loss_pct:
                self.Liquidate(self.symbol)
                return
        
        # Trading logic
        fast = self.sma_fast.Current.Value
        slow = self.sma_slow.Current.Value
        
        if fast > slow and not self.Portfolio[self.symbol].Invested:
            self.SetHoldings(self.symbol, 1.0)
            self.entry_price = price
        
        elif fast < slow and self.Portfolio[self.symbol].Invested:
            self.Liquidate(self.symbol)
'''
    
    print("Final QCAlgorithm code:")
    print(final_algorithm_code)

---

## Conclusion and Next Steps

### Summary

In this notebook, we covered:

1. **Introduction to optimization**: Goals, risks (overfitting, curve fitting), types of methods

2. **QuantConnect Parameter Sets**: Defining optimizable parameters, JSON configuration

3. **Manual Grid Search**: Full implementation, result analysis, heatmaps

4. **Walk-Forward Analysis**: Temporal validation, IS/OOS split, efficiency ratio

5. **Anti-Overfitting**: Detecting the signs, complexity penalties, sensitivity testing

6. **Advanced Optimization**: Multi-objective fitness, Calmar ratio, Monte Carlo robustness

7. **Complete Pipeline**: Integration of all the techniques, validation score

### Key Takeaways

| Concept | Key Point |
|---------|----------|
| **Goal** | Robustness > Maximum performance |
| **Overfitting** | IS >> OOS = danger |
| **Walk-Forward** | Validates the optimization as it would run live, with periodic re-optimization |
| **Parameters** | Fewer = better (parsimony) |
| **Robustness** | Monte Carlo to validate |
| **Fitness** | Multi-objective > Single metric |

### Pre-Production Checklist

- [ ] Walk-Forward Efficiency > 60%
- [ ] OOS Sharpe > 0.5
- [ ] Monte Carlo Prob(Sharpe > 0) > 80%
- [ ] Stable parameters (CV < 30%)
- [ ] Acceptable Max Drawdown
- [ ] Sufficient number of trades (> 30)
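
The checklist can be turned into an automated gate. The sketch below uses the thresholds from the list above. The checklist does not state a drawdown limit, so the 20% default of `max_dd_limit` is purely an illustrative assumption, as are the function name `production_checklist`, the metric keys, and the example values.

```python
from typing import Dict

def production_checklist(metrics: Dict[str, float],
                         max_dd_limit: float = 0.20) -> Dict[str, bool]:
    """Evaluate each checklist item; max_dd_limit is an assumed, adjustable threshold."""
    return {
        'Walk-Forward Efficiency > 60%': metrics['wf_efficiency'] > 0.60,
        'OOS Sharpe > 0.5': metrics['oos_sharpe'] > 0.5,
        'Monte Carlo Prob(Sharpe > 0) > 80%': metrics['prob_positive_sharpe'] > 0.80,
        'Parameter CV < 30%': metrics['param_cv'] < 0.30,
        'Max Drawdown acceptable': abs(metrics['max_drawdown']) <= max_dd_limit,
        'Trades > 30': metrics['n_trades'] > 30,
    }

# Made-up metrics for illustration
example = {
    'wf_efficiency': 0.72, 'oos_sharpe': 0.65, 'prob_positive_sharpe': 0.85,
    'param_cv': 0.35, 'max_drawdown': -0.12, 'n_trades': 48,
}
checks = production_checklist(example)
for item, passed in checks.items():
    print(f"[{'x' if passed else ' '}] {item}")
print(f"Ready for production: {all(checks.values())}")
```

With these example values every item passes except parameter stability (CV of 35%), so the strategy would not yet clear the gate.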

### Additional Resources

- [QuantConnect Optimization](https://www.quantconnect.com/docs/v2/our-platform/optimization)
- [Walk-Forward Analysis Guide](https://www.investopedia.com/terms/w/walk-forward-testing.asp)
- [Avoiding Overfitting in Trading](https://quantpedia.com/how-to-avoid-overfitting-in-trading-strategies/)

---

**Notebook complete. Parameter optimization is crucial, but it must be done rigorously to avoid overfitting.**