# QC-Py-21 - Portfolio Optimization with Machine Learning

> **From Markowitz to modern ML methods for asset allocation**
> Duration: 105 minutes | Level: Advanced | Python + QuantConnect

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Understand **Modern Portfolio Theory** and its limitations
2. Implement **Mean-Variance** optimization and the efficient frontier
3. Apply **shrinkage** techniques (Ledoit-Wolf) to covariance estimation
4. Use **Machine Learning** to predict expected returns
5. Implement the **Black-Litterman** model with ML-derived views
6. Master **Hierarchical Risk Parity** (HRP) for allocation
7. Integrate these techniques into a QuantConnect **Portfolio Construction Model**
8. Build a **complete ML-optimized strategy**

## Prerequisites

- Notebooks QC-Py-01 through QC-Py-20 completed
- Understanding of risk/return concepts (QC-Py-10)
- Basics of ML and feature engineering (QC-Py-18)
- Familiarity with numpy, pandas, scipy, sklearn

## Notebook Structure

| Part | Topic | Duration |
|------|-------|----------|
| 1 | Modern Portfolio Theory and Mean-Variance | 20 min |
| 2 | Covariance Estimation and Shrinkage | 10 min |
| 3 | ML for Expected Returns | 15 min |
| 4 | Black-Litterman with ML views | 20 min |
| 5 | Hierarchical Risk Parity (HRP) | 15 min |
| 6 | QuantConnect Integration | 10 min |
| 7 | Complete ML-Optimized Strategy | 15 min |

---

## Introduction: Why optimize the portfolio?

**Asset allocation** is widely regarded as the main driver of a portfolio's behavior. A frequently cited result (Brinson, Hood and Beebower, 1986) is that allocation policy explains roughly 90% of the variability of a portfolio's returns over time, which is not the same as 90% of its performance. The goal is to find the **optimal weights** that maximize risk-adjusted return.

### Evolution of the approaches

| Era | Approach | Advantages | Limitations |
|-----|----------|------------|-------------|
| 1952 | **Markowitz Mean-Variance** | Solid theoretical foundation | Sensitive to input estimates |
| 1992 | **Black-Litterman** | Incorporates investor views | Complex to parameterize |
| 2016 | **Hierarchical Risk Parity** | Robust, no matrix inversion | Ignores expected returns |
| 2020+ | **ML-Enhanced** | Adaptive predictions | Risk of overfitting |

### Modern optimization pipeline

```
       Historical Data
              |
              v
+---------------------------+
| Estimation                |
| - Expected Returns (ML)   |
| - Covariance (Shrinkage)  |
+-------------+-------------+
              |
              v
+---------------------------+
| Optimization              |
| - Mean-Variance           |
| - Black-Litterman         |
| - HRP                     |
+-------------+-------------+
              |
              v
       Optimal Weights
```

In [None]:
# Required imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import minimize
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import squareform
import warnings
warnings.filterwarnings('ignore')

# Matplotlib configuration
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

# Sklearn for ML and covariance estimation
from sklearn.covariance import LedoitWolf
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

print("Imports successful!")
print("This notebook covers portfolio optimization with ML.")

In [None]:
# Generate demonstration data
def generate_portfolio_data(n_assets=10, n_days=500, seed=42):
    """
    Generate simulated price data for a portfolio.

    Parameters:
    -----------
    n_assets : int
        Number of assets
    n_days : int
        Number of days of data
    seed : int
        Random seed

    Returns:
    --------
    pd.DataFrame : Asset prices
    pd.DataFrame : Asset returns (daily log returns)
    """
    np.random.seed(seed)

    # Asset names
    assets = [f'ASSET_{i+1}' for i in range(n_assets)]

    # Dates (business days)
    dates = pd.date_range(start='2022-01-01', periods=n_days, freq='B')

    # Per-asset parameters (different drift and volatility)
    drifts = np.random.uniform(0.0001, 0.0005, n_assets)  # about 2.5% - 12.6% annualized
    vols = np.random.uniform(0.01, 0.03, n_assets)  # about 16% - 48% annualized

    # Correlation matrix (block structure): 0.3 across blocks, 0.6 within a block.
    # np.eye(n) * 0.3 would leave every cross-block correlation at 0, so the
    # 0.3 background is set explicitly with np.full.
    n_blocks = 3
    block_size = n_assets // n_blocks
    corr_matrix = np.full((n_assets, n_assets), 0.3)

    for b in range(n_blocks):
        start = b * block_size
        end = min((b + 1) * block_size, n_assets)
        corr_matrix[start:end, start:end] = 0.6

    np.fill_diagonal(corr_matrix, 1.0)

    # Convert correlation to covariance
    cov_matrix = np.outer(vols, vols) * corr_matrix

    # Generate correlated returns
    returns = np.random.multivariate_normal(drifts, cov_matrix, n_days)

    # Convert to prices (returns are treated as log returns)
    prices = 100 * np.exp(np.cumsum(returns, axis=0))

    # DataFrames
    df_prices = pd.DataFrame(prices, index=dates, columns=assets)
    df_returns = pd.DataFrame(returns, index=dates, columns=assets)

    return df_prices, df_returns

# Generate the data
prices, returns = generate_portfolio_data(n_assets=10, n_days=500)

print(f"Data generated: {len(prices)} days, {len(prices.columns)} assets")
print(f"\nPrice preview:")
print(prices.head())

In [None]:
# Price visualization
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Normalized prices (base 100)
normalized_prices = prices / prices.iloc[0] * 100
for col in normalized_prices.columns:
    axes[0].plot(normalized_prices.index, normalized_prices[col], label=col, alpha=0.7)

axes[0].set_title('Normalized Prices (Base 100)', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Price')
axes[0].legend(loc='upper left', ncol=5, fontsize=8)

# Correlation heatmap
corr = returns.corr()
sns.heatmap(corr, annot=True, fmt='.2f', cmap='RdBu_r', center=0, 
            ax=axes[1], vmin=-1, vmax=1)
axes[1].set_title('Correlation Matrix', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

---

## Part 1: Modern Portfolio Theory and Mean-Variance (20 min)

### Markowitz's theory (1952)

Harry Markowitz formulated portfolio optimization as a trade-off between **expected return** and **risk** (variance).

### Mathematical formulation

**Maximize the Sharpe Ratio:**

$$\max_w \frac{w^T \mu - r_f}{\sqrt{w^T \Sigma w}}$$

**Or minimize the variance for a target return:**

$$\min_w w^T \Sigma w$$

Subject to:
- $w^T \mu = r_{target}$ (target return)
- $\sum w_i = 1$ (weights sum to 1)
- $w_i \geq 0$ (no short selling, optional)

### Notation

| Symbol | Description |
|--------|-------------|
| $w$ | Vector of weights |
| $\mu$ | Vector of expected returns |
| $\Sigma$ | Covariance matrix |
| $r_f$ | Risk-free rate |
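When only the budget constraint $\sum w_i = 1$ is kept (short selling allowed, no target return), the minimum-variance problem above has a well-known closed-form solution, $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^T \Sigma^{-1} \mathbf{1})$. The sketch below illustrates it on a hypothetical 3-asset covariance matrix; the numbers and the helper name `gmv_weights_closed_form` are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np

def gmv_weights_closed_form(cov):
    """Unconstrained global minimum-variance weights: inv(S) 1 / (1' inv(S) 1)."""
    ones = np.ones(cov.shape[0])
    # Solve S x = 1 rather than forming the explicit inverse
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

# Hypothetical annualized volatilities and correlations (illustrative only)
vols_demo = np.array([0.10, 0.20, 0.30])
corr_demo = np.array([[1.0, 0.2, 0.1],
                      [0.2, 1.0, 0.3],
                      [0.1, 0.3, 1.0]])
cov_demo = np.outer(vols_demo, vols_demo) * corr_demo

w_gmv = gmv_weights_closed_form(cov_demo)
w_equal = np.full(3, 1 / 3)
var_gmv = w_gmv @ cov_demo @ w_gmv
var_equal = w_equal @ cov_demo @ w_equal

print("GMV weights:", np.round(w_gmv, 3))
print(f"GMV variance: {var_gmv:.5f} | equal-weight variance: {var_equal:.5f}")
```

The closed form only holds without bounds on the weights; once a no-short constraint is added, a numerical solver such as the SLSQP calls used in this notebook is needed.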

In [None]:
def portfolio_variance(weights, cov_matrix):
    """
    Compute the portfolio variance.

    Parameters:
    -----------
    weights : np.array
        Vector of weights
    cov_matrix : np.array
        Covariance matrix

    Returns:
    --------
    float : Portfolio variance
    """
    return weights.T @ cov_matrix @ weights


def portfolio_return(weights, expected_returns):
    """
    Compute the expected portfolio return.

    Parameters:
    -----------
    weights : np.array
        Vector of weights
    expected_returns : np.array
        Vector of expected returns

    Returns:
    --------
    float : Expected return
    """
    return weights.T @ expected_returns


def portfolio_volatility(weights, cov_matrix):
    """
    Compute the portfolio volatility (standard deviation).
    """
    return np.sqrt(portfolio_variance(weights, cov_matrix))


def portfolio_sharpe(weights, expected_returns, cov_matrix, risk_free_rate=0.02):
    """
    Compute the portfolio Sharpe Ratio.
    Inputs and risk_free_rate must share the same horizon (annualized here).
    """
    ret = portfolio_return(weights, expected_returns)
    vol = portfolio_volatility(weights, cov_matrix)
    return (ret - risk_free_rate) / vol if vol > 0 else 0


# Compute historical returns and covariance
expected_returns = returns.mean().values * 252  # Annualized
cov_matrix = returns.cov().values * 252  # Annualized

print("Annualized expected returns:")
for i, asset in enumerate(returns.columns):
    print(f"  {asset}: {expected_returns[i]:.2%}")

print(f"\nAnnualized volatilities:")
for i, asset in enumerate(returns.columns):
    print(f"  {asset}: {np.sqrt(cov_matrix[i,i]):.2%}")

In [None]:
def minimum_variance_portfolio(cov_matrix, allow_short=False):
    """
    Find the minimum-variance portfolio (MVP).

    Parameters:
    -----------
    cov_matrix : np.array
        Covariance matrix
    allow_short : bool
        Allow short positions

    Returns:
    --------
    np.array : Optimal weights (equal weights if the solver fails)
    """
    n_assets = len(cov_matrix)

    # Objective function
    def objective(w):
        return portfolio_variance(w, cov_matrix)

    # Constraint: weights sum to 1
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]

    # Bounds
    if allow_short:
        bounds = [(-1, 1) for _ in range(n_assets)]
    else:
        bounds = [(0, 1) for _ in range(n_assets)]

    # Initial weights
    w0 = np.ones(n_assets) / n_assets

    # Optimization
    result = minimize(objective, w0, method='SLSQP', bounds=bounds, constraints=constraints)

    return result.x if result.success else w0


def maximum_sharpe_portfolio(expected_returns, cov_matrix, risk_free_rate=0.02, allow_short=False):
    """
    Find the maximum Sharpe Ratio portfolio (tangency portfolio).

    Parameters:
    -----------
    expected_returns : np.array
        Expected returns
    cov_matrix : np.array
        Covariance matrix
    risk_free_rate : float
        Risk-free rate
    allow_short : bool
        Allow short positions

    Returns:
    --------
    np.array : Optimal weights (equal weights if the solver fails)
    """
    n_assets = len(expected_returns)

    # Objective function (minimize the negative Sharpe Ratio)
    def neg_sharpe(w):
        return -portfolio_sharpe(w, expected_returns, cov_matrix, risk_free_rate)

    # Constraint: weights sum to 1
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]

    # Bounds
    if allow_short:
        bounds = [(-1, 1) for _ in range(n_assets)]
    else:
        bounds = [(0, 1) for _ in range(n_assets)]

    # Initial weights
    w0 = np.ones(n_assets) / n_assets

    # Optimization
    result = minimize(neg_sharpe, w0, method='SLSQP', bounds=bounds, constraints=constraints)

    return result.x if result.success else w0


# Compute the optimal portfolios
mvp_weights = minimum_variance_portfolio(cov_matrix)
msr_weights = maximum_sharpe_portfolio(expected_returns, cov_matrix)

print("Minimum Variance Portfolio (MVP):")
for i, asset in enumerate(returns.columns):
    if mvp_weights[i] > 0.01:
        print(f"  {asset}: {mvp_weights[i]:.1%}")

print(f"\n  Return: {portfolio_return(mvp_weights, expected_returns):.2%}")
print(f"  Volatility: {portfolio_volatility(mvp_weights, cov_matrix):.2%}")
print(f"  Sharpe: {portfolio_sharpe(mvp_weights, expected_returns, cov_matrix):.2f}")

print("\nMaximum Sharpe Portfolio (MSR):")
for i, asset in enumerate(returns.columns):
    if msr_weights[i] > 0.01:
        print(f"  {asset}: {msr_weights[i]:.1%}")

print(f"\n  Return: {portfolio_return(msr_weights, expected_returns):.2%}")
print(f"  Volatility: {portfolio_volatility(msr_weights, cov_matrix):.2%}")
print(f"  Sharpe: {portfolio_sharpe(msr_weights, expected_returns, cov_matrix):.2f}")

In [None]:
def efficient_frontier(expected_returns, cov_matrix, num_portfolios=50, allow_short=False):
    """
    Compute the mean-variance frontier.

    Parameters:
    -----------
    expected_returns : np.array
        Expected returns
    cov_matrix : np.array
        Covariance matrix
    num_portfolios : int
        Number of portfolios on the frontier
    allow_short : bool
        Allow short positions

    Returns:
    --------
    pd.DataFrame : Portfolios on the frontier
    """
    n_assets = len(expected_returns)
    results = []

    # Range of target returns.
    # Note: targets below the minimum-variance portfolio's return lie on the
    # lower (inefficient) branch; only the upper branch is efficient.
    min_ret = expected_returns.min()
    max_ret = expected_returns.max()
    target_returns = np.linspace(min_ret, max_ret, num_portfolios)

    for target_return in target_returns:
        # Objective function: minimize variance
        def objective(w):
            return portfolio_variance(w, cov_matrix)

        # Constraints: fully invested, and hit the target return
        constraints = [
            {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
            {'type': 'eq', 'fun': lambda w, tr=target_return: portfolio_return(w, expected_returns) - tr}
        ]

        # Bounds
        if allow_short:
            bounds = [(-1, 1) for _ in range(n_assets)]
        else:
            bounds = [(0, 1) for _ in range(n_assets)]

        # Initial weights
        w0 = np.ones(n_assets) / n_assets

        # Optimization
        result = minimize(objective, w0, method='SLSQP', bounds=bounds, constraints=constraints)

        # Keep only the targets for which the solver converged
        if result.success:
            weights = result.x
            ret = portfolio_return(weights, expected_returns)
            vol = portfolio_volatility(weights, cov_matrix)
            sharpe = portfolio_sharpe(weights, expected_returns, cov_matrix)

            results.append({
                'return': ret,
                'volatility': vol,
                'sharpe': sharpe,
                'weights': weights
            })

    return pd.DataFrame(results)

# Compute the efficient frontier
frontier = efficient_frontier(expected_returns, cov_matrix, num_portfolios=50)

print(f"Efficient frontier computed: {len(frontier)} portfolios")

In [None]:
# Efficient frontier visualization
fig, ax = plt.subplots(figsize=(12, 8))

# Individual assets
for i, asset in enumerate(returns.columns):
    ax.scatter(np.sqrt(cov_matrix[i, i]) * 100, expected_returns[i] * 100, 
               s=100, marker='o', label=asset, alpha=0.7)

# Efficient frontier
ax.plot(frontier['volatility'] * 100, frontier['return'] * 100, 
        'b-', linewidth=3, label='Efficient Frontier')

# Equal-weight portfolio
eq_weights = np.ones(len(expected_returns)) / len(expected_returns)
eq_ret = portfolio_return(eq_weights, expected_returns) * 100
eq_vol = portfolio_volatility(eq_weights, cov_matrix) * 100
ax.scatter(eq_vol, eq_ret, s=200, marker='s', color='gray', 
           label=f'Equal Weight (Sharpe: {portfolio_sharpe(eq_weights, expected_returns, cov_matrix):.2f})', zorder=5)

# MVP
mvp_ret = portfolio_return(mvp_weights, expected_returns) * 100
mvp_vol = portfolio_volatility(mvp_weights, cov_matrix) * 100
ax.scatter(mvp_vol, mvp_ret, s=200, marker='*', color='green', 
           label=f'Min Variance (Sharpe: {portfolio_sharpe(mvp_weights, expected_returns, cov_matrix):.2f})', zorder=5)

# MSR
msr_ret = portfolio_return(msr_weights, expected_returns) * 100
msr_vol = portfolio_volatility(msr_weights, cov_matrix) * 100
ax.scatter(msr_vol, msr_ret, s=200, marker='*', color='red', 
           label=f'Max Sharpe (Sharpe: {portfolio_sharpe(msr_weights, expected_returns, cov_matrix):.2f})', zorder=5)

ax.set_xlabel('Volatility (%)', fontsize=12)
ax.set_ylabel('Expected Return (%)', fontsize=12)
ax.set_title('Markowitz Efficient Frontier', fontsize=14, fontweight='bold')
ax.legend(loc='upper left', fontsize=9)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Limitations of Mean-Variance

| Problem | Description | Impact |
|---------|-------------|--------|
| **Estimation error** | Expected returns are very noisy | Concentrated, unstable portfolios |
| **Ill-conditioned matrix** | With little data, the covariance is poorly estimated | Extreme weights |
| **Sensitivity to inputs** | Small input changes -> large weight changes | High turnover |
| **In-sample overfitting** | Optimized on the past, not the future | Out-of-sample underperformance |
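The input sensitivity listed in the table can be made concrete with a small experiment. The sketch below uses the unconstrained tangency solution, $w \propto \Sigma^{-1}(\mu - r_f)$, on hypothetical inputs (two nearly identical assets plus a diversifier) and raises one expected return by half a percentage point. All numbers and the helper name `tangency_weights` are illustrative assumptions.

```python
import numpy as np

def tangency_weights(mu, cov, rf=0.02):
    """Unconstrained max-Sharpe weights: proportional to inv(S) (mu - rf), scaled to sum to 1."""
    raw = np.linalg.solve(cov, mu - rf)
    return raw / raw.sum()

# Hypothetical inputs: assets 1 and 2 are 95% correlated, asset 3 is a diversifier
vols_sens = np.array([0.20, 0.20, 0.15])
corr_sens = np.array([[1.00, 0.95, 0.20],
                      [0.95, 1.00, 0.20],
                      [0.20, 0.20, 1.00]])
cov_sens = np.outer(vols_sens, vols_sens) * corr_sens

mu_base = np.array([0.080, 0.080, 0.060])
mu_bumped = mu_base + np.array([0.005, 0.0, 0.0])  # +0.5 point on asset 1 only

w_base = tangency_weights(mu_base, cov_sens)
w_bumped = tangency_weights(mu_bumped, cov_sens)

print("Base weights:  ", np.round(w_base, 3))
print("Bumped weights:", np.round(w_bumped, 3))
print("Largest weight change:", round(float(np.abs(w_bumped - w_base).max()), 3))
```

In this setup the small change in one estimate moves a large share of the portfolio between the two correlated assets, which is the turnover problem the table refers to.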

---

## Part 2: Covariance Estimation and Shrinkage (10 min)

### The estimation problem

The **sample** covariance matrix is an unbiased but **high-variance** estimator. With $N$ assets and $T$ observations:

- If $T < N$: the matrix is **singular** (not invertible)
- If $T \approx N$: the matrix is **ill-conditioned**
- Even if $T \gg N$: estimation error remains, and the optimizer tends to amplify it

### Solution: Shrinkage (Ledoit-Wolf)

The idea is to "shrink" the sample matrix toward a more stable target matrix:

$$\Sigma_{shrunk} = (1-\alpha) \times \Sigma_{sample} + \alpha \times \Sigma_{target}$$

The target is often a **scaled identity matrix** (uncorrelated assets with a common variance, which is the target used by sklearn's `LedoitWolf`) or a **constant-correlation matrix**.
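To make the shrinkage formula tangible, here is a minimal sketch that applies it by hand with a scaled-identity target and a fixed, hand-picked $\alpha$ (Ledoit-Wolf instead estimates $\alpha$ from the data). The simulated returns and the helper name `shrink_to_scaled_identity` are assumptions made for illustration.

```python
import numpy as np

def shrink_to_scaled_identity(cov_sample, alpha):
    """Convex combination of the sample covariance and a scaled-identity target."""
    n = cov_sample.shape[0]
    # Target keeps the same average variance as the sample matrix
    target = np.eye(n) * np.trace(cov_sample) / n
    return (1 - alpha) * cov_sample + alpha * target

# Hypothetical setting: 8 assets but only 12 observations (T close to N)
rng = np.random.default_rng(0)
sample_returns = rng.normal(0.0, 0.01, size=(12, 8))
cov_raw = np.cov(sample_returns, rowvar=False)

for alpha in (0.0, 0.2, 0.5):
    cov_alpha = shrink_to_scaled_identity(cov_raw, alpha)
    print(f"alpha={alpha:.1f}  condition number={np.linalg.cond(cov_alpha):.1f}")
```

Because the target adds the same constant to every eigenvalue (after scaling), the ratio between the largest and smallest eigenvalue shrinks as $\alpha$ grows, which is why the condition number improves.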

In [None]:
def compare_covariance_estimators(returns_df):
    """
    Compare the sample covariance with the Ledoit-Wolf estimate.

    Parameters:
    -----------
    returns_df : pd.DataFrame
        DataFrame of returns

    Returns:
    --------
    tuple : (cov_sample, cov_shrunk, shrinkage_coef)
    """
    # Sample covariance
    cov_sample = returns_df.cov().values * 252

    # Ledoit-Wolf covariance
    lw = LedoitWolf()
    lw.fit(returns_df.values)
    cov_shrunk = lw.covariance_ * 252
    shrinkage_coef = lw.shrinkage_

    return cov_sample, cov_shrunk, shrinkage_coef

# Compare the estimators
cov_sample, cov_shrunk, shrinkage = compare_covariance_estimators(returns)

print(f"Shrinkage coefficient: {shrinkage:.3f}")
print(f"  (0 = pure sample covariance)")
print(f"  (1 = pure target matrix)")

# Compare the condition numbers
cond_sample = np.linalg.cond(cov_sample)
cond_shrunk = np.linalg.cond(cov_shrunk)

print(f"\nCondition Number (stability measure, lower is better):")
print(f"  Sample: {cond_sample:.1f}")
print(f"  Shrunk: {cond_shrunk:.1f}")
print(f"  Reduction: {(1 - cond_shrunk/cond_sample)*100:.1f}%")

In [None]:
# Visualization of the shrinkage effect
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Sample covariance
im1 = axes[0].imshow(cov_sample, cmap='RdBu_r', aspect='auto')
axes[0].set_title('Sample Covariance', fontsize=12, fontweight='bold')
plt.colorbar(im1, ax=axes[0])

# Shrunk covariance
im2 = axes[1].imshow(cov_shrunk, cmap='RdBu_r', aspect='auto')
axes[1].set_title(f'Shrunk Covariance (alpha={shrinkage:.2f})', fontsize=12, fontweight='bold')
plt.colorbar(im2, ax=axes[1])

# Difference
diff = cov_sample - cov_shrunk
im3 = axes[2].imshow(diff, cmap='RdBu_r', aspect='auto')
axes[2].set_title('Difference (Sample - Shrunk)', fontsize=12, fontweight='bold')
plt.colorbar(im3, ax=axes[2])

plt.tight_layout()
plt.show()

# Compare the optimal portfolios
print("\nComparison of the Max Sharpe portfolios:")

msr_sample = maximum_sharpe_portfolio(expected_returns, cov_sample)
msr_shrunk = maximum_sharpe_portfolio(expected_returns, cov_shrunk)

print(f"\n{'Asset':<12} {'Sample':>10} {'Shrunk':>10} {'Diff':>10}")
print("-" * 45)
for i, asset in enumerate(returns.columns):
    print(f"{asset:<12} {msr_sample[i]:>9.1%} {msr_shrunk[i]:>9.1%} {msr_shrunk[i]-msr_sample[i]:>9.1%}")

---

## Part 3: ML for Expected Returns (15 min)

### Why use ML?

The naive estimate of expected returns (the historical mean) is very noisy. ML can help to:

1. **Incorporate predictive features** (momentum, valuation, sentiment)
2. **Capture non-linear relationships**
3. **Adapt to regime changes**

### Approach

```
Features (X)          Model              Expected Returns (y)
- Momentum        ->  Random Forest  ->  Predicted return
- Volatility          XGBoost            over the next month
- RSI                 Neural Net
- Valuation
```
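Because the target in this approach is a forward-looking return, evaluation should split the data by date rather than by row position, ideally with a gap so that training targets do not reach into the test period. The sketch below shows one way to build such masks for a stacked (asset by date) panel; the helper `chronological_split`, its `embargo` parameter and the demo index are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

def chronological_split(index, train_frac=0.8, embargo=20):
    """Return boolean (train, test) masks split by date, with an embargo gap in between."""
    idx = pd.DatetimeIndex(index)
    unique_dates = idx.unique().sort_values()
    cut = int(len(unique_dates) * train_frac)
    train_end = unique_dates[cut - 1]  # last training date
    # Skip `embargo` dates after the cut (clamped to the last available date)
    test_start = unique_dates[min(cut + embargo, len(unique_dates) - 1)]
    return np.asarray(idx <= train_end), np.asarray(idx >= test_start)

# Hypothetical panel: 3 assets stacked one after another over the same 200 business days
dates_demo = pd.date_range("2022-01-03", periods=200, freq="B")
panel_index = dates_demo.append(dates_demo).append(dates_demo)

train_mask, test_mask = chronological_split(panel_index, train_frac=0.8, embargo=20)
print("train rows:", int(train_mask.sum()), "| test rows:", int(test_mask.sum()))
print("last train date:", panel_index[train_mask].max().date())
print("first test date:", panel_index[test_mask].min().date())
```

The embargo length is chosen here to match a 20-day forward target; rows that fall inside the gap are simply left out of both sets.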

In [None]:
def create_features_for_returns(prices_df, lookback=20):
    """
    Create features to predict future returns.

    Parameters:
    -----------
    prices_df : pd.DataFrame
        DataFrame of prices
    lookback : int
        Lookback period for the features
        (currently unused: the windows below are fixed at 5 and 20 days)

    Returns:
    --------
    pd.DataFrame : Features per asset and per date
    """
    features_list = []

    for asset in prices_df.columns:
        # Local names chosen to avoid shadowing the notebook-level prices/returns
        px = prices_df[asset]
        rets = px.pct_change()

        # Features
        df_feat = pd.DataFrame(index=prices_df.index)
        df_feat['asset'] = asset

        # Momentum (past returns)
        df_feat['return_1d'] = rets
        df_feat['return_5d'] = rets.rolling(5).sum()
        df_feat['return_20d'] = rets.rolling(20).sum()

        # Volatility
        df_feat['volatility_20d'] = rets.rolling(20).std()

        # Mean reversion (distance to the moving average)
        sma_20 = px.rolling(20).mean()
        df_feat['price_to_sma'] = px / sma_20 - 1

        # Skewness and kurtosis
        df_feat['skewness'] = rets.rolling(20).skew()
        df_feat['kurtosis'] = rets.rolling(20).kurt()

        # Target: future 20-day return (sum of returns from t+1 to t+20)
        df_feat['future_return_20d'] = rets.shift(-20).rolling(20).sum()

        features_list.append(df_feat)

    # Assets are stacked one after another; the last 20 dates of each asset
    # are dropped because their target is not yet known
    features_df = pd.concat(features_list)
    features_df = features_df.dropna()

    return features_df

# Create the features
features_df = create_features_for_returns(prices)

print(f"Features created: {len(features_df)} observations")
print(f"\nColumns:")
print(features_df.columns.tolist())
print(f"\nPreview:")
print(features_df.head())

In [None]:
def predict_returns_ml(features_df, model_type='random_forest'):
    """
    Train an ML model to predict returns.

    Parameters:
    -----------
    features_df : pd.DataFrame
        DataFrame with features and target
    model_type : str
        Model type ('random_forest')

    Returns:
    --------
    tuple : (dict of predicted returns per asset,
             dict of train/test R2 scores,
             DataFrame of feature importances)
    """
    feature_cols = ['return_1d', 'return_5d', 'return_20d', 'volatility_20d', 
                    'price_to_sma', 'skewness', 'kurtosis']

    # features_df stacks the assets one after another, so sort by date first;
    # otherwise the 80/20 cut below would split by asset instead of by time
    data = features_df.sort_index(kind='mergesort')

    # Separate features and target
    X = data[feature_cols]
    y = data['future_return_20d']

    # Chronological train/test split (80/20)
    # Note: 20-day targets just before the cut still overlap the test period;
    # a gap between the two sets is needed for a strictly leak-free evaluation
    split_idx = int(len(X) * 0.8)
    X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
    y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]

    # Normalization
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Model
    if model_type == 'random_forest':
        model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
    else:
        raise ValueError(f"Unsupported model_type: {model_type!r}")

    model.fit(X_train_scaled, y_train)

    # Predictions from the latest available observation of each asset,
    # keeping the assets in their original column order
    predictions = {}
    asset_order = features_df['asset'].unique()
    last_features = features_df.groupby('asset').last().loc[asset_order, feature_cols]
    last_scaled = scaler.transform(last_features)

    for i, asset in enumerate(last_features.index):
        pred = model.predict(last_scaled[i:i+1])[0]
        predictions[asset] = pred * 252 / 20  # Annualize (20 trading days -> 1 year)

    # Evaluation
    train_score = model.score(X_train_scaled, y_train)
    test_score = model.score(X_test_scaled, y_test)

    # Feature importance
    importance = pd.DataFrame({
        'feature': feature_cols,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)

    return predictions, {'train_r2': train_score, 'test_r2': test_score}, importance

# Predict the returns
ml_returns, scores, importance = predict_returns_ml(features_df)

print("ML-predicted returns (annualized):")
for asset, ret in sorted(ml_returns.items(), key=lambda x: x[1], reverse=True):
    print(f"  {asset}: {ret:.2%}")

print(f"\nModel scores:")
print(f"  R2 Train: {scores['train_r2']:.3f}")
print(f"  R2 Test: {scores['test_r2']:.3f}")

print(f"\nFeature Importance:")
print(importance.to_string(index=False))

In [None]:
# Compare historical vs ML expected returns
fig, ax = plt.subplots(figsize=(12, 6))

assets = list(returns.columns)
hist_returns = [expected_returns[i] for i in range(len(assets))]
ml_returns_list = [ml_returns[a] for a in assets]

x = np.arange(len(assets))
width = 0.35

bars1 = ax.bar(x - width/2, [r*100 for r in hist_returns], width, label='Historical', color='steelblue')
bars2 = ax.bar(x + width/2, [r*100 for r in ml_returns_list], width, label='ML Prediction', color='coral')

ax.set_xlabel('Asset', fontsize=12)
ax.set_ylabel('Expected Return (%)', fontsize=12)
ax.set_title('Expected Returns: Historical vs ML', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(assets, rotation=45)
ax.legend()
ax.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

---

## Part 4: Black-Litterman with ML views (20 min)

### The Black-Litterman model

The **Black-Litterman** model (1992) combines:

1. **Prior**: market equilibrium returns (derived from market weights)
2. **Views**: the investor's views on certain assets

This yields **posterior returns** that are more stable than the inputs of a classical optimization.

### Formulation

**Prior (equilibrium):**
$$\pi = \delta \Sigma w_{mkt}$$

**Posterior:**
$$E[R] = [(\tau\Sigma)^{-1} + P^T\Omega^{-1}P]^{-1} [(\tau\Sigma)^{-1}\pi + P^T\Omega^{-1}Q]$$

Where:
- $\pi$: equilibrium returns
- $\delta$: risk-aversion coefficient
- $w_{mkt}$: market-capitalization weights
- $P$: view matrix (which assets each view refers to)
- $Q$: view vector (expected returns of the views)
- $\Omega$: uncertainty of the views
- $\tau$: scaling factor (~0.05)
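A small numerical check helps build intuition for the posterior formula: a very confident view (tiny $\Omega$) should pull the posterior onto the view, while a very vague view (huge $\Omega$) should leave the equilibrium prior almost untouched. The sketch below uses the algebraically equivalent form $\pi + \tau\Sigma P^T (P\tau\Sigma P^T + \Omega)^{-1}(Q - P\pi)$ on a hypothetical 2-asset market with one relative view; all inputs and the helper name `bl_posterior` are illustrative assumptions.

```python
import numpy as np

def bl_posterior(pi, sigma, P, Q, omega, tau=0.05):
    """Black-Litterman posterior mean, written as prior plus a view-driven correction."""
    ts = tau * sigma
    middle = P @ ts @ P.T + omega
    return pi + ts @ P.T @ np.linalg.solve(middle, Q - P @ pi)

# Hypothetical 2-asset market (illustrative numbers)
sigma_demo = np.array([[0.04, 0.01],
                       [0.01, 0.09]])
w_mkt_demo = np.array([0.6, 0.4])
delta_demo = 2.5
pi_demo = delta_demo * sigma_demo @ w_mkt_demo  # equilibrium returns

# One relative view: asset 1 outperforms asset 2 by 2%
P_demo = np.array([[1.0, -1.0]])
Q_demo = np.array([0.02])

print("Prior:", np.round(pi_demo, 4))
for label, view_var in [("confident", 1e-8), ("moderate", 1e-3), ("vague", 1e4)]:
    post = bl_posterior(pi_demo, sigma_demo, P_demo, Q_demo, np.array([[view_var]]))
    spread = (P_demo @ post)[0]
    print(f"{label:>9} view -> posterior {np.round(post, 4)}, implied spread {spread:+.4f}")
```

The implied spread moves from the prior's value toward the 2% view as the view variance decreases, which is the blending behavior the formula encodes.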

In [None]:
def black_litterman(
    market_cap_weights,
    sigma,
    P,
    Q,
    omega,
    tau=0.05,
    risk_aversion=2.5
):
    """
    Implement the Black-Litterman model.

    Parameters:
    -----------
    market_cap_weights : np.array
        Market-capitalization weights
    sigma : np.array
        Covariance matrix
    P : np.array
        View matrix (K x N)
    Q : np.array
        View vector (K x 1)
    omega : np.array
        View uncertainty matrix (K x K)
    tau : float
        Scaling factor for the covariance
    risk_aversion : float
        Risk-aversion coefficient

    Returns:
    --------
    tuple : (posterior returns, equilibrium returns pi)
    """
    # Prior: equilibrium returns
    pi = risk_aversion * sigma @ market_cap_weights

    # Scaled covariance matrix
    tau_sigma = tau * sigma
    tau_sigma_inv = np.linalg.inv(tau_sigma)

    # Inverse of omega
    omega_inv = np.linalg.inv(omega)

    # Posterior covariance of the mean estimate
    M = np.linalg.inv(tau_sigma_inv + P.T @ omega_inv @ P)

    # Posterior returns
    posterior_returns = M @ (tau_sigma_inv @ pi + P.T @ omega_inv @ Q)

    return posterior_returns, pi


def create_ml_views(ml_predictions, confidence_scaling=0.5):
    """
    Create Black-Litterman views from the ML predictions.

    Parameters:
    -----------
    ml_predictions : dict
        ML predictions per asset; the key order must match the asset
        order of the covariance matrix used in black_litterman
    confidence_scaling : float
        Uncertainty factor (higher = less confident)

    Returns:
    --------
    tuple : (P, Q, omega, assets)
    """
    n_assets = len(ml_predictions)
    assets = list(ml_predictions.keys())

    # View matrix (identity = one absolute view per asset)
    P = np.eye(n_assets)

    # View vector
    Q = np.array([ml_predictions[a] for a in assets])

    # Uncertainty: diagonal omega whose standard deviation is
    # proportional to |Q|, with a 1% floor
    uncertainties = confidence_scaling * np.abs(Q) + 0.01  # Minimum 1%
    omega = np.diag(uncertainties ** 2)

    return P, Q, omega, assets

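# Simulated market weights (equal weight as a proxy)
market_weights = np.ones(len(returns.columns)) / len(returns.columns)

# Create the ML views, with the predictions re-keyed in the column order of
# `returns` so that Q lines up with market_weights and the covariance matrix
ml_returns_ordered = {a: ml_returns[a] for a in returns.columns}
P, Q, omega, assets = create_ml_views(ml_returns_ordered, confidence_scaling=0.3)

print("ML views (Q):")
for i, asset in enumerate(assets):
    print(f"  {asset}: {Q[i]:.2%} (uncertainty: {np.sqrt(omega[i,i]):.2%})")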

In [None]:
# Apply Black-Litterman
posterior_returns, equilibrium_returns = black_litterman(
    market_weights,
    cov_shrunk,
    P,
    Q,
    omega,
    tau=0.05
)

print("Comparison of returns:")
print(f"\n{'Asset':<12} {'Equilibrium':>12} {'ML View':>12} {'Posterior':>12}")
print("-" * 50)
for i, asset in enumerate(assets):
    print(f"{asset:<12} {equilibrium_returns[i]:>11.2%} {Q[i]:>11.2%} {posterior_returns[i]:>11.2%}")

# Optimize with the posterior returns
bl_weights = maximum_sharpe_portfolio(posterior_returns, cov_shrunk)

print("\nBlack-Litterman portfolio:")
for i, asset in enumerate(assets):
    if bl_weights[i] > 0.01:
        print(f"  {asset}: {bl_weights[i]:.1%}")

print(f"\n  Expected return: {portfolio_return(bl_weights, posterior_returns):.2%}")
print(f"  Volatility: {portfolio_volatility(bl_weights, cov_shrunk):.2%}")
print(f"  Sharpe: {portfolio_sharpe(bl_weights, posterior_returns, cov_shrunk):.2f}")

In [None]:
# Black-Litterman visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Returns
ax1 = axes[0]
x = np.arange(len(assets))
width = 0.25

ax1.bar(x - width, equilibrium_returns * 100, width, label='Equilibrium', color='steelblue')
ax1.bar(x, Q * 100, width, label='ML Views', color='coral')
ax1.bar(x + width, posterior_returns * 100, width, label='Posterior BL', color='green')

ax1.set_xlabel('Asset')
ax1.set_ylabel('Return (%)')
ax1.set_title('Black-Litterman: Returns', fontsize=12, fontweight='bold')
ax1.set_xticks(x)
ax1.set_xticklabels(assets, rotation=45)
ax1.legend()
ax1.grid(True, alpha=0.3, axis='y')

# Weights
ax2 = axes[1]
eq_weights = np.ones(len(assets)) / len(assets)
mv_weights = maximum_sharpe_portfolio(expected_returns, cov_shrunk)

ax2.bar(x - width, eq_weights * 100, width, label='Equal Weight', color='gray')
ax2.bar(x, mv_weights * 100, width, label='Mean-Variance', color='orange')
ax2.bar(x + width, bl_weights * 100, width, label='Black-Litterman', color='green')

ax2.set_xlabel('Asset')
ax2.set_ylabel('Weight (%)')
ax2.set_title('Allocation Comparison', fontsize=12, fontweight='bold')
ax2.set_xticks(x)
ax2.set_xticklabels(assets, rotation=45)
ax2.legend()
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

---

## Part 5: Hierarchical Risk Parity (HRP) (15 min)

### The problem with Mean-Variance

The closed-form Mean-Variance solution relies on **inverting the covariance matrix**, which is unstable with:
- Little data
- Many assets
- Highly correlated assets

### Solution: HRP (Lopez de Prado, 2016)

HRP uses **hierarchical clustering** to allocate risk without inverting the matrix:

```
1. Clustering           2. Quasi-Diagonalization     3. Recursive Bisection
   |                       |                             |
   Correlation  -->        Reorder assets     -->        Allocate inversely
   distance                by cluster                    to variance
```

### Advantages of HRP

| Aspect | Mean-Variance | HRP |
|--------|---------------|-----|
| Matrix inversion | Required | Not required |
| Stability | Unstable | Stable |
| Concentration | High | Diversified |
| Expected returns | Required | Not used |
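Two small formulas carry most of HRP: the mapping from correlation to distance used for clustering, $d = \sqrt{(1-\rho)/2}$, and the inverse-variance split applied at each bisection step. The sketch below evaluates both on hypothetical numbers; the helper names `correlation_distance` and `bisection_split` are introduced here for illustration only.

```python
import numpy as np

def correlation_distance(rho):
    """Map a correlation in [-1, 1] to a distance in [0, 1]."""
    return np.sqrt((1.0 - rho) / 2.0)

def bisection_split(var_left, var_right):
    """Share of the weight given to the left cluster: inversely related to its variance."""
    return 1.0 - var_left / (var_left + var_right)

# Perfectly correlated assets are at distance 0, perfectly anti-correlated at distance 1
for rho in (1.0, 0.5, 0.0, -1.0):
    print(f"rho={rho:+.1f} -> distance={correlation_distance(rho):.3f}")

# Hypothetical cluster variances: the left cluster is four times as risky as the right one
alpha_left = bisection_split(var_left=0.04, var_right=0.01)
print(f"left share: {alpha_left:.2f}, right share: {1 - alpha_left:.2f}")
```

The riskier cluster receives the smaller share, and applying this split recursively down the ordered asset list produces the final HRP weights.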

In [None]:
def hierarchical_risk_parity(returns_df):
    """
    Hierarchical Risk Parity (HRP) allocation.
    
    Parameters
    ----------
    returns_df : pd.DataFrame
        Asset returns, one column per asset.
    
    Returns
    -------
    final_weights : np.ndarray
        HRP weights, in the column order of returns_df.
    link : np.ndarray
        Linkage matrix, for plotting the dendrogram.
    sorted_assets : list
        Asset names in quasi-diagonal (dendrogram leaf) order.
    """
    # 1. Correlation matrix
    corr = returns_df.corr()
    
    # 2. Convert correlation to distance: d = sqrt((1 - rho) / 2)
    # Clipping guards against tiny negative values caused by rounding.
    dist = np.sqrt(np.clip((1 - corr.values) / 2, 0.0, None))
    np.fill_diagonal(dist, 0.0)  # squareform expects an exactly zero diagonal
    
    # 3. Hierarchical clustering
    # Ward linkage is used here; Lopez de Prado's original uses 'single'.
    dist_condensed = squareform(dist, checks=False)
    link = linkage(dist_condensed, method='ward')
    
    # 4. Quasi-diagonalization: reorder assets by dendrogram leaf order
    from scipy.cluster.hierarchy import leaves_list
    sorted_idx = leaves_list(link)
    sorted_assets = [returns_df.columns[i] for i in sorted_idx]
    
    # 5. Recursive bisection (weights come back in sorted order)
    cov = returns_df.cov()
    weights = _recursive_bisection(cov, sorted_idx)
    
    # Map the weights back to the original column order
    final_weights = np.zeros(len(returns_df.columns))
    final_weights[sorted_idx] = weights
    
    return final_weights, link, sorted_assets


def _recursive_bisection(cov, sorted_idx):
    """
    Recursive bisection step of HRP.
    Splits the ordered assets in half repeatedly and allocates to each half
    in inverse proportion to its variance.
    """
    n = len(sorted_idx)
    weights = np.ones(n)
    
    # Queue of clusters (positions in the sorted order) still to split
    clusters = [list(range(n))]
    
    while len(clusters) > 0:
        cluster = clusters.pop(0)
        
        if len(cluster) == 1:
            continue
        
        # Split into two halves
        mid = len(cluster) // 2
        left_cluster = cluster[:mid]
        right_cluster = cluster[mid:]
        
        # Variance of each sub-cluster
        left_idx = [sorted_idx[i] for i in left_cluster]
        right_idx = [sorted_idx[i] for i in right_cluster]
        
        left_var = _cluster_variance(cov.values, left_idx)
        right_var = _cluster_variance(cov.values, right_idx)
        
        # Allocate in inverse proportion to variance
        total_var = left_var + right_var
        alpha = 1 - left_var / total_var if total_var > 0 else 0.5
        
        # Scale the weights of each half
        for i in left_cluster:
            weights[i] *= alpha
        for i in right_cluster:
            weights[i] *= (1 - alpha)
        
        # Queue the sub-clusters that can still be split
        if len(left_cluster) > 1:
            clusters.append(left_cluster)
        if len(right_cluster) > 1:
            clusters.append(right_cluster)
    
    return weights


def _cluster_variance(cov, indices):
    """
    Variance of a cluster, using inverse-variance weights within the cluster.
    """
    cov_sub = cov[np.ix_(indices, indices)]
    inv_var = 1 / np.diag(cov_sub)
    weights = inv_var / inv_var.sum()
    return weights @ cov_sub @ weights


# Apply HRP
hrp_weights, linkage_matrix, sorted_assets = hierarchical_risk_parity(returns)

print("HRP portfolio:")
for i, asset in enumerate(returns.columns):
    print(f"  {asset}: {hrp_weights[i]:.1%}")

print(f"\n  Expected return: {portfolio_return(hrp_weights, expected_returns):.2%}")
print(f"  Volatility: {portfolio_volatility(hrp_weights, cov_shrunk):.2%}")
print(f"  Sharpe: {portfolio_sharpe(hrp_weights, expected_returns, cov_shrunk):.2f}")

In [None]:
# HRP visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Dendrogram
ax1 = axes[0]
dendrogram(linkage_matrix, labels=list(returns.columns), ax=ax1, leaf_rotation=45)
ax1.set_title('Dendrogram - Hierarchical Clustering', fontsize=12, fontweight='bold')
ax1.set_xlabel('Asset')
ax1.set_ylabel('Distance')

# Weight comparison
ax2 = axes[1]
x = np.arange(len(returns.columns))
width = 0.2

eq_w = np.ones(len(returns.columns)) / len(returns.columns)
mvp_w = minimum_variance_portfolio(cov_shrunk)

ax2.bar(x - width*1.5, eq_w * 100, width, label='Equal Weight', color='gray')
ax2.bar(x - width/2, mvp_w * 100, width, label='Min Variance', color='steelblue')
ax2.bar(x + width/2, bl_weights * 100, width, label='Black-Litterman', color='coral')
ax2.bar(x + width*1.5, hrp_weights * 100, width, label='HRP', color='green')

ax2.set_xlabel('Asset')
ax2.set_ylabel('Weight (%)')
ax2.set_title('Comparison of Allocation Methods', fontsize=12, fontweight='bold')
ax2.set_xticks(x)
ax2.set_xticklabels(returns.columns, rotation=45)
ax2.legend(loc='upper right')
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

---

## Part 6: QuantConnect Integration (10 min)

### ML-Enhanced Portfolio Construction Model

Let's integrate the techniques covered so far into a QuantConnect **Portfolio Construction Model**.

In [None]:
# QuantConnect code for ML portfolio construction
# Copy into the QuantConnect IDE

qc_ml_portfolio_code = '''
from AlgorithmImports import *
from sklearn.covariance import LedoitWolf
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from scipy.optimize import minimize
import numpy as np
import pandas as pd

class MLPortfolioConstructionModel(PortfolioConstructionModel):
    """
    Portfolio Construction Model with ML-based optimization.
    
    Combines:
    - Ledoit-Wolf for the covariance matrix
    - Random Forest for expected returns
    - Mean-Variance or inverse-variance ('hrp') weights for the allocation
    """
    
    def __init__(self, 
                 rebalance_days=30,
                 lookback_days=252,
                 method='mean_variance',
                 use_ml_returns=True):
        """
        Parameters
        ----------
        rebalance_days : int
            Days between rebalances
        lookback_days : int
            Days of history used in the calculations
        method : str
            'mean_variance' or 'hrp'; any other value falls back to equal weights
        use_ml_returns : bool
            Use ML to predict expected returns
        """
        self.rebalance_days = rebalance_days
        self.lookback_days = lookback_days
        self.method = method
        self.use_ml_returns = use_ml_returns
        self.last_rebalance = datetime.min
        self.scaler = StandardScaler()
        self.model = None
    
    def CreateTargets(self, algorithm, insights):
        """
        Build PortfolioTargets from ML-optimized weights.
        """
        targets = []
        
        # Skip until the next rebalance date
        if (algorithm.Time - self.last_rebalance).days < self.rebalance_days:
            return targets
        
        self.last_rebalance = algorithm.Time
        
        # Keep bullish insights only: the optimizers below are long-only,
        # so a Down insight would otherwise receive a long weight.
        active_insights = [i for i in insights 
                          if i.Direction == InsightDirection.Up]
        
        if len(active_insights) < 2:
            return targets
        
        symbols = [i.Symbol for i in active_insights]
        
        # Fetch price history
        history = algorithm.History(symbols, self.lookback_days, Resolution.Daily)
        
        if history.empty:
            return targets
        
        try:
            # Daily returns, one column per symbol
            returns = history['close'].unstack(level=0).pct_change().dropna()
            
            if len(returns) < 60:
                return targets
            
            # Weights are mapped back to `symbols` by position. This assumes
            # the unstacked columns follow the order of `symbols`; at minimum,
            # skip the rebalance when a symbol is missing from the history.
            if returns.shape[1] != len(symbols):
                return targets
            
            # Ledoit-Wolf covariance, annualized
            lw = LedoitWolf()
            cov_matrix = lw.fit(returns).covariance_ * 252
            
            # Expected returns, annualized
            if self.use_ml_returns:
                expected_returns = self._predict_returns_ml(returns, algorithm)
            else:
                expected_returns = returns.mean().values * 252
            
            # Optimize
            if self.method == 'mean_variance':
                weights = self._mean_variance_optimize(expected_returns, cov_matrix)
            elif self.method == 'hrp':
                weights = self._hrp_optimize(returns)
            else:
                weights = np.ones(len(symbols)) / len(symbols)
            
            # Create targets
            for i, symbol in enumerate(symbols):
                if weights[i] > 0.01:  # Minimum weight of 1%
                    targets.append(PortfolioTarget(symbol, weights[i]))
            
            algorithm.Debug(f"ML Portfolio: {len(targets)} targets")
            
        except Exception as e:
            algorithm.Debug(f"Error in ML Portfolio: {e}")
        
        return targets
    
    def _predict_returns_ml(self, returns, algorithm):
        """
        Predict annualized returns per asset with a Random Forest.
        """
        predicted = []
        horizon = 20  # forecast horizon in trading days
        
        for col in returns.columns:
            # Simple momentum and volatility features
            ret = returns[col]
            features = pd.DataFrame({
                'mom_5': ret.rolling(5).sum(),
                'mom_20': ret.rolling(20).sum(),
                'vol_20': ret.rolling(20).std(),
            }).dropna()
            
            # Target: cumulative return over the next `horizon` days
            target = ret.shift(-horizon).rolling(horizon).sum()
            
            # Align features and target; rows without a known target drop out
            df = pd.concat([features, target.rename('target')], axis=1).dropna()
            
            if len(df) < 50:
                predicted.append(ret.mean() * 252)
                continue
            
            X = df[['mom_5', 'mom_20', 'vol_20']].values
            y = df['target'].values
            
            # Train on the first 80% of labeled rows
            split = int(len(X) * 0.8)
            
            model = RandomForestRegressor(n_estimators=50, max_depth=3, random_state=42)
            model.fit(X[:split], y[:split])
            
            # Predict from the most recent features (the last row of df is
            # `horizon` days old because its target must already be known)
            latest = features[['mom_5', 'mom_20', 'vol_20']].iloc[-1:].values
            pred = model.predict(latest)[0] * (252 / horizon)  # Annualize
            predicted.append(pred)
        
        return np.array(predicted)
    
    def _mean_variance_optimize(self, expected_returns, cov_matrix):
        """
        Mean-Variance optimization (maximum Sharpe ratio, long-only).
        """
        n = len(expected_returns)
        
        def neg_sharpe(w):
            ret = w @ expected_returns
            vol = np.sqrt(w @ cov_matrix @ w)
            return -(ret - 0.02) / vol if vol > 0 else 0  # 2% risk-free rate
        
        constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
        # Cap of 25% per asset, relaxed to 1/n when there are fewer than
        # 4 assets; otherwise the weights could not sum to 1
        cap = max(0.25, 1.0 / n)
        bounds = [(0, cap) for _ in range(n)]
        w0 = np.ones(n) / n
        
        result = minimize(neg_sharpe, w0, method='SLSQP', 
                         bounds=bounds, constraints=constraints)
        
        # Fall back to equal weights if the solver does not converge
        return result.x if result.success else w0
    
    def _hrp_optimize(self, returns):
        """
        Simplified stand-in for HRP: inverse-variance weights.
        No clustering or recursive bisection is performed here.
        """
        cov = returns.cov()
        inv_var = 1 / np.diag(cov)
        weights = inv_var / inv_var.sum()
        return weights
    
    def OnSecuritiesChanged(self, algorithm, changes):
        for security in changes.RemovedSecurities:
            if security.Invested:
                algorithm.Liquidate(security.Symbol)
'''

print("MLPortfolioConstructionModel defined")
print("\nFeatures:")
print("  - Covariance: Ledoit-Wolf shrinkage")
print("  - Expected returns: Random Forest (optional)")
print("  - Optimization: Mean-Variance or inverse-variance ('hrp')")
print("  - Constraints: max 25% per asset")
---

## Part 7: Complete ML-Optimized Strategy (15 min)

### Architecture

```
Universe Selection (Top 50 Market Cap)
              |
              v
Alpha Model (Momentum 20/60 days)
              |
              v
ML Portfolio Construction
  - Expected Returns: Random Forest
  - Covariance: Ledoit-Wolf
  - Optimization: Mean-Variance
              |
              v
Risk Management (Max DD 5%)
              |
              v
Execution (Immediate)
```
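
The alpha stage of this pipeline can be prototyped offline before wiring it into QuantConnect. The sketch below blends a short and a long momentum signal with the 0.6/0.4 weights and the 2% threshold used by the strategy. The function name `momentum_signal` and the synthetic price series are assumptions for illustration, and a k-day return is measured here over k price intervals.

```python
import numpy as np
import pandas as pd

def momentum_signal(prices, short=20, long=60, threshold=0.02):
    """Return 'up', 'down', or None from blended short/long momentum."""
    if len(prices) <= long:
        return None  # not enough history for the long window
    mom_short = prices.iloc[-1] / prices.iloc[-(short + 1)] - 1
    mom_long = prices.iloc[-1] / prices.iloc[-(long + 1)] - 1
    combined = 0.6 * mom_short + 0.4 * mom_long
    if combined > threshold:
        return "up"
    if combined < -threshold:
        return "down"
    return None  # inside the neutral band: no insight emitted

# Synthetic price paths: +1% per day, flat, and -1% per day
rising = pd.Series(100 * 1.01 ** np.arange(70))
flat = pd.Series(np.full(70, 100.0))
falling = pd.Series(100 * 0.99 ** np.arange(70))

print(momentum_signal(rising), momentum_signal(flat), momentum_signal(falling))
```

The neutral band between -2% and +2% keeps the model from emitting insights on noise, which in turn limits how many symbols reach the portfolio construction step.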

In [None]:
# Complete ML-optimized strategy
# Copy into the QuantConnect IDE

qc_full_strategy = '''
from AlgorithmImports import *
from sklearn.covariance import LedoitWolf
from sklearn.ensemble import RandomForestRegressor
from scipy.optimize import minimize
import numpy as np
import pandas as pd

class MLOptimizedPortfolioAlgorithm(QCAlgorithm):
    """
    Complete strategy with ML-based portfolio optimization.
    Requires MLPortfolioConstructionModel (Part 6) in the same project.
    
    - Universe: Top 50 by market cap
    - Alpha: Momentum (20/60 days)
    - Portfolio: ML-Enhanced Mean-Variance
    - Risk: Max 5% drawdown per position
    - Rebalancing: Monthly
    """
    
    def Initialize(self):
        self.SetStartDate(2020, 1, 1)
        self.SetEndDate(2023, 12, 31)
        self.SetCash(100000)
        
        # Universe settings
        self.UniverseSettings.Resolution = Resolution.Daily
        self.num_stocks = 50
        
        # Add universe
        self.AddUniverse(self.CoarseFilter, self.FineFilter)
        
        # Set models
        self.SetAlpha(MomentumAlphaModel())
        self.SetPortfolioConstruction(MLPortfolioConstructionModel(
            rebalance_days=30,
            lookback_days=252,
            method='mean_variance',
            use_ml_returns=True
        ))
        self.SetExecution(ImmediateExecutionModel())
        self.SetRiskManagement(MaximumDrawdownPercentPerSecurity(0.05))
        
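        # Schedule rebalancing log
        # SPY is added so that the "SPY" ticker used by the time rule resolves
        self.AddEquity("SPY", Resolution.Daily)
        self.Schedule.On(
            self.DateRules.MonthStart(),
            self.TimeRules.AfterMarketOpen("SPY", 30),
            self.LogPortfolio
        )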
    
    def CoarseFilter(self, coarse):
        filtered = [x for x in coarse
                   if x.HasFundamentalData
                   and x.Price > 10
                   and x.DollarVolume > 5000000]
        
        sorted_by_volume = sorted(filtered, 
                                  key=lambda x: x.DollarVolume, 
                                  reverse=True)
        return [x.Symbol for x in sorted_by_volume[:100]]
    
    def FineFilter(self, fine):
        filtered = [x for x in fine if x.MarketCap > 2e9]
        sorted_by_cap = sorted(filtered, 
                               key=lambda x: x.MarketCap, 
                               reverse=True)
        return [x.Symbol for x in sorted_by_cap[:self.num_stocks]]
    
    def LogPortfolio(self):
        holdings = [(s.Value, h.HoldingsValue / self.Portfolio.TotalPortfolioValue)
                   for s, h in self.Portfolio.items() if h.Invested]
        holdings.sort(key=lambda x: x[1], reverse=True)
        
        self.Debug(f"\n{self.Time.date()}: Portfolio Summary")
        self.Debug(f"  Total Value: ${self.Portfolio.TotalPortfolioValue:,.0f}")
        self.Debug(f"  Positions: {len(holdings)}")
        for sym, weight in holdings[:5]:
            self.Debug(f"    {sym}: {weight:.1%}")
    
    def OnEndOfAlgorithm(self):
        self.Debug("\n" + "="*60)
        self.Debug("FINAL SUMMARY - ML Optimized Portfolio")
        self.Debug("="*60)
        self.Debug(f"Final Value: ${self.Portfolio.TotalPortfolioValue:,.0f}")
        total_return = (self.Portfolio.TotalPortfolioValue / 100000 - 1) * 100
        self.Debug(f"Total Return: {total_return:.2f}%")


class MomentumAlphaModel(AlphaModel):
    """
    Momentum-based alpha model.
    """
    
    def __init__(self, short_period=20, long_period=60):
        self.short_period = short_period
        self.long_period = long_period
        self.securities = []
    
    def Update(self, algorithm, data):
        insights = []
        
        for symbol in self.securities:
            if not data.ContainsKey(symbol):
                continue
            
            history = algorithm.History(symbol, self.long_period + 5, Resolution.Daily)
            # long_period + 1 closes are needed to measure a long_period-day return
            if history.empty or len(history) <= self.long_period:
                continue
            
            try:
                prices = history['close']
                
                # Momentum signals: return over the last N trading days
                mom_short = (prices.iloc[-1] / prices.iloc[-(self.short_period + 1)] - 1)
                mom_long = (prices.iloc[-1] / prices.iloc[-(self.long_period + 1)] - 1)
                
                # Combine signals
                combined_mom = 0.6 * mom_short + 0.4 * mom_long
                
                if combined_mom > 0.02:  # >2% momentum
                    direction = InsightDirection.Up
                    magnitude = min(combined_mom, 0.2)
                    confidence = min(abs(combined_mom) * 5, 1.0)
                elif combined_mom < -0.02:
                    direction = InsightDirection.Down
                    magnitude = min(abs(combined_mom), 0.2)
                    confidence = min(abs(combined_mom) * 5, 1.0)
                else:
                    continue
                
                insight = Insight.Price(
                    symbol,
                    timedelta(days=30),
                    direction,
                    magnitude,
                    confidence
                )
                insights.append(insight)
                
            except Exception:
                continue
        
        return insights
    
    def OnSecuritiesChanged(self, algorithm, changes):
        for security in changes.AddedSecurities:
            if security.Symbol not in self.securities:
                self.securities.append(security.Symbol)
        for security in changes.RemovedSecurities:
            if security.Symbol in self.securities:
                self.securities.remove(security.Symbol)
'''

print("MLOptimizedPortfolioAlgorithm defined")
print("\n" + "="*60)
print("STRATEGY SUMMARY")
print("="*60)
print("\n1. Universe Selection:")
print("   - Coarse: Dollar volume > $5M, Price > $10")
print("   - Fine: Top 50 by market cap (> $2B)")
print("\n2. Alpha Model:")
print("   - Momentum 20/60 days")
print("   - Threshold: +/-2%")
print("\n3. Portfolio Construction:")
print("   - Covariance: Ledoit-Wolf")
print("   - Returns: Random Forest")
print("   - Optimization: Max Sharpe")
print("\n4. Risk Management:")
print("   - Max Drawdown: 5% per position")
print("\n5. Execution:")
print("   - Immediate Market Orders")
print("   - Rebalancing: Monthly")

In [None]:
# Comparative summary of the methods
print("\n" + "="*70)
print("COMPARISON OF PORTFOLIO OPTIMIZATION METHODS")
print("="*70)

methods_comparison = pd.DataFrame({
    'Method': ['Equal Weight', 'Min Variance', 'Max Sharpe', 'Black-Litterman', 'HRP'],
    'Returns Required': ['No', 'No', 'Yes', 'Yes (views)', 'No'],
    'Covariance Required': ['No', 'Yes', 'Yes', 'Yes', 'Yes'],
    'Matrix Inversion': ['No', 'Yes', 'Yes', 'Yes', 'No'],
    'Stability': ['High', 'Medium', 'Low', 'Medium', 'High'],
    'Diversification': ['Maximal', 'Good', 'Variable', 'Good', 'Good'],
    'ML Compatible': ['N/A', 'No', 'Yes', 'Yes', 'No']
})

print(methods_comparison.to_string(index=False))

---

## Conclusion and Next Steps

### Recap

In this notebook, we covered:

| Topic | Key Points |
|-------|-------------|
| **Mean-Variance** | Solid theoretical foundation, efficient frontier, sensitive to inputs |
| **Shrinkage** | Ledoit-Wolf stabilizes the covariance estimate and lowers its condition number |
| **ML Returns** | Random Forest to predict returns from momentum and volatility features |
| **Black-Litterman** | Combines market equilibrium with ML views into posterior returns |
| **HRP** | No matrix inversion, hierarchical clustering, stable |
| **QC Integration** | MLPortfolioConstructionModel, complete strategy |

### Practical Recommendations

| Situation | Recommended Method |
|-----------|---------------------|
| Few observations (<100) | HRP or Equal Weight |
| Many assets (>50) | HRP with Ledoit-Wolf |
| Sophisticated alpha model | Black-Litterman with ML views |
| Simple production setup | Min Variance with shrinkage |
| Research/backtesting | Mean-Variance as a baseline |
### Limitations to Keep in Mind

1. **ML overfitting**: Return predictions are very noisy
2. **Regime changes**: Correlations shift during crises
3. **Transaction costs**: Frequent rebalancing is expensive
4. **Estimation lag**: Parameters estimated on historical data lag current conditions and need not hold in the future
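
The transaction-cost point can be made concrete with a rough estimate. The sketch below is illustrative only: the 10 bps cost per unit traded, the two weight vectors, and the function name `rebalance_cost` are hypothetical, and it assumes the same turnover at every rebalance.

```python
import numpy as np

def rebalance_cost(old_weights, new_weights, cost_bps=10.0):
    """Cost of one rebalance as a fraction of portfolio value."""
    traded = np.abs(np.asarray(new_weights) - np.asarray(old_weights)).sum()
    return float(traded * cost_bps / 1e4)

old = [0.25, 0.25, 0.25, 0.25]
new = [0.40, 0.30, 0.20, 0.10]
per_rebalance = rebalance_cost(old, new)

# Annual drag if every rebalance traded this much
for label, n_per_year in (("monthly", 12), ("daily", 252)):
    print(f"{label}: {per_rebalance * n_per_year:.2%} per year")
```

Under these assumptions the drag scales linearly with rebalancing frequency, which is one reason the strategy above rebalances monthly.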

### Additional Resources

- [Advances in Financial ML](https://www.amazon.com/Advances-Financial-Machine-Learning-Marcos/dp/1119482089) - Lopez de Prado
- [Machine Learning for Asset Managers](https://www.amazon.com/Machine-Learning-Managers-Elements-Quantitative/dp/1108792898) - Lopez de Prado
- [PyPortfolioOpt Documentation](https://pyportfolioopt.readthedocs.io/)
- [QuantConnect Portfolio Optimization](https://www.quantconnect.com/docs/v2/writing-algorithms/algorithm-framework/portfolio-construction)

---

**End of notebook. You now have the building blocks for portfolio optimization with ML.**