# 🚀 Google Colab Setup

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ogautier1980/sandbox-ml/blob/main/cours/03_regression/03_demo_regularisation.ipynb)

**If you are running this notebook on Google Colab**, run the following cell to install the dependencies.

In [None]:
# Install dependencies (Google Colab only)
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print('📦 Installing packages...')

    # Core ML packages
    !pip install -q numpy pandas matplotlib seaborn scikit-learn

    # Detect the chapter and install its specific dependencies
    notebook_name = '03_demo_regularisation.ipynb'  # Will be replaced automatically

    # Ch 06-08: Deep Learning
    if any(x in notebook_name for x in ['06_', '07_', '08_']):
        !pip install -q torch torchvision torchaudio

    # Ch 08: NLP
    if '08_' in notebook_name:
        !pip install -q transformers datasets tokenizers
        if 'rag' in notebook_name:
            !pip install -q sentence-transformers faiss-cpu rank-bm25

    # Ch 09: Reinforcement Learning
    if '09_' in notebook_name:
        !pip install -q gymnasium[classic-control]

    # Ch 04: Boosting
    if '04_' in notebook_name and 'boosting' in notebook_name:
        !pip install -q xgboost lightgbm catboost

    # Ch 05: Advanced clustering
    if '05_' in notebook_name:
        !pip install -q umap-learn

    # Ch 11: Time series
    if '11_' in notebook_name:
        !pip install -q statsmodels prophet

    # Ch 12: Advanced vision
    if '12_' in notebook_name:
        !pip install -q ultralytics timm segmentation-models-pytorch

    # Ch 13: Recommendation
    if '13_' in notebook_name:
        !pip install -q scikit-surprise implicit

    # Ch 14: MLOps
    if '14_' in notebook_name:
        !pip install -q mlflow fastapi pydantic

    print('✅ Installation complete!')
else:
    print('ℹ️  Local environment detected; the packages are already installed.')

# Chapter 03 - Regularization (Ridge, Lasso, Elastic Net)

**Objectives:**
- Understand the overfitting problem
- Master L2 regularization (Ridge)
- Master L1 regularization (Lasso)
- Understand Elastic Net (L1 + L2)
- Select the optimal hyperparameter (λ)
- Compare the methods on practical cases

**Prerequisite:** 03_demo_regression_lineaire.ipynb

In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression, load_diabetes
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, RidgeCV, LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Configuration
np.random.seed(42)
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✅ Imports successful")

## 1. The Overfitting Problem

### 1.1 Generating noisy, high-dimensional data

In [None]:
# Generate a noisy dataset with many features, only a few of them informative
# (make_regression draws uncorrelated features unless effective_rank is set)
X, y = make_regression(
    n_samples=100,
    n_features=50,  # Many features
    n_informative=10,  # Only 10 are useful
    noise=10,
    random_state=42
)

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Standardization (crucial for regularization)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("=== Generated Data ===")
print(f"Samples : {X.shape[0]}")
print(f"Features : {X.shape[1]}")
print("Informative features : 10")
print(f"\nTrain : {X_train.shape[0]} samples")
print(f"Test  : {X_test.shape[0]} samples")
print(f"\n⚠️  Samples/features ratio : {X_train.shape[0]/X_train.shape[1]:.2f}")
print("(Risk of overfitting if < 10)")

### 1.2 Linear Regression Without Regularization

In [None]:
# Ordinary linear regression
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)

# Predictions
y_train_pred_lr = lr.predict(X_train_scaled)
y_test_pred_lr = lr.predict(X_test_scaled)

# Metrics
train_r2_lr = r2_score(y_train, y_train_pred_lr)
test_r2_lr = r2_score(y_test, y_test_pred_lr)
train_rmse_lr = np.sqrt(mean_squared_error(y_train, y_train_pred_lr))
test_rmse_lr = np.sqrt(mean_squared_error(y_test, y_test_pred_lr))

print("=== Linear Regression (No Regularization) ===")
print(f"Train R² : {train_r2_lr:.4f}")
print(f"Test R²  : {test_r2_lr:.4f}")
print(f"Gap      : {abs(train_r2_lr - test_r2_lr):.4f}")
print(f"\nTrain RMSE : {train_rmse_lr:.2f}")
print(f"Test RMSE  : {test_rmse_lr:.2f}")

# Coefficient analysis
print("\n=== Coefficient Analysis ===")
print(f"Number of coefficients : {len(lr.coef_)}")
print(f"Largest |coefficient| : {np.max(np.abs(lr.coef_)):.2f}")
print(f"Coefficients with |value| > 50 : {np.sum(np.abs(lr.coef_) > 50)}")

if abs(train_r2_lr - test_r2_lr) > 0.1:
    print("\n⚠️  OVERFITTING DETECTED!")
    print("Remedy: regularization (Ridge, Lasso, Elastic Net)")

## 2. Ridge Regularization (L2)

**Formula:** Minimize $\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p w_j^2$

**Effect:** Reduces the magnitude of the coefficients (shrinkage)
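
To make the shrinkage effect concrete, here is a minimal NumPy sketch of the closed-form minimizer of the objective above, $w = (X^\top X + \lambda I)^{-1} X^\top y$. It assumes centered data and no intercept. The names `ridge_closed_form`, `X_demo`, `y_demo` and `lambdas` are illustrative and not part of this notebook's pipeline, and scikit-learn's `Ridge` may use a different solver internally.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (no intercept; X and y assumed centered)."""
    n_features = X.shape[1]
    # Normal equations with the L2 term on the diagonal: (X^T X + lam * I) w = X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Small synthetic problem (illustrative data, centered by hand)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))
X_demo -= X_demo.mean(axis=0)
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=30)
y_demo -= y_demo.mean()

lambdas = (0.0, 1.0, 10.0, 100.0)
norms = [np.linalg.norm(ridge_closed_form(X_demo, y_demo, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:>6}: ||w|| = {norm:.3f}")
```

The printed norm of $w$ decreases as $\lambda$ grows, but no coefficient is forced to exactly zero.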

### 2.1 Ridge with different values of λ (alpha)

In [None]:
# Try several values of alpha (λ)
alphas = [0.01, 0.1, 1, 10, 100]
results_ridge = []

for alpha in alphas:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train_scaled, y_train)
    
    train_r2 = ridge.score(X_train_scaled, y_train)
    test_r2 = ridge.score(X_test_scaled, y_test)
    
    results_ridge.append({
        'alpha': alpha,
        'train_r2': train_r2,
        'test_r2': test_r2,
        'diff': abs(train_r2 - test_r2),
        'max_coef': np.max(np.abs(ridge.coef_))
    })

df_ridge = pd.DataFrame(results_ridge)
print("=== Ridge: Impact of Alpha ===")
print(df_ridge.to_string(index=False))

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: R² vs alpha
axes[0].plot(df_ridge['alpha'], df_ridge['train_r2'], 'o-', label='Train R²', linewidth=2)
axes[0].plot(df_ridge['alpha'], df_ridge['test_r2'], 's-', label='Test R²', linewidth=2)
axes[0].set_xscale('log')
axes[0].set_xlabel('Alpha (λ)')
axes[0].set_ylabel('R²')
axes[0].set_title('Ridge: Performance vs Alpha')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: largest |coefficient| vs alpha
axes[1].plot(df_ridge['alpha'], df_ridge['max_coef'], 'o-', linewidth=2, color='green')
axes[1].set_xscale('log')
axes[1].set_xlabel('Alpha (λ)')
axes[1].set_ylabel('Max |Coefficient|')
axes[1].set_title('Ridge: Coefficient Shrinkage')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nObservation: Alpha↑ → Coefficients↓ → Less overfitting (until alpha is large enough to underfit)")

### 2.2 Optimal Ridge with Cross-Validation

In [None]:
# RidgeCV: automatically finds the best alpha
alphas_cv = np.logspace(-3, 3, 50)
ridge_cv = RidgeCV(alphas=alphas_cv, cv=5, scoring='r2')
ridge_cv.fit(X_train_scaled, y_train)

best_alpha_ridge = ridge_cv.alpha_
test_r2_ridge = ridge_cv.score(X_test_scaled, y_test)

print("=== Optimal Ridge (with CV) ===")
print(f"Best alpha : {best_alpha_ridge:.4f}")
print(f"Test R² : {test_r2_ridge:.4f}")
print("\nComparison:")
print(f"  No regularization : R²={test_r2_lr:.4f}")
print(f"  Optimal Ridge     : R²={test_r2_ridge:.4f}")
print(f"  Improvement : {(test_r2_ridge - test_r2_lr)*100:.2f} percentage points of R²")

## 3. Lasso Regularization (L1)

**Formula:** Minimize $\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |w_j|$

**Effect:** Sets some coefficients exactly to 0 (feature selection)
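
Why does the L1 penalty produce exact zeros while L2 does not? A minimal sketch, assuming a single coefficient (or an orthonormal design) so that the objective above decouples into $(z - w)^2 + \lambda |w|$, where $z$ is the unregularized estimate. Under that assumption the minimizer is the soft-thresholding of $z$ at $\lambda/2$, whereas the L2 version $(z - w)^2 + \lambda w^2$ gives $z / (1 + \lambda)$. `soft_threshold` is an illustrative helper, not a scikit-learn function.

```python
import numpy as np

def soft_threshold(z, t):
    """Move z toward 0 by t, and return exactly 0 wherever |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

lam = 2.0
z = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])   # unregularized estimates (illustrative)

lasso_like = soft_threshold(z, lam / 2)     # L1: small entries become exactly 0
ridge_like = z / (1.0 + lam)                # L2: every entry is rescaled, none is zeroed

print("L1 :", lasso_like, "-> non-zero:", np.count_nonzero(lasso_like))
print("L2 :", ridge_like, "-> non-zero:", np.count_nonzero(ridge_like))
```

With general (non-orthonormal) features the Lasso has no closed form, but coordinate-descent solvers apply this same thresholding step one coefficient at a time.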

### 3.1 Lasso with different values of λ

In [None]:
# Try several values of alpha
results_lasso = []

for alpha in alphas:
    lasso = Lasso(alpha=alpha, max_iter=10000)
    lasso.fit(X_train_scaled, y_train)
    
    train_r2 = lasso.score(X_train_scaled, y_train)
    test_r2 = lasso.score(X_test_scaled, y_test)
    n_nonzero = np.sum(lasso.coef_ != 0)
    
    results_lasso.append({
        'alpha': alpha,
        'train_r2': train_r2,
        'test_r2': test_r2,
        'features_selected': n_nonzero,
        'sparsity': f"{n_nonzero}/{len(lasso.coef_)}"
    })

df_lasso = pd.DataFrame(results_lasso)
print("=== Lasso: Impact of Alpha ===")
print(df_lasso.to_string(index=False))

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: R² vs alpha
axes[0].plot(df_lasso['alpha'], df_lasso['train_r2'], 'o-', label='Train R²', linewidth=2)
axes[0].plot(df_lasso['alpha'], df_lasso['test_r2'], 's-', label='Test R²', linewidth=2)
axes[0].set_xscale('log')
axes[0].set_xlabel('Alpha (λ)')
axes[0].set_ylabel('R²')
axes[0].set_title('Lasso: Performance vs Alpha')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: number of selected features
axes[1].plot(df_lasso['alpha'], df_lasso['features_selected'], 'o-', linewidth=2, color='purple')
axes[1].set_xscale('log')
axes[1].set_xlabel('Alpha (λ)')
axes[1].set_ylabel('Selected Features')
axes[1].set_title('Lasso: Feature Selection')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✨ Lasso advantage: automatic feature selection (some coefficients = 0)")

### 3.2 Optimal Lasso with Cross-Validation

In [None]:
# LassoCV: automatically finds the best alpha
lasso_cv = LassoCV(alphas=alphas_cv, cv=5, max_iter=10000, random_state=42)
lasso_cv.fit(X_train_scaled, y_train)

best_alpha_lasso = lasso_cv.alpha_
test_r2_lasso = lasso_cv.score(X_test_scaled, y_test)
n_selected = np.sum(lasso_cv.coef_ != 0)

print("=== Optimal Lasso (with CV) ===")
print(f"Best alpha : {best_alpha_lasso:.4f}")
print(f"Test R² : {test_r2_lasso:.4f}")
print(f"Selected features : {n_selected}/{len(lasso_cv.coef_)}")
print("\nComparison:")
print(f"  No regularization : R²={test_r2_lr:.4f}, features={X.shape[1]}")
print(f"  Optimal Lasso     : R²={test_r2_lasso:.4f}, features={n_selected}")

# Visualize the coefficients
plt.figure(figsize=(12, 5))
plt.stem(range(len(lasso_cv.coef_)), lasso_cv.coef_, basefmt=" ")
plt.xlabel('Feature Index')
plt.ylabel('Coefficient')
plt.title(f'Lasso: Coefficients (alpha={best_alpha_lasso:.4f}) - {n_selected} non-zero features')
plt.axhline(y=0, color='r', linestyle='--', linewidth=1)
plt.grid(True, alpha=0.3, axis='y')
plt.show()

## 4. Elastic Net (L1 + L2)

**Formula:** Minimize $\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda_1 \sum_{j=1}^p |w_j| + \lambda_2 \sum_{j=1}^p w_j^2$

**Parameters:**
- `alpha`: Overall regularization strength
- `l1_ratio`: Balance between L1 and L2 (0=Ridge, 1=Lasso)
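
Note that `alpha` and `l1_ratio` are not $\lambda_1$ and $\lambda_2$ themselves. scikit-learn documents the `ElasticNet` objective as $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\,\rho\,\lVert w\rVert_1 + \frac{1}{2}\alpha(1 - \rho)\lVert w\rVert_2^2$ with $\rho$ = `l1_ratio`. Multiplying by $2n$ gives the formula above. The sketch below performs that conversion under this assumption; `elastic_net_penalties` is a hypothetical helper introduced here for illustration only.

```python
def elastic_net_penalties(alpha, l1_ratio, n_samples):
    """Map scikit-learn's (alpha, l1_ratio) to (lambda_1, lambda_2) of the unscaled objective."""
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must lie in [0, 1]")
    lambda_1 = 2 * n_samples * alpha * l1_ratio      # weight on sum |w_j|
    lambda_2 = n_samples * alpha * (1.0 - l1_ratio)  # weight on sum w_j^2
    return lambda_1, lambda_2

# n_samples=70 matches the size of the training split used in this notebook
for l1_ratio in (0.0, 0.5, 1.0):
    lam1, lam2 = elastic_net_penalties(alpha=0.1, l1_ratio=l1_ratio, n_samples=70)
    print(f"l1_ratio={l1_ratio}: lambda_1={lam1:.2f}, lambda_2={lam2:.2f}")
```

At `l1_ratio=1` the L2 weight vanishes (pure Lasso) and at `l1_ratio=0` the L1 weight vanishes (pure Ridge), which matches the parameter description above.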

### 4.1 Grid Search for Elastic Net

In [None]:
# Grid search over alpha and l1_ratio
param_grid = {
    'alpha': np.logspace(-3, 1, 10),
    'l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9]
}

elastic_net = ElasticNet(max_iter=10000, random_state=42)
grid_search = GridSearchCV(
    elastic_net, param_grid, cv=5, scoring='r2', n_jobs=-1
)
grid_search.fit(X_train_scaled, y_train)

best_elastic = grid_search.best_estimator_
best_params = grid_search.best_params_
test_r2_elastic = best_elastic.score(X_test_scaled, y_test)
n_selected_elastic = np.sum(best_elastic.coef_ != 0)

print("=== Optimal Elastic Net ===")
print(f"Best alpha : {best_params['alpha']:.4f}")
print(f"Best l1_ratio : {best_params['l1_ratio']:.2f}")
print(f"Test R² : {test_r2_elastic:.4f}")
print(f"Selected features : {n_selected_elastic}/{len(best_elastic.coef_)}")
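
As an alternative to `GridSearchCV`, scikit-learn also provides `ElasticNetCV`, which searches `alpha` and `l1_ratio` together. Below is a minimal sketch on freshly generated synthetic data; the `_demo` names are illustrative and do not touch the variables above. It selects by mean squared error on the folds rather than by R², so it may pick a different configuration than the grid search.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic data with the same shape as in section 1 (features are already on a common scale)
X_demo, y_demo = make_regression(
    n_samples=100, n_features=50, n_informative=10, noise=10, random_state=42
)

l1_ratios_demo = [0.1, 0.3, 0.5, 0.7, 0.9]
enet_cv = ElasticNetCV(
    l1_ratio=l1_ratios_demo,
    alphas=np.logspace(-3, 1, 10),
    cv=5,
    max_iter=10000,
)
enet_cv.fit(X_demo, y_demo)

print(f"alpha={enet_cv.alpha_:.4f}, l1_ratio={enet_cv.l1_ratio_}")
print(f"Non-zero coefficients: {np.sum(enet_cv.coef_ != 0)}/{enet_cv.coef_.size}")
```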

## 5. Final Comparison

### 5.1 Comparison Table

In [None]:
# Compare all models
models = {
    'Linear Regression': lr,
    'Ridge (optimal)': ridge_cv,
    'Lasso (optimal)': lasso_cv,
    'Elastic Net (optimal)': best_elastic
}

comparison = []
for name, model in models.items():
    train_r2 = model.score(X_train_scaled, y_train)
    test_r2 = model.score(X_test_scaled, y_test)
    
    y_pred = model.predict(X_test_scaled)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    
    if hasattr(model, 'coef_'):
        n_features = np.sum(model.coef_ != 0)
    else:
        n_features = X.shape[1]
    
    comparison.append({
        'Model': name,
        'Train R²': f"{train_r2:.4f}",
        'Test R²': f"{test_r2:.4f}",
        'Test RMSE': f"{rmse:.2f}",
        'Features': f"{n_features}/{X.shape[1]}"
    })

df_comparison = pd.DataFrame(comparison)
print("=" * 80)
print("FINAL COMPARISON OF REGULARIZATION METHODS")
print("=" * 80)
print(df_comparison.to_string(index=False))
print("=" * 80)

### 5.2 Visualizing the Coefficients

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

models_plot = [
    ('Linear Regression', lr, axes[0, 0]),
    ('Ridge', ridge_cv, axes[0, 1]),
    ('Lasso', lasso_cv, axes[1, 0]),
    ('Elastic Net', best_elastic, axes[1, 1])
]

for name, model, ax in models_plot:
    coef = model.coef_
    n_nonzero = np.sum(coef != 0)
    
    ax.stem(range(len(coef)), coef, basefmt=" ")
    ax.axhline(y=0, color='r', linestyle='--', linewidth=1)
    ax.set_xlabel('Feature Index')
    ax.set_ylabel('Coefficient')
    ax.set_title(f'{name}\n({n_nonzero} non-zero features)')
    ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\nObservations:")
print("- Linear Regression: highly variable coefficients")
print("- Ridge: all coefficients shrunk but non-zero")
print("- Lasso: many coefficients exactly 0")
print("- Elastic Net: compromise between Ridge and Lasso")

## 6. Case Study: Diabetes Dataset

### 6.1 Application to real data

In [None]:
# Load the dataset
diabetes = load_diabetes()
X_diab = diabetes.data  # type: ignore
y_diab = diabetes.target  # type: ignore

# Split and standardize
X_train_diab, X_test_diab, y_train_diab, y_test_diab = train_test_split(
    X_diab, y_diab, test_size=0.2, random_state=42
)

scaler_diab = StandardScaler()
X_train_diab = scaler_diab.fit_transform(X_train_diab)
X_test_diab = scaler_diab.transform(X_test_diab)

print("=== Diabetes Dataset ===")
print(f"Samples : {X_diab.shape[0]}")
print(f"Features : {X_diab.shape[1]}")
print(f"Feature names : {diabetes.feature_names}")  # type: ignore

In [None]:
# Compare the 4 methods
models_diab = {
    'Linear Regression': LinearRegression(),
    'Ridge': RidgeCV(alphas=alphas_cv, cv=5),
    'Lasso': LassoCV(alphas=alphas_cv, cv=5, max_iter=10000),
    'Elastic Net': ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000)
}

results_diab = []
for name, model in models_diab.items():
    model.fit(X_train_diab, y_train_diab)
    
    train_r2 = model.score(X_train_diab, y_train_diab)
    test_r2 = model.score(X_test_diab, y_test_diab)
    
    y_pred_diab = model.predict(X_test_diab)
    rmse = np.sqrt(mean_squared_error(y_test_diab, y_pred_diab))
    
    if hasattr(model, 'coef_'):
        n_features = np.sum(model.coef_ != 0)
    else:
        n_features = 10
    
    results_diab.append({
        'Model': name,
        'Train R²': train_r2,
        'Test R²': test_r2,
        'Test RMSE': rmse,
        'Features': n_features
    })

df_diab = pd.DataFrame(results_diab)
print("\n=== Results on the Diabetes Dataset ===")
print(df_diab.to_string(index=False))

# Visualization
plt.figure(figsize=(10, 6))
x_pos = np.arange(len(df_diab))
plt.bar(x_pos - 0.2, df_diab['Train R²'], width=0.4, label='Train R²', alpha=0.8)
plt.bar(x_pos + 0.2, df_diab['Test R²'], width=0.4, label='Test R²', alpha=0.8)
plt.xticks(x_pos, df_diab['Model'], rotation=45, ha='right')
plt.ylabel('R²')
plt.title('Comparison on the Diabetes Dataset')
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

## 7. Summary and Selection Guide

### When to use which method?

| Method | When to use it | Advantages | Drawbacks |
|--------|----------------|------------|-----------|
| **Linear Regression** | Few features, clean data | Simple, interpretable | Overfits when p approaches or exceeds n |
| **Ridge (L2)** | Correlated features, all useful | Stable, keeps all features | No feature selection |
| **Lasso (L1)** | Many features, selection needed | Automatic selection, interpretable | Unstable with correlated features |
| **Elastic Net** | Correlated features + selection | Combines the advantages of L1 and L2 | 2 hyperparameters |

### Key points:

1. **Always standardize** the features before regularization
2. **Cross-validation** to choose λ (alpha)
3. **Ridge**: a good default starting point
4. **Lasso**: when feature selection is needed
5. **Elastic Net**: when Lasso is unstable (correlated features)
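
Points 1 and 2 interact: when the scaler is fitted on the whole training set before cross-validation, each validation fold has already influenced the scaling statistics. One common way to avoid this, sketched here on synthetic data with illustrative `_demo` names, is to wrap the scaler and the model in a scikit-learn pipeline so that scaling is re-fitted on the training portion of every fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_regression(
    n_samples=100, n_features=50, n_informative=10, noise=10, random_state=42
)

# make_pipeline names each step after its lowercased class: 'standardscaler', 'ridge'
pipeline_demo = make_pipeline(StandardScaler(), Ridge())
alphas_demo = np.logspace(-3, 3, 13)

search = GridSearchCV(
    pipeline_demo, {'ridge__alpha': alphas_demo}, cv=5, scoring='r2'
)
search.fit(X_demo, y_demo)  # the scaler is re-fitted inside each CV fold

print(f"Best alpha : {search.best_params_['ridge__alpha']:.4f}")
print(f"Mean CV R² : {search.best_score_:.4f}")
```

The fitted `search.best_estimator_` contains both the scaler and the model, so it can be applied directly to unscaled test data.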

### Next step:

See **03_exercices.ipynb** to put this into practice

In [None]:
print("✅ Notebook complete!")
print("\nYou have now covered:")
print("  - Ridge (L2 regularization)")
print("  - Lasso (L1 regularization)")
print("  - Elastic Net (L1 + L2)")
print("  - Hyperparameter selection")
print("  - Comparing and choosing a method")