# 🚀 Google Colab Setup

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ogautier1980/sandbox-ml/blob/main/cours/XX_CHAPTER/XX_NOTEBOOK.ipynb)

**If you are running this notebook on Google Colab**, run the next cell to install the dependencies.

In [None]:
# Install dependencies (Google Colab only)
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print('📦 Installing packages...')

    # Core ML packages
    !pip install -q numpy pandas matplotlib seaborn scikit-learn

    # Detect the chapter and install its specific dependencies
    notebook_name = '14_demo_monitoring_mlflow.ipynb'  # Replaced automatically

    # Ch 06-08: Deep Learning
    if any(x in notebook_name for x in ['06_', '07_', '08_']):
        !pip install -q torch torchvision torchaudio

    # Ch 08: NLP
    if '08_' in notebook_name:
        !pip install -q transformers datasets tokenizers
        if 'rag' in notebook_name:
            !pip install -q sentence-transformers faiss-cpu rank-bm25

    # Ch 09: Reinforcement Learning
    if '09_' in notebook_name:
        !pip install -q "gymnasium[classic-control]"

    # Ch 04: Boosting
    if '04_' in notebook_name and 'boosting' in notebook_name:
        !pip install -q xgboost lightgbm catboost

    # Ch 05: Advanced clustering
    if '05_' in notebook_name:
        !pip install -q umap-learn

    # Ch 11: Time series
    if '11_' in notebook_name:
        !pip install -q statsmodels prophet

    # Ch 12: Advanced vision
    if '12_' in notebook_name:
        !pip install -q ultralytics timm segmentation-models-pytorch

    # Ch 13: Recommendation
    if '13_' in notebook_name:
        !pip install -q scikit-surprise implicit

    # Ch 14: MLOps
    if '14_' in notebook_name:
        !pip install -q mlflow fastapi pydantic

    print('✅ Installation complete!')
else:
    print('ℹ️  Local environment detected, packages are already installed.')
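
The chapter checks in the setup cell amount to a mapping from a filename prefix (plus an optional keyword) to a list of extra packages. A minimal sketch of that mapping as a lookup table follows. The names `RULES` and `extra_packages` are hypothetical, and the sketch matches on the start of the filename with `str.startswith`, whereas the setup cell uses substring tests.

```python
# Hypothetical helper, not part of the original notebook.
# Each rule is (chapter prefixes, required keyword or None, packages).
RULES = [
    (("06_", "07_", "08_"), None, ["torch", "torchvision", "torchaudio"]),
    (("08_",), None, ["transformers", "datasets", "tokenizers"]),
    (("08_",), "rag", ["sentence-transformers", "faiss-cpu", "rank-bm25"]),
    (("09_",), None, ["gymnasium[classic-control]"]),
    (("04_",), "boosting", ["xgboost", "lightgbm", "catboost"]),
    (("05_",), None, ["umap-learn"]),
    (("11_",), None, ["statsmodels", "prophet"]),
    (("12_",), None, ["ultralytics", "timm", "segmentation-models-pytorch"]),
    (("13_",), None, ["scikit-surprise", "implicit"]),
    (("14_",), None, ["mlflow", "fastapi", "pydantic"]),
]


def extra_packages(notebook_name):
    """Return the chapter-specific packages for a notebook filename."""
    packages = []
    for prefixes, keyword, pkgs in RULES:
        # str.startswith accepts a tuple of candidate prefixes.
        if notebook_name.startswith(prefixes) and (keyword is None or keyword in notebook_name):
            packages.extend(pkgs)
    return packages


print(extra_packages("14_demo_monitoring_mlflow.ipynb"))
```

Keeping the rules in data makes it easier to see at a glance which chapter pulls in which packages, at the cost of one extra level of indirection.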

# Demo: Monitoring and Tracking with MLflow

This notebook shows how to use **MLflow** to track and manage ML experiments:

1. **Experiment tracking**: logging params, metrics, and artifacts
2. **Run comparison**: comparing several training runs
3. **Model Registry**: saving and versioning models
4. **MLflow UI**: web interface for visualization
5. **Autologging**: automatic logging with scikit-learn

**Dataset**: California Housing (regression)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature
import warnings
warnings.filterwarnings('ignore')

# Plot configuration
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Libraries imported successfully!")
print(f"MLflow version: {mlflow.__version__}")

## 1. MLflow Configuration

In [None]:
# MLflow configuration
import os

# Tracking directory
mlflow_dir = '/tmp/mlflow'
os.makedirs(mlflow_dir, exist_ok=True)

# Tracking URI (local filesystem)
mlflow.set_tracking_uri(f"file://{mlflow_dir}")

# Experiment name
experiment_name = "Housing_Price_Prediction"
mlflow.set_experiment(experiment_name)

print("MLflow configured!")
print(f"  Tracking URI: {mlflow.get_tracking_uri()}")
print(f"  Experiment: {experiment_name}")
print("\nTo launch the MLflow UI:")
print(f"  mlflow ui --backend-store-uri {mlflow_dir} --port 5000")
print("  Then open: http://localhost:5000")
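
The tracking URI in this cell is built by prefixing the directory with `file://`, which works because `mlflow_dir` is an absolute POSIX path. As a sketch, the standard library can build the same URI from a `Path`. This assumes a POSIX system such as the Colab runtime.

```python
from pathlib import Path

mlflow_dir = Path("/tmp/mlflow")

# as_uri() requires an absolute path and raises ValueError otherwise,
# which catches a relative directory before it reaches MLflow.
tracking_uri = mlflow_dir.as_uri()
print(tracking_uri)
```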

## 2. Loading and Preparing the Data

In [None]:
# Load the dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target
feature_names = housing.feature_names

# Split train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Dataset California Housing:")
print(f"  Train: {X_train.shape}")
print(f"  Test:  {X_test.shape}")
print(f"  Features: {feature_names}")
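
With `test_size=0.2` and no explicit `train_size`, scikit-learn rounds the test count up and gives the remainder to the training set. A small arithmetic sketch follows, assuming the 20,640 rows documented for California Housing (a figure not printed in this notebook); the helper name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples, test_size):
    """Mirror train_test_split sizing when only a float test_size is given."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(20640, 0.2)
print(n_train, n_test)
```

These two numbers should match the first dimension of `X_train.shape` and `X_test.shape` printed by the cell.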

## 3. Tracking a Single Experiment (Manual)

In [None]:
print("=" * 60)
print("EXPERIMENT 1: Random Forest (Manual Tracking)")
print("=" * 60)

# Start an MLflow run
with mlflow.start_run(run_name="RF_baseline") as run:
    # Hyperparameters
    n_estimators = 100
    max_depth = 20
    min_samples_split = 5
    
    # Log parameters
    mlflow.log_param("model_type", "RandomForest")
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)
    mlflow.log_param("min_samples_split", min_samples_split)
    mlflow.log_param("test_size", 0.2)
    
    # Training
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        random_state=42,
        n_jobs=-1
    )
    model.fit(X_train, y_train)
    
    # Predictions
    y_pred_train = model.predict(X_train)
    y_pred_test = model.predict(X_test)
    
    # Metrics
    train_r2 = r2_score(y_train, y_pred_train)
    train_rmse = np.sqrt(mean_squared_error(y_train, y_pred_train))
    test_r2 = r2_score(y_test, y_pred_test)
    test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))
    test_mae = mean_absolute_error(y_test, y_pred_test)
    
    # Log metrics
    mlflow.log_metric("train_r2", train_r2)
    mlflow.log_metric("train_rmse", train_rmse)
    mlflow.log_metric("test_r2", test_r2)
    mlflow.log_metric("test_rmse", test_rmse)
    mlflow.log_metric("test_mae", test_mae)
    mlflow.log_metric("overfit_score", train_r2 - test_r2)
    
    # Cross-validation (cross_val_score clones and refits the model on each fold)
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='r2')
    mlflow.log_metric("cv_r2_mean", cv_scores.mean())
    mlflow.log_metric("cv_r2_std", cv_scores.std())
    
    # Log the model with its signature
    signature = infer_signature(X_train, y_pred_train)
    mlflow.sklearn.log_model(model, "model", signature=signature)
    
    # Save a plot (artifact)
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(y_test, y_pred_test, alpha=0.5, s=20)
    ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
    ax.set_xlabel('Actual Values')
    ax.set_ylabel('Predicted Values')
    ax.set_title(f'RF Baseline - R² = {test_r2:.4f}')
    ax.grid(True, alpha=0.3)
    
    plot_path = '/tmp/rf_baseline_plot.png'
    plt.savefig(plot_path, dpi=100, bbox_inches='tight')
    plt.close()
    
    mlflow.log_artifact(plot_path, "plots")
    
    # Logging feature importance
    feature_importance = pd.DataFrame({
        'feature': feature_names,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)
    
    importance_path = '/tmp/feature_importance.csv'
    feature_importance.to_csv(importance_path, index=False)
    mlflow.log_artifact(importance_path, "data")
    
    # Tags
    mlflow.set_tag("model_type", "RandomForest")
    mlflow.set_tag("dataset", "California Housing")
    mlflow.set_tag("task", "regression")
    
    run_id = run.info.run_id
    
    print(f"\nRun ID: {run_id}")
    print("\nMetrics:")
    print(f"  Train R²:  {train_r2:.4f}")
    print(f"  Train RMSE: {train_rmse:.4f}")
    print(f"  Test R²:   {test_r2:.4f}")
    print(f"  Test RMSE:  {test_rmse:.4f}")
    print(f"  Test MAE:   {test_mae:.4f}")
    print(f"  CV R²:      {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})")
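
The metrics logged in this run come from scikit-learn, but their definitions are short. The sketch below recomputes RMSE, MAE, and R² in plain Python on made-up numbers. The helper name `regression_metrics` and the sample values are illustrative only and are not results from this notebook.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R² from two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up values for illustration
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

The `overfit_score` logged in the run is simply the train R² minus the test R², so a large positive value suggests the model fits the training data much better than unseen data.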

## 4. Tracking Multiple Experiments (Model Comparison)

In [None]:
print("=" * 60)
print("MODEL COMPARISON (4 different models)")
print("=" * 60)

# Models to test
models_config = [
    {
        "name": "RandomForest_50trees",
        "model": RandomForestRegressor(n_estimators=50, random_state=42, n_jobs=-1),
        "params": {"n_estimators": 50, "model_type": "RandomForest"}
    },
    {
        "name": "RandomForest_200trees",
        "model": RandomForestRegressor(n_estimators=200, max_depth=30, random_state=42, n_jobs=-1),
        "params": {"n_estimators": 200, "max_depth": 30, "model_type": "RandomForest"}
    },
    {
        "name": "GradientBoosting",
        "model": GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42),
        "params": {"n_estimators": 100, "learning_rate": 0.1, "model_type": "GradientBoosting"}
    },
    {
        "name": "Ridge",
        "model": Ridge(alpha=1.0),
        "params": {"alpha": 1.0, "model_type": "Ridge"}
    }
]

results = []

for config in models_config:
    with mlflow.start_run(run_name=config["name"]) as run:
        print(f"\nTraining: {config['name']}")
        
        # Log params
        for key, value in config["params"].items():
            mlflow.log_param(key, value)
        
        # Training
        model = config["model"]
        model.fit(X_train, y_train)
        
        # Predictions
        y_pred_test = model.predict(X_test)
        
        # Metrics
        test_r2 = r2_score(y_test, y_pred_test)
        test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))
        test_mae = mean_absolute_error(y_test, y_pred_test)
        
        mlflow.log_metric("test_r2", test_r2)
        mlflow.log_metric("test_rmse", test_rmse)
        mlflow.log_metric("test_mae", test_mae)
        
        # Log the model
        mlflow.sklearn.log_model(model, "model")
        
        # Tags
        mlflow.set_tag("model_type", config["params"]["model_type"])
        mlflow.set_tag("experiment_type", "model_comparison")
        
        results.append({
            "Model": config["name"],
            "Run ID": run.info.run_id,
            "R²": test_r2,
            "RMSE": test_rmse,
            "MAE": test_mae
        })
        
        print(f"  R²: {test_r2:.4f}, RMSE: {test_rmse:.4f}, MAE: {test_mae:.4f}")

# Results DataFrame
results_df = pd.DataFrame(results)

print("\n" + "=" * 60)
print("COMPARISON RESULTS")
print("=" * 60)
print(results_df.to_string(index=False))

In [None]:
# Visualize the comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

colors = ['#3498db', '#2ecc71', '#e74c3c', '#f39c12']

# R²
axes[0].bar(results_df['Model'], results_df['R²'], color=colors, edgecolor='black', alpha=0.7)
axes[0].set_ylabel('R² Score')
axes[0].set_title('R² Comparison')
axes[0].set_ylim([0, 1])
axes[0].tick_params(axis='x', rotation=45)
for i, v in enumerate(results_df['R²']):
    axes[0].text(i, v + 0.02, f"{v:.4f}", ha='center', fontsize=9)

# RMSE
axes[1].bar(results_df['Model'], results_df['RMSE'], color=colors, edgecolor='black', alpha=0.7)
axes[1].set_ylabel('RMSE')
axes[1].set_title('RMSE Comparison (lower is better)')
axes[1].tick_params(axis='x', rotation=45)
for i, v in enumerate(results_df['RMSE']):
    axes[1].text(i, v + 0.01, f"{v:.4f}", ha='center', fontsize=9)

# MAE
axes[2].bar(results_df['Model'], results_df['MAE'], color=colors, edgecolor='black', alpha=0.7)
axes[2].set_ylabel('MAE')
axes[2].set_title('MAE Comparison (lower is better)')
axes[2].tick_params(axis='x', rotation=45)
for i, v in enumerate(results_df['MAE']):
    axes[2].text(i, v + 0.01, f"{v:.4f}", ha='center', fontsize=9)

plt.tight_layout()
plt.show()

## 5. Autologging with Scikit-Learn

In [None]:
print("=" * 60)
print("AUTOLOGGING SCIKIT-LEARN")
print("=" * 60)

# Enable autologging
mlflow.sklearn.autolog(log_input_examples=True, log_model_signatures=True)

with mlflow.start_run(run_name="RF_autolog") as run:
    # Autologging automatically captures params, training metrics, and the model
    model = RandomForestRegressor(n_estimators=150, max_depth=25, random_state=42, n_jobs=-1)
    model.fit(X_train, y_train)
    
    # Predictions on the test set
    y_pred = model.predict(X_test)
    
    # Additional metrics (logged manually)
    test_r2 = r2_score(y_test, y_pred)
    mlflow.log_metric("custom_r2", test_r2)
    
    run_id = run.info.run_id
    
    print(f"\nRun ID: {run_id}")
    print("Autologging enabled: params, metrics, and model logged automatically!")
    print(f"Test R²: {test_r2:.4f}")

# Disable autologging
mlflow.sklearn.autolog(disable=True)
print("\nAutologging disabled.")

## 6. Searching and Loading Runs

In [None]:
print("=" * 60)
print("SEARCHING RUNS")
print("=" * 60)

# Fetch the experiment
experiment = mlflow.get_experiment_by_name(experiment_name)
experiment_id = experiment.experiment_id

# Search the experiment's runs, best test_r2 first (at most 10 results)
runs = mlflow.search_runs(
    experiment_ids=[experiment_id],
    order_by=["metrics.test_r2 DESC"],
    max_results=10
)

print(f"\nNumber of runs found: {len(runs)}")
print("\nTop 5 runs (by test_r2):")
print(runs[['run_id', 'tags.mlflow.runName', 'metrics.test_r2', 'metrics.test_rmse']].head())

In [None]:
# Load the best model
print("\n" + "=" * 60)
print("LOADING THE BEST MODEL")
print("=" * 60)

best_run_id = runs.iloc[0]['run_id']
best_run_name = runs.iloc[0]['tags.mlflow.runName']
best_r2 = runs.iloc[0]['metrics.test_r2']

print("\nBest run:")
print(f"  Run ID: {best_run_id}")
print(f"  Name: {best_run_name}")
print(f"  Test R²: {best_r2:.4f}")

# Load the model
model_uri = f"runs:/{best_run_id}/model"
loaded_model = mlflow.sklearn.load_model(model_uri)

print("\nModel loaded from MLflow!")
print(f"Type: {type(loaded_model).__name__}")

# Prediction test (the target is expressed in units of $100,000)
sample_predictions = loaded_model.predict(X_test[:5])
print("\nPredictions on 5 samples:")
for i, (true_val, pred_val) in enumerate(zip(y_test[:5], sample_predictions), 1):
    print(f"  {i}. True: ${true_val * 100:.0f}k, Predicted: ${pred_val * 100:.0f}k")
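
Not every run in the experiment necessarily has a `test_r2` value: the autologging run in section 5, for instance, logs `custom_r2` manually instead. A defensive sketch of picking the best run from plain records follows, skipping entries whose metric is missing. The run names and scores are made up, and `best_run` is a hypothetical helper.

```python
import math

# Made-up records shaped like rows of a runs table; NaN marks a missing metric.
records = [
    {"run_name": "run_a", "test_r2": 0.58},
    {"run_name": "run_b", "test_r2": float("nan")},  # metric never logged
    {"run_name": "run_c", "test_r2": 0.81},
]


def best_run(rows, metric="test_r2"):
    """Return the row with the highest metric, ignoring NaN entries."""
    scored = [r for r in rows if not math.isnan(r[metric])]
    if not scored:
        raise ValueError(f"no run has a value for {metric!r}")
    return max(scored, key=lambda r: r[metric])


print(best_run(records)["run_name"])
```

Filtering explicitly avoids relying on how a given backend orders missing values when sorting.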

## 7. Model Registry (Model Versioning)

In [None]:
print("=" * 60)
print("MODEL REGISTRY")
print("=" * 60)

# Note: depending on the MLflow version, the Model Registry may require a
# database-backed store (SQLite, MySQL, PostgreSQL). With a local filesystem
# backend, registration can fail, in which case models can only be logged.

model_name = "HousingPricePredictor"

try:
    # Register the best model in the registry
    model_uri = f"runs:/{best_run_id}/model"
    
    # Creates a new model version (version 1 on the first registration)
    model_version = mlflow.register_model(model_uri, model_name)
    
    print("\nModel registered in the registry!")
    print(f"  Name: {model_name}")
    print(f"  Version: {model_version.version}")
    print(f"  Run ID: {best_run_id}")
    
    # Load from the registry
    loaded_model_registry = mlflow.sklearn.load_model(f"models:/{model_name}/{model_version.version}")
    print("\nModel loaded from the registry!")
    
except Exception as e:
    print("\nModel registration failed; the Model Registry may be unavailable with this backend.")
    print("To enable the Model Registry, use a SQL backend:")
    print("  mlflow.set_tracking_uri('sqlite:///mlflow.db')")
    print(f"\nError: {e}")
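
The two loading paths used in this notebook differ only in the URI scheme: `runs:/<run_id>/<artifact_path>` addresses a model stored inside a run, while `models:/<name>/<version>` addresses a registered version. A small sketch of building both strings follows. The helper names and the placeholder run ID are hypothetical.

```python
def run_model_uri(run_id, artifact_path="model"):
    """URI of a model logged under `artifact_path` inside a run."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_model_uri(name, version):
    """URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"


# "abc123" is a placeholder, not a real run ID.
print(run_model_uri("abc123"))
print(registry_model_uri("HousingPricePredictor", 1))
```

Note that the artifact path must match the name given at logging time: most runs here use `"model"`, but the grid search run in section 9 logs its estimator under `"best_model"`.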

## 8. Visual Comparison of Runs

In [None]:
# Visualize the run history
print("=" * 60)
print("RUN VISUALIZATION")
print("=" * 60)

# Extract the metrics of all runs
runs_data = []
for _, run in runs.iterrows():
    runs_data.append({
        'Run Name': run.get('tags.mlflow.runName', 'Unknown'),
        'R²': run.get('metrics.test_r2', None),
        'RMSE': run.get('metrics.test_rmse', None),
        'MAE': run.get('metrics.test_mae', None)
    })

# Drop runs that are missing any of the three metrics
runs_viz_df = pd.DataFrame(runs_data).dropna()

print(f"\nRuns with complete metrics: {len(runs_viz_df)}")

if len(runs_viz_df) > 0:
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # R² per run
    axes[0, 0].barh(runs_viz_df['Run Name'], runs_viz_df['R²'], color='steelblue', edgecolor='black', alpha=0.7)
    axes[0, 0].set_xlabel('R² Score')
    axes[0, 0].set_title('R² per Run')
    axes[0, 0].set_xlim([0, 1])
    axes[0, 0].invert_yaxis()
    
    # RMSE per run
    axes[0, 1].barh(runs_viz_df['Run Name'], runs_viz_df['RMSE'], color='coral', edgecolor='black', alpha=0.7)
    axes[0, 1].set_xlabel('RMSE')
    axes[0, 1].set_title('RMSE per Run')
    axes[0, 1].invert_yaxis()
    
    # MAE per run
    axes[1, 0].barh(runs_viz_df['Run Name'], runs_viz_df['MAE'], color='lightgreen', edgecolor='black', alpha=0.7)
    axes[1, 0].set_xlabel('MAE')
    axes[1, 0].set_title('MAE per Run')
    axes[1, 0].invert_yaxis()
    
    # Scatter plot of R² vs RMSE
    axes[1, 1].scatter(runs_viz_df['R²'], runs_viz_df['RMSE'], s=100, alpha=0.7, edgecolor='black')
    for i, txt in enumerate(runs_viz_df['Run Name']):
        axes[1, 1].annotate(txt, (runs_viz_df['R²'].iloc[i], runs_viz_df['RMSE'].iloc[i]), 
                           fontsize=8, alpha=0.7, ha='right')
    axes[1, 1].set_xlabel('R²')
    axes[1, 1].set_ylabel('RMSE')
    axes[1, 1].set_title('R² vs RMSE')
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("No runs with complete metrics to visualize.")

## 9. Grid Search with MLflow Tracking

In [None]:
from sklearn.model_selection import GridSearchCV

print("=" * 60)
print("GRID SEARCH WITH MLFLOW TRACKING")
print("=" * 60)

# Parameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

print(f"\nNumber of combinations: {np.prod([len(v) for v in param_grid.values()])}")

with mlflow.start_run(run_name="GridSearch_RF") as parent_run:
    # GridSearchCV
    grid_search = GridSearchCV(
        RandomForestRegressor(random_state=42, n_jobs=-1),
        param_grid,
        cv=3,
        scoring='r2',
        n_jobs=-1,
        verbose=0
    )
    
    grid_search.fit(X_train, y_train)
    
    # Log the best parameters and the best CV score
    mlflow.log_params(grid_search.best_params_)
    mlflow.log_metric("best_cv_score", grid_search.best_score_)
    
    # Test set performance
    y_pred = grid_search.predict(X_test)
    test_r2 = r2_score(y_test, y_pred)
    test_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    
    mlflow.log_metric("test_r2", test_r2)
    mlflow.log_metric("test_rmse", test_rmse)
    
    # Log the model
    mlflow.sklearn.log_model(grid_search.best_estimator_, "best_model")
    
    # Log all grid search results
    cv_results_df = pd.DataFrame(grid_search.cv_results_)
    cv_results_path = '/tmp/grid_search_results.csv'
    cv_results_df.to_csv(cv_results_path, index=False)
    mlflow.log_artifact(cv_results_path, "grid_search")
    
    print("\nBest parameters:")
    for param, value in grid_search.best_params_.items():
        print(f"  {param}: {value}")
    print(f"\nBest CV score (R²): {grid_search.best_score_:.4f}")
    print(f"Test R²: {test_r2:.4f}")
    print(f"Test RMSE: {test_rmse:.4f}")
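
`GridSearchCV` evaluates every combination in `param_grid` on each fold. The sketch below enumerates the same grid with `itertools.product` to make the cost explicit: every combination times 3 folds, plus one final refit of the best estimator on the full training set (scikit-learn's default `refit=True`).

```python
from itertools import product

param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [10, 20, 30],
    'min_samples_split': [2, 5, 10],
}
cv = 3

# One dict per combination, in the order of the Cartesian product
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Each combination is fitted once per fold, then the best one is refit once.
n_fits = len(combinations) * cv + 1
print(len(combinations), n_fits)
```

Only the best combination is logged as params in the run above; the per-combination scores are available in the `grid_search_results.csv` artifact.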

## 10. Summary and MLflow UI Instructions

In [None]:
print("=" * 60)
print("MLFLOW SUMMARY")
print("=" * 60)

# Experiment statistics (refresh the run list so the grid search run is included)
runs = mlflow.search_runs(experiment_ids=[experiment_id])
total_runs = len(runs)
best_r2 = runs['metrics.test_r2'].max()
avg_r2 = runs['metrics.test_r2'].mean()

print(f"\nExperiment: {experiment_name}")
print(f"  Total runs: {total_runs}")
print(f"  Best R²: {best_r2:.4f}")
print(f"  Mean R²: {avg_r2:.4f}")
print(f"\nTracking URI: {mlflow.get_tracking_uri()}")

print("\n" + "=" * 60)
print("LAUNCHING THE MLFLOW UI")
print("=" * 60)
print("\nTo view all experiments in the MLflow UI:")
print("\n1. Open a terminal")
print("2. Run the command:")
print(f"   mlflow ui --backend-store-uri {mlflow_dir} --port 5000")
print("\n3. Open in a browser:")
print("   http://localhost:5000")
print("\n4. UI features:")
print("   - Visual comparison of runs")
print("   - Metric charts")
print("   - Artifact download")
print("   - Run filtering and search")
print("   - Parameter and metric visualization")
print("   - Model Registry (if a SQL backend is configured)")

print("\n" + "=" * 60)
print("GENERATED FILES")
print("=" * 60)
print(f"\nMLflow directory: {mlflow_dir}")
print("  - Contains all runs and artifacts")
print("  - Saved models for each run")
print("  - Plots and CSV files")

## 11. Conclusion

### Key Points About MLflow

1. **Tracking**:
   - Logging of params, metrics, and artifacts (plots, models, files)
   - Organization into experiments and runs
   - Tags for categorization

2. **Autologging**:
   - Automatic capture with scikit-learn, PyTorch (through PyTorch Lightning), and TensorFlow
   - Params, metrics, and models logged without extra code

3. **Model Comparison**:
   - Searching and sorting runs by metric
   - Side-by-side visualization in the UI
   - Identifying the best model

4. **Model Registry**:
   - Model versioning
   - Stages: None, Staging, Production, Archived (deprecated in recent MLflow releases in favor of aliases)
   - Full traceability

5. **Reproducibility**:
   - Loading models from runs
   - Model signature (input/output schema)
   - Saved Python environment

### Best Practices

- **Naming**: Use descriptive run names
- **Tags**: Tag runs (task, dataset, experiment_type)
- **Artifacts**: Save plots, confusion matrices, and feature importance
- **Backend**: Use SQLite/PostgreSQL in production (instead of the filesystem)
- **Model Registry**: Manage the model lifecycle (dev → staging → prod)
- **CI/CD**: Integrate MLflow into training pipelines

### Alternatives to MLflow

- **Weights & Biases (W&B)**: More modern interface, cloud-first
- **Neptune.ai**: Team collaboration, extensive integrations
- **TensorBoard**: Integrated with TensorFlow/PyTorch
- **Comet.ml**: Tracking + deployment + monitoring

### Resources

- Documentation: https://mlflow.org/docs/latest/index.html
- Tutorials: https://mlflow.org/docs/latest/tutorials-and-examples/index.html
- GitHub : https://github.com/mlflow/mlflow