# 🚀 FULL TRAINING - Google Colab with GPU

## Complete Setup for Training the Navigation Robot

This notebook runs a full training session with:
- ✅ **Free GPU** (T4/V100/A100)
- ✅ **Automatic saving** to Google Drive
- ✅ **2000 training episodes**
- ✅ **Curriculum learning** (4 progressive stages)
- ✅ **TensorBoard** in real time
- ✅ **Auto-resume** after a timeout
- ✅ **Visualizations** and exported results (PNG, CSV, JSON)
- ✅ **Evaluation** of the final model

---

### 📋 Instructions:
1. **Runtime** → **Change runtime type** → **GPU (T4)**
2. **Runtime** → **Run all**
3. Authorize access to Google Drive
4. Wait ~30-45 minutes
5. Retrieve the results from Drive!

---

**Date:** December 6, 2025  
**Version:** 2.0 - Production Ready

## 🔧 Step 1: Initial Setup and GPU Check

In [None]:
import os
import sys
from pathlib import Path

# System check
print("=" * 70)
print("🔍 SYSTEM CHECK")
print("=" * 70)

# Check whether we are running on Colab
try:
    import google.colab
    IN_COLAB = True
    print("✅ Environment: Google Colab")
except ImportError:
    IN_COLAB = False
    print("⚠️  Environment: local")

# Check GPU
import torch
if torch.cuda.is_available():
    print(f"✅ GPU detected: {torch.cuda.get_device_name(0)}")
    print(f"   GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    device = torch.device('cuda')
else:
    print("⚠️  GPU not available - using CPU")
    device = torch.device('cpu')

print(f"   Device: {device}")
print(f"   PyTorch Version: {torch.__version__}")
print("=" * 70)

## 📁 Step 2: Mounting Google Drive and Extracting the Project

In [None]:
if IN_COLAB:
    # Mount Google Drive
    from google.colab import drive
    drive.mount('/content/drive')

    # Define paths
    DRIVE_ROOT = Path('/content/drive/MyDrive/RL_Project')
    PROJECT_ZIP = DRIVE_ROOT / 'projet_RL.zip'
    WORK_DIR = Path('/content/projet_RL')
    RESULTS_DIR = DRIVE_ROOT / 'results'
    CHECKPOINTS_DIR = DRIVE_ROOT / 'checkpoints'

    print("\n✅ Google Drive mounted successfully!")
    print(f"   Drive Root: {DRIVE_ROOT}")

    # Create the output folders
    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    CHECKPOINTS_DIR.mkdir(parents=True, exist_ok=True)

    # Extract the project
    if PROJECT_ZIP.exists():
        print(f"\n📦 Extracting the project from {PROJECT_ZIP}...")
        import zipfile
        with zipfile.ZipFile(PROJECT_ZIP, 'r') as zip_ref:
            zip_ref.extractall(WORK_DIR)
        print(f"✅ Project extracted to: {WORK_DIR}")
    else:
        print(f"❌ ERROR: {PROJECT_ZIP} does not exist!")
        print(f"   Upload 'projet_RL.zip' to '{DRIVE_ROOT}'")
        # Raise instead of calling sys.exit(): in a notebook this stops the
        # run with an explicit error that names the missing file.
        raise FileNotFoundError(PROJECT_ZIP)

    # Switch to the project directory
    os.chdir(WORK_DIR)
    sys.path.insert(0, str(WORK_DIR))

else:
    # Local mode
    WORK_DIR = Path.cwd()
    RESULTS_DIR = WORK_DIR / 'results' / 'colab_training'
    CHECKPOINTS_DIR = WORK_DIR / 'checkpoints' / 'colab'
    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    CHECKPOINTS_DIR.mkdir(parents=True, exist_ok=True)

print(f"\n📂 Working directory: {WORK_DIR}")
print(f"📊 Results: {RESULTS_DIR}")
print(f"💾 Checkpoints: {CHECKPOINTS_DIR}")
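
The imports in Step 4 (`from src.environment...`) only resolve if `src/` sits directly under `WORK_DIR`. If the archive was created with a top-level folder, the extracted tree ends up one level deeper. The sketch below shows one possible way to locate the real project root after extraction. `find_project_root` and the `src` marker are hypothetical names used for illustration and are not part of the project.

```python
import tempfile
import zipfile
from pathlib import Path


def find_project_root(extract_dir, marker="src"):
    """Return the shallowest directory under extract_dir that contains `marker`.

    Hypothetical helper: `marker` is assumed to be the package folder
    that the notebook imports from.
    """
    extract_dir = Path(extract_dir)
    if (extract_dir / marker).is_dir():
        return extract_dir
    parents = [p.parent for p in extract_dir.rglob(marker) if p.is_dir()]
    if not parents:
        raise FileNotFoundError(f"no '{marker}' directory under {extract_dir}")
    return min(parents, key=lambda p: len(p.parts))


# Demo on a throwaway archive that wraps everything in a top-level folder.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("projet_RL/src/__init__.py", "")
    target = Path(tmp) / "extracted"
    with zipfile.ZipFile(archive, "r") as zf:
        zf.extractall(target)
    demo_root_name = find_project_root(target).name

print(demo_root_name)  # projet_RL
```

In the notebook, the returned path could replace `WORK_DIR` before the call to `os.chdir(WORK_DIR)`.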

## 📦 Step 3: Installing Dependencies

In [None]:
%%capture
# Silent dependency installation. %%capture hides all output from this cell,
# including the print() calls below; remove it to debug a failed install.

print("📦 Installing dependencies...")

# Note: torch was already imported in Step 1, so an upgraded version only
# takes effect after the runtime is restarted.
!pip install torch torchvision torchaudio --upgrade
!pip install gymnasium numpy matplotlib seaborn pandas
!pip install tensorboard optuna tqdm rich
!pip install pygame pillow opencv-python
!pip install plotly kaleido

# Install the project
!pip install -e .

print("✅ Installation complete!")
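
Because `%%capture` hides pip's output, a failed install would go unnoticed until a later import breaks. A minimal sketch of a follow-up check, using only the standard library, is shown below. The `missing_packages` helper and the `required` list are illustrative assumptions, not part of the project.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Illustrative subset of the packages installed above.
required = ["gymnasium", "tensorboard", "optuna", "pygame", "plotly"]
not_installed = missing_packages(required)
if not_installed:
    print("Missing packages:", ", ".join(not_installed))
else:
    print("All required packages are installed.")
```

Run this in its own cell (without `%%capture`) so the message is actually displayed.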

## 🎯 Step 4: Setting Up the Full Training Run

In [None]:
import sys
import time
import json
from pathlib import Path
from datetime import datetime

# PyTorch imports
import torch
import torch.nn as nn
import numpy as np

# Gymnasium imports
import gymnasium as gym

# Environment imports
from src.environment.navigation_env import NavigationEnv
from src.environment.obstacles import StaticObstacle, DynamicObstacle

# Agent imports
from src.agents.dqn_agent import DQNAgent

# Training imports
from src.training.curriculum_learning import CurriculumLearningSystem
from src.training.experiment_tracker import UnifiedTracker

# Utility imports
from src.utils.replay_buffer import ReplayBuffer

# Visualization imports
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import clear_output, display

# Matplotlib configuration for Colab
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("✅ All modules imported successfully!")
print(f"📍 Working directory: {Path.cwd()}")

## ⚙️ Step 5: Hyperparameters and Configuration

In [None]:
# ==================== TRAINING CONFIGURATION ====================

CONFIG = {
    # 🎮 Training parameters
    'num_episodes': 2000,              # Total number of episodes
    'max_steps_per_episode': 1000,     # Maximum steps per episode
    'target_update_frequency': 10,     # Target network update interval (in episodes)

    # 🧠 Network architecture
    'hidden_dims': [512, 512],         # Hidden layer sizes
    'learning_rate': 5e-4,             # Learning rate
    'gamma': 0.99,                     # Discount factor

    # 💾 Replay buffer
    'buffer_capacity': 100000,         # Buffer capacity (transitions)
    'batch_size': 128,                 # Batch size

    # 🎯 Epsilon (exploration)
    'epsilon_start': 1.0,              # Initial epsilon
    'epsilon_end': 0.01,               # Final epsilon (floor)
    'epsilon_decay': 0.995,            # Multiplicative decay, applied once per episode

    # 🗺️ Environment
    'env_width': 800,                  # Environment width
    'env_height': 600,                 # Environment height
    'num_obstacles': 7,                # Number of obstacles

    # 💾 Saving
    'checkpoint_interval': 100,        # Save a checkpoint every N episodes
    'progress_interval': 25,           # Print progress every N episodes

    # ⏱️ Colab timeout handling (12h max)
    'max_runtime_hours': 11.5,         # Leaves a 30 min margin before the timeout
    'auto_save_interval': 50,          # Auto-save every N episodes (not read by the loop in Step 7)
}

# Paths used for saving
if IN_COLAB:
    CHECKPOINT_PATH = Path(CHECKPOINTS_DIR) / "checkpoint_colab.pt"
    BEST_MODEL_PATH = Path(CHECKPOINTS_DIR) / "best_model_colab.pt"
    TENSORBOARD_DIR = Path(WORK_DIR) / "runs" / "colab_training"
else:
    CHECKPOINT_PATH = Path("checkpoints") / "checkpoint_local.pt"
    BEST_MODEL_PATH = Path("checkpoints") / "best_model_local.pt"
    TENSORBOARD_DIR = Path("runs") / "local_training"

# Create the folders if needed
CHECKPOINT_PATH.parent.mkdir(parents=True, exist_ok=True)
TENSORBOARD_DIR.mkdir(parents=True, exist_ok=True)

print("⚙️ Configuration:")
print(f"   📊 Episodes: {CONFIG['num_episodes']}")
print(f"   🧠 Architecture: {CONFIG['hidden_dims']}")
print(f"   📈 Learning rate: {CONFIG['learning_rate']}")
print(f"   💾 Buffer size: {CONFIG['buffer_capacity']}")
print(f"   🎯 Batch size: {CONFIG['batch_size']}")
print(f"   🗺️  Environment: {CONFIG['env_width']}x{CONFIG['env_height']}")
print(f"   🚧 Obstacles: {CONFIG['num_obstacles']}")
print(f"   💾 Checkpoint every {CONFIG['checkpoint_interval']} episodes")
print(f"   ⏱️  Max runtime: {CONFIG['max_runtime_hours']}h")
print(f"\n📁 Paths:")
print(f"   Checkpoint: {CHECKPOINT_PATH}")
print(f"   Best model: {BEST_MODEL_PATH}")
print(f"   TensorBoard: {TENSORBOARD_DIR}")
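
The epsilon settings above imply a concrete exploration schedule. As a rough worked example, the sketch below computes how many per-episode decays it takes for epsilon to fall from `epsilon_start` to `epsilon_end`. It covers only the raw multiplicative decay; the value actually passed to the agent goes through `curriculum.get_current_epsilon(...)`, which may adjust it. The function names are illustrative.

```python
import math


def epsilon_at(episode, start=1.0, end=0.01, decay=0.995):
    """Epsilon after `episode` multiplicative decays, clipped at `end`."""
    return max(end, start * decay ** episode)


def episodes_until_floor(start=1.0, end=0.01, decay=0.995):
    """Smallest number of decays after which epsilon reaches the floor."""
    return math.ceil(math.log(end / start) / math.log(decay))


print(episodes_until_floor())  # 919
for ep in (0, 100, 500, 1000, 2000):
    print(ep, round(epsilon_at(ep), 4))
```

With these values, exploration shrinks over roughly the first 919 of the 2000 episodes and stays at 0.01 afterwards.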

## 🏗️ Step 6: Initializing the Environment and the Agent

In [None]:
# ==================== CREATING THE OBSTACLES ====================
print("🚧 Creating obstacles...")

obstacles = []
np.random.seed(42)  # For reproducibility

for i in range(CONFIG['num_obstacles']):
    x = np.random.randint(100, CONFIG['env_width'] - 100)
    y = np.random.randint(100, CONFIG['env_height'] - 100)
    width = np.random.randint(40, 80)
    height = np.random.randint(40, 80)

    obstacle = StaticObstacle(
        x=x, y=y,
        width=width, height=height,
        color=(100, 100, 100)
    )
    obstacles.append(obstacle)

print(f"✅ {len(obstacles)} obstacles created")

# ==================== CREATING THE ENVIRONMENT ====================
print("\n🗺️  Creating the environment...")

env = NavigationEnv(
    width=CONFIG['env_width'],
    height=CONFIG['env_height'],
    obstacles=obstacles
)

state_dim = env.observation_space.shape[0]
action_dim = env.action_space.n

print(f"✅ Environment created")
print(f"   📊 State dimension: {state_dim}")
print(f"   🎮 Number of actions: {action_dim}")

# ==================== CREATING THE AGENT ====================
print("\n🤖 Creating the DQN agent...")

agent = DQNAgent(
    state_dim=state_dim,
    action_dim=action_dim,
    hidden_dims=CONFIG['hidden_dims'],
    learning_rate=CONFIG['learning_rate'],
    gamma=CONFIG['gamma']
)

# Move the agent's networks to the device (GPU if available)
agent.q_network.to(device)
agent.target_network.to(device)

print(f"✅ Agent created and moved to {device}")
print(f"   🧠 Architecture: {CONFIG['hidden_dims']}")
print(f"   📈 Learning rate: {CONFIG['learning_rate']}")

# ==================== CREATING THE REPLAY BUFFER ====================
print("\n💾 Creating the replay buffer...")

replay_buffer = ReplayBuffer(capacity=CONFIG['buffer_capacity'])

print(f"✅ Replay buffer created (capacity: {CONFIG['buffer_capacity']:,})")

# ==================== CURRICULUM LEARNING SYSTEM ====================
print("\n📚 Initializing curriculum learning...")

curriculum = CurriculumLearningSystem(
    num_episodes=CONFIG['num_episodes'],
    stages=None  # Use the default stages
)

print(f"✅ Curriculum learning initialized")
print(f"   📊 Number of stages: {len(curriculum.stages)}")

# ==================== EXPERIMENT TRACKER ====================
print("\n📊 Initializing the tracker...")

tracker = UnifiedTracker(
    project_name="robot-navigation-colab",
    experiment_name=f"training_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
    config=CONFIG,
    log_dir=str(TENSORBOARD_DIR),
    use_tensorboard=True,
    use_wandb=False,  # Disabled for Colab
    use_mlflow=False  # Disabled for Colab
)

print(f"✅ Tracker initialized (TensorBoard)")
print(f"   📁 Log dir: {TENSORBOARD_DIR}")

print("\n" + "="*60)
print("🚀 EVERYTHING IS READY FOR TRAINING!")
print("="*60)
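
The project's `ReplayBuffer` lives in `src/utils/replay_buffer.py` and is not shown in this notebook. To make the data flow concrete, here is a minimal sketch of a buffer exposing the same members the training loop relies on (`add`, `sample`, and a `buffer` attribute). `SketchReplayBuffer` is a hypothetical stand-in, and the real class may store or return data differently.

```python
import random
from collections import deque


class SketchReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return five columns: states, actions, rewards, next_states, dones."""
        batch = self._rng.sample(list(self.buffer), batch_size)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)


demo_buffer = SketchReplayBuffer(capacity=3, seed=0)
for step in range(5):
    demo_buffer.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

print(len(demo_buffer))          # 3
print(demo_buffer.buffer[0][0])  # 2
states, actions, rewards, next_states, dones = demo_buffer.sample(2)
```

The `deque(maxlen=...)` choice means that once the buffer is full, each new transition silently evicts the oldest one.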

## 🎯 Step 7: Main Training Loop (2000 Episodes)

In [None]:
# ==================== UTILITY FUNCTIONS ====================

def save_checkpoint(episode, agent, optimizer, best_reward, epsilon, curriculum, filename):
    """Save a full checkpoint (networks, optimizer, and training progress)."""
    checkpoint = {
        'episode': episode,
        'q_network_state_dict': agent.q_network.state_dict(),
        'target_network_state_dict': agent.target_network.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        # Cast to plain floats so the file holds no NumPy scalars, which
        # torch.load can reject when it runs with weights_only=True.
        'best_reward': float(best_reward),
        'epsilon': float(epsilon),
        'curriculum_stage': curriculum.current_stage_idx,
        'config': CONFIG
    }
    torch.save(checkpoint, filename)
    print(f"💾 Checkpoint saved: {filename}")

def load_checkpoint(filename, agent, optimizer):
    """Load a checkpoint; return (episode, best_reward, epsilon, curriculum_stage)."""
    if not Path(filename).exists():
        return 0, float('-inf'), CONFIG['epsilon_start'], 0

    checkpoint = torch.load(filename, map_location=device)
    agent.q_network.load_state_dict(checkpoint['q_network_state_dict'])
    agent.target_network.load_state_dict(checkpoint['target_network_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

    print(f"📂 Checkpoint loaded: {filename}")
    print(f"   Episode: {checkpoint['episode']}")
    print(f"   Best reward: {checkpoint['best_reward']:.2f}")
    
    return (checkpoint['episode'], 
            checkpoint['best_reward'], 
            checkpoint['epsilon'],
            checkpoint['curriculum_stage'])

# ==================== TRY TO LOAD AN EXISTING CHECKPOINT ====================
print("🔍 Looking for an existing checkpoint...")

start_episode = 0
best_reward = float('-inf')
epsilon = CONFIG['epsilon_start']
start_stage = 0

if CHECKPOINT_PATH.exists():
    try:
        start_episode, best_reward, epsilon, start_stage = load_checkpoint(
            CHECKPOINT_PATH, agent, agent.optimizer
        )
        curriculum.current_stage_idx = start_stage
        print(f"✅ Resuming from episode {start_episode}")
    except Exception as e:
        print(f"⚠️  Error while loading: {e}")
        print("🔄 Starting a new training run")
else:
    print("📝 No checkpoint found, starting a new training run")

# ==================== TRACKING VARIABLES ====================
# These lists start empty on every run, so after a resume the statistics
# below only cover the episodes executed in the current session.
episode_rewards = []
episode_lengths = []
episode_losses = []
episode_successes = []
training_start_time = time.time()

print("\n" + "="*60)
print(f"🚀 STARTING TRAINING - {CONFIG['num_episodes']} EPISODES")
print("="*60)

# ==================== TRAINING LOOP ====================
for episode in range(start_episode, CONFIG['num_episodes']):
    # Colab timeout check (measured from the start of this cell)
    elapsed_hours = (time.time() - training_start_time) / 3600
    if elapsed_hours >= CONFIG['max_runtime_hours']:
        print(f"\n⏱️  Approaching timeout ({elapsed_hours:.2f}h)")
        print("💾 Saving automatically before stopping...")
        save_checkpoint(episode, agent, agent.optimizer, best_reward, 
                       epsilon, curriculum, CHECKPOINT_PATH)
        print("✅ Save complete. You can re-run the notebook.")
        break
    
    # Get the current curriculum configuration
    curriculum.update_stage(episode)
    stage_config = curriculum.get_current_env_config()
    current_epsilon = curriculum.get_current_epsilon(epsilon)

    # Reset the environment
    state, _ = env.reset()
    episode_reward = 0
    episode_loss = 0
    num_updates = 0
    done = False
    steps = 0

    # Run one episode
    while not done and steps < CONFIG['max_steps_per_episode']:
        # Select an action
        action = agent.select_action(state, current_epsilon)

        # Execute the action
        next_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated

        # Store the transition in the replay buffer
        replay_buffer.add(state, action, reward, next_state, done)

        # Train once the buffer holds at least one full batch
        if len(replay_buffer.buffer) >= CONFIG['batch_size']:
            batch = replay_buffer.sample(CONFIG['batch_size'])
            loss = agent.train_step(batch)
            episode_loss += loss
            num_updates += 1

        state = next_state
        episode_reward += reward
        steps += 1

    # Update the target network
    if (episode + 1) % CONFIG['target_update_frequency'] == 0:
        agent.update_target_network()

    # Decay epsilon
    epsilon = max(CONFIG['epsilon_end'], epsilon * CONFIG['epsilon_decay'])
    
    # Record metrics
    episode_rewards.append(episode_reward)
    episode_lengths.append(steps)
    avg_loss = episode_loss / max(num_updates, 1)
    episode_losses.append(avg_loss)
    success = info.get('success', False)
    episode_successes.append(1 if success else 0)

    # Log to TensorBoard
    tracker.log_metrics({
        'reward': episode_reward,
        'episode_length': steps,
        'loss': avg_loss,
        'epsilon': current_epsilon,
        'success': 1 if success else 0,
        'curriculum_stage': curriculum.current_stage_idx,
        'buffer_size': len(replay_buffer.buffer)
    }, step=episode)

    # Save the best model (highest single-episode reward so far)
    if episode_reward > best_reward:
        best_reward = episode_reward
        torch.save(agent.q_network.state_dict(), BEST_MODEL_PATH)

    # Periodic checkpoint
    if (episode + 1) % CONFIG['checkpoint_interval'] == 0:
        save_checkpoint(episode + 1, agent, agent.optimizer, best_reward,
                       epsilon, curriculum, CHECKPOINT_PATH)

    # Progress display
    if (episode + 1) % CONFIG['progress_interval'] == 0:
        clear_output(wait=True)

        avg_reward_100 = np.mean(episode_rewards[-100:])
        avg_length_100 = np.mean(episode_lengths[-100:])
        success_rate_100 = np.mean(episode_successes[-100:]) * 100

        print(f"{'='*60}")
        print(f"📊 Episode {episode + 1}/{CONFIG['num_episodes']}")
        print(f"{'='*60}")
        print(f"   💰 Reward: {episode_reward:.2f}")
        print(f"   📈 Average (last 100): {avg_reward_100:.2f}")
        print(f"   🎯 Best reward: {best_reward:.2f}")
        print(f"   📏 Length: {steps} steps")
        print(f"   ✅ Success: {'Yes' if success else 'No'}")
        print(f"   📊 Success rate (last 100): {success_rate_100:.1f}%")
        print(f"   📉 Average loss: {avg_loss:.4f}")
        print(f"   🎲 Epsilon: {current_epsilon:.4f}")
        print(f"   📚 Stage: {curriculum.get_current_stage_name()}")
        print(f"   💾 Buffer: {len(replay_buffer.buffer):,}/{CONFIG['buffer_capacity']:,}")
        print(f"   ⏱️  Time: {elapsed_hours:.2f}h / {CONFIG['max_runtime_hours']}h")
        print(f"{'='*60}")
        
        # Small progress plot
        if len(episode_rewards) >= 100:
            plt.figure(figsize=(12, 4))

            plt.subplot(1, 3, 1)
            plt.plot(episode_rewards[-100:])
            plt.title('Rewards (last 100)')
            plt.xlabel('Episode')
            plt.ylabel('Reward')
            plt.grid(True)

            plt.subplot(1, 3, 2)
            plt.plot(episode_losses[-100:])
            plt.title('Loss (last 100)')
            plt.xlabel('Episode')
            plt.ylabel('Loss')
            plt.grid(True)

            plt.subplot(1, 3, 3)
            success_window = np.convolve(episode_successes, np.ones(10)/10, mode='valid')
            plt.plot(success_window[-90:] * 100)
            plt.title('Success rate (moving window of 10)')
            plt.xlabel('Episode')
            plt.ylabel('Success (%)')
            plt.grid(True)

            plt.tight_layout()
            plt.show()

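# Final save
print(f"\n{'='*60}")
print("🏁 TRAINING LOOP ENDED!")
print(f"{'='*60}")
# Record the number of episodes actually completed. After an early stop on
# the timeout check this keeps the resume point, instead of overwriting the
# checkpoint with episode = CONFIG['num_episodes'].
completed_episodes = start_episode + len(episode_rewards)
save_checkpoint(completed_episodes, agent, agent.optimizer, best_reward,
               epsilon, curriculum, CHECKPOINT_PATH)

tracker.finish()

print(f"\n✅ Episodes completed: {completed_episodes}/{CONFIG['num_episodes']}")
print(f"✅ Best reward: {best_reward:.2f}")
print(f"✅ Final success rate: {np.mean(episode_successes[-100:]) * 100:.1f}%")
print(f"✅ Total duration: {(time.time() - training_start_time) / 3600:.2f}h")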
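
The loop above delegates the actual learning update to `agent.train_step(batch)`, which is defined in the project and not shown here. For orientation, the sketch below computes the standard one-step DQN target for a single transition in plain Python. It is a generic illustration under that assumption, not the project's implementation, and the function names are hypothetical.

```python
def td_target(reward, next_q_values, terminated, gamma=0.99):
    """One-step DQN target: r + gamma * max_a' Q_target(s', a'), or r if terminal."""
    if terminated:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_value, target):
    """Squared difference between the current estimate and the target."""
    return (target - q_value) ** 2


# Non-terminal transition: bootstrap from the best next-state value.
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], terminated=False)
# Terminal transition: no bootstrapping.
terminal_target = td_target(reward=1.0, next_q_values=[0.5, 2.0], terminated=True)
print(target, terminal_target, squared_td_error(2.0, target))
```

One detail worth checking in the real `train_step`: the loop stores `done = terminated or truncated` in the buffer, so an episode cut off by a time limit is treated like a true terminal state and gets no bootstrapped value. Some implementations store only `terminated` for this reason.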

## 📊 Step 8: Visualizing the Training Results

In [None]:
# ==================== FULL VISUALIZATION OF THE RESULTS ====================

print("📊 Generating visualizations...")

fig = plt.figure(figsize=(16, 12))

# 1. Reward curve
ax1 = plt.subplot(3, 2, 1)
plt.plot(episode_rewards, alpha=0.6, label='Reward per episode')
# 50-episode moving average
if len(episode_rewards) >= 50:
    ma_50 = np.convolve(episode_rewards, np.ones(50)/50, mode='valid')
    plt.plot(range(49, len(episode_rewards)), ma_50, 'r-', linewidth=2, label='Moving average (50)')
plt.xlabel('Episode')
plt.ylabel('Reward')
plt.title('Reward Over Time')
plt.legend()
plt.grid(True, alpha=0.3)

# 2. Loss curve
ax2 = plt.subplot(3, 2, 2)
plt.plot(episode_losses, alpha=0.6, label='Loss per episode')
if len(episode_losses) >= 50:
    ma_loss_50 = np.convolve(episode_losses, np.ones(50)/50, mode='valid')
    plt.plot(range(49, len(episode_losses)), ma_loss_50, 'r-', linewidth=2, label='Moving average (50)')
plt.xlabel('Episode')
plt.ylabel('Loss')
plt.title('Loss Over Time')
plt.legend()
plt.grid(True, alpha=0.3)

# 3. Success rate
ax3 = plt.subplot(3, 2, 3)
if len(episode_successes) >= 50:
    success_ma = np.convolve(episode_successes, np.ones(50)/50, mode='valid') * 100
    plt.plot(range(49, len(episode_successes)), success_ma, 'g-', linewidth=2)
plt.xlabel('Episode')
plt.ylabel('Success rate (%)')
plt.title('Success Rate (Moving Average 50)')
plt.grid(True, alpha=0.3)

# 4. Episode length
ax4 = plt.subplot(3, 2, 4)
plt.plot(episode_lengths, alpha=0.6)
if len(episode_lengths) >= 50:
    ma_len_50 = np.convolve(episode_lengths, np.ones(50)/50, mode='valid')
    plt.plot(range(49, len(episode_lengths)), ma_len_50, 'r-', linewidth=2)
plt.xlabel('Episode')
plt.ylabel('Number of steps')
plt.title('Episode Length')
plt.grid(True, alpha=0.3)

# 5. Reward distribution (last 100)
ax5 = plt.subplot(3, 2, 5)
plt.hist(episode_rewards[-100:], bins=30, alpha=0.7, color='skyblue', edgecolor='black')
plt.xlabel('Reward')
plt.ylabel('Frequency')
plt.title('Reward Distribution (Last 100 Episodes)')
plt.grid(True, alpha=0.3)

# 6. Final statistics
ax6 = plt.subplot(3, 2, 6)
ax6.axis('off')
stats_text = f"""
FINAL STATISTICS

Number of episodes: {len(episode_rewards)}
Best reward: {best_reward:.2f}

Mean reward (all): {np.mean(episode_rewards):.2f}
Mean reward (last 100): {np.mean(episode_rewards[-100:]):.2f}

Success rate (all): {np.mean(episode_successes)*100:.1f}%
Success rate (last 100): {np.mean(episode_successes[-100:])*100:.1f}%

Final loss: {episode_losses[-1]:.4f}
Mean length: {np.mean(episode_lengths):.0f} steps

Final buffer size: {len(replay_buffer.buffer):,}
Final epsilon: {epsilon:.4f}
Final stage: {curriculum.get_current_stage_name()}
"""
ax6.text(0.1, 0.5, stats_text, fontsize=12, family='monospace',
         verticalalignment='center')

plt.tight_layout()
plt.savefig(Path(RESULTS_DIR) / 'training_results.png', dpi=150, bbox_inches='tight')
print(f"✅ Figure saved: {Path(RESULTS_DIR) / 'training_results.png'}")
plt.show()

# ==================== SAVING THE DATA ====================
print("\n💾 Saving data...")

# CSV export
import pandas as pd

df_results = pd.DataFrame({
    'episode': range(len(episode_rewards)),
    'reward': episode_rewards,
    'length': episode_lengths,
    'loss': episode_losses,
    'success': episode_successes
})

csv_path = Path(RESULTS_DIR) / 'training_history.csv'
df_results.to_csv(csv_path, index=False)
print(f"✅ History saved: {csv_path}")

# JSON export with summary statistics
results_summary = {
    'config': CONFIG,
    'training_info': {
        'num_episodes': len(episode_rewards),
        'best_reward': float(best_reward),
        'final_epsilon': float(epsilon),
        'final_stage': curriculum.get_current_stage_name(),
        'training_duration_hours': (time.time() - training_start_time) / 3600
    },
    'statistics': {
        'mean_reward_all': float(np.mean(episode_rewards)),
        'mean_reward_last_100': float(np.mean(episode_rewards[-100:])),
        'std_reward_last_100': float(np.std(episode_rewards[-100:])),
        'success_rate_all': float(np.mean(episode_successes)),
        'success_rate_last_100': float(np.mean(episode_successes[-100:])),
        'mean_episode_length': float(np.mean(episode_lengths)),
        'final_buffer_size': len(replay_buffer.buffer)
    }
}

json_path = Path(RESULTS_DIR) / 'training_summary.json'
with open(json_path, 'w') as f:
    json.dump(results_summary, f, indent=2)
print(f"✅ Summary saved: {json_path}")

print("\n" + "="*60)
print("✅ ALL VISUALIZATIONS AND DATA SAVED!")
print("="*60)
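
The smoothed curves in this step use `np.convolve(..., mode='valid')`, which returns one value per full window: N - w + 1 values for N episodes and window w. This is why the plots start the smoothed line at x = 49. The pure-Python sketch below shows the same calculation on a tiny input; for equal weights it should match `np.convolve(x, np.ones(w)/w, mode='valid')`. The `moving_average` name is illustrative.

```python
def moving_average(values, window):
    """Equal-weight moving average over full windows only ('valid' mode)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one
        averages.append(running / window)
    return averages


rewards = [1, 2, 3, 4, 5, 6]
smoothed = moving_average(rewards, 3)
# Align each average with the last episode of its window.
x_positions = list(range(3 - 1, len(rewards)))
print(smoothed)     # [2.0, 3.0, 4.0, 5.0]
print(x_positions)  # [2, 3, 4, 5]
```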

## 🧪 Step 9: Evaluating the Trained Model

In [None]:
# ==================== EVALUATION ON 50 TEST EPISODES ====================

print("🧪 Evaluating the best model...")
print("="*60)

# Load the best model
if BEST_MODEL_PATH.exists():
    agent.q_network.load_state_dict(torch.load(BEST_MODEL_PATH, map_location=device))
    print(f"✅ Best model loaded: {BEST_MODEL_PATH}")
else:
    print("⚠️  Using the current model (no best model was saved)")

agent.q_network.eval()  # Evaluation mode

num_test_episodes = 50
test_rewards = []
test_lengths = []
test_successes = []

print(f"\n🎮 Running {num_test_episodes} test episodes...")
print("(No exploration - epsilon=0)\n")

for test_ep in range(num_test_episodes):
    state, _ = env.reset()
    episode_reward = 0
    steps = 0
    done = False

    while not done and steps < CONFIG['max_steps_per_episode']:
        # Greedy action, no exploration (epsilon=0)
        with torch.no_grad():
            state_tensor = torch.FloatTensor(state).unsqueeze(0).to(device)
            q_values = agent.q_network(state_tensor)
            action = q_values.argmax(dim=1).item()

        next_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated

        state = next_state
        episode_reward += reward
        steps += 1

    test_rewards.append(episode_reward)
    test_lengths.append(steps)
    success = info.get('success', False)
    test_successes.append(1 if success else 0)

    if (test_ep + 1) % 10 == 0:
        print(f"Episode {test_ep + 1}/{num_test_episodes}: "
              f"Reward={episode_reward:.2f}, "
              f"Length={steps}, "
              f"Success={'✅' if success else '❌'}")

# ==================== EVALUATION STATISTICS ====================
print("\n" + "="*60)
print("📊 EVALUATION RESULTS")
print("="*60)
print(f"Mean reward: {np.mean(test_rewards):.2f} ± {np.std(test_rewards):.2f}")
print(f"Min reward: {np.min(test_rewards):.2f}")
print(f"Max reward: {np.max(test_rewards):.2f}")
print(f"Mean length: {np.mean(test_lengths):.1f} steps")
print(f"Success rate: {np.mean(test_successes)*100:.1f}% ({sum(test_successes)}/{num_test_episodes})")
print("="*60)

# ==================== VISUALIZING THE TEST RESULTS ====================
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Test rewards
axes[0].bar(range(num_test_episodes), test_rewards, alpha=0.7, color='skyblue')
axes[0].axhline(y=np.mean(test_rewards), color='r', linestyle='--', label='Mean')
axes[0].set_xlabel('Test episode')
axes[0].set_ylabel('Reward')
axes[0].set_title('Test Rewards')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Reward distribution
axes[1].hist(test_rewards, bins=20, alpha=0.7, color='lightgreen', edgecolor='black')
axes[1].axvline(x=np.mean(test_rewards), color='r', linestyle='--', linewidth=2, label='Mean')
axes[1].set_xlabel('Reward')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Test Reward Distribution')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Successes/failures
success_count = sum(test_successes)
fail_count = num_test_episodes - success_count
axes[2].pie([success_count, fail_count], 
           labels=['Successes', 'Failures'],
           autopct='%1.1f%%',
           colors=['lightgreen', 'lightcoral'],
           startangle=90)
axes[2].set_title('Success Rate')

plt.tight_layout()
plt.savefig(Path(RESULTS_DIR) / 'evaluation_results.png', dpi=150, bbox_inches='tight')
print(f"\n✅ Figure saved: {Path(RESULTS_DIR) / 'evaluation_results.png'}")
plt.show()

# Save the test results
test_results = {
    'num_test_episodes': num_test_episodes,
    'mean_reward': float(np.mean(test_rewards)),
    'std_reward': float(np.std(test_rewards)),
    'min_reward': float(np.min(test_rewards)),
    'max_reward': float(np.max(test_rewards)),
    'mean_length': float(np.mean(test_lengths)),
    'success_rate': float(np.mean(test_successes)),
    'num_successes': int(sum(test_successes))
}

test_results_path = Path(RESULTS_DIR) / 'evaluation_results.json'
with open(test_results_path, 'w') as f:
    json.dump(test_results, f, indent=2)
print(f"✅ Test results saved: {test_results_path}")

agent.q_network.train()  # Back to training mode

## 📈 Step 10: Launching TensorBoard (Optional)

In [None]:
# ==================== LAUNCHING TENSORBOARD ====================

if IN_COLAB:
    print("📈 Launching TensorBoard in Colab...")

    # Load the TensorBoard extension
    %load_ext tensorboard

    # Launch TensorBoard
    %tensorboard --logdir {TENSORBOARD_DIR}

    print(f"✅ TensorBoard launched!")
    print(f"📁 Log directory: {TENSORBOARD_DIR}")
    print("\nℹ️  TensorBoard is displayed in this cell's output.")
    print("    You can explore the logged metrics and scalar plots.")
else:
    print("💻 Local mode detected")
    print(f"To launch TensorBoard, run in a terminal:")
    print(f"   tensorboard --logdir={TENSORBOARD_DIR}")
    print(f"Then open: http://localhost:6006")

## 📋 Final Summary and Next Steps

### ✅ What was accomplished:
1. ✅ GPU setup and Google Drive mount
2. ✅ Dependency installation
3. ✅ Full 2000-episode training run with curriculum learning
4. ✅ Automatic checkpoint saving every 100 episodes
5. ✅ Visualizations generated (curves, statistics)
6. ✅ Model evaluated on 50 test episodes
7. ✅ Results exported (CSV, JSON, figures)

### 📁 Generated files:
- **Checkpoints:**
  - `checkpoint_colab.pt` - Latest checkpoint (can be used to resume training)
  - `best_model_colab.pt` - Best model, selected by single-episode reward
  
- **Results:**
  - `training_history.csv` - Full training history
  - `training_summary.json` - Summary statistics
  - `evaluation_results.json` - Test results
  - `training_results.png` - Training plots
  - `evaluation_results.png` - Evaluation plots

### 🚀 Possible next steps:

#### 1. **Improve performance:**
   - Increase the number of episodes (3000-5000)
   - Tune the hyperparameters (learning rate, architecture)
   - Modify the reward function
   - Add more obstacles or change their layout

#### 2. **Experiment with other algorithms:**
   - Try PPO, A3C, or SAC (SAC would need a discrete-action variant, since this environment has a discrete action space)
   - Implement Double DQN or Dueling DQN
   - Test Rainbow DQN
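
As a starting point for the Double DQN idea, the sketch below contrasts the two target computations for a single transition. Standard DQN takes the maximum of the target network's values, while Double DQN picks the action with the online network and evaluates it with the target network. This is a generic plain-Python illustration with hypothetical function names, not code from the project.

```python
def dqn_target(reward, target_q_next, terminated, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates the action."""
    if terminated:
        return reward
    return reward + gamma * max(target_q_next)


def double_dqn_target(reward, online_q_next, target_q_next, terminated, gamma=0.99):
    """Double DQN: online network selects the action, target network evaluates it."""
    if terminated:
        return reward
    best_action = max(range(len(online_q_next)), key=online_q_next.__getitem__)
    return reward + gamma * target_q_next[best_action]


online_q_next = [1.0, 3.0]   # online network prefers action 1
target_q_next = [5.0, 2.0]   # target network overvalues action 0

vanilla = dqn_target(0.0, target_q_next, terminated=False, gamma=0.9)
double = double_dqn_target(0.0, online_q_next, target_q_next, terminated=False, gamma=0.9)
print(vanilla, double)
```

In this toy case the Double DQN target is lower because it ignores the target network's inflated estimate for action 0, which is the overestimation effect the method is designed to reduce.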

#### 3. **Analyze in detail:**
   - Visualize the agent's trajectories
   - Analyze the Q-values
   - Create heatmaps of the learned policy
   - Generate videos of the agent in action

#### 4. **Deploy the model:**
   - Build an interactive interface
   - Deploy to a web server
   - Optimize for inference (quantization, pruning)

### 💡 To restart training:
If training is interrupted (Colab timeout), **simply re-run all cells**. The notebook detects the checkpoint automatically and resumes from the last saved episode.
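
The resume behavior described here follows a simple pattern: load the saved episode counter if a checkpoint exists, continue the loop from there, and save again as you go. The standard-library sketch below illustrates that pattern with a JSON file standing in for the PyTorch checkpoint. All names are hypothetical, and the write-then-rename step is an extra precaution assumed for this sketch (the notebook itself writes the checkpoint directly).

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path, state):
    """Write to a temporary file first, then rename it over the target, so a
    crash mid-write does not leave a truncated checkpoint behind."""
    path = Path(path)
    tmp_path = path.with_name(path.name + ".tmp")
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    path = Path(path)
    if not path.exists():
        return {"episode": 0}
    with open(path) as f:
        return json.load(f)


def run(path, total_episodes, stop_after=None):
    """Run episodes from the saved position; return the episodes executed."""
    state = load_state(path)
    executed = []
    for episode in range(state["episode"], total_episodes):
        if stop_after is not None and len(executed) >= stop_after:
            break  # simulated timeout
        executed.append(episode)  # the real loop would train here
        save_state(path, {"episode": episode + 1})
    return executed


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "state.json"
    first_run = run(ckpt, total_episodes=5, stop_after=3)
    second_run = run(ckpt, total_episodes=5)

print(first_run)   # [0, 1, 2]
print(second_run)  # [3, 4]
```

Whether the rename is atomic on a mounted Drive folder is not something this sketch verifies.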

### 📊 To access the results:
The results are saved automatically to your Google Drive:
- `/content/drive/MyDrive/RL_Project/results/`
- `/content/drive/MyDrive/RL_Project/checkpoints/`

---

**🎉 CONGRATULATIONS! You have successfully trained a DQN agent on Google Colab! 🎉**