# Algorithm Comparison for Next-Action Prediction (with Hyperparameter Tuning)

This notebook compares several algorithms for next-action prediction in the risky navigation environment. Each algorithm's hyperparameters are tuned with cross-validation before the final comparison:

**Algorithms Evaluated:**
- **A2C (Advantage Actor-Critic)**: Reinforcement learning approach
- **AutoEncoder**: Neural network encoder-decoder architecture  
- **Bayesian**: Bayesian neural network with uncertainty quantification
- **Transformer**: Self-attention based model
- **Linear**: Simple linear regression baseline
- **VAE**: Variational AutoEncoder with probabilistic latent representations

**Best Practices Workflow:**
1. **Data Collection & Preparation**: Collect training data and split into train/validation sets
2. **Hyperparameter Tuning**: For each algorithm, perform cross-validation hyperparameter search
3. **Model Training**: Train each algorithm with optimal hyperparameters found
4. **Evaluation**: Test final models on unseen data and compare performance
5. **Analysis**: Comprehensive results analysis with visualizations

This approach ensures fair comparison by optimizing each algorithm's hyperparameters before evaluation.
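
The workflow above can be sketched end to end on toy data. This is a minimal illustration, not the notebook's pipeline: it assumes a closed-form ridge regression in place of the agents, and `fit_ridge`, `cv_score`, and the synthetic arrays are hypothetical names used only here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for (state + goal) -> action pairs.
X = rng.normal(size=(300, 4))
true_w = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 1.5], [-1.0, 0.3]])
Y = X @ true_w + 0.1 * rng.normal(size=(300, 2))

# 1. Data preparation: hold out test data that the search never sees.
X_dev, Y_dev = X[:240], Y[:240]
X_test, Y_test = X[240:], Y[240:]

def fit_ridge(X, Y, weight_decay):
    """Closed-form ridge regression: solve (X^T X + lambda I) W = X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + weight_decay * np.eye(d), X.T @ Y)

def mse(X, Y, W):
    return float(np.mean((X @ W - Y) ** 2))

def cv_score(X, Y, weight_decay, folds):
    """Mean validation MSE over the given folds for one hyperparameter value."""
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        W = fit_ridge(X[train_idx], Y[train_idx], weight_decay)
        scores.append(mse(X[val_idx], Y[val_idx], W))
    return float(np.mean(scores))

# 2. Hyperparameter tuning: the same 3 folds are reused for every candidate.
folds = np.array_split(rng.permutation(len(X_dev)), 3)
candidates = [0.0, 1e-2, 1.0, 100.0]
cv_scores = {wd: cv_score(X_dev, Y_dev, wd, folds) for wd in candidates}
best_wd = min(cv_scores, key=cv_scores.get)

# 3. Model training: refit on all development data with the best setting.
W_final = fit_ridge(X_dev, Y_dev, best_wd)

# 4. Evaluation: score once on the held-out test data.
test_mse = mse(X_test, Y_test, W_final)
print(f"best weight_decay={best_wd}, test MSE={test_mse:.4f}")
```

Reusing the same folds for every candidate keeps the cross-validation scores comparable across hyperparameter values.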

## Import Required Libraries

In [None]:
!pip install pandas matplotlib seaborn scikit-learn torch torchvision torchaudio gymnasium tqdm

In [None]:
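# If running on RunPod: fetch a fresh copy of the repository
!rm -rf risky_navigation
!git clone https://github.com/mosmith3asu/risky_navigation.git
# Use the %cd magic: `!cd` runs in a subshell and does not change the notebook's working directory
%cd risky_navigation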

In [None]:
# Reload agent modules that are already imported so that source edits
# take effect without restarting the kernel
import importlib
import sys

# Agent modules to reload if they are present in the module cache
modules_to_reload = [
    'src.algorithms.AutoEncoder.agent',
    'src.algorithms.Bayesian.agent', 
    'src.algorithms.Transformer.agent',
    'src.algorithms.Linear.agent',
    'src.algorithms.VAE.agent'
]

for module_name in modules_to_reload:
    if module_name in sys.modules:
        importlib.reload(sys.modules[module_name])

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch

import time
import os
import warnings
from tqdm import tqdm
from datetime import datetime
from sklearn.model_selection import ParameterGrid
import random
from itertools import product

# print current path
print(os.path.abspath('.'))

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
warnings.filterwarnings('ignore')



In [None]:
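# Import environment and algorithms
import sys
# This absolute path assumes the repository was cloned to /risky_navigation;
# adjust it if the clone lives elsewhere
sys.path.append('/risky_navigation')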

from src.env.continuous_nav_env import ContinuousNavigationEnv
from src.algorithms.AutoEncoder.agent import AutoEncoderAgent
from src.algorithms.Bayesian.agent import BayesianAgent
from src.algorithms.Transformer.agent import TransformerAgent
from src.algorithms.Linear.agent import LinearAgent
from src.algorithms.VAE.agent import VAEAgent
from src.utils.file_management import save_pickle, load_pickle
from src.utils.logger import Logger

print("All libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"Device available: {'CUDA' if torch.cuda.is_available() else 'CPU'}")

# Smoke test: construct an AutoEncoderAgent to confirm that the class imports and builds
print("\nTesting AutoEncoder architecture:")
test_model = AutoEncoderAgent(state_dim=4, action_dim=2, goal_dim=2, 
                             latent_dim=32, hidden_dims=[128, 64])
print("AutoEncoder architecture verification successful!")

## Config
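
The configuration in this section selects random search with 25 trials and mentions 972 grid combinations. The sketch below shows where that number comes from for the AutoEncoder search space and how random trials are drawn from the same space. `grid_size` and `sample_configs` are hypothetical helpers written for this illustration; the space is copied from the notebook's `HYPERPARAMETER_SPACES['AutoEncoder']`.

```python
import math
import random

# Copy of the AutoEncoder search space defined in this notebook.
autoencoder_space = {
    'latent_dim': [32, 64, 128],
    'hidden_dims': [[], [64], [128], [64, 32], [128, 64], [256, 128]],
    'lr': [1e-4, 1e-3, 1e-2],
    'dropout': [0.0, 0.1, 0.2],
    'activation': ['ReLU', 'ELU', 'GELU'],
    'batch_norm': [True, False],
}

def grid_size(space):
    """Number of configurations a full grid search would evaluate."""
    return math.prod(len(values) for values in space.values())

def sample_configs(space, n_trials, seed=0):
    """Draw n_trials configurations, choosing each value independently."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in space.items()}
            for _ in range(n_trials)]

n_grid = grid_size(autoencoder_space)   # 3 * 6 * 3 * 3 * 3 * 2
trials = sample_configs(autoencoder_space, n_trials=25)
print(f"grid: {n_grid} configurations, random search: {len(trials)} trials")
```

Because each value is sampled independently, the same configuration can be drawn more than once; with 25 trials out of 972 combinations this is unlikely but possible.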

In [None]:
# Configuration for experiments - Optimized for RTX 4090 (24GB VRAM)
CONFIG = {
    'num_episodes': 1000,          # Episodes for data collection
    'max_steps': 200,             # Max steps per episode
    'batch_size': 256,            # Increased batch size for RTX 4090 (was 128)
    'num_epochs': 300,            # Training epochs (reduced from 700 for efficiency)
    'val_ratio': 0.2,             # Validation set ratio
    'num_test_episodes': 50,      # Episodes for testing
    'lr': 1e-3,                   # Learning rate
    'device': torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
    'search_method': 'random',    # 'grid' or 'random' - RANDOM is much faster!
    'n_trials': 25,               # For random search (much faster than 972 grid combinations)
    'cv_folds': 3,                # Cross-validation folds
    'patience': 5,                # Early stopping patience
    'hp_epochs': 30,              # Epochs for hyperparameter search (reduced from 50)
}

# PyTorch optimizations for RTX 4090
if torch.cuda.is_available():
    torch.backends.cudnn.benchmark = True  # Auto-tune convolution algorithms
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    torch.set_float32_matmul_precision('high')  # Use TF32 for faster matmul
    print("✓ GPU optimizations enabled for RTX 4090")
    print(f"  - CUDA Version: {torch.version.cuda}")
    print(f"  - cuDNN Benchmark: Enabled")
    print(f"  - TF32 Acceleration: Enabled")

# Algorithms to compare
ALGORITHMS = ['AutoEncoder', 'Bayesian', 'Transformer', 'Linear', 'VAE']

# Initialize environment
env = ContinuousNavigationEnv()
dummy_state = env.reset()

# Get dimensions
STATE_DIM = dummy_state.shape[0]
ACTION_DIM = env.action_space.shape[0]
GOAL_DIM = env.goal.shape[0] if hasattr(env, 'goal') else 2

print(f"\nEnvironment dimensions:")
print(f"  State dimension: {STATE_DIM}")
print(f"  Action dimension: {ACTION_DIM}")
print(f"  Goal dimension: {GOAL_DIM}")
print(f"  Device: {CONFIG['device']}")
print(f"  Batch size: {CONFIG['batch_size']}")

# Results storage, including hyperparameter search information
results = {
    'algorithm': [],
    'train_time': [],
    'test_time': [],
    'final_train_loss': [],
    'final_val_loss': [],
    'avg_mse': [],
    'avg_reward': [],
    'success_rate': [],
    'model_params': [],
    'best_hyperparams': [],        # Best hyperparameters found by the search
    'hp_search_cv_score': []       # CV score of the best hyperparameters
}



# Define hyperparameter search spaces for each algorithm
HYPERPARAMETER_SPACES = {
    'AutoEncoder': {
        'latent_dim': [32, 64, 128],
        'hidden_dims': [[], [64], [128], [64, 32], [128, 64], [256, 128]],  # Intermediate layers only
        'lr': [1e-4, 1e-3, 1e-2],
        'dropout': [0.0, 0.1, 0.2],
        'activation': ['ReLU', 'ELU', 'GELU'],
        'batch_norm': [True, False]
    },
    'Bayesian': {
        'latent_dim': [32, 64, 128],  # BayesianAgent takes latent_dim, not hidden_dim
        'lr': [1e-4, 1e-3, 1e-2],
        'kl_weight': [0.001, 0.01, 0.1],  # Bayesian-specific parameter
        'prior_std': [0.5, 1.0, 2.0]  # Named prior_std (not prior_variance)
    },
    'Transformer': {
        'd_model': [32, 64, 128],
        'nhead': [2, 4, 8],
        'num_layers': [1, 2, 3],
        'dropout': [0.0, 0.1, 0.2],
        'lr': [1e-4, 1e-3, 1e-2]
    },
    'Linear': {
        'lr': [1e-4, 1e-3, 1e-2],
        'weight_decay': [0.0, 1e-5, 1e-4]
    },
    'VAE': {
        'latent_dim': [16, 32, 64],
        'hidden_dim': [64, 128, 256],
        'lr': [1e-4, 1e-3, 1e-2],
        'beta': [0.5, 1.0, 2.0]
    }
}

print("\nHyperparameter tuning configuration:")
print(f"  Search method: {CONFIG['search_method']}")
if CONFIG['search_method'] == 'random':
    print(f"  Random trials per algorithm: {CONFIG['n_trials']}")
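    ae_grid_size = len(ParameterGrid(HYPERPARAMETER_SPACES['AutoEncoder']))
    print(f"  ✓ ~{ae_grid_size // CONFIG['n_trials']}x fewer configurations than the full AutoEncoder grid ({ae_grid_size} combinations)")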
else:
    total_combos = len(list(ParameterGrid(HYPERPARAMETER_SPACES['AutoEncoder'])))
    print(f"  WARNING: Grid search will test {total_combos} combinations!")
print(f"  CV folds: {CONFIG['cv_folds']}")
print(f"  HP search epochs: {CONFIG['hp_epochs']} (with early stopping)")
print(f"  Early stopping patience: {CONFIG['patience']}")
print(f"  Algorithms configured: {list(HYPERPARAMETER_SPACES.keys())}")


In [None]:
# GPU optimization settings (repeats and extends the settings applied in the Config cell above)

# ============================================================
# GPU OPTIMIZATION SETTINGS FOR RTX 4090
# ============================================================

if torch.cuda.is_available():
    print("\n" + "="*60)
    print("CONFIGURING GPU OPTIMIZATIONS FOR RTX 4090")
    print("="*60)
    
    # Enable cuDNN auto-tuner for optimal convolution algorithms
    torch.backends.cudnn.benchmark = True
    print("✓ cuDNN benchmark mode enabled")
    
    # Use TensorFloat32 (TF32) for faster matrix multiplication (supported on Ampere and newer GPUs, including the RTX 4090)
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    print("✓ TF32 enabled for matrix operations")
    
    # Set matmul precision for better performance
    torch.set_float32_matmul_precision('high')
    print("✓ Float32 matmul precision set to 'high'")
    
    # Enable memory-efficient scaled dot-product attention if this PyTorch build exposes it
    try:
        torch.backends.cuda.enable_mem_efficient_sdp(True)
        print("✓ Memory efficient scaled dot product enabled")
    except AttributeError:
        print("⚠ Memory efficient SDP not available (PyTorch < 2.0)")
    
    # Release cached, unused GPU memory held by PyTorch's caching allocator
    torch.cuda.empty_cache()
    print("✓ GPU cache cleared")
    
    print("="*60 + "\n")
    
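    # Display current GPU status
    gpu_props = torch.cuda.get_device_properties(0)
    print(f"GPU: {gpu_props.name}")
    print(f"Total memory: {gpu_props.total_memory / 1024**3:.1f} GB")
    print(f"Allocated memory: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")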
else:
    print("WARNING: CUDA not available. Running on CPU will be very slow!")

# ============================================================
# UPDATE CONFIG WITH OPTIMIZED BATCH SIZE FOR RTX 4090
# ============================================================

# Update batch size to maximize GPU utilization
CONFIG['batch_size'] = 256  # Increased from 128 to utilize RTX 4090's 24GB VRAM

print(f"\n✓ Batch size optimized for RTX 4090: {CONFIG['batch_size']}")
print(f"✓ Expected GPU memory usage: ~8-12GB out of 24GB available")
print(f"✓ This should increase GPU utilization from 20% to 80-95%\n")

## Hyperparameter Tuning Utility Functions
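
The fold construction used in this section gives every fold `n_samples // k` validation samples and assigns the remainder to the last fold. The sketch below reproduces that indexing on its own so the fold sizes are easy to inspect; `k_fold_indices` is a hypothetical helper and works on indices only, not on the transition arrays.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_idx, val_idx) pairs; the last fold absorbs the remainder."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    fold_size = n_samples // k
    splits = []
    for i in range(k):
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n_samples
        val_idx = indices[start:end]
        train_idx = np.concatenate([indices[:start], indices[end:]])
        splits.append((train_idx, val_idx))
    return splits

splits = k_fold_indices(n_samples=10, k=3)
for fold, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} val")
```

With 10 samples and 3 folds the validation sizes are 3, 3, and 4, and every sample appears in exactly one validation fold.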

In [None]:
import copy

def create_k_fold_splits_rl(states, actions, rewards, next_states, dones, goals, k=3):
    """Create k-fold cross-validation splits for RL data."""
    n_samples = len(states)
    indices = np.random.permutation(n_samples)
    fold_size = n_samples // k
    
    folds = []
    for i in range(k):
        start_idx = i * fold_size
        end_idx = start_idx + fold_size if i < k-1 else n_samples
        
        val_indices = indices[start_idx:end_idx]
        train_indices = np.concatenate([indices[:start_idx], indices[end_idx:]])
        
        train_fold = (states[train_indices], actions[train_indices], rewards[train_indices],
                      next_states[train_indices], dones[train_indices], goals[train_indices])
        val_fold = (states[val_indices], actions[val_indices], rewards[val_indices],
                    next_states[val_indices], dones[val_indices], goals[val_indices])
        
        folds.append((train_fold, val_fold))
    
    return folds

def generate_hyperparameter_combinations(algorithm_name, search_method='random', n_trials=10):
    """Generate hyperparameter combinations for tuning."""
    param_space = HYPERPARAMETER_SPACES[algorithm_name]
    
    if search_method == 'grid':
        # Grid search - all combinations
        combinations = list(ParameterGrid(param_space))
    else:
        # Random search - sample n_trials combinations
        combinations = []
        keys = list(param_space.keys())
        
        for _ in range(n_trials):
            combination = {}
            for key in keys:
                combination[key] = random.choice(param_space[key])
            combinations.append(combination)
    
    return combinations

def hyperparameter_search_rl(algorithm_name, states, actions, rewards, next_states, dones, goals, 
                             state_dim, action_dim, goal_dim, device):
    """
    Perform hyperparameter search with cross-validation for RL/behavioral cloning.
    Returns: (best_hyperparams, best_score, all_results)
    """
    print(f"\nStarting hyperparameter search for {algorithm_name}...")
    print(f"Search method: {CONFIG['search_method']}")
    
    # Generate hyperparameter combinations
    combinations = generate_hyperparameter_combinations(
        algorithm_name, CONFIG['search_method'], CONFIG['n_trials']
    )
    print(f"Testing {len(combinations)} hyperparameter combinations")
    
    # Create k-fold splits
    folds = create_k_fold_splits_rl(states, actions, rewards, next_states, dones, goals, CONFIG['cv_folds'])
    
    best_score = float('inf')
    best_hyperparams = None
    all_results = []
    
    for idx, hyperparams in enumerate(combinations):
        print(f"\n  [{idx+1}/{len(combinations)}] Testing: {hyperparams}")
        
        # Cross-validation scores for this combination
        cv_scores = []
        
        for fold_idx, (train_fold, val_fold) in enumerate(folds):
            # Create agent with current hyperparameters
            agent = create_agent_with_hyperparams_rl(
                algorithm_name, hyperparams, state_dim, action_dim, goal_dim, device
            )
            
            if agent is None:
                print(f"    Fold {fold_idx+1}: Failed to create agent, skipping...")
                break
            
            # Train with early stopping
            try:
                train_states, train_actions, train_rewards, train_next_states, train_dones, train_goals = train_fold
                val_states, val_actions, val_rewards, val_next_states, val_dones, val_goals = val_fold
                
                best_val_loss = float('inf')
                patience_counter = 0
                
                # Train for limited epochs with early stopping
                for epoch in range(CONFIG['hp_epochs']):
                    # Train one epoch
                    n_samples = len(train_states)
                    num_batches = n_samples // CONFIG['batch_size']
                    epoch_loss = 0.0
                    
                    indices = np.random.permutation(n_samples)
                    for batch_num in range(num_batches):
                        start_idx = batch_num * CONFIG['batch_size']
                        end_idx = min(start_idx + CONFIG['batch_size'], n_samples)
                        batch_indices = indices[start_idx:end_idx]
                        
                        batch_states_t = torch.tensor(train_states[batch_indices], dtype=torch.float32, device=device)
                        batch_actions_t = torch.tensor(train_actions[batch_indices], dtype=torch.float32, device=device)
                        batch_goals_t = torch.tensor(train_goals[batch_indices], dtype=torch.float32, device=device)
                        
                        loss_result = agent.train_step(batch_states_t, batch_actions_t, batch_goals_t, batch_actions_t)
                        if isinstance(loss_result, dict):
                            loss = loss_result['loss']
                        else:
                            loss = loss_result
                        epoch_loss += loss
                    
                    # Compute validation loss
                    val_states_t = torch.tensor(val_states, dtype=torch.float32, device=device)
                    val_actions_t = torch.tensor(val_actions, dtype=torch.float32, device=device)
                    val_goals_t = torch.tensor(val_goals, dtype=torch.float32, device=device)
                    val_loss = compute_validation_loss_rl(agent, val_states_t, val_actions_t, val_goals_t)
                    
                    # Early stopping check
                    if val_loss < best_val_loss:
                        best_val_loss = val_loss
                        patience_counter = 0
                    else:
                        patience_counter += 1
                    
                    if patience_counter >= CONFIG['patience']:
                        break
                
                cv_scores.append(best_val_loss)
                print(f"    Fold {fold_idx+1}/{CONFIG['cv_folds']}: Val Loss = {best_val_loss:.6f} (stopped at epoch {epoch+1})")
                
                # Clear GPU cache after each fold
                if torch.cuda.is_available():
                    del agent
                    torch.cuda.empty_cache()
                    
            except Exception as e:
                print(f"    Fold {fold_idx+1} failed: {str(e)}")
                break
        
        # Calculate average CV score
        if len(cv_scores) == CONFIG['cv_folds']:
            avg_score = np.mean(cv_scores)
            std_score = np.std(cv_scores)
            
            all_results.append({
                'hyperparams': hyperparams,
                'avg_score': avg_score,
                'std_score': std_score,
                'cv_scores': cv_scores
            })
            
            print(f"    Average CV Score: {avg_score:.6f} ± {std_score:.6f}", end='')
            
            # Update best hyperparameters
            if avg_score < best_score:
                best_score = avg_score
                best_hyperparams = hyperparams.copy()
                print(" ✓ NEW BEST!")
            else:
                print()
    
    print(f"\n{algorithm_name} Hyperparameter Search Complete!")
    print(f"Best CV Score: {best_score:.6f}")
    print(f"Best Hyperparameters: {best_hyperparams}")
    
    return best_hyperparams, best_score, all_results

def add_predict_action_method(agent):
    """Add predict_action method for policy execution."""
    if not hasattr(agent, 'predict_action'):
        def predict_action(state, goal):
            """Predict action given state and goal."""
            agent.model.eval()
            with torch.no_grad():
                if isinstance(state, np.ndarray):
                    state = torch.tensor(state, dtype=torch.float32, device=agent.device).unsqueeze(0)
                if isinstance(goal, np.ndarray):
                    goal = torch.tensor(goal, dtype=torch.float32, device=agent.device).unsqueeze(0)
                
                # Concatenate state and goal
                inputs = torch.cat([state, goal], dim=-1)
                prediction = agent.model(inputs)
                
                return prediction.squeeze().cpu().numpy()
        
        agent.predict_action = predict_action
    
    return agent

def create_agent_with_hyperparams_rl(algorithm_name, hyperparams, state_dim, action_dim, goal_dim, device):
    """Create an agent instance with specific hyperparameters for RL/behavioral cloning."""
    
    try:
        # Note: For behavioral cloning, we modify the input dimension
        # Input: state + goal (no current action needed)
        input_dim = state_dim + goal_dim
        
        if algorithm_name == 'AutoEncoder':
            agent = AutoEncoderAgent(
                state_dim=input_dim,  # state + goal
                action_dim=action_dim,
                goal_dim=0,  # Already included in state_dim
                latent_dim=hyperparams.get('latent_dim', 64),
                hidden_dims=hyperparams.get('hidden_dims', [128]),
                lr=hyperparams.get('lr', 1e-3),
                dropout=hyperparams.get('dropout', 0.0),
                activation=hyperparams.get('activation', 'ReLU'),
                batch_norm=hyperparams.get('batch_norm', False),
                device=device
            )
        
        elif algorithm_name == 'Bayesian':
            agent = BayesianAgent(
                state_dim=input_dim,
                action_dim=action_dim,
                goal_dim=0,
                latent_dim=hyperparams.get('latent_dim', 64),
                lr=hyperparams.get('lr', 1e-3),
                kl_weight=hyperparams.get('kl_weight', 0.01),
                prior_std=hyperparams.get('prior_std', 1.0),
                device=device
            )
        
        elif algorithm_name == 'Transformer':
            agent = TransformerAgent(
                state_dim=input_dim,
                action_dim=action_dim,
                goal_dim=0,
                d_model=hyperparams.get('d_model', 64),
                nhead=hyperparams.get('nhead', 4),
                num_layers=hyperparams.get('num_layers', 2),
                dropout=hyperparams.get('dropout', 0.1),
                lr=hyperparams.get('lr', 1e-3),
                device=device,
            )
        
        elif algorithm_name == 'Linear':
            agent = LinearAgent(
                state_dim=input_dim,
                action_dim=action_dim,
                goal_dim=0,
                lr=hyperparams.get('lr', 1e-3),
                weight_decay=hyperparams.get('weight_decay', 0.0),
                device=device
            )
        
        elif algorithm_name == 'VAE':
            agent = VAEAgent(
                state_dim=input_dim,
                action_dim=action_dim,
                goal_dim=0,
                latent_dim=hyperparams.get('latent_dim', 32),
                hidden_dim=hyperparams.get('hidden_dim', 128),
                lr=hyperparams.get('lr', 1e-3),
                beta=hyperparams.get('beta', 1.0),
                device=device
            )
        
        else:
            raise ValueError(f"Unknown algorithm: {algorithm_name}")
        
        # Add predict_action method for policy execution
        agent = add_predict_action_method(agent)
        return agent
    
    except Exception as e:
        print(f"Error creating agent {algorithm_name} with hyperparams {hyperparams}: {str(e)}")
        return None

# --- Universal Agent Save Function ---
def save_agent_model(agent, path):
    """Save model for any agent type."""
    try:
        if hasattr(agent, 'save'):
            agent.save(path)
        elif hasattr(agent, 'model'):
            torch.save(agent.model.state_dict(), path)
        elif hasattr(agent, 'encoder') and hasattr(agent, 'decoder'):
            torch.save({
                'encoder': agent.encoder.state_dict(),
                'decoder': agent.decoder.state_dict()
            }, path)
        else:
            raise AttributeError("No model found to save.")
        print(f"Model saved to {path}")
    except Exception as e:
        print(f"Could not save model: {e}")


## Data Collection and Preparation
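
The expert demonstrations collected in this section use a unit vector pointing from the current position to the next waypoint, clipped to the action bounds. A minimal sketch of that computation follows; `expert_action` is a hypothetical helper, and the bounds of -1 and 1 are an assumption, since the real limits come from `env.action_space`.

```python
import numpy as np

def expert_action(position, waypoint, low=-1.0, high=1.0, eps=1e-8):
    """Unit direction from position to waypoint, clipped to [low, high].

    eps avoids division by zero when the agent is already at the waypoint.
    """
    direction = np.asarray(waypoint, dtype=float) - np.asarray(position, dtype=float)
    action = direction / (np.linalg.norm(direction) + eps)
    return np.clip(action, low, high)

moving = expert_action([0.0, 0.0], [3.0, 4.0])
at_goal = expert_action([1.0, 1.0], [1.0, 1.0])
print(moving, at_goal)
```

Because the direction is normalized, the expert always commands the same speed regardless of the distance to the waypoint, and the action is zero when the agent is exactly on the waypoint.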

In [None]:
def collect_rl_experience(env, num_episodes=100, max_steps=200):
    """
    Collect RL training data using optimal policy from visibility graph.
    This provides expert demonstrations for imitation learning.
    """
    data = []
    successful_episodes = 0
    
    for ep in tqdm(range(num_episodes), desc='Collecting RL experience'):
        state = env.reset()
        goal = env.goal.copy() if hasattr(env, 'goal') else np.zeros(2)
        episode_transitions = []
        episode_reward = 0
        
        for t in range(max_steps):
            # Get optimal action using visibility graph
            current_pos = state[:2]
            
            try:
                # Use environment's visibility graph for shortest path
                if hasattr(env, 'vgraph'):
                    path = env.vgraph.shortest_path(current_pos, goal)
                    
                    if len(path) > 1:
                        # Direction to next waypoint
                        next_waypoint = path[1]
                        direction = next_waypoint - current_pos
                        action = direction / (np.linalg.norm(direction) + 1e-8)
                    else:
                        action = np.zeros(2)
                else:
                    # Fallback: direct to goal
                    direction = goal - current_pos
                    action = direction / (np.linalg.norm(direction) + 1e-8)
                    
            except Exception as e:
                # Fallback: move towards goal
                direction = goal - current_pos
                action = direction / (np.linalg.norm(direction) + 1e-8)
            
            # Clip to action space
            action = np.clip(action, env.action_space.low, env.action_space.high)
            
            # Take action in environment
            next_state, reward, done, info = env.step(action)
            
            # Store transition (s, a, r, s', done)
            episode_transitions.append({
                'state': state.copy(),
                'action': action.copy(),
                'reward': reward,
                'next_state': next_state.copy(),
                'done': done,
                'goal': goal.copy()
            })
            
            episode_reward += reward
            state = next_state
            
            if done:
                if info.get('reason') == 'goal_reached':
                    successful_episodes += 1
                break
        
        # Add all transitions from this episode
        data.extend(episode_transitions)
    
    print(f"Collected {len(data)} transitions from {num_episodes} episodes")
    print(f"Success rate: {successful_episodes/num_episodes:.2%}")
    
    return data

def prepare_rl_data(data):
    """Convert RL experience to arrays for training."""
    states = np.stack([d['state'] for d in data])
    actions = np.stack([d['action'] for d in data])
    rewards = np.array([d['reward'] for d in data])
    next_states = np.stack([d['next_state'] for d in data])
    dones = np.array([d['done'] for d in data])
    goals = np.stack([d['goal'] for d in data])
    
    return states, actions, rewards, next_states, dones, goals

# Collect or load RL experience data
dataset_path = 'rl_experience_dataset.pickle'
if os.path.exists(dataset_path):
    print(f"Loading existing RL dataset from {dataset_path}")
    data = load_pickle(dataset_path)
else:
    print("Collecting RL experience data using optimal policy...")
    data = collect_rl_experience(env, CONFIG['num_episodes'], CONFIG['max_steps'])
    save_pickle(data, dataset_path)
    print(f"RL dataset saved to {dataset_path}")

# Prepare RL data
states, actions, rewards, next_states, dones, goals = prepare_rl_data(data)

print(f"\nRL Dataset Statistics:")
print(f"  Total transitions: {len(states)}")
print(f"  Avg reward: {rewards.mean():.3f}")
print(f"  Transitions with positive reward: {(rewards > 0).sum() / len(rewards):.2%}")
print(f"  State shape: {states.shape}")
print(f"  Action shape: {actions.shape}")


## Helper Functions for Training and Evaluation
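
The training helpers in this section stop early once the validation loss has failed to improve for `patience` consecutive epochs. The sketch below isolates that rule on a plain list of losses; `early_stopping_epoch` is a hypothetical helper, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_loss) under patience-based early stopping.

    Epochs are counted from 1. Training stops after `patience` consecutive
    epochs without a new best validation loss.
    """
    best = float('inf')
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            return epoch, best
    return len(val_losses), best

losses = [0.9, 0.5, 0.4, 0.41, 0.42, 0.39, 0.40, 0.45, 0.43]
print(early_stopping_epoch(losses, patience=2))
print(early_stopping_epoch(losses, patience=3))
```

With a patience of 2, training stops at epoch 5 and never reaches the lower loss at epoch 6; a patience of 3 continues long enough to find it. A small patience shortens the search but can end training before a later improvement.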

In [None]:
def train_agent_rl(agent, states, actions, rewards, next_states, dones, goals, num_epochs, batch_size, gamma=0.99):
    """
    Train an agent by behavioral cloning on expert demonstrations.

    Only states, goals and expert actions enter the loss; rewards,
    next_states, dones and gamma are not used in this function.
    """
    n_samples = len(states)
    device = getattr(agent, 'device', 'cpu')
    
    train_losses = []
    val_losses = []
    best_loss = float('inf')
    patience_counter = 0
    
    # Split into train/val
    val_size = int(CONFIG.get('val_ratio', 0.2) * n_samples)
    indices = np.random.permutation(n_samples)
    train_idx, val_idx = indices[val_size:], indices[:val_size]
    
    # Count batches over the training split only, so that no batch is empty
    num_batches = max(1, len(train_idx) // batch_size)
    
    print(f"Training with {len(train_idx)} samples, validating with {len(val_idx)} samples...")
    
    for epoch in range(num_epochs):
        # Shuffle training data
        train_idx_shuffled = np.random.permutation(train_idx)
        epoch_loss = 0.0
        
        for batch_num in range(num_batches):
            start_idx = batch_num * batch_size
            end_idx = min(start_idx + batch_size, len(train_idx_shuffled))
            batch_indices = train_idx_shuffled[start_idx:end_idx]
            
            # Get batch
            batch_states = states[batch_indices]
            batch_actions = actions[batch_indices]
            batch_goals = goals[batch_indices]
            
            # Convert to tensors
            device = getattr(agent, 'device', 'cpu')
            batch_states_t = torch.tensor(batch_states, dtype=torch.float32, device=device)
            batch_actions_t = torch.tensor(batch_actions, dtype=torch.float32, device=device)
            batch_goals_t = torch.tensor(batch_goals, dtype=torch.float32, device=device)
            
            # Behavioral cloning: predict expert action
            loss_result = agent.train_step(batch_states_t, batch_actions_t, batch_goals_t, batch_actions_t)
            if isinstance(loss_result, dict):
                loss = loss_result['loss']
            else:
                loss = loss_result
            
            epoch_loss += loss
        
        avg_train_loss = epoch_loss / num_batches
        
        # Validation
        val_states_t = torch.tensor(states[val_idx], dtype=torch.float32, device=device)
        val_actions_t = torch.tensor(actions[val_idx], dtype=torch.float32, device=device)
        val_goals_t = torch.tensor(goals[val_idx], dtype=torch.float32, device=device)
        val_loss = compute_validation_loss_rl(agent, val_states_t, val_actions_t, val_goals_t)
        
        train_losses.append(avg_train_loss)
        val_losses.append(val_loss)
        
        # Early stopping
        if val_loss < best_loss:
            best_loss = val_loss
            patience_counter = 0
        else:
            patience_counter += 1
        
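        if (epoch + 1) % 10 == 0:
            # Build the status string first: reusing double quotes inside a
            # double-quoted f-string is a SyntaxError before Python 3.12
            patience_limit = CONFIG.get('patience', 5)
            status = '✓' if patience_counter == 0 else f'[patience {patience_counter}/{patience_limit}]'
            print(f"  Epoch {epoch+1}/{num_epochs}: Train Loss={avg_train_loss:.6f}, Val Loss={val_loss:.6f} {status}")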
        
        if patience_counter >= CONFIG.get('patience', 5):
            print(f"  Early stopping at epoch {epoch+1} (best val loss: {best_loss:.6f})")
            break
    
    return train_losses, val_losses

def compute_validation_loss_rl(agent, states, actions, goals):
    """Compute validation loss (MSE against expert actions) for behavioral cloning.

    Note: the agent's networks are switched to eval mode and left there.
    """
    # Set to eval mode
    if hasattr(agent, 'model'):
        agent.model.eval()
    if hasattr(agent, 'encoder'):
        agent.encoder.eval()
    if hasattr(agent, 'decoder'):
        agent.decoder.eval()
    
    with torch.no_grad():
        # For models that predict actions from state+goal
        if hasattr(agent, 'model'):
            # Concatenate state and goal as input
            inputs = torch.cat([states, goals], dim=-1)
            predictions = agent.model(inputs)
            val_loss = torch.mean((predictions - actions)**2).item()
        elif hasattr(agent, 'encoder') and hasattr(agent, 'decoder'):
            # For encoder-decoder architectures
            # Try to predict actions
            subset_size = min(100, len(states))
            val_loss_sum = 0.0
            for i in range(subset_size):
                try:
                    pred = agent.predict_action(states[i].cpu().numpy(), goals[i].cpu().numpy())
                    if isinstance(pred, tuple):
                        pred = pred[0]
                    if isinstance(pred, np.ndarray):
                        mse = np.mean((pred - actions[i].cpu().numpy())**2)
                    else:
                        mse = float(torch.mean((pred - actions[i])**2))
                    val_loss_sum += mse
                except Exception:
                    # Prediction failed: add a fixed placeholder penalty of 0.5
                    val_loss_sum += 0.5
            val_loss = val_loss_sum / subset_size
        else:
            val_loss = 0.5  # Placeholder: the agent exposes neither a model nor an encoder/decoder pair
    
    return val_loss

def evaluate_agent(agent, env, num_episodes=10, max_steps=200):
    """Evaluate agent performance in the environment."""
    episode_rewards = []
    episode_mses = []
    success_count = 0
    
    for ep in range(num_episodes):
        state = env.reset()
        goal = env.goal.copy() if hasattr(env, 'goal') else np.zeros(2)
        ep_reward = 0.0
        ep_mses = []
        
        for t in range(max_steps):
            # Get action from agent
            try:
                if hasattr(agent, 'predict_action'):
                    action_result = agent.predict_action(state, goal)
                else:
                    # Fallback to predict_next_action
                    dummy_action = np.zeros(env.action_space.shape[0])
                    action_result = agent.predict_next_action(state, dummy_action, goal)
                
                if isinstance(action_result, tuple):
                    action = action_result[0]
                else:
                    action = action_result
                
                # Ensure correct shape
                if isinstance(action, torch.Tensor):
                    action = action.detach().cpu().numpy()
                if action.ndim > 1:
                    action = action.flatten()
                
                action = np.clip(action, env.action_space.low, env.action_space.high)
                
            except Exception:
                # Fallback to random action if the agent cannot produce one
                action = env.action_space.sample()
            
            next_state, reward, done, info = env.step(action)
            
            # Calculate MSE with optimal action
            try:
                current_pos = state[:2]
                if hasattr(env, 'vgraph'):
                    path = env.vgraph.shortest_path(current_pos, goal)
                    if len(path) > 1:
                        direction = path[1] - current_pos
                        optimal_action = direction / (np.linalg.norm(direction) + 1e-8)
                    else:
                        optimal_action = np.zeros_like(action)
                else:
                    direction = goal - current_pos
                    optimal_action = direction / (np.linalg.norm(direction) + 1e-8)
                
                optimal_action = np.clip(optimal_action, env.action_space.low, env.action_space.high)
                mse = np.mean((action - optimal_action)**2)
                ep_mses.append(mse)
            except Exception:
                # Fallback penalty when the reference action cannot be computed
                ep_mses.append(0.5)
            
            ep_reward += reward
            state = next_state
            
            if done:
                if info.get('reason') == 'goal_reached':
                    success_count += 1
                break
        
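        episode_rewards.append(ep_reward)
        # Record this episode's mean MSE so the final average covers every episode,
        # not only the last one
        if ep_mses:
            episode_mses.append(np.mean(ep_mses))
    
    return {
        'avg_reward': np.mean(episode_rewards),
        'avg_mse': np.mean(episode_mses) if episode_mses else 0.5,
        'success_rate': success_count / num_episodes
    }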

def count_parameters(model):
    """Count the number of trainable parameters in a model."""
    if hasattr(model, 'model'):
        return sum(p.numel() for p in model.model.parameters() if p.requires_grad)
    elif hasattr(model, 'encoder') and hasattr(model, 'decoder'):
        return sum(p.numel() for p in model.encoder.parameters() if p.requires_grad) + \
               sum(p.numel() for p in model.decoder.parameters() if p.requires_grad)
    return 0
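
The MSE reported by `evaluate_agent` compares each action with a reference action: the unit vector from the current position toward the next waypoint, clipped to the action bounds. A minimal standalone sketch of that calculation follows. The helper names `reference_action` and `action_mse`, the 2-D action, and the `[-1, 1]` bounds are assumptions for illustration, not part of the notebook.

```python
import math

def reference_action(position, waypoint, low=-1.0, high=1.0, eps=1e-8):
    """Unit vector from position toward waypoint, clipped to [low, high].

    The bounds and the 2-D layout are assumed here for illustration.
    """
    dx = waypoint[0] - position[0]
    dy = waypoint[1] - position[1]
    norm = math.hypot(dx, dy) + eps  # eps avoids division by zero at the waypoint
    return [min(max(c / norm, low), high) for c in (dx, dy)]

def action_mse(action, reference):
    """Mean squared error between two actions of equal length."""
    return sum((a - r) ** 2 for a, r in zip(action, reference)) / len(action)

# Made-up example: agent at the origin, waypoint at (3, 4)
ref = reference_action((0.0, 0.0), (3.0, 4.0))
print(ref)                           # approximately [0.6, 0.8]
print(action_mse([1.0, 0.0], ref))   # approximately 0.4
```

When the agent is already at the waypoint, the direction is zero and the reference action is the zero vector, which mirrors the `np.zeros_like(action)` branch above.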


## AutoEncoder: Hyperparameter Tuning and Training

In [None]:
print("="*60)
print("AUTOENCODER: HYPERPARAMETER TUNING & TRAINING")
print("="*60)

# Clear GPU cache before starting
if torch.cuda.is_available():
    clear_gpu_cache()
    check_gpu_status()

# Step 1: Hyperparameter Tuning
print("Step 1: Hyperparameter Tuning")
ae_best_hyperparams, ae_best_score, ae_search_results = hyperparameter_search(
    'AutoEncoder', train_data, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

print(f"\nBest AutoEncoder hyperparameters:")
for key, value in ae_best_hyperparams.items():
    print(f"  {key}: {value}")
print(f"Best CV score: {ae_best_score:.6f}")

# Clear GPU cache before final training
if torch.cuda.is_available():
    clear_gpu_cache()

# Step 2: Train with Best Hyperparameters
print(f"\nStep 2: Training AutoEncoder with best hyperparameters...")
autoencoder_agent = create_agent_with_hyperparams(
    'AutoEncoder', ae_best_hyperparams, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

# Train AutoEncoder with full epochs
start_time = time.time()
ae_train_losses, ae_val_losses = train_agent(
    autoencoder_agent, train_data, val_data, 
    CONFIG['num_epochs'], CONFIG['batch_size']
)
ae_train_time = time.time() - start_time

# Step 3: Evaluation
print(f"\nStep 3: Evaluating AutoEncoder...")
start_time = time.time()
ae_results = evaluate_agent(autoencoder_agent, env, CONFIG['num_test_episodes'])
ae_test_time = time.time() - start_time

# Store results with hyperparameter information
results['algorithm'].append('AutoEncoder')
results['train_time'].append(ae_train_time)
results['test_time'].append(ae_test_time)
results['final_train_loss'].append(ae_train_losses[-1])
results['final_val_loss'].append(ae_val_losses[-1])
results['avg_mse'].append(ae_results['avg_mse'])
results['avg_reward'].append(ae_results['avg_reward'])
results['success_rate'].append(ae_results['success_rate'])
results['model_params'].append(count_parameters(autoencoder_agent))
results['best_hyperparams'].append(ae_best_hyperparams)
results['hp_search_cv_score'].append(ae_best_score)

print(f"\nAutoEncoder Final Results:")
print(f"  Best Hyperparams: {ae_best_hyperparams}")
print(f"  HP Search CV Score: {ae_best_score:.6f}")
print(f"  Train Time: {ae_train_time:.2f}s")
print(f"  Final Train Loss: {ae_train_losses[-1]:.6f}")
print(f"  Final Val Loss: {ae_val_losses[-1]:.6f}")
print(f"  Avg MSE: {ae_results['avg_mse']:.6f}")
print(f"  Avg Reward: {ae_results['avg_reward']:.3f}")
print(f"  Success Rate: {ae_results['success_rate']:.3f}")
print(f"  Model Parameters: {count_parameters(autoencoder_agent):,}")
print()

# Check GPU status after training
if torch.cuda.is_available():
    check_gpu_status()


In [None]:
import json
# Save the trained AutoEncoder model
# model_dir = "trained_models"
model_dir = "risky_navigation/trained_models"
os.makedirs(model_dir, exist_ok=True)
# Use one timestamp so the model file and its hyperparameter file share a suffix
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
model_path = os.path.join(model_dir, f"autoencoder_model_{timestamp}.pt")

# Save model state dictionary
if hasattr(autoencoder_agent, 'model'):
    torch.save(autoencoder_agent.model.state_dict(), model_path)
    print(f"AutoEncoder model saved to {model_path}")
else:
    print("Could not save model: model attribute not found")

# Also save hyperparameters for reproducibility
hyperparams_path = os.path.join(model_dir, f"autoencoder_hyperparams_{timestamp}.json")
with open(hyperparams_path, 'w') as f:
    json.dump(ae_best_hyperparams, f, indent=2)
    print(f"Hyperparameters saved to {hyperparams_path}")
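
The hyperparameter files written above are plain JSON, so they can be read back later to rebuild an agent with the same configuration. The sketch below shows one way to find the most recent file. The helper `latest_hyperparams` is hypothetical, and the keys `lr` and `hidden_dim` are made-up placeholders, not the notebook's actual search space.

```python
import glob
import json
import os
import tempfile

def latest_hyperparams(model_dir, prefix):
    """Return (path, params) for the newest '<prefix>_hyperparams_*.json', or (None, None)."""
    pattern = os.path.join(model_dir, f"{prefix}_hyperparams_*.json")
    # The YYYYMMDD_HHMMSS suffix sorts chronologically as plain text
    paths = sorted(glob.glob(pattern))
    if not paths:
        return None, None
    with open(paths[-1]) as f:
        return paths[-1], json.load(f)

# Demo in a temporary directory with made-up values
with tempfile.TemporaryDirectory() as tmp:
    for stamp, lr in [("20240101_120000", 0.01), ("20240102_090000", 0.001)]:
        with open(os.path.join(tmp, f"autoencoder_hyperparams_{stamp}.json"), "w") as f:
            json.dump({"lr": lr, "hidden_dim": 64}, f, indent=2)
    path, params = latest_hyperparams(tmp, "autoencoder")
    print(os.path.basename(path), params)
```

The returned dictionary could then be passed to `create_agent_with_hyperparams` in place of a fresh search result, assuming the saved keys match what that function expects.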

## Bayesian: Hyperparameter Tuning and Training

In [None]:
print("="*60)
print("BAYESIAN: HYPERPARAMETER TUNING & TRAINING")
print("="*60)

# Clear GPU cache before starting
if torch.cuda.is_available():
    clear_gpu_cache()
    check_gpu_status()

# Step 1: Hyperparameter Tuning
print("Step 1: Hyperparameter Tuning")
bay_best_hyperparams, bay_best_score, bay_search_results = hyperparameter_search(
    'Bayesian', train_data, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

print(f"\nBest Bayesian hyperparameters:")
for key, value in bay_best_hyperparams.items():
    print(f"  {key}: {value}")
print(f"Best CV score: {bay_best_score:.6f}")

# Clear GPU cache before final training
if torch.cuda.is_available():
    clear_gpu_cache()

# Step 2: Train with Best Hyperparameters
print(f"\nStep 2: Training Bayesian with best hyperparameters...")
bayesian_agent = create_agent_with_hyperparams(
    'Bayesian', bay_best_hyperparams, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

# Train Bayesian with full epochs
start_time = time.time()
bay_train_losses, bay_val_losses = train_agent(
    bayesian_agent, train_data, val_data,
    CONFIG['num_epochs'], CONFIG['batch_size']
)
bay_train_time = time.time() - start_time

# Step 3: Evaluation
print(f"\nStep 3: Evaluating Bayesian...")
start_time = time.time()
bay_results = evaluate_agent(bayesian_agent, env, CONFIG['num_test_episodes'])
bay_test_time = time.time() - start_time

# Store results with hyperparameter information
results['algorithm'].append('Bayesian')
results['train_time'].append(bay_train_time)
results['test_time'].append(bay_test_time)
results['final_train_loss'].append(bay_train_losses[-1])
results['final_val_loss'].append(bay_val_losses[-1])
results['avg_mse'].append(bay_results['avg_mse'])
results['avg_reward'].append(bay_results['avg_reward'])
results['success_rate'].append(bay_results['success_rate'])
results['model_params'].append(count_parameters(bayesian_agent))
results['best_hyperparams'].append(bay_best_hyperparams)
results['hp_search_cv_score'].append(bay_best_score)

print(f"\nBayesian Final Results:")
print(f"  Best Hyperparams: {bay_best_hyperparams}")
print(f"  HP Search CV Score: {bay_best_score:.6f}")
print(f"  Train Time: {bay_train_time:.2f}s")
print(f"  Final Train Loss: {bay_train_losses[-1]:.6f}")
print(f"  Final Val Loss: {bay_val_losses[-1]:.6f}")
print(f"  Avg MSE: {bay_results['avg_mse']:.6f}")
print(f"  Avg Reward: {bay_results['avg_reward']:.3f}")
print(f"  Success Rate: {bay_results['success_rate']:.3f}")
print(f"  Model Parameters: {count_parameters(bayesian_agent):,}")
print()

# Check GPU status after training
if torch.cuda.is_available():
    check_gpu_status()


In [None]:
# Save the trained Bayesian model (robust universal method)
# model_dir = "trained_models"
model_dir = "risky_navigation/trained_models"
os.makedirs(model_dir, exist_ok=True)
# Use one timestamp so the model file and its hyperparameter file share a suffix
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
model_path = os.path.join(model_dir, f"bayesian_model_{timestamp}.pt")

# Use universal save function for all agent types
save_agent_model(bayesian_agent, model_path)

# Also save hyperparameters for reproducibility
hyperparams_path = os.path.join(model_dir, f"bayesian_hyperparams_{timestamp}.json")
with open(hyperparams_path, 'w') as f:
    json.dump(bay_best_hyperparams, f, indent=2)
    print(f"Hyperparameters saved to {hyperparams_path}")

## Transformer: Hyperparameter Tuning and Training

In [None]:
print("="*60)
print("TRANSFORMER: HYPERPARAMETER TUNING & TRAINING")
print("="*60)

# Clear GPU cache before starting
if torch.cuda.is_available():
    clear_gpu_cache()
    check_gpu_status()

# Step 1: Hyperparameter Tuning
print("Step 1: Hyperparameter Tuning")
trans_best_hyperparams, trans_best_score, trans_search_results = hyperparameter_search(
    'Transformer', train_data, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

print(f"\nBest Transformer hyperparameters:")
for key, value in trans_best_hyperparams.items():
    print(f"  {key}: {value}")
print(f"Best CV score: {trans_best_score:.6f}")

# Clear GPU cache before final training
if torch.cuda.is_available():
    clear_gpu_cache()

# Step 2: Train with Best Hyperparameters
print(f"\nStep 2: Training Transformer with best hyperparameters...")
transformer_agent = create_agent_with_hyperparams(
    'Transformer', trans_best_hyperparams, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

# Train Transformer with full epochs
start_time = time.time()
trans_train_losses, trans_val_losses = train_agent(
    transformer_agent, train_data, val_data,
    CONFIG['num_epochs'], CONFIG['batch_size']
)
trans_train_time = time.time() - start_time

# Step 3: Evaluation
print(f"\nStep 3: Evaluating Transformer...")
start_time = time.time()
trans_results = evaluate_agent(transformer_agent, env, CONFIG['num_test_episodes'])
trans_test_time = time.time() - start_time

# Store results with hyperparameter information
results['algorithm'].append('Transformer')
results['train_time'].append(trans_train_time)
results['test_time'].append(trans_test_time)
results['final_train_loss'].append(trans_train_losses[-1])
results['final_val_loss'].append(trans_val_losses[-1])
results['avg_mse'].append(trans_results['avg_mse'])
results['avg_reward'].append(trans_results['avg_reward'])
results['success_rate'].append(trans_results['success_rate'])
results['model_params'].append(count_parameters(transformer_agent))
results['best_hyperparams'].append(trans_best_hyperparams)
results['hp_search_cv_score'].append(trans_best_score)

print(f"\nTransformer Final Results:")
print(f"  Best Hyperparams: {trans_best_hyperparams}")
print(f"  HP Search CV Score: {trans_best_score:.6f}")
print(f"  Train Time: {trans_train_time:.2f}s")
print(f"  Final Train Loss: {trans_train_losses[-1]:.6f}")
print(f"  Final Val Loss: {trans_val_losses[-1]:.6f}")
print(f"  Avg MSE: {trans_results['avg_mse']:.6f}")
print(f"  Avg Reward: {trans_results['avg_reward']:.3f}")
print(f"  Success Rate: {trans_results['success_rate']:.3f}")
print(f"  Model Parameters: {count_parameters(transformer_agent):,}")
print()

# Check GPU status after training
if torch.cuda.is_available():
    check_gpu_status()


In [None]:
# Save the trained Transformer model (robust universal method)
# model_dir = "trained_models"
model_dir = "risky_navigation/trained_models"
os.makedirs(model_dir, exist_ok=True)
# Use one timestamp so the model file and its hyperparameter file share a suffix
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
model_path = os.path.join(model_dir, f"transformer_model_{timestamp}.pt")

# Use universal save function for all agent types
save_agent_model(transformer_agent, model_path)

# Also save hyperparameters for reproducibility
hyperparams_path = os.path.join(model_dir, f"transformer_hyperparams_{timestamp}.json")
with open(hyperparams_path, 'w') as f:
    json.dump(trans_best_hyperparams, f, indent=2)
    print(f"Hyperparameters saved to {hyperparams_path}")

## Linear: Hyperparameter Tuning and Training

In [None]:
print("="*60)
print("LINEAR: HYPERPARAMETER TUNING & TRAINING")
print("="*60)

# Clear GPU cache before starting
if torch.cuda.is_available():
    clear_gpu_cache()
    check_gpu_status()

# Step 1: Hyperparameter Tuning
print("Step 1: Hyperparameter Tuning")
lin_best_hyperparams, lin_best_score, lin_search_results = hyperparameter_search(
    'Linear', train_data, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

print(f"\nBest Linear hyperparameters:")
for key, value in lin_best_hyperparams.items():
    print(f"  {key}: {value}")
print(f"Best CV score: {lin_best_score:.6f}")

# Clear GPU cache before final training
if torch.cuda.is_available():
    clear_gpu_cache()

# Step 2: Train with Best Hyperparameters
print(f"\nStep 2: Training Linear with best hyperparameters...")
linear_agent = create_agent_with_hyperparams(
    'Linear', lin_best_hyperparams, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

# Train Linear with full epochs
start_time = time.time()
lin_train_losses, lin_val_losses = train_agent(
    linear_agent, train_data, val_data,
    CONFIG['num_epochs'], CONFIG['batch_size']
)
lin_train_time = time.time() - start_time

# Step 3: Evaluation
print(f"\nStep 3: Evaluating Linear...")
start_time = time.time()
lin_results = evaluate_agent(linear_agent, env, CONFIG['num_test_episodes'])
lin_test_time = time.time() - start_time

# Store results with hyperparameter information
results['algorithm'].append('Linear')
results['train_time'].append(lin_train_time)
results['test_time'].append(lin_test_time)
results['final_train_loss'].append(lin_train_losses[-1])
results['final_val_loss'].append(lin_val_losses[-1])
results['avg_mse'].append(lin_results['avg_mse'])
results['avg_reward'].append(lin_results['avg_reward'])
results['success_rate'].append(lin_results['success_rate'])
results['model_params'].append(count_parameters(linear_agent))
results['best_hyperparams'].append(lin_best_hyperparams)
results['hp_search_cv_score'].append(lin_best_score)

print(f"\nLinear Final Results:")
print(f"  Best Hyperparams: {lin_best_hyperparams}")
print(f"  HP Search CV Score: {lin_best_score:.6f}")
print(f"  Train Time: {lin_train_time:.2f}s")
print(f"  Final Train Loss: {lin_train_losses[-1]:.6f}")
print(f"  Final Val Loss: {lin_val_losses[-1]:.6f}")
print(f"  Avg MSE: {lin_results['avg_mse']:.6f}")
print(f"  Avg Reward: {lin_results['avg_reward']:.3f}")
print(f"  Success Rate: {lin_results['success_rate']:.3f}")
print(f"  Model Parameters: {count_parameters(linear_agent):,}")
print()

# Check GPU status after training
if torch.cuda.is_available():
    check_gpu_status()


In [None]:
# Save the trained Linear model (robust universal method)
# model_dir = "trained_models"
model_dir = "risky_navigation/trained_models"
os.makedirs(model_dir, exist_ok=True)
# Use one timestamp so the model file and its hyperparameter file share a suffix
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
model_path = os.path.join(model_dir, f"linear_model_{timestamp}.pt")

# Use universal save function for all agent types
save_agent_model(linear_agent, model_path)

# Also save hyperparameters for reproducibility
hyperparams_path = os.path.join(model_dir, f"linear_model_hyperparams_{timestamp}.json")
with open(hyperparams_path, 'w') as f:
    json.dump(lin_best_hyperparams, f, indent=2)
    print(f"Hyperparameters saved to {hyperparams_path}")

## VAE: Hyperparameter Tuning and Training

In [None]:
print("="*60)
print("VAE: HYPERPARAMETER TUNING & TRAINING")
print("="*60)

# Clear GPU cache before starting
if torch.cuda.is_available():
    clear_gpu_cache()
    check_gpu_status()

# Step 1: Hyperparameter Tuning
print("Step 1: Hyperparameter Tuning")
vae_best_hyperparams, vae_best_score, vae_search_results = hyperparameter_search(
    'VAE', train_data, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

print(f"\nBest VAE hyperparameters:")
for key, value in vae_best_hyperparams.items():
    print(f"  {key}: {value}")
print(f"Best CV score: {vae_best_score:.6f}")

# Clear GPU cache before final training
if torch.cuda.is_available():
    clear_gpu_cache()

# Step 2: Train with Best Hyperparameters
print(f"\nStep 2: Training VAE with best hyperparameters...")
vae_agent = create_agent_with_hyperparams(
    'VAE', vae_best_hyperparams, STATE_DIM, ACTION_DIM, GOAL_DIM, CONFIG['device']
)

# Train VAE with full epochs
start_time = time.time()
vae_train_losses, vae_val_losses = train_agent(
    vae_agent, train_data, val_data,
    CONFIG['num_epochs'], CONFIG['batch_size']
)
vae_train_time = time.time() - start_time

# Step 3: Evaluation
print(f"\nStep 3: Evaluating VAE...")
start_time = time.time()
vae_results = evaluate_agent(vae_agent, env, CONFIG['num_test_episodes'])
vae_test_time = time.time() - start_time

# Store results with hyperparameter information
results['algorithm'].append('VAE')
results['train_time'].append(vae_train_time)
results['test_time'].append(vae_test_time)
results['final_train_loss'].append(vae_train_losses[-1])
results['final_val_loss'].append(vae_val_losses[-1])
results['avg_mse'].append(vae_results['avg_mse'])
results['avg_reward'].append(vae_results['avg_reward'])
results['success_rate'].append(vae_results['success_rate'])
results['model_params'].append(count_parameters(vae_agent))
results['best_hyperparams'].append(vae_best_hyperparams)
results['hp_search_cv_score'].append(vae_best_score)

print(f"\nVAE Final Results:")
print(f"  Best Hyperparams: {vae_best_hyperparams}")
print(f"  HP Search CV Score: {vae_best_score:.6f}")
print(f"  Train Time: {vae_train_time:.2f}s")
print(f"  Final Train Loss: {vae_train_losses[-1]:.6f}")
print(f"  Final Val Loss: {vae_val_losses[-1]:.6f}")
print(f"  Avg MSE: {vae_results['avg_mse']:.6f}")
print(f"  Avg Reward: {vae_results['avg_reward']:.3f}")
print(f"  Success Rate: {vae_results['success_rate']:.3f}")
print(f"  Model Parameters: {count_parameters(vae_agent):,}")
print()

# Check GPU status after training
if torch.cuda.is_available():
    check_gpu_status()


In [None]:
# Save the trained VAE model (robust universal method)
# model_dir = "trained_models"
model_dir = "risky_navigation/trained_models"
os.makedirs(model_dir, exist_ok=True)
# Use one timestamp so the model file and its hyperparameter file share a suffix
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
model_path = os.path.join(model_dir, f"vae_model_{timestamp}.pt")

# Use universal save function for all agent types
save_agent_model(vae_agent, model_path)

# Also save hyperparameters for reproducibility
hyperparams_path = os.path.join(model_dir, f"vae_model_hyperparams_{timestamp}.json")
with open(hyperparams_path, 'w') as f:
    json.dump(vae_best_hyperparams, f, indent=2)
    print(f"Hyperparameters saved to {hyperparams_path}")

## Final Results Summary and Visualization

## Hyperparameter Tuning Summary

In [None]:
# Hyperparameter tuning summary
print("="*80)
print("HYPERPARAMETER TUNING SUMMARY")
print("="*80)

# Collect all best hyperparameters and scores
hp_summary = {
    'Algorithm': [],
    'Best_CV_Score': [],
    'Best_Hyperparameters': []
}

# Look up each algorithm's results by variable name. Checking both names before
# appending keeps the three lists the same length when an algorithm has not run yet
# (appending the name first and then hitting a NameError would leave them uneven).
hp_result_vars = {
    'AutoEncoder': ('ae_best_score', 'ae_best_hyperparams'),
    'Bayesian': ('bay_best_score', 'bay_best_hyperparams'),
    'Transformer': ('trans_best_score', 'trans_best_hyperparams'),
    'Linear': ('lin_best_score', 'lin_best_hyperparams'),
    'VAE': ('vae_best_score', 'vae_best_hyperparams'),
}

for algo_name, (score_var, params_var) in hp_result_vars.items():
    if score_var in globals() and params_var in globals():
        hp_summary['Algorithm'].append(algo_name)
        hp_summary['Best_CV_Score'].append(globals()[score_var])
        hp_summary['Best_Hyperparameters'].append(globals()[params_var])
    else:
        print(f"{algo_name} hyperparameter tuning not completed yet.")

if hp_summary['Algorithm']:
    hp_df = pd.DataFrame(hp_summary)
    
    print("\nBest Cross-Validation Scores from Hyperparameter Tuning:")
    print("-" * 60)
    for _, row in hp_df.iterrows():
        print(f"{row['Algorithm']:>12}: {row['Best_CV_Score']:.6f}")
    
    print(f"\nBest performing algorithm in hyperparameter search: {hp_df.loc[hp_df['Best_CV_Score'].idxmin(), 'Algorithm']}")
    
    print("\nDetailed Best Hyperparameters:")
    print("-" * 60)
    for _, row in hp_df.iterrows():
        print(f"\n{row['Algorithm']}:")
        for key, value in row['Best_Hyperparameters'].items():
            print(f"  {key}: {value}")

    # Create visualization of hyperparameter search results
    if len(hp_summary['Algorithm']) > 1:
        fig, ax = plt.subplots(figsize=(10, 6))
        bars = ax.bar(hp_df['Algorithm'], hp_df['Best_CV_Score'], 
                     color=sns.color_palette("husl", len(hp_df)))
        ax.set_title('Best Cross-Validation Scores from Hyperparameter Tuning', 
                    fontsize=14, fontweight='bold')
        ax.set_ylabel('CV Score (lower is better)')
        ax.set_xlabel('Algorithm')
        
        # Add value labels on bars
        for bar, score in zip(bars, hp_df['Best_CV_Score']):
            height = bar.get_height()
            ax.text(bar.get_x() + bar.get_width()/2., height,
                   f'{score:.4f}', ha='center', va='bottom')
        
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
else:
    print("No hyperparameter tuning results available yet.")

print("\n" + "="*80)

In [None]:
# Create results DataFrame
df_results = pd.DataFrame(results)

# Display results table
print("="*80)
print("ALGORITHM COMPARISON RESULTS")
print("="*80)
print(df_results.round(6))
print()

# Create summary statistics
print("SUMMARY STATISTICS:")
print("-" * 40)
print(f"Best Average Reward: {df_results['algorithm'][df_results['avg_reward'].idxmax()]} ({df_results['avg_reward'].max():.3f})")
print(f"Lowest MSE: {df_results['algorithm'][df_results['avg_mse'].idxmin()]} ({df_results['avg_mse'].min():.6f})")
print(f"Highest Success Rate: {df_results['algorithm'][df_results['success_rate'].idxmax()]} ({df_results['success_rate'].max():.3f})")
print(f"Fastest Training: {df_results['algorithm'][df_results['train_time'].idxmin()]} ({df_results['train_time'].min():.2f}s)")
print(f"Lowest Validation Loss: {df_results['algorithm'][df_results['final_val_loss'].idxmin()]} ({df_results['final_val_loss'].min():.6f})")
print()

## Performance Visualizations

In [None]:
# Create comprehensive visualization
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Algorithm Comparison Results', fontsize=16, fontweight='bold')

# 1. Average Reward Comparison
axes[0, 0].bar(df_results['algorithm'], df_results['avg_reward'], color=sns.color_palette("husl", len(df_results)))
axes[0, 0].set_title('Average Reward per Episode')
axes[0, 0].set_ylabel('Reward')
axes[0, 0].tick_params(axis='x', rotation=45)

# 2. MSE Comparison
axes[0, 1].bar(df_results['algorithm'], df_results['avg_mse'], color=sns.color_palette("husl", len(df_results)))
axes[0, 1].set_title('Average Mean Squared Error')
axes[0, 1].set_ylabel('MSE')
axes[0, 1].tick_params(axis='x', rotation=45)
axes[0, 1].set_yscale('log')  # Log scale for better visibility

# 3. Success Rate Comparison
axes[0, 2].bar(df_results['algorithm'], df_results['success_rate'], color=sns.color_palette("husl", len(df_results)))
axes[0, 2].set_title('Success Rate')
axes[0, 2].set_ylabel('Success Rate')
axes[0, 2].tick_params(axis='x', rotation=45)
axes[0, 2].set_ylim(0, 1)

# 4. Training Time Comparison
axes[1, 0].bar(df_results['algorithm'], df_results['train_time'], color=sns.color_palette("husl", len(df_results)))
axes[1, 0].set_title('Training Time')
axes[1, 0].set_ylabel('Time (seconds)')
axes[1, 0].tick_params(axis='x', rotation=45)

# 5. Model Parameters Comparison
axes[1, 1].bar(df_results['algorithm'], df_results['model_params'], color=sns.color_palette("husl", len(df_results)))
axes[1, 1].set_title('Model Parameters')
axes[1, 1].set_ylabel('Number of Parameters')
axes[1, 1].tick_params(axis='x', rotation=45)
axes[1, 1].set_yscale('log')  # Log scale for better visibility

# 6. Final Validation Loss Comparison
axes[1, 2].bar(df_results['algorithm'], df_results['final_val_loss'], color=sns.color_palette("husl", len(df_results)))
axes[1, 2].set_title('Final Validation Loss')
axes[1, 2].set_ylabel('Validation Loss')
axes[1, 2].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

## Detailed Performance Radar Chart

In [None]:
# Create radar chart for comprehensive comparison
def create_radar_chart(df):
    # Normalize metrics for radar chart (0-1 scale)
    normalized_df = df.copy()
    
    def minmax(col, invert=False):
        """Scale a column to [0, 1]; a constant column maps to 0.5 instead of dividing by zero."""
        span = col.max() - col.min()
        if span == 0:
            return pd.Series(0.5, index=col.index)
        scaled = (col - col.min()) / span
        return 1 - scaled if invert else scaled
    
    # For metrics where lower is better (MSE, train_time, val_loss), invert them
    normalized_df['norm_mse'] = minmax(df['avg_mse'], invert=True)
    normalized_df['norm_train_time'] = minmax(df['train_time'], invert=True)
    normalized_df['norm_val_loss'] = minmax(df['final_val_loss'], invert=True)
    
    # For metrics where higher is better, normalize directly
    normalized_df['norm_reward'] = minmax(df['avg_reward'])
    normalized_df['norm_success'] = df['success_rate']  # Already 0-1
    
    # Parameters normalized (smaller models get higher scores)
    normalized_df['norm_params'] = minmax(df['model_params'], invert=True)
    
    # Metrics for radar chart
    metrics = ['norm_reward', 'norm_mse', 'norm_success', 'norm_train_time', 'norm_val_loss', 'norm_params']
    metric_labels = ['Reward', 'MSE (inv)', 'Success Rate', 'Train Time (inv)', 'Val Loss (inv)', 'Model Size (inv)']
    
    # Number of metrics
    N = len(metrics)
    
    # Compute angles for each metric
    angles = [n / float(N) * 2 * np.pi for n in range(N)]
    angles += angles[:1]  # Complete the circle
    
    # Create radar chart
    fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(projection='polar'))
    
    colors = sns.color_palette("husl", len(normalized_df))
    
    for i, (idx, row) in enumerate(normalized_df.iterrows()):
        values = [row[metric] for metric in metrics]
        values += values[:1]  # Complete the circle
        
        ax.plot(angles, values, 'o-', linewidth=2, label=row['algorithm'], color=colors[i])
        ax.fill(angles, values, alpha=0.25, color=colors[i])
    
    # Add labels
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(metric_labels)
    ax.set_ylim(0, 1)
    ax.set_title('Algorithm Performance Radar Chart\n(Higher values = better performance)', size=14, fontweight='bold', pad=20)
    ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0))
    ax.grid(True)
    
    plt.tight_layout()
    plt.show()

create_radar_chart(df_results)
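
The radar chart rescales every metric to the range 0 to 1 and flips the lower-is-better metrics so that a larger value always means better. The sketch below shows that scoring rule on made-up numbers in plain Python. The helper `minmax_score` is hypothetical; it maps a constant metric to 0.5, a choice made here to avoid dividing by zero when the maximum equals the minimum.

```python
def minmax_score(values, lower_is_better=False):
    """Scale values to [0, 1], flipping them when lower raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All algorithms tie on this metric; give everyone a neutral score
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled

# Made-up training times in seconds for three algorithms
train_times = [10.0, 30.0, 20.0]
print(minmax_score(train_times, lower_is_better=True))  # [1.0, 0.0, 0.5]

# A metric on which every algorithm ties
print(minmax_score([0.2, 0.2, 0.2]))  # [0.5, 0.5, 0.5]
```

Because the scores are relative to the algorithms being compared, adding or removing an algorithm changes every score on the chart.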

## Export Results to CSV

In [None]:
# Save results to CSV with hyperparameter information
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
csv_filename = f"algorithm_comparison_results_{timestamp}.csv"

# Create a more detailed DataFrame for export
export_df = pd.DataFrame(results)

# Convert hyperparameters to string format for CSV
export_df['best_hyperparams_str'] = export_df['best_hyperparams'].apply(lambda x: str(x) if x else 'N/A')

# Save main results
export_df.to_csv(csv_filename, index=False)
print(f"Results saved to: {csv_filename}")

# Create a detailed summary report
summary_report = f"""
ALGORITHM COMPARISON SUMMARY REPORT (WITH HYPERPARAMETER TUNING)
Generated on: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
{'='*80}

EXPERIMENTAL SETUP:
- Number of training episodes: {CONFIG['num_episodes']}
- Training epochs per algorithm: {CONFIG['num_epochs']}
- Batch size: {CONFIG['batch_size']}
- Test episodes: {CONFIG['num_test_episodes']}
- Device: {CONFIG['device']}

HYPERPARAMETER TUNING SETUP:
- Search method: {CONFIG['search_method']}
- Number of trials per algorithm: {CONFIG['n_trials']}
- Cross-validation folds: {CONFIG['cv_folds']}
- Early stopping patience: {CONFIG['patience']}
- HP search epochs: {CONFIG['hp_epochs']}

RESULTS RANKING:

1. Best Overall Performance (Average Reward):
   {export_df.loc[export_df['avg_reward'].idxmax(), 'algorithm']} - {export_df['avg_reward'].max():.3f}
   Best hyperparams: {export_df.loc[export_df['avg_reward'].idxmax(), 'best_hyperparams']}

2. Most Accurate Predictions (Lowest MSE):
   {export_df.loc[export_df['avg_mse'].idxmin(), 'algorithm']} - {export_df['avg_mse'].min():.6f}
   Best hyperparams: {export_df.loc[export_df['avg_mse'].idxmin(), 'best_hyperparams']}

3. Highest Success Rate:
   {export_df.loc[export_df['success_rate'].idxmax(), 'algorithm']} - {export_df['success_rate'].max():.3f}
   Best hyperparams: {export_df.loc[export_df['success_rate'].idxmax(), 'best_hyperparams']}

4. Fastest Training:
   {export_df.loc[export_df['train_time'].idxmin(), 'algorithm']} - {export_df['train_time'].min():.2f}s
   Best hyperparams: {export_df.loc[export_df['train_time'].idxmin(), 'best_hyperparams']}

5. Best Validation Performance:
   {export_df.loc[export_df['final_val_loss'].idxmin(), 'algorithm']} - {export_df['final_val_loss'].min():.6f}
   Best hyperparams: {export_df.loc[export_df['final_val_loss'].idxmin(), 'best_hyperparams']}

6. Best Cross-Validation Score (from hyperparameter search):
   {export_df.loc[export_df['hp_search_cv_score'].idxmin(), 'algorithm']} - {export_df['hp_search_cv_score'].min():.6f}

DETAILED RESULTS:
{export_df[['algorithm', 'avg_reward', 'avg_mse', 'success_rate', 'train_time', 'hp_search_cv_score']].to_string(index=False)}

HYPERPARAMETER DETAILS:
"""

for _, row in export_df.iterrows():
    summary_report += f"""
{row['algorithm']}:
  Best hyperparameters: {row['best_hyperparams']}
  HP search CV score: {row['hp_search_cv_score']:.6f}
  Final validation loss: {row['final_val_loss']:.6f}
  Model parameters: {row['model_params']:,}
"""

summary_report += f"""

RECOMMENDATIONS:
- For real-time applications: Choose the algorithm with fastest training/inference
- For accuracy-critical tasks: Choose the algorithm with lowest MSE
- For exploration tasks: Choose the algorithm with highest success rate
- For resource-constrained environments: Choose the algorithm with fewest parameters
- For robust performance: Consider the algorithm with best cross-validation score

METHODOLOGY NOTES:
- All algorithms underwent {CONFIG['cv_folds']}-fold cross-validation hyperparameter tuning
- Best hyperparameters were selected based on validation loss minimization
- Final models were trained with full epochs using the best hyperparameters
- Tuning every algorithm before evaluation is intended to make the comparison fair; results still depend on the search budget and the chosen search space
"""

# Save summary report
summary_filename = f"algorithm_comparison_summary_{timestamp}.txt"
with open(summary_filename, 'w') as f:
    f.write(summary_report)

print(f"Summary report saved to: {summary_filename}")
print("\nSummary Report:")
print(summary_report)