# 2D CNN for Parkinson's Detection (Baseline with Augmentation + Optuna)
## Baseline Model - Train/Val/Test Split + Hyperparameter Optimization

This notebook trains a **simple CNN2D model** (no Domain Adaptation) for binary Parkinson vs. Healthy classification, **using data augmentation** and **automatic hyperparameter optimization with Optuna**.

### Pipeline:
1. **Setup**: Environment configuration
2. **Data Loading**: Load the data WITH augmentation
3. **Split**: Stratified Train/Val/Test (70/15/15)
4. **Optuna Optimization**: Automatic hyperparameter optimization (30 trials, as set by `n_trials` in `OPTUNA_CONFIG`)
5. **Final Training**: Retraining with the best hyperparameters + early stopping
6. **Evaluation**: Full metrics on the test set
7. **Visualization**: Progress and result plots
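
The stratified 70/15/15 split in step 3 can be sketched with the standard library alone. This is a minimal illustration, not the notebook's own split code: `stratified_split` and the class counts are hypothetical, and it splits per sample, whereas a speaker-level split (so that no speaker appears in two partitions) may be preferable for voice data.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train, val, test) index lists that keep per-class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)
        n_val = round(len(indices) * val_frac)
        test.extend(indices[:n_test])
        val.extend(indices[n_test:n_test + n_val])
        train.extend(indices[n_test + n_val:])
    return train, val, test


labels = [0] * 100 + [1] * 60  # hypothetical class counts
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Each class contributes 15% of its samples to validation and 15% to test, so the class ratio is the same in all three partitions.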

### Architecture:
This model uses the **same Feature Extractor** as CNN2D_DA (Ibarra 2023 architecture) but **without Domain Adaptation**:
- 2 blocks of Conv2D → BN → ReLU → MaxPool(3×3) → Dropout
- PD classification head only (no GRL and no domain head)
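
To see what the two blocks do to the input size, the arithmetic below tracks the spatial dimensions of a 65×41 spectrogram (the shape used by the notebook's smoke tests). It is a sketch under stated assumptions: the convolutions preserve height and width ("same" padding, stride 1), and each 3×3 max-pool uses stride 3 without padding, which is PyTorch's default for `MaxPool2d(3)`. The actual `CNN2D` class may use different settings.

```python
def pool_out(size, kernel=3, stride=3):
    """Output length of a max-pool without padding (floor mode)."""
    return (size - kernel) // stride + 1


def feature_map_shape(height, width, n_blocks=2):
    """Spatial size after n_blocks of [same-padded conv -> 3x3 max-pool]."""
    for _ in range(n_blocks):
        # Assumption: the convolution keeps H and W unchanged,
        # so only the pooling layer shrinks the feature map.
        height, width = pool_out(height), pool_out(width)
    return height, width


h, w = feature_map_shape(65, 41)
filters_2 = 64  # one of the values in the notebook's search space
flattened = filters_2 * h * w
print((h, w), flattened)
```

Under these assumptions the classifier head would receive `filters_2 * 7 * 4` features per sample.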

### Data Augmentation:
- Pitch shifting
- Time stretching
- Noise injection
- SpecAugment (frequency/time masks)
- Factor: ~5x more data
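
The masking idea behind SpecAugment can be illustrated on a plain nested list. The sketch below is not the project's augmentation code: `mask_spectrogram`, its mask widths, and the choice of zero as the fill value are assumptions made for illustration.

```python
import random


def mask_spectrogram(spec, max_freq_width=8, max_time_width=6, seed=None):
    """Return a copy of spec (rows = frequency bins, columns = time frames)
    with one random frequency band and one random time band set to zero."""
    rng = random.Random(seed)
    n_freq, n_time = len(spec), len(spec[0])
    masked = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency mask: zero out f_width consecutive rows.
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        masked[f] = [0.0] * n_time

    # Time mask: zero out t_width consecutive columns in every row.
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in masked:
        for t in range(t_start, t_start + t_width):
            row[t] = 0.0
    return masked


spec = [[1.0] * 41 for _ in range(65)]  # dummy 65x41 spectrogram
masked = mask_spectrogram(spec, seed=0)
n_zeros = sum(v == 0.0 for row in masked for v in row)
print(n_zeros)
```

Applying a fresh random mask each time a sample is drawn is one way such augmentation can multiply the effective amount of training data.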

### Comparison:
- **This notebook**: CNN2D model with augmentation (improves generalization)
- **cnn_da_training.ipynb**: CNN2D_DA model without augmentation (exact paper setup)
- Augmentation makes it possible to train on more data and improve robustness


In [None]:
# ============================================================
# GOOGLE COLAB SETUP
# ============================================================
# RUN THIS CELL ONLY ON GOOGLE COLAB (skip it when running locally)

from google.colab import drive
drive.mount("/content/drive")

import os, sys, subprocess

# Settings - ADJUST THESE VALUES IF NEEDED
COMPUTER_NAME = "ZenBook"
PROJECT_DIR = "parkinson-voice-uncertainty"
BRANCH = "feature/feature/firstTraining"

BASE = "/content/drive/Othercomputers"
PROJ = os.path.join(BASE, COMPUTER_NAME, PROJECT_DIR)

# Helper: run a command, echo its combined stdout/stderr, and return the exit code
def sh(*args, check=False):
    print("$", " ".join(args))
    res = subprocess.run(args, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(res.stdout)
    if check and res.returncode != 0:
        raise RuntimeError(f"Command failed with exit code {res.returncode}: {' '.join(args)}")
    return res.returncode

# Sanity checks
assert os.path.isdir(os.path.join(BASE, COMPUTER_NAME)), f"Cannot find {COMPUTER_NAME} in {BASE}"
assert os.path.isdir(PROJ), f"Cannot find the repo at: {PROJ}"

# Add the repo to the import path
if PROJ not in sys.path:
    sys.path.insert(0, PROJ)

# Configure Git
sh("git", "config", "--global", "--add", "safe.directory", PROJ)
sh("git", "-C", PROJ, "fetch", "--all", "--prune")
sh("git", "-C", PROJ, "branch", "--show-current")

# Switch to the branch (create a local tracking branch if it does not exist yet)
rc = sh("git", "-C", PROJ, "checkout", BRANCH)
if rc != 0:
    sh("git", "-C", PROJ, "checkout", "-b", BRANCH, f"origin/{BRANCH}")

# Update
sh("git", "-C", PROJ, "pull", "origin", BRANCH)

# Install dependencies
req = os.path.join(PROJ, "requirements.txt")
if os.path.exists(req):
    os.chdir("/content")
    print("Installing dependencies...")
    # Install critical dependencies first
    sh("python", "-m", "pip", "install", "-q", "optuna>=3.0.0")
    sh("python", "-m", "pip", "install", "-q", "torch>=1.9.0")
    sh("python", "-m", "pip", "install", "-q", "torchvision>=0.10.0")
    sh("python", "-m", "pip", "install", "-q", "scikit-learn>=1.0.0")
    sh("python", "-m", "pip", "install", "-q", "librosa>=0.8.1")
    sh("python", "-m", "pip", "install", "-q", "soundfile>=0.10.3")
    # Install the rest and report the real outcome instead of assuming success
    rc = sh("python", "-m", "pip", "install", "-q", "-r", req)
    if rc == 0:
        print("Dependencies installed successfully")
    else:
        print("⚠️  pip reported errors while installing requirements.txt (see output above)")
else:
    print("⚠️  requirements.txt not found, installing basic dependencies...")
    sh("python", "-m", "pip", "install", "-q", "optuna>=3.0.0", "torch>=1.9.0", "scikit-learn>=1.0.0")

os.chdir(PROJ)

# Check that optuna is available
try:
    import optuna
    print(f"✅ Optuna {optuna.__version__} available")
except ImportError as e:
    print(f"❌ Error importing optuna: {e}")
    print("Reinstalling optuna...")
    sh("python", "-m", "pip", "install", "--force-reinstall", "optuna>=3.0.0")

# Autoreload
try:
    get_ipython().run_line_magic("load_ext", "autoreload")
    get_ipython().run_line_magic("autoreload", "2")
    print("Autoreload enabled")
except Exception as e:
    print(f"Autoreload was not enabled: {e}")

print(f"Repo ready at: {PROJ}")
sh("git", "-C", PROJ, "branch", "--show-current")


Mounted at /content/drive
$ git config --global --add safe.directory /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty

$ git -C /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty fetch --all --prune
Fetching origin

$ git -C /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty branch --show-current
feature/feature/firstTraining

$ git -C /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty checkout feature/feature/firstTraining


In [None]:
# ============================================================
# CRITICAL DEPENDENCY CHECK
# ============================================================

import subprocess  # needed below when this cell runs without the Colab setup cell

print("="*70)
print("CHECKING CRITICAL DEPENDENCIES")
print("="*70)

# Modules to check: (import name, display name)
critical_imports = [
    ("torch", "PyTorch"),
    ("optuna", "Optuna"),
    ("sklearn", "Scikit-learn"),
    ("numpy", "NumPy"),
    ("pandas", "Pandas"),
    ("matplotlib", "Matplotlib"),
    ("librosa", "Librosa"),
    ("soundfile", "SoundFile")
]

# Pinned pip requirement per import name (anything else is installed by module name)
pip_specs = {
    "torch": "torch>=1.9.0",
    "optuna": "optuna>=3.0.0",
    "sklearn": "scikit-learn>=1.0.0",
    "librosa": "librosa>=0.8.1",
    "soundfile": "soundfile>=0.10.3",
}

missing_imports = []
for module, name in critical_imports:
    try:
        __import__(module)
        print(f"✅ {name}: OK")
    except ImportError as e:
        print(f"❌ {name}: MISSING - {e}")
        missing_imports.append((module, name))

if missing_imports:
    print(f"\n⚠️  {len(missing_imports)} critical dependencies are missing:")
    for module, name in missing_imports:
        print(f"   - {name} ({module})")
    print("\n🔧 Reinstalling missing dependencies...")

    # Reinstall each missing dependency
    for module, name in missing_imports:
        spec = pip_specs.get(module, module)
        subprocess.run(["python", "-m", "pip", "install", "--force-reinstall", spec], check=False)

    print("✅ Reinstallation finished")
else:
    print("\n🎉 All critical dependencies are available")

print("="*70)


In [None]:
# ============================================================
# PROJECT MODULE CHECK
# ============================================================

print("="*70)
print("CHECKING PROJECT MODULES")
print("="*70)

# Check that the project's own modules can be imported
try:
    from modules.core.cnn2d_optuna_wrapper import CNN2DOptunaWrapper, optimize_cnn2d
    print("✅ CNN2D Optuna Wrapper: OK")
except ImportError as e:
    print(f"❌ CNN2D Optuna Wrapper: ERROR - {e}")
    print("Checking project structure...")
    import os
    print(f"Current directory: {os.getcwd()}")
    print(f"Contents: {os.listdir('.')}")
    if os.path.exists('modules'):
        print("✅ modules folder found")
        print(f"Contents of modules: {os.listdir('modules')}")
        if os.path.exists('modules/core'):
            print("✅ modules/core folder found")
            print(f"Contents of modules/core: {os.listdir('modules/core')}")
        else:
            print("❌ modules/core folder not found")
    else:
        print("❌ modules folder not found")

try:
    from modules.core.optuna_optimization import OptunaOptimizer
    print("✅ Optuna Optimizer: OK")
except ImportError as e:
    print(f"❌ Optuna Optimizer: ERROR - {e}")

try:
    from modules.models.cnn2d.model import CNN2D
    print("✅ CNN2D Model: OK")
except ImportError as e:
    print(f"❌ CNN2D Model: ERROR - {e}")

# Check that optuna itself works
try:
    import optuna
    print(f"✅ Optuna {optuna.__version__}: OK")

    # Create a test trial
    study = optuna.create_study(direction='maximize')
    trial = study.ask()
    print("✅ Optuna Trial: OK")

except Exception as e:
    print(f"❌ Optuna: ERROR - {e}")

print("="*70)


In [None]:
# ============================================================
# FALLBACK FOR IMPORT PROBLEMS
# ============================================================

print("="*70)
print("SETTING UP FALLBACK")
print("="*70)

# If the wrapper module fails to import, define a simplified version
try:
    from modules.core.cnn2d_optuna_wrapper import optimize_cnn2d
    print("✅ Main module available")
    USE_BACKUP = False
except ImportError as e:
    print(f"⚠️  Problem with main module: {e}")
    print("🔧 Setting up fallback...")
    USE_BACKUP = True

    # Simplified version of optimize_cnn2d
    def optimize_cnn2d_backup(X_train, y_train, X_val, y_val, input_shape, n_trials=30, n_epochs_per_trial=20, device="cpu", save_dir=None):
        """
        Fallback version of optimize_cnn2d that does not need the Optuna wrapper modules.
        It still requires modules.models.cnn2d.model.CNN2D.
        """
        import optuna
        import torch
        import torch.nn as nn
        import torch.optim as optim
        from torch.utils.data import DataLoader, TensorDataset
        from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score
        import pandas as pd
        import json
        from pathlib import Path
        
        # Import CNN2D directly
        from modules.models.cnn2d.model import CNN2D

        def objective(trial):
            # Hyperparameters
            filters_1 = trial.suggest_categorical("filters_1", [16, 32, 64])
            filters_2 = trial.suggest_categorical("filters_2", [32, 64, 128])
            kernel_size_1 = trial.suggest_categorical("kernel_size_1", [3, 5])
            kernel_size_2 = trial.suggest_categorical("kernel_size_2", [3, 5])
            p_drop_conv = trial.suggest_float("p_drop_conv", 0.2, 0.5)
            p_drop_fc = trial.suggest_float("p_drop_fc", 0.3, 0.6)
            dense_units = trial.suggest_categorical("dense_units", [32, 64, 128])
            learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
            weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True)
            optimizer_name = trial.suggest_categorical("optimizer", ["adam", "sgd"])
            batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])

            # Build the model
            model = CNN2D(
                input_shape=input_shape[1:],  # (H, W) without the channel dimension
                filters_1=filters_1,
                filters_2=filters_2,
                kernel_size_1=kernel_size_1,
                kernel_size_2=kernel_size_2,
                p_drop_conv=p_drop_conv,
                p_drop_fc=p_drop_fc,
                dense_units=dense_units,
            ).to(device)

            # Build the DataLoaders
            train_dataset = TensorDataset(X_train, y_train)
            val_dataset = TensorDataset(X_val, y_val)
            train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
            val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

            # Optimizer
            if optimizer_name == "adam":
                optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
            else:  # sgd
                optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay, momentum=0.9)

            criterion = nn.CrossEntropyLoss()

            # Training loop (tracks the best validation F1 across epochs)
            best_f1 = 0.0
            for epoch in range(n_epochs_per_trial):
                # Training
                model.train()
                for batch_x, batch_y in train_loader:
                    batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                    optimizer.zero_grad()
                    outputs = model(batch_x)
                    loss = criterion(outputs, batch_y)
                    loss.backward()
                    optimizer.step()
                
                # Validation
                model.eval()
                val_preds = []
                val_labels = []
                with torch.no_grad():
                    for batch_x, batch_y in val_loader:
                        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                        outputs = model(batch_x)
                        predicted = outputs.argmax(dim=1)
                        val_preds.extend(predicted.cpu().numpy())
                        val_labels.extend(batch_y.cpu().numpy())

                f1 = f1_score(val_labels, val_preds, average="macro")
                if f1 > best_f1:
                    best_f1 = f1

                # Report to Optuna so the pruner can stop unpromising trials
                trial.report(f1, epoch)
                if trial.should_prune():
                    raise optuna.TrialPruned()

            return best_f1

        # Create the study
        study = optuna.create_study(direction='maximize', pruner=optuna.pruners.MedianPruner())
        study.optimize(objective, n_trials=n_trials)

        # Collect results
        results_df = pd.DataFrame([
            {
                'trial': trial.number,
                'value': trial.value,
                'params': trial.params,
                'f1': trial.value,
                'accuracy': float("nan"),   # Placeholder: not computed by this fallback
                'precision': float("nan"),  # Placeholder: not computed by this fallback
                'recall': float("nan"),     # Placeholder: not computed by this fallback
                **trial.params
            }
            for trial in study.trials
        ])

        results = {
            "best_params": study.best_params,
            "best_value": study.best_value,
            "best_trial": study.best_trial.number,
            "results_df": results_df,
            "analysis": {"best_trial": study.best_trial}
        }

        # Save if a directory was given
        if save_dir:
            save_path = Path(save_dir)
            save_path.mkdir(parents=True, exist_ok=True)
            results_df.to_csv(save_path / "optuna_trials_results.csv", index=False)
            with open(save_path / "best_params.json", 'w') as f:
                json.dump(study.best_params, f, indent=2)

        return results

    # Replace the original function
    optimize_cnn2d = optimize_cnn2d_backup
    print("✅ Fallback in place")

print("="*70)
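
The fallback above passes `optuna.pruners.MedianPruner()` to the study. The rule that such a pruner applies can be sketched in a few lines. This is a simplified illustration, not Optuna's implementation: `should_prune` and `history` are hypothetical names, and details such as warm-up steps are left out. The `n_startup_trials` guard mirrors the Optuna parameter of the same name.

```python
from statistics import median


def should_prune(current_value, step, history, n_startup_trials=5):
    """Decide whether to stop a trial early (higher values are better).

    history holds one list of intermediate values per earlier trial.
    """
    # Values that earlier trials reported at the same step.
    peers = [values[step] for values in history if len(values) > step]
    if len(peers) < n_startup_trials:
        return False  # not enough earlier trials to compare against
    return current_value < median(peers)


history = [
    [0.50, 0.60],
    [0.55, 0.65],
    [0.40, 0.45],
    [0.60, 0.70],
    [0.52, 0.58],
]
prune_weak = should_prune(0.50, step=1, history=history)
prune_strong = should_prune(0.66, step=1, history=history)
print(prune_weak, prune_strong)
```

A trial whose validation F1 at a given epoch falls below the median of its predecessors is stopped, which saves the remaining epochs of that trial.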


In [None]:
# ============================================================
# QUICK OPTUNA CHECK (1 TRIAL)
# ============================================================

print("="*70)
print("QUICK OPTUNA CHECK - 1 TRIAL")
print("="*70)

# Import the required dependencies
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score
import pandas as pd
import json
from pathlib import Path

# Select the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"🔧 Device: {device}")

# Create a small synthetic dataset to check that everything runs
print("🔧 Creating test data...")

# Synthetic data (tiny, for a quick check only)
X_test_small = torch.randn(20, 1, 65, 41)  # 20 samples
y_test_small = torch.randint(0, 2, (20,))  # Random labels

# Train/val split of the synthetic data
X_train_test = X_test_small[:15]
y_train_test = y_test_small[:15]
X_val_test = X_test_small[15:]
y_val_test = y_test_small[15:]

print("✅ Test data created:")
print(f"   - Train: {X_train_test.shape} (labels: {y_train_test.shape})")
print(f"   - Val:   {X_val_test.shape} (labels: {y_val_test.shape})")

# Try optimize_cnn2d with 1 trial
print("\n🧪 Testing optimize_cnn2d with 1 trial...")
print("   (This should take less than 1 minute)")

try:
    # Run the check with 1 trial and 2 epochs
    test_results = optimize_cnn2d(
        X_train=X_train_test,
        y_train=y_train_test,
        X_val=X_val_test,
        y_val=y_val_test,
        input_shape=(1, 65, 41),
        n_trials=1,  # Only 1 trial
        n_epochs_per_trial=2,  # Only 2 epochs
        device=device,
        save_dir=None  # Do not save
    )

    print("✅ TEST PASSED!")
    print(f"   - Best F1: {test_results['best_value']:.4f}")
    print(f"   - Best params: {test_results['best_params']}")
    print("   - optimize_cnn2d works correctly")

except Exception as e:
    print("❌ TEST FAILED:")
    print(f"   - Error: {e}")
    print(f"   - Type: {type(e).__name__}")
    import traceback
    print("   - Full traceback:")
    traceback.print_exc()

    print("\n🔧 Possible fixes:")
    print("   1. Check that optuna is installed correctly")
    print("   2. Check that the project modules are available")
    print("   3. Review the environment configuration")

print("="*70)


In [None]:
# ============================================================
# ALTERNATIVE CHECK - VERIFY IMPORTS AND FUNCTIONS
# ============================================================

print("="*70)
print("DETAILED CHECK OF IMPORTS AND FUNCTIONS")
print("="*70)

# Check that every import works
print("🔍 Checking imports...")

try:
    import optuna
    print(f"✅ optuna {optuna.__version__}")
except ImportError as e:
    print(f"❌ optuna: {e}")

try:
    import torch
    print(f"✅ torch {torch.__version__}")
except ImportError as e:
    print(f"❌ torch: {e}")

try:
    from modules.models.cnn2d.model import CNN2D
    print("✅ CNN2D model")
except ImportError as e:
    print(f"❌ CNN2D model: {e}")

# Check that optimize_cnn2d is available
print("\n🔍 Checking optimize_cnn2d...")
try:
    # Check that the function is defined
    if 'optimize_cnn2d' in globals():
        print("✅ optimize_cnn2d is available")
        print(f"   - Type: {type(optimize_cnn2d)}")
    else:
        print("❌ optimize_cnn2d is not available")
except Exception as e:
    print(f"❌ Error checking optimize_cnn2d: {e}")

# Check the Optuna setup
print("\n🔍 Checking Optuna setup...")
try:
    # Create a test study
    study = optuna.create_study(direction='maximize')
    print("✅ Optuna study created")

    # Create a test trial
    trial = study.ask()
    print("✅ Optuna trial created")

    # Check that TrialPruned is available
    if hasattr(optuna, 'TrialPruned'):
        print("✅ optuna.TrialPruned available")
    else:
        print("❌ optuna.TrialPruned not available")

except Exception as e:
    print(f"❌ Error with Optuna: {e}")

print("="*70)


In [None]:
# ============================================================
# FINAL CHECK - OPTUNA WITH MINIMAL DATA
# ============================================================

print("="*70)
print("FINAL CHECK - OPTUNA WITH MINIMAL DATA")
print("="*70)

# Create minimal data for the check
print("🔧 Creating minimal data...")
X_mini = torch.randn(10, 1, 65, 41)  # 10 samples
y_mini = torch.randint(0, 2, (10,))  # Random labels

# Minimal split
X_train_mini = X_mini[:7]
y_train_mini = y_mini[:7]
X_val_mini = X_mini[7:]
y_val_mini = y_mini[7:]

print("✅ Minimal data created:")
print(f"   - Train: {X_train_mini.shape}")
print(f"   - Val:   {X_val_mini.shape}")

# Simplified test function
def test_optuna_simple():
    """Very simple Optuna smoke test."""
    import optuna

    def objective(trial):
        # Simple hyperparameters
        lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
        batch_size = trial.suggest_categorical("batch_size", [2, 4])

        # Build a small model
        from modules.models.cnn2d.model import CNN2D
        model = CNN2D(
            input_shape=(65, 41),
            filters_1=16,
            filters_2=32,
            kernel_size_1=3,
            kernel_size_2=3,
            p_drop_conv=0.2,
            p_drop_fc=0.3,
            dense_units=32,
        ).to(device)
        
        # Simple DataLoaders
        train_dataset = TensorDataset(X_train_mini, y_train_mini)
        val_dataset = TensorDataset(X_val_mini, y_val_mini)
        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

        # Optimizer
        optimizer = optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()

        # Minimal training (1 epoch)
        model.train()
        for batch_x, batch_y in train_loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
        
        # Validation
        model.eval()
        val_preds = []
        val_labels = []
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                outputs = model(batch_x)
                predicted = outputs.argmax(dim=1)
                val_preds.extend(predicted.cpu().numpy())
                val_labels.extend(batch_y.cpu().numpy())

        # Simple metric
        f1 = f1_score(val_labels, val_preds, average="macro")
        return f1

    # Create the study
    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=1)
    
    return study.best_value, study.best_params

# Run the check
print("\n🧪 Running simple check...")
try:
    best_value, best_params = test_optuna_simple()
    print("✅ TEST PASSED!")
    print(f"   - Best F1: {best_value:.4f}")
    print(f"   - Best params: {best_params}")
    print("   - Optuna works correctly")

except Exception as e:
    print("❌ TEST FAILED:")
    print(f"   - Error: {e}")
    print(f"   - Type: {type(e).__name__}")
    import traceback
    print("   - Traceback:")
    traceback.print_exc()

print("="*70)
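
The checks above score each trial with macro-averaged F1. For reference, the metric can be written out by hand. The function below is a sketch for the two-class case with a fixed class list, not a replacement for `sklearn.metrics.f1_score`; the labels are made up for illustration.

```python
def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [0, 0, 1, 1, 1]
mixed = macro_f1(y_true, [0, 1, 1, 1, 1])      # one Healthy sample misclassified
all_ones = macro_f1(y_true, [1, 1, 1, 1, 1])   # model always predicts Parkinson
print(round(mixed, 4), round(all_ones, 4))
```

With only three to five validation samples, as in the smoke tests, a single prediction moves the score substantially, so those runs verify that the pipeline executes rather than how well the model performs.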


In [None]:
# ============================================================
# SET UP ENVIRONMENT AND DEPENDENCIES
# ============================================================

import sys
from pathlib import Path

# Add the project root directory to the path
# The notebook lives in research/, but modules/ is in the root directory
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# Import the centralized dependency manager
from modules.core.dependency_manager import setup_notebook_environment

# Set up the environment automatically
# This checks and installs all required dependencies
success = setup_notebook_environment(auto_install=True, verbose=True)

if not success:
    print("Error setting up the environment")
    print("Try installing manually: pip install -r requirements.txt")
    # Raise instead of sys.exit so the notebook stops with a clear error
    raise RuntimeError("Environment setup failed")

print("="*70)


In [None]:
# ============================================================
# FULL EXPERIMENT CONFIGURATION (IBARRA 2023 PAPER)
# ============================================================

print("="*70)
print("EXPERIMENT CONFIGURATION - IBARRA 2023 PAPER")
print("="*70)

# ============================================================
# OPTIMIZER CONFIGURATION (SGD as in the paper)
# ============================================================
OPTIMIZER_CONFIG = {
    "type": "SGD",
    "learning_rate": 0.1,
    "momentum": 0.9,
    "weight_decay": 1e-4,  # Changed from 0.0 to 1e-4 for regularization
    "nesterov": True  # Added Nesterov momentum for better convergence
}

# ============================================================
# SCHEDULER CONFIGURATION (StepLR as in the paper)
# ============================================================
SCHEDULER_CONFIG = {
    "type": "StepLR",
    "step_size": 10,
    "gamma": 0.1
}

# ============================================================
# K-FOLD CROSS-VALIDATION CONFIGURATION
# ============================================================
KFOLD_CONFIG = {
    "n_splits": 10,
    "shuffle": True,
    "random_state": 42,
    "stratify_by_speaker": True
}

# ============================================================
# CLASS WEIGHTS CONFIGURATION (to balance the classes)
# ============================================================
CLASS_WEIGHTS_CONFIG = {
    "enabled": True,
    "method": "inverse_frequency"  # 1/frequency
}

# ============================================================
# VOWEL /a/ FILTER CONFIGURATION
# ============================================================
VOCAL_FILTER_CONFIG = {
    "enabled": True,
    "target_vocal": "a",  # Only the vowel /a/, as in the paper
    "filter_healthy": True,
    "filter_parkinson": True
}

# ============================================================
# TRAINING CONFIGURATION
# ============================================================
TRAINING_CONFIG = {
    "n_epochs": 100,
    "early_stopping_patience": 10,  # Reduced from 15 to 10 to avoid overfitting
    "batch_size": 32,
    "num_workers": 0,
    "save_best_model": True
}

# ============================================================
# OPTUNA CONFIGURATION (HYPERPARAMETER OPTIMIZATION)
# ============================================================
# Optuna is used for the search: modern and free of installation problems
OPTUNA_CONFIG = {
    "enabled": True,
    "experiment_name": "cnn2d_optuna_optimization",
    "n_trials": 30,  # Number of configurations to try
    "n_epochs_per_trial": 20,  # Epochs per configuration
    "metric": "f1",  # Metric to optimize
    "direction": "maximize"  # maximize or minimize
}

# ============================================================
# DATA CONFIGURATION
# ============================================================
DATA_CONFIG = {
    "test_size": 0.15,
    "val_size": 0.15,
    "random_state": 42,
    "stratify": True
}
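
With `OPTIMIZER_CONFIG` and `SCHEDULER_CONFIG` as above, the learning rate follows a staircase: it starts at 0.1 and is multiplied by `gamma` every `step_size` epochs. The closed form below is a sketch of that schedule. The helper `step_lr` is hypothetical; during training, `torch.optim.lr_scheduler.StepLR` would apply the same decay.

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at a given epoch under a step decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)


# Values taken from OPTIMIZER_CONFIG and SCHEDULER_CONFIG.
schedule = {epoch: step_lr(0.1, epoch) for epoch in (0, 9, 10, 20, 30)}
for epoch, lr in schedule.items():
    print(f"epoch {epoch:3d}: lr = {lr:.0e}")
```

By epoch 30 the rate is already about 1e-4, so the later epochs of the 100 allowed by `TRAINING_CONFIG` make only small updates.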


In [None]:
# ============================================================
# DETECT ENVIRONMENT AND CONFIGURE PATHS
# ============================================================

# This import works from any subdirectory of the project
import sys
from pathlib import Path

# Find the project root and add it to the path
current_dir = Path.cwd()
for _ in range(10):
    if (current_dir / "modules").exists():
        if str(current_dir) not in sys.path:
            sys.path.insert(0, str(current_dir))
        break
    current_dir = current_dir.parent

# Import the notebook setup function
from modules.core.notebook_setup import setup_notebook

# Automatically configure: import path + environment (Local/Colab) + data paths
ENV, PATHS = setup_notebook(verbose=True)


## 1. Setup and Configuration


In [None]:
# ============================================================
# IMPORTS AND CONFIGURATION
# ============================================================
import sys
from pathlib import Path
import matplotlib.pyplot as plt
import warnings
import pandas as pd
import json
import numpy as np
warnings.filterwarnings('ignore')

# PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split

# Add the project modules to the path
# The notebook lives in research/, but modules/ is in the root directory
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# Import project modules
from modules.models.cnn2d.model import CNN2D
from modules.models.common.training_utils import print_model_summary
from modules.models.cnn2d.training import train_model, detailed_evaluation, print_evaluation_report
from modules.models.cnn2d.visualization import plot_training_history, analyze_spectrogram_stats
from modules.models.cnn2d.utils import plot_confusion_matrix
from modules.core.utils import create_10fold_splits_by_speaker
from modules.core.dataset import (
    load_spectrograms_cache,
    to_pytorch_tensors,
    DictDataset,
)


# Optuna imports (hyperparameter optimization)
from modules.core.cnn2d_optuna_wrapper import optimize_cnn2d, create_cnn2d_optimizer
from modules.core.optuna_optimization import OptunaOptimizer

# Matplotlib settings
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

# PyTorch settings
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Configuration report
print("="*70)
print("CNN 2D TRAINING - BASELINE WITH AUGMENTATION")
print("="*70)
print("Libraries loaded successfully")
print(f"Device: {device}")
print(f"PyTorch: {torch.__version__}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
print("Data augmentation: ENABLED (~5x data)")
print("="*70)


## 2. Data Loading

Loads the preprocessed data WITH augmentation to improve the generalization of the baseline model.


In [None]:
# ============================================================
# LOAD HEALTHY DATA FROM THE ORIGINAL CACHE
# ============================================================

print("Loading Healthy data from the original cache...")
print("="*60)

from modules.core.dataset import load_spectrograms_cache

# Load the healthy data from the original cache using dynamic paths
cache_healthy_path = PATHS['cache_original'] / "healthy_ibarra.pkl"
healthy_dataset = load_spectrograms_cache(str(cache_healthy_path))

if healthy_dataset is None:
    raise FileNotFoundError(f"Healthy data cache not found at {cache_healthy_path}")

# Convert to PyTorch tensors
X_healthy, y_task_healthy, y_domain_healthy, meta_healthy = to_pytorch_tensors(healthy_dataset)

print("Healthy loaded successfully:")
print(f"   - Spectrograms: {X_healthy.shape[0]}")
print(f"   - Shape: {X_healthy.shape}")
print(f"   - Path: {cache_healthy_path}")


In [None]:
# ============================================================
# LOAD PARKINSON DATA FROM THE ORIGINAL CACHE
# ============================================================

print("Loading Parkinson data from the original cache...")
print("="*60)

# Load the Parkinson data from the original cache using dynamic paths
cache_parkinson_path = PATHS['cache_original'] / "parkinson_ibarra.pkl"
parkinson_dataset = load_spectrograms_cache(str(cache_parkinson_path))

if parkinson_dataset is None:
    raise FileNotFoundError(f"Parkinson data cache not found at {cache_parkinson_path}")

# Convert to PyTorch tensors
X_parkinson, y_task_parkinson, y_domain_parkinson, meta_parkinson = to_pytorch_tensors(parkinson_dataset)

print("Parkinson loaded successfully:")
print(f"   - Spectrograms: {X_parkinson.shape[0]}")
print(f"   - Shape: {X_parkinson.shape}")
print(f"   - Path: {cache_parkinson_path}")


In [None]:
# ============================================================
# LOADED DATA SUMMARY
# ============================================================

print("="*70)
print("LOADED DATA SUMMARY")
print("="*70)

print("Healthy data (from original cache):")
print(f"   - Samples: {len(healthy_dataset)}")
print(f"   - Spectrogram shape: {X_healthy.shape}")

print("Parkinson data (from original cache):")
print(f"   - Samples: {len(parkinson_dataset)}")
print(f"   - Spectrogram shape: {X_parkinson.shape}")

print("="*70)


In [None]:
# ============================================================
# BASIC STATISTICAL ANALYSIS
# ============================================================

print("="*70)
print("BASIC STATISTICAL ANALYSIS")
print("="*70)

# Per-class spectrogram statistics
healthy_stats = analyze_spectrogram_stats(healthy_dataset, "HEALTHY")
parkinson_stats = analyze_spectrogram_stats(parkinson_dataset, "PARKINSON")

# Compare the two classes
print("\nDIFFERENCES BETWEEN CLASSES:")
print(f"   - Difference in mean: {abs(healthy_stats['mean'] - parkinson_stats['mean']):.3f}")
print(f"   - Difference in std: {abs(healthy_stats['std'] - parkinson_stats['std']):.3f}")

print("\nExperiment configuration:")
print("   - Healthy: original data (baseline)")
print("   - Parkinson: data with augmentation (better generalization)")
print("="*70)


In [None]:
# ============================================================
# COMBINE DATASETS
# ============================================================

print("="*70)
print("COMBINING DATASETS")
print("="*70)

# Concatenate spectrograms
X_combined = torch.cat([X_healthy, X_parkinson], dim=0)

# Create labels: 0=Healthy, 1=Parkinson
y_combined = torch.cat([
    torch.zeros(len(X_healthy), dtype=torch.long),  # Healthy = 0
    torch.ones(len(X_parkinson), dtype=torch.long)  # Parkinson = 1
], dim=0)

n_healthy_total = (y_combined == 0).sum().item()
n_parkinson_total = (y_combined == 1).sum().item()

print("\nCOMBINED DATASET:")
print(f"   - Total samples: {len(X_combined)}")
print(f"   - Shape: {X_combined.shape}")
print(f"   - Healthy (0): {n_healthy_total} ({n_healthy_total/len(y_combined)*100:.1f}%)")
print(f"   - Parkinson (1): {n_parkinson_total} ({n_parkinson_total/len(y_combined)*100:.1f}%)")

# Treat the dataset as balanced when Parkinson is within 10 points of 50%
balance_pct = n_parkinson_total / len(y_combined) * 100
if abs(balance_pct - 50) < 10:
    print("   ✓ Dataset is reasonably balanced")
else:
    print("   ⚠ Dataset is imbalanced - class weights enabled in config")

print("="*70)


In [None]:
# ============================================================
# INSPECT METADATA FOR SPEAKER IDS
# ============================================================

print("="*70)
print("CHECKING METADATA")
print("="*70)

# Check the metadata structure of both classes
for meta_name, meta_list in [("meta_healthy", meta_healthy), ("meta_parkinson", meta_parkinson)]:
    if meta_list and len(meta_list) > 0:
        print(f"\n✓ {meta_name} available: {len(meta_list)} samples")
        print("  Example metadata[0]:")
        sample_meta = meta_list[0]
        if isinstance(sample_meta, dict):
            for key, value in list(sample_meta.items())[:5]:
                print(f"    - {key}: {value}")
        else:
            print(f"    Type: {type(sample_meta)}")
            print(f"    Value: {sample_meta}")
    else:
        print(f"  ✗ {meta_name} not available or empty")

print("="*70)
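
The inspection cell above handles metadata entries that are either dicts or objects, while the split cell reads `meta.subject_id` as an attribute. A small sketch of an accessor that works for both shapes; `get_meta_field` is a hypothetical helper and the IDs below are toy values:

```python
from types import SimpleNamespace


def get_meta_field(meta, name, default=None):
    """Read a field from a metadata entry that may be a dict or an object."""
    if isinstance(meta, dict):
        return meta.get(name, default)
    return getattr(meta, name, default)


# Toy entries in both shapes
as_dict = {"subject_id": "HC_001", "filename": "a.wav"}
as_object = SimpleNamespace(subject_id="PD_007", filename="b.wav")
ids = [get_meta_field(m, "subject_id") for m in (as_dict, as_object)]
```

Using one accessor keeps the later speaker-ID extraction independent of how the cache stores its metadata.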


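## 3. Train/Val/Test Split

Train and validation sets come from the first fold of a speaker-grouped 10-fold partition, so all recordings of a speaker stay in the same fold. A further 15% of the data is held out as the test set.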
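
A minimal sketch of a speaker-wise hold-out split, assuming every sample carries a speaker ID. `speaker_holdout_split` and its parameters are hypothetical; this is not the project's `create_10fold_splits_by_speaker`, and it does not stratify by class:

```python
import random
from collections import defaultdict


def speaker_holdout_split(speaker_ids, test_fraction=0.15, seed=42):
    """Split sample indices so that each speaker lands entirely on one side."""
    by_speaker = defaultdict(list)
    for idx, speaker in enumerate(speaker_ids):
        by_speaker[speaker].append(idx)

    speakers = sorted(by_speaker)            # deterministic order before shuffling
    random.Random(seed).shuffle(speakers)

    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train_idx = sorted(i for s in speakers[n_test:] for i in by_speaker[s])
    test_idx = sorted(i for s in speakers[:n_test] for i in by_speaker[s])
    return train_idx, test_idx, test_speakers


# Toy example: 6 speakers with 3 recordings each
toy_ids = [f"spk{n}" for n in range(6) for _ in range(3)]
toy_train, toy_test, toy_test_speakers = speaker_holdout_split(toy_ids, test_fraction=0.33)
```

Splitting over speakers rather than samples is what prevents recordings of the same person from appearing on both sides of the split.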


In [None]:
# ============================================================
# 10-FOLD CROSS-VALIDATION STRATIFIED BY SPEAKER
# ============================================================

print("="*70)
print("10-FOLD CROSS-VALIDATION (IBARRA 2023 PAPER)")
print("="*70)

# Prepare the combined metadata for create_10fold_splits_by_speaker.
# meta_healthy and meta_parkinson were loaded in the data loading cells.

# Combined metadata list with labels
metadata_combined = []

# Add healthy metadata (label=0)
for meta in meta_healthy:
    metadata_combined.append({
        "subject_id": meta.subject_id,
        "label": 0,  # Healthy
        "filename": meta.filename
    })

# Add parkinson metadata (label=1)
for meta in meta_parkinson:
    metadata_combined.append({
        "subject_id": meta.subject_id,
        "label": 1,  # Parkinson
        "filename": meta.filename
    })

print("\n📊 Dataset info:")
print(f"   • Total samples: {len(X_combined)}")
print(f"   • Metadata entries: {len(metadata_combined)}")

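# Hold out ~15% of the speakers as the test set BEFORE building the folds,
# so no test sample (or test speaker) can also appear in train/val.
# Assumes each subject_id has a single label and is unique across classes.
import numpy as np

speaker_ids = np.array([m["subject_id"] for m in metadata_combined])
labels_np = y_combined.numpy()
unique_speakers, first_idx = np.unique(speaker_ids, return_index=True)
speaker_labels = labels_np[first_idx]

train_val_speakers, test_speakers = train_test_split(
    unique_speakers,
    test_size=0.15,
    random_state=42,
    stratify=speaker_labels
)
test_mask = np.isin(speaker_ids, test_speakers)
test_idx = np.flatnonzero(test_mask)
train_val_idx = np.flatnonzero(~test_mask)

# Build the 10-fold splits on the remaining speakers only.
# The function keeps all samples of a speaker in the same fold.
metadata_train_val = [metadata_combined[i] for i in train_val_idx]
fold_splits = create_10fold_splits_by_speaker(
    metadata_list=metadata_train_val,
    n_folds=KFOLD_CONFIG["n_splits"],
    seed=KFOLD_CONFIG["random_state"]
)

# This notebook uses the first fold as an example;
# the paper averages the results over all 10 folds.
# Fold indices are positions in metadata_train_val, so map them back to X_combined.
train_indices = torch.as_tensor(train_val_idx[np.asarray(fold_splits[0]["train"])], dtype=torch.long)
val_indices = torch.as_tensor(train_val_idx[np.asarray(fold_splits[0]["val"])], dtype=torch.long)
test_indices = torch.as_tensor(test_idx, dtype=torch.long)

# Build the train/val/test tensors from the indices
X_train, y_train = X_combined[train_indices], y_combined[train_indices]
X_val, y_val = X_combined[val_indices], y_combined[val_indices]
X_test, y_test = X_combined[test_indices], y_combined[test_indices]
X_train_val = X_combined[torch.as_tensor(train_val_idx, dtype=torch.long)]
y_train_val = y_combined[torch.as_tensor(train_val_idx, dtype=torch.long)]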

print("\nSPLIT SIZES:")
print(f"   - Train: {len(X_train)} ({len(X_train)/len(X_combined)*100:.1f}%)")
print(f"   - Val:   {len(X_val)} ({len(X_val)/len(X_combined)*100:.1f}%)")
print(f"   - Test:  {len(X_test)} ({len(X_test)/len(X_combined)*100:.1f}%)")

print("\nCLASS DISTRIBUTION PER SPLIT:")
for split_name, y_split in [("Train", y_train), ("Val", y_val), ("Test", y_test)]:
    n_healthy = (y_split == 0).sum().item()
    n_parkinson = (y_split == 1).sum().item()
    print(f"   {split_name:5s}: HC={n_healthy:4d} ({n_healthy/len(y_split)*100:.1f}%), PD={n_parkinson:4d} ({n_parkinson/len(y_split)*100:.1f}%)")

print("="*70)


In [None]:
# Add the project's own modules to the path
# The notebook lives in research/, but modules/ is in the root directory
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# ============================================================
# CREATE DATALOADERS
# ============================================================

print("\n📦 CREATING DATALOADERS...")

BATCH_SIZE = 32

# DictDataset comes from the project's core module (imported in the setup cell)

# Create datasets in dictionary format
train_dataset = DictDataset(X_train, y_train)
val_dataset = DictDataset(X_val, y_val)
test_dataset = DictDataset(X_test, y_test)

# Create DataLoaders (evaluation loaders use a 2x batch size; no gradients are stored)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE*2, shuffle=False, num_workers=0)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE*2, shuffle=False, num_workers=0)

print("✅ DataLoaders created:")
print(f"   • Train batches: {len(train_loader)}")
print(f"   • Val batches:   {len(val_loader)}")
print(f"   • Test batches:  {len(test_loader)}")
print(f"   • Batch size:    {BATCH_SIZE} (train), {BATCH_SIZE*2} (val/test)")


## 4. Hyperparameter Optimization with Optuna

Automatic hyperparameter search with Optuna to find the best configuration for the CNN2D model.

### Configuration:
- **Method**: Optuna with random search
- **Trials**: set by `OPTUNA_CONFIG["n_trials"]`
- **Epochs per trial**: set by `OPTUNA_CONFIG["n_epochs_per_trial"]` (short runs for a fast search)
- **Metric**: F1-score on the validation set
- **Search space**: per the table in the paper
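
The search loop described above can be sketched in plain Python. This is a stand-in for illustration only: it does not use Optuna, and `random_search`, the search space, and `toy_objective` are hypothetical (the real space comes from the paper's table, and the real objective trains the CNN for a few epochs and returns validation F1):

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials configurations uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score, history = None, float("-inf"), []
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(cfg)
        history.append((cfg, score))
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score, history


# Hypothetical search space
demo_space = {
    "batch_size": [16, 32, 64],
    "learning_rate": [0.01, 0.05, 0.1],
    "p_drop_conv": [0.2, 0.3, 0.5],
}


def toy_objective(cfg):
    # Stand-in for "train briefly and return validation F1"
    return 1.0 - abs(cfg["learning_rate"] - 0.05) - abs(cfg["p_drop_conv"] - 0.3)


demo_best, demo_score, demo_history = random_search(toy_objective, demo_space, n_trials=20)
```

Each trial is independent, so the number of trials and the epochs per trial are the two knobs that trade search quality for run time.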


In [None]:
# ============================================================
# CONFIGURE OPTUNA OPTIMIZATION
# ============================================================

print("="*70)
print("CONFIGURING OPTUNA OPTIMIZATION")
print("="*70)

# Create the directory for Optuna results from the dynamic PATHS configuration
optuna_results_dir = PATHS['results'] / "cnn_optuna_optimization"
optuna_results_dir.mkdir(parents=True, exist_ok=True)

print(f"Results directory: {optuna_results_dir}")
print(f"Trials to run: {OPTUNA_CONFIG['n_trials']}")
print("="*70)


In [None]:
# ============================================================
# PREPARE DATA FOR OPTUNA
# ============================================================

print("="*70)
print("PREPARING DATA FOR OPTUNA")
print("="*70)

# The optimization works directly on PyTorch tensors (no NumPy conversion needed);
# the tensors are ready from the data loading step

print("📊 Data prepared for Optuna:")
print(f"   - Train: {X_train.shape} (labels: {y_train.shape})")
print(f"   - Val:   {X_val.shape} (labels: {y_val.shape})")
print(f"   - Test:  {X_test.shape} (labels: {y_test.shape})")

# Check class distribution
print("\n📈 Class distribution:")
print(f"   Train - HC: {(y_train == 0).sum().item()}, PD: {(y_train == 1).sum().item()}")
print(f"   Val   - HC: {(y_val == 0).sum().item()}, PD: {(y_val == 1).sum().item()}")

print("="*70)


In [None]:
# ============================================================
# CHECK FOR EXISTING OPTUNA RESULTS
# ============================================================

print("="*70)
print("CHECKING FOR PREVIOUS OPTUNA RESULTS")
print("="*70)

# OPTUNA_CONFIG is defined in the centralized configuration

# Check whether results from a previous run exist
results_csv_path = optuna_results_dir / "optuna_trials_results.csv"
best_params_path = optuna_results_dir / "best_params.json"

if results_csv_path.exists() and best_params_path.exists():
    print("✅ Previous Optuna results found")
    print(f"   - Results file: {results_csv_path}")
    print(f"   - Best parameters file: {best_params_path}")

    # Load previous results
    results_df = pd.read_csv(results_csv_path)
    with open(best_params_path, 'r') as f:
        best_params = json.load(f)

    print("\n📊 Previous results:")
    print(f"   - Total trials evaluated: {len(results_df)}")
    print(f"   - Best F1-score found: {results_df['value'].max():.4f}")
    print(f"   - Mean F1-score: {results_df['value'].mean():.4f} ± {results_df['value'].std():.4f}")

    print("\n🏆 Best hyperparameters found:")
    for param, value in best_params.items():
        print(f"   - {param}: {value}")

    # Build a results dictionary with the same keys as optimize_cnn2d's output
    optuna_results = {
        "results_df": results_df,
        "best_params": best_params,
        "analysis": None,  # Not stored on disk; only available after a fresh run
        "study": None  # The study is loaded separately if needed
    }

    print("\n⏭️  Skipping optimization - using previous results")
    print("="*70)

else:
    print("❌ No previous Optuna results found")
    print("   - Starting optimization from scratch")

    print("\n⚙️  Configuration:")
    print(f"   - Trials to run: {OPTUNA_CONFIG['n_trials']}")
    print(f"   - Epochs per trial: {OPTUNA_CONFIG['n_epochs_per_trial']}")
    print(f"   - Metric to optimize: {OPTUNA_CONFIG['metric']} ({OPTUNA_CONFIG['direction']})")

    print("\n🚀 Starting hyperparameter search with Optuna...")
    print("   (This may take several minutes)")

    # Run the optimization
    optuna_results = optimize_cnn2d(
        X_train=X_train,
        y_train=y_train,
        X_val=X_val,
        y_val=y_val,
        input_shape=(1, 65, 41),  # (C, H, W)
        n_trials=OPTUNA_CONFIG["n_trials"],
        n_epochs_per_trial=OPTUNA_CONFIG["n_epochs_per_trial"],
        device=device,
        save_dir=str(optuna_results_dir)
    )

    print("="*70)
    print("OPTIMIZATION COMPLETE")
    print("="*70)


In [None]:
# ============================================================
# ANALYZE OPTUNA RESULTS
# ============================================================

print("="*70)
print("RESULTS ANALYSIS")
print("="*70)

# Extract results
results_df = optuna_results["results_df"]
best_params = optuna_results["best_params"]
# "analysis" may be missing when results were loaded from a previous run
analysis = optuna_results.get("analysis")

# Results loaded from disk store the objective under "value"; expose it as "f1"
if "f1" not in results_df.columns and "value" in results_df.columns:
    results_df = results_df.assign(f1=results_df["value"])

print("📊 Optimization summary:")
print(f"   - Total configurations evaluated: {len(results_df)}")
print(f"   - Best F1-score found: {results_df['f1'].max():.4f}")
print(f"   - Mean F1-score: {results_df['f1'].mean():.4f} ± {results_df['f1'].std():.4f}")

print("\n🏆 Best hyperparameters found:")
for param, value in best_params.items():
    if param not in ['f1', 'accuracy', 'precision', 'recall', 'val_loss', 'train_loss']:
        print(f"   - {param}: {value}")

# Show the top 10 configurations
print("\n📈 Top 10 configurations:")
print("-" * 80)
top_10 = results_df.nlargest(10, 'f1')
for i, (idx, row) in enumerate(top_10.iterrows(), 1):
    print(f"{i:2d}. F1: {row['f1']:.4f} | "
          f"Acc: {row['accuracy']:.4f} | "
          f"Batch: {row['batch_size']} | "
          f"LR: {row['learning_rate']} | "
          f"Dropout: {row['p_drop_conv']}")

print("="*70)


In [None]:
# ============================================================
# SAVE OPTUNA RESULTS
# ============================================================

print("="*70)
print("SAVING OPTUNA RESULTS")
print("="*70)

# Save the full DataFrame with every configuration
results_csv_path = optuna_results_dir / "optuna_scan_results.csv"
results_df.to_csv(results_csv_path, index=False)
print(f"💾 Full results saved: {results_csv_path}")

# Save the best parameters
best_params_path = optuna_results_dir / "best_params.json"
with open(best_params_path, 'w') as f:
    json.dump(best_params, f, indent=2)
print(f"💾 Best parameters saved: {best_params_path}")

# Save the optimization summary
summary_path = optuna_results_dir / "optimization_summary.txt"
with open(summary_path, 'w') as f:
    f.write("OPTUNA OPTIMIZATION SUMMARY\n")
    f.write("="*50 + "\n\n")
    f.write(f"Total configurations evaluated: {len(results_df)}\n")
    f.write(f"Best F1-score: {results_df['f1'].max():.4f}\n")
    f.write(f"Mean F1-score: {results_df['f1'].mean():.4f} ± {results_df['f1'].std():.4f}\n\n")
    f.write("BEST HYPERPARAMETERS:\n")
    f.write("-"*30 + "\n")
    for param, value in best_params.items():
        if param not in ['f1', 'accuracy', 'precision', 'recall', 'val_loss', 'train_loss']:
            f.write(f"{param}: {value}\n")
    f.write("\nTOP 5 CONFIGURATIONS:\n")
    f.write("-"*30 + "\n")
    top_5 = results_df.nlargest(5, 'f1')
    for i, (idx, row) in enumerate(top_5.iterrows(), 1):
        f.write(f"{i}. F1: {row['f1']:.4f} | Acc: {row['accuracy']:.4f} | "
                f"Batch: {row['batch_size']} | LR: {row['learning_rate']}\n")

print(f"💾 Summary saved: {summary_path}")

print("="*70)


In [None]:
# Add the project's own modules to the path
# The notebook lives in research/, but modules/ is in the root directory
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# ============================================================
# EVALUATE OPTUNA RESULTS
# ============================================================

print("="*70)
print("OPTUNA RESULTS EVALUATION")
print("="*70)

# The evaluation of the optimization process is currently disabled
# evaluation = check_optuna_results(str(optuna_results_dir))

print("="*70)


## 5. Retraining with the Best Hyperparameters

Retrain the CNN2D model with the best hyperparameters found by Optuna, using early stopping to obtain the final optimized model.
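
A minimal sketch of patience-based early stopping on a metric history. `early_stopping_epoch` is a hypothetical helper for illustration; the project's `train_model` may implement the rule differently:

```python
def early_stopping_epoch(scores, patience, mode="max"):
    """Return (best_epoch, stop_epoch), both 1-indexed, for a metric history."""
    if mode == "max":
        improved = lambda new, old: new > old
    else:
        improved = lambda new, old: new < old

    best_score, best_epoch, stale = None, 0, 0
    for epoch, score in enumerate(scores, start=1):
        if best_score is None or improved(score, best_score):
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch   # stopped early
    return best_epoch, len(scores)         # ran every epoch


# Toy validation F1 history: the best value is reached at epoch 3
toy_val_f1 = [0.60, 0.68, 0.71, 0.70, 0.69, 0.71, 0.70]
toy_best, toy_stop = early_stopping_epoch(toy_val_f1, patience=3, mode="max")
```

With `mode="max"` the rule follows a score such as F1; with `mode="min"` it follows a loss. A tie with the current best does not reset the patience counter in this sketch.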


In [None]:
# ============================================================
# BUILD THE MODEL WITH THE BEST HYPERPARAMETERS
# ============================================================

print("="*70)
print("BUILDING MODEL WITH THE BEST HYPERPARAMETERS")
print("="*70)

# Build the model with the best parameters found by Optuna
best_model = CNN2D(
    n_classes=2,
    p_drop_conv=best_params["p_drop_conv"],
    p_drop_fc=best_params["p_drop_fc"],
    input_shape=(65, 41),
    filters_1=best_params["filters_1"],
    filters_2=best_params["filters_2"],
    kernel_size_1=best_params["kernel_size_1"],
    kernel_size_2=best_params["kernel_size_2"],
    dense_units=best_params["dense_units"],
).to(device)

print("✅ Model built with the best hyperparameters:")
print(f"   - Filters 1: {best_params['filters_1']}")
print(f"   - Filters 2: {best_params['filters_2']}")
print(f"   - Kernel 1: {best_params['kernel_size_1']}")
print(f"   - Kernel 2: {best_params['kernel_size_2']}")
print(f"   - Dense units: {best_params['dense_units']}")
print(f"   - Dropout conv: {best_params['p_drop_conv']}")
print(f"   - Dropout fc: {best_params['p_drop_fc']}")

# Show the architecture
print_model_summary(best_model)

print("="*70)


In [None]:
# ============================================================
# CONFIGURE TRAINING WITH THE BEST PARAMETERS
# ============================================================

print("="*70)
print("CONFIGURING FINAL TRAINING")
print("="*70)

# Final training configuration
FINAL_TRAINING_CONFIG = {
    "n_epochs": 100,
    "early_stopping_patience": 10,
    # Use the tuned learning rate when Optuna searched over it; otherwise fall back to 0.1
    "learning_rate": best_params.get("learning_rate", 0.1),
    "batch_size": best_params["batch_size"]
}

# Create DataLoaders with the batch size selected by Optuna
train_loader_final = DataLoader(
    train_dataset,
    batch_size=FINAL_TRAINING_CONFIG["batch_size"],
    shuffle=True,
    num_workers=0
)
val_loader_final = DataLoader(
    val_dataset,
    batch_size=FINAL_TRAINING_CONFIG["batch_size"],
    shuffle=False,
    num_workers=0
)
test_loader_final = DataLoader(
    test_dataset,
    batch_size=FINAL_TRAINING_CONFIG["batch_size"],
    shuffle=False,
    num_workers=0
)

# SGD optimizer with Nesterov momentum and weight decay
optimizer_final = optim.SGD(
    best_model.parameters(),
    lr=FINAL_TRAINING_CONFIG['learning_rate'],
    momentum=0.9,
    weight_decay=1e-4,
    nesterov=True
)

# Inverse-frequency class weights (normalized to sum to 1), controlled by the centralized config
if CLASS_WEIGHTS_CONFIG["enabled"]:
    class_counts = torch.bincount(y_train)
    class_weights = 1.0 / class_counts.float()
    class_weights = class_weights / class_weights.sum()
    criterion_final = nn.CrossEntropyLoss(weight=class_weights.to(device))
    print(f"✅ Class weights enabled: {class_weights.tolist()}")
else:
    criterion_final = nn.CrossEntropyLoss()
    print("⚠️  Class weights disabled")

# StepLR scheduler from the centralized configuration
scheduler_final = torch.optim.lr_scheduler.StepLR(
    optimizer_final,
    step_size=SCHEDULER_CONFIG["step_size"],
    gamma=SCHEDULER_CONFIG["gamma"]
)

print("\n⚙️  Final configuration:")
print(f"   - Initial learning rate: {FINAL_TRAINING_CONFIG['learning_rate']}")
print("   - Momentum: 0.9 (Nesterov: True)")
print("   - Weight decay: 1e-4")
print(f"   - Scheduler: StepLR (step={SCHEDULER_CONFIG['step_size']}, gamma={SCHEDULER_CONFIG['gamma']})")
print(f"   - Batch size: {FINAL_TRAINING_CONFIG['batch_size']}")
print(f"   - Max epochs: {FINAL_TRAINING_CONFIG['n_epochs']}")
print(f"   - Early stopping patience: {FINAL_TRAINING_CONFIG['early_stopping_patience']}")

print("="*70)
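
The class-weighting step in the cell above (inverse class frequency, normalized to sum to 1) can be checked by hand with a small pure-Python sketch. `inverse_frequency_weights` is a hypothetical helper that mirrors the tensor arithmetic rather than replacing it:

```python
from collections import Counter


def inverse_frequency_weights(labels, n_classes):
    """Weight each class by 1/count, then normalize so the weights sum to 1."""
    counts = Counter(labels)
    if any(counts[c] == 0 for c in range(n_classes)):
        raise ValueError("every class needs at least one sample")
    raw = [1.0 / counts[c] for c in range(n_classes)]
    total = sum(raw)
    return [w / total for w in raw]


# Three healthy samples (0) and one Parkinson sample (1):
# the minority class receives three times the weight of the majority class
toy_weights = inverse_frequency_weights([0, 0, 0, 1], n_classes=2)
```

Because only the ratio between the weights affects which class dominates the loss, the normalization mainly keeps the loss scale comparable across runs.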


In [None]:
# ============================================================
# TRAIN THE FINAL MODEL WITH EARLY STOPPING
# ============================================================

print("="*70)
print("TRAINING FINAL MODEL")
print("="*70)

# Train the model with the best hyperparameters.
# Validation F1 is monitored instead of loss, which suits imbalanced datasets.
final_training_results = train_model(
    model=best_model,
    train_loader=train_loader_final,
    val_loader=val_loader_final,
    optimizer=optimizer_final,
    criterion=criterion_final,
    device=device,
    n_epochs=FINAL_TRAINING_CONFIG['n_epochs'],
    early_stopping_patience=FINAL_TRAINING_CONFIG['early_stopping_patience'],
    save_dir=optuna_results_dir,
    verbose=True,
    scheduler=scheduler_final,
    monitor_metric="f1"
)

# Extract results
final_model = final_training_results["model"]
final_history = final_training_results["history"]
final_best_val_loss = final_training_results["best_val_loss"]
final_total_time = final_training_results["total_time"]

# Best epoch: follow the monitored metric (F1) when the history records it,
# otherwise fall back to the lowest validation loss
if "val_f1" in final_history:
    final_best_epoch = final_history["val_f1"].index(max(final_history["val_f1"])) + 1
else:
    final_best_epoch = final_history["val_loss"].index(min(final_history["val_loss"])) + 1

print("="*70)
print("FINAL TRAINING COMPLETE")
print("="*70)
print("✅ Results:")
print(f"   - Best epoch: {final_best_epoch}")
print(f"   - Best val loss: {final_best_val_loss:.4f}")
print(f"   - Total time: {final_total_time/60:.1f} minutes")
print(f"   - Model saved to: {optuna_results_dir / 'best_model_optuna.pth'}")
print("="*70)


In [None]:
# ============================================================
# FINAL EVALUATION ON THE TEST SET
# ============================================================

print("="*70)
print("FINAL EVALUATION ON THE TEST SET")
print("="*70)

# Evaluate the final model on the test set
final_test_metrics = detailed_evaluation(
    model=final_model,
    loader=test_loader_final,
    device=device,
    class_names=["Healthy", "Parkinson"]
)

# Print the report
print_evaluation_report(final_test_metrics, class_names=["Healthy", "Parkinson"])

# Save the final metrics
final_metrics_path = optuna_results_dir / "test_metrics_optuna.json"

# Pull the aggregate metrics out of the classification report;
# scalars are cast to built-in types so json.dump can serialize them
final_report = final_test_metrics["classification_report"]
final_metrics_to_save = {
    "accuracy": float(final_test_metrics["accuracy"]),
    "f1_macro": float(final_test_metrics["f1_macro"]),
    "precision_macro": float(final_report["macro avg"]["precision"]),
    "recall_macro": float(final_report["macro avg"]["recall"]),
    "f1_weighted": float(final_report["weighted avg"]["f1-score"]),
    "confusion_matrix": final_test_metrics["confusion_matrix"].tolist(),
    "classification_report": final_report,
    "best_hyperparameters": best_params,
    "training_config": FINAL_TRAINING_CONFIG,
    "final_epoch": int(final_best_epoch),
    "final_val_loss": float(final_best_val_loss),
    "training_time_minutes": float(final_total_time) / 60
}

with open(final_metrics_path, "w") as f:
    json.dump(final_metrics_to_save, f, indent=2)

print(f"\n💾 Final metrics saved to: {final_metrics_path}")
print("="*70)
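
To make the reported `f1_macro` easier to interpret, here is a small sketch that derives macro-averaged F1 from a confusion matrix, assuming rows are true classes and columns are predicted classes. `macro_f1_from_confusion` is a hypothetical helper and the matrix is a toy example, not a result from this notebook:

```python
def macro_f1_from_confusion(cm):
    """Macro-averaged F1 for a square confusion matrix (rows = true class)."""
    n = len(cm)
    f1_scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # true k, predicted as another class
        fp = sum(cm[r][k] for r in range(n)) - tp    # other classes predicted as k
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n


# Toy matrix: 10 Healthy and 10 Parkinson samples
toy_cm = [[8, 2],
          [1, 9]]
toy_macro_f1 = macro_f1_from_confusion(toy_cm)
```

Macro averaging gives both classes equal weight regardless of their size, which is why it is reported next to accuracy.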


In [None]:
# ============================================================
# FINAL VISUALIZATION
# ============================================================

print("="*70)
print("GENERATING FINAL VISUALIZATIONS")
print("="*70)

# Plot the final training progress
final_progress_fig = plot_training_history(
    final_history,
    save_path=optuna_results_dir / "training_progress_optuna.png"
)

# Final confusion matrix
final_cm = final_test_metrics["confusion_matrix"]
final_cm_fig = plot_confusion_matrix(
    final_cm,
    class_names=["Healthy", "Parkinson"],
    title="Confusion Matrix - Test Set (CNN2D Optimized with Optuna)",
    save_path=optuna_results_dir / "confusion_matrix_optuna.png",
    show=True
)

print("💾 Visualizations saved:")
print(f"   - Training progress: {optuna_results_dir / 'training_progress_optuna.png'}")
print(f"   - Confusion matrix: {optuna_results_dir / 'confusion_matrix_optuna.png'}")

print("="*70)


In [None]:
# ============================================================
# FINAL OPTIMIZATION SUMMARY
# ============================================================

print("="*70)
print("FINAL OPTUNA OPTIMIZATION SUMMARY")
print("="*70)

print("\n🔍 OPTIMIZATION PROCESS:")
print(f"   - Configurations evaluated: {len(results_df)}")
print(f"   - Best validation F1-score: {results_df['f1'].max():.4f}")
print(f"   - Mean F1-score: {results_df['f1'].mean():.4f} ± {results_df['f1'].std():.4f}")

print("\n🏆 BEST HYPERPARAMETERS FOUND:")
for param, value in best_params.items():
    if param not in ['f1', 'accuracy', 'precision', 'recall', 'val_loss', 'train_loss']:
        print(f"   - {param}: {value}")

print("\n📊 FINAL RESULTS ON THE TEST SET:")
final_report = final_test_metrics["classification_report"]
print(f"   - Accuracy:  {final_test_metrics['accuracy']:.4f}")
print(f"   - Precision: {final_report['macro avg']['precision']:.4f}")
print(f"   - Recall:    {final_report['macro avg']['recall']:.4f}")
print(f"   - F1-Score:  {final_test_metrics['f1_macro']:.4f}")

print(f"\n💾 FILES SAVED IN {optuna_results_dir}:")
print("   - optuna_scan_results.csv         # All configurations tried")
print("   - best_params.json                # Best hyperparameters")
print("   - optimization_summary.txt        # Optimization summary")
print("   - best_model_optuna.pth           # Final optimized model")
print("   - test_metrics_optuna.json        # Test set metrics")
print("   - training_progress_optuna.png    # Training curves")
print("   - confusion_matrix_optuna.png     # Confusion matrix")

print("="*70)
print("OPTUNA OPTIMIZATION COMPLETED SUCCESSFULLY")
print("="*70)
