# 2D CNN for Parkinson's Detection (Baseline with Augmentation + Optuna)
## Baseline Model - Train/Val/Test Split + Hyperparameter Optimization

This notebook trains a **simple CNN2D** model (no Domain Adaptation) for binary Parkinson vs. Healthy classification, **using data augmentation** and **automatic hyperparameter optimization with Optuna**.

### Pipeline:
1. **Setup**: Environment configuration
2. **Data Loading**: Load the data WITH augmentation
3. **Split**: Stratified Train/Val/Test split (70/15/15)
4. **Optuna Optimization**: Automatic hyperparameter search (`n_trials` in `OPTUNA_CONFIG`, set to 30 in the configuration cell)
5. **Final Training**: Retrain with the best hyperparameters + early stopping
6. **Evaluation**: Full metrics on the test set
7. **Visualization**: Training-progress and result plots
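
As a rough illustration of step 3, the sketch below builds a stratified 70/15/15 split in plain Python. It is not the notebook's own split code (the notebook imports `train_test_split` from scikit-learn); the function name `stratified_split` and the toy label list are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_size=0.15, test_size=0.15, seed=42):
    """Return (train_idx, val_idx, test_idx), preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        n_val = round(len(indices) * val_size)
        test_idx += indices[:n_test]
        val_idx += indices[n_test:n_test + n_val]
        train_idx += indices[n_test + n_val:]
    return sorted(train_idx), sorted(val_idx), sorted(test_idx)


# Toy labels: 100 samples of class 0 and 60 of class 1 (made-up counts).
labels = [0] * 100 + [1] * 60
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because augmentation multiplies each recording, a per-sample split like this one could place augmented copies of the same recording in different subsets; grouping by recording or speaker before splitting would avoid that.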

### Architecture:
This model uses the **same Feature Extractor** as CNN2D_DA (Ibarra 2023 architecture) but **without Domain Adaptation**:
- 2 blocks of Conv2D → BN → ReLU → MaxPool(3×3) → Dropout
- PD classification head only (no GRL, no domain head)
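
To make the effect of the two MaxPool(3×3) stages concrete, the following sketch computes the feature-map size after both blocks for a 65×41 input (the shape used by the smoke tests in this notebook). It assumes that each convolution preserves height and width ("same" padding) and that pooling uses no padding and a stride equal to the kernel size, which is the PyTorch `nn.MaxPool2d` default; the actual `CNN2D` module may differ. `pool_out` and `feature_map_shape` are hypothetical helper names.

```python
def pool_out(size, kernel=3, stride=None):
    """Output length of a max-pool along one axis (no padding, floor mode)."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


def feature_map_shape(height, width, n_blocks=2):
    """Spatial size after n_blocks of [shape-preserving conv -> MaxPool(3x3)]."""
    for _ in range(n_blocks):
        height, width = pool_out(height), pool_out(width)
    return height, width


h, w = feature_map_shape(65, 41)
flat_features = 32 * h * w  # 32 is only an example value for filters_2
print((h, w), flat_features)
```

Under these assumptions the flattened feature vector scales linearly with `filters_2`, which is one reason that hyperparameter has a direct effect on the size of the first dense layer.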

### Data Augmentation:
- Pitch shifting
- Time stretching
- Noise injection
- SpecAugment (frequency/time masks)
- Factor: ~5x more data
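
Of the techniques listed, SpecAugment is the easiest to show in a few lines. The sketch below zeroes one random frequency band and one random time band of a spectrogram stored as a list of rows. The mask widths, the fill value of 0.0, and the name `mask_spectrogram` are assumptions for illustration; the project's own augmentation code is not shown in this notebook.

```python
import random


def mask_spectrogram(spec, max_freq_width=8, max_time_width=6, seed=None):
    """Return a copy of spec with one frequency band and one time band zeroed."""
    rng = random.Random(seed)
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: blank out f_width consecutive rows.
    f_width = rng.randint(0, max_freq_width)
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [0.0] * n_time

    # Time mask: blank out t_width consecutive columns in every row.
    t_width = rng.randint(0, max_time_width)
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = 0.0
    return out


spec = [[1.0] * 41 for _ in range(65)]  # dummy 65x41 "spectrogram"
masked = mask_spectrogram(spec, seed=0)
n_masked = sum(value == 0.0 for row in masked for value in row)
print(n_masked)
```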

### Comparison:
- **This notebook**: CNN2D model with augmentation (intended to improve generalization)
- **cnn_da_training.ipynb**: CNN2D_DA model without augmentation (exact paper setup)
- Augmentation provides more training data and is expected to improve robustness


In [39]:
# ============================================================
# GOOGLE COLAB SETUP
# ============================================================
# RUN THIS CELL ONLY ON COLAB; COMMENT OUT THE WHOLE BLOCK WHEN RUNNING LOCALLY

from google.colab import drive
drive.mount("/content/drive")

import os, sys, subprocess

# Settings - ADJUST THESE VALUES IF NEEDED
COMPUTER_NAME = "ZenBook"
PROJECT_DIR = "parkinson-voice-uncertainty"
BRANCH = "feature/feature/firstTraining"

BASE = "/content/drive/Othercomputers"
PROJ = os.path.join(BASE, COMPUTER_NAME, PROJECT_DIR)

# Helper: run a command, echo its combined output, and return the exit code
def sh(*args, check=False):
    print("$", " ".join(args))
    res = subprocess.run(args, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(res.stdout)
    if check and res.returncode != 0:
        raise RuntimeError("Command failed")
    return res.returncode

# Sanity checks
assert os.path.isdir(os.path.join(BASE, COMPUTER_NAME)), f"No encuentro {COMPUTER_NAME} en {BASE}"
assert os.path.isdir(PROJ), f"No encuentro el repo en: {PROJ}"

# Add the repo to sys.path
if PROJ not in sys.path:
    sys.path.insert(0, PROJ)

# Configure Git
sh("git", "config", "--global", "--add", "safe.directory", PROJ)
sh("git", "-C", PROJ, "fetch", "--all", "--prune")
sh("git", "-C", PROJ, "branch", "--show-current")

# Switch to the branch (create it from origin if it does not exist locally)
rc = sh("git", "-C", PROJ, "checkout", BRANCH)
if rc != 0:
    sh("git", "-C", PROJ, "checkout", "-b", BRANCH, f"origin/{BRANCH}")

# Pull the latest changes
sh("git", "-C", PROJ, "pull", "origin", BRANCH)

# Install dependencies (critical packages first, then the rest of requirements.txt)
req = os.path.join(PROJ, "requirements.txt")
if os.path.exists(req):
    os.chdir("/content")
    print("Instalando dependencias...")
    # Install critical dependencies first
    sh("python", "-m", "pip", "install", "-q", "optuna>=3.0.0")
    sh("python", "-m", "pip", "install", "-q", "torch>=1.9.0")
    sh("python", "-m", "pip", "install", "-q", "torchvision>=0.10.0")
    sh("python", "-m", "pip", "install", "-q", "scikit-learn>=1.0.0")
    sh("python", "-m", "pip", "install", "-q", "librosa>=0.8.1")
    sh("python", "-m", "pip", "install", "-q", "soundfile>=0.10.3")
    # Install the rest
    sh("python", "-m", "pip", "install", "-q", "-r", req)
    print("Dependencias instaladas correctamente")
else:
    print("⚠️  No se encontró requirements.txt, instalando dependencias básicas...")
    sh("python", "-m", "pip", "install", "-q", "optuna>=3.0.0", "torch>=1.9.0", "scikit-learn>=1.0.0")

os.chdir(PROJ)

# Check that optuna is importable
try:
    import optuna
    print(f"✅ Optuna {optuna.__version__} disponible")
except ImportError as e:
    print(f"❌ Error importando optuna: {e}")
    print("Reinstalando optuna...")
    sh("python", "-m", "pip", "install", "--force-reinstall", "optuna>=3.0.0")

# Autoreload
try:
    get_ipython().run_line_magic("load_ext", "autoreload")
    get_ipython().run_line_magic("autoreload", "2")
    print("Autoreload activo")
except Exception as e:
    print(f"No se activó autoreload: {e}")

print(f"Repo listo en: {PROJ}")
sh("git", "-C", PROJ, "branch", "--show-current")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
$ git config --global --add safe.directory /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty

$ git -C /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty fetch --all --prune
Fetching origin

$ git -C /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty branch --show-current
feature/feature/firstTraining

$ git -C /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty checkout feature/feature/firstTraining
M	.gitignore
M	.ipynb_checkpoints/parkinson_voice_analysis-checkpoint.ipynb
M	README.md
M	bash.exe.stackdump
M	data/README.md
M	data/vowels_healthy/healthy_diverse_metadata.json
M	data_preparation/sample_healthy_data.py
M	data_preparation/verify_sampling.py
M	docs/CONFIGURACION_VALIDATION.md
M	docs/CORRECCIONES_APLICADAS.md
M	docs/MIGRATION_TALOS_TO_OPTUNA.md
M	docs/TEST_RESULTS_SUMMARY.md
M	examples/env

0
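
The `sh` helper above calls `python -m pip`, which relies on `python` on the PATH being the same interpreter as the notebook kernel. A slightly more defensive variant, sketched below, uses `sys.executable` and includes the exit code in the error message. The names `run` and `pip_install` are hypothetical and not part of the project.

```python
import subprocess
import sys


def run(*args, check=False):
    """Run a command, echo its combined output, and return the exit code."""
    print("$", " ".join(args))
    res = subprocess.run(
        args, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    print(res.stdout)
    if check and res.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {res.returncode}: {' '.join(args)}"
        )
    return res.returncode


def pip_install(*requirements):
    """Install packages into the interpreter that is running this kernel."""
    return run(sys.executable, "-m", "pip", "install", "-q", *requirements, check=True)


# Harmless demonstration that does not touch the network:
exit_code = run(sys.executable, "-c", "print('ok')")
# A real call would look like: pip_install("optuna>=3.0.0")
```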

In [18]:
# ============================================================
# CRITICAL DEPENDENCY CHECK
# ============================================================

import subprocess  # used by the reinstall fallback further down in this cell
print("="*70)
print("VERIFICANDO DEPENDENCIAS CRÍTICAS")
print("="*70)

# Check critical imports
critical_imports = [
    ("torch", "PyTorch"),
    ("optuna", "Optuna"),
    ("sklearn", "Scikit-learn"),
    ("numpy", "NumPy"),
    ("pandas", "Pandas"),
    ("matplotlib", "Matplotlib"),
    ("librosa", "Librosa"),
    ("soundfile", "SoundFile")
]

missing_imports = []
for module, name in critical_imports:
    try:
        __import__(module)
        print(f"✅ {name}: OK")
    except ImportError as e:
        print(f"❌ {name}: FALTA - {e}")
        missing_imports.append((module, name))

if missing_imports:
    print(f"\n⚠️  Faltan {len(missing_imports)} dependencias críticas:")
    for module, name in missing_imports:
        print(f"   - {name} ({module})")
    print("\n🔧 Reinstalando dependencias faltantes...")

    # Reinstall missing dependencies
    for module, name in missing_imports:
        if module == "torch":
            subprocess.run(["python", "-m", "pip", "install", "--force-reinstall", "torch>=1.9.0"], check=False)
        elif module == "optuna":
            subprocess.run(["python", "-m", "pip", "install", "--force-reinstall", "optuna>=3.0.0"], check=False)
        elif module == "sklearn":
            subprocess.run(["python", "-m", "pip", "install", "--force-reinstall", "scikit-learn>=1.0.0"], check=False)
        elif module == "librosa":
            subprocess.run(["python", "-m", "pip", "install", "--force-reinstall", "librosa>=0.8.1"], check=False)
        elif module == "soundfile":
            subprocess.run(["python", "-m", "pip", "install", "--force-reinstall", "soundfile>=0.10.3"], check=False)
        else:
            subprocess.run(["python", "-m", "pip", "install", "--force-reinstall", module], check=False)

    print("✅ Reinstalación completada")
else:
    print("\n🎉 Todas las dependencias críticas están disponibles")

print("="*70)


VERIFICANDO DEPENDENCIAS CRÍTICAS
✅ PyTorch: OK
✅ Optuna: OK
✅ Scikit-learn: OK
✅ NumPy: OK
✅ Pandas: OK
✅ Matplotlib: OK
✅ Librosa: OK
✅ SoundFile: OK

🎉 Todas las dependencias críticas están disponibles
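
The check above imports every package, which is thorough but can be slow for large libraries. When only presence matters, `importlib.util.find_spec` can locate a top-level module without importing it. The sketch below shows that alternative; `missing_modules` is a hypothetical helper, and note that it cannot detect an installation that is present but broken, which the `__import__` approach does catch.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be located."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# "json" ships with Python; the second name is deliberately made up.
missing = missing_modules(["json", "surely_not_an_installed_module_12345"])
print(missing)
```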


In [19]:
# ============================================================
# PROJECT MODULE CHECK
# ============================================================

print("="*70)
print("VERIFICANDO MÓDULOS PROPIOS")
print("="*70)

# Check that the project's own modules are importable
try:
    from modules.core.cnn2d_optuna_wrapper import CNN2DOptunaWrapper, optimize_cnn2d
    print("✅ CNN2D Optuna Wrapper: OK")
except ImportError as e:
    print(f"❌ CNN2D Optuna Wrapper: ERROR - {e}")
    print("Verificando estructura del proyecto...")
    import os
    print(f"Directorio actual: {os.getcwd()}")
    print(f"Contenido: {os.listdir('.')}")
    if os.path.exists('modules'):
        print("✅ Carpeta modules encontrada")
        print(f"Contenido de modules: {os.listdir('modules')}")
        if os.path.exists('modules/core'):
            print("✅ Carpeta modules/core encontrada")
            print(f"Contenido de modules/core: {os.listdir('modules/core')}")
        else:
            print("❌ Carpeta modules/core no encontrada")
    else:
        print("❌ Carpeta modules no encontrada")

try:
    from modules.core.optuna_optimization import OptunaOptimizer
    print("✅ Optuna Optimizer: OK")
except ImportError as e:
    print(f"❌ Optuna Optimizer: ERROR - {e}")

try:
    from modules.models.cnn2d.model import CNN2D
    print("✅ CNN2D Model: OK")
except ImportError as e:
    print(f"❌ CNN2D Model: ERROR - {e}")

# Check that optuna works correctly
try:
    import optuna
    print(f"✅ Optuna {optuna.__version__}: OK")

    # Create a test trial
    study = optuna.create_study(direction='maximize')
    trial = study.ask()
    print("✅ Optuna Trial: OK")

except Exception as e:
    print(f"❌ Optuna: ERROR - {e}")

print("="*70)


[I 2025-10-25 18:48:57,619] A new study created in memory with name: no-name-f1d9c371-a63c-46f5-b5d4-195a0b1cd28b


VERIFICANDO MÓDULOS PROPIOS
✅ CNN2D Optuna Wrapper: OK
✅ Optuna Optimizer: OK
✅ CNN2D Model: OK
✅ Optuna 4.5.0: OK
✅ Optuna Trial: OK


In [20]:
# ============================================================
# FALLBACK FOR IMPORT PROBLEMS
# ============================================================

print("="*70)
print("IMPLEMENTANDO SOLUCIÓN DE RESPALDO")
print("="*70)

# If the project modules fail to import, define a simplified version
try:
    from modules.core.cnn2d_optuna_wrapper import optimize_cnn2d
    print("✅ Módulo principal disponible")
    USE_BACKUP = False
except ImportError as e:
    print(f"⚠️  Problema con módulo principal: {e}")
    print("🔧 Implementando solución de respaldo...")
    USE_BACKUP = True

    # Simplified fallback implementation of optimize_cnn2d
    def optimize_cnn2d_backup(
        X_train, y_train, X_val, y_val, input_shape,
        n_trials=30, n_epochs_per_trial=20, device="cpu", save_dir=None,
    ):
        """
        Fallback version of optimize_cnn2d that does not depend on the project's
        Optuna wrapper modules (it still imports CNN2D from modules.models).
        """
        import optuna
        import torch
        import torch.nn as nn
        import torch.optim as optim
        from torch.utils.data import DataLoader, TensorDataset
        from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score
        import pandas as pd
        import json
        from pathlib import Path

        # Import CNN2D directly
        from modules.models.cnn2d.model import CNN2D

        def objective(trial):
            # Hyperparameters
            filters_1 = trial.suggest_categorical("filters_1", [16, 32, 64])
            filters_2 = trial.suggest_categorical("filters_2", [32, 64, 128])
            kernel_size_1 = trial.suggest_categorical("kernel_size_1", [3, 5])
            kernel_size_2 = trial.suggest_categorical("kernel_size_2", [3, 5])
            p_drop_conv = trial.suggest_float("p_drop_conv", 0.2, 0.5)
            p_drop_fc = trial.suggest_float("p_drop_fc", 0.3, 0.6)
            dense_units = trial.suggest_categorical("dense_units", [32, 64, 128])
            learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
            weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True)
            optimizer_name = trial.suggest_categorical("optimizer", ["adam", "sgd"])
            batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])

            # Build the model
            model = CNN2D(
                input_shape=input_shape[1:],  # (H, W) without the channel dimension
                filters_1=filters_1,
                filters_2=filters_2,
                kernel_size_1=kernel_size_1,
                kernel_size_2=kernel_size_2,
                p_drop_conv=p_drop_conv,
                p_drop_fc=p_drop_fc,
                dense_units=dense_units,
            ).to(device)

            # Build the DataLoaders
            train_dataset = TensorDataset(X_train, y_train)
            val_dataset = TensorDataset(X_val, y_val)
            train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
            val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

            # Optimizer
            if optimizer_name == "adam":
                optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
            else:  # sgd
                optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay, momentum=0.9)

            criterion = nn.CrossEntropyLoss()

            # Training loop
            best_f1 = 0.0
            for epoch in range(n_epochs_per_trial):
                # Training
                model.train()
                for batch_x, batch_y in train_loader:
                    batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                    optimizer.zero_grad()
                    outputs = model(batch_x)
                    loss = criterion(outputs, batch_y)
                    loss.backward()
                    optimizer.step()

                # Validation
                model.eval()
                val_preds = []
                val_labels = []
                with torch.no_grad():
                    for batch_x, batch_y in val_loader:
                        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                        outputs = model(batch_x)
                        _, predicted = torch.max(outputs.data, 1)
                        val_preds.extend(predicted.cpu().numpy())
                        val_labels.extend(batch_y.cpu().numpy())

                f1 = f1_score(val_labels, val_preds, average="macro")
                if f1 > best_f1:
                    best_f1 = f1

                # Report the intermediate value to Optuna
                trial.report(f1, epoch)
                if trial.should_prune():
                    raise optuna.TrialPruned()

            return best_f1

        # Create the study
        study = optuna.create_study(direction='maximize', pruner=optuna.pruners.MedianPruner())
        study.optimize(objective, n_trials=n_trials)

        # Collect results
        results_df = pd.DataFrame([
            {
                'trial': trial.number,
                'value': trial.value,
                'params': trial.params,
                'f1': trial.value,
                'accuracy': 0.0,  # Placeholder
                'precision': 0.0,  # Placeholder
                'recall': 0.0,  # Placeholder
                **trial.params
            }
            for trial in study.trials
        ])

        results = {
            "best_params": study.best_params,
            "best_value": study.best_value,
            "best_trial": study.best_trial.number,
            "results_df": results_df,
            "analysis": {"best_trial": study.best_trial}
        }

        # Save if a directory was given
        if save_dir:
            save_path = Path(save_dir)
            save_path.mkdir(parents=True, exist_ok=True)
            results_df.to_csv(save_path / "optuna_trials_results.csv", index=False)
            with open(save_path / "best_params.json", 'w') as f:
                json.dump(study.best_params, f, indent=2)

        return results

    # Replace the original function
    optimize_cnn2d = optimize_cnn2d_backup
    print("✅ Solución de respaldo implementada")

print("="*70)


IMPLEMENTANDO SOLUCIÓN DE RESPALDO
✅ Módulo principal disponible
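
The fallback objective reports the validation F1 after every epoch and lets `MedianPruner` decide whether to stop a trial early. The rule can be summarized as: prune when the trial's best value so far is worse than the median of what earlier trials reported at the same step. The function below is a simplified plain-Python restatement of that rule for a maximized metric, not Optuna's implementation; `should_prune` and its arguments are hypothetical, and the startup threshold of 5 is taken to match the pruner's default `n_startup_trials` (treat this as an assumption).

```python
from statistics import median


def should_prune(best_so_far, earlier_values_at_step, n_startup_trials=5):
    """Simplified median rule for a metric that is being maximized."""
    if len(earlier_values_at_step) < n_startup_trials:
        return False  # too few earlier trials to compare against
    return best_so_far < median(earlier_values_at_step)


earlier_f1 = [0.60, 0.70, 0.80, 0.90, 0.65]  # made-up F1 values at one epoch
decisions = [
    should_prune(0.50, earlier_f1),      # below the median of 0.70
    should_prune(0.75, earlier_f1),      # above the median
    should_prune(0.50, earlier_f1[:1]),  # not enough earlier trials yet
]
print(decisions)
```

One consequence of the startup threshold is that a single-trial run, like the smoke tests in this notebook, never exercises pruning at all.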


In [21]:
# ============================================================
# QUICK OPTUNA SMOKE TEST (1 TRIAL)
# ============================================================

print("="*70)
print("PRUEBA RÁPIDA DE OPTUNA - 1 TRIAL")
print("="*70)

# Import required dependencies
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score
import pandas as pd
import json
from pathlib import Path

# Select the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"🔧 Dispositivo: {device}")

# Create small synthetic data to check that everything runs
print("🔧 Creando datos de prueba...")

# Synthetic data (tiny, for a quick check)
X_test_small = torch.randn(20, 1, 65, 41)  # 20 samples
y_test_small = torch.randint(0, 2, (20,))  # Random labels

# Train/val split for the smoke test
X_train_test = X_test_small[:15]
y_train_test = y_test_small[:15]
X_val_test = X_test_small[15:]
y_val_test = y_test_small[15:]

print(f"✅ Datos de prueba creados:")
print(f"   - Train: {X_train_test.shape} (labels: {y_train_test.shape})")
print(f"   - Val:   {X_val_test.shape} (labels: {y_val_test.shape})")

# Try optimize_cnn2d with 1 trial
print(f"\n🧪 Probando optimize_cnn2d con 1 trial...")
print("   (Esto debería tomar menos de 1 minuto)")

try:
    # Run the smoke test with 1 trial and 2 epochs
    test_results = optimize_cnn2d(
        X_train=X_train_test,
        y_train=y_train_test,
        X_val=X_val_test,
        y_val=y_val_test,
        input_shape=(1, 65, 41),
        n_trials=1,  # Only 1 trial
        n_epochs_per_trial=2,  # Only 2 epochs
        device=device,
        save_dir=None  # Do not save
    )

    print("✅ PRUEBA EXITOSA!")
    print(f"   - Mejor F1: {test_results['best_value']:.4f}")
    print(f"   - Mejores params: {test_results['best_params']}")
    print("   - La función optimize_cnn2d funciona correctamente")

except Exception as e:
    print(f"❌ ERROR EN LA PRUEBA:")
    print(f"   - Error: {e}")
    print(f"   - Tipo: {type(e).__name__}")
    import traceback
    print(f"   - Traceback completo:")
    traceback.print_exc()

    print(f"\n🔧 Soluciones posibles:")
    print("   1. Verificar que optuna esté instalado correctamente")
    print("   2. Verificar que los módulos propios estén disponibles")
    print("   3. Revisar la configuración del entorno")

print("="*70)


[I 2025-10-25 18:48:57,673] A new study created in memory with name: cnn2d_optuna


PRUEBA RÁPIDA DE OPTUNA - 1 TRIAL
🔧 Dispositivo: cuda
🔧 Creando datos de prueba...
✅ Datos de prueba creados:
   - Train: torch.Size([15, 1, 65, 41]) (labels: torch.Size([15]))
   - Val:   torch.Size([5, 1, 65, 41]) (labels: torch.Size([5]))

🧪 Probando optimize_cnn2d con 1 trial...
   (Esto debería tomar menos de 1 minuto)
Iniciando optimización con 1 trials...


  0%|          | 0/1 [00:00<?, ?it/s]

[I 2025-10-25 18:48:57,783] Trial 0 finished with value: 0.5833333333333333 and parameters: {'filters_1': 32, 'filters_2': 32, 'kernel_size_1': 5, 'kernel_size_2': 5, 'p_drop_conv': 0.20617534828874073, 'p_drop_fc': 0.5909729556485983, 'dense_units': 32, 'learning_rate': 3.5498788321965036e-05, 'weight_decay': 8.179499475211674e-06, 'optimizer': 'adam', 'batch_size': 32}. Best is trial 0 with value: 0.5833333333333333.

Optimización completada!
Mejor F1: 0.5833
Mejores hiperparámetros:
  filters_1: 32
  filters_2: 32
  kernel_size_1: 5
  kernel_size_2: 5
  p_drop_conv: 0.20617534828874073
  p_drop_fc: 0.5909729556485983
  dense_units: 32
  learning_rate: 3.5498788321965036e-05
  weight_decay: 8.179499475211674e-06
  optimizer: adam
  batch_size: 32
✅ PRUEBA EXITOSA!
   - Mejor F1: 0.5833
   - Mejores params: {'filters_1': 32, 'filters_2': 32, 'kernel_size_1': 5, 'kernel_size_2': 5, 'p_drop_conv': 0.20617534828874073, 'p_drop_fc': 0.5909729556485983, 'dense_units': 32, 'learning_rate': 

In [22]:
# ============================================================
# ALTERNATIVE CHECK - VERIFY IMPORTS AND FUNCTIONS
# ============================================================

print("="*70)
print("VERIFICACIÓN DETALLADA DE IMPORTS Y FUNCIONES")
print("="*70)

# Check that all imports work
print("🔍 Verificando importaciones...")

try:
    import optuna
    print(f"✅ optuna {optuna.__version__}")
except ImportError as e:
    print(f"❌ optuna: {e}")

try:
    import torch
    print(f"✅ torch {torch.__version__}")
except ImportError as e:
    print(f"❌ torch: {e}")

try:
    from modules.models.cnn2d.model import CNN2D
    print("✅ CNN2D model")
except ImportError as e:
    print(f"❌ CNN2D model: {e}")

# Check that the optimize_cnn2d function is available
print(f"\n🔍 Verificando función optimize_cnn2d...")
try:
    # Check that the function is defined
    if 'optimize_cnn2d' in globals():
        print("✅ optimize_cnn2d está disponible")
        print(f"   - Tipo: {type(optimize_cnn2d)}")
    else:
        print("❌ optimize_cnn2d no está disponible")
except Exception as e:
    print(f"❌ Error verificando optimize_cnn2d: {e}")

# Check the Optuna setup
print(f"\n🔍 Verificando configuración de Optuna...")
try:
    # Create a test study
    study = optuna.create_study(direction='maximize')
    print("✅ Optuna study creado correctamente")

    # Create a test trial
    trial = study.ask()
    print("✅ Optuna trial creado correctamente")

    # Check that TrialPruned is available
    if hasattr(optuna, 'TrialPruned'):
        print("✅ optuna.TrialPruned disponible")
    else:
        print("❌ optuna.TrialPruned no disponible")

except Exception as e:
    print(f"❌ Error con Optuna: {e}")

print("="*70)


[I 2025-10-25 18:48:58,062] A new study created in memory with name: no-name-a1db2587-35e4-454e-84c7-aab10d0c64fb


VERIFICACIÓN DETALLADA DE IMPORTS Y FUNCIONES
🔍 Verificando importaciones...
✅ optuna 4.5.0
✅ torch 2.8.0+cu126
✅ CNN2D model

🔍 Verificando función optimize_cnn2d...
✅ optimize_cnn2d está disponible
   - Tipo: <class 'function'>

🔍 Verificando configuración de Optuna...
✅ Optuna study creado correctamente
✅ Optuna trial creado correctamente
✅ optuna.TrialPruned disponible


In [23]:
# ============================================================
# FINAL CHECK - OPTUNA WITH MINIMAL DATA
# ============================================================

print("="*70)
print("PRUEBA FINAL - OPTUNA CON DATOS MÍNIMOS")
print("="*70)

# Create minimal data for the check
print("🔧 Creando datos mínimos...")
X_mini = torch.randn(10, 1, 65, 41)  # 10 samples
y_mini = torch.randint(0, 2, (10,))  # Random labels

# Minimal split
X_train_mini = X_mini[:7]
y_train_mini = y_mini[:7]
X_val_mini = X_mini[7:]
y_val_mini = y_mini[7:]

print(f"✅ Datos mínimos creados:")
print(f"   - Train: {X_train_mini.shape}")
print(f"   - Val:   {X_val_mini.shape}")

# Simplified test function
def test_optuna_simple():
    """Very simple Optuna check."""
    import optuna

    def objective(trial):
        # Simple hyperparameters
        lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
        batch_size = trial.suggest_categorical("batch_size", [2, 4])

        # Build a simple model
        from modules.models.cnn2d.model import CNN2D
        model = CNN2D(
            input_shape=(65, 41),
            filters_1=16,
            filters_2=32,
            kernel_size_1=3,
            kernel_size_2=3,
            p_drop_conv=0.2,
            p_drop_fc=0.3,
            dense_units=32,
        ).to(device)

        # Simple DataLoaders
        train_dataset = TensorDataset(X_train_mini, y_train_mini)
        val_dataset = TensorDataset(X_val_mini, y_val_mini)
        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

        # Optimizer
        optimizer = optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()

        # Minimal training (1 epoch)
        model.train()
        for batch_x, batch_y in train_loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

        # Validation
        model.eval()
        val_preds = []
        val_labels = []
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                outputs = model(batch_x)
                _, predicted = torch.max(outputs.data, 1)
                val_preds.extend(predicted.cpu().numpy())
                val_labels.extend(batch_y.cpu().numpy())

        # Simple metric
        f1 = f1_score(val_labels, val_preds, average="macro")
        return f1

    # Create the study
    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=1)

    return study.best_value, study.best_params

# Run the check
print(f"\n🧪 Ejecutando prueba simple...")
try:
    best_value, best_params = test_optuna_simple()
    print("✅ PRUEBA EXITOSA!")
    print(f"   - Mejor F1: {best_value:.4f}")
    print(f"   - Mejores params: {best_params}")
    print("   - Optuna funciona correctamente")

except Exception as e:
    print(f"❌ ERROR EN LA PRUEBA:")
    print(f"   - Error: {e}")
    print(f"   - Tipo: {type(e).__name__}")
    import traceback
    print(f"   - Traceback:")
    traceback.print_exc()

print("="*70)


[I 2025-10-25 18:48:58,092] A new study created in memory with name: no-name-16d1e374-5162-47f1-a89f-da3dbbea5cd6
[I 2025-10-25 18:48:58,223] Trial 0 finished with value: 1.0 and parameters: {'learning_rate': 0.0015196682842759201, 'batch_size': 2}. Best is trial 0 with value: 1.0.


PRUEBA FINAL - OPTUNA CON DATOS MÍNIMOS
🔧 Creando datos mínimos...
✅ Datos mínimos creados:
   - Train: torch.Size([7, 1, 65, 41])
   - Val:   torch.Size([3, 1, 65, 41])

🧪 Ejecutando prueba simple...
✅ PRUEBA EXITOSA!
   - Mejor F1: 1.0000
   - Mejores params: {'learning_rate': 0.0015196682842759201, 'batch_size': 2}
   - Optuna funciona correctamente
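
Both smoke tests score trials with `f1_score(..., average="macro")`. For reference, macro F1 is the unweighted mean of the per-class F1 scores; the plain-Python function below (a hypothetical `macro_f1`, written for illustration rather than taken from scikit-learn) spells that out.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])  # one class-0 sample misclassified
perfect = macro_f1([0, 1, 1], [0, 1, 1])
print(round(score, 4), perfect)
```

With only 3 validation samples, as in the check above, a score of 1.0 shows that the pipeline runs, not that the model has learned anything.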


In [24]:
# ============================================================
# SET UP ENVIRONMENT AND DEPENDENCIES
# ============================================================

import sys
from pathlib import Path

# Add the project root to the path
# Assumes the notebook runs from research/ (modules/ lives in the repo root);
# on Colab the first cell already chdir's into the repo root and adds it to sys.path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# Import the centralized dependency manager
from modules.core.dependency_manager import setup_notebook_environment

# Set up the environment automatically
# This checks and installs all required dependencies
success = setup_notebook_environment(auto_install=True, verbose=True)

if not success:
    print("Error configurando el entorno")
    print("Intenta instalar manualmente: pip install -r requirements.txt")
    sys.exit(1)  # sys is already imported at the top of this cell

print("="*70)


🚀 Configurando entorno para notebook...
🔍 Información del entorno:
   python_version: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
   platform: linux
   is_colab: True
   is_jupyter: True
   working_directory: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty
   torch_version: 2.8.0+cu126
   cuda_available: True
🔍 Estado de dependencias:
   ✅ PyTorch
   ✅ TorchVision
   ✅ NumPy
   ✅ Pandas
   ✅ Scikit-learn
   ✅ Matplotlib
   ✅ Seaborn
   ✅ Librosa
   ✅ SoundFile
   ✅ Optuna
   ✅ Jupyter

✅ Entorno listo - todas las dependencias disponibles


In [25]:
# ============================================================
# FULL EXPERIMENT CONFIGURATION (IBARRA 2023 PAPER)
# ============================================================

print("="*70)
print("CONFIGURACIÓN DEL EXPERIMENTO - PAPER IBARRA 2023")
print("="*70)

# ============================================================
# OPTIMIZER CONFIGURATION (SGD, as in the paper)
# ============================================================
OPTIMIZER_CONFIG = {
    "type": "SGD",
    "learning_rate": 0.1,
    "momentum": 0.9,
    "weight_decay": 1e-4,  # Changed from 0.0 to 1e-4 for regularization
    "nesterov": True  # Added Nesterov momentum for better convergence
}

# ============================================================
# SCHEDULER CONFIGURATION (StepLR, as in the paper)
# ============================================================
SCHEDULER_CONFIG = {
    "type": "StepLR",
    "step_size": 10,
    "gamma": 0.1
}

# ============================================================
# K-FOLD CROSS-VALIDATION CONFIGURATION
# ============================================================
KFOLD_CONFIG = {
    "n_splits": 10,
    "shuffle": True,
    "random_state": 42,
    "stratify_by_speaker": True
}

# ============================================================
# CLASS WEIGHTS CONFIGURATION (to balance the classes)
# ============================================================
CLASS_WEIGHTS_CONFIG = {
    "enabled": True,
    "method": "inverse_frequency"  # 1/frequency
}

# ============================================================
# VOWEL /a/ FILTER CONFIGURATION
# ============================================================
VOCAL_FILTER_CONFIG = {
    "enabled": True,
    "target_vocal": "a",  # Vowel /a/ only, as in the paper
    "filter_healthy": True,
    "filter_parkinson": True
}

# ============================================================
# TRAINING CONFIGURATION
# ============================================================
TRAINING_CONFIG = {
    "n_epochs": 100,
    "early_stopping_patience": 10,  # Reduced from 15 to 10 to limit overfitting
    "batch_size": 32,
    "num_workers": 0,
    "save_best_model": True
}

# ============================================================
# OPTUNA CONFIGURATION (HYPERPARAMETER OPTIMIZATION)
# ============================================================
# Optuna replaces Talos - more modern, no installation problems
OPTUNA_CONFIG = {
    "enabled": True,
    "experiment_name": "cnn2d_optuna_optimization",
    "n_trials": 30,  # Number of configurations to try
    "n_epochs_per_trial": 20,  # Epochs per configuration
    "metric": "f1",  # Metric to optimize
    "direction": "maximize"  # maximize or minimize
}

# ============================================================
# DATA CONFIGURATION
# ============================================================
DATA_CONFIG = {
    "test_size": 0.15,
    "val_size": 0.15,
    "random_state": 42,
    "stratify": True
}


CONFIGURACIÓN DEL EXPERIMENTO - PAPER IBARRA 2023
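
Two of the settings above reduce to short formulas. With `StepLR`, the learning rate at a given epoch is `base_lr * gamma ** (epoch // step_size)`, and the `inverse_frequency` method is described in the config as `1/frequency`. The sketch below evaluates both in plain Python; `step_lr` and `inverse_frequency_weights` are hypothetical helpers, the class counts are made up, and any normalization the project applies to the weights is not shown in this notebook.

```python
from collections import Counter


def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at a given epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


def inverse_frequency_weights(labels):
    """Weight each class by 1 / (its relative frequency)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / count for cls, count in counts.items()}


# Values from OPTIMIZER_CONFIG / SCHEDULER_CONFIG: lr=0.1, step_size=10, gamma=0.1
schedule = [step_lr(0.1, epoch) for epoch in (0, 9, 10, 25)]
weights = inverse_frequency_weights([0] * 75 + [1] * 25)  # made-up class counts
print(schedule, weights)
```

With these values the learning rate drops by a factor of 10 every 10 epochs, so by epoch 30 it is already at 1e-4; whether early stopping with patience 10 triggers before that depends on the data.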


In [26]:
# ============================================================
# DETECT ENVIRONMENT AND CONFIGURE PATHS
# ============================================================

# This import works from any subdirectory of the project
import sys
from pathlib import Path

# Find the project root and add it to the path
current_dir = Path.cwd()
for _ in range(10):
    if (current_dir / "modules").exists():
        if str(current_dir) not in sys.path:
            sys.path.insert(0, str(current_dir))
        break
    current_dir = current_dir.parent

# Import the notebook setup function
from modules.core.notebook_setup import setup_notebook

# Auto-configure: path + environment (Local/Colab) + data paths
ENV, PATHS = setup_notebook(verbose=True)


Raíz del proyecto agregada al path: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty
CONFIGURACIÓN DE ENTORNO
Entorno detectado: COLAB
Ruta base: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty
Cache original: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty/cache/original
Cache augmented: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty/cache/augmented

MODO COLAB: Usando rutas de Google Drive


## 1. Setup and Configuration


In [27]:
# ============================================================
# IMPORTS Y CONFIGURACIÓN
# ============================================================
import sys
from pathlib import Path
import matplotlib.pyplot as plt
import warnings
import pandas as pd
import json
import numpy as np
warnings.filterwarnings('ignore')

# PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split

# Agregar módulos propios al path
# El notebook está en research/, pero modules/ está en el directorio raíz
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# Importar módulos propios
from modules.models.cnn2d.model import CNN2D
from modules.models.common.training_utils import print_model_summary
from modules.models.cnn2d.training import train_model, detailed_evaluation, print_evaluation_report
from modules.models.cnn2d.visualization import plot_training_history, analyze_spectrogram_stats
from modules.models.cnn2d.utils import plot_confusion_matrix
from modules.core.utils import create_10fold_splits_by_speaker
from modules.core.dataset import (
    load_spectrograms_cache,
    to_pytorch_tensors,
    DictDataset,
)


# Optuna imports (hyperparameter optimization)
from modules.core.cnn2d_optuna_wrapper import optimize_cnn2d, create_cnn2d_optimizer
from modules.core.optuna_optimization import OptunaOptimizer

# Configuración de matplotlib
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

# Configuración de PyTorch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Reporte de configuración
print("="*70)
print("CNN 2D TRAINING - BASELINE CON AUGMENTATION")
print("="*70)
print(f"Librerías cargadas correctamente")
print(f"Dispositivo: {device}")
print(f"PyTorch: {torch.__version__}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Data augmentation: ACTIVADO (~5x datos)")
print("="*70)


CNN 2D TRAINING - BASELINE CON AUGMENTATION
Librerías cargadas correctamente
Dispositivo: cuda
PyTorch: 2.8.0+cu126
GPU: Tesla T4
Data augmentation: ACTIVADO (~5x datos)


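## 2. Data Loading

Loads the preprocessed spectrograms for both classes. The cells below read the Healthy and Parkinson sets from `PATHS['cache_original']` (`healthy_ibarra.pkl` and `parkinson_ibarra.pkl`), not from the augmented cache.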


In [28]:
# ============================================================
# CARGAR DATOS HEALTHY DESDE CACHE ORIGINAL
# ============================================================

print("Cargando datos Healthy desde cache original...")
print("="*60)

from modules.core.dataset import load_spectrograms_cache

# Cargar datos healthy desde cache original usando rutas dinámicas
cache_healthy_path = PATHS['cache_original'] / "healthy_ibarra.pkl"
healthy_dataset = load_spectrograms_cache(str(cache_healthy_path))

if healthy_dataset is None:
    raise FileNotFoundError(f"No se encontró el cache de datos healthy en {cache_healthy_path}")

# Convertir a tensores PyTorch
X_healthy, y_task_healthy, y_domain_healthy, meta_healthy = to_pytorch_tensors(healthy_dataset)

print(f"Healthy cargado exitosamente:")
print(f"   - Espectrogramas: {X_healthy.shape[0]}")
print(f"   - Shape: {X_healthy.shape}")
print(f"   - Ruta: {cache_healthy_path}")


Cargando datos Healthy desde cache original...
Cache cargado: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty/cache/original/healthy_ibarra.pkl
   Tamaño: 933.5 MB
   Muestras: 12029
PyTorch tensors listos:
  - X: (12029, 1, 65, 41)
  - y_task: (12029,)  (dist={0: 12029})
  - y_domain: (12029,)  (K dominios=4)
Healthy cargado exitosamente:
   - Espectrogramas: 12029
   - Shape: torch.Size([12029, 1, 65, 41])
   - Ruta: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty/cache/original/healthy_ibarra.pkl


In [29]:
# ============================================================
# CARGAR DATOS PARKINSON DESDE CACHE ORIGINAL
# ============================================================

print("Cargando datos Parkinson desde cache original...")
print("="*60)

# Cargar datos parkinson desde cache original usando rutas dinámicas
cache_parkinson_path = PATHS['cache_original'] / "parkinson_ibarra.pkl"
parkinson_dataset = load_spectrograms_cache(str(cache_parkinson_path))

if parkinson_dataset is None:
    raise FileNotFoundError(f"No se encontró el cache de datos parkinson en {cache_parkinson_path}")

# Convertir a tensores PyTorch
X_parkinson, y_task_parkinson, y_domain_parkinson, meta_parkinson = to_pytorch_tensors(parkinson_dataset)

print(f"Parkinson cargado exitosamente:")
print(f"   - Espectrogramas: {X_parkinson.shape[0]}")
print(f"   - Shape: {X_parkinson.shape}")
print(f"   - Ruta: {cache_parkinson_path}")


Cargando datos Parkinson desde cache original...
Cache cargado: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty/cache/original/parkinson_ibarra.pkl
   Tamaño: 1792.3 MB
   Muestras: 23097
PyTorch tensors listos:
  - X: (23097, 1, 65, 41)
  - y_task: (23097,)  (dist={0: 23097})
  - y_domain: (23097,)  (K dominios=4)
Parkinson cargado exitosamente:
   - Espectrogramas: 23097
   - Shape: torch.Size([23097, 1, 65, 41])
   - Ruta: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty/cache/original/parkinson_ibarra.pkl


In [30]:
# ============================================================
# INFORMACIÓN DE DATOS CARGADOS
# ============================================================

print("="*70)
print("INFORMACIÓN DE DATOS CARGADOS")
print("="*70)

print(f"Datos Healthy (desde cache original):")
print(f"   - Muestras: {len(healthy_dataset)}")
print(f"   - Shape de espectrogramas: {X_healthy.shape}")

print("="*70)


INFORMACIÓN DE DATOS CARGADOS
Datos Healthy (desde cache original):
   - Muestras: 12029
   - Shape de espectrogramas: torch.Size([12029, 1, 65, 41])


In [31]:
# ============================================================
# ANÁLISIS ESTADÍSTICO BÁSICO
# ============================================================

print("="*70)
print("ANÁLISIS ESTADÍSTICO BÁSICO")
print("="*70)

# Análisis estadístico básico
healthy_stats = analyze_spectrogram_stats(healthy_dataset, "HEALTHY")
parkinson_stats = analyze_spectrogram_stats(parkinson_dataset, "PARKINSON")

# Comparar diferencias
print(f"\nDIFERENCIAS ENTRE CLASES:")
print(f"   - Diferencia en media: {abs(healthy_stats['mean'] - parkinson_stats['mean']):.3f}")
print(f"   - Diferencia en std: {abs(healthy_stats['std'] - parkinson_stats['std']):.3f}")

print("\nConfiguración del experimento:")
print("   - Healthy: datos originales (baseline)")
print("   - Parkinson: datos con augmentation (mejor generalización)")
print("="*70)


ANÁLISIS ESTADÍSTICO BÁSICO

HEALTHY:
   • Número de espectrogramas: 12029
   • Shape típico: (65, 41)
   • Media: 0.000
   • Desviación estándar: 1.000
   • Min: -2.465
   • Max: 6.155

PARKINSON:
   • Número de espectrogramas: 23097
   • Shape típico: (65, 41)
   • Media: 0.000
   • Desviación estándar: 1.000
   • Min: -3.360
   • Max: 3.726

DIFERENCIAS ENTRE CLASES:
   - Diferencia en media: 0.000
   - Diferencia en std: 0.000

Configuración del experimento:
   - Healthy: datos originales (baseline)
   - Parkinson: datos con augmentation (mejor generalización)


In [32]:
# ============================================================
# COMBINAR DATASETS
# ============================================================

print("="*70)
print("COMBINANDO DATASETS")
print("="*70)

# Combinar espectrogramas
X_combined = torch.cat([X_healthy, X_parkinson], dim=0)

# Crear labels: 0=Healthy, 1=Parkinson
y_combined = torch.cat([
    torch.zeros(len(X_healthy), dtype=torch.long),  # Healthy = 0
    torch.ones(len(X_parkinson), dtype=torch.long)  # Parkinson = 1
], dim=0)

print(f"\nDATASET COMBINADO:")
print(f"   - Total muestras: {len(X_combined)}")
print(f"   - Shape: {X_combined.shape}")
print(f"   - Healthy (0): {(y_combined == 0).sum().item()} ({(y_combined == 0).sum()/len(y_combined)*100:.1f}%)")
print(f"   - Parkinson (1): {(y_combined == 1).sum().item()} ({(y_combined == 1).sum()/len(y_combined)*100:.1f}%)")

balance_pct = (y_combined == 1).sum() / len(y_combined) * 100
if abs(balance_pct - 50) < 10:
    print(f"   ✓ Dataset razonablemente balanceado")
else:
    print(f"   ⚠ Dataset desbalanceado - class weights habilitados en config")

print("="*70)


COMBINANDO DATASETS

DATASET COMBINADO:
   - Total muestras: 35126
   - Shape: torch.Size([35126, 1, 65, 41])
   - Healthy (0): 12029 (34.2%)
   - Parkinson (1): 23097 (65.8%)
   ⚠ Dataset desbalanceado - class weights habilitados en config


In [33]:
# ============================================================
# INSPECCIONAR METADATOS PARA SPEAKER IDS
# ============================================================

print("="*70)
print("VERIFICANDO METADATOS")
print("="*70)

# Verificar estructura de metadatos
if meta_healthy and len(meta_healthy) > 0:
    print(f"\n✓ meta_healthy disponible: {len(meta_healthy)} muestras")
    print(f"  Ejemplo de metadata[0]:")
    sample_meta = meta_healthy[0]
    if isinstance(sample_meta, dict):
        for key, value in list(sample_meta.items())[:5]:
            print(f"    - {key}: {value}")
    else:
        print(f"    Tipo: {type(sample_meta)}")
        print(f"    Valor: {sample_meta}")
else:
    print("  ✗ meta_healthy no disponible o vacío")

if meta_parkinson and len(meta_parkinson) > 0:
    print(f"\n✓ meta_parkinson disponible: {len(meta_parkinson)} muestras")
    print(f"  Ejemplo de metadata[0]:")
    sample_meta = meta_parkinson[0]
    if isinstance(sample_meta, dict):
        for key, value in list(sample_meta.items())[:5]:
            print(f"    - {key}: {value}")
    else:
        print(f"    Tipo: {type(sample_meta)}")
        print(f"    Valor: {sample_meta}")
else:
    print("  ✗ meta_parkinson no disponible o vacío")

print("="*70)


VERIFICANDO METADATOS

✓ meta_healthy disponible: 12029 muestras
  Ejemplo de metadata[0]:
    Tipo: <class 'modules.core.dataset.SampleMeta'>
    Valor: SampleMeta(subject_id='10', vowel_type='a_n', condition='unknown', filename='10-a_n.wav', segment_id=0, sr=44100)

✓ meta_parkinson disponible: 23097 muestras
  Ejemplo de metadata[0]:
    Tipo: <class 'modules.core.dataset.SampleMeta'>
    Valor: SampleMeta(subject_id='1037', vowel_type='a_h', condition='unknown', filename='1037-a_h.wav', segment_id=0, sr=44100)


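## 3. Train/Val/Test Split

Train and validation come from fold 1 of a speaker-independent 10-fold split (about 90.5% and 9.5% of the samples), not from a 70/15/15 split. The test set is a separate stratified random 15% draw from the full dataset, so it shares samples and speakers with the training set. Treat the test metrics as optimistic until the test split is also made by speaker.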


In [34]:
# ============================================================
# 10-FOLD CROSS-VALIDATION ESTRATIFICADO POR HABLANTE
# ============================================================

print("="*70)
print("10-FOLD CROSS-VALIDATION (PAPER IBARRA 2023)")
print("="*70)

# Preparar metadata combinada para create_10fold_splits_by_speaker
# La metadata ya fue cargada antes con meta_healthy y meta_parkinson

# Crear lista de metadata combinada con labels
metadata_combined = []

# Agregar metadata de healthy (label=0)
for meta in meta_healthy:
    metadata_combined.append({
        "subject_id": meta.subject_id,
        "label": 0,  # Healthy
        "filename": meta.filename
    })

# Agregar metadata de parkinson (label=1)
for meta in meta_parkinson:
    metadata_combined.append({
        "subject_id": meta.subject_id,
        "label": 1,  # Parkinson
        "filename": meta.filename
    })

print(f"\n📊 Dataset info:")
print(f"   • Total samples: {len(X_combined)}")
print(f"   • Metadata entries: {len(metadata_combined)}")

# Crear 10-fold splits usando la función centralizada
# Esta función asegura que todos los samples de un speaker están en el mismo fold
fold_splits = create_10fold_splits_by_speaker(
    metadata_list=metadata_combined,
    n_folds=KFOLD_CONFIG["n_splits"],
    seed=KFOLD_CONFIG["random_state"]
)

# Para este notebook, usaremos el primer fold como ejemplo
# En el paper real se promedian los resultados de los 10 folds
train_indices = fold_splits[0]["train"]
val_indices = fold_splits[0]["val"]

# Crear splits de train/val usando los índices
X_train = X_combined[train_indices]
y_train = y_combined[train_indices]
X_val = X_combined[val_indices]
y_val = y_combined[val_indices]

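# Test set: stratified random 15% of the full dataset.
# TODO: this draw overlaps with X_train/X_val (sample and speaker leakage);
# it should also be split by speaker. X_train_val/y_train_val are not used afterwards.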
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X_combined, y_combined,
    test_size=0.15,
    random_state=42,
    stratify=y_combined
)

print(f"\nTAMAÑOS DE SPLITS:")
print(f"   - Train: {len(X_train)} ({len(X_train)/len(X_combined)*100:.1f}%)")
print(f"   - Val:   {len(X_val)} ({len(X_val)/len(X_combined)*100:.1f}%)")
print(f"   - Test:  {len(X_test)} ({len(X_test)/len(X_combined)*100:.1f}%)")

print(f"\nDISTRIBUCIÓN POR SPLIT:")
for split_name, y_split in [("Train", y_train), ("Val", y_val), ("Test", y_test)]:
    n_healthy = (y_split == 0).sum().item()
    n_parkinson = (y_split == 1).sum().item()
    print(f"   {split_name:5s}: HC={n_healthy:4d} ({n_healthy/len(y_split)*100:.1f}%), PD={n_parkinson:4d} ({n_parkinson/len(y_split)*100:.1f}%)")

print("="*70)


10-FOLD CROSS-VALIDATION (PAPER IBARRA 2023)

📊 Dataset info:
   • Total samples: 35126
   • Metadata entries: 35126

📊 10-Fold CV speaker-independent creado:
   Total hablantes: 2042
   Total muestras: 35126
   Folds: 10

   Fold 1 (ejemplo):
      Train: 31789 muestras
      Val:   3337 muestras

TAMAÑOS DE SPLITS:
   - Train: 31789 (90.5%)
   - Val:   3337 (9.5%)
   - Test:  5269 (15.0%)

DISTRIBUCIÓN POR SPLIT:
   Train: HC=10916 (34.3%), PD=20873 (65.7%)
   Val  : HC=1113 (33.4%), PD=2224 (66.6%)
   Test : HC=1804 (34.2%), PD=3465 (65.8%)
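
The TODO in the split cell asks for a test set that is also separated by speaker. A minimal sketch of one way to do that, assuming the same metadata layout as `metadata_combined` (dicts with `subject_id` and `label`); `split_by_speaker` is a hypothetical helper, not part of `modules.core.utils`, and the fractions apply to speakers rather than to samples:

```python
import random
from collections import defaultdict


def split_by_speaker(metadata, test_frac=0.15, val_frac=0.15, seed=42):
    """Assign whole speakers to train/val/test, stratified by class label.

    metadata: list of dicts with "subject_id" and "label" (one per sample).
    Returns a dict of sample-index lists keyed by split name.
    """
    # Group sample indices by (label, subject_id). The label is part of the key
    # in case the two cohorts reuse the same numeric IDs.
    by_speaker = defaultdict(list)
    for idx, meta in enumerate(metadata):
        by_speaker[(meta["label"], meta["subject_id"])].append(idx)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label in sorted({key[0] for key in by_speaker}):
        # Shuffle the speakers of this class, then cut the list into three parts
        speakers = sorted(key for key in by_speaker if key[0] == label)
        rng.shuffle(speakers)
        n_test = round(len(speakers) * test_frac)
        n_val = round(len(speakers) * val_frac)
        for pos, speaker in enumerate(speakers):
            if pos < n_test:
                name = "test"
            elif pos < n_test + n_val:
                name = "val"
            else:
                name = "train"
            splits[name].extend(by_speaker[speaker])
    return splits
```

With the notebook's variables this would be called as `splits = split_by_speaker(metadata_combined)`, and the index lists could then be used to slice `X_combined` and `y_combined`. Because speakers contribute different numbers of segments, the resulting sample proportions will only approximate 70/15/15.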


In [35]:
# Agregar módulos propios al path
# El notebook está en research/, pero modules/ está en el directorio raíz
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# ============================================================
# CREAR DATALOADERS
# ============================================================

print("\n📦 CREANDO DATALOADERS...")

BATCH_SIZE = 32

# DictDataset was already imported from modules.core.dataset in the setup cell

# Wrap the tensors in dictionary-style datasets
train_dataset = DictDataset(X_train, y_train)
val_dataset = DictDataset(X_val, y_val)
test_dataset = DictDataset(X_test, y_test)

# Crear DataLoaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE*2, shuffle=False, num_workers=0)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE*2, shuffle=False, num_workers=0)

print(f"✅ DataLoaders creados:")
print(f"   • Train batches: {len(train_loader)}")
print(f"   • Val batches:   {len(val_loader)}")
print(f"   • Test batches:  {len(test_loader)}")
print(f"   • Batch size:    {BATCH_SIZE}")



📦 CREANDO DATALOADERS...
✅ DataLoaders creados:
   • Train batches: 994
   • Val batches:   53
   • Test batches:  83
   • Batch size:    32
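
The batch counts above follow from the split sizes: train uses `BATCH_SIZE` (32), while validation and test use `BATCH_SIZE*2` (64), and a `DataLoader` without `drop_last` keeps the final partial batch. A small check of that arithmetic (`n_batches` is an illustrative helper, not a notebook function):

```python
import math


def n_batches(n_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields for a dataset of n_samples."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


print(n_batches(31789, 32))  # train -> 994
print(n_batches(3337, 64))   # val   -> 53
print(n_batches(5269, 64))   # test  -> 83
```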


## 4. Hyperparameter Optimization with Optuna

Automatic hyperparameter search with Optuna to find the best configuration of the CNN2D model.

### Configuration:
- **Method**: Optuna with random search
- **Trials**: 30 configurations (`OPTUNA_CONFIG["n_trials"]`)
- **Epochs per trial**: 20 (fast search)
- **Metric**: F1-score on the validation set
- **Search space**: follows the table in the paper
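
To make the search space concrete, here is a sketch of how one configuration could be sampled from an Optuna trial. The parameter names match the ones reported in the trial logs of this notebook; the candidate values and ranges are assumptions for illustration and are not taken from `cnn2d_optuna_wrapper` or from the paper's table. `suggest_cnn2d_params` is a hypothetical function name.

```python
def suggest_cnn2d_params(trial):
    """Sample one CNN2D configuration from an Optuna trial object.

    Only trial.suggest_categorical and trial.suggest_float are used.
    All choices and bounds below are illustrative assumptions.
    """
    return {
        "filters_1": trial.suggest_categorical("filters_1", [32, 64]),
        "filters_2": trial.suggest_categorical("filters_2", [32, 64]),
        "kernel_size_1": trial.suggest_categorical("kernel_size_1", [3, 5]),
        "kernel_size_2": trial.suggest_categorical("kernel_size_2", [3, 5]),
        "p_drop_conv": trial.suggest_float("p_drop_conv", 0.2, 0.5),
        "p_drop_fc": trial.suggest_float("p_drop_fc", 0.2, 0.6),
        "dense_units": trial.suggest_categorical("dense_units", [32, 64]),
        # Log-uniform sampling spreads trials evenly across orders of magnitude
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True),
        "optimizer": trial.suggest_categorical("optimizer", ["adam", "sgd"]),
        "batch_size": trial.suggest_categorical("batch_size", [32, 64]),
    }
```

In a real study this function would be called at the top of the objective passed to `study.optimize(objective, n_trials=30)`, with the objective training for 20 epochs and returning the validation F1 so that a study created with `direction="maximize"` can rank the trials.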


In [36]:
# ============================================================
# CONFIGURAR OPTIMIZACIÓN CON OPTUNA
# ============================================================

print("="*70)
print("CONFIGURANDO OPTIMIZACIÓN CON OPTUNA")
print("="*70)

# Crear directorio para resultados de Optuna usando rutas dinámicas
optuna_results_dir = PATHS['results'] / "cnn_optuna_optimization"
optuna_results_dir.mkdir(parents=True, exist_ok=True)

print(f"Módulos de Optuna importados")
print(f"Directorio de resultados: {optuna_results_dir}")
print(f"Trials a ejecutar: {OPTUNA_CONFIG['n_trials']}")
print("="*70)


CONFIGURANDO OPTIMIZACIÓN CON OPTUNA
Módulos de Optuna importados
Directorio de resultados: /content/drive/Othercomputers/ZenBook/parkinson-voice-uncertainty/results/cnn_optuna_optimization
Trials a ejecutar: 30


In [37]:
# ============================================================
# PREPARAR DATOS PARA OPTUNA
# ============================================================

print("="*70)
print("PREPARANDO DATOS PARA OPTUNA")
print("="*70)

# Optuna trabaja directamente con PyTorch tensors (no requiere numpy)
# Los tensors ya están listos desde la carga de datos

print(f"📊 Datos preparados para Optuna:")
print(f"   - Train: {X_train.shape} (labels: {y_train.shape})")
print(f"   - Val:   {X_val.shape} (labels: {y_val.shape})")
print(f"   - Test:  {X_test.shape} (labels: {y_test.shape})")

# Verificar distribución de clases
print(f"\n📈 Distribución de clases:")
print(f"   Train - HC: {(y_train == 0).sum().item()}, PD: {(y_train == 1).sum().item()}")
print(f"   Val   - HC: {(y_val == 0).sum().item()}, PD: {(y_val == 1).sum().item()}")

print("="*70)


PREPARANDO DATOS PARA OPTUNA
📊 Datos preparados para Optuna:
   - Train: torch.Size([31789, 1, 65, 41]) (labels: torch.Size([31789]))
   - Val:   torch.Size([3337, 1, 65, 41]) (labels: torch.Size([3337]))
   - Test:  torch.Size([5269, 1, 65, 41]) (labels: torch.Size([5269]))

📈 Distribución de clases:
   Train - HC: 10916, PD: 20873
   Val   - HC: 1113, PD: 2224


In [38]:
# ============================================================
# VERIFICAR SI YA EXISTEN RESULTADOS DE OPTUNA
# ============================================================

print("="*70)
print("VERIFICANDO RESULTADOS PREVIOS DE OPTUNA")
print("="*70)

# Configuración de la optimización usando configuración centralizada
# (OPTUNA_CONFIG ya está definido en la configuración centralizada)

# Verificar si ya existen resultados previos
results_csv_path = optuna_results_dir / "optuna_trials_results.csv"
best_params_path = optuna_results_dir / "best_params.json"

if results_csv_path.exists() and best_params_path.exists():
    print("✅ Se encontraron resultados previos de Optuna")
    print(f"   - Archivo de resultados: {results_csv_path}")
    print(f"   - Archivo de mejores parámetros: {best_params_path}")

    # Cargar resultados previos
    results_df = pd.read_csv(results_csv_path)
    with open(best_params_path, 'r') as f:
        best_params = json.load(f)

    print(f"\n📊 Resultados previos encontrados:")
    print(f"   - Total trials evaluados: {len(results_df)}")
    print(f"   - Mejor F1-score encontrado: {results_df['value'].max():.4f}")
    print(f"   - F1-score promedio: {results_df['value'].mean():.4f} ± {results_df['value'].std():.4f}")

    print(f"\n🏆 Mejores hiperparámetros encontrados:")
    for param, value in best_params.items():
        print(f"   - {param}: {value}")

    # Crear diccionario de resultados para compatibilidad
    optuna_results = {
        "results_df": results_df,
        "best_params": best_params,
        "study": None  # El study se carga separadamente si es necesario
    }

    print(f"\n⏭️  Saltando optimización - usando resultados previos")
    print("="*70)

else:
    print("❌ No se encontraron resultados previos de Optuna")
    print("   - Iniciando optimización desde cero")

    print(f"\n⚙️  Configuración:")
    print(f"   - Trials a ejecutar: {OPTUNA_CONFIG['n_trials']}")
    print(f"   - Épocas por trial: {OPTUNA_CONFIG['n_epochs_per_trial']}")
    print(f"   - Métrica a optimizar: {OPTUNA_CONFIG['metric']} ({OPTUNA_CONFIG['direction']})")

    print(f"\n🚀 Iniciando búsqueda de hiperparámetros con Optuna...")
    print("   (Esto puede tomar varios minutos)")

    # Ejecutar optimización
    optuna_results = optimize_cnn2d(
        X_train=X_train,
        y_train=y_train,
        X_val=X_val,
        y_val=y_val,
        input_shape=(1, 65, 41),  # (C, H, W)
        n_trials=OPTUNA_CONFIG["n_trials"],
        n_epochs_per_trial=OPTUNA_CONFIG["n_epochs_per_trial"],
        device=device,
        save_dir=str(optuna_results_dir)
    )

    print("="*70)
    print("OPTIMIZACIÓN COMPLETADA")
    print("="*70)


[I 2025-10-25 18:49:22,194] A new study created in memory with name: cnn2d_optuna


VERIFICANDO RESULTADOS PREVIOS DE OPTUNA
❌ No se encontraron resultados previos de Optuna
   - Iniciando optimización desde cero

⚙️  Configuración:
   - Trials a ejecutar: 30
   - Épocas por trial: 20
   - Métrica a optimizar: f1 (maximize)

🚀 Iniciando búsqueda de hiperparámetros con Optuna...
   (Esto puede tomar varios minutos)
Iniciando optimización con 30 trials...


  0%|          | 0/30 [00:00<?, ?it/s]

[I 2025-10-25 18:51:04,114] Trial 0 finished with value: 0.7003019814507121 and parameters: {'filters_1': 32, 'filters_2': 32, 'kernel_size_1': 5, 'kernel_size_2': 5, 'p_drop_conv': 0.20617534828874073, 'p_drop_fc': 0.5909729556485983, 'dense_units': 32, 'learning_rate': 3.5498788321965036e-05, 'weight_decay': 8.179499475211674e-06, 'optimizer': 'adam', 'batch_size': 32}. Best is trial 0 with value: 0.7003019814507121.
[I 2025-10-25 18:53:06,644] Trial 1 finished with value: 0.6942331417527667 and parameters: {'filters_1': 64, 'filters_2': 32, 'kernel_size_1': 3, 'kernel_size_2': 3, 'p_drop_conv': 0.21951547789558387, 'p_drop_fc': 0.5846656611759999, 'dense_units': 32, 'learning_rate': 1.9634341572933304e-05, 'weight_decay': 0.00011290133559092664, 'optimizer': 'adam', 'batch_size': 64}. Best is trial 0 with value: 0.7003019814507121.
[I 2025-10-25 18:54:26,674] Trial 2 finished with value: 0.6799010028291765 and parameters: {'filters_1': 32, 'filters_2': 64, 'kernel_size_1': 3, 'kerne

NameError: name 'optuna' is not defined

In [None]:
# ============================================================
# ANÁLISIS DE RESULTADOS DE OPTUNA
# ============================================================

print("="*70)
print("ANÁLISIS DE RESULTADOS")
print("="*70)

# Extract results
results_df = optuna_results["results_df"]
best_params = optuna_results["best_params"]
# "analysis" is absent when cached results were loaded in the previous cell
analysis = optuna_results.get("analysis")

# The cached-results cell reads the objective from the 'value' column; use it
# when there is no 'f1' column (assumes one of the two columns exists)
metric_col = "f1" if "f1" in results_df.columns else "value"

print(f"📊 Resumen de la optimización:")
print(f"   - Total configuraciones evaluadas: {len(results_df)}")
print(f"   - Mejor F1-score encontrado: {results_df[metric_col].max():.4f}")
print(f"   - F1-score promedio: {results_df[metric_col].mean():.4f} ± {results_df[metric_col].std():.4f}")

print(f"\n🏆 Mejores hiperparámetros encontrados:")
for param, value in best_params.items():
    if param not in ['f1', 'accuracy', 'precision', 'recall', 'val_loss', 'train_loss']:
        print(f"   - {param}: {value}")

# Show the top 10 configurations; columns missing from results_df print as nan/None
print(f"\n📈 Top 10 configuraciones:")
print("-" * 80)
top_10 = results_df.nlargest(10, metric_col)
for i, (idx, row) in enumerate(top_10.iterrows(), 1):
    print(f"{i:2d}. F1: {row[metric_col]:.4f} | "
          f"Acc: {row.get('accuracy', float('nan')):.4f} | "
          f"Batch: {row.get('batch_size')} | "
          f"LR: {row.get('learning_rate')} | "
          f"Dropout: {row.get('p_drop_conv')}")

print("="*70)


In [None]:
# ============================================================
# GUARDAR RESULTADOS DE OPTUNA
# ============================================================

print("="*70)
print("GUARDANDO RESULTADOS DE OPTUNA")
print("="*70)

# Save the full DataFrame under the file name that the cached-results check looks for
results_csv_path = optuna_results_dir / "optuna_trials_results.csv"
results_df.to_csv(results_csv_path, index=False)
print(f"💾 Resultados completos guardados: {results_csv_path}")

# Save the best parameters
best_params_path = optuna_results_dir / "best_params.json"
with open(best_params_path, 'w') as f:
    json.dump(best_params, f, indent=2)
print(f"💾 Mejores parámetros guardados: {best_params_path}")

# Objective column: 'f1' if present, otherwise 'value' (as read by the cached-results cell)
metric_col = "f1" if "f1" in results_df.columns else "value"

# Save the optimization summary (UTF-8 so that "±" and accents are written on any platform)
summary_path = optuna_results_dir / "optimization_summary.txt"
with open(summary_path, 'w', encoding='utf-8') as f:
    f.write("RESUMEN DE OPTIMIZACIÓN OPTUNA\n")
    f.write("="*50 + "\n\n")
    f.write(f"Total configuraciones evaluadas: {len(results_df)}\n")
    f.write(f"Mejor F1-score: {results_df[metric_col].max():.4f}\n")
    f.write(f"F1-score promedio: {results_df[metric_col].mean():.4f} ± {results_df[metric_col].std():.4f}\n\n")
    f.write("MEJORES HIPERPARÁMETROS:\n")
    f.write("-"*30 + "\n")
    for param, value in best_params.items():
        if param not in ['f1', 'accuracy', 'precision', 'recall', 'val_loss', 'train_loss']:
            f.write(f"{param}: {value}\n")
    f.write("\nTOP 5 CONFIGURACIONES:\n")
    f.write("-"*30 + "\n")
    top_5 = results_df.nlargest(5, metric_col)
    for i, (idx, row) in enumerate(top_5.iterrows(), 1):
        f.write(f"{i}. F1: {row[metric_col]:.4f} | Acc: {row.get('accuracy', float('nan')):.4f} | "
                f"Batch: {row.get('batch_size')} | LR: {row.get('learning_rate')}\n")

print(f"💾 Resumen guardado: {summary_path}")

print("="*70)
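
The cache check earlier in this section looks for `optuna_trials_results.csv` and `best_params.json`, so the save step and the load step need to agree on those names. One way to keep them in sync is to define the names once and route both steps through a pair of helpers. The sketch below uses only the standard library; `save_search_results` and `load_search_results` are hypothetical names, and the CSV reader returns every value as a string, unlike `pd.read_csv`.

```python
import csv
import json
from pathlib import Path

# Single source of truth for the artifact names used by the cache check
TRIALS_CSV = "optuna_trials_results.csv"
BEST_PARAMS_JSON = "best_params.json"


def save_search_results(results_dir, trial_rows, best_params):
    """Write per-trial rows (list of dicts) and best parameters to results_dir."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    with open(results_dir / TRIALS_CSV, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(trial_rows[0].keys()))
        writer.writeheader()
        writer.writerows(trial_rows)
    with open(results_dir / BEST_PARAMS_JSON, "w", encoding="utf-8") as f:
        json.dump(best_params, f, indent=2)


def load_search_results(results_dir):
    """Return (trial_rows, best_params), or None when either file is missing."""
    results_dir = Path(results_dir)
    csv_path = results_dir / TRIALS_CSV
    json_path = results_dir / BEST_PARAMS_JSON
    if not (csv_path.exists() and json_path.exists()):
        return None
    with open(csv_path, newline="", encoding="utf-8") as f:
        trial_rows = list(csv.DictReader(f))
    with open(json_path, encoding="utf-8") as f:
        best_params = json.load(f)
    return trial_rows, best_params
```

A `None` return corresponds to the "no previous results" branch of the check cell, in which case the search runs from scratch.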


In [None]:
# Agregar módulos propios al path
# El notebook está en research/, pero modules/ está en el directorio raíz
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# ============================================================
# EVALUAR RESULTADOS DE OPTUNA
# ============================================================

print("="*70)
print("EVALUACIÓN DE RESULTADOS DE OPTUNA")
print("="*70)

# Importar evaluador

# Evaluar el proceso de optimización
# evaluation = check_optuna_results(str(optuna_results_dir))

print("="*70)


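## 5. Re-training with the Best Hyperparameters

Re-trains the CNN2D model with the architecture hyperparameters and batch size found by Optuna, using early stopping to obtain the final model. The optimizer settings in this section (SGD, learning rate 0.1, weight decay 1e-4) are fixed in the code and are not the `optimizer`, `learning_rate`, and `weight_decay` values selected by Optuna.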


In [None]:
# ============================================================
# CREAR MODELO CON MEJORES HIPERPARÁMETROS
# ============================================================

print("="*70)
print("CREANDO MODELO CON MEJORES HIPERPARÁMETROS")
print("="*70)

# Crear modelo con mejores parámetros encontrados por Optuna
best_model = CNN2D(
    n_classes=2,
    p_drop_conv=best_params["p_drop_conv"],
    p_drop_fc=best_params["p_drop_fc"],
    input_shape=(65, 41),
    filters_1=best_params["filters_1"],
    filters_2=best_params["filters_2"],
    kernel_size_1=best_params["kernel_size_1"],
    kernel_size_2=best_params["kernel_size_2"],
    dense_units=best_params["dense_units"],
).to(device)

print(f"✅ Modelo creado con mejores hiperparámetros:")
print(f"   - Filters 1: {best_params['filters_1']}")
print(f"   - Filters 2: {best_params['filters_2']}")
print(f"   - Kernel 1: {best_params['kernel_size_1']}")
print(f"   - Kernel 2: {best_params['kernel_size_2']}")
print(f"   - Dense units: {best_params['dense_units']}")
print(f"   - Dropout conv: {best_params['p_drop_conv']}")
print(f"   - Dropout fc: {best_params['p_drop_fc']}")

# Mostrar arquitectura
print_model_summary(best_model)

print("="*70)


In [None]:
# ============================================================
# CONFIGURAR ENTRENAMIENTO CON MEJORES PARÁMETROS
# ============================================================

print("="*70)
print("CONFIGURANDO ENTRENAMIENTO FINAL")
print("="*70)

# Final training configuration. n_epochs and patience repeat the values in TRAINING_CONFIG;
# learning_rate is fixed at 0.1 for SGD and is not taken from best_params.
FINAL_TRAINING_CONFIG = {
    "n_epochs": 100,
    "early_stopping_patience": 10,  # Reduced from 15 to 10 (recommendation)
    "learning_rate": 0.1,
    "batch_size": best_params["batch_size"]
}

# Crear DataLoaders con el mejor batch size
train_loader_final = DataLoader(
    train_dataset,
    best_params["batch_size"],  # Usar batch_size de Optuna
    shuffle=True,
    num_workers=0
)
val_loader_final = DataLoader(
    val_dataset,
    best_params["batch_size"],  # Usar batch_size de Optuna
    shuffle=False,
    num_workers=0
)
test_loader_final = DataLoader(
    test_dataset,
    best_params["batch_size"],  # Usar batch_size de Optuna
    shuffle=False,
    num_workers=0
)

# SGD optimizer with momentum (values fixed here, not read from best_params)
# FIXED: added nesterov=True and weight_decay=1e-4
optimizer_final = optim.SGD(
    best_model.parameters(),
    lr=FINAL_TRAINING_CONFIG['learning_rate'],
    momentum=0.9,
    weight_decay=1e-4,  # Cambiado de 0.0 a 1e-4
    nesterov=True  # Agregado Nesterov momentum
)

# Calcular class weights para balancear las clases usando configuración centralizada
if CLASS_WEIGHTS_CONFIG["enabled"]:
    class_counts = torch.bincount(y_train)
    class_weights = 1.0 / class_counts.float()
    class_weights = class_weights / class_weights.sum()
    criterion_final = nn.CrossEntropyLoss(weight=class_weights.to(device))
    print(f"✅ Class weights habilitados: {class_weights.tolist()}")
else:
    criterion_final = nn.CrossEntropyLoss()
    print("⚠️  Class weights deshabilitados")

# Scheduler StepLR usando configuración centralizada
scheduler_final = torch.optim.lr_scheduler.StepLR(
    optimizer_final,
    step_size=SCHEDULER_CONFIG["step_size"],
    gamma=SCHEDULER_CONFIG["gamma"]
)

print(f"\n⚙️  Configuración final:")
print(f"   - Learning rate inicial: {FINAL_TRAINING_CONFIG['learning_rate']}")
print(f"   - Momentum: 0.9 (Nesterov: True)")
print(f"   - Weight decay: 1e-4")
print(f"   - Scheduler: StepLR (step={SCHEDULER_CONFIG['step_size']}, gamma={SCHEDULER_CONFIG['gamma']})")
print(f"   - Batch size: {FINAL_TRAINING_CONFIG['batch_size']}")
print(f"   - Épocas máximas: {FINAL_TRAINING_CONFIG['n_epochs']}")
print(f"   - Early stopping patience: {FINAL_TRAINING_CONFIG['early_stopping_patience']}")

print("="*70)
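
The class-weight branch above takes inverse class frequencies and normalizes them to sum to 1. A worked version of that arithmetic with the training counts reported earlier (HC=10916, PD=20873), written in plain Python so it can be checked without a GPU; `inverse_frequency_weights` is an illustrative helper name:

```python
def inverse_frequency_weights(class_counts):
    """Inverse-frequency class weights, normalized to sum to 1."""
    inverse = [1.0 / count for count in class_counts]
    total = sum(inverse)
    return [w / total for w in inverse]


# Train split of this notebook: index 0 = Healthy, index 1 = Parkinson
weights = inverse_frequency_weights([10916, 20873])
print([round(w, 4) for w in weights])  # -> [0.6566, 0.3434]
```

With two classes, each weight equals the other class's share of the data, so the minority Healthy class receives roughly twice the weight of the Parkinson class in the loss.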


In [None]:
# ============================================================
# ENTRENAR MODELO FINAL CON EARLY STOPPING
# ============================================================

print("="*70)
print("ENTRENANDO MODELO FINAL")
print("="*70)

# Entrenar modelo con mejores hiperparámetros
# CORREGIDO: Usar keyword arguments y monitorear val_f1 (mejor para datasets desbalanceados)
final_training_results = train_model(
    model=best_model,
    train_loader=train_loader_final,
    val_loader=val_loader_final,
    optimizer=optimizer_final,
    criterion=criterion_final,
    device=device,
    n_epochs=FINAL_TRAINING_CONFIG['n_epochs'],
    early_stopping_patience=FINAL_TRAINING_CONFIG['early_stopping_patience'],
    save_dir=optuna_results_dir,
    verbose=True,
    scheduler=scheduler_final,
    monitor_metric="f1"  # Monitorear F1 en lugar de loss (recomendado para desbalance)
)

# Extraer resultados
final_model = final_training_results["model"]
final_history = final_training_results["history"]
final_best_val_loss = final_training_results["best_val_loss"]
final_total_time = final_training_results["total_time"]

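# Best epoch: train_model monitors F1, so prefer the epoch with the highest validation F1.
# Assumes the history stores it under "val_f1"; otherwise fall back to the lowest val_loss.
if "val_f1" in final_history:
    final_best_epoch = int(np.argmax(final_history["val_f1"])) + 1
else:
    final_best_epoch = int(np.argmin(final_history["val_loss"])) + 1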

print("="*70)
print("ENTRENAMIENTO FINAL COMPLETADO")
print("="*70)
print(f"✅ Resultados:")
print(f"   - Mejor época: {final_best_epoch}")
print(f"   - Mejor val loss: {final_best_val_loss:.4f}")
print(f"   - Tiempo total: {final_total_time/60:.1f} minutos")
print(f"   - Modelo guardado en: {optuna_results_dir / 'best_model_optuna.pth'}")
print("="*70)
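
Early stopping on a metric that should be maximized (such as F1) can be sketched as follows. This is a generic illustration of the patience rule, not the logic inside `train_model`, whose internals are not shown in this notebook; the `EarlyStopping` class and the validation F1 values are made up for the example.

```python
class EarlyStopping:
    """Stop when a monitored metric has not improved for `patience` epochs."""

    def __init__(self, patience=10, mode="max", min_delta=0.0):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.patience = patience
        self.mode = mode
        self.min_delta = min_delta
        self.best_value = None
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, value, epoch):
        """Record one epoch; return True when training should stop."""
        improved = (
            self.best_value is None
            or (self.mode == "max" and value > self.best_value + self.min_delta)
            or (self.mode == "min" and value < self.best_value - self.min_delta)
        )
        if improved:
            self.best_value = value
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation F1 per epoch; ties do not count as improvement
stopper = EarlyStopping(patience=3, mode="max")
val_f1 = [0.60, 0.65, 0.70, 0.69, 0.68, 0.70, 0.66]
for epoch, f1 in enumerate(val_f1, start=1):
    if stopper.step(f1, epoch):
        break
print(f"best epoch {stopper.best_epoch}, stopped at epoch {epoch}")
```

When F1 is the monitored metric, the "best epoch" to report is the one stored by the stopper, which may differ from the epoch with the lowest validation loss.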


In [None]:
# ============================================================
# EVALUACIÓN FINAL EN TEST SET
# ============================================================

print("="*70)
print("EVALUACIÓN FINAL EN TEST SET")
print("="*70)

# Evaluar modelo final en test set
final_test_metrics = detailed_evaluation(
    model=final_model,
    loader=test_loader_final,
    device=device,
    class_names=["Healthy", "Parkinson"]
)

# Print the report
print_evaluation_report(final_test_metrics, class_names=["Healthy", "Parkinson"])

# Save the final metrics
final_metrics_path = optuna_results_dir / "test_metrics_optuna.json"

# Extract metrics from the classification report
final_report = final_test_metrics["classification_report"]
final_metrics_to_save = {
    "accuracy": float(final_test_metrics["accuracy"]),
    "f1_macro": float(final_test_metrics["f1_macro"]),
    "precision_macro": float(final_report["macro avg"]["precision"]),
    "recall_macro": float(final_report["macro avg"]["recall"]),
    "f1_weighted": float(final_report["weighted avg"]["f1-score"]),
    "confusion_matrix": final_test_metrics["confusion_matrix"].tolist(),
    "classification_report": final_report,
    "best_hyperparameters": best_params,
    "training_config": FINAL_TRAINING_CONFIG,
    "final_epoch": final_best_epoch,
    "final_val_loss": float(final_best_val_loss),  # cast so a NumPy/tensor scalar serializes to JSON
    "training_time_minutes": final_total_time / 60
}

with open(final_metrics_path, "w") as f:
    json.dump(final_metrics_to_save, f, indent=2)

print(f"\n💾 Final metrics saved to: {final_metrics_path}")
print("="*70)
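
The dictionary saved above mixes plain Python values with values that may come from NumPy or from configuration objects, and `json.dump` raises `TypeError` on anything it does not recognize. A hedged sketch of one way to guard against that is the `default=` hook of the standard `json` module. `to_jsonable` and `FakeArray` are hypothetical names introduced only for this example; `FakeArray` stands in for an array-like value so the block needs no third-party imports.

```python
import json
from pathlib import Path


def to_jsonable(obj):
    """Fallback for json.dump/json.dumps: convert objects the encoder rejects."""
    if hasattr(obj, "tolist"):
        # Array-like objects (NumPy arrays and scalars expose tolist()).
        return obj.tolist()
    if isinstance(obj, Path):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


class FakeArray:
    """Stand-in for an array-like value, used only to keep the example self-contained."""

    def __init__(self, values):
        self._values = values

    def tolist(self):
        return [list(row) for row in self._values]


# Illustrative payload (assumed values, not results from this notebook).
payload = {
    "confusion_matrix": FakeArray([[1, 2], [3, 4]]),
    "save_dir": Path("results"),
    "accuracy": 0.9,
}

encoded = json.dumps(payload, default=to_jsonable, indent=2)
print(encoded)
```

The hook is only called for objects the encoder cannot handle on its own, so plain floats, lists, and dicts are written unchanged.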


In [None]:
# ============================================================
# FINAL VISUALIZATION
# ============================================================

print("="*70)
print("GENERATING FINAL VISUALIZATIONS")
print("="*70)

# Plot the final training progress
final_progress_fig = plot_training_history(
    final_history,
    save_path=optuna_results_dir / "training_progress_optuna.png"
)

# Final confusion matrix
final_cm = final_test_metrics["confusion_matrix"]
final_cm_fig = plot_confusion_matrix(
    final_cm,
    class_names=["Healthy", "Parkinson"],
    title="Confusion Matrix - Test Set (CNN2D Optimized with Optuna)",
    save_path=optuna_results_dir / "confusion_matrix_optuna.png",
    show=True
)

print("💾 Visualizations saved:")
print(f"   - Training progress: {optuna_results_dir / 'training_progress_optuna.png'}")
print(f"   - Confusion matrix: {optuna_results_dir / 'confusion_matrix_optuna.png'}")

print("="*70)
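
To make the confusion matrix easier to read next to the reported macro metrics, the sketch below derives per-class precision, recall, and F1 from a 2×2 matrix in plain Python. It assumes the scikit-learn convention (rows are true labels, columns are predicted labels); whether `detailed_evaluation` follows that convention is an assumption. The counts are illustrative, and `per_class_metrics` and `macro_average` are hypothetical helper names.

```python
def per_class_metrics(cm):
    """Return precision, recall and F1 per class for a square confusion matrix.

    Assumes rows are true labels and columns are predicted labels.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                         # samples whose true class is k
        predicted = sum(cm[r][k] for r in range(n))  # samples predicted as class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def macro_average(metrics, key):
    """Unweighted mean over classes, so each class counts equally."""
    return sum(m[key] for m in metrics) / len(metrics)


# Illustrative counts (assumed): rows = true [Healthy, Parkinson], columns = predicted.
cm = [[40, 10],
      [5, 45]]

metrics = per_class_metrics(cm)
for name, m in zip(["Healthy", "Parkinson"], metrics):
    print(f"{name}: precision={m['precision']:.4f} recall={m['recall']:.4f} f1={m['f1']:.4f}")
print(f"Macro recall: {macro_average(metrics, 'recall'):.4f}")
print(f"Macro F1:     {macro_average(metrics, 'f1'):.4f}")
```

Because the macro average weights both classes equally, it is less flattering than accuracy when one class is much easier or much larger than the other.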


In [None]:
# ============================================================
# FINAL OPTIMIZATION SUMMARY
# ============================================================

print("="*70)
print("FINAL SUMMARY OF OPTUNA OPTIMIZATION")
print("="*70)

print("\n🔍 OPTIMIZATION PROCESS:")
print(f"   - Configurations evaluated: {len(results_df)}")
print(f"   - Best validation F1-score: {results_df['f1'].max():.4f}")
print(f"   - Mean F1-score: {results_df['f1'].mean():.4f} ± {results_df['f1'].std():.4f}")

print("\n🏆 BEST HYPERPARAMETERS FOUND:")
for param, value in best_params.items():
    if param not in ['f1', 'accuracy', 'precision', 'recall', 'val_loss', 'train_loss']:
        print(f"   - {param}: {value}")

print("\n📊 FINAL RESULTS ON TEST SET:")
final_report = final_test_metrics["classification_report"]
print(f"   - Accuracy:          {final_test_metrics['accuracy']:.4f}")
print(f"   - Precision (macro): {final_report['macro avg']['precision']:.4f}")
print(f"   - Recall (macro):    {final_report['macro avg']['recall']:.4f}")
print(f"   - F1-Score (macro):  {final_test_metrics['f1_macro']:.4f}")

print(f"\n💾 FILES SAVED IN {optuna_results_dir}:")
print("   - optuna_scan_results.csv         # All configurations tried")
print("   - best_params.json                # Best hyperparameters")
print("   - optimization_summary.txt        # Optimization summary")
print("   - best_model_optuna.pth           # Final optimized model")
print("   - test_metrics_optuna.json        # Test set metrics")
print("   - training_progress_optuna.png    # Training curves")
print("   - confusion_matrix_optuna.png     # Confusion matrix")

print("="*70)
print("OPTUNA OPTIMIZATION COMPLETED SUCCESSFULLY")
print("="*70)
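
The summary reports the trial F1-scores as mean ± standard deviation. Assuming `results_df` is a pandas DataFrame, `Series.std()` uses `ddof=1` by default, so the spread is the sample standard deviation. With only a few trials this differs noticeably from the population value, as the small sketch below shows using the standard library. The scores are illustrative, not results from this study.

```python
import statistics

# Illustrative per-trial validation F1 scores (assumed values).
trial_f1 = [0.70, 0.74, 0.78, 0.82]

mean_f1 = statistics.mean(trial_f1)
sample_std = statistics.stdev(trial_f1)       # divides by n - 1, like pandas' default
population_std = statistics.pstdev(trial_f1)  # divides by n

print(f"Mean F1-score: {mean_f1:.4f} ± {sample_std:.4f} (sample std)")
print(f"Population std for comparison: {population_std:.4f}")
```

Stating which of the two is reported keeps the ± figure comparable across runs with different numbers of trials.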
