# Hyperparameter Tuning with Optuna

## Objectives

- Understand what **hyperparameters** are and why they need to be optimized
- Learn to use **Optuna** to search automatically for the best configuration
- Define search spaces with different parameter types
- Compare model configurations
- Train the final model with the best hyperparameters

## Introduction to Hyperparameters

### Parameters vs. Hyperparameters

**Parameters:**
- The **weights** and **biases** of the neural network
- Learned automatically during training
- NOT set by hand

**Hyperparameters:**
- Settings that YOU choose before training
- NOT learned automatically
- Affect HOW the model learns

**Examples of hyperparameters:**
- Number of layers (network depth)
- Number of neurons per layer (network width)
- Learning rate
- Batch size
- Dropout rate
- L2 regularization coefficient
- Epochs, patience for early stopping

### Why optimize hyperparameters?

The choice of hyperparameters has a **LARGE IMPACT** on final performance:
- Learning rate too high → unstable training
- Learning rate too low → slow convergence
- Model too small → underfitting
- Model too large → overfitting
- Batch size affects both stability and speed
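
The learning-rate effect can be seen on a toy problem. The sketch below is not part of the notebook's pipeline: it runs plain gradient descent on the hypothetical function f(w) = w², and the three learning rates are illustrative values chosen only to show divergence, slow progress, and convergence.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final |w|."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2*lr)
    return abs(w)

too_high = gradient_descent(lr=1.1)    # factor -1.2 per step: |w| grows, training diverges
too_low = gradient_descent(lr=0.001)   # factor 0.998 per step: w barely moves in 50 steps
reasonable = gradient_descent(lr=0.1)  # factor 0.8 per step: w shrinks quickly toward 0

print(f"lr=1.1   -> |w| = {too_high:.3e}")
print(f"lr=0.001 -> |w| = {too_low:.3e}")
print(f"lr=0.1   -> |w| = {reasonable:.3e}")
```

The optimum is w = 0, so a smaller final |w| is better. Real networks are not this simple, but the same three regimes show up in their loss curves.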

**Old approach:** Manual search (slow, subjective)
**Modern approach:** Automated search with **Optuna** ✅

## What is Optuna?

**Optuna** is an open-source framework for automated hyperparameter optimization. Its default sampler (TPE) is a Bayesian optimization method.

**Features:**
- 🎯 Searches automatically for the best hyperparameters
- 📊 Uses the results of earlier trials to guide the search (Bayesian optimization)
- ⚡ Efficient: tries fewer configurations than an exhaustive search
- 🐍 Pythonic: simple, intuitive API
- 🔧 Flexible: works with any framework (PyTorch, TensorFlow, sklearn, etc.)
- 📈 Useful visualizations of the optimization process

**Installation:**
```bash
pip install optuna
```

**Documentation:** [Official Optuna docs](https://optuna.readthedocs.io/)

## Import libraries

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import optuna
from optuna.trial import TrialState
from optuna.visualization import plot_optimization_history, plot_param_importances

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

print(f"PyTorch version: {torch.__version__}")
print(f"Optuna version: {optuna.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")

# Fix seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

## Load and prepare the data

In [None]:
# Load the dataset
if 'google.colab' in str(get_ipython()):
  data_path = 'https://raw.githubusercontent.com/cursos-COnCEPT/curso-tensorflow/refs/heads/main/CCP.csv'
else:
  # os.path.join keeps the path portable (the original hard-coded a Windows backslash)
  data_path = os.path.join(os.getcwd(), 'CCP.csv')

dataset = pd.read_csv(data_path, sep=',')
print(f"Dataset shape: {dataset.shape}")

In [None]:
# Train/validation/test split (70/15/15)
train_ratio = 0.70
val_ratio = 0.15
test_ratio = 0.15

X = dataset.sample(frac=train_ratio+val_ratio, random_state=42)
X_test = dataset.drop(X.index)
X_train = X.sample(frac=train_ratio/(val_ratio+train_ratio), random_state=42)
X_val = X.drop(X_train.index)

# Separate features and target
y_train = X_train.pop('PE')
y_test = X_test.pop('PE')
y_val = X_val.pop('PE')

print(f"Train: {X_train.shape[0]}, Val: {X_val.shape[0]}, Test: {X_test.shape[0]}")
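
The two-stage split above first keeps 85% of the rows (train + validation) and then takes 0.70 / 0.85 ≈ 82.4% of that subset for training, which gives 70% of the full dataset. The sketch below checks that arithmetic with plain Python lists; the row count of 1000 is a hypothetical value, not the size of the CCP dataset.

```python
import random

train_ratio, val_ratio, test_ratio = 0.70, 0.15, 0.15
n_rows = 1000  # hypothetical dataset size, for illustration only

rows = list(range(n_rows))
random.Random(42).shuffle(rows)

# Stage 1: hold out the test set (15%), keep train + validation (85%).
n_trainval = round(n_rows * (train_ratio + val_ratio))
trainval, test = rows[:n_trainval], rows[n_trainval:]

# Stage 2: split the remainder; 0.70 / 0.85 of it goes to training.
n_train = round(len(trainval) * train_ratio / (train_ratio + val_ratio))
train, val = trainval[:n_train], trainval[n_train:]

print(f"Train: {len(train)}, Val: {len(val)}, Test: {len(test)}")
```

The three subsets are disjoint by construction, which is what keeps the test set untouched during tuning.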

In [None]:
# Standardize with StandardScaler (fitted on the training set only)
scaler = StandardScaler()
scaler.fit(X_train)

X_train_norm = scaler.transform(X_train)
X_val_norm = scaler.transform(X_val)
X_test_norm = scaler.transform(X_test)

# Convert to PyTorch tensors
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

X_train_tensor = torch.FloatTensor(X_train_norm).to(device)
y_train_tensor = torch.FloatTensor(y_train.values).reshape(-1, 1).to(device)

X_val_tensor = torch.FloatTensor(X_val_norm).to(device)
y_val_tensor = torch.FloatTensor(y_val.values).reshape(-1, 1).to(device)

X_test_tensor = torch.FloatTensor(X_test_norm).to(device)
y_test_tensor = torch.FloatTensor(y_test.values).reshape(-1, 1).to(device)

print(f"Device: {device}")
print(f"Input size: {X_train_norm.shape[1]}")

## Step 1: Define the search space

### Which hyperparameters will we optimize?

We will optimize the following hyperparameters:

1. **`hidden_size1`:** Number of neurons in the 1st layer (16-256)
2. **`hidden_size2`:** Number of neurons in the 2nd layer (8-128)
3. **`learning_rate`:** Discrete options [0.001, 0.005, 0.01]
4. **`dropout_rate`:** Dropout rate (0.0-0.5)
5. **`weight_decay`:** L2 regularization (up to 0.01, sampled on a log scale)
6. **`batch_size`:** Batch size [32, 64, 128, 256]
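
Sampling `weight_decay` on a log scale means that each order of magnitude is equally likely, and it only works for a strictly positive lower bound because log(0) is undefined. The sketch below illustrates the idea with the standard library only; it is not Optuna's implementation, `sample_log_uniform` is a hypothetical helper, and the bounds 1e-6 and 1e-2 are assumed for illustration.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    if low <= 0:
        raise ValueError("log-scale sampling needs low > 0")
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
samples = [sample_log_uniform(1e-6, 1e-2, rng) for _ in range(10_000)]

# On a log scale the midpoint of [1e-6, 1e-2] is 1e-4, so roughly half
# of the samples should fall below it.
share_below_1e_4 = sum(s < 1e-4 for s in samples) / len(samples)
print(f"Share of samples below 1e-4: {share_below_1e_4:.2f}")
```

With linear sampling on the same interval, about 99% of the draws would land above 1e-4 and small regularization strengths would almost never be tried.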

### Parameter types in Optuna

- **`trial.suggest_int()`:** Integer parameters (e.g. number of neurons)
- **`trial.suggest_float()`:** Continuous parameters (e.g. dropout rate or weight decay)
- **`trial.suggest_categorical()`:** Discrete choices (e.g. batch size)

**Reference:** [`Optuna API - Trial object`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html)

In [None]:
class TrialModel(nn.Module):
    """Parameterized model for Optuna"""
    def __init__(self, input_size, hidden_size1, hidden_size2, dropout_rate):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size1),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            
            nn.Linear(hidden_size1, hidden_size2),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            
            nn.Linear(hidden_size2, 1)
        )
    
    def forward(self, x):
        return self.net(x)

print("TrialModel class defined")

In [None]:
def objective(trial):
    """
    Objective function for Optuna.
    Optuna will try to MINIMIZE the returned value (validation loss).
    """

    # ============ DEFINE THE SEARCH SPACE ============
    # Architecture hyperparameters
    hidden_size1 = trial.suggest_int('hidden_size1', 16, 256, step=16)
    hidden_size2 = trial.suggest_int('hidden_size2', 8, 128, step=8)
    dropout_rate = trial.suggest_float('dropout_rate', 0.0, 0.5)

    # Training hyperparameters
    learning_rate = trial.suggest_categorical('learning_rate', [1e-3, 5e-3, 1e-2])
    # log=True samples on a log scale and requires low > 0,
    # so a small positive lower bound (1e-6, chosen here) replaces 0.0
    weight_decay = trial.suggest_float('weight_decay', 1e-6, 0.01, log=True)
    batch_size = trial.suggest_categorical('batch_size', [32, 64, 128, 256])

    # ============ BUILD THE MODEL ============
    input_size = X_train_norm.shape[1]
    model = TrialModel(input_size, hidden_size1, hidden_size2, dropout_rate).to(device)

    # ============ CONFIGURE TRAINING ============
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

    # DataLoader with the suggested batch_size
    train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    # ============ TRAINING ============
    epochs = 100
    best_val_loss = float('inf')
    patience = 15
    patience_counter = 0
    
    for epoch in range(epochs):
        # Training
        model.train()
        for X_batch, y_batch in train_loader:
            X_batch = X_batch.to(device)
            y_batch = y_batch.to(device)
            
            y_pred = model(X_batch)
            loss = criterion(y_pred, y_batch)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        
        # Validation
        model.eval()
        with torch.no_grad():
            y_val_pred = model(X_val_tensor)
            val_loss = criterion(y_val_pred, y_val_tensor).item()
        
        # Early stopping
        if val_loss < best_val_loss - 0.0001:
            best_val_loss = val_loss
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= patience:
                break
        
        # Report progress to Optuna (enables pruning)
        trial.report(val_loss, epoch)

        # Optionally, Optuna can stop unpromising trials early
        if trial.should_prune():
            raise optuna.TrialPruned()

    return best_val_loss

print("Objective function defined")
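
The early-stopping rule inside `objective` can be isolated and tested without a model. The sketch below replays a hypothetical list of validation losses through the same logic (improvement threshold of 0.0001, patience counter); `epochs_until_stop` and the loss values are made up for illustration.

```python
def epochs_until_stop(val_losses, patience=15, min_delta=1e-4):
    """Return (last_epoch_run, best_loss) under patience-based early stopping."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, waited = loss, 0   # real improvement: reset the counter
        else:
            waited += 1              # no improvement this epoch
            if waited >= patience:
                return epoch, best   # patience exhausted: stop here
    return len(val_losses) - 1, best

# Hypothetical validation curve: improves, plateaus, then would improve again.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70, 0.50]
stop_epoch, best_loss = epochs_until_stop(losses, patience=3)

print(f"Stopped at epoch {stop_epoch} with best loss {best_loss}")
```

Training stops at epoch 8 and never reaches the final value of 0.50, which shows the trade-off: a short patience saves time but can cut off a late improvement.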

## Step 2: Create and run the Optuna Study

### What is a Study?

A **Study** in Optuna is the object that orchestrates the whole optimization:
- Runs multiple trials
- Stores the history of every trial
- Uses Bayesian optimization to choose what to try next
- Keeps track of the best configuration found

**Main options:**
- `direction='minimize'` or `'maximize'` - which way to optimize the objective
- `sampler` - search algorithm (TPE by default, a strong general-purpose choice)
- `pruner` - stops poorly performing trials early

**Reference:** [`optuna.create_study()`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.create_study.html)
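
The idea behind a median-based pruner can be sketched in a few lines. This is a simplified illustration for a minimization study, not Optuna's code: the real `MedianPruner` has further options such as warm-up steps, and `should_prune` and the loss curves below are hypothetical.

```python
from statistics import median

def should_prune(current_curve, finished_curves, n_startup_trials=5):
    """Prune if the trial's best loss so far is worse than the median
    loss that earlier trials had reached at the same step."""
    step = len(current_curve) - 1
    values_at_step = [c[step] for c in finished_curves if len(c) > step]
    # Never prune until enough earlier trials exist to compare against.
    if len(finished_curves) < n_startup_trials or not values_at_step:
        return False
    return min(current_curve) > median(values_at_step)

# Hypothetical validation-loss curves of five finished trials.
finished = [
    [1.0, 0.80, 0.60],
    [1.2, 0.90, 0.70],
    [0.9, 0.70, 0.50],
    [1.1, 1.00, 0.90],
    [1.0, 0.85, 0.65],
]

print(should_prune([1.5, 1.30], finished))  # worse than the median at step 1
print(should_prune([0.9, 0.75], finished))  # better than the median at step 1
```

The median at step 1 is 0.85, so the first trial would be stopped and the second would continue. This is why the objective has to call `trial.report()` every epoch: without intermediate values the pruner has nothing to compare.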

In [None]:
# Create a Study
# TPE (Tree-structured Parzen Estimator) is the default sampler; it is created explicitly here to fix the seed
study = optuna.create_study(
    direction='minimize',  # We want to minimize the validation loss
    sampler=optuna.samplers.TPESampler(seed=42),
    pruner=optuna.pruners.MedianPruner(),  # Stops trials that are doing poorly
)

print("Study created")
print(f"Sampler: {study.sampler}")
print(f"Pruner: {study.pruner}")

In [None]:
# Run the optimization
# This tries different combinations of hyperparameters
print("\n" + "="*70)
print("STARTING HYPERPARAMETER OPTIMIZATION WITH OPTUNA")
print("="*70)

study.optimize(
    objective,
    n_trials=20,  # Number of configurations to try
    show_progress_bar=True,
    gc_after_trial=True,  # Free memory after each trial
)

print("\n" + "="*70)
print("OPTIMIZATION COMPLETE")
print("="*70)

## Step 3: Analyze the Optuna results

In [None]:
# Get the best trial
best_trial = study.best_trial

print("\n📊 BEST CONFIGURATION FOUND")
print("="*70)
print(f"Validation Loss: {best_trial.value:.6f}")
print(f"\nHyperparameters:")
for key, value in best_trial.params.items():
    print(f"  {key}: {value}")
print("="*70)

In [None]:
# Trial summary
trials_df = study.trials_dataframe()
print(f"\nSummary of the {len(trials_df)} trials run:")
print(trials_df[['number', 'value', 'state']].head(10))

# Statistics
print(f"\nStatistics:")
print(f"  Best value (min): {trials_df['value'].min():.6f}")
print(f"  Worst value (max): {trials_df['value'].max():.6f}")
print(f"  Mean: {trials_df['value'].mean():.6f}")
print(f"  Completed trials: {len(trials_df[trials_df['state'] == 'COMPLETE'])}")
print(f"  Pruned trials: {len(trials_df[trials_df['state'] == 'PRUNED'])}")

In [None]:
# Plot the optimization history
# plot_optimization_history returns a Plotly figure, so no Matplotlib calls are needed
fig = plot_optimization_history(study)
fig.show()

print("\n📈 The plot shows how the best value found so far improves over the trials")

In [None]:
# Hyperparameter importances (also a Plotly figure)
fig = plot_param_importances(study)
fig.show()

print("\n📊 Hyperparameter importances:")
print("Shows which hyperparameters have the largest impact on the result")

## Step 4: Train the final model with the best hyperparameters

In [None]:
# Extract the best hyperparameters
best_params = best_trial.params

print("\n" + "="*70)
print("TRAINING FINAL MODEL WITH THE BEST HYPERPARAMETERS")
print("="*70)

# Build the model
input_size = X_train_norm.shape[1]
final_model = TrialModel(
    input_size,
    hidden_size1=best_params['hidden_size1'],
    hidden_size2=best_params['hidden_size2'],
    dropout_rate=best_params['dropout_rate']
).to(device)

print(f"\nModel created with architecture:")
print(f"  Layer 1: {input_size} → {best_params['hidden_size1']} + ReLU + Dropout({best_params['dropout_rate']:.3f})")
print(f"  Layer 2: {best_params['hidden_size1']} → {best_params['hidden_size2']} + ReLU + Dropout({best_params['dropout_rate']:.3f})")
print(f"  Output: {best_params['hidden_size2']} → 1")

total_params = sum(p.numel() for p in final_model.parameters())
print(f"  Total parameters: {total_params}")

In [None]:
# Configure training
criterion = nn.MSELoss()
optimizer = optim.Adam(
    final_model.parameters(),
    lr=best_params['learning_rate'],
    weight_decay=best_params['weight_decay']
)

# DataLoader
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=best_params['batch_size'], shuffle=True)

print(f"\nTraining configuration:")
print(f"  Learning rate: {best_params['learning_rate']}")
print(f"  Weight decay (L2): {best_params['weight_decay']:.6f}")
print(f"  Batch size: {best_params['batch_size']}")
print(f"  Batches per epoch: {len(train_loader)}")

In [None]:
# Train the final model
epochs = 150
history = {'train_loss': [], 'val_loss': [], 'train_mae': [], 'val_mae': []}
best_val_loss = float('inf')
patience = 20
patience_counter = 0

print("\nTraining...")

for epoch in range(epochs):
    # Training
    final_model.train()
    train_loss = 0.0
    train_mae = 0.0
    
    for X_batch, y_batch in train_loader:
        X_batch = X_batch.to(device)
        y_batch = y_batch.to(device)
        
        y_pred = final_model(X_batch)
        loss = criterion(y_pred, y_batch)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item() * X_batch.size(0)
        train_mae += torch.abs(y_pred - y_batch).sum().item()
    
    train_loss /= len(train_loader.dataset)
    train_mae /= len(train_loader.dataset)
    
    # Validation
    final_model.eval()
    with torch.no_grad():
        y_val_pred = final_model(X_val_tensor)
        val_loss = criterion(y_val_pred, y_val_tensor).item()
        val_mae = torch.abs(y_val_pred - y_val_tensor).mean().item()
    
    history['train_loss'].append(train_loss)
    history['val_loss'].append(val_loss)
    history['train_mae'].append(train_mae)
    history['val_mae'].append(val_mae)
    
    # Early stopping
    if val_loss < best_val_loss - 0.0001:
        best_val_loss = val_loss
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch + 1}")
            break

    if (epoch + 1) % 25 == 0:
        print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.6f}, Val Loss: {val_loss:.6f}")

print(f"\nTraining finished after {len(history['train_loss'])} epochs")

In [None]:
# Plot the training curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss
axes[0].plot(history['train_loss'], label='Train Loss', linewidth=2)
axes[0].plot(history['val_loss'], label='Val Loss', linewidth=2)
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('MSE Loss', fontsize=12)
axes[0].set_title('Loss over Epochs', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# MAE
axes[1].plot(history['train_mae'], label='Train MAE', linewidth=2)
axes[1].plot(history['val_mae'], label='Val MAE', linewidth=2)
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('MAE (MW)', fontsize=12)
axes[1].set_title('Mean Absolute Error over Epochs', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Step 5: Final evaluation on the test set

In [None]:
# Evaluate on the test set
final_model.eval()
with torch.no_grad():
    y_test_pred = final_model(X_test_tensor).cpu().numpy().flatten()
    y_train_pred = final_model(X_train_tensor).cpu().numpy().flatten()

# Compute metrics
y_test_np = y_test.values
y_train_np = y_train.values

test_mae = mean_absolute_error(y_test_np, y_test_pred)
test_mse = mean_squared_error(y_test_np, y_test_pred)
test_r2 = r2_score(y_test_np, y_test_pred)

train_mae = mean_absolute_error(y_train_np, y_train_pred)
train_mse = mean_squared_error(y_train_np, y_train_pred)
train_r2 = r2_score(y_train_np, y_train_pred)

print("\n" + "="*70)
print("EVALUATION ON THE TEST SET")
print("="*70)
print(f"\nTrain:")
print(f"  MAE: {train_mae:.4f} MW")
print(f"  MSE: {train_mse:.4f}")
print(f"  R²: {train_r2:.4f}")

print(f"\nTest:")
print(f"  MAE: {test_mae:.4f} MW")
print(f"  MSE: {test_mse:.4f}")
print(f"  R²: {test_r2:.4f}")

print(f"\nGeneralization (Test - Train):")
print(f"  MAE difference: {test_mae - train_mae:.4f} MW")
print(f"  R² difference: {test_r2 - train_r2:.4f}")
print("="*70)

In [None]:
# Plot predictions against true values
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Train set
axes[0].scatter(y_train_np, y_train_pred, alpha=0.5, s=20)
axes[0].plot([y_train_np.min(), y_train_np.max()], [y_train_np.min(), y_train_np.max()], 'r--', lw=2)
axes[0].set_xlabel('True Value (MW)', fontsize=12)
axes[0].set_ylabel('Prediction (MW)', fontsize=12)
axes[0].set_title(f'Train Set (MAE: {train_mae:.2f} MW)', fontsize=13, fontweight='bold')
axes[0].grid(True, alpha=0.3)
axes[0].axis('equal')

# Test set
axes[1].scatter(y_test_np, y_test_pred, alpha=0.5, s=20, color='orange')
axes[1].plot([y_test_np.min(), y_test_np.max()], [y_test_np.min(), y_test_np.max()], 'r--', lw=2)
axes[1].set_xlabel('True Value (MW)', fontsize=12)
axes[1].set_ylabel('Prediction (MW)', fontsize=12)
axes[1].set_title(f'Test Set (MAE: {test_mae:.2f} MW)', fontsize=13, fontweight='bold')
axes[1].grid(True, alpha=0.3)
axes[1].axis('equal')

plt.tight_layout()
plt.show()

## Summary: Optuna vs. Manual Search

### Advantages of using Optuna

| Aspect | Manual Search | Optuna |
|--------|---------------|--------|
| **Time** | ⏱️ Hours/days | ⚡ Minutes |
| **Consistency** | 😕 Human bias | 🎯 Reproducible |
| **Efficiency** | 📉 Ad hoc trial and error | 📈 Learns from the trial history |
| **Documentation** | 📝 Hard to keep track of | 📊 Automatic analysis |
| **Scalability** | 🐌 Hard with many hyperparameters | 🚀 Works with 10+ hyperparameters |

### Key concepts

- **Trial:** One run with a single combination of hyperparameters
- **Study:** The complete optimization process
- **Objective:** The function that Optuna minimizes or maximizes
- **Bayesian Optimization:** A search strategy that uses past trials to decide which hyperparameters to try next
- **Pruning:** Stopping poorly performing trials early

### Next steps

1. **Expand the search space:**
   - Optimize more hyperparameters
   - Explore different architectures (3+ layers)
   - Include other regularization techniques

2. **Use callbacks and checkpoints:**
   - Save the best model found
   - Analyze the results in more detail

3. **Cross-validation:**
   - Evaluate on multiple data splits
   - More robust results

4. **Other libraries:**
   - Ray Tune (for distributed search)
   - Hyperopt
   - Grid/Random search baselines
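
The cross-validation idea from step 3 can be sketched without any ML library. In the snippet below, `k_fold_indices` and `cross_validated_score` are hypothetical helpers, and `score_fn` stands for whatever training-and-evaluation routine would return a validation loss for one split. Inside an Optuna objective, the averaged score would be the value to return.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_validated_score(score_fn, n_samples, k=5):
    """Average score_fn(train_idx, val_idx) over the k folds."""
    scores = [score_fn(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(f"train={train_idx} val={val_idx}")

# Dummy score function (fraction of samples held out), just to show the call.
mean_score = cross_validated_score(lambda tr, va: len(va) / (len(tr) + len(va)), 10, k=5)
print(f"Mean score: {mean_score:.2f}")
```

The cost is k training runs per trial, so a smaller `n_trials` or a more aggressive pruner may be needed to keep the study affordable.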

## References

- [Official Optuna Documentation](https://optuna.readthedocs.io/)
- [PyTorch Optim](https://pytorch.org/docs/stable/optim.html)
- [Hyperparameter Optimization - Andrew Ng](https://www.deeplearning.ai/)
- [Practical Hyperparameter Optimization - Sebastian Raschka](https://sebastianraschka.com/)