## Configuración del entorno de entrenamiento

### Optimizaciones CUDA y configuración del dispositivo

Se configuran las optimizaciones para CUDA:
- **cudnn.benchmark**: Optimiza automáticamente los algoritmos de convolución para el hardware específico
- **allow_tf32**: Permite usar TensorFloat-32 para acelerar operaciones en GPUs Ampere y superiores
- **DEVICE**: Detecta automáticamente si CUDA está disponible para usar GPU

In [34]:
import torch
torch.backends.cudnn.benchmark = True
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### Montaje de Google Drive

Esta celda monta Google Drive para acceder a los datos almacenados en la nube. 
**Nota**: Esta celda solo funciona en Google Colab. Para uso local, asegúrate de tener los datos en la ruta correcta.

In [35]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Carga de datos por arquitectura

### Funciones para carga y preprocesamiento de datos

Esta sección incluye:

1. **WEIGHTS_MAP**: Mapeo de arquitecturas a sus pesos preentrenados oficiales
2. **get_transforms()**: Obtiene las transformaciones oficiales para cada arquitectura
3. **load_data_by_arch()**: Carga los datasets optimizados con:
   - Transformaciones específicas por arquitectura
   - DataLoaders optimizados con `pin_memory`, `persistent_workers` y `prefetch_factor`
   - Configuración para train/validation/test splits

**Optimizaciones implementadas**:
- `num_workers=8`: Carga de datos en paralelo
- `pin_memory=True`: Transferencia más rápida CPU→GPU
- `persistent_workers=True`: Reutiliza workers entre epochs
- `prefetch_factor=4`: Prebufferiza lotes para reducir latencia

In [37]:
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

WEIGHTS_MAP = {
    "resnet50": models.ResNet50_Weights.DEFAULT,
    "efficientnet_b3": models.EfficientNet_B3_Weights.DEFAULT,
    "mobilenet_v3_large": models.MobileNet_V3_Large_Weights.DEFAULT,
}

def get_transforms(arch: str):
    """
    Devuelve los transforms oficiales según la arquitectura.
    """
    arch = arch.lower()
    weights = WEIGHTS_MAP[arch]
    tfm = weights.transforms(antialias=True)
    return tfm

# --- DataLoaders optimizados ---
def load_data_by_arch(
    arch: str,
    batch_size: int,
    num_workers: int = 8,
):
    base = f"/content/drive/MyDrive/data/versionB"

    t_train = get_transforms(arch)
    t_eval  = get_transforms(arch)

    train_ds = datasets.ImageFolder(f"{base}/train", transform=t_train)
    val_ds   = datasets.ImageFolder(f"{base}/val",   transform=t_eval)
    test_ds  = datasets.ImageFolder(f"{base}/test",  transform=t_eval)

    common = dict(num_workers=num_workers,
                  pin_memory=True,
                  persistent_workers=True)

    train_loader = DataLoader(
        train_ds, batch_size=batch_size, shuffle=True,
        prefetch_factor=4, drop_last=True, **common
    )
    val_loader = DataLoader(
        val_ds, batch_size=batch_size, shuffle=False, **common
    )
    test_loader = DataLoader(
        test_ds, batch_size=batch_size, shuffle=False, **common
    )
    return train_loader, val_loader, test_loader


## Funciones de entrenamiento y evaluación

Esta sección contiene todas las funciones principales para el entrenamiento:

### 🏗️ Construcción del modelo (`build_model`)
- Soporte para ResNet50, EfficientNet-B3 y MobileNet-V3-Large
- Utiliza pesos preentrenados de ImageNet
- Reemplaza la cabeza clasificadora con dropout personalizado
- Aplica optimizaciones de memoria (`channels_last`, `torch.compile`)

### 📊 Evaluación (`evaluate`)
Calcula métricas comprehensivas:
- **Básicas**: Accuracy, Balanced Accuracy, F1-macro, Precision, Recall
- **Avanzadas**: ROC-AUC macro, PR-AUC macro, Cohen's Kappa, Matthews Correlation
- **Por clase**: Reporte detallado con precision/recall/F1 por categoría
- **Matriz de confusión**: Para análisis detallado de errores

### 🚀 Entrenamiento (`train_once`)
Implementa técnicas avanzadas:
- **Learning rates diferenciales**: Cabeza (1e-3) vs Backbone (1e-4)
- **Mixed Precision Training**: AMP para acelerar entrenamiento
- **Label Smoothing**: Reduce overfitting (α=0.05)
- **ReduceLROnPlateau**: Reduce LR cuando F1-macro no mejora
- **Early Stopping**: Para en base a F1-macro con paciencia de 5 epochs

### 💾 Pipeline completo (`run_single_training`)
Orquesta todo el proceso y guarda automáticamente:
- Mejor checkpoint del modelo
- Historial de entrenamiento completo

In [None]:
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.amp import autocast, GradScaler
from torchvision import models
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, confusion_matrix,
    balanced_accuracy_score, roc_auc_score, average_precision_score,
    cohen_kappa_score, matthews_corrcoef, classification_report
)
import numpy as np
import copy
from sklearn.preprocessing import label_binarize

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def build_model(arch: str, num_classes: int, dropout: float = 0.2, pretrained: bool = True):
    arch = arch.lower()
    if arch == "resnet50":
        weights = models.ResNet50_Weights.DEFAULT if pretrained else None
        m = models.resnet50(weights=weights)
        in_f = m.fc.in_features
        m.fc = nn.Sequential(nn.Dropout(dropout), nn.Linear(in_f, num_classes))

    elif arch == "efficientnet_b3":
        weights = models.EfficientNet_B3_Weights.DEFAULT if pretrained else None
        m = models.efficientnet_b3(weights=weights)
        in_f = m.classifier[1].in_features
        m.classifier[1] = nn.Sequential(nn.Dropout(dropout), nn.Linear(in_f, num_classes))

    elif arch == "mobilenet_v3_large":
        weights = models.MobileNet_V3_Large_Weights.DEFAULT if pretrained else None
        m = models.mobilenet_v3_large(weights=weights)
        in_f = m.classifier[3].in_features
        m.classifier[3] = nn.Sequential(nn.Dropout(dropout), nn.Linear(in_f, num_classes))

    for p in m.parameters():
        p.requires_grad = True
    m = m.to(DEVICE).to(memory_format=torch.channels_last)
    m = torch.compile(m, mode="reduce-overhead", fullgraph=False)
    return m


def _to_device_cl(x: torch.Tensor) -> torch.Tensor:
    return x.to(DEVICE, non_blocking=True).to(memory_format=torch.channels_last)


@torch.no_grad()
def evaluate(model: nn.Module, loader, criterion: nn.Module):
    model.eval()
    losses, y_true, y_prob = [], [], []
    class_names = getattr(loader.dataset, "classes", None)

    for x, y in loader:
        x = _to_device_cl(x)
        y = y.to(DEVICE, non_blocking=True)
        logits = model(x)
        loss = criterion(logits, y)
        losses.append(loss.item())
        y_true.extend(y.detach().cpu().tolist())
        y_prob.extend(torch.softmax(logits, dim=1).detach().cpu().tolist())

    y_true = np.array(y_true)
    y_prob = np.array(y_prob)      # [N, C]
    y_pred = y_prob.argmax(axis=1)
    C = y_prob.shape[1]

    acc     = accuracy_score(y_true, y_pred)
    bal_acc = balanced_accuracy_score(y_true, y_pred)
    f1m     = f1_score(y_true, y_pred, average="macro")
    prem    = precision_score(y_true, y_pred, average="macro", zero_division=0)
    recm    = recall_score(y_true, y_pred, average="macro")

    try:
        auc_macro = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
    except ValueError:
        auc_macro = float("nan")
    try:
        y_true_bin = label_binarize(y_true, classes=np.arange(C))
        ap_macro = average_precision_score(y_true_bin, y_prob, average="macro")
    except ValueError:
        ap_macro = float("nan")

    kappa = cohen_kappa_score(y_true, y_pred)
    mcc   = matthews_corrcoef(y_true, y_pred)

    report = classification_report(
        y_true, y_pred, target_names=class_names, output_dict=True, zero_division=0
    )

    return {
        "loss": float(np.mean(losses)),
        "acc": float(acc),
        "bal_acc": float(bal_acc),
        "f1_macro": float(f1m),
        "precision_macro": float(prem),
        "recall_macro": float(recm),
        "auc_macro": float(auc_macro),
        "ap_macro": float(ap_macro),
        "kappa": float(kappa),
        "mcc": float(mcc),
        "per_class": report,
        "cm": confusion_matrix(y_true, y_pred),
    }

def train_once(model: nn.Module, train_loader, val_loader,
               epochs: int = 30, lr_head: float = 1e-3, lr_backbone: float = 1e-4,
               weight_decay: float = 1e-4, es_patience: int = 5, amp: bool = True):
    criterion = nn.CrossEntropyLoss(label_smoothing=0.05)

    head_keys = ("fc", "classifier")
    head_params, backbone_params = [], []
    for n, p in model.named_parameters():
        if not p.requires_grad:
            continue
        (head_params if any(k in n for k in head_keys) else backbone_params).append(p)

    optimizer = AdamW([
        {"params": backbone_params, "lr": lr_backbone},
        {"params": head_params,     "lr": lr_head}
    ], weight_decay=weight_decay)

    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=3
    )
    scaler = GradScaler(enabled=(amp and torch.cuda.is_available()))

    best_f1, best_state, wait = -1.0, None, 0
    history = []

    for ep in range(1, epochs + 1):
        model.train()
        running, n = 0.0, 0
        for x, y in train_loader:
            x = x.to(DEVICE, non_blocking=True).to(memory_format=torch.channels_last)
            y = y.to(DEVICE, non_blocking=True)
            optimizer.zero_grad(set_to_none=True)
            with autocast(device_type=DEVICE.type, enabled=(amp and torch.cuda.is_available())):
                logits = model(x)
                loss = criterion(logits, y)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            bs = y.size(0)
            running += loss.item() * bs
            n += bs
        train_loss = running / max(1, n)

        valm = evaluate(model, val_loader, criterion)
        scheduler.step(valm["f1_macro"])
        history.append({"epoch": ep, "train_loss": train_loss, **valm})
        print(
          f"[{ep:02d}/{epochs}] "
          f"train={train_loss:.4f} | val_loss={valm['loss']:.4f} "
          f"acc={valm['acc']:.4f} balAcc={valm['bal_acc']:.4f} "
          f"f1M={valm['f1_macro']:.4f} AUC={valm['auc_macro']:.4f} "
          f"AP={valm['ap_macro']:.4f} "
          f"kappa={valm['kappa']:.4f} mcc={valm['mcc']:.4f} "
      )

        if valm["f1_macro"] > best_f1:
            best_f1 = valm["f1_macro"]
            best_state = copy.deepcopy(model.state_dict())
            wait = 0
        else:
            wait += 1
            if wait >= es_patience:
                print(f"Early stopping en epoch {ep:02d} (mejor F1-macro={best_f1:.4f})")
                break

    if best_state is not None:
        model.load_state_dict(best_state)
    return model, history

def run_single_training(arch: str, train_loader, val_loader, out_prefix: str):
    num_classes = len(train_loader.dataset.classes)
    class_to_idx = train_loader.dataset.class_to_idx
    model = build_model(arch, num_classes=num_classes, dropout=0.2, pretrained=True)

    model, hist = train_once(
        model, train_loader, val_loader,
        epochs=30, lr_head=1e-3, lr_backbone=1e-4, weight_decay=1e-4,
        es_patience=5, amp=True
    )

    ckpt_path = f"./{out_prefix}_{arch}_best.pth"
    torch.save({
        "arch": arch,
        "num_classes": num_classes,
        "state_dict": model.state_dict(),
        "class_to_idx": class_to_idx,
    }, ckpt_path)

    hist_path = f"./{out_prefix}_{arch}_history.pt"
    torch.save({"history": hist}, hist_path)

    print(f"Guardado: {ckpt_path} | {hist_path}")
    return ckpt_path, hist_path

## 🧪 Experimentos de entrenamiento

En esta sección se ejecuta el entrenamiento para cada arquitectura de manera secuencial. Cada experimento:

1. **Carga los datos** con transformaciones específicas de la arquitectura
2. **Entrena el modelo** con las mejores prácticas implementadas
3. **Guarda automáticamente**:
   - `run_vB_{arch}_best.pth`: Mejor checkpoint del modelo
   - `run_vB_{arch}_history.pt`: Historial completo de métricas

**Configuración común**:
- Batch size: 32
- Epochs máximos: 30 (con early stopping)
- Optimizer: AdamW con weight decay
- Scheduler: ReduceLROnPlateau

### ResNet50
**ResNet50** es una arquitectura clásica con conexiones residuales que permite entrenar redes muy profundas. 
- Utiliza bloques residuales para evitar el problema del gradiente que desaparece
- Excelente baseline para tareas de clasificación de imágenes
- Balance óptimo entre precisión y eficiencia computacional

In [45]:
train_loader, val_loader, _ = load_data_by_arch(arch="resnet50", batch_size=32)
run_single_training("resnet50", train_loader, val_loader, out_prefix="run_vB")

[01/30] train=0.5700 | val_loss=0.4797 acc=0.8417 balAcc=0.8152 f1M=0.7889 AUC=0.9476 AP=0.8509 kappa=0.7422 mcc=0.7465 
[02/30] train=0.2476 | val_loss=0.4944 acc=0.8745 balAcc=0.7561 f1M=0.7748 AUC=0.9360 AP=0.8347 kappa=0.7844 mcc=0.7924 
[03/30] train=0.2121 | val_loss=0.4566 acc=0.8745 balAcc=0.7968 f1M=0.8057 AUC=0.9502 AP=0.8594 kappa=0.7885 mcc=0.7890 
[04/30] train=0.1940 | val_loss=0.4407 acc=0.8759 balAcc=0.8166 f1M=0.8205 AUC=0.9531 AP=0.8612 kappa=0.7922 mcc=0.7923 
[05/30] train=0.1862 | val_loss=0.4558 acc=0.8799 balAcc=0.8044 f1M=0.8160 AUC=0.9461 AP=0.8467 kappa=0.7975 mcc=0.7984 
[06/30] train=0.1855 | val_loss=0.4510 acc=0.8759 balAcc=0.7901 f1M=0.8065 AUC=0.9475 AP=0.8534 kappa=0.7892 mcc=0.7913 
[07/30] train=0.1822 | val_loss=0.4445 acc=0.8786 balAcc=0.8060 f1M=0.8180 AUC=0.9505 AP=0.8609 kappa=0.7951 mcc=0.7960 
[08/30] train=0.1814 | val_loss=0.4351 acc=0.8786 balAcc=0.8160 f1M=0.8212 AUC=0.9542 AP=0.8645 kappa=0.7964 mcc=0.7966 
[09/30] train=0.1800 | val_loss=

('./run_vB_resnet50_best.pth', './run_vB_resnet50_history.pt')

### EfficientNet-B3

**EfficientNet-B3** utiliza compound scaling para optimizar simultáneamente profundidad, ancho y resolución:
- Arquitectura más eficiente computacionalmente que ResNet
- Mejor precisión con menos parámetros mediante el uso de escalado compuesto
- Incorpora squeeze-and-excitation blocks para mejorar la representación de características

In [46]:
train_loader, val_loader, _ = load_data_by_arch(arch="efficientnet_b3", batch_size=32)
run_single_training("efficientnet_b3", train_loader, val_loader, out_prefix="run_vB")

Downloading: "https://download.pytorch.org/models/efficientnet_b3_rwightman-b3899882.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b3_rwightman-b3899882.pth


100%|██████████| 47.2M/47.2M [00:00<00:00, 211MB/s]


[01/30] train=0.5422 | val_loss=0.4287 acc=0.8759 balAcc=0.8415 f1M=0.8255 AUC=0.9568 AP=0.8756 kappa=0.7964 mcc=0.7976 
[02/30] train=0.3331 | val_loss=0.4726 acc=0.9018 balAcc=0.8432 f1M=0.8487 AUC=0.9405 AP=0.8489 kappa=0.8357 mcc=0.8361 
[03/30] train=0.2643 | val_loss=0.5213 acc=0.8759 balAcc=0.7646 f1M=0.7824 AUC=0.9189 AP=0.8295 kappa=0.7875 mcc=0.7924 
[04/30] train=0.2255 | val_loss=0.4322 acc=0.8936 balAcc=0.8116 f1M=0.8236 AUC=0.9616 AP=0.8810 kappa=0.8205 mcc=0.8223 
[05/30] train=0.2130 | val_loss=0.4526 acc=0.8895 balAcc=0.8383 f1M=0.8351 AUC=0.9501 AP=0.8496 kappa=0.8163 mcc=0.8163 
[06/30] train=0.2120 | val_loss=0.4795 acc=0.8827 balAcc=0.7751 f1M=0.7924 AUC=0.9371 AP=0.8484 kappa=0.7996 mcc=0.8042 
[07/30] train=0.1983 | val_loss=0.4478 acc=0.8936 balAcc=0.8110 f1M=0.8241 AUC=0.9541 AP=0.8662 kappa=0.8202 mcc=0.8218 
Early stopping en epoch 07 (mejor F1-macro=0.8487)
Guardado: ./run_vB_efficientnet_b3_best.pth | ./run_vB_efficientnet_b3_history.pt


('./run_vB_efficientnet_b3_best.pth', './run_vB_efficientnet_b3_history.pt')

### MobileNet-V3-Large

**MobileNet-V3-Large** está optimizada para dispositivos móviles y aplicaciones con recursos limitados:
- Utiliza depthwise separable convolutions para reducir el costo computacional
- Incorpora hard-swish activation y squeeze-and-excitation blocks
- Ideal para deployment en producción donde la latencia es crítica
- Mantiene buena precisión con un modelo significativamente más pequeño

In [47]:
train_loader, val_loader, _ = load_data_by_arch(arch="mobilenet_v3_large", batch_size=32)
run_single_training("mobilenet_v3_large", train_loader, val_loader, out_prefix="run_vB")

Downloading: "https://download.pytorch.org/models/mobilenet_v3_large-5c1a4163.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v3_large-5c1a4163.pth


100%|██████████| 21.1M/21.1M [00:00<00:00, 198MB/s]


[01/30] train=0.5240 | val_loss=0.5100 acc=0.8513 balAcc=0.8111 f1M=0.8051 AUC=0.9431 AP=0.8540 kappa=0.7552 mcc=0.7577 
[02/30] train=0.3412 | val_loss=0.4459 acc=0.8827 balAcc=0.8125 f1M=0.8249 AUC=0.9569 AP=0.8778 kappa=0.8029 mcc=0.8063 
[03/30] train=0.2523 | val_loss=0.4767 acc=0.8840 balAcc=0.7865 f1M=0.8025 AUC=0.9296 AP=0.8506 kappa=0.8032 mcc=0.8075 
[04/30] train=0.2171 | val_loss=0.4277 acc=0.8950 balAcc=0.8522 f1M=0.8453 AUC=0.9578 AP=0.8876 kappa=0.8260 mcc=0.8262 
[05/30] train=0.1994 | val_loss=0.4424 acc=0.9004 balAcc=0.8521 f1M=0.8509 AUC=0.9491 AP=0.8580 kappa=0.8341 mcc=0.8341 
[06/30] train=0.1921 | val_loss=0.4507 acc=0.9018 balAcc=0.8451 f1M=0.8480 AUC=0.9506 AP=0.8610 kappa=0.8360 mcc=0.8362 
[07/30] train=0.1964 | val_loss=0.4655 acc=0.8827 balAcc=0.8475 f1M=0.8319 AUC=0.9531 AP=0.8559 kappa=0.8069 mcc=0.8084 
[08/30] train=0.1906 | val_loss=0.4441 acc=0.8990 balAcc=0.8340 f1M=0.8419 AUC=0.9414 AP=0.8624 kappa=0.8306 mcc=0.8311 
[09/30] train=0.1823 | val_loss=

('./run_vB_mobilenet_v3_large_best.pth',
 './run_vB_mobilenet_v3_large_history.pt')

## 📋 Resumen y próximos pasos

### ✅ Resultados del entrenamiento

Al completar estos experimentos, tendrás:

1. **Tres modelos entrenados** con diferentes arquitecturas
2. **Checkpoints guardados** con los mejores pesos de cada modelo
3. **Historial completo** de métricas para análisis posterior


### 🚀 Próximos pasos recomendados

1. **Análisis comparativo**: Comparar métricas entre arquitecturas
2. **Evaluación en test set**: Usar el notebook `02_eval_inf.ipynb` 
4. **Análisis de errores**: Examinar matrices de confusión
5. **Inferencia**: Probar los modelos en nuevas imágenes

### 💡 Notas importantes

- Los modelos están listos para inferencia y evaluación
- Se utilizó early stopping basado en F1-macro para evitar overfitting
- Las transformaciones son específicas de cada arquitectura para máximo rendimiento