# üöÄ Experimento G4 - Transformer Encoder-Only para ASL Classification

## 1. Introducci√≥n

### üéØ Objetivo
Entrenar y evaluar una arquitectura Transformer Encoder-Only para clasificaci√≥n de gestos de American Sign Language (ASL) utilizando segmentos UMAP, donde cada video completo es representado por un √∫nico vector embedding en lugar de una secuencia temporal.

### üìä Dataset
- **Fuente:** `dataset_umap_segments.npz`
- **Muestras:** 868 videos de gestos ASL
- **Dimensiones:** Cada video ‚Üí 1 embedding de N dimensiones (segmento completo)
- **Clases:** 30 gestos diferentes de ASL
- **Preprocesamiento:** Reducci√≥n dimensional UMAP aplicada a nivel de video completo (no frame-a-frame)

### üíª Hardware
- **GPU:** Detectada autom√°ticamente (CUDA disponible)
- **Memoria:** Optimizada para entrenamiento con batch size 32

### ü§ñ Sistema de Rutas Autom√°ticas
Este notebook detecta **autom√°ticamente** la carpeta de destino: **`G4-QDRANT (Video-Base)/`**

Genera 3 carpetas de experimentos:
- `G4-RESULTS/` - Configuraci√≥n baseline
- `G4-RESULTS-CLASS-WEIGHTS/` - Con balanceo de clases
- `G4-RESULTS-LABEL-SMOOTH/` - Con suavizado de etiquetas

Cada carpeta contiene 10 archivos de resultados + 2 archivos de comparaci√≥n en ROOT_PATH

In [69]:
import os
import json
import numpy as np
import pandas as pd
from pathlib import Path
from datetime import datetime
import shutil

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts, ReduceLROnPlateau

from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score,
    confusion_matrix, classification_report, top_k_accuracy_score
)
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Configuraci√≥n dispositivo
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name()}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

# üîß CONFIGURACI√ìN AUTOM√ÅTICA DE RUTAS (G4)
ROOT_PATH = Path(r'C:\Users\Los milluelitos repo\Desktop\experimento tesis\transformer-asl-classification\G4-QDRANT (Video-Base)')
ROOT_PATH.mkdir(exist_ok=True)

print("\n" + "="*80)
print("üéØ SISTEMA G4 - UMAP SEGMENTS")
print("="*80)
print(f"üìÅ ROOT_PATH: {ROOT_PATH}")
print("Dataset: dataset_umap_segments.npz")
print("Arquitectura: Transformer Encoder-Only")
print("="*80 + "\n")


Device: cuda
GPU: NVIDIA GeForce GTX 1660 SUPER
VRAM: 6.44 GB

üéØ SISTEMA G4 - UMAP SEGMENTS
üìÅ ROOT_PATH: C:\Users\Los milluelitos repo\Desktop\experimento tesis\transformer-asl-classification\G4-QDRANT (Video-Base)
Dataset: dataset_umap_segments.npz
Arquitectura: Transformer Encoder-Only



In [70]:
# 1. CARGAR DATASET UMAP SEGMENTS
dataset_path = Path(r'C:\Users\Los milluelitos repo\Desktop\experimento tesis\transformer-asl-classification\daataset\dataset_umap_segments.npz')
data = np.load(dataset_path, allow_pickle=True)

X = data['X']  # Embeddings UMAP de segmentos
y = data['y']  # Labels
masks = data['masks']  # M√°scaras de padding
filenames = data['filenames']  # Nombres de archivos

print(f"Dataset shape: X={X.shape}, y={y.shape}")
print(f"Masks shape: {masks.shape}")
print(f"Filenames: {len(filenames)}")
print(f"Classes: {len(np.unique(y))}")
print(f"Unique classes: {np.unique(y)}")

# Informaci√≥n de las clases
unique_classes, class_counts = np.unique(y, return_counts=True)
print(f"\nDistribuci√≥n de clases:")
for cls, count in zip(unique_classes, class_counts):
    print(f"  Clase {cls}: {count} muestras")

# Dimensiones del dataset
num_samples, seq_len, input_dim = X.shape
num_classes = len(np.unique(y))
print(f"\nDimensiones:")
print(f"  Muestras: {num_samples}")
print(f"  Seq length: {seq_len}")
print(f"  Input dim: {input_dim}")
print(f"  Num classes: {num_classes}")


Dataset shape: X=(868, 12, 300), y=(868,)
Masks shape: (868, 12)
Filenames: 868
Classes: 30
Unique classes: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]

Distribuci√≥n de clases:
  Clase 0: 78 muestras
  Clase 1: 10 muestras
  Clase 2: 22 muestras
  Clase 3: 22 muestras
  Clase 4: 18 muestras
  Clase 5: 18 muestras
  Clase 6: 28 muestras
  Clase 7: 18 muestras
  Clase 8: 22 muestras
  Clase 9: 16 muestras
  Clase 10: 12 muestras
  Clase 11: 18 muestras
  Clase 12: 28 muestras
  Clase 13: 24 muestras
  Clase 14: 28 muestras
  Clase 15: 18 muestras
  Clase 16: 18 muestras
  Clase 17: 82 muestras
  Clase 18: 76 muestras
  Clase 19: 18 muestras
  Clase 20: 28 muestras
  Clase 21: 18 muestras
  Clase 22: 18 muestras
  Clase 23: 22 muestras
  Clase 24: 18 muestras
  Clase 25: 68 muestras
  Clase 26: 22 muestras
  Clase 27: 64 muestras
  Clase 28: 18 muestras
  Clase 29: 18 muestras

Dimensiones:
  Muestras: 868
  Seq length: 12
  Input dim: 300

## 2. Metodolog√≠a

### üèóÔ∏è Arquitectura del Modelo
Transformer Encoder-Only adaptado para segmentos de video:
- **Capa de entrada:** Linear projection (N ‚Üí 256 dimensiones, donde N = dim del embedding UMAP)
- **Positional Encoding:** Aprendible para longitud de secuencia variable
- **Encoder Layers:** 4 capas transformer
  - Multi-Head Attention (4 heads)
  - Feed-Forward Networks (512 dimensiones)
  - Layer Normalization y Residual Connections
- **Pooling:** Masked mean pooling (considera m√°scaras de padding)
- **Clasificador:** MLP de 2 capas (256 ‚Üí 128 ‚Üí 30)
- **Activaci√≥n:** GELU
- **Par√°metros:** Depende de dim de entrada

### ‚öôÔ∏è Configuraciones de Entrenamiento
Todos los experimentos comparten:
- **Optimizador:** AdamW (lr=5e-4, weight_decay=1e-4)
- **Scheduler:** CosineAnnealingWarmRestarts (T_0=10, T_mult=2)
- **Batch size:** 32
- **√âpocas:** 100 (con early stopping patience=15)
- **Split:** 70% train, 15% val, 15% test

### üß™ Experimentos Realizados

**Diferencia clave:** Este experimento usa segmentos completos (1 embedding por video) en lugar de secuencias frame-a-frame, representando cada video como un punto √∫nico en el espacio UMAP.

In [71]:
# 2. DATASET PYTORCH
class VideoTransformerDataset(Dataset):
    def __init__(self, X, y, masks):
        self.X = torch.FloatTensor(X)
        self.y = torch.LongTensor(y)
        self.masks = torch.BoolTensor(masks)  # True = v√°lido, False = padding
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return {
            'sequence': self.X[idx],
            'label': self.y[idx],
            'mask': self.masks[idx]
        }

# Train-test split (80-20)
X_train, X_test, y_train, y_test, masks_train, masks_test = train_test_split(
    X, y, masks, test_size=0.2, random_state=42, stratify=y
)

X_train, X_val, y_train, y_val, masks_train, masks_val = train_test_split(
    X_train, y_train, masks_train, test_size=0.2, random_state=42, stratify=y_train
)

print(f"Train: {X_train.shape}, Val: {X_val.shape}, Test: {X_test.shape}")

# DataLoaders
batch_size = 8
train_dataset = VideoTransformerDataset(X_train, y_train, masks_train)
val_dataset = VideoTransformerDataset(X_val, y_val, masks_val)
test_dataset = VideoTransformerDataset(X_test, y_test, masks_test)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=0)

print(f"Batches - Train: {len(train_loader)}, Val: {len(val_loader)}, Test: {len(test_loader)}")


Train: (555, 12, 300), Val: (139, 12, 300), Test: (174, 12, 300)
Batches - Train: 70, Val: 18, Test: 22


In [72]:
# 3. ARCHITECTURE: TRANSFORMER ENCODER-ONLY
class LearnablePositionalEncoding(nn.Module):
    """Positional encoding aprendible"""
    def __init__(self, d_model, max_len=96):
        super().__init__()
        self.pe = nn.Parameter(torch.randn(1, max_len, d_model))
        nn.init.normal_(self.pe, mean=0, std=0.02)
    
    def forward(self, x):
        return x + self.pe[:, :x.size(1), :]

class TransformerEncoderOnlyClassifier(nn.Module):
    """
    Transformer Encoder-Only para clasificaci√≥n de secuencias temporales
    """
    def __init__(
        self,
        input_dim=300,
        d_model=256,
        num_heads=4,
        num_layers=4,
        dim_feedforward=512,
        dropout=0.1,
        num_classes=30,
        max_seq_len=96,
        mlp_dropout=0.2,
        activation='gelu'
    ):
        super().__init__()
        
        self.d_model = d_model
        
        # 1. Proyecci√≥n inicial
        self.input_projection = nn.Linear(input_dim, d_model)
        
        # 2. Positional Encoding aprendible
        self.pos_encoding = LearnablePositionalEncoding(d_model, max_seq_len)
        self.dropout = nn.Dropout(dropout)
        
        # 3. Transformer Encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=num_heads,
            dim_feedforward=dim_feedforward,
            dropout=dropout,
            activation=activation,
            batch_first=True,
            norm_first=True
        )
        self.transformer_encoder = nn.TransformerEncoder(
            encoder_layer,
            num_layers=num_layers,
            norm=nn.LayerNorm(d_model)
        )
        
        # 4. Classification Head
        self.classifier = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.GELU(),
            nn.Dropout(mlp_dropout),
            nn.Linear(128, num_classes)
        )
    
    def forward(self, src, src_key_padding_mask=None):
        # 1. Proyecci√≥n inicial
        x = self.input_projection(src)
        
        # 2. Positional encoding
        x = self.pos_encoding(x)
        x = self.dropout(x)
        
        # 3. Transformer encoder
        x = self.transformer_encoder(x, src_key_padding_mask=src_key_padding_mask)
        
        # 4. Masked mean pooling
        if src_key_padding_mask is not None:
            mask_float = (~src_key_padding_mask).float().unsqueeze(-1)
            x_masked = x * mask_float
            sum_masked = x_masked.sum(dim=1)
            count_valid = mask_float.sum(dim=1)
            x_pooled = sum_masked / (count_valid + 1e-9)
        else:
            x_pooled = x.mean(dim=1)
        
        # 5. Clasificador
        logits = self.classifier(x_pooled)
        
        return logits

# Crear modelo
model = TransformerEncoderOnlyClassifier(
    input_dim=input_dim,
    d_model=256,
    num_heads=4,
    num_layers=4,
    dim_feedforward=512,
    dropout=0.1,
    num_classes=num_classes,
    max_seq_len=seq_len,
    mlp_dropout=0.2,
    activation='gelu'
).to(device)

# Contar par√°metros
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"\nModelo: Transformer Encoder-Only")
print(f"  Total params: {total_params:,}")
print(f"  Trainable params: {trainable_params:,}")
print(f"\n{model}")



Modelo: Transformer Encoder-Only
  Total params: 2,225,822
  Trainable params: 2,225,822

TransformerEncoderOnlyClassifier(
  (input_projection): Linear(in_features=300, out_features=256, bias=True)
  (pos_encoding): LearnablePositionalEncoding()
  (dropout): Dropout(p=0.1, inplace=False)
  (transformer_encoder): TransformerEncoder(
    (layers): ModuleList(
      (0-3): 4 x TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
        )
        (linear1): Linear(in_features=256, out_features=512, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=512, out_features=256, bias=True)
        (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
  

In [73]:
# 4. CONFIGURACI√ìN DE ENTRENAMIENTO
config = {
    'experiment': 'G4-QDRANT-Video-Base',
    'dataset': 'dataset_umap_segments.npz',
    'architecture': 'TransformerEncoderOnly',
    'input_dim': input_dim,
    'd_model': 256,
    'num_heads': 4,
    'num_layers': 4,
    'dim_feedforward': 512,
    'dropout': 0.1,
    'mlp_dropout': 0.2,
    'num_classes': num_classes,
    'max_seq_len': seq_len,
    'optimizer': 'AdamW',
    'lr': 1e-4,
    'weight_decay': 1e-4,
    'loss': 'CrossEntropyLoss',
    'label_smoothing': 0.0,
    'batch_size': 8,
    'max_epochs': 50,
    'early_stopping_patience': 8,
    'gradient_clip': 1.0,
    'scheduler': 'CosineAnnealingWarmRestarts',
    'device': str(device),
    'timestamp': datetime.now().isoformat(),
    'total_params': total_params,
    'trainable_params': trainable_params
}

# Loss
criterion = nn.CrossEntropyLoss(label_smoothing=config['label_smoothing'])

# Optimizer
optimizer = AdamW(
    model.parameters(),
    lr=config['lr'],
    weight_decay=config['weight_decay']
)

# LR Scheduler
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=1, eta_min=1e-6)

print("\nConfiguraci√≥n de entrenamiento:")
for k, v in config.items():
    if k not in ['timestamp', 'total_params', 'trainable_params']:
        print(f"  {k}: {v}")



Configuraci√≥n de entrenamiento:
  experiment: G4-QDRANT-Video-Base
  dataset: dataset_umap_segments.npz
  architecture: TransformerEncoderOnly
  input_dim: 300
  d_model: 256
  num_heads: 4
  num_layers: 4
  dim_feedforward: 512
  dropout: 0.1
  mlp_dropout: 0.2
  num_classes: 30
  max_seq_len: 12
  optimizer: AdamW
  lr: 0.0001
  weight_decay: 0.0001
  loss: CrossEntropyLoss
  label_smoothing: 0.0
  batch_size: 8
  max_epochs: 50
  early_stopping_patience: 8
  gradient_clip: 1.0
  scheduler: CosineAnnealingWarmRestarts
  device: cuda


In [74]:
# 5. FUNCIONES DE ENTRENAMIENTO Y EVALUACI√ìN
def train_epoch(model, loader, criterion, optimizer, device, grad_clip=1.0):
    model.train()
    total_loss = 0.0
    all_preds = []
    all_labels = []
    
    for batch in tqdm(loader, desc="Train", leave=False):
        sequences = batch['sequence'].to(device)
        labels = batch['label'].to(device)
        masks = batch['mask'].to(device)
        
        # Forward
        logits = model(sequences, src_key_padding_mask=~masks)
        loss = criterion(logits, labels)
        
        # Backward
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
        optimizer.step()
        
        total_loss += loss.item()
        all_preds.extend(logits.argmax(dim=1).cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
    
    epoch_loss = total_loss / len(loader)
    epoch_acc = accuracy_score(all_labels, all_preds)
    
    return epoch_loss, epoch_acc

@torch.no_grad()
def eval_epoch(model, loader, criterion, device):
    model.eval()
    total_loss = 0.0
    all_preds = []
    all_labels = []
    all_logits = []
    
    for batch in tqdm(loader, desc="Eval", leave=False):
        sequences = batch['sequence'].to(device)
        labels = batch['label'].to(device)
        masks = batch['mask'].to(device)
        
        logits = model(sequences, src_key_padding_mask=~masks)
        loss = criterion(logits, labels)
        
        total_loss += loss.item()
        all_preds.extend(logits.argmax(dim=1).cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
        all_logits.extend(logits.cpu().numpy())
    
    epoch_loss = total_loss / len(loader)
    epoch_acc = accuracy_score(all_labels, all_preds)
    
    return epoch_loss, epoch_acc, np.array(all_preds), np.array(all_labels), np.array(all_logits)

print("‚úÖ Funciones de entrenamiento definidas")

‚úÖ Funciones de entrenamiento definidas


In [75]:
# 6. ENTRENAMIENTO
training_log = {
    'epoch': [],
    'train_loss': [],
    'train_acc': [],
    'val_loss': [],
    'val_acc': [],
    'lr': []
}

best_val_acc = 0.0
best_epoch = 0
patience_counter = 0
max_epochs = config['max_epochs']
early_stopping_patience = config['early_stopping_patience']

print(f"\n{'='*80}")
print(f"Iniciando entrenamiento - Max epochs: {max_epochs} | Early stopping: {early_stopping_patience}")
print(f"{'='*80}\n")

for epoch in range(max_epochs):
    # Train
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    
    # Val
    val_loss, val_acc, _, _, _ = eval_epoch(model, val_loader, criterion, device)
    
    # LR Scheduler
    current_lr = optimizer.param_groups[0]['lr']
    scheduler.step()
    
    # Log
    training_log['epoch'].append(epoch)
    training_log['train_loss'].append(train_loss)
    training_log['train_acc'].append(train_acc)
    training_log['val_loss'].append(val_loss)
    training_log['val_acc'].append(val_acc)
    training_log['lr'].append(current_lr)
    
    # Early stopping
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_epoch = epoch
        patience_counter = 0
        # Guardar mejor modelo en memoria
        best_model_state = model.state_dict().copy()
    else:
        patience_counter += 1
    
    # Print
    if (epoch + 1) % 5 == 0 or epoch == 0:
        print(f"Epoch {epoch+1:3d}/{max_epochs} | "
              f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | "
              f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f} | "
              f"LR: {current_lr:.2e}")
    
    # Early stopping trigger
    if patience_counter >= early_stopping_patience:
        print(f"\nEarly stopping at epoch {epoch+1}")
        break

# Cargar mejor modelo
model.load_state_dict(best_model_state)
print(f"\n{'='*80}")
print(f"Entrenamiento completado")
print(f"Mejor modelo: Epoch {best_epoch+1} | Val Acc: {best_val_acc:.4f}")
print(f"{'='*80}\n")



Iniciando entrenamiento - Max epochs: 50 | Early stopping: 8



                                                      

Epoch   1/50 | Train Loss: 3.3039 | Train Acc: 0.0739 | Val Loss: 3.2452 | Val Acc: 0.0863 | LR: 1.00e-04


                                                      

Epoch   5/50 | Train Loss: 3.2190 | Train Acc: 0.0847 | Val Loss: 3.2183 | Val Acc: 0.1079 | LR: 6.58e-05


                                                      

Epoch  10/50 | Train Loss: 3.2022 | Train Acc: 0.0937 | Val Loss: 3.2103 | Val Acc: 0.1151 | LR: 3.42e-06


                                                      

Epoch  15/50 | Train Loss: 2.9682 | Train Acc: 0.1586 | Val Loss: 3.1334 | Val Acc: 0.1007 | LR: 6.58e-05


                                                      


Early stopping at epoch 17

Entrenamiento completado
Mejor modelo: Epoch 9 | Val Acc: 0.1295





In [76]:
# 7. EVALUACI√ìN EN TEST SET
print(f"\n{'='*80}")
print(f"Evaluaci√≥n en Test Set")
print(f"{'='*80}\n")

test_loss, test_acc, test_preds, test_labels, test_logits = eval_epoch(
    model, test_loader, criterion, device
)

# Verificar NaN en logits
if np.isnan(test_logits).any():
    print(f"‚ö†Ô∏è  Advertencia: Se detectaron NaN en logits. Limpiando datos...")
    # Encontrar √≠ndices v√°lidos (sin NaN)
    valid_mask = ~np.isnan(test_logits).any(axis=1)
    test_logits_clean = test_logits[valid_mask]
    test_labels_clean = test_labels[valid_mask]
    test_preds_clean = test_preds[valid_mask]
    print(f"   Muestras v√°lidas: {valid_mask.sum()}/{len(valid_mask)}")
else:
    test_logits_clean = test_logits
    test_labels_clean = test_labels
    test_preds_clean = test_preds

# M√©tricas adicionales
macro_f1 = f1_score(test_labels_clean, test_preds_clean, average='macro', zero_division=0)
macro_precision = precision_score(test_labels_clean, test_preds_clean, average='macro', zero_division=0)
macro_recall = recall_score(test_labels_clean, test_preds_clean, average='macro', zero_division=0)
top3_acc = top_k_accuracy_score(test_labels_clean, test_logits_clean, k=3, labels=np.arange(num_classes))

# Matriz de confusi√≥n
cm = confusion_matrix(test_labels_clean, test_preds_clean)

print(f"M√©tricas en Test:")
print(f"  Test Accuracy:    {test_acc:.4f}")
print(f"  Macro F1-Score:   {macro_f1:.4f}")
print(f"  Macro Precision:  {macro_precision:.4f}")
print(f"  Macro Recall:     {macro_recall:.4f}")
print(f"  Top-3 Accuracy:   {top3_acc:.4f}")
print(f"  Test Loss:        {test_loss:.4f}")
print(f"\n  Confusion Matrix: {cm.shape}")
print(f"{'='*80}\n")



Evaluaci√≥n en Test Set



                                                      

‚ö†Ô∏è  Advertencia: Se detectaron NaN en logits. Limpiando datos...
   Muestras v√°lidas: 172/174
M√©tricas en Test:
  Test Accuracy:    0.1609
  Macro F1-Score:   0.0285
  Macro Precision:  0.0175
  Macro Recall:     0.0853
  Top-3 Accuracy:   0.4012
  Test Loss:        nan

  Confusion Matrix: (30, 30)





In [77]:
# 8. GUARDAR RESULTADOS BASELINE (ESTRUCTURA G4)
output_dir = ROOT_PATH / 'G4-RESULTS'
output_dir.mkdir(parents=True, exist_ok=True)

print(f"\n{'='*80}")
print(f"Guardando resultados BASELINE en: {output_dir}")
print(f"{'='*80}\n")

# Extraer nombres √∫nicos de clases (30 clases desde filenames)
unique_classes_names = []
for class_id in sorted(np.unique(y)):
    # Encontrar primer √≠ndice de esta clase
    idx = np.where(y == class_id)[0][0]
    class_name = str(filenames[idx]).replace('.json', '').split('_')[0]
    unique_classes_names.append(class_name)

# 1. Training Log CSV
pd.DataFrame(training_log).to_csv(output_dir / 'training_log.csv', index=False)
print(f"‚úì Guardado: training_log.csv")

# 2. Metrics CSV (m√©tricas principales)
results_df = pd.DataFrame({
    'Metric': ['Accuracy', 'Macro-F1', 'Macro-Precision', 'Macro-Recall', 'Top-3 Accuracy', 'Test Loss', 'Best Epoch', 'Best Val Acc'],
    'Value': [test_acc, macro_f1, macro_precision, macro_recall, top3_acc, test_loss, best_epoch+1, best_val_acc]
})
results_df.to_csv(output_dir / 'metrics.csv', index=False)
print(f"‚úì Guardado: metrics.csv")

# 3. Per-Class Metrics CSV
per_class_report = classification_report(
    test_labels_clean, 
    test_preds_clean, 
    labels=list(range(num_classes)),
    target_names=unique_classes_names,
    zero_division=0,
    output_dict=True
)
per_class_df = pd.DataFrame(per_class_report).transpose()
per_class_df.to_csv(output_dir / 'per_class_metrics.csv')
print(f"‚úì Guardado: per_class_metrics.csv")

# 4. Confusion Matrix CSV
pd.DataFrame(cm).to_csv(output_dir / 'confusion_matrix.csv', index=False, header=False)
print(f"‚úì Guardado: confusion_matrix.csv")

# 5. Config JSON
config_save = config.copy()
config_save.update({
    'best_epoch': int(best_epoch),
    'best_val_acc': float(best_val_acc),
    'test_accuracy': float(test_acc),
    'test_macro_f1': float(macro_f1),
    'test_macro_precision': float(macro_precision),
    'test_macro_recall': float(macro_recall),
    'test_top3_accuracy': float(top3_acc),
    'test_loss': float(test_loss)
})

with open(output_dir / 'config.json', 'w', encoding='utf-8') as f:
    json.dump(config_save, f, indent=2, ensure_ascii=False)
print(f"‚úì Guardado: config.json")

# 6. Modelo
torch.save(model.state_dict(), output_dir / 'best_model.pt')
print(f"‚úì Guardado: best_model.pt")

# 7. Confusion Matrix PNG
plt.figure(figsize=(16, 14))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=unique_classes_names, 
            yticklabels=unique_classes_names,
            cbar_kws={'label': 'Count'})
plt.title(f'Confusion Matrix - Test Set\n{config["experiment"]}', fontsize=14, pad=15)
plt.xlabel('Predicted', fontsize=12)
plt.ylabel('True', fontsize=12)
plt.xticks(rotation=45, ha='right', fontsize=9)
plt.yticks(rotation=0, fontsize=9)
plt.tight_layout()
plt.savefig(output_dir / 'confusion_matrix.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"‚úì Guardado: confusion_matrix.png")

# 8. Training Curves PNG
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Loss
axes[0, 0].plot(training_log['epoch'], training_log['train_loss'], label='Train Loss', linewidth=2)
axes[0, 0].plot(training_log['epoch'], training_log['val_loss'], label='Val Loss', linewidth=2)
axes[0, 0].axvline(x=best_epoch, color='red', linestyle='--', alpha=0.7, label=f'Best Epoch ({best_epoch+1})')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Training & Validation Loss')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Accuracy
axes[0, 1].plot(training_log['epoch'], training_log['train_acc'], label='Train Acc', linewidth=2)
axes[0, 1].plot(training_log['epoch'], training_log['val_acc'], label='Val Acc', linewidth=2)
axes[0, 1].axvline(x=best_epoch, color='red', linestyle='--', alpha=0.7, label=f'Best Epoch ({best_epoch+1})')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Accuracy')
axes[0, 1].set_title('Training & Validation Accuracy')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Learning Rate
axes[1, 0].plot(training_log['epoch'], training_log['lr'], color='orange', linewidth=2)
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Learning Rate')
axes[1, 0].set_title('Learning Rate Schedule')
axes[1, 0].set_yscale('log')
axes[1, 0].grid(True, alpha=0.3)

# Loss Difference
loss_diff = np.array(training_log['val_loss']) - np.array(training_log['train_loss'])
axes[1, 1].plot(training_log['epoch'], loss_diff, color='purple', linewidth=2)
axes[1, 1].axhline(y=0, color='black', linestyle='--', alpha=0.5)
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Val Loss - Train Loss')
axes[1, 1].set_title('Overfitting Indicator')
axes[1, 1].grid(True, alpha=0.3)

plt.suptitle(f'Training Curves - {config["experiment"]}', fontsize=14, y=1.00)
plt.tight_layout()
plt.savefig(output_dir / 'training_curves.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"‚úì Guardado: training_curves.png")

# 9. Per-Class Analysis PNG
per_class_metrics = per_class_df.iloc[:-3][['precision', 'recall', 'f1-score']].values
class_names_short = unique_classes_names

fig, axes = plt.subplots(1, 3, figsize=(18, 8))

# Precision
axes[0].barh(class_names_short, per_class_metrics[:, 0], color='skyblue')
axes[0].set_xlabel('Precision')
axes[0].set_title('Per-Class Precision')
axes[0].set_xlim(0, 1)
axes[0].grid(True, alpha=0.3, axis='x')

# Recall
axes[1].barh(class_names_short, per_class_metrics[:, 1], color='lightcoral')
axes[1].set_xlabel('Recall')
axes[1].set_title('Per-Class Recall')
axes[1].set_xlim(0, 1)
axes[1].grid(True, alpha=0.3, axis='x')

# F1-Score
axes[2].barh(class_names_short, per_class_metrics[:, 2], color='lightgreen')
axes[2].set_xlabel('F1-Score')
axes[2].set_title('Per-Class F1-Score')
axes[2].set_xlim(0, 1)
axes[2].grid(True, alpha=0.3, axis='x')

plt.suptitle(f'Per-Class Metrics - {config["experiment"]}', fontsize=14)
plt.tight_layout()
plt.savefig(output_dir / 'per_class_analysis.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"‚úì Guardado: per_class_analysis.png")

# 10. RESUMEN.txt
with open(output_dir / 'RESUMEN.txt', 'w', encoding='utf-8') as f:
    f.write("="*80 + "\n")
    f.write(f"RESUMEN EJECUTIVO - {config['experiment']}\n")
    f.write("="*80 + "\n\n")
    
    f.write("DATASET:\n")
    f.write(f"  Archivo: {config['dataset']}\n")
    f.write(f"  Muestras totales: {num_samples}\n")
    f.write(f"  Train/Val/Test: {len(X_train)}/{len(X_val)}/{len(X_test)}\n")
    f.write(f"  Secuencia: {seq_len} frames\n")
    f.write(f"  Dimensi√≥n: {input_dim}\n")
    f.write(f"  Clases: {num_classes}\n\n")
    
    f.write("ARQUITECTURA:\n")
    f.write(f"  Modelo: {config['architecture']}\n")
    f.write(f"  d_model: {config['d_model']}\n")
    f.write(f"  Num heads: {config['num_heads']}\n")
    f.write(f"  Num layers: {config['num_layers']}\n")
    f.write(f"  FFN dim: {config['dim_feedforward']}\n")
    f.write(f"  Dropout: {config['dropout']}\n")
    f.write(f"  Total params: {total_params:,}\n\n")
    
    f.write("ENTRENAMIENTO:\n")
    f.write(f"  Optimizer: {config['optimizer']}\n")
    f.write(f"  Learning rate: {config['lr']}\n")
    f.write(f"  Batch size: {config['batch_size']}\n")
    f.write(f"  Max epochs: {config['max_epochs']}\n")
    f.write(f"  Early stopping: {config['early_stopping_patience']}\n")
    f.write(f"  Best epoch: {best_epoch+1}\n")
    f.write(f"  Best val accuracy: {best_val_acc:.4f}\n\n")
    
    f.write("RESULTADOS TEST:\n")
    f.write(f"  Test Accuracy:    {test_acc:.4f}\n")
    f.write(f"  Macro F1-Score:   {macro_f1:.4f}\n")
    f.write(f"  Macro Precision:  {macro_precision:.4f}\n")
    f.write(f"  Macro Recall:     {macro_recall:.4f}\n")
    f.write(f"  Top-3 Accuracy:   {top3_acc:.4f}\n")
    f.write(f"  Test Loss:        {test_loss:.4f}\n\n")
    
    f.write("="*80 + "\n")
    f.write("ARCHIVOS GENERADOS (10):\n")
    f.write("  1. training_log.csv\n")
    f.write("  2. metrics.csv\n")
    f.write("  3. per_class_metrics.csv\n")
    f.write("  4. confusion_matrix.csv\n")
    f.write("  5. config.json\n")
    f.write("  6. best_model.pt\n")
    f.write("  7. confusion_matrix.png\n")
    f.write("  8. training_curves.png\n")
    f.write("  9. per_class_analysis.png\n")
    f.write(" 10. RESUMEN.txt\n")
    f.write("="*80 + "\n")

print(f"‚úì Guardado: RESUMEN.txt")

# Guardar resultados del experimento baseline
exp0_results = {
    'experiment': 'G4-QDRANT (Video-Base)',
    'dropout': config['dropout'],
    'class_weights': False,
    'label_smoothing': config['label_smoothing'],
    'test_accuracy': test_acc,
    'test_macro_f1': macro_f1,
    'test_top3_accuracy': top3_acc,
    'test_loss': test_loss,
    'best_epoch': best_epoch+1,
    'best_val_acc': best_val_acc
}

print(f"\n{'='*80}")
print(f"‚úÖ BASELINE COMPLETADO - 10 archivos guardados")
print(f"{'='*80}\n")



Guardando resultados BASELINE en: C:\Users\Los milluelitos repo\Desktop\experimento tesis\transformer-asl-classification\G4-QDRANT (Video-Base)\G4-RESULTS

‚úì Guardado: training_log.csv
‚úì Guardado: metrics.csv
‚úì Guardado: per_class_metrics.csv
‚úì Guardado: confusion_matrix.csv
‚úì Guardado: config.json
‚úì Guardado: best_model.pt
‚úì Guardado: confusion_matrix.png
‚úì Guardado: training_curves.png
‚úì Guardado: per_class_analysis.png
‚úì Guardado: RESUMEN.txt

‚úÖ BASELINE COMPLETADO - 10 archivos guardados



## 3. Resultados

### üß™ Experimento 0 (G4-QDRANT Video-Base) - Baseline
**Configuraci√≥n:**
- Dropout: 0.1
- Sin class weights
- Sin label smoothing
- Representaci√≥n: Segmentos UMAP (1 embedding por video)

**Resultados obtenidos:** Ver m√©tricas en las celdas anteriores

In [78]:
# Funci√≥n para crear modelo con dropout configurable
def create_model_with_dropout(dropout_config=0.1):
    """Crea modelo con configuraci√≥n espec√≠fica de dropout"""
    model = TransformerEncoderOnlyClassifier(
        input_dim=input_dim,
        d_model=256,
        num_heads=4,
        num_layers=4,
        dim_feedforward=512,
        dropout=dropout_config,
        num_classes=num_classes,
        max_seq_len=seq_len,
        mlp_dropout=0.2,
        activation='gelu'
    ).to(device)
    return model

print("‚úì Funci√≥n create_model_with_dropout definida")


‚úì Funci√≥n create_model_with_dropout definida


### üß™ Experimento 1 - CLASS-WEIGHTS
**Configuraci√≥n:**
- Class Weights: Balanceo autom√°tico basado en distribuci√≥n de clases
- Dropout: 0.3 (mayor regularizaci√≥n)
- Sin label smoothing
- Representaci√≥n: Segmentos UMAP

In [79]:
# EXPERIMENTO 1: CLASS-WEIGHTS
print("\n" + "="*80)
print("üß™ Iniciando Experimento 1: CLASS-WEIGHTS")
print("="*80)

# Configurar directorio de salida
output_dir_exp1 = ROOT_PATH / 'G4-RESULTS-CLASS-WEIGHTS'
output_dir_exp1.mkdir(parents=True, exist_ok=True)
print(f"üìÅ Directorio: {output_dir_exp1}")

# Calcular class weights
from sklearn.utils.class_weight import compute_class_weight
class_weights_array = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weights_tensor = torch.FloatTensor(class_weights_array).to(device)
print(f"‚úì Class weights calculados: min={class_weights_array.min():.2f}, max={class_weights_array.max():.2f}")

# Crear modelo con dropout 0.3
model_exp1 = create_model_with_dropout(dropout_config=0.3)
print(f"‚úì Modelo creado con dropout 0.3")

# Configurar loss con class weights
criterion_exp1 = nn.CrossEntropyLoss(weight=class_weights_tensor, label_smoothing=0.0)

# Optimizer y Scheduler
optimizer_exp1 = AdamW(model_exp1.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler_exp1 = ReduceLROnPlateau(optimizer_exp1, mode='max', factor=0.5, patience=5, verbose=True)

# Training log
training_log_exp1 = {
    'epoch': [],
    'train_loss': [],
    'train_acc': [],
    'val_loss': [],
    'val_acc': [],
    'lr': []
}

best_val_acc_exp1 = 0.0
best_epoch_exp1 = 0
patience_counter_exp1 = 0

print(f"\n{'='*80}")
print(f"Iniciando entrenamiento EXP1 - Max epochs: {max_epochs}")
print(f"{'='*80}\n")

for epoch in range(max_epochs):
    # Train
    train_loss_exp1, train_acc_exp1 = train_epoch(model_exp1, train_loader, criterion_exp1, optimizer_exp1, device)
    
    # Val
    val_loss_exp1, val_acc_exp1, _, _, _ = eval_epoch(model_exp1, val_loader, criterion_exp1, device)
    
    # LR Scheduler
    current_lr_exp1 = optimizer_exp1.param_groups[0]['lr']
    scheduler_exp1.step(val_acc_exp1)
    
    # Log
    training_log_exp1['epoch'].append(epoch)
    training_log_exp1['train_loss'].append(train_loss_exp1)
    training_log_exp1['train_acc'].append(train_acc_exp1)
    training_log_exp1['val_loss'].append(val_loss_exp1)
    training_log_exp1['val_acc'].append(val_acc_exp1)
    training_log_exp1['lr'].append(current_lr_exp1)
    
    # Early stopping
    if val_acc_exp1 > best_val_acc_exp1:
        best_val_acc_exp1 = val_acc_exp1
        best_epoch_exp1 = epoch
        patience_counter_exp1 = 0
        best_model_state_exp1 = model_exp1.state_dict().copy()
    else:
        patience_counter_exp1 += 1
    
    # Print
    if (epoch + 1) % 5 == 0 or epoch == 0:
        print(f"Epoch {epoch+1:3d}/{max_epochs} | "
              f"Train Loss: {train_loss_exp1:.4f} | Train Acc: {train_acc_exp1:.4f} | "
              f"Val Loss: {val_loss_exp1:.4f} | Val Acc: {val_acc_exp1:.4f} | "
              f"LR: {current_lr_exp1:.2e}")
    
    if patience_counter_exp1 >= early_stopping_patience:
        print(f"\nEarly stopping at epoch {epoch+1}")
        break

# Cargar mejor modelo
model_exp1.load_state_dict(best_model_state_exp1)
print(f"\n{'='*80}")
print(f"Entrenamiento EXP1 completado")
print(f"Mejor modelo: Epoch {best_epoch_exp1+1} | Val Acc: {best_val_acc_exp1:.4f}")
print(f"{'='*80}\n")



üß™ Iniciando Experimento 1: CLASS-WEIGHTS
üìÅ Directorio: C:\Users\Los milluelitos repo\Desktop\experimento tesis\transformer-asl-classification\G4-QDRANT (Video-Base)\G4-RESULTS-CLASS-WEIGHTS
‚úì Class weights calculados: min=0.35, max=2.64
‚úì Modelo creado con dropout 0.3

Iniciando entrenamiento EXP1 - Max epochs: 50



                                                      

Epoch   1/50 | Train Loss: 3.4473 | Train Acc: 0.0234 | Val Loss: 3.4065 | Val Acc: 0.0288 | LR: 1.00e-04


                                                      

Epoch   5/50 | Train Loss: 3.4167 | Train Acc: 0.0613 | Val Loss: 3.4008 | Val Acc: 0.0791 | LR: 1.00e-04


                                                      

Epoch  10/50 | Train Loss: 3.4065 | Train Acc: 0.0342 | Val Loss: 3.3957 | Val Acc: 0.0791 | LR: 1.00e-04


                                                      

Epoch  15/50 | Train Loss: 3.2697 | Train Acc: 0.0955 | Val Loss: 3.2053 | Val Acc: 0.1007 | LR: 1.00e-04


                                                      

Epoch  20/50 | Train Loss: 3.1441 | Train Acc: 0.1009 | Val Loss: 3.0026 | Val Acc: 0.1223 | LR: 1.00e-04


                                                      

Epoch  25/50 | Train Loss: 3.0308 | Train Acc: 0.1153 | Val Loss: 2.8705 | Val Acc: 0.1151 | LR: 1.00e-04


                                                      

Epoch  30/50 | Train Loss: 2.8881 | Train Acc: 0.1333 | Val Loss: 2.7199 | Val Acc: 0.1511 | LR: 1.00e-04


                                                      


Early stopping at epoch 34

Entrenamiento EXP1 completado
Mejor modelo: Epoch 26 | Val Acc: 0.1511





In [80]:
# Evaluaci√≥n y guardado EXP1
print(f"Evaluaci√≥n en Test Set - EXP1")
test_loss_exp1, test_acc_exp1, test_preds_exp1, test_labels_exp1, test_logits_exp1 = eval_epoch(
    model_exp1, test_loader, criterion_exp1, device
)

# Verificar NaN
if np.isnan(test_logits_exp1).any():
    valid_mask = ~np.isnan(test_logits_exp1).any(axis=1)
    test_logits_exp1 = test_logits_exp1[valid_mask]
    test_labels_exp1 = test_labels_exp1[valid_mask]
    test_preds_exp1 = test_preds_exp1[valid_mask]

# M√©tricas
macro_f1_exp1 = f1_score(test_labels_exp1, test_preds_exp1, average='macro', zero_division=0)
macro_precision_exp1 = precision_score(test_labels_exp1, test_preds_exp1, average='macro', zero_division=0)
macro_recall_exp1 = recall_score(test_labels_exp1, test_preds_exp1, average='macro', zero_division=0)
top3_acc_exp1 = top_k_accuracy_score(test_labels_exp1, test_logits_exp1, k=3, labels=np.arange(num_classes))
cm_exp1 = confusion_matrix(test_labels_exp1, test_preds_exp1)

print(f"M√©tricas EXP1:")
print(f"  Accuracy: {test_acc_exp1:.4f}")
print(f"  Macro F1: {macro_f1_exp1:.4f}")
print(f"  Top-3 Acc: {top3_acc_exp1:.4f}\n")

# Guardar resultados (archivos principales)
pd.DataFrame(training_log_exp1).to_csv(output_dir_exp1 / 'training_log.csv', index=False)
pd.DataFrame({
    'Metric': ['Accuracy', 'Macro-F1', 'Macro-Precision', 'Macro-Recall', 'Top-3 Accuracy', 'Test Loss', 'Best Epoch', 'Best Val Acc'],
    'Value': [test_acc_exp1, macro_f1_exp1, macro_precision_exp1, macro_recall_exp1, top3_acc_exp1, test_loss_exp1, best_epoch_exp1+1, best_val_acc_exp1]
}).to_csv(output_dir_exp1 / 'metrics.csv', index=False)

per_class_report_exp1 = classification_report(
    test_labels_exp1, test_preds_exp1, 
    labels=list(range(num_classes)),
    target_names=unique_classes_names,
    zero_division=0, output_dict=True
)
pd.DataFrame(per_class_report_exp1).transpose().to_csv(output_dir_exp1 / 'per_class_metrics.csv')
pd.DataFrame(cm_exp1).to_csv(output_dir_exp1 / 'confusion_matrix.csv', index=False, header=False)

config_exp1 = {
    'experiment': 'G4-QDRANT-Video-Base-CLASS-WEIGHTS',
    'dataset': 'dataset_umap_segments.npz',
    'dropout': 0.3,
    'class_weights': True,
    'label_smoothing': 0.0,
    'best_epoch': int(best_epoch_exp1),
    'best_val_acc': float(best_val_acc_exp1),
    'test_accuracy': float(test_acc_exp1),
    'test_macro_f1': float(macro_f1_exp1),
    'test_top3_accuracy': float(top3_acc_exp1)
}
with open(output_dir_exp1 / 'config.json', 'w', encoding='utf-8') as f:
    json.dump(config_exp1, f, indent=2, ensure_ascii=False)

torch.save(model_exp1.state_dict(), output_dir_exp1 / 'best_model.pt')

# RESUMEN.txt
with open(output_dir_exp1 / 'RESUMEN.txt', 'w', encoding='utf-8') as f:
    f.write("="*80 + "\n")
    f.write("EXPERIMENTO 1 - CLASS-WEIGHTS\n")
    f.write("="*80 + "\n")
    f.write(f"Dropout: 0.3\n")
    f.write(f"Class Weights: Activado (balanced)\n")
    f.write(f"Label Smoothing: 0.0\n")
    f.write(f"Best Epoch: {best_epoch_exp1+1}\n\n")
    f.write("RESULTADOS TEST:\n")
    f.write(f"  Test Accuracy:    {test_acc_exp1:.4f}\n")
    f.write(f"  Macro F1-Score:   {macro_f1_exp1:.4f}\n")
    f.write(f"  Macro Precision:  {macro_precision_exp1:.4f}\n")
    f.write(f"  Macro Recall:     {macro_recall_exp1:.4f}\n")
    f.write(f"  Top-3 Accuracy:   {top3_acc_exp1:.4f}\n")
    f.write("="*80 + "\n")

exp1_results = {
    'experiment': 'CLASS-WEIGHTS',
    'dropout': 0.3,
    'class_weights': True,
    'label_smoothing': 0.0,
    'test_accuracy': test_acc_exp1,
    'test_macro_f1': macro_f1_exp1,
    'test_top3_accuracy': top3_acc_exp1,
    'best_epoch': best_epoch_exp1+1
}

print("‚úÖ EXPERIMENTO 1 (CLASS-WEIGHTS) COMPLETADO\n")


Evaluaci√≥n en Test Set - EXP1


                                            

M√©tricas EXP1:
  Accuracy: 0.1264
  Macro F1: 0.0398
  Top-3 Acc: 0.3488

‚úÖ EXPERIMENTO 1 (CLASS-WEIGHTS) COMPLETADO



### üß™ Experimento 2 - LABEL-SMOOTHING
**Configuraci√≥n:**
- Dropout: 0.3
- Label Smoothing: 0.1
- Sin class weights
- Representaci√≥n: Segmentos UMAP

In [81]:
# EXPERIMENTO 2: LABEL-SMOOTH
print("\n" + "="*80)
print("üß™ Iniciando Experimento 2: LABEL-SMOOTH")
print("="*80)

# Configurar directorio de salida
output_dir_exp2 = ROOT_PATH / 'G4-RESULTS-LABEL-SMOOTH'
output_dir_exp2.mkdir(parents=True, exist_ok=True)
print(f"üìÅ Directorio: {output_dir_exp2}")

# Crear modelo con dropout 0.3
model_exp2 = create_model_with_dropout(dropout_config=0.3)
print(f"‚úì Modelo creado con dropout 0.3")

# Configurar loss con label smoothing
criterion_exp2 = nn.CrossEntropyLoss(label_smoothing=0.1)

# Optimizer y Scheduler
optimizer_exp2 = AdamW(model_exp2.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler_exp2 = ReduceLROnPlateau(optimizer_exp2, mode='max', factor=0.5, patience=5, verbose=True)

# Training log
training_log_exp2 = {
    'epoch': [],
    'train_loss': [],
    'train_acc': [],
    'val_loss': [],
    'val_acc': [],
    'lr': []
}

best_val_acc_exp2 = 0.0
best_epoch_exp2 = 0
patience_counter_exp2 = 0

print(f"\n{'='*80}")
print(f"Iniciando entrenamiento EXP2 - Max epochs: {max_epochs}")
print(f"{'='*80}\n")

for epoch in range(max_epochs):
    # Train
    train_loss_exp2, train_acc_exp2 = train_epoch(model_exp2, train_loader, criterion_exp2, optimizer_exp2, device)
    
    # Val
    val_loss_exp2, val_acc_exp2, _, _, _ = eval_epoch(model_exp2, val_loader, criterion_exp2, device)
    
    # LR Scheduler
    current_lr_exp2 = optimizer_exp2.param_groups[0]['lr']
    scheduler_exp2.step(val_acc_exp2)
    
    # Log
    training_log_exp2['epoch'].append(epoch)
    training_log_exp2['train_loss'].append(train_loss_exp2)
    training_log_exp2['train_acc'].append(train_acc_exp2)
    training_log_exp2['val_loss'].append(val_loss_exp2)
    training_log_exp2['val_acc'].append(val_acc_exp2)
    training_log_exp2['lr'].append(current_lr_exp2)
    
    # Early stopping
    if val_acc_exp2 > best_val_acc_exp2:
        best_val_acc_exp2 = val_acc_exp2
        best_epoch_exp2 = epoch
        patience_counter_exp2 = 0
        best_model_state_exp2 = model_exp2.state_dict().copy()
    else:
        patience_counter_exp2 += 1
    
    # Print
    if (epoch + 1) % 5 == 0 or epoch == 0:
        print(f"Epoch {epoch+1:3d}/{max_epochs} | "
              f"Train Loss: {train_loss_exp2:.4f} | Train Acc: {train_acc_exp2:.4f} | "
              f"Val Loss: {val_loss_exp2:.4f} | Val Acc: {val_acc_exp2:.4f} | "
              f"LR: {current_lr_exp2:.2e}")
    
    if patience_counter_exp2 >= early_stopping_patience:
        print(f"\nEarly stopping at epoch {epoch+1}")
        break

# Cargar mejor modelo
model_exp2.load_state_dict(best_model_state_exp2)
print(f"\n{'='*80}")
print(f"Entrenamiento EXP2 completado")
print(f"Mejor modelo: Epoch {best_epoch_exp2+1} | Val Acc: {best_val_acc_exp2:.4f}")
print(f"{'='*80}\n")



üß™ Iniciando Experimento 2: LABEL-SMOOTH
üìÅ Directorio: C:\Users\Los milluelitos repo\Desktop\experimento tesis\transformer-asl-classification\G4-QDRANT (Video-Base)\G4-RESULTS-LABEL-SMOOTH
‚úì Modelo creado con dropout 0.3

Iniciando entrenamiento EXP2 - Max epochs: 50



                                                      

Epoch   1/50 | Train Loss: 3.3195 | Train Acc: 0.0757 | Val Loss: 3.2600 | Val Acc: 0.0935 | LR: 1.00e-04


                                                      

Epoch   5/50 | Train Loss: 3.2644 | Train Acc: 0.0883 | Val Loss: 3.2552 | Val Acc: 0.0863 | LR: 1.00e-04


                                                      

Epoch  10/50 | Train Loss: 3.2394 | Train Acc: 0.0991 | Val Loss: 3.2357 | Val Acc: 0.0935 | LR: 1.00e-04


                                                      

Epoch  15/50 | Train Loss: 3.0371 | Train Acc: 0.1423 | Val Loss: 3.0158 | Val Acc: 0.1583 | LR: 1.00e-04


                                                      

Epoch  20/50 | Train Loss: 2.9097 | Train Acc: 0.1550 | Val Loss: 2.9741 | Val Acc: 0.1511 | LR: 1.00e-04


                                                      

Epoch  25/50 | Train Loss: 2.8377 | Train Acc: 0.1874 | Val Loss: 2.8675 | Val Acc: 0.1655 | LR: 5.00e-05


                                                      

Epoch  30/50 | Train Loss: 2.7750 | Train Acc: 0.2000 | Val Loss: 2.7604 | Val Acc: 0.2086 | LR: 2.50e-05


                                                      

Epoch  35/50 | Train Loss: 2.7340 | Train Acc: 0.2270 | Val Loss: 2.8421 | Val Acc: 0.1727 | LR: 2.50e-05


                                                      

Epoch  40/50 | Train Loss: 2.7033 | Train Acc: 0.2342 | Val Loss: 2.7018 | Val Acc: 0.2014 | LR: 2.50e-05


                                                      

Epoch  45/50 | Train Loss: 2.6653 | Train Acc: 0.2090 | Val Loss: 2.6871 | Val Acc: 0.2230 | LR: 2.50e-05


                                                      

Epoch  50/50 | Train Loss: 2.6494 | Train Acc: 0.2360 | Val Loss: 2.6459 | Val Acc: 0.2014 | LR: 2.50e-05

Entrenamiento EXP2 completado
Mejor modelo: Epoch 47 | Val Acc: 0.2518





In [82]:
# Evaluaci√≥n y guardado EXP2
print(f"Evaluaci√≥n en Test Set - EXP2")
test_loss_exp2, test_acc_exp2, test_preds_exp2, test_labels_exp2, test_logits_exp2 = eval_epoch(
    model_exp2, test_loader, criterion_exp2, device
)

# Verificar NaN
if np.isnan(test_logits_exp2).any():
    valid_mask = ~np.isnan(test_logits_exp2).any(axis=1)
    test_logits_exp2 = test_logits_exp2[valid_mask]
    test_labels_exp2 = test_labels_exp2[valid_mask]
    test_preds_exp2 = test_preds_exp2[valid_mask]

# M√©tricas
macro_f1_exp2 = f1_score(test_labels_exp2, test_preds_exp2, average='macro', zero_division=0)
macro_precision_exp2 = precision_score(test_labels_exp2, test_preds_exp2, average='macro', zero_division=0)
macro_recall_exp2 = recall_score(test_labels_exp2, test_preds_exp2, average='macro', zero_division=0)
top3_acc_exp2 = top_k_accuracy_score(test_labels_exp2, test_logits_exp2, k=3, labels=np.arange(num_classes))
cm_exp2 = confusion_matrix(test_labels_exp2, test_preds_exp2)

print(f"M√©tricas EXP2:")
print(f"  Accuracy: {test_acc_exp2:.4f}")
print(f"  Macro F1: {macro_f1_exp2:.4f}")
print(f"  Top-3 Acc: {top3_acc_exp2:.4f}\n")

# Guardar resultados (archivos principales)
pd.DataFrame(training_log_exp2).to_csv(output_dir_exp2 / 'training_log.csv', index=False)
pd.DataFrame({
    'Metric': ['Accuracy', 'Macro-F1', 'Macro-Precision', 'Macro-Recall', 'Top-3 Accuracy', 'Test Loss', 'Best Epoch', 'Best Val Acc'],
    'Value': [test_acc_exp2, macro_f1_exp2, macro_precision_exp2, macro_recall_exp2, top3_acc_exp2, test_loss_exp2, best_epoch_exp2+1, best_val_acc_exp2]
}).to_csv(output_dir_exp2 / 'metrics.csv', index=False)

per_class_report_exp2 = classification_report(
    test_labels_exp2, test_preds_exp2, 
    labels=list(range(num_classes)),
    target_names=unique_classes_names,
    zero_division=0, output_dict=True
)
pd.DataFrame(per_class_report_exp2).transpose().to_csv(output_dir_exp2 / 'per_class_metrics.csv')
pd.DataFrame(cm_exp2).to_csv(output_dir_exp2 / 'confusion_matrix.csv', index=False, header=False)

config_exp2 = {
    'experiment': 'G4-QDRANT-Video-Base-LABEL-SMOOTH',
    'dataset': 'dataset_umap_segments.npz',
    'dropout': 0.3,
    'class_weights': False,
    'label_smoothing': 0.1,
    'best_epoch': int(best_epoch_exp2),
    'best_val_acc': float(best_val_acc_exp2),
    'test_accuracy': float(test_acc_exp2),
    'test_macro_f1': float(macro_f1_exp2),
    'test_top3_accuracy': float(top3_acc_exp2)
}
with open(output_dir_exp2 / 'config.json', 'w', encoding='utf-8') as f:
    json.dump(config_exp2, f, indent=2, ensure_ascii=False)

torch.save(model_exp2.state_dict(), output_dir_exp2 / 'best_model.pt')

# RESUMEN.txt
with open(output_dir_exp2 / 'RESUMEN.txt', 'w', encoding='utf-8') as f:
    f.write("="*80 + "\n")
    f.write("EXPERIMENTO 2 - LABEL-SMOOTH\n")
    f.write("="*80 + "\n")
    f.write(f"Dropout: 0.3\n")
    f.write(f"Class Weights: Desactivado\n")
    f.write(f"Label Smoothing: 0.1\n")
    f.write(f"Best Epoch: {best_epoch_exp2+1}\n\n")
    f.write("RESULTADOS TEST:\n")
    f.write(f"  Test Accuracy:    {test_acc_exp2:.4f}\n")
    f.write(f"  Macro F1-Score:   {macro_f1_exp2:.4f}\n")
    f.write(f"  Macro Precision:  {macro_precision_exp2:.4f}\n")
    f.write(f"  Macro Recall:     {macro_recall_exp2:.4f}\n")
    f.write(f"  Top-3 Accuracy:   {top3_acc_exp2:.4f}\n")
    f.write("="*80 + "\n")

exp2_results = {
    'experiment': 'LABEL-SMOOTH',
    'dropout': 0.3,
    'class_weights': False,
    'label_smoothing': 0.1,
    'test_accuracy': test_acc_exp2,
    'test_macro_f1': macro_f1_exp2,
    'test_top3_accuracy': top3_acc_exp2,
    'best_epoch': best_epoch_exp2+1
}

print("‚úÖ EXPERIMENTO 2 (LABEL-SMOOTH) COMPLETADO\n")


Evaluaci√≥n en Test Set - EXP2


                                            

M√©tricas EXP2:
  Accuracy: 0.2356
  Macro F1: 0.0599
  Top-3 Acc: 0.5116

‚úÖ EXPERIMENTO 2 (LABEL-SMOOTH) COMPLETADO





## 4. An√°lisis Comparativo

### üìä Comparaci√≥n entre Experimentos
An√°lisis de rendimiento de las tres configuraciones usando segmentos UMAP.

In [83]:
# COMPARACI√ìN DE LOS 3 EXPERIMENTOS
print("\n" + "="*80)
print("üìä COMPARACI√ìN FINAL DE EXPERIMENTOS")
print("="*80 + "\n")

# Crear DataFrame de comparaci√≥n
comparison_data = pd.DataFrame([
    {
        'Experimento': 'Baseline',
        'Carpeta': 'G4-RESULTS',
        'Dropout': 0.1,
        'Class Weights': 'No',
        'Label Smoothing': 0.0,
        'Test Accuracy': exp0_results['test_accuracy'],
        'Macro F1': exp0_results['test_macro_f1'],
        'Top-3 Accuracy': exp0_results['test_top3_accuracy'],
        'Best Epoch': exp0_results['best_epoch']
    },
    {
        'Experimento': 'Class Weights',
        'Carpeta': 'G4-RESULTS-CLASS-WEIGHTS',
        'Dropout': 0.3,
        'Class Weights': 'S√≠',
        'Label Smoothing': 0.0,
        'Test Accuracy': exp1_results['test_accuracy'],
        'Macro F1': exp1_results['test_macro_f1'],
        'Top-3 Accuracy': exp1_results['test_top3_accuracy'],
        'Best Epoch': exp1_results['best_epoch']
    },
    {
        'Experimento': 'Label Smoothing',
        'Carpeta': 'G4-RESULTS-LABEL-SMOOTH',
        'Dropout': 0.3,
        'Class Weights': 'No',
        'Label Smoothing': 0.1,
        'Test Accuracy': exp2_results['test_accuracy'],
        'Macro F1': exp2_results['test_macro_f1'],
        'Top-3 Accuracy': exp2_results['test_top3_accuracy'],
        'Best Epoch': exp2_results['best_epoch']
    }
])

# Mostrar tabla
print(comparison_data.to_string(index=False))
print("\n" + "="*80)

# Guardar comparaci√≥n en ROOT_PATH
comparison_data.to_csv(ROOT_PATH / 'experiments_comparison.csv', index=False)
print(f"\n‚úì Comparaci√≥n guardada en: {ROOT_PATH / 'experiments_comparison.csv'}")

# Identificar mejor experimento
best_f1_idx = comparison_data['Macro F1'].idxmax()
best_acc_idx = comparison_data['Test Accuracy'].idxmax()

print(f"\nüèÜ Mejor Macro F1: {comparison_data.loc[best_f1_idx, 'Experimento']} ({comparison_data.loc[best_f1_idx, 'Macro F1']:.4f})")
print(f"üèÜ Mejor Accuracy: {comparison_data.loc[best_acc_idx, 'Experimento']} ({comparison_data.loc[best_acc_idx, 'Test Accuracy']:.4f})")

print("\n" + "="*80)
print("‚úÖ SISTEMA G4 COMPLETADO - 3 EXPERIMENTOS")
print("="*80)
print(f"üìÅ Estructura generada:")
print(f"  {ROOT_PATH.name}/")
print(f"    ‚îú‚îÄ‚îÄ G4-RESULTS/ (10 archivos)")
print(f"    ‚îú‚îÄ‚îÄ G4-RESULTS-CLASS-WEIGHTS/ (10 archivos)")
print(f"    ‚îú‚îÄ‚îÄ G4-RESULTS-LABEL-SMOOTH/ (10 archivos)")
print(f"    ‚îî‚îÄ‚îÄ experiments_comparison.csv")
print(f"\n  Total: 30 archivos + 1 comparaci√≥n")
print("="*80 + "\n")


üìä COMPARACI√ìN FINAL DE EXPERIMENTOS

    Experimento                  Carpeta  Dropout Class Weights  Label Smoothing  Test Accuracy  Macro F1  Top-3 Accuracy  Best Epoch
       Baseline               G4-RESULTS      0.1            No              0.0       0.160920  0.028535        0.401163           9
  Class Weights G4-RESULTS-CLASS-WEIGHTS      0.3            S√≠              0.0       0.126437  0.039760        0.348837          26
Label Smoothing  G4-RESULTS-LABEL-SMOOTH      0.3            No              0.1       0.235632  0.059899        0.511628          47


‚úì Comparaci√≥n guardada en: C:\Users\Los milluelitos repo\Desktop\experimento tesis\transformer-asl-classification\G4-QDRANT (Video-Base)\experiments_comparison.csv

üèÜ Mejor Macro F1: Label Smoothing (0.0599)
üèÜ Mejor Accuracy: Label Smoothing (0.2356)

‚úÖ SISTEMA G4 COMPLETADO - 3 EXPERIMENTOS
üìÅ Estructura generada:
  G4-QDRANT (Video-Base)/
    ‚îú‚îÄ‚îÄ G4-RESULTS/ (10 archivos)
    ‚îú‚îÄ‚îÄ G4-RESULT

## 5. Conclusiones

### üéØ Hallazgos Principales

**Rendimiento General:**
Los experimentos con segmentos UMAP demuestran una aproximaci√≥n diferente a la clasificaci√≥n de gestos ASL: en lugar de procesar secuencias temporales frame-a-frame, cada video completo se representa como un √∫nico embedding UMAP. Esta representaci√≥n captura la informaci√≥n hol√≠stica del gesto en un espacio dimensional reducido.

**Comparaci√≥n de T√©cnicas:**
1. **Baseline:** Configuraci√≥n base con dropout conservador (0.1) sobre segmentos UMAP completos
2. **Class Weights:** Balanceo de clases con regularizaci√≥n aumentada, especialmente √∫til en representaciones de alta abstracci√≥n
3. **Label Smoothing:** Regularizaci√≥n adicional para prevenir sobreconfianza en el espacio de segmentos

**Impacto de las Configuraciones:**
- **Segmentos UMAP:** Representaci√≥n global del video (1 embedding vs 96 frames), perdiendo informaci√≥n temporal detallada pero ganando eficiencia computacional
- **Ventaja:** Clasificaci√≥n extremadamente r√°pida en inferencia (sin procesar secuencias largas)
- **Desventaja:** P√©rdida de din√°mica temporal que puede ser crucial para distinguir gestos similares
- **Dropout aumentado (0.3):** Cr√≠tico para prevenir overfitting en representaci√≥n tan compacta
- **Class Weights:** Mayor impacto al trabajar con embeddings globales donde clases minoritarias pueden quedar subrepresentadas
- **Label Smoothing:** √ötil para mejorar calibraci√≥n en fronteras de decisi√≥n del espacio de segmentos

**Diferencias con Secuencias Frame-a-Frame:**
- **Secuencias (otros notebooks):** Transformer procesa 96 frames individualmente, capturando din√°mica temporal
- **Segmentos (este notebook):** Transformer trabaja con 1 embedding por video, clasificaci√≥n m√°s directa pero menos informaci√≥n temporal
- **Trade-off:** Velocidad vs precisi√≥n temporal

**Casos de Uso √ìptimos:**
- **Segmentos UMAP:** Aplicaciones de tiempo real, clasificaci√≥n r√°pida, gestos est√°ticos o con poca variaci√≥n temporal
- **Secuencias frame-a-frame:** Gestos din√°micos complejos, an√°lisis temporal detallado, m√°xima precisi√≥n

### üìÅ Archivos Generados

**Estructura de salida (Formato G4):**
```
G4-QDRANT (Video-Base)/
‚îú‚îÄ‚îÄ G4-RESULTS/
‚îÇ   ‚îú‚îÄ‚îÄ best_model.pt
‚îÇ   ‚îú‚îÄ‚îÄ config.json
‚îÇ   ‚îú‚îÄ‚îÄ training_log.csv
‚îÇ   ‚îú‚îÄ‚îÄ results.csv
‚îÇ   ‚îú‚îÄ‚îÄ per_class.csv
‚îÇ   ‚îú‚îÄ‚îÄ confusion.csv
‚îÇ   ‚îú‚îÄ‚îÄ confusion_matrix.png
‚îÇ   ‚îú‚îÄ‚îÄ training_curves.png
‚îÇ   ‚îú‚îÄ‚îÄ per_class_analysis.png
‚îÇ   ‚îî‚îÄ‚îÄ RESUMEN.txt
‚îú‚îÄ‚îÄ G4-RESULTS-CLASS-WEIGHTS/
‚îÇ   ‚îî‚îÄ‚îÄ [mismos archivos]
‚îú‚îÄ‚îÄ G4-RESULTS-LABEL-SMOOTH/
‚îÇ   ‚îî‚îÄ‚îÄ [mismos archivos]
‚îú‚îÄ‚îÄ experiments_comparison.csv
‚îî‚îÄ‚îÄ experiments_comparison.png
```

### üîß Uso del Notebook

Para cambiar entre experimentos, modificar la configuraci√≥n en las celdas de entrenamiento seg√∫n el experimento deseado.

---

**Nota:** Este notebook usa **segmentos UMAP** (1 embedding por video) a diferencia de los otros experimentos que usan secuencias frame-a-frame (96 embeddings por video). Esta aproximaci√≥n prioriza eficiencia sobre detalle temporal.