
Transformers approach:

Strategy 3: train a multilingual model on the English training set and evaluate it directly on the Spanish evaluation set.

In [24]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [26]:
import torch
from torch import nn
# transformers.AdamW is deprecated; torch.optim.AdamW takes the same arguments,
# but its default weight_decay is 0.01 rather than 0.0.
from torch.optim import AdamW
from torch.utils.data import Dataset, DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer, get_linear_schedule_with_warmup
import numpy as np
from sklearn.metrics import mean_absolute_error
from tqdm import tqdm


In [15]:
class EmotionDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]

        encoding = self.tokenizer(
            text,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.float)
        }


In [37]:
def train_model(model, train_dataloader, val_dataloader, device, epochs=3):
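    # weight_decay=0.0 is the default of transformers.AdamW; set explicitly so
    # the behavior is the same with torch.optim.AdamW.
    optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.0)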

    # Configure the scheduler
    total_steps = len(train_dataloader) * epochs
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=0,
        num_training_steps=total_steps
    )

    # MSE criterion for regression
    criterion = nn.MSELoss()

    best_val_loss = float('inf')

    for epoch in range(epochs):
        print(f'\nEpoch {epoch + 1}/{epochs}')

        # Training
        model.train()
        train_loss = 0
        for batch in tqdm(train_dataloader, desc='Training'):
            optimizer.zero_grad()

            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids, attention_mask=attention_mask)
            # squeeze(-1) keeps the batch dimension even if the last batch has one sample
            logits = outputs.logits.squeeze(-1)

            loss = criterion(logits, labels)
            loss.backward()

            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            scheduler.step()

            train_loss += loss.item()

        avg_train_loss = train_loss / len(train_dataloader)

        # Validation
        model.eval()
        val_loss = 0
        predictions = []
        true_labels = []

        with torch.no_grad():
            for batch in tqdm(val_dataloader, desc='Validation'):
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['labels'].to(device)

                outputs = model(input_ids, attention_mask=attention_mask)
                # squeeze(-1) keeps the batch dimension even if the last batch has one sample
                logits = outputs.logits.squeeze(-1)

                loss = criterion(logits, labels)
                val_loss += loss.item()

                predictions.extend(logits.cpu().numpy())
                true_labels.extend(labels.cpu().numpy())

        avg_val_loss = val_loss / len(val_dataloader)
        val_mae = mean_absolute_error(true_labels, predictions)

        print(f'Average training loss: {avg_train_loss:.4f}')
        print(f'Average validation loss: {avg_val_loss:.4f}')
        print(f'Validation MAE: {val_mae:.4f}')

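        # Track the best validation loss (the weights themselves are not checkpointed)
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss

    # Validation MAE of the final epoch, which is not necessarily the best epoch
    return val_mae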
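
With `num_warmup_steps=0`, the scheduler in `train_model` decays the learning rate linearly from 2e-5 to 0 over `total_steps`. The sketch below reimplements that multiplier in plain Python for illustration only. `linear_lr` is a hypothetical helper, not a transformers function, and the step count assumes the 52 batches per epoch and 3 epochs of the joy run.

```python
def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 towards base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly so that the rate reaches 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 52 * 3  # batches per epoch * epochs (joy run)
for step in (0, 78, 156):
    print(step, linear_lr(step, total_steps))
```

Halfway through training the rate is about half of its initial value, and the last optimizer step runs at a rate close to zero.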

In [36]:
def compare_models(train_texts, train_labels, dev_texts, dev_labels, test_texts, test_labels):
    """
    Compare several models using the train, dev, and test sets.
    """
    # Models to compare
    models = {
        'mBERT': 'bert-base-multilingual-cased',
        'XLM-RoBERTa': 'xlm-roberta-base',
        'mT5': 'google/mt5-base'
    }

    results = {}
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    for model_name, model_path in models.items():
        print(f'\nEvaluating {model_name}...')

        # Load the tokenizer and model
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForSequenceClassification.from_pretrained(
            model_path,
            num_labels=1  # Regression: a single continuous output
        ).to(device)

        # Build the datasets
        train_dataset = EmotionDataset(train_texts, train_labels, tokenizer)
        dev_dataset = EmotionDataset(dev_texts, dev_labels, tokenizer)
        test_dataset = EmotionDataset(test_texts, test_labels, tokenizer)

        # Build the dataloaders
        train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)
        dev_dataloader = DataLoader(dev_dataset, batch_size=16)
        test_dataloader = DataLoader(test_dataset, batch_size=16)

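        # Train, using dev for validation. train_model returns the final-epoch
        # MAE on the dev set, which is what the "Training MAE" label reports.
        train_mae = train_model(model, train_dataloader, dev_dataloader, device)
        print(f"Training MAE: {train_mae:.4f}")

        # Evaluate on the Spanish test set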
        test_mae = evaluate_model(model, test_dataloader, device)
        print(f"Test MAE: {test_mae:.4f}")

        results[model_name] = {
            'train_mae': train_mae,
            'test_mae': test_mae
        }

    return results
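
`train_model` records `best_val_loss` but never stores the corresponding weights, so `compare_models` evaluates whatever the last epoch produced. One possible way to keep the best epoch is sketched below. `BestStateTracker` is a hypothetical helper, a plain dict stands in for `model.state_dict()`, and the losses are the validation losses logged for mBERT on anger.

```python
import copy


class BestStateTracker:
    """Keeps a copy of the state that achieved the lowest validation loss."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, val_loss, state):
        """Store a deep copy of `state` if `val_loss` improved; return True if it did."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_state = copy.deepcopy(state)
            return True
        return False


# Validation losses logged for mBERT on anger (epochs 1 to 3)
tracker = BestStateTracker()
for epoch, val_loss in enumerate([0.0236, 0.0198, 0.0205], start=1):
    # A plain dict stands in for model.state_dict() in this sketch
    tracker.update(epoch, val_loss, {"epoch": epoch})

print(tracker.best_epoch, tracker.best_loss)
```

In the training loop this would presumably become `tracker.update(epoch, avg_val_loss, model.state_dict())`, followed by `model.load_state_dict(tracker.best_state)` before the test evaluation. For the anger run, that would select epoch 2 instead of epoch 3.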

In [35]:
def read_data(emotion):
    """
    Read the train, dev, and test data for one emotion.
    """
    # Training data (English)
    with open(f"/content/drive/MyDrive/PLN project/data/en/train/{emotion}.txt", 'r', encoding='utf-8') as f:
        train_X = [line.strip() for line in f]  # drop trailing newlines
    with open(f"/content/drive/MyDrive/PLN project/data/en/train/{emotion}_labels.txt", 'r', encoding='utf-8') as f:
        train_y = [float(line.strip()) for line in f.readlines()]

    # Validation data (dev, English)
    with open(f"/content/drive/MyDrive/PLN project/data/en/dev/{emotion}.txt", 'r', encoding='utf-8') as f:
        dev_X = [line.strip() for line in f]
    with open(f"/content/drive/MyDrive/PLN project/data/en/dev/{emotion}_labels.txt", 'r', encoding='utf-8') as f:
        dev_y = [float(line.strip()) for line in f.readlines()]

    # Test data (Spanish)
    with open(f"/content/drive/MyDrive/PLN project/data/es/test/{emotion}.txt", 'r', encoding='utf-8') as f:
        test_X = [line.strip() for line in f]
    with open(f"/content/drive/MyDrive/PLN project/data/es/test/{emotion}_labels.txt", 'r', encoding='utf-8') as f:
        test_y = [float(line.strip()) for line in f.readlines()]

    return train_X, train_y, dev_X, dev_y, test_X, test_y
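
`read_data` repeats the same two reads for each split and assumes that every text file has exactly one label per line. A compact variant with an explicit length check might look like the sketch below. `load_split` is a hypothetical helper, and the file contents in the demo are made up for illustration.

```python
import tempfile
from pathlib import Path


def load_split(data_dir, emotion):
    """Read texts and float labels for one emotion from `data_dir`."""
    data_dir = Path(data_dir)
    with open(data_dir / f"{emotion}.txt", encoding="utf-8") as f:
        texts = [line.rstrip("\n") for line in f]
    with open(data_dir / f"{emotion}_labels.txt", encoding="utf-8") as f:
        # Skip blank lines so a trailing empty line does not break float()
        labels = [float(line) for line in f if line.strip()]
    if len(texts) != len(labels):
        raise ValueError(f"{emotion}: {len(texts)} texts but {len(labels)} labels")
    return texts, labels


# Demo on a temporary directory with made-up content
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "joy.txt").write_text("so happy today\nmeh\n", encoding="utf-8")
    Path(tmp, "joy_labels.txt").write_text("0.9\n0.2\n", encoding="utf-8")
    texts, labels = load_split(tmp, "joy")

print(texts, labels)
```

The check turns a silent misalignment between texts and labels into an immediate error at load time.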

In [34]:
def evaluate_all_emotions():
    """
    Evaluate every model on every emotion.
    """
    emotions = ['joy', 'anger', 'sadness', 'fear']
    all_results = {}

    for emotion in emotions:
        print(f"\n=== Evaluating {emotion} ===")
        # Load the data, including the dev set
        train_texts, train_labels, dev_texts, dev_labels, test_texts, test_labels = read_data(emotion)

        # Compare the models
        results = compare_models(
            train_texts, train_labels,
            dev_texts, dev_labels,
            test_texts, test_labels
        )
        all_results[emotion] = results

        # Show the results for this emotion
        print(f"\nResults for {emotion}:")
        for model_name, scores in results.items():
            print(f"{model_name}:")
            print(f"  Training MAE = {scores['train_mae']:.4f}")
            print(f"  Test MAE = {scores['test_mae']:.4f}")

    return all_results

In [33]:
def evaluate_model(model, dataloader, device):
    """
    Evaluate the model on a dataset and return the MAE.
    """
    model.eval()
    predictions = []
    actual_labels = []

    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels']

            outputs = model(input_ids, attention_mask=attention_mask)
            # squeeze(-1) keeps the batch dimension even if the last batch has one sample
            predictions.extend(outputs.logits.squeeze(-1).cpu().numpy())
            actual_labels.extend(labels.numpy())

    return mean_absolute_error(actual_labels, predictions)
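
The models are trained with MSE, but the reported metric is MAE, so the two numbers in each epoch's log are not directly comparable. The pure-Python sketch below shows both on a toy example. The values are made up, and `mae` and `mse` are illustrative helpers rather than the sklearn or PyTorch implementations.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (y - y_hat)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up intensity scores in [0, 1]
y_true = [0.2, 0.5, 0.9]
y_pred = [0.3, 0.5, 0.5]

print(f"MAE  = {mae(y_true, y_pred):.4f}")
print(f"MSE  = {mse(y_true, y_pred):.4f}")
print(f"RMSE = {mse(y_true, y_pred) ** 0.5:.4f}")
```

MSE penalizes the single large error (0.4) much more heavily than MAE does, and the root of the MSE is never smaller than the MAE.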

In [38]:
all_results = evaluate_all_emotions()

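# Show the full results
print("\n=== RESULTADOS FINALES ===")
for emotion in all_results:
    print(f"\n{emotion.upper()}:")
    # Each entry is a dict of scores; formatting the dict itself with :.4f raises a TypeError
    for model_name, scores in all_results[emotion].items():
        print(f"{model_name}: dev MAE = {scores['train_mae']:.4f}, test MAE = {scores['test_mae']:.4f}")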


=== Evaluating joy ===

Evaluating mBERT...


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 52/52 [00:17<00:00,  2.96it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.22it/s]


Average training loss: 0.0549
Average validation loss: 0.0351
Validation MAE: 0.1561

Epoch 2/3


Training: 100%|██████████| 52/52 [00:17<00:00,  2.93it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00,  9.74it/s]


Average training loss: 0.0317
Average validation loss: 0.0276
Validation MAE: 0.1308

Epoch 3/3


Training: 100%|██████████| 52/52 [00:18<00:00,  2.87it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00,  9.35it/s]


Average training loss: 0.0227
Average validation loss: 0.0199
Validation MAE: 0.1131
Training MAE: 0.1131
Test MAE: 0.2335

Evaluating XLM-RoBERTa...


Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 52/52 [00:20<00:00,  2.54it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.05it/s]


Average training loss: 0.1274
Average validation loss: 0.0453
Validation MAE: 0.1763

Epoch 2/3


Training: 100%|██████████| 52/52 [00:20<00:00,  2.54it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.30it/s]


Average training loss: 0.0510
Average validation loss: 0.0271
Validation MAE: 0.1329

Epoch 3/3


Training: 100%|██████████| 52/52 [00:20<00:00,  2.56it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.56it/s]


Average training loss: 0.0328
Average validation loss: 0.0214
Validation MAE: 0.1165
Training MAE: 0.1165
Test MAE: 0.1894

Evaluating mT5...


Some weights of MT5ForSequenceClassification were not initialized from the model checkpoint at google/mt5-base and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 52/52 [00:48<00:00,  1.08it/s]
Validation: 100%|██████████| 5/5 [00:01<00:00,  3.62it/s]


Average training loss: 0.3144
Average validation loss: 0.1358
Validation MAE: 0.3045

Epoch 2/3


Training: 100%|██████████| 52/52 [00:48<00:00,  1.08it/s]
Validation: 100%|██████████| 5/5 [00:01<00:00,  3.71it/s]


Average training loss: 0.2023
Average validation loss: 0.1306
Validation MAE: 0.2799

Epoch 3/3


Training: 100%|██████████| 52/52 [00:48<00:00,  1.08it/s]
Validation: 100%|██████████| 5/5 [00:01<00:00,  3.67it/s]


Average training loss: 0.1934
Average validation loss: 0.1168
Validation MAE: 0.2745
Training MAE: 0.2745
Test MAE: 0.3042

Results for joy:
mBERT:
  Training MAE = 0.1131
  Test MAE = 0.2335
XLM-RoBERTa:
  Training MAE = 0.1165
  Test MAE = 0.1894
mT5:
  Training MAE = 0.2745
  Test MAE = 0.3042

=== Evaluating anger ===

Evaluating mBERT...


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 54/54 [00:19<00:00,  2.84it/s]
Validation: 100%|██████████| 6/6 [00:00<00:00, 10.62it/s]


Average training loss: 0.0530
Average validation loss: 0.0236
Validation MAE: 0.1175

Epoch 2/3


Training: 100%|██████████| 54/54 [00:18<00:00,  2.84it/s]
Validation: 100%|██████████| 6/6 [00:00<00:00, 10.63it/s]


Average training loss: 0.0260
Average validation loss: 0.0198
Validation MAE: 0.1071

Epoch 3/3


Training: 100%|██████████| 54/54 [00:19<00:00,  2.84it/s]
Validation: 100%|██████████| 6/6 [00:00<00:00, 10.66it/s]


Average training loss: 0.0150
Average validation loss: 0.0205
Validation MAE: 0.1103
Training MAE: 0.1103
Test MAE: 0.2393

Evaluating XLM-RoBERTa...


Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 54/54 [00:21<00:00,  2.54it/s]
Validation: 100%|██████████| 6/6 [00:00<00:00, 11.41it/s]


Average training loss: 0.0577
Average validation loss: 0.0240
Validation MAE: 0.1157

Epoch 2/3


Training: 100%|██████████| 54/54 [00:21<00:00,  2.55it/s]
Validation: 100%|██████████| 6/6 [00:00<00:00, 11.52it/s]


Average training loss: 0.0340
Average validation loss: 0.0231
Validation MAE: 0.1158

Epoch 3/3


Training: 100%|██████████| 54/54 [00:21<00:00,  2.55it/s]
Validation: 100%|██████████| 6/6 [00:00<00:00, 11.43it/s]


Average training loss: 0.0277
Average validation loss: 0.0181
Validation MAE: 0.1015
Training MAE: 0.1015
Test MAE: 0.2009

Evaluating mT5...


Some weights of MT5ForSequenceClassification were not initialized from the model checkpoint at google/mt5-base and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 54/54 [00:50<00:00,  1.07it/s]
Validation: 100%|██████████| 6/6 [00:01<00:00,  4.13it/s]


Average training loss: 0.1548
Average validation loss: 0.0501
Validation MAE: 0.1705

Epoch 2/3


Training: 100%|██████████| 54/54 [00:50<00:00,  1.08it/s]
Validation: 100%|██████████| 6/6 [00:01<00:00,  4.19it/s]


Average training loss: 0.1416
Average validation loss: 0.0622
Validation MAE: 0.1958

Epoch 3/3


Training: 100%|██████████| 54/54 [00:50<00:00,  1.08it/s]
Validation: 100%|██████████| 6/6 [00:01<00:00,  4.17it/s]


Average training loss: 0.1469
Average validation loss: 0.0527
Validation MAE: 0.1736
Training MAE: 0.1736
Test MAE: 0.2592

Results for anger:
mBERT:
  Training MAE = 0.1103
  Test MAE = 0.2393
XLM-RoBERTa:
  Training MAE = 0.1015
  Test MAE = 0.2009
mT5:
  Training MAE = 0.1736
  Test MAE = 0.2592

=== Evaluating sadness ===

Evaluating mBERT...


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 50/50 [00:17<00:00,  2.86it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.07it/s]


Average training loss: 0.0676
Average validation loss: 0.0305
Validation MAE: 0.1465

Epoch 2/3


Training: 100%|██████████| 50/50 [00:17<00:00,  2.86it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.04it/s]


Average training loss: 0.0419
Average validation loss: 0.0300
Validation MAE: 0.1485

Epoch 3/3


Training: 100%|██████████| 50/50 [00:17<00:00,  2.86it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00,  9.91it/s]


Average training loss: 0.0335
Average validation loss: 0.0269
Validation MAE: 0.1396
Training MAE: 0.1396
Test MAE: 0.2220

Evaluating XLM-RoBERTa...


Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 50/50 [00:19<00:00,  2.57it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.79it/s]


Average training loss: 0.0866
Average validation loss: 0.0390
Validation MAE: 0.1646

Epoch 2/3


Training: 100%|██████████| 50/50 [00:19<00:00,  2.56it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.93it/s]


Average training loss: 0.0472
Average validation loss: 0.0221
Validation MAE: 0.1165

Epoch 3/3


Training: 100%|██████████| 50/50 [00:19<00:00,  2.57it/s]
Validation: 100%|██████████| 5/5 [00:00<00:00, 10.91it/s]


Average training loss: 0.0400
Average validation loss: 0.0233
Validation MAE: 0.1204
Training MAE: 0.1204
Test MAE: 0.2098

Evaluating mT5...


Some weights of MT5ForSequenceClassification were not initialized from the model checkpoint at google/mt5-base and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 50/50 [00:46<00:00,  1.08it/s]
Validation: 100%|██████████| 5/5 [00:01<00:00,  3.94it/s]


Average training loss: 0.3183
Average validation loss: 0.1348
Validation MAE: 0.3102

Epoch 2/3


Training: 100%|██████████| 50/50 [00:46<00:00,  1.09it/s]
Validation: 100%|██████████| 5/5 [00:01<00:00,  3.90it/s]


Average training loss: 0.1709
Average validation loss: 0.1065
Validation MAE: 0.2636

Epoch 3/3


Training: 100%|██████████| 50/50 [00:46<00:00,  1.08it/s]
Validation: 100%|██████████| 5/5 [00:01<00:00,  3.97it/s]


Average training loss: 0.1432
Average validation loss: 0.1011
Validation MAE: 0.2461
Training MAE: 0.2461
Test MAE: 0.2881

Results for sadness:
mBERT:
  Training MAE = 0.1396
  Test MAE = 0.2220
XLM-RoBERTa:
  Training MAE = 0.1204
  Test MAE = 0.2098
mT5:
  Training MAE = 0.2461
  Test MAE = 0.2881

=== Evaluating fear ===

Evaluating mBERT...


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 72/72 [00:25<00:00,  2.83it/s]
Validation: 100%|██████████| 7/7 [00:00<00:00,  9.33it/s]


Average training loss: 0.0502
Average validation loss: 0.0240
Validation MAE: 0.1287

Epoch 2/3


Training: 100%|██████████| 72/72 [00:25<00:00,  2.82it/s]
Validation: 100%|██████████| 7/7 [00:00<00:00,  9.42it/s]


Average training loss: 0.0259
Average validation loss: 0.0178
Validation MAE: 0.1082

Epoch 3/3


Training: 100%|██████████| 72/72 [00:25<00:00,  2.84it/s]
Validation: 100%|██████████| 7/7 [00:00<00:00,  9.61it/s]


Average training loss: 0.0162
Average validation loss: 0.0165
Validation MAE: 0.1022
Training MAE: 0.1022
Test MAE: 0.2360

Evaluating XLM-RoBERTa...


Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 72/72 [00:28<00:00,  2.53it/s]
Validation: 100%|██████████| 7/7 [00:00<00:00, 10.10it/s]


Average training loss: 0.0663
Average validation loss: 0.0274
Validation MAE: 0.1377

Epoch 2/3


Training: 100%|██████████| 72/72 [00:28<00:00,  2.54it/s]
Validation: 100%|██████████| 7/7 [00:00<00:00, 10.33it/s]


Average training loss: 0.0366
Average validation loss: 0.0249
Validation MAE: 0.1271

Epoch 3/3


Training: 100%|██████████| 72/72 [00:28<00:00,  2.54it/s]
Validation: 100%|██████████| 7/7 [00:00<00:00, 10.15it/s]


Average training loss: 0.0293
Average validation loss: 0.0202
Validation MAE: 0.1153
Training MAE: 0.1153
Test MAE: 0.1986

Evaluating mT5...


Some weights of MT5ForSequenceClassification were not initialized from the model checkpoint at google/mt5-base and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/3


Training: 100%|██████████| 72/72 [01:07<00:00,  1.07it/s]
Validation: 100%|██████████| 7/7 [00:01<00:00,  3.71it/s]


Average training loss: 0.7935
Average validation loss: 0.1335
Validation MAE: 0.2983

Epoch 2/3


Training: 100%|██████████| 72/72 [01:07<00:00,  1.07it/s]
Validation: 100%|██████████| 7/7 [00:01<00:00,  3.72it/s]


Average training loss: 0.2016
Average validation loss: 0.1397
Validation MAE: 0.2978

Epoch 3/3


Training: 100%|██████████| 72/72 [01:07<00:00,  1.07it/s]
Validation: 100%|██████████| 7/7 [00:01<00:00,  3.73it/s]


Average training loss: 0.1802
Average validation loss: 0.1370
Validation MAE: 0.2913
Training MAE: 0.2913
Test MAE: 0.3291

Results for fear:
mBERT:
  Training MAE = 0.1022
  Test MAE = 0.2360
XLM-RoBERTa:
  Training MAE = 0.1153
  Test MAE = 0.1986
mT5:
  Training MAE = 0.2913
  Test MAE = 0.3291

=== RESULTADOS FINALES ===

JOY:


TypeError: unsupported format string passed to dict.__format__
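
The `TypeError` arises because each `all_results[emotion][model_name]` is a dict of scores, not a float. Since the per-emotion results were printed before the failure, the final comparison can still be recovered without rerunning. The sketch below does so with the test MAE values copied from the output above. `mean_by_model` and `best_by_emotion` are hypothetical helpers.

```python
# Test MAE per emotion and model, copied from the run output above
test_mae = {
    "joy": {"mBERT": 0.2335, "XLM-RoBERTa": 0.1894, "mT5": 0.3042},
    "anger": {"mBERT": 0.2393, "XLM-RoBERTa": 0.2009, "mT5": 0.2592},
    "sadness": {"mBERT": 0.2220, "XLM-RoBERTa": 0.2098, "mT5": 0.2881},
    "fear": {"mBERT": 0.2360, "XLM-RoBERTa": 0.1986, "mT5": 0.3291},
}


def mean_by_model(scores):
    """Average each model's MAE over all emotions."""
    models = next(iter(scores.values())).keys()
    return {
        m: sum(per_model[m] for per_model in scores.values()) / len(scores)
        for m in models
    }


def best_by_emotion(scores):
    """Model with the lowest MAE for each emotion."""
    return {emotion: min(per_model, key=per_model.get) for emotion, per_model in scores.items()}


means = mean_by_model(test_mae)
for model_name, value in sorted(means.items(), key=lambda kv: kv[1]):
    print(f"{model_name}: mean test MAE = {value:.4f}")
print(best_by_emotion(test_mae))
```

In this run XLM-RoBERTa has the lowest Spanish test MAE for all four emotions, followed by mBERT and then mT5.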