<a href="https://colab.research.google.com/github/mxag11z/EMO/blob/main/ModeloRegresion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Regression for four emotions: joy, sadness, anger, fear

Strategy 1: Using machine translation, translate the Spanish evaluation set into English and evaluate a model trained on the English training set.


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import nltk
import numpy as np
from sklearn import linear_model
import sklearn.metrics
from sklearn.feature_extraction.text import CountVectorizer

nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [4]:
def read_data(emotion):
    """
    Read the train and test data for one emotion (Strategy 1:
    English training set, Spanish test set machine-translated to English).
    """
    # Training data
    with open(f"/content/drive/MyDrive/PLN project/data/en/train/{emotion}.txt", 'r', encoding='utf-8') as f:
        train_X = f.readlines()
    with open(f"/content/drive/MyDrive/PLN project/data/en/train/{emotion}_labels.txt", 'r', encoding='utf-8') as f:
        train_y = [float(line.strip()) for line in f.readlines()]

    # Test data (texts translated to English; labels from the original Spanish test set)
    with open(f"/content/drive/MyDrive/PLN project/data/es_test_translatedEn/translated_testToEn_{emotion}.txt", 'r', encoding='utf-8') as f:
        test_X = f.readlines()
    with open(f"/content/drive/MyDrive/PLN project/data/es/test/{emotion}_labels.txt", 'r', encoding='utf-8') as f:
        test_y = [float(line.strip()) for line in f.readlines()]

    return (train_X, train_y), (test_X, test_y)
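
Because the test texts come from the translated file while the test labels come from the original Spanish directory, the two lists only pair up correctly if the translation step kept one line per example. A minimal sketch of a sanity check follows; `check_alignment` is a hypothetical helper, not part of the original notebook.

```python
def check_alignment(texts, labels):
    """Hypothetical helper: verify that texts and labels pair up one-to-one."""
    if len(texts) != len(labels):
        raise ValueError(
            f"Mismatch: {len(texts)} texts vs {len(labels)} labels"
        )
    # Return the number of aligned examples for quick inspection.
    return len(texts)
```

It could be called on each pair returned by read_data, for example `check_alignment(test_X, test_y)`, before vectorizing.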

Words that contribute to predicting the presence of the emotion (positive features) or its absence (negative features).

In [5]:
def analyze_weights(model, vectorizer, emotion, num_features=5):
    """
    Print the features with the largest positive and negative weights in the model.
    """
    reverse_vocab = {v: k for k, v in vectorizer.vocabulary_.items()}
    sort_index = np.argsort(model.coef_)

    print(f"\nTop positive features for {emotion}:")
    for k in reversed(sort_index[-num_features:]):
        print(f"{model.coef_[k]:.5f}\t{reverse_vocab[k]}")

    print(f"\nTop negative features for {emotion}:")
    for k in sort_index[:num_features]:
        print(f"{model.coef_[k]:.5f}\t{reverse_vocab[k]}")
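
To make the idea of positive and negative features concrete, here is a small sketch on toy data. The four sentences and their scores are invented for illustration and are not taken from the project dataset; `alpha=1.0` is also chosen only for this toy case.

```python
import numpy as np
from sklearn import linear_model
from sklearn.feature_extraction.text import CountVectorizer

# Toy data invented for illustration only (not from the project dataset).
texts = ["happy day", "happy love", "sad day", "sad rain"]
scores = [0.9, 0.8, 0.2, 0.1]

vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)

model = linear_model.Ridge(alpha=1.0, fit_intercept=True)
model.fit(X, scores)

# Map each column index back to its word, as analyze_weights does.
reverse_vocab = {idx: word for word, idx in vectorizer.vocabulary_.items()}
order = np.argsort(model.coef_)  # ascending: most negative first

most_negative = reverse_vocab[order[0]]
most_positive = reverse_vocab[order[-1]]
print(most_positive, most_negative)
```

In this toy setup the word that only appears in high-scoring sentences receives the largest weight, and the word that only appears in low-scoring sentences receives the smallest.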

# Function to train and evaluate the model for one emotion

In [6]:
def train_emotion_regressor(emotion, alpha=100):
    """
    Train and evaluate a regressor for one emotion.
    """

    vectorizer = CountVectorizer(
        max_features=10000,
        ngram_range=(1,2),
        lowercase=True,
        strip_accents=None,
        binary=True
    )

    # Load data
    (train_X, train_y), (test_X, test_y) = read_data(emotion)

    X_train = vectorizer.fit_transform(train_X)
    X_test = vectorizer.transform(test_X)

    # Train model
    model = linear_model.Ridge(alpha=alpha, fit_intercept=True)
    model.fit(X_train, train_y)

    # Evaluate
    preds = model.predict(X_test)
    mae = sklearn.metrics.mean_absolute_error(test_y, preds)

    # Analyze weights
    analyze_weights(model, vectorizer, emotion)

    return model, vectorizer, mae

# Function to train and evaluate all emotions

In [7]:
def train_all_emotions():
    """
    Train and evaluate regressors for all emotions.
    """
    emotions = ['joy', 'anger', 'sadness', 'fear']
    results = {}  # dict keyed by emotion name

    for emotion in emotions:
        print(f"\n-----Processing {emotion}")
        model, vectorizer, mae = train_emotion_regressor(emotion)
        print(f"MAE{emotion}: {mae:.4f}")

        results[emotion] = {
            'model': model,
            'vectorizer': vectorizer,
            'mae': mae
        }

    return results

In [8]:
resultados = train_all_emotions()


-----Processing joy

Top positive features for joy:
0.04660	happy
0.04222	hilarious
0.03923	love
0.02730	day
0.02431	today

Top negative features for joy:
-0.02428	but
-0.01930	when
-0.01914	if
-0.01855	glee
-0.01807	don
MAEjoy: 0.2206

-----Processing anger

Top positive features for anger:
0.04270	fuming
0.03374	angry
0.02662	people
0.02391	so
0.02126	me

Top negative features for anger:
-0.02307	love
-0.01571	follow
-0.01522	snap
-0.01473	frown
-0.01442	sting
MAEanger: 0.2213

-----Processing sadness

Top positive features for sadness:
0.05167	depression
0.04154	depressing
0.03508	sad
0.03323	my
0.03091	sadness

Top negative features for sadness:
-0.04273	serious
-0.02874	blues
-0.02745	pine
-0.01969	dark
-0.01836	sober
MAEsadness: 0.2245

-----Processing fear

Top positive features for fear:
0.06426	nervous
0.04802	anxiety
0.04437	panic
0.04171	nightmare
0.03207	terror

Top negative features for fear:
-0.02861	start
-0.02779	awe
-0.02598	you
-0.02387	terrific
-0.02082	shake
MAEfear: 0.2148

# Final results with MAE as the metric

In [9]:

print("\n=== RESULTS ===")
for emotion, data in resultados.items():
    print(f"{emotion}: MAE = {data['mae']:.4f}")


=== RESULTS ===
joy: MAE = 0.2206
anger: MAE = 0.2213
sadness: MAE = 0.2245
fear: MAE = 0.2148
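
For reference, the mean absolute error reported here is the average of the absolute differences between gold scores and predictions, MAE = (1/n) Σ |y_i − ŷ_i|, so lower is better. A minimal sketch on made-up numbers (not project data) shows that the manual computation agrees with `sklearn.metrics.mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up gold scores and predictions, for illustration only.
gold = np.array([0.50, 0.25, 0.75, 1.00])
pred = np.array([0.25, 0.25, 1.00, 0.50])

# Absolute errors are 0.25, 0.0, 0.25 and 0.5, so their mean is 0.25.
manual_mae = np.mean(np.abs(gold - pred))
sklearn_mae = mean_absolute_error(gold, pred)
print(manual_mae, sklearn_mae)
```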


# Strategy 2
Strategy 2: Using machine translation, translate the English training and development sets into Spanish and evaluate a model trained on the translated Spanish text. Let's define a function to read this other dataset, the training data translated from English into Spanish. We could reuse the read_data function, but since the paths are hard-coded inside it, we define a new one.

In [17]:
def read_data2(emotion):
    """
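    Read the train and test data for one emotion (Strategy 2:
    English training set machine-translated to Spanish, original Spanish test set).
    """
    # Training data (translated texts; labels from the original English training set)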
    with open(f"/content/drive/MyDrive/PLN project/data/en_train&dev_translatedEs/train/translated_trainToEs_{emotion}.txt", 'r', encoding='utf-8') as f:
        train_X = f.readlines()
    with open(f"/content/drive/MyDrive/PLN project/data/en/train/{emotion}_labels.txt", 'r', encoding='utf-8') as f:
        train_y = [float(line.strip()) for line in f.readlines()]

    # Test data
    with open(f"/content/drive/MyDrive/PLN project/data/es/test/{emotion}.txt", 'r', encoding='utf-8') as f:
        test_X = f.readlines()
    with open(f"/content/drive/MyDrive/PLN project/data/es/test/{emotion}_labels.txt", 'r', encoding='utf-8') as f:
        test_y = [float(line.strip()) for line in f.readlines()]

    return (train_X, train_y), (test_X, test_y)
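
As an alternative to duplicating the reader, the paths could be passed in as arguments. The following is only a sketch of that idea; `read_lines` and `read_split` are hypothetical helpers that do not appear in the original notebook.

```python
def read_lines(path):
    """Hypothetical helper: return all lines of a UTF-8 text file."""
    with open(path, "r", encoding="utf-8") as f:
        return f.readlines()


def read_split(text_path, label_path):
    """Hypothetical helper: read one split (texts and float labels) from explicit paths."""
    texts = read_lines(text_path)
    labels = [float(line.strip()) for line in read_lines(label_path)]
    return texts, labels
```

Each strategy would then differ only in the four paths it passes, for example one call to `read_split` for the training split and one for the test split.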


In [14]:
def train_emotion_regressor2(emotion, alpha=100):
    """
    Train and evaluate a regressor for one emotion (Strategy 2).
    """

    vectorizer = CountVectorizer(
        max_features=10000,
        ngram_range=(1,2),
        lowercase=True,
        strip_accents=None,
        binary=True
    )

    # Load data
    (train_X, train_y), (test_X, test_y) = read_data2(emotion)

    X_train = vectorizer.fit_transform(train_X)
    X_test = vectorizer.transform(test_X)

    # Train model
    model = linear_model.Ridge(alpha=alpha, fit_intercept=True)
    model.fit(X_train, train_y)

    # Evaluate
    preds = model.predict(X_test)
    mae = sklearn.metrics.mean_absolute_error(test_y, preds)

    # Analyze weights
    analyze_weights(model, vectorizer, emotion)

    return model, vectorizer, mae

In [15]:
def train_all_emotions2():
    """
    Train and evaluate regressors for all emotions (Strategy 2).
    """
    emotions = ['joy', 'anger', 'sadness', 'fear']
    results = {}  # dict keyed by emotion name

    for emotion in emotions:
        print(f"\n-----Processing {emotion}")
        model, vectorizer, mae = train_emotion_regressor2(emotion)
        print(f"MAE{emotion}: {mae:.4f}")

        results[emotion] = {
            'model': model,
            'vectorizer': vectorizer,
            'mae': mae
        }

    return results

In [18]:
resultados_train_es = train_all_emotions2()


-----Processing joy

Top positive features for joy:
0.03801	feliz
0.03407	día
0.03192	gracias
0.02786	más
0.02758	estoy

Top negative features for joy:
-0.03939	no
-0.02538	pero
-0.02246	la
-0.02090	ser
-0.01861	sobre
MAEjoy: 0.2212

-----Processing anger

Top positive features for anger:
0.02483	fuming
0.02473	enojado
0.02223	mi
0.02108	por
0.02096	la gente

Top negative features for anger:
-0.01749	me encanta
-0.01749	encanta
-0.01742	es
-0.01517	las
-0.01266	pero
MAEanger: 0.2254

-----Processing sadness

Top positive features for sadness:
0.04155	depresión
0.03593	triste
0.02854	mi
0.02421	deprimente
0.02028	insatisfecho

Top negative features for sadness:
-0.03066	se
-0.02848	serio
-0.02464	el
-0.02279	blues
-0.02250	en el
MAEsadness: 0.2249

-----Processing fear

Top positive features for fear:
0.05112	nervioso
0.04248	ansiedad
0.03720	pesadilla
0.03261	me
0.03258	terrorismo

Top negative features for fear:
-0.02523	gracias
-0.02066	lo
-0.01692	noche
-0.01525	todo
-0.01419	son
MAEfear: 0.2147

In [20]:

print("\n=== RESULTS with Spanish training data ===")
for emotion, data in resultados_train_es.items():
    print(f"{emotion}: MAE = {data['mae']:.4f}")


=== RESULTS with Spanish training data ===
joy: MAE = 0.2212
anger: MAE = 0.2254
sadness: MAE = 0.2249
fear: MAE = 0.2147
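
The two strategies can be compared side by side using the MAE values printed in the two result listings. The sketch below hard-codes those reported numbers rather than re-running the models. It only summarizes the differences and makes no claim about statistical significance.

```python
# MAE values copied from the result listings of each strategy.
strategy1 = {"joy": 0.2206, "anger": 0.2213, "sadness": 0.2245, "fear": 0.2148}
strategy2 = {"joy": 0.2212, "anger": 0.2254, "sadness": 0.2249, "fear": 0.2147}

for emotion in strategy1:
    # Positive diff means Strategy 2 has the higher (worse) MAE.
    diff = strategy2[emotion] - strategy1[emotion]
    print(f"{emotion}: {strategy1[emotion]:.4f} vs {strategy2[emotion]:.4f} (diff {diff:+.4f})")

mean1 = sum(strategy1.values()) / len(strategy1)
mean2 = sum(strategy2.values()) / len(strategy2)
print(f"mean MAE: {mean1:.4f} vs {mean2:.4f}")
```

On these numbers, Strategy 1 has a slightly lower MAE for joy, anger and sadness, and Strategy 2 is lower for fear by 0.0001. The largest gap is 0.0041, for anger.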
