# Bag of Emotions
**Bag of Emotions (BoE)** represents a text by counting the emotions associated with its words, rather than counting the words themselves.

It follows the same scheme as Bag of Words (BoW):

- Each document (e.g., a tweet) is represented as a vector.
- Each position in the vector represents a specific emotion (rather than a word).
- The value at that position is the number of times that emotion appears in the document, according to the emotion lexicon.
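
To make the contrast with BoW concrete, here is a minimal sketch that encodes one tokenized sentence both ways. The lexicon, the emotion list, and the helper `bag_of_emotions` are hypothetical toy values chosen for illustration; they are not entries from the real lexicon used later in this notebook.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> emotions (NOT real lexicon entries).
TOY_LEXICON = {
    "fiesta": ["joy", "anticipation"],
    "miedo": ["fear"],
    "amigo": ["joy", "trust"],
}

# Fixed emotion order: position k of the vector counts EMOTIONS[k].
EMOTIONS = ["anticipation", "fear", "joy", "trust"]


def bag_of_emotions(tokens, lexicon, emotions):
    """Count how often each emotion is triggered by the tokens."""
    counts = Counter(e for t in tokens for e in lexicon.get(t, []))
    return [counts[e] for e in emotions]


tokens = ["mi", "amigo", "va", "a", "la", "fiesta"]

# BoW view: one count per distinct word.
print(Counter(tokens))

# BoE view: one count per emotion, in the order of EMOTIONS.
print(bag_of_emotions(tokens, TOY_LEXICON, EMOTIONS))  # [1, 0, 2, 1]
```

Words missing from the lexicon contribute nothing, so the BoE vector has a fixed, small dimension no matter how large the vocabulary is.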

In [6]:
# 📌 This notebook assumes that corpus processing, tokenization, and BoW construction were already performed in the notebook:
# 👉 'feature-extraction/bag_of_words.ipynb'

# The variables used here (such as `BoW_tr`, `tr_txt`, `V1`, `dict_indices1`) were built there.
# To re-run the pipeline from scratch, check that file first.

> 🔗 **Note:** Corpus loading, tokenization, and Bag of Words construction are in
> [`bag_of_words.ipynb`](./feature-extraction/bag_of_words.ipynb)

In [2]:

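from sklearn import svm
from sklearn.model_selection import GridSearchCV

# Make sure the labels are integers before handing them to scikit-learn.
tr_y = list(map(int, tr_y))

# Candidate values for the regularization strength C.
parameters = {'C': [0.12, 0.25, 0.5, 1, 2, 4]}

# Use a distinct name for the estimator so the `svm` module is not shadowed.
svc = svm.LinearSVC(class_weight='balanced', dual=False)
grid = GridSearchCV(estimator=svc, param_grid=parameters, n_jobs=8, scoring='f1', cv=5)

# `get_texts_from_file` is assumed to come from bag_of_words.ipynb.
val_txt, val_y = get_texts_from_file('./mex20_val.txt', './mex20_val_labels.txt')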

In [None]:
from collections import defaultdict
import spacy
import pandas as pd
import numpy as np

emolex_path = "Spanish-NRC-EmoLex.txt"
emolex = pd.read_csv(emolex_path, sep='\t')

# Map each Spanish word to the list of emotions flagged with 1 in the lexicon.
emolex_dict = defaultdict(list)

for _, row in emolex.iterrows():
    palabra = row["Spanish Word"]
    # Every column except the first and the last is treated as an emotion flag.
    emociones = row.iloc[1:-1]
    for emocion, valor in emociones.items():
        if valor == 1:
            emolex_dict[palabra].append(emocion)

def lemma_emotions(tweets, emolex_dict):
    """Return, for each tweet, the emotions of every lemma found in the lexicon."""
    tweets_emociones = []
    for tweet in tweets:
        emociones = []
        doc = nlp(tweet.lower())  # `nlp` is the spaCy pipeline loaded below
        for token in doc:
            lemma = token.lemma_
            if lemma in emolex_dict:
                emociones.extend(emolex_dict[lemma])  # Add emotions for the lemma
        tweets_emociones.append(emociones)
    return tweets_emociones

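def build_boe(tweets_emociones, dict_indices_emociones=None):
    """Build the tweet-by-emotion count matrix.

    If no index is given, one is built from the emotions seen in the input.
    Pass the training index when encoding other splits so the columns line up.
    """
    if dict_indices_emociones is None:
        emociones_unicas = sorted({emocion for emociones in tweets_emociones for emocion in emociones})
        dict_indices_emociones = {emocion: i for i, emocion in enumerate(emociones_unicas)}
    BoE = np.zeros((len(tweets_emociones), len(dict_indices_emociones)), dtype=int)
    for i, emociones in enumerate(tweets_emociones):
        for emocion in emociones:
            index = dict_indices_emociones.get(emocion)
            if index is not None:  # skip emotions absent from the index
                BoE[i, index] += 1
    return BoE, dict_indices_emociones

nlp = spacy.load("es_core_news_sm")

tweets_emociones_tr = lemma_emotions(tr_txt, emolex_dict)
tweets_emociones_val = lemma_emotions(val_txt, emolex_dict)

# Build the index on the training set and reuse it for validation,
# so that column k means the same emotion in both matrices.
BoE_tr, dict_indices_emociones = build_boe(tweets_emociones_tr)
BoE_val, _ = build_boe(tweets_emociones_val, dict_indices_emociones)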

BoE_tr.shape, BoE_val.shape

((5278, 10), (587, 10))
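
The 10 columns are consistent with the ten NRC EmoLex categories: eight emotions plus the `negative` and `positive` sentiment labels. The sketch below shows how a single row could be encoded and then read back as emotion names. It assumes the alphabetical column order that `build_boe` produces when every category occurs at least once; the helpers `encode` and `decode` are hypothetical and not part of the notebook.

```python
# The ten NRC EmoLex categories, sorted alphabetically as build_boe does.
EMOTIONS = sorted([
    "anger", "anticipation", "disgust", "fear", "joy",
    "sadness", "surprise", "trust", "negative", "positive",
])
index = {emotion: i for i, emotion in enumerate(EMOTIONS)}


def encode(emotions, index):
    """Turn a list of emotion labels into a count vector."""
    row = [0] * len(index)
    for emotion in emotions:
        row[index[emotion]] += 1
    return row


def decode(row, index):
    """Map the non-zero positions of a row back to emotion names."""
    names = sorted(index, key=index.get)
    return {name: count for name, count in zip(names, row) if count}


# Emotion list of the form ['positive', 'positive'].
row = encode(["positive", "positive"], index)
print(row)                 # [0, 0, 0, 0, 0, 0, 2, 0, 0, 0]
print(decode(row, index))  # {'positive': 2}
```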

In [5]:
resultados = []
num_mostrar = 10  # number of validation tweets to display
for i, (tweet, emociones) in enumerate(zip(val_txt[:num_mostrar], tweets_emociones_val[:num_mostrar])):
    resultado = f"Tweet {i+1}: {tweet}\nEmociones detectadas: {emociones}\n"
    resultados.append(resultado)
print("\n".join(resultados))

Tweet 1: Al perro que se te acerque le parto su madre a si de facil

Emociones detectadas: ['anticipation', 'joy', 'negative', 'positive', 'sadness', 'trust']

Tweet 2: @USUARIO @USUARIO Él supo sacar a su familia adelante en lo que sabe en el mundo existen muchas personas ardidas como tú 🤷🏻‍♀️

Emociones detectadas: ['positive', 'positive']

Tweet 3: @USUARIO Entonces para que quieres estar en sus paises?, mejor vente aca y chinguele cabrona de verdad, maldita sangana.

Emociones detectadas: ['positive', 'trust', 'anger', 'fear', 'negative', 'sadness', 'anger', 'fear', 'negative', 'sadness', 'negative']

Tweet 4: Que bueno que hoy juega México, porque tú vales verga, hija de la chingada.

Emociones detectadas: ['anticipation', 'joy', 'positive', 'surprise', 'trust', 'negative', 'joy', 'positive']

Tweet 5: Ojalá un día me valgas la misma verga que al vato de la cfe despelucando y ponchando a medio del puente del puente a Cd judicial

Emociones detectadas: ['anticipation', 'positive', 