# Emotion Detection in Written Sentences

## Dataset Preparation

Split the dataset into training, validation, and test sets in an approximate 70/20/10 ratio, stratifying by emotion label so that each split preserves the class distribution.

In [60]:
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('data/dataset.csv')

X = data['text']
y = data['emotion']

# Hold out 10% of the data as the test set, stratifying by the target variable
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.1, stratify=y, random_state=42)

# Split the remaining 90% into train and validation, again stratifying by the target variable.
# test_size=0.22 of that 90% is about 19.8% of the full dataset, giving roughly 70/20/10 overall.
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.22, stratify=y_train_val, random_state=42)

# Recombine text and labels for each split before saving to disk
train_data = pd.concat([X_train, y_train], axis=1)
validation_data = pd.concat([X_val, y_val], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)

train_data.to_csv('data/train.csv', index=False)
validation_data.to_csv('data/val.csv', index=False)
test_data.to_csv('data/test.csv', index=False)
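
The second `test_size` in a two-stage split is a fraction of what remains after the first split, not of the full dataset, which is easy to get slightly wrong. The sketch below works through that arithmetic. The helper names `two_stage_test_sizes` and `effective_fractions` are hypothetical and not part of scikit-learn; actual row counts may also differ by a sample because the split sizes are rounded to whole rows.

```python
def two_stage_test_sizes(train, val, test):
    """Return the two test_size values needed to reach a train/val/test ratio.

    The first split holds out the test set from the full data; the second
    holds out the validation set from the remaining train+val portion.
    """
    total = train + val + test
    first = test / total            # fraction of the full dataset
    second = val / (train + val)    # fraction of the remaining data
    return first, second


def effective_fractions(first, second):
    """Return the overall (train, val, test) fractions for two split sizes."""
    remaining = 1.0 - first
    val = remaining * second
    train = remaining - val
    return train, val, first


# Exact values for a 70/20/10 target
first, second = two_stage_test_sizes(70, 20, 10)
print(f"exact 70/20/10: test_size={first:.4f}, then test_size={second:.4f}")

# Overall fractions produced by test_size=0.1 followed by test_size=0.22
train, val, test = effective_fractions(0.1, 0.22)
print(f"0.1 then 0.22: train={train:.3f}, val={val:.3f}, test={test:.3f}")
```

With `test_size=0.22` the validation share comes out slightly under 20%; passing `2/9` as the second `test_size` would target exactly 20% if that precision matters.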

## Exploratory Data Analysis (EDA)

Checking for null values and examining the balance of emotions in the dataset.

In [None]:
import matplotlib.pyplot as plt

df = pd.read_csv('data/dataset.csv')
train_df = pd.read_csv('data/train.csv')
val_df = pd.read_csv('data/val.csv')
test_df = pd.read_csv('data/test.csv')

null_values = df.isnull().sum()
print("Number of Null Values in the Dataset:")
print(null_values)

emotion_counts = df['emotion'].value_counts()
print("Distribution of Emotions in the Dataset:")
print(emotion_counts)


plt.figure(figsize=(8, 6))
plt.bar(emotion_counts.index, emotion_counts.values)
plt.xlabel('Emotion')
plt.ylabel('Frequency')
plt.title('Emotion Frequency in the Dataset')
plt.xticks(rotation=45)
plt.show()

Conclusions:

- The dataset contains no null values, so no missing-value handling is required.
- The emotion categories are well balanced, so we can proceed with the data as is, without oversampling or undersampling.
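
The balance claim can be made concrete with a couple of numbers in addition to the bar chart. The following is a minimal sketch: `label_shares` and `imbalance_ratio` are hypothetical helpers, and the toy labels are invented for illustration, not taken from the dataset. Both functions accept any iterable of labels, so a column such as `df['emotion']` could be passed in directly.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total as a dict of fractions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(labels):
    """Return the largest class count divided by the smallest (1.0 = perfectly balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels, used only to demonstrate the helpers
toy_labels = ["joy"] * 4 + ["anger"] * 3 + ["fear"] * 3

shares = label_shares(toy_labels)
ratio = imbalance_ratio(toy_labels)

print(shares)
print(f"imbalance ratio: {ratio:.2f}")
```

A ratio close to 1 supports treating the classes as balanced; what counts as "close enough" is a judgment call that depends on the task. Running the same helpers on the train, validation, and test labels is also a quick way to confirm that stratification preserved the distribution across splits.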