# ***Exploratory Data Analysis***

This notebook walks through an exploratory data analysis (EDA) of the annotated data.

### ***Import packages***

Before we begin, let's import all the necessary packages for this notebook:

In [None]:
import os
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Set plotting theme
sns.set_theme(style="whitegrid")

### ***Read data***

Next, let's read the data:

In [None]:
# Read data
annotated_df = pd.read_csv("annotated_data.csv")

# Add text length column (number of characters in each text)
annotated_df["text_len"] = annotated_df.text.str.len()

# Print data
annotated_df
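
Before plotting, it can help to check how `.str.len()` treats missing text. The sketch below uses a small made-up DataFrame (`toy_df` is a hypothetical stand-in, not the real `annotated_df`) to show that a missing `text` value yields a missing length rather than `0`:

```python
import pandas as pd

# Hypothetical stand-in for annotated_df; the real data comes from annotated_data.csv
toy_df = pd.DataFrame({"text": ["short post", None, "a somewhat longer post"]})

# .str.len() counts characters and propagates missing values as NaN
toy_df["text_len"] = toy_df["text"].str.len()

# Number of rows whose length could not be computed
n_missing = int(toy_df["text_len"].isna().sum())

print(toy_df)
print(f"Rows with missing text: {n_missing}")
```

If the same check on `annotated_df` reports missing rows, those rows would simply not appear in a histogram of `text_len`.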

### ***Plot interesting elements in the data***

Next, let's create some plots of the data. We begin by comparing the text length for the hate and non-hate groups:

In [None]:
fig, ax = plt.subplots(1)
ax = sns.histplot(
    annotated_df, x="text_len", hue="hate_label", alpha=0.6, ax=ax, stat="count"
)
plt.xlabel("Text Length", labelpad=10, fontsize=22)
plt.xticks(fontsize=14)
plt.ylabel("Count", labelpad=10, fontsize=22)
plt.yticks(fontsize=14)
plt.title("Text Length", pad=10, fontsize=24)
sns.despine()
# Enlarge the legend entries and the legend title
plt.setp(ax.get_legend().get_texts(), fontsize=22)
plt.setp(ax.get_legend().get_title(), fontsize=22)
plt.tight_layout()
plt.show()
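
A histogram is easier to read next to a few summary numbers. As a minimal sketch, assuming `hate_label` holds one category per record, a grouped aggregation gives the count, median, and mean text length per group. The values in `toy_df` below are invented for illustration only:

```python
import pandas as pd

# Hypothetical data; the label values and lengths are made up for illustration
toy_df = pd.DataFrame(
    {
        "text_len": [12, 40, 95, 30, 150, 60],
        "hate_label": ["hate", "non-hate", "hate", "non-hate", "hate", "non-hate"],
    }
)

# Count, median, and mean text length for each label
length_summary = toy_df.groupby("hate_label")["text_len"].agg(
    ["count", "median", "mean"]
)

print(length_summary)
```

The same call on `annotated_df` should work as long as it has the `text_len` and `hate_label` columns used in the plot above.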

Next, let's plot how many records in the data are labeled hate and how many non-hate:

In [None]:
fig, ax = plt.subplots(1)
# Draw on the axes created above instead of relying on the implicit current axes
ax = sns.countplot(annotated_df, x="hate_label", alpha=0.6, ax=ax)
ax.bar_label(ax.containers[0], fontsize=22)
plt.xlabel("")
plt.ylabel("")
plt.yticks([])
plt.xticks(fontsize=22)
plt.title("Is Post Hate Speech?", pad=30, fontsize=24)
sns.despine(left=True, bottom=True)
plt.tight_layout()
plt.show()
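
Raw counts show the class sizes, but the share of each class is often what matters when judging imbalance. Here is a small sketch using `value_counts(normalize=True)` on an invented label column (`toy_labels` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical labels; the real values live in annotated_df["hate_label"]
toy_labels = pd.Series(
    ["hate", "non-hate", "non-hate", "non-hate"], name="hate_label"
)

# Fraction of records in each class (the fractions sum to 1)
class_share = toy_labels.value_counts(normalize=True)

print(class_share)
```

On the real data, the equivalent call would be `annotated_df["hate_label"].value_counts(normalize=True)`.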

Next, let's plot how many hate records we have per target group, split by whether the hate is implicit or explicit:

In [None]:
fig, ax = plt.subplots(1)
ax = sns.countplot(
    annotated_df, x="implicit_hate", hue="hate_target_label", alpha=0.6, ax=ax
)
for container in ax.containers:
    ax.bar_label(container, fontsize=22)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), fontsize=22)
plt.xlabel("")
plt.ylabel("")
plt.yticks([])
plt.xticks(fontsize=22)
plt.title("Is Post Implicit Hate?", pad=30, fontsize=24)
sns.despine(left=True, bottom=True)
plt.tight_layout()
plt.show()
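
The numbers behind a grouped count plot can also be shown as a table. The sketch below uses `pd.crosstab` on a made-up DataFrame. The boolean `implicit_hate` values and the group names `group_a` and `group_b` are assumptions for illustration, not values taken from the data:

```python
import pandas as pd

# Hypothetical data; column values are invented for illustration
toy_df = pd.DataFrame(
    {
        "implicit_hate": [True, False, True, True, False],
        "hate_target_label": ["group_a", "group_a", "group_b", "group_b", "group_b"],
    }
)

# Rows: implicit vs. explicit; columns: target group; cells: record counts
counts = pd.crosstab(toy_df["implicit_hate"], toy_df["hate_target_label"])

print(counts)
```

Note that `pd.crosstab` leaves out rows where either column is missing, so on the real data the table covers only records that have both fields filled in.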