# Content Warning 

This project uses a large amount of inappropriate, offensive, and upsetting language. Some of the data analysis steps below print these texts to the notebook, so they are visible to anyone who opens it. Please view the notebook with this in mind.

Topics: sexism, racism, misogyny, and unfortunately many more.

# Hidden Markov Models

Here I will use hidden Markov models to detect hate speech.

# Code

First, we import the necessary packages.

In [188]:
import pandas as pd
import numpy as np
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from wordcloud import WordCloud
from nltk.stem import WordNetLemmatizer
import matplotlib.pyplot as plt
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\hunte\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\hunte\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\hunte\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Now read in the data file and print its size and first five entries. Note that `df.size` counts cells (rows × columns), not rows.

In [189]:
# read in the input file and show the first five data entries
df = pd.read_csv("data/ethos/Ethos_Dataset_Binary.csv", sep=';', header=0)
# df.size is the number of cells (rows * columns); use df.shape for the row count
print("size = ", df.size)
df.head()

size =  1996


Unnamed: 0,comment,isHate
0,You should know women's sports are a joke,1.0
1,You look like Sloth with deeper Down’s syndrome,1.0
2,You look like Russian and speak like Indian. B...,1.0
3,"Women deserve to be abused, I guess.",1.0
4,Women are made for making babies and cooking d...,1.0


Next, we clean and prepare the data for analysis and check the output again.

In [190]:
# convert data to lowercase, remove all special characters, and tokenize
df['comment'] = df['comment'].str.lower()
df['words'] = df['comment'].str.replace(r'[^ \w\s]', '', regex=True)
df['words'] = df['words'].apply(nltk.word_tokenize)

# defining stop words and lemmatizer
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# drop stop words and lemmatize the remaining tokens, then rejoin them into one string
def clean_and_process(row):
    processed_words = []
    for word in row['words']:
        if word not in stop_words:
            lemmatized = lemmatizer.lemmatize(word)
            processed_words.append(lemmatized)
    return ' '.join(processed_words)

df['processed'] = df.apply(clean_and_process, axis=1)

# print first five to check cleaning
df.head()

Unnamed: 0,comment,isHate,words,processed
0,you should know women's sports are a joke,1.0,"[you, should, know, womens, sports, are, a, joke]",know woman sport joke
1,you look like sloth with deeper down’s syndrome,1.0,"[you, look, like, sloth, with, deeper, downs, ...",look like sloth deeper down syndrome
2,you look like russian and speak like indian. b...,1.0,"[you, look, like, russian, and, speak, like, i...",look like russian speak like indian disgusting...
3,"women deserve to be abused, i guess.",1.0,"[women, deserve, to, be, abused, i, guess]",woman deserve abused guess
4,women are made for making babies and cooking d...,1.0,"[women, are, made, for, making, babies, and, c...",woman made making baby cooking dinner nothing ...
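
The apostrophe removal visible in the `words` column (`women's` becoming `womens`) comes from the regular expression `[^ \w\s]`, which deletes every character that is not a word character or whitespace. A minimal sketch of that one step using only the standard library; `strip_special` is a hypothetical helper introduced for illustration and is not part of the notebook.

```python
import re

def strip_special(text):
    """Lowercase the text and delete characters that are neither word characters nor whitespace."""
    return re.sub(r"[^ \w\s]", "", text.lower())

print(strip_special("You should know women's sports are a joke!"))
# you should know womens sports are a joke
```

Because the apostrophe is deleted rather than replaced by a space, contractions and possessives are merged into a single token before lemmatization.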


Next, split the data into training and test sets. Note that `frac = 0.25` puts only 25% of the rows in `train`; the remaining 75% form `test`.

In [191]:
train = df.sample(frac = 0.25, random_state = 1)
test = df.drop(train.index)
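
To make the proportions of this split concrete, here is a small sketch on a toy frame. The data and the variable names (`toy`, `train_small`, `train_large`, and so on) are hypothetical; only the `sample` / `drop` pattern is taken from the cell above. The second split shows the more common arrangement of training on the larger share, as an option rather than a recommendation.

```python
import pandas as pd

# Toy frame standing in for the real data (hypothetical values).
toy = pd.DataFrame({
    "comment": [f"comment {i}" for i in range(8)],
    "isHate": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
})

# Same pattern as the notebook: 25% of the rows go to training.
train_small = toy.sample(frac=0.25, random_state=1)
test_large = toy.drop(train_small.index)

# Alternative: 75% of the rows go to training, the rest are held out.
train_large = toy.sample(frac=0.75, random_state=1)
test_small = toy.drop(train_large.index)

print(len(train_small), len(test_large))  # 2 6
print(len(train_large), len(test_small))  # 6 2
```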

Now I will start by separating the training data into hate / not hate. Below, I use bigrams to build probability maps of consecutive word pairs for each of the two classes in the training data.
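
In symbols, the score computed by the cells below can be summarised as follows. The notation is introduced here for illustration only: $N_c(b)$ is the number of times bigram $b$ occurs in the training comments of class $c$, $P(c)$ is the share of training comments in class $c$, and $b_1, \dots, b_n$ are the bigrams of the comment being classified.

$$
P(b \mid c) = \frac{N_c(b)}{\sum_{b'} N_c(b')}, \qquad
\hat{c} = \arg\max_{c} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(b_i \mid c) \Big]
$$

Bigrams outside the training vocabulary contribute a fixed $\log(10^{-6})$ in place of $\log P(b_i \mid c)$. Each bigram is scored independently given the class, so this is a naive-Bayes-style score over bigram features.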

In [200]:
# vectorization of the "train" training data
vect = CountVectorizer(ngram_range=(2,2))
X_train = vect.fit_transform(train['processed'])
Y_train = train['isHate'] # unused

# threshold ("barrier") on the isHate score: above it a comment counts as hate
barrier = 0.16666667
# separating the training data based on isHate
hate_yes = train[train['isHate'] > barrier]
hate_no = train[train['isHate'] <= barrier]

# transform both sets into bigram frequency vectors
X_train_hate_yes = vect.transform(hate_yes['processed'])
X_train_hate_no = vect.transform(hate_no['processed'])

# calculate the frequency of each bigram across all comments in each class
freq_hate_yes = np.array(X_train_hate_yes.sum(axis=0)).flatten()
freq_hate_no = np.array(X_train_hate_no.sum(axis=0)).flatten()
# total count of bigrams in each class
total_hate_yes = freq_hate_yes.sum()
total_hate_no = freq_hate_no.sum()

# proportions of hate to not hate in the training set
prop_hate_yes = len(hate_yes) / len(train)
prop_hate_no = len(hate_no) / len(train)
# probabilities of bigrams in each class
prob_bigram_hate_yes = freq_hate_yes / total_hate_yes
prob_bigram_hate_no = freq_hate_no / total_hate_no

# extract bigram feature names
feature_names = vect.get_feature_names_out()
# creating new dataframes for each class that map the bigrams to their probabilities
# within that class 
hate_yes_df = pd.DataFrame({'bigram': feature_names, 'probability': prob_bigram_hate_yes})
hate_no_df = pd.DataFrame({'bigram': feature_names, 'probability': prob_bigram_hate_no})
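
The probabilities above are plain relative frequencies, so a bigram that appears in only one class gets probability 0 in the other. One common alternative is add-one (Laplace) smoothing, which adds a constant to every count before normalising. The sketch below shows that idea in pure Python on hypothetical toy comments; `bigrams`, `smoothed_probabilities`, and `alpha` are names introduced for illustration, and this is not the estimate the notebook uses.

```python
from collections import Counter

def bigrams(text):
    """Return the adjacent word pairs of a whitespace-tokenized string."""
    tokens = text.split()
    return [" ".join(pair) for pair in zip(tokens[:-1], tokens[1:])]

def smoothed_probabilities(comments, vocabulary, alpha=1.0):
    """Add-alpha estimate of P(bigram | class) over a fixed vocabulary (alpha=1 is Laplace)."""
    counts = Counter(b for comment in comments for b in bigrams(comment))
    total = sum(counts[b] for b in vocabulary) + alpha * len(vocabulary)
    return {b: (counts[b] + alpha) / total for b in vocabulary}

# Hypothetical toy comments, one list per class.
class_a = ["red apple pie", "red apple tart"]
class_b = ["green apple pie"]

# Shared vocabulary built from both classes, like the single fitted vectorizer above.
vocabulary = sorted({b for c in class_a + class_b for b in bigrams(c)})
prob_a = smoothed_probabilities(class_a, vocabulary)
prob_b = smoothed_probabilities(class_b, vocabulary)

print(vocabulary)             # ['apple pie', 'apple tart', 'green apple', 'red apple']
print(prob_a["green apple"])  # 0.125, non-zero although class_a never contains it
```

With smoothing, every vocabulary bigram has a strictly positive probability in both classes, so taking logarithms never produces `-inf`.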


Next, define a method to predict whether a new sentence is hate speech given the above bigram dataframes 

In [201]:
def predict_hate_speech(sentence):
    tokens = word_tokenize(sentence.lower())
    bigrams = [' '.join(bigram) for bigram in zip(tokens[:-1], tokens[1:])]
    # start from the log prior of each class
    log_prob_yes = np.log(prop_hate_yes)
    log_prob_no = np.log(prop_hate_no)

    # silence divide-by-zero warnings: both dataframes share one vocabulary, so a
    # bigram seen in only one class has probability 0 in the other and np.log(0) = -inf
    np.seterr(divide='ignore')

    # add the log probability of each bigram under each class
    for bigram in bigrams:
        if bigram in hate_yes_df['bigram'].values:
            log_prob_yes += np.log(hate_yes_df.loc[hate_yes_df['bigram'] == bigram, 'probability'].values[0])
        else:
            log_prob_yes += np.log(1e-6)  # fixed floor for bigrams outside the vocabulary (not true Laplace smoothing)

        if bigram in hate_no_df['bigram'].values:
            log_prob_no += np.log(hate_no_df.loc[hate_no_df['bigram'] == bigram, 'probability'].values[0])
        else:
            log_prob_no += np.log(1e-6)  # fixed floor for bigrams outside the vocabulary (not true Laplace smoothing)

    # ties, including -inf against -inf, fall through to 0 (not hate)
    return 1 if log_prob_yes > log_prob_no else 0
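
Because `hate_yes_df` and `hate_no_df` are built from the same vocabulary, the `in` check above succeeds for any training bigram, even when its probability in that class is 0. A comment containing one bigram seen only in hate comments and one seen only in non-hate comments therefore scores `-inf` for both classes and is labelled 0. The sketch below shows one possible way to avoid this, by flooring zero probabilities as well as missing ones. It uses plain dictionaries (which could be built with something like `dict(zip(feature_names, prob_bigram_hate_yes))`); `log_score`, `classify`, `FLOOR`, and the toy probabilities are hypothetical names and values, not the notebook's code.

```python
import math

FLOOR = 1e-6  # same fallback value the notebook uses for unseen bigrams

def log_score(bigrams, prior, probabilities, floor=FLOOR):
    """Sum the log prior and log bigram probabilities, flooring missing or zero entries."""
    score = math.log(prior)
    for bigram in bigrams:
        p = probabilities.get(bigram, 0.0)
        score += math.log(p if p > 0 else floor)
    return score

def classify(bigrams, prior_yes, prob_yes, prior_no, prob_no):
    """Return 1 when the 'hate' score is strictly larger, else 0."""
    yes = log_score(bigrams, prior_yes, prob_yes)
    no = log_score(bigrams, prior_no, prob_no)
    return 1 if yes > no else 0

# Hypothetical probabilities over a shared vocabulary; 0.0 means "never seen in this class".
prob_yes = {"a b": 0.5, "b c": 0.5, "c d": 0.0}
prob_no = {"a b": 0.0, "b c": 0.0, "c d": 1.0}

print(classify(["a b", "b c", "c d"], 0.5, prob_yes, 0.5, prob_no))  # 1
```

Dictionary lookups also avoid scanning the whole `bigram` column for every bigram of every comment.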

Next, define a method to test the accuracy of the above model on a fresh dataset.

In [202]:
def test_accuracy(dataframe):
    correct_predictions = 0
    total_rows = len(dataframe)

    for index, row in dataframe.iterrows():
        prediction = predict_hate_speech(row['processed'])
        # NOTE: isHate is a score thresholded by `barrier` above, so this exact comparison
        # with a 0/1 prediction can only succeed when the score is exactly 0.0 or 1.0
        actual = row['isHate']

        if prediction == actual:
            correct_predictions += 1

    # guard against an empty dataframe
    if total_rows == 0:
        return 0.0
    return correct_predictions / total_rows

# named `accuracy` so the imported sklearn `accuracy_score` is not overwritten
accuracy = test_accuracy(test)
print(f"Accuracy: {accuracy: .2f}")

Accuracy:  0.34


This gives an accuracy of 34% for detecting hate speech using Markov models with bigrams. Changing the barrier that separates hate from not hate drastically changes the resulting bigram model and the detection accuracy. Lowering the barrier to accommodate milder hate speech increases accuracy, but only because the model then labels nearly everything as hate speech rather than being selective.

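34% is clearly poor: with a barrier of 0.166667 the dataset is balanced, so random guessing would give around 50% accuracy. Part of the gap may come from the evaluation itself: `test_accuracy` compares the 0/1 prediction with the raw `isHate` score, so any comment whose score lies strictly between 0 and 1 is counted as wrong regardless of the prediction.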
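
A minimal sketch of scoring predictions against binarized labels, together with a majority-class baseline for comparison. The helper names (`binarize`, `accuracy`, `majority_baseline`) and the toy scores and predictions are hypothetical and chosen only to illustrate the arithmetic; they are not results from this dataset.

```python
def binarize(scores, barrier):
    """Map continuous scores to 0/1 labels with the same rule used to split the classes."""
    return [1 if s > barrier else 0 for s in scores]

def accuracy(predictions, labels):
    """Fraction of positions where prediction and label agree."""
    if not labels:
        return 0.0
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    if not labels:
        return 0.0
    ones = sum(labels)
    return max(ones, len(labels) - ones) / len(labels)

# Hypothetical scores and predictions, for illustration only.
scores = [0.0, 0.0, 0.33, 0.5, 1.0, 1.0]
predictions = [0, 0, 1, 1, 1, 0]

labels = binarize(scores, barrier=0.16666667)
print(accuracy(predictions, scores))  # 0.5, fractional scores can never match
print(accuracy(predictions, labels))  # 5/6, same predictions against 0/1 labels
print(majority_baseline(labels))      # 4/6, always predicting the majority class
```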

I will work on improving the model later; this is as far as I can get for now.