![](https://www.pewtrusts.org/-/media/post-launch-images/2018/01/sln_jan23_1/sln_jan23_1_16x9.jpg)

# Introduction

The Me Too (or #MeToo) movement, with variations of related local or international names, is a movement against sexual harassment and sexual abuse where people publicize their allegations of sex crimes committed by powerful and/or prominent men. <br>

The dataset contains Twitter posts (tweets) made during the MeToo movement by various Twitter accounts. Some of them have been classified as hateful (1), while the others are more benign (0). Our job is to build a simple classifier that can distinguish hateful from non-hateful tweets.

This notebook is based mostly on the scikit-learn documentation and a subsequent Towards Data Science (TDS) article:

https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html <br>
https://towardsdatascience.com/naive-bayes-document-classification-in-python-e33ff50f937e <br>

Key takeaways:

* Naive Bayes is simple and does a good job of classifying text when looking at overall precision.
* The data is imbalanced (far more non-hate than hate tweets), so NB has a low recall score (many false negatives).
* Version 1 used ONLY the text of the tweet
* Version 2+ uses the text of the tweet AND statistics about the tweet (likes, retweets etc).
* Version 2+ slightly improves classification score using Naive-Bayes.

We start by loading the data and doing some simple EDA to get the lay of the land. Loading all tweets (about 700,000) causes the kernel here on Kaggle to run out of memory, so instead we load about half the dataset (300,000 rows), which is enough for a demonstration.

In [None]:
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split

pd.reset_option('^display.', silent=True)

# Load half the data and separate target from predictors
X = pd.read_csv('../input/hatred-on-twitter-during-metoo-movement/MeTooHate.csv', nrows=300000)
X.dropna(axis=0, subset=['text', 'category'], inplace=True)
y = X.category
X.drop(['category'], axis=1, inplace=True)

# Drop columns not used for modelling
cols_to_drop = ['status_id', 'created_at', 'location']
X.drop(cols_to_drop, axis=1, inplace=True)

# Split the data while maintaining the proportion of hate/non-hate (stratify) 
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)

# Reset the index
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
X_test_stats = X_test.copy()

print("Total training samples:", len(X_train))
print("Total test samples:", len(X_test))

X_train.head(10)

In [None]:
# Show descriptive statistics of training set
X_train.describe()

In [None]:
# Show how many values are non-null for each feature
X_train.info()

In [None]:
# Print one tweet as a sample (a fixed row of the shuffled training set)
sample_index = 25
print(X_train.iloc[sample_index])

In [None]:
# Plot the target label and notice that it is imbalanced

y_train.value_counts().plot(kind='bar')

# Feature encoding

Our job here is to transform the text feature (the tweet) into vectors that a classifier can understand. In this notebook we use the Naive Bayes classifier. It needs to know how many times each word appears in each document and how many times it appears in each category. To make this possible, the data needs to look something like this: <br>

[0, 1, 0, …] <br>
[1, 1, 1, …] <br>
[0, 2, 0, …] <br>

Each row represents a document, and each column represents a word. The first row might be a document that contains a zero for “dumb,” a one for “the” and a zero for “hate”. That means that the document contains one instance of the word “the”, but no “dumb” or “hate.” <br>

We'll use scikit-learn's CountVectorizer to turn the tweets into count vectors. CountVectorizer creates a vector of word counts for each tweet, and together these vectors form a matrix. Each column index corresponds to a word, and every word that appears in the training texts (after tokenization and stop-word removal) gets a column.

Source: https://towardsdatascience.com/naive-bayes-document-classification-in-python-e33ff50f937e
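To make the count-vector idea concrete, here is a minimal sketch on three toy documents. The documents and the names `docs`, `toy_cv`, `counts` and `vocab` are invented for illustration and are not part of the dataset or the rest of the notebook. It uses CountVectorizer with default settings, so unlike the vectorizer in the next cell it does not remove stop words, and "the" keeps its column. Columns come out in alphabetical order, which differs from the column order used in the illustration above.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents, invented for illustration only
docs = ["the", "dumb the hate", "the the"]

# Default settings: lowercase the text and keep tokens of two or more
# word characters; no stop-word removal
toy_cv = CountVectorizer()
counts = toy_cv.fit_transform(docs)

# vocabulary_ maps each word to its column index; sort the words by that index
vocab = sorted(toy_cv.vocabulary_, key=toy_cv.vocabulary_.get)

print(vocab)             # column labels
print(counts.toarray())  # one row per document, one column per word
```

Each row of the printed matrix is one document, and each entry is the number of times the column's word occurs in it.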

In [None]:
# Convert the text feature into vectors of token counts
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(strip_accents='ascii', token_pattern=u'(?ui)\\b\\w*[a-z]+\\w*\\b',
                             lowercase=True, stop_words='english')
X_train_cv = cv.fit_transform(X_train.text)
X_test_cv = cv.transform(X_test.text)

# Scale numerical features (followers, retweets etc.)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
cols = ['favorite_count', 'retweet_count', 'followers_count', 'friends_count', 'statuses_count']
X_train_sc = scaler.fit_transform(X_train[cols])
X_test_sc = scaler.transform(X_test[cols])

# Merge the numerical features with our count vectors
import scipy.sparse as sp
train_count = sp.csr_matrix(X_train_cv)
train_num = sp.csr_matrix(X_train_sc)
X_train = sp.hstack([train_count, train_num])

test_count = sp.csr_matrix(X_test_cv)
test_num = sp.csr_matrix(X_test_sc)
X_test = sp.hstack([test_count, test_num])

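# Total count of each word across the training set.
# Summing the sparse matrix directly avoids building a dense tweets-by-vocabulary array.
# get_feature_names_out() replaces get_feature_names(), which recent scikit-learn releases no longer provide.
word_freq = pd.Series(np.asarray(X_train_cv.sum(axis=0)).ravel(), index=cv.get_feature_names_out())

Next we'll print the 20 most frequent words in our tweets. Not surprisingly, "women" and "movement" are the two most frequent words.

In [None]:
# Top 20 words occurring in tweets
word_freq.sort_values(ascending=False).head(20)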

# Model

Naive Bayes classification uses Bayes' theorem to estimate the probability that each sample (tweet) belongs to a certain category. If my tweet contains the words "hate", "go" and "away", what's the probability that it falls in the category "hate" rather than "non-hate"? The method is "naive" because it treats the words as independent of each other given the category. Each sample is assigned to the category with the highest probability.

We'll use **MultinomialNB**, since it is suitable for classification with discrete features (e.g., word counts for text). It handles any number of classes, including our two-class (hate vs. non-hate) case.
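The calculation can be sketched by hand on a tiny example. Everything below is an assumption made for illustration: the four-word vocabulary, the toy counts, and the names `toy_X`, `toy_y` and `nb_log_scores`. It uses add-one (Laplace) smoothing, which is also the default smoothing (`alpha=1.0`) in MultinomialNB, and is a simplified sketch rather than the library's implementation.

```python
import numpy as np

# Hypothetical toy counts: rows are tweets, columns are the words
# "hate", "go", "away", "love" (invented for illustration)
toy_X = np.array([
    [2, 1, 1, 0],  # hate
    [1, 0, 1, 0],  # hate
    [0, 1, 0, 2],  # non-hate
    [0, 0, 1, 1],  # non-hate
    [0, 1, 0, 1],  # non-hate
])
toy_y = np.array([1, 1, 0, 0, 0])

def nb_log_scores(X, y, x_new, alpha=1.0):
    """Return the classes and the unnormalized log P(class | x_new) of each."""
    classes = np.unique(y)
    scores = []
    for c in classes:
        X_c = X[y == c]
        log_prior = np.log(len(X_c) / len(X))
        # Smoothed word counts for this class, turned into word probabilities
        word_totals = X_c.sum(axis=0) + alpha
        log_word_prob = np.log(word_totals / word_totals.sum())
        # Independence assumption: add one log-probability per word occurrence
        scores.append(log_prior + x_new @ log_word_prob)
    return classes, np.array(scores)

new_tweet = np.array([1, 1, 1, 0])  # contains "hate", "go" and "away"
classes, scores = nb_log_scores(toy_X, toy_y, new_tweet)
predicted = classes[np.argmax(scores)]
print(dict(zip(classes.tolist(), scores.round(3).tolist())), "->", predicted)
```

The class with the larger score wins. Working in log space avoids multiplying many small probabilities together.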

In [None]:
# Train a Naive-Bayes classifier to classify hate/non-hate tweets

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

In [None]:
# Plot scores and make a confusion matrix for non-hate/hate predictions

from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import confusion_matrix
n_classes = 2
cm = confusion_matrix(y_test, predictions, labels=range(n_classes))

print(f'Number of samples to classify: {X_test.shape[0]}\n')
print(f'Accuracy score: {accuracy_score(y_test, predictions)}')
print(f'Precision score: {precision_score(y_test, predictions)}')
print(f'Recall score: {recall_score(y_test, predictions)}\n')
print(f'Confusion matrix: \n{cm}')

A few remarks on the notation and the scores:

**Accuracy score:** Out of all tweets, how many did we label correctly? <br>
(True positives + true negatives) / total observations: (4866 + 64190) / 74712 <br>

**Precision score:** Out of all tweets we labelled as hate, how many were actually hate? <br>
True positives / (true positives + false positives): 4866 / (4866 + 1537)

**Recall score:** Out of all true hate tweets, how many did we label correctly? <br>
True positives / (true positives + false negatives): 4866 / (4866 + 4119)
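The three formulas can be checked with a few lines of arithmetic. This sketch simply plugs in the counts quoted above, which come from one particular run; the variable names are chosen here for illustration.

```python
# Counts quoted in the remarks above (from one particular run)
tp, tn, fp, fn = 4866, 64190, 1537, 4119

total = tp + tn + fp + fn
accuracy = (tp + tn) / total   # share of all tweets labelled correctly
precision = tp / (tp + fp)     # share of predicted-hate tweets that are hate
recall = tp / (tp + fn)        # share of true hate tweets that we found

print(f"total={total}")
print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}")
```

Because the train/test split is not seeded, rerunning the notebook will give slightly different counts and scores.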

Since most of the tweets are non-hate (the negative class, at roughly 88-12 going by the counts above), our classifier is good at classifying the non-hate ones but struggles with the hateful ones (notice how many false negatives we have).

In [None]:
# Normalize the confusion matrix and plot it

plt.figure(figsize=(6,6))
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
sns.heatmap(cm, square=True, annot=True, cbar=False,
            xticklabels=['non-hate', 'hate'], yticklabels=['non-hate', 'hate'])
plt.xlabel('Predicted label')
plt.ylabel('True label')

It shows that **MultinomialNB** struggles with tweets that are in fact hateful: it misses a large share of them, and some of the tweets we predict to be hateful are in fact not. Note that the second confusion matrix is normalized (each row sums to one), which makes our model look more dire than it actually is. We do get most tweets right (roughly 92% with the counts quoted above).
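As a small sketch of what row normalization does, the snippet below rebuilds the confusion matrix from the counts quoted in the remarks on the scores (one particular run) and divides each row by its total. The names `toy_cm`, `row_normalized` and `overall_accuracy` are chosen here for illustration.

```python
import numpy as np

# Rows are true labels, columns are predicted labels: [[TN, FP], [FN, TP]]
toy_cm = np.array([[64190, 1537],
                   [4119, 4866]])

# Divide each row by its total so that every row sums to one
row_normalized = toy_cm / toy_cm.sum(axis=1, keepdims=True)

# Overall accuracy weights each row by how many tweets it contains
overall_accuracy = np.trace(toy_cm) / toy_cm.sum()

print(row_normalized.round(3))
print(round(overall_accuracy, 3))
```

The normalized matrix gives the small hate row the same visual weight as the much larger non-hate row, which is why the heatmap looks worse than the overall accuracy suggests.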

Next we plot a simple ROC curve that shows the true positive rate vs the false positive rate.
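To show what such a curve is made of, here is a minimal hand-rolled sketch. The helper `roc_points` and the toy labels and scores are hypothetical and only illustrate the idea; they do not reproduce scikit-learn's implementation. Each distinct score is used as a decision threshold, and each threshold yields one (false positive rate, true positive rate) point.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, one point per threshold, highest threshold first."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # Start above every score (nothing predicted positive), then lower the bar
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    n_pos = (y_true == 1).sum()
    n_neg = (y_true == 0).sum()
    fpr, tpr = [], []
    for t in thresholds:
        predicted_pos = scores >= t
        tpr.append((predicted_pos & (y_true == 1)).sum() / n_pos)
        fpr.append((predicted_pos & (y_true == 0)).sum() / n_neg)
    return np.array(fpr), np.array(tpr)

# Hypothetical labels and classifier scores
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
fpr_soft, tpr_soft = roc_points(toy_labels, toy_scores)

# The same tweets with scores collapsed to hard 0/1 predictions at 0.4
toy_hard = [0, 1, 0, 1]
fpr_hard, tpr_hard = roc_points(toy_labels, toy_hard)

print(len(fpr_soft), "points from scores,", len(fpr_hard), "points from hard labels")
```

Continuous scores produce one point per distinct score, whereas hard 0/1 predictions leave only the two corners and a single point in between.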

In [None]:
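# Plot the ROC curve for the MNB classifier
from sklearn.metrics import roc_curve
# Use the predicted probability of the hate class as the score; hard 0/1
# predictions would give a curve with only a single interior point.
hate_scores = clf.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, hate_scores)
plt.figure(figsize=(8,8))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='MNB')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend()
plt.show()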

In [None]:
# Show how the first 50 test tweets were classified and their true label
label_names = {0: 'Non-hate', 1: 'Hate'}
check_df = pd.DataFrame({
    'actual_label': [label_names[int(label)] for label in y_test],
    'prediction': [label_names[int(label)] for label in predictions],
    'text': list(X_test_stats.text),
})
check_df.iloc[:50]