# Insult Classification

In this exercise, we would like to filter out insulting comments on a web forum. 

To train our models, we have a list of historical comments, each with a judgement of whether it is insulting or not.

In [15]:
import pandas as pd
path_to_insults = ''
data = pd.read_csv(path_to_insults + 'train-utf8.csv')
data.head(2)

Unnamed: 0,Insult,Date,Comment
0,1,20120618192155Z,You fuck your dad.
1,0,20120528192215Z,i really don't understand your point. It seem...


In [16]:
print("%d comments, of which %d insults (%d%%)"
      % (len(data), data.Insult.sum(), 100 * data.Insult.mean()))

3947 comments, of which 1049 insults (26%)


### Looking for known bad words

One way to do this is to load Google's bad word list and flag comments that contain one or more of its words.

- Load `google_badlist.txt` from `data/insults/`
- Add a column to `data` with a flag (0 or 1) if the comment contains a bad word
- Compute the accuracy of this method - does this look good?
- What would a naive classifier's score be (i.e., always predicting 0 or 1)?
- Also compute the precision, recall, F1 score and AUC score
- What is your verdict?
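
As a rough sketch of how the scores in this list could be computed with `sklearn.metrics`, the snippet below uses a small invented label set (`y_true` and `y_flag` are placeholders, not values from the insults data). With a hard 0/1 flag instead of a probability, the AUC reduces to the average of the true positive rate and the true negative rate.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Invented toy labels for illustration only (not taken from the insults data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = insult
y_flag = [1, 1, 0, 1, 0, 0, 0, 0]  # 1 = comment contains a bad word

scores = {
    "accuracy": accuracy_score(y_true, y_flag),
    "precision": precision_score(y_true, y_flag),
    "recall": recall_score(y_true, y_flag),
    "f1": f1_score(y_true, y_flag),
    "auc": roc_auc_score(y_true, y_flag),
}

# Baseline: a classifier that always predicts the majority class (0 here).
baseline_accuracy = accuracy_score(y_true, [0] * len(y_true))

for name, value in scores.items():
    print(f"{name}: {value:.2%}")
print(f"always-0 baseline accuracy: {baseline_accuracy:.2%}")
```

Comparing the accuracy against the always-0 baseline is what tells you whether the flag adds anything; on imbalanced data the raw accuracy alone says very little.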

In [17]:
filename = path_to_insults + 'google_badlist.txt'
filename

'google_badlist.txt'

In [18]:
# Reuse the path built above so the cell still works when path_to_insults is set
badlist = pd.read_csv(filename, header=None)

In [19]:
badlist

Unnamed: 0,0
0,4r5e
1,5h1t
2,5hit
3,a55
4,anal
...,...
446,whore
447,willies
448,willy
449,xrated


In [20]:
data['Contains Bad Word'] = data.Comment.apply(lambda x: any(w in x for w in badlist[0]))
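
One caveat with the `w in x` test above is that it matches substrings, so a short list entry such as `anal` also fires on a harmless word like "analysis". A possible alternative, sketched here under assumptions, is to match whole words with a regular expression. The three-word `bad_words` list is a stand-in for the full file, and `contains_bad_word` is a hypothetical helper, not part of the original notebook.

```python
import re

# Stand-in for the full list; these three entries appear in google_badlist.txt.
bad_words = ["anal", "a55", "whore"]

# \b anchors the match at word boundaries; re.escape guards against any
# regex metacharacters in the list entries.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bad_words) + r")\b",
    re.IGNORECASE,
)

def contains_bad_word(comment: str) -> bool:
    """Return True if the comment contains a listed word as a whole word."""
    return pattern.search(comment) is not None

harmless = "Nice analysis, thanks."
substring_hit = any(w in harmless for w in bad_words)  # substring test
word_hit = contains_bad_word(harmless)                 # whole-word test
print(substring_hit, word_hit)
```

Applied to the data, this would be `data.Comment.apply(contains_bad_word)`. The scores reported below were computed with the substring test, so they would change if this variant were used.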

In [27]:
# Accuracy is (true positives + true negatives) divided by the total number of cases
true_positives = data[(data.Insult == 1) & (data['Contains Bad Word'] == True)].shape[0]
true_negatives = data[(data.Insult == 0) & (data['Contains Bad Word'] == False)].shape[0]
false_positives = data[(data.Insult == 0) & (data['Contains Bad Word'] == True)].shape[0]
false_negatives = data[(data.Insult == 1) & (data['Contains Bad Word'] == False)].shape[0]
accuracy = (true_positives + true_negatives) / len(data)
print(f"Accuracy: {accuracy:.2%}")
print("I mean, anything better than 50% is good, right? We can do better though...")

Accuracy: 65.59%
I mean, anything better than 50% is good, right? We can do better though...


In [30]:
total = len(data)
num_positive = true_positives + false_negatives
num_negative = total - num_positive

accuracy_pred0 = num_negative / total
accuracy_pred1 = num_positive / total

print("Naive classifier (always predicting 0): {:.2%} accuracy".format(accuracy_pred0))
print("Naive classifier (always predicting 1): {:.2%} accuracy".format(accuracy_pred1))
print("Always predicting 0 already scores higher than the bad-word flag, so accuracy alone is misleading here.")

Naive classifier (always predicting 0): 73.42% accuracy
Naive classifier (always predicting 1): 26.58% accuracy
Always predicting 0 already scores higher than the bad-word flag, so accuracy alone is misleading here.


In [31]:
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)
print(f"Precision: {precision:.2%}")
print(f"Recall: {recall:.2%}")
print(f"F1: {f1:.2%}")

Precision: 37.59%
Recall: 44.61%
F1: 40.80%


In [32]:
print("Ok, yes, this is bad.")

Ok, yes, this is bad.


### Learning bad words on the fly

Another way of doing this is to learn the insulting words on the fly using `CountVectorizer`.

Please refer to the scikit-learn tutorial at 'http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html' if you need some help.

Here is what you need to do:

- Import `CountVectorizer` from `sklearn.feature_extraction.text`
- Train the `CountVectorizer` on the insults and create a feature set $X$ representing words in the comments
- Train `MultinomialNB` and `BernoulliNB` from `sklearn.naive_bayes` on the new feature set $X$
- Using cross-validation, compute the accuracy, precision, recall, F1 and AUC of your model
- What is your verdict?
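
The steps above could be sketched roughly as follows. This is a minimal illustration on an invented toy corpus (`comments` and `labels` are placeholders; in the exercise they would be `data.Comment` and `data.Insult`), so the printed scores mean nothing in themselves. It uses `cross_validate` rather than `cross_val_score` only because it accepts several metrics at once.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy corpus, for illustration only.
comments = [
    "you are an idiot", "what a stupid idiot", "you stupid moron",
    "shut up you moron", "you are so stupid", "idiot go away",
    "thanks for the helpful answer", "i agree with your point",
    "great article thanks", "i do not understand your point",
    "have a nice day", "that is a helpful article",
]
labels = [1] * 6 + [0] * 6  # 1 = insult, 0 = not an insult

metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = {}
for model in (MultinomialNB(), BernoulliNB()):
    # The vectorizer sits inside the pipeline, so its vocabulary is learned
    # from the training folds only.
    pipeline = make_pipeline(CountVectorizer(), model)
    cv = cross_validate(pipeline, comments, labels, cv=3, scoring=metrics)
    results[type(model).__name__] = {m: cv["test_" + m].mean() for m in metrics}

for name, scores in results.items():
    print(name, {m: round(s, 2) for m, s in scores.items()})
```

Fitting `CountVectorizer` on all comments first and cross-validating only the classifier would also work, but it lets words from the test folds leak into the vocabulary.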

NOTE: The F1 score is another useful score to compute when one of the two classes is very rare. We didn't go over it in class, but it is the harmonic mean of precision and recall and ranges from 0 (worst) to 1 (best). You can read more here: 'https://en.wikipedia.org/wiki/F1_score'
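
To make the harmonic mean concrete, here is a short sketch that recomputes F1 from the precision and recall reported for the bad-word flag earlier (37.59% and 44.61%), then shows an extreme case with invented numbers where the harmonic and arithmetic means diverge.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall of the bad-word flag, as reported above.
f1_badlist = f1_from(0.3759, 0.4461)

# Invented extreme case: near-perfect precision, almost no recall.
f1_lopsided = f1_from(1.0, 0.01)
arithmetic_lopsided = (1.0 + 0.01) / 2

print(f"bad-word flag F1: {f1_badlist:.2%}")
print(f"lopsided F1: {f1_lopsided:.2%} vs arithmetic mean: {arithmetic_lopsided:.2%}")
```

The harmonic mean is dominated by the smaller of the two values, which is why a classifier cannot earn a high F1 by doing well on precision or recall alone.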

In [35]:
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score