# Hate speech classifier

### Training a classifier to recognise hate speech on Twitter

In this notebook, we are going to train a 'classifier' (a supervised machine learning algorithm) to recognise hate speech in a tweet. This is a technique of **Natural Language Processing (NLP)** similar to Sentiment Analysis.

This requires two phases:
1. **Train the classifier** (show the algorithm examples of both hateful and non-hateful tweets)
2. **Test the accuracy of the classifier**



First let's import the libraries we are going to need

In [1]:
import os
import pandas as pd
import re
import string
import numpy as np
import matplotlib
import operator 
import pickle

import nltk # nltk 3.4
print('The nltk version is {}.'.format(nltk.__version__))
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer 
from nltk import ngrams

from sklearn.utils import shuffle

The nltk version is 3.4.5.


[nltk_data] Downloading package stopwords to /Users/nick/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /Users/nick/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Tell Python where it can find the data files (they're in a `data` sub-directory of the current folder).

In [2]:
#os.chdir("C:/Users/mednche/Desktop/Hate-speech-twitter-NLP/")
os.chdir(os.getcwd()+"/data/")
os.getcwd()

'/Users/nick/gp/n8-prp-ml-practicals/Hate-speech-classifier/data'

## Import pre-labeled Twitter dataset

Let's import the dataset used to train and test the classifier

In [3]:
data = pd.read_csv('TrainingTweets.csv', encoding='ISO-8859-1')
data.shape

(310, 2)

`data` is a pandas DataFrame. Let's see what it looks like.

In [4]:
# Display the last 5 rows of the dataframe
data.tail()

Unnamed: 0,tweet,class
305,well yes i mean you started off saying third l...,0
306,so my neighbours complained about my shed in t...,1
307,fucking fascist fucking liberal fucking racist...,1
308,fucking annoying when meat dairy and eggs are ...,0
309,i hate people i was wrong when i said 97.5 of ...,1


Note: you can see that each tweet has already been manually labeled in the column 'class'.
- 1 means hateful
- 0 means non-hateful

This label is essential for what we are going to do (supervised machine learning).

##   Ensure balance in dataset

Let's see how many hateful vs non-hateful tweets there are in the dataset

In [5]:
hate_tweets = data[data["class"] == 1]
print("{} hateful tweets".format(len(hate_tweets)))

nonhate_tweets = data[data["class"] == 0]
print("{} non-hateful tweets".format(len(nonhate_tweets)))

101 hateful tweets
209 non-hateful tweets


In [6]:
hate_tweets.head()

Unnamed: 0,tweet,class
0,Muslims go to fucking home bye bye #brexit',1
1,'Muslim scum terrorists',1
2,'Go home you immigrant',1
3,'Polish vermin',1
4,'Polish bastard',1


In [7]:
nonhate_tweets.head()

Unnamed: 0,tweet,class
59,'@16po @realDonaldTrump I don\'t_ terrorism is...,0
60,'I am absolutely dreading tomorrow. I hate it ...,0
61,'RT @washingtonpost: The Rev. William Barber d...,0
62,'RT @RedP1llReport: David Icke_ Political Corr...,0
63,'My best RTs this week came from: @SkimmySkinn...,0


There are about twice as many non-hateful tweets as hateful ones (209 vs 101). This imbalance might bias the way the algorithm learns, so we need to balance the dataset.

####  Select as many hateful as non-hateful tweets for an equal dataset

In [8]:
num = min(len(hate_tweets), len(nonhate_tweets))

# shuffle the table of hateful and non hateful tweets
hate_tweets = shuffle(hate_tweets)
nonhate_tweets = shuffle(nonhate_tweets)

# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
data_balanced = pd.concat([hate_tweets[0:num], nonhate_tweets[0:num]], ignore_index=True)

print('Number of tweets in balanced dataset: {}'.format(len(data_balanced)))

Number of tweets in balanced dataset: 202


In [9]:
data_balanced.head()

Unnamed: 0,tweet,class
0,what a stupid bitch,1
1,"""fucking faggot sitting in the cinema next to...",1
2,'my neighbour just called me a paki! some peop...,1
3,you fucking irish cow,1
4,'RT @nigel come out you black bastard',1


## Part 1. Preprocessing

In this part we'll preprocess the column 'tweet': convert the text to lowercase, remove URLs and usernames, strip punctuation, and so on. Have a look at the comments in the code below to see precisely what it's doing.

### Clean tweets

In [10]:
def cleanTweet(tweet):
    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to ''
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))','',tweet)
    #Remove the RT marker before the @user (whole word only, so words like 'party' are untouched)
    tweet = re.sub(r'\brt\b','',tweet) 
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #Remove @username
    tweet = re.sub(r'@[^\s]+','',tweet) 
    #Remove additional white spaces
    tweet = re.sub(r'[\s]+', ' ', tweet)
    #Remove non ASCII characters (emojis)
    tweet= re.sub(r'[^\x00-\x7F]+','', tweet)
    #Remove punctuation 
    tweet = "".join(l for l in tweet if l not in string.punctuation)
    #Trim
    tweet = tweet.strip('\'"')
    #Remove beginning and end space
    tweet = tweet.strip()

    
    return tweet

#  apply cleaning function to each tweet of the pandas dataframe
data_balanced['tweet'] = data_balanced['tweet'].apply(cleanTweet)

In [11]:
data_balanced.head()

Unnamed: 0,tweet,class
0,what a stupid bitch,1
1,fucking faggot sitting in the cinema next to me,1
2,my neighbour just called me a paki some people...,1
3,you fucking irish cow,1
4,come out you black bastard,1
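
To see what the cleaning steps do to a single tweet, here is a small self-contained sketch. The sample tweet and the name `clean_tweet_sketch` are made up for illustration; the sketch is a simplified variant of the steps, not a replacement for the function used in this notebook. It matches 'rt' only as a whole word, and the last line shows what happens when 'rt' is removed wherever it appears.

```python
import re
import string

def clean_tweet_sketch(tweet):
    """Simplified, illustrative variant of the tweet cleaning steps."""
    tweet = tweet.lower()
    tweet = re.sub(r'(www\.\S+)|(https?://\S+)', '', tweet)  # drop URLs
    tweet = re.sub(r'\brt\b', '', tweet)                      # drop standalone 'rt'
    tweet = re.sub(r'#(\S+)', r'\1', tweet)                   # '#word' -> 'word'
    tweet = re.sub(r'@\S+', '', tweet)                        # drop @usernames
    tweet = re.sub(r'[^\x00-\x7F]+', '', tweet)               # drop non-ASCII characters
    tweet = ''.join(ch for ch in tweet if ch not in string.punctuation)
    tweet = re.sub(r'\s+', ' ', tweet).strip()                # tidy whitespace
    return tweet

# Hypothetical sample tweet
sample = "RT @someone: Party starts at 9! #Fun https://example.com"
cleaned = clean_tweet_sketch(sample)
print(cleaned)

# Removing 'rt' everywhere would also damage ordinary words
naive = re.sub('rt', '', 'party starts')
print(naive)
```

The word-boundary pattern `\brt\b` leaves 'party' and 'starts' intact, whereas the unanchored pattern turns them into 'pay' and 'stas'.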


### Delete empty tweets

In [12]:
# replace empty tweets ('') by NA
data_balanced['tweet'].replace('', np.nan, inplace=True)
# Delete all NA rows
data_balanced.dropna(subset=['tweet'], inplace=True)

Note: How many empty tweets were removed in the process? (Hint: use the shape attribute of the pandas dataframe)

###  Tokenise

At the moment, the text of each tweet is a string. We would like to separate each word in that string so the model can 'read' them separately. 

In NLP, this is called 'tokenising': each tweet (initially a string of text) is chopped into a list of tokens (i.e. a list of words).

In [13]:
data_tokenised = data_balanced.copy()

data_tokenised['tweet'] = data_tokenised['tweet'].apply(nltk.word_tokenize)

In [14]:
data_tokenised.head()

Unnamed: 0,tweet,class
0,"[what, a, stupid, bitch]",1
1,"[fucking, faggot, sitting, in, the, cinema, ne...",1
2,"[my, neighbour, just, called, me, a, paki, som...",1
3,"[you, fucking, irish, cow]",1
4,"[come, out, you, black, bastard]",1


Note: you can see that many words in the tweets carry little meaning, such as 'it', 'i', 'of', 'to', etc. These are called stopwords. We remove them so that the classifier can focus on the words that matter when telling the difference between hate and non-hate.

### Remove stopwords

####  Import English stopwords

In [15]:
stops = set(stopwords.words('english'))
print(stops)

{'against', 'other', 'your', 'this', 'by', 'being', 't', 'at', 'and', 'does', 'some', 'has', 'on', 'of', 'be', 'mustn', 'both', 'do', 'the', 'yourself', 'that', 'our', 'than', 'up', 'ourselves', 'themselves', 'd', 'most', "mustn't", 'who', 'doing', 'for', "it's", 'what', 'then', 've', 'same', 'hadn', 'she', 'you', 'but', 'myself', 'o', 'herself', 'an', "she's", 'them', 'yours', 'just', "don't", 're', 'y', "isn't", 'to', 'll', "needn't", 'ma', 'were', 'isn', 'is', 'there', 'as', 'nor', 'doesn', "hadn't", 'aren', "shan't", "doesn't", 'where', 'weren', "wasn't", "wouldn't", 'all', 'shan', "that'll", 'hasn', 'its', "you're", 'because', 'ours', 'didn', 'about', 'each', 'will', 'mightn', 'below', "mightn't", 'between', 'too', 'through', 'which', 'can', 'ain', 'these', 'into', "couldn't", 'own', 'very', 'couldn', 'their', 'here', 'from', 'wasn', 'more', "hasn't", 'won', 'out', 'such', 'her', 'so', "should've", 'don', "aren't", 'if', 'how', "didn't", "won't", 'before', 'been', 'in', 'shouldn',

Some of these generic English stopwords could actually be useful in our context of hate speech. For example, 'them', 'out' and 'off' could all be part of phrases like 'f**k off'. We will take these out of the list of stopwords. We'll also add some words to the stopword list, based on informal spellings and other uninformative tokens we observed in the tweets ('youre', 'dont', 'us').

In [16]:
# Remove these from stopwords
item_to_delete = ['you', 'out', 'off', 'them', 'themselves', 'yourself', 'from', 'same']
stopWords = [e for e in stops if e not in item_to_delete]

# Add these to stopwords
item_to_add = ["youre", "r", "you're", "us", "doesnt", "im", "hes", "u", "ya", "ww", 
               "dont", "https", "aint", "theres", "shouldnt", "thats", "amp", "wudnt", 
               "gonna", "ur", "cant"]
for e in item_to_add:
    stopWords.append(e)

print(sorted(stopWords))

['a', 'about', 'above', 'after', 'again', 'against', 'ain', 'aint', 'all', 'am', 'amp', 'an', 'and', 'any', 'are', 'aren', "aren't", 'as', 'at', 'be', 'because', 'been', 'before', 'being', 'below', 'between', 'both', 'but', 'by', 'can', 'cant', 'couldn', "couldn't", 'd', 'did', 'didn', "didn't", 'do', 'does', 'doesn', "doesn't", 'doesnt', 'doing', 'don', "don't", 'dont', 'down', 'during', 'each', 'few', 'for', 'further', 'gonna', 'had', 'hadn', "hadn't", 'has', 'hasn', "hasn't", 'have', 'haven', "haven't", 'having', 'he', 'her', 'here', 'hers', 'herself', 'hes', 'him', 'himself', 'his', 'how', 'https', 'i', 'if', 'im', 'in', 'into', 'is', 'isn', "isn't", 'it', "it's", 'its', 'itself', 'just', 'll', 'm', 'ma', 'me', 'mightn', "mightn't", 'more', 'most', 'mustn', "mustn't", 'my', 'myself', 'needn', "needn't", 'no', 'nor', 'not', 'now', 'o', 'of', 'on', 'once', 'only', 'or', 'other', 'our', 'ours', 'ourselves', 'over', 'own', 'r', 're', 's', 'shan', "shan't", 'she', "she's", 'should', "sh

#### Remove stopwords from tweets

Now it's time to remove these stopwords from the tweets of our dataset

In [17]:
# make a copy of the processed data so far
data_tokenised_stpwd = data_tokenised.copy()

# Apply function that removes stopwords.
data_tokenised_stpwd['tweet'] = data_tokenised_stpwd['tweet'].apply(
    lambda x: [item for item in x if item not in stopWords])

In [18]:
data_tokenised_stpwd.head()

Unnamed: 0,tweet,class
0,"[stupid, bitch]",1
1,"[fucking, faggot, sitting, cinema, next]",1
2,"[neighbour, called, paki, people, respect]",1
3,"[you, fucking, irish, cow]",1
4,"[come, out, you, black, bastard]",1


Compare this list of tokens to the one we had prior to removing the stopwords.
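
The stopword customisation and filtering can be sketched with plain Python sets. The tiny word lists below are hypothetical stand-ins for the NLTK list and for our additions; only the set logic is the point.

```python
# Toy stand-in for the NLTK stopword list (hypothetical, for illustration only)
base_stops = {'you', 'the', 'a', 'off', 'in'}

# Words we want to keep in the tweets because they may signal hate speech
keep_as_signal = {'you', 'off'}

# Extra informal tokens we want to treat as stopwords
extra_stops = {'dont', 'u'}

# Remove the useful words from the stopword list, then add the extra ones
custom_stops = (base_stops - keep_as_signal) | extra_stops

tokens = ['you', 'dont', 'go', 'off', 'in', 'the', 'road']
filtered = [t for t in tokens if t not in custom_stops]
print(filtered)
```

Keeping the stopwords in a set rather than a list makes each `not in` check a constant-time lookup, which helps once the dataset grows.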

### Stemming

In NLP, stemming is the process of turning words back into their stem, base or root form.

Examples:
- 'cats' --> 'cat'
- 'fishing', 'fished' --> 'fish'

This step is important so the classifier understands that the singular and the plural form of a noun carry a similar meaning.

In [19]:
# make a copy of the processed data so far
pre_processed_data = data_tokenised_stpwd.copy()

ps = PorterStemmer() 
pre_processed_data['tweet'] = pre_processed_data['tweet'].apply(
    lambda x: [ps.stem(word) for word in x])

In [20]:
pre_processed_data['tweet'].head()

0                            [stupid, bitch]
1          [fuck, faggot, sit, cinema, next]
2    [neighbour, call, paki, peopl, respect]
3                    [you, fuck, irish, cow]
4           [come, out, you, black, bastard]
Name: tweet, dtype: object

This is the end of the pre-processing of the 'tweet' column of the dataset. We now have tweets that have been cleaned, stemmed and tokenised.

For the model to learn anything, we need to give it a set of criteria to use in deciding whether a tweet is hateful or not. Each such criterion is known as a **feature**. We can define one or more features to train our classifier.

In Part 2, we'll see how to convert the words into features so that we can feed them to a classifier for training or inference.

____________________________________________________________

## Part 2. Prepare the data to train the classifier

###### What feature shall we give to the model?

We could give it a list of keywords, but machine learning models cannot use raw text: they expect numeric input. So we need to transform words into numeric features in a meaningful way.

To do so, we are going to set a list of words/features (called vocabulary) and provide the classifier with boolean values indicating whether each feature of the vocabulary is present or not.

It will look something like this:
- 'bastard' : True (present)
- 'road' : False (absent)
- etc...

### Vocabulary of features

Let's begin by creating a vocabulary of features: a list of all the words in the dataset.

In [21]:
vocab = [word for tweet in pre_processed_data['tweet'] for word in tweet]
print('Vocabulary size: {}'.format(len(vocab)))

Vocabulary size: 1286


In [22]:
print(vocab)

['stupid', 'bitch', 'fuck', 'faggot', 'sit', 'cinema', 'next', 'neighbour', 'call', 'paki', 'peopl', 'respect', 'you', 'fuck', 'irish', 'cow', 'come', 'out', 'you', 'black', 'bastard', 'brexit', 'time', 'fuck', 'off', 'you', 'polish', 'alien', 'fuck', 'nigger', 'fuck', 'off', 'back', 'europ', 'you', 'job', 'steal', 'immigr', 'you', 'muslim', 'prick', 'bastard', 'wog', 'road', 'annoy', 'fat', 'littl', 'wanker', 'you', 'problem', 'say', 'face', 'behind', 'back', 'faggot', 'fuck', 'paki', 'nigger', 'road', 'better', 'stop', 'shout', 'go', 'round', 'shut', 'them', 'least', 'nigger', 'lmfao', 'fuck', 'off', 'myriah', 'chavcentr', 'chav', 'central', 'hate', 'peopl', 'wrong', 'said', '975', 'peopl', 'cunt', 'much', 'much', 'higher', 'fuck', 'off', 'you', 'cunt', 'colour', 'fucker', 'leav', 'countri', 'shut', 'fuck', 'shut', 'nigger', 'whore', 'hope', 'get', 'rape', 'one', 'anim', 'might', 'chang', 'tune', 'kill', 'you', 'you', 'paki', 'bastard', 'enough', 'fuck', 'paki', 'man', 'time', 'leav'

This vocabulary lists every word in our pre-processed tweets, including repeats (notice how often some words appear). You'll also notice that some of the words don't look very English. That's because they are the stems of the original words (recall the stemming process).

The notion of hate in the English language is more complex than just the presence of a word. Sometimes it's the combination of 2 or more words that becomes hateful. For example 'shut up', 'f**k off' or 'send them home'. 

In NLP, these combinations of 2 or more words are called ngrams:
- bigram: ('back', 'off')
- trigram: ('send', 'them', 'home')

We need to add bigrams and trigrams to our vocabulary of features alongside single words/features.
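
As a rough illustration of how n-grams are formed, the sketch below slides a window of size `n` over a list of tokens. The helper name `simple_ngrams` is made up for this example; the notebook itself relies on `nltk.ngrams`.

```python
def simple_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ['send', 'them', 'home', 'now']

bigrams = simple_ngrams(tokens, 2)
trigrams = simple_ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```

A tweet with fewer than `n` tokens simply produces no n-grams of that size.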

In [23]:
def get_vocabulary(tweets):
    all_words = []
    for word_list in tweets:
        # unigrams
        all_words.extend(word_list)
        
        # bigrams
        bigrams = list(ngrams(word_list, 2))
        
        #trigrams 
        trigrams = list(ngrams(word_list, 3))
        
        all_words.extend(bigrams)
        all_words.extend(trigrams)
    
    return all_words

vocab = get_vocabulary(pre_processed_data['tweet'])
print('Vocabulary size: {}'.format(len(vocab)))

Vocabulary size: 3261


In [24]:
print(vocab)

['stupid', 'bitch', ('stupid', 'bitch'), 'fuck', 'faggot', 'sit', 'cinema', 'next', ('fuck', 'faggot'), ('faggot', 'sit'), ('sit', 'cinema'), ('cinema', 'next'), ('fuck', 'faggot', 'sit'), ('faggot', 'sit', 'cinema'), ('sit', 'cinema', 'next'), 'neighbour', 'call', 'paki', 'peopl', 'respect', ('neighbour', 'call'), ('call', 'paki'), ('paki', 'peopl'), ('peopl', 'respect'), ('neighbour', 'call', 'paki'), ('call', 'paki', 'peopl'), ('paki', 'peopl', 'respect'), 'you', 'fuck', 'irish', 'cow', ('you', 'fuck'), ('fuck', 'irish'), ('irish', 'cow'), ('you', 'fuck', 'irish'), ('fuck', 'irish', 'cow'), 'come', 'out', 'you', 'black', 'bastard', ('come', 'out'), ('out', 'you'), ('you', 'black'), ('black', 'bastard'), ('come', 'out', 'you'), ('out', 'you', 'black'), ('you', 'black', 'bastard'), 'brexit', 'time', 'fuck', 'off', 'you', 'polish', 'alien', ('brexit', 'time'), ('time', 'fuck'), ('fuck', 'off'), ('off', 'you'), ('you', 'polish'), ('polish', 'alien'), ('brexit', 'time', 'fuck'), ('time',

Note: some of the tokens are duplicates. This is  because either (1) they are repeated within a tweet or (2) they are present in multiple tweets. 

Don't worry though, we'll get unique features out of it soon, before training the classifier! 

But first, let's look at how frequent each of the features of the vocabulary is in our dataset.

### Most frequent features

Recall that features are tokens (unigrams) or combinations of tokens (n-grams).

Let's have a look at how frequent each feature is in the dataset. (_If you can't see the graph, try running the code chunk again_).

In [25]:
fd = nltk.FreqDist(vocab)
fd.plot(20, cumulative=True)
#fd.xlabel('Most common features')

[Figure: cumulative frequency plot of the 20 most common features]

In [26]:
print(fd.most_common(50))

[('fuck', 58), ('you', 54), ('off', 32), ('go', 20), ('hate', 17), ('paki', 15), (('fuck', 'off'), 15), ('bastard', 14), ('get', 14), ('bitch', 12), ('nigger', 12), ('immigr', 12), ('like', 11), ('polish', 10), ('cunt', 10), ('leav', 10), ('countri', 10), ('out', 9), ('back', 9), ('home', 9), ('faggot', 8), ('peopl', 8), ('terrorist', 8), ('muslim', 7), ('much', 7), ('mani', 7), ('work', 7), ('black', 6), (('off', 'you'), 6), (('fuck', 'off', 'you'), 6), ('road', 6), ('time', 5), ('job', 5), ('make', 5), ('even', 5), ('daddi', 5), (('you', 'fuck'), 4), ('brexit', 4), (('fuck', 'paki'), 4), ('shut', 4), ('them', 4), ('one', 4), ('take', 4), ('scum', 4), ('today', 4), ('white', 4), ('need', 4), ('day', 4), ('look', 4), ('love', 4)]
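
Counting features is essentially tallying how many times each item appears. `nltk.FreqDist` builds on Python's `collections.Counter`, so the idea can be sketched with the standard library alone. The three short token lists below are invented for the example.

```python
from collections import Counter

# Hypothetical pre-processed tweets
tweets = [['go', 'home'], ['go', 'home', 'now'], ['go', 'away']]

features = []
for tokens in tweets:
    features.extend(tokens)                    # unigrams
    features.extend(zip(tokens, tokens[1:]))   # bigrams as tuples

counts = Counter(features)
most_common_feature = counts.most_common(1)

print(most_common_feature)
print(counts[('go', 'home')])
```

Unigrams and bigrams share the same tally, which is why strings and tuples appear side by side in the frequency list.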


Is that what you expected?

### Feature selection

We're now going to select a sample of this vocabulary of features. We only want to keep the features that truly matter in identifying hate speech.

This step is important to reduce the running time of our model as well as improve its accuracy.

#### Keeping the 50 most frequent features

Some rare features are only present in one or two tweets. These are unlikely to be very useful in teaching the model to recognise hate speech.

Let's only keep the top 50 most frequent features in the dataset.

In [27]:
def get_word_features(wordlist, n):
    fd = nltk.FreqDist(wordlist)
    
    word_features = sorted(fd.items(), key=operator.itemgetter(1), reverse=True)[0:n] 
    word_features = [i[0] for i in word_features ]
    return word_features

# Only keep the top 50 most frequent words
chosen_features = get_word_features(vocab, 50)
print('Number of chosen features: {}/{}'.format(len(chosen_features), len(vocab)))

Number of chosen features: 50/3261


In [28]:
print(chosen_features[0:50])

['fuck', 'you', 'off', 'go', 'hate', 'paki', ('fuck', 'off'), 'bastard', 'get', 'bitch', 'nigger', 'immigr', 'like', 'polish', 'cunt', 'leav', 'countri', 'out', 'back', 'home', 'faggot', 'peopl', 'terrorist', 'muslim', 'much', 'mani', 'work', 'black', ('off', 'you'), ('fuck', 'off', 'you'), 'road', 'time', 'job', 'make', 'even', 'daddi', ('you', 'fuck'), 'brexit', ('fuck', 'paki'), 'shut', 'them', 'one', 'take', 'scum', 'today', 'white', 'need', 'day', 'look', 'love']


### Create input data for classifier

So far, we have chosen a sample of features that we think are important for the model to learn to identify hateful speech.

However, at this stage the classifier won't be able to know which features are responsible for a tweet being labelled as 'hateful'. Is it because of the word 'road' or the word 'bastard' in that tweet?

To be able to learn what counts as hateful and what doesn't, the classifier needs to know the 'hateful value' of each feature in the vocabulary.

In short, we need to tell the model:
- which features are typically present in hateful tweets and which are not,
- which features are typically present in non-hateful tweets and which are not.

This precious information is available in our dataset because it has been manually labelled. So far we have not used the 'class' column in our dataset. We are now going to make use of it!

The idea is to tell the model: 
- for each hateful tweet: these are the features present, and the ones not present. 
- for each non-hateful tweet: these are the features present, and the ones not present.

Let's extract the features present in each tweet:

In [29]:
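def extract_features(document):
    # Include the tweet's bigrams and trigrams so that n-gram features can match
    document_words = set(document)
    document_words.update(ngrams(document, 2))
    document_words.update(ngrams(document, 3))
    feature_set = {}
    for feature in chosen_features:
        feature_set['contains({})'.format(feature)] = (feature in document_words)
    return feature_set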

tweets = [tuple(x) for x in pre_processed_data.values]

feature_set = nltk.classify.apply_features(extract_features, tweets)

In [30]:
print('Number of tweets in training_set: {}'.format(len(feature_set)))

Number of tweets in training_set: 201


Let's look at the first tweet. Notice at the end, we see that this is a hateful tweet (label = 1).

In [31]:
print(feature_set[0])

({'contains(fuck)': False, 'contains(you)': False, 'contains(off)': False, 'contains(go)': False, 'contains(hate)': False, 'contains(paki)': False, "contains(('fuck', 'off'))": False, 'contains(bastard)': False, 'contains(get)': False, 'contains(bitch)': True, 'contains(nigger)': False, 'contains(immigr)': False, 'contains(like)': False, 'contains(polish)': False, 'contains(cunt)': False, 'contains(leav)': False, 'contains(countri)': False, 'contains(out)': False, 'contains(back)': False, 'contains(home)': False, 'contains(faggot)': False, 'contains(peopl)': False, 'contains(terrorist)': False, 'contains(muslim)': False, 'contains(much)': False, 'contains(mani)': False, 'contains(work)': False, 'contains(black)': False, "contains(('off', 'you'))": False, "contains(('fuck', 'off', 'you'))": False, 'contains(road)': False, 'contains(time)': False, 'contains(job)': False, 'contains(make)': False, 'contains(even)': False, 'contains(daddi)': False, "contains(('you', 'fuck'))": False, 'conta

The method is pretty simple. For each tweet, we are looping through our 50 chosen_features and setting a boolean to True if the tweet contains that feature, False otherwise. 

This format is what the classifier needs as input: a series of True/False values (equivalent to 1s and 0s) rather than raw text, which it cannot interpret.
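
One detail is worth checking when the chosen features include bigrams or trigrams: an n-gram feature can only be found if the tweet's own n-grams are part of the set being searched. The sketch below illustrates this with made-up names (`ngram_tuples`, `extract_features_sketch`) and a hypothetical three-item feature list.

```python
def ngram_tuples(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features_sketch(tokens, chosen):
    """Map each chosen feature to True/False for one tokenised tweet."""
    present = set(tokens)                    # unigrams
    present.update(ngram_tuples(tokens, 2))  # bigrams
    present.update(ngram_tuples(tokens, 3))  # trigrams
    return {'contains({})'.format(f): (f in present) for f in chosen}

# Hypothetical feature list mixing unigrams and a bigram
chosen = ['shut', ('shut', 'up'), 'road']

features = extract_features_sketch(['shut', 'up', 'you'], chosen)
print(features)
```

Without the two `update` calls, the bigram feature would be False for every tweet, because a set of single words never contains a tuple.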

We can now train the classifier with this feature set!

_________________________________________

## Part 3. Train the classifier

### Split data into train vs test datasets

We want to train the classifier and then test its classifying ability on a brand new dataset that it has never seen before. 

Generally, an 80/20 ratio is a fair split between training and testing set:
- training dataset (80% of the data)
- testing dataset (20% of the data)

Sklearn provides a function called train_test_split to do this easily. Let's split our feature_set into train_data and test_data:

In [32]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(feature_set, test_size=0.20, train_size=0.80)
print('Number of tweets in train data: {}'.format(len(train_data)))
print('Number of tweets in test data: {}'.format(len(test_data)))

Number of tweets in train data: 160
Number of tweets in test data: 41
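
To make the split sizes concrete, here is a standard-library sketch of a shuffled 80/20 split. The function name `split_sketch` and the seed are our own choices; rounding the test set up is an assumption that happens to reproduce the 160/41 split shown above for 201 tweets.

```python
import math
import random

def split_sketch(items, test_fraction=0.2, seed=42):
    """Shuffle the items reproducibly, then cut off the test portion."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = math.ceil(len(items) * test_fraction)  # round the test size up
    return items[n_test:], items[:n_test]

# 201 stand-in items, one per tweet
train, test = split_sketch(range(201))
print(len(train), len(test))
```

Fixing the seed makes the split repeatable. With scikit-learn, passing a `random_state` value to `train_test_split` serves the same purpose.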


### Train the model

There are many different types of model to use for classifying text data. One of the most common is the Naive Bayes classifier, and that is the one we are going to use here.

In [33]:
# Naive Bayesian
classifier1 = nltk.NaiveBayesClassifier.train(train_data)
# SHOW FEATURES
classifier1.show_most_informative_features(10)

# Save the model into a pickle file (the with-block closes the file automatically)
with open('classifier.pickle', 'wb') as f:
    pickle.dump(classifier1, f)

Most Informative Features
          contains(fuck) = True                1 : 0      =      6.7 : 1.0
          contains(hate) = True                1 : 0      =      4.3 : 1.0
          contains(leav) = True                0 : 1      =      3.9 : 1.0
        contains(immigr) = True                1 : 0      =      3.2 : 1.0
           contains(get) = True                0 : 1      =      2.9 : 1.0
        contains(polish) = True                1 : 0      =      2.8 : 1.0
         contains(black) = True                1 : 0      =      2.8 : 1.0
          contains(much) = True                0 : 1      =      2.5 : 1.0
         contains(today) = True                0 : 1      =      2.5 : 1.0
          contains(look) = True                0 : 1      =      2.5 : 1.0
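
A pickled model can later be reloaded without retraining. The round trip is sketched below with a plain dictionary standing in for the trained classifier and a temporary directory standing in for the `data` folder; both are placeholders for illustration.

```python
import os
import pickle
import tempfile

# Placeholder object; in the notebook this would be the trained classifier
model_stand_in = {'labels': [0, 1], 'note': 'placeholder for a trained classifier'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'classifier.pickle')

    # Save the object
    with open(path, 'wb') as f:
        pickle.dump(model_stand_in, f)

    # Load it back
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == model_stand_in)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.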


That's it! The model has been trained on the train_data.

We can see which features the model considers important to decide between hateful speech and non-hateful speech.

- Column 2 shows the direction of the ratio (which label occurs more frequently). Hate is 1, non-hate is 0. The label on the left is the label most associated with the corresponding feature.
- Column 3 shows the ratio of occurrence of each informative feature between the two categories (hate vs non-hate).

For example, in the run shown above, tweets containing the stem 'immigr' (as in 'immigrants') are <span style="color:red">3.2 times</span> more likely to be hateful than not. The exact ratios change from run to run because the tweets are shuffled and split at random.
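
To get a feel for where such a ratio comes from, the sketch below compares how often a feature appears in each class. The counts are invented, and the add-0.5 smoothing is an assumption about how the probabilities are estimated (as far as we know it matches NLTK's default, but treat it as illustrative).

```python
def smoothed_prob(count, total, bins=2, gamma=0.5):
    """Estimate P(feature value | label) with add-gamma smoothing."""
    return (count + gamma) / (total + bins * gamma)

# Hypothetical training counts
n_hate, n_nonhate = 80, 80          # tweets per class
with_feature_hate = 20              # hateful tweets containing the feature
with_feature_nonhate = 3            # non-hateful tweets containing the feature

p_hate = smoothed_prob(with_feature_hate, n_hate)
p_nonhate = smoothed_prob(with_feature_nonhate, n_nonhate)

ratio = p_hate / p_nonhate
print('1 : 0 = {:.1f} : 1.0'.format(ratio))
```

With these made-up counts the feature is about 5.9 times more common in hateful tweets. The smoothing keeps the ratio finite even when a feature never appears in one of the classes.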

Now let's test the accuracy of our model on the test_data that we set aside earlier. These are tweets that the model has never seen before. We'll ask the model to classify them and see how its outcome compares with the true label of the tweet.

### Test the classifier

In [34]:
accuracy =  nltk.classify.util.accuracy(classifier1, test_data)
accuracy

0.8536585365853658
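
Accuracy is simply the share of test tweets whose predicted label matches the true label. An accuracy of about 0.854 on 41 test tweets corresponds to 35 correct predictions. The sketch below reproduces that figure with invented labels; which tweets were actually misclassified is not known from the output above.

```python
def accuracy_sketch(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical true labels for 41 test tweets
actual = [1] * 20 + [0] * 21

# Hypothetical predictions: flip the first 6 labels to simulate 6 mistakes
predicted = list(actual)
for i in range(6):
    predicted[i] = 1 - predicted[i]

score = accuracy_sketch(predicted, actual)
print(round(score, 4))
```

Accuracy does not say which kind of mistake was made. Counting false positives and false negatives separately would show whether the model misses hateful tweets or flags harmless ones.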

## Part 4. Use the classifier to identify hateful speech

Now try our classifier on a new tweet of your choice. First we need to preprocess the tweet (clean, tokenise, remove stopwords and stem). Then we need to extract its features so that it has the right format as input for the classifier.

In [35]:
testTweet = 'Hello world!'

In [36]:
# Prepare the tweet
def preprocessTweet(tweet):
    
    # clean the tweet
    tweet = cleanTweet(tweet)
    
    # tokenize the cleaned tweet
    tokenised_tweet = nltk.word_tokenize(tweet)
    
    # remove stop words
    tokenised_tweet_stpwd = [item for item in tokenised_tweet if item not in stopWords]
    
    # stem
    pre_processed_tweet = [ps.stem(word) for word in tokenised_tweet_stpwd]
    
    print('Preprocessed tweet: {}'.format(pre_processed_tweet))
    
    return pre_processed_tweet

preprocessed_tweet = preprocessTweet(testTweet)

Preprocessed tweet: ['hello', 'world']


In [37]:
# extract features
tweet_feature_set = extract_features(preprocessed_tweet) 
print(tweet_feature_set)

{'contains(fuck)': False, 'contains(you)': False, 'contains(off)': False, 'contains(go)': False, 'contains(hate)': False, 'contains(paki)': False, "contains(('fuck', 'off'))": False, 'contains(bastard)': False, 'contains(get)': False, 'contains(bitch)': False, 'contains(nigger)': False, 'contains(immigr)': False, 'contains(like)': False, 'contains(polish)': False, 'contains(cunt)': False, 'contains(leav)': False, 'contains(countri)': False, 'contains(out)': False, 'contains(back)': False, 'contains(home)': False, 'contains(faggot)': False, 'contains(peopl)': False, 'contains(terrorist)': False, 'contains(muslim)': False, 'contains(much)': False, 'contains(mani)': False, 'contains(work)': False, 'contains(black)': False, "contains(('off', 'you'))": False, "contains(('fuck', 'off', 'you'))": False, 'contains(road)': False, 'contains(time)': False, 'contains(job)': False, 'contains(make)': False, 'contains(even)': False, 'contains(daddi)': False, "contains(('you', 'fuck'))": False, 'conta

In [38]:
# Classify
verdict = classifier1.classify(tweet_feature_set)

if verdict == 0:
    print('Not hateful')
else:
    print('Hateful')



Not hateful


### Activity: try it!

Make some of your own tweets and see whether or not they are hateful.

Edit this code chunk:

In [39]:
testTweet = "This is a test of hatefullness"

And then run this one:

In [41]:
preprocessed_tweet = preprocessTweet(testTweet)
tweet_feature_set = extract_features(preprocessed_tweet) 
verdict = classifier1.classify(tweet_feature_set)

if verdict == 0:
    print('Not hateful')
else:
    print('Hateful with verdict '+str(verdict))

Preprocessed tweet: ['test', 'hateful']
Not hateful
