# Toxic Comment Classification Challenge

## Version 1.0

Have you ever come across toxic comments while browsing Twitter or watching a video on YouTube?

>"Your work is bullshit"

or 

>"You should kill yourself"

Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments.

The Conversation AI team, a research initiative founded by Jigsaw and Google (both part of Alphabet), is working on tools to help improve online conversation. One area of focus is the study of negative online behaviors, like toxic comments (i.e. comments that are rude, disrespectful or otherwise likely to make someone leave a discussion). So far they’ve built a range of publicly available models served through the Perspective API, including toxicity. But the current models still make errors, and they don’t allow users to select which types of toxicity they’re interested in finding (e.g. some platforms may be fine with profanity, but not with other types of toxic content).

* In this competition, **the aim is to build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate better than Perspective’s current models**.

The dataset of comments is from Wikipedia’s talk page edits. Improvements to the current model will hopefully help online discussion become more productive and respectful.

Let's get started by loading the packages we are going to use.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from nltk.corpus import stopwords
from prettytable import PrettyTable

%matplotlib inline

For this study, the English stop words from NLTK will be used.

In [2]:
stop_words = set(stopwords.words('english'))

<a id='TOC'></a>

## Table of contents
1. [Introduction](#first-bullet)  
    1.1 [Dataset of comments](#first-bullet)  
    1.2 [Overview of the Toxic Comments Classifier V1](#second-bullet)  
2. [Preprocessing](#third-bullet)  
3. [Implement the multi-label classifier](#fourth-bullet)  
    3.1 [Training](#fourth-bullet)  
    3.2 [Predictions on the test set](#fifth-bullet)  

## 1 - Introduction <a class="anchor" id="first-bullet"></a>

### 1.1 - Dataset of comments

Let's start by importing and exploring the dataset.

The dataset (X, Y) is quite large:

* X contains 159571 comments (strings)
* Y contains 6 independent binary labels corresponding to the type of toxicity

Let's load the dataset using the code below. The dataset is already split between training (159571 examples) and testing (153164 examples).

In [3]:
train = pd.read_csv('../data/train.csv', index_col=0)
X_train = train['comment_text']
y_train = train.drop('comment_text', axis=1)
X_test = pd.read_csv('../data/test.csv', index_col=0)

The types of toxicity are the following:

In [4]:
print("-- Type of threat --")
for label in y_train.columns:
    print(label)

-- Type of threat --
toxic
severe_toxic
obscene
threat
insult
identity_hate


The following cell prints examples of comments from X_train that are labeled as toxic.

In [5]:
for idx in np.where(y_train['toxic'])[0][:5]:
    print(X_train.iloc[idx])  # idx is a row position, so use positional indexing
    print('-------------------------------')

COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK
-------------------------------
Hey... what is it..
@ | talk .
What is it... an exclusive group of some WP TALIBANS...who are good at destroying, self-appointed purist who GANG UP any one who asks them questions abt their ANTI-SOCIAL and DESTRUCTIVE (non)-contribution at WP?

-------------------------------
Bye! 

Don't look, come or think of comming back! Tosser.
-------------------------------
You are gay or antisemmitian? 

Archangel WHite Tiger

Meow! Greetingshhh!

Uh, there are two ways, why you do erased my comment about WW2, that holocaust was brutally slaying of Jews and not gays/Gypsys/Slavs/anyone...

1 - If you are anti-semitian, than shave your head bald and go to the skinhead meetings!

2 - If you doubt words of the Bible, that homosexuality is a deadly sin, make a pentagram tatoo on your forehead go to the satanistic masses with your gay pals!


Beware of the Dark Side!
-------------------------------
FUCK YOUR FILTHY MOTHER I

In [6]:
idx = 300
print(f"Label index 'Toxic' in one-hot encoding format is {list(y_train.iloc[idx])}")

Label index 'Toxic' in one-hot encoding format is [1, 0, 0, 0, 0, 0]


The specificity of our dataset is that a comment can have multiple labels. For example, a comment can be both *toxic* and an *insult*.
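
To make the multi-label structure concrete, here is a minimal sketch on a hand-made label table. The three rows are invented for illustration and are not taken from the dataset; only the column names match. Summing each row counts how many labels a comment carries.

```python
import pandas as pd

# Hypothetical label rows, using the same six columns as the dataset
toy_labels = pd.DataFrame(
    [[1, 0, 1, 0, 1, 0],   # toxic + obscene + insult
     [1, 0, 0, 0, 0, 0],   # toxic only
     [0, 0, 0, 0, 0, 0]],  # clean comment: no label at all
    columns=["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"],
)

# Number of labels attached to each comment
labels_per_comment = toy_labels.sum(axis=1)

# Comments that carry more than one label
multi_label_mask = labels_per_comment > 1

# Names of the labels attached to the first comment
first_row_labels = toy_labels.columns[(toy_labels.iloc[0] == 1).to_numpy()].tolist()

print(labels_per_comment.tolist())
print(first_row_labels)
```

On the real data, the same ```sum(axis=1)``` call on ```y_train``` would give the number of labels per comment.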

Back to [table of contents](#TOC)


### 1.2 - Overview of the Toxic Comments Classifier V1 <a class="anchor" id="second-bullet"></a>

#### Inputs and outputs
* The input of the model is a string corresponding to a comment (e.g. "Beware of the Dark Side!").
* The output is a probability vector of shape (1, 6), with one probability for each of the 6 types of toxicity.

#### Preprocessing

* Clean the comments by removing or replacing unwanted characters and expanding contractions
* Score the relative importance of words using TF-IDF for each comment (this part will be included in the pipeline)

For each document, the term frequency of a word is the number of times the word appears in the document divided by the total number of words in that document. Every document has its own term frequencies.

>Term frequency (TF)
$$tf_{i,j}=\frac{n_{i,j}}{\sum_{k} n_{k,j}}$$

where $n_{i,j}$ is the number of occurrences of word $i$ in document $j$.

Then, the IDF (Inverse Document Frequency) is computed, i.e. the log of the total number of documents $N$ divided by the number of documents $df_i$ that contain word $i$. The inverse document frequency gives more weight to words that are rare across the corpus.

>Inverse Document Frequency (IDF)
$$idf_i=\log\left(\frac{N}{df_i}\right)$$

The IDF is computed once for the whole corpus.

Lastly, the TF-IDF is simply the TF multiplied by the IDF.

$$w_{i,j}=tf_{i,j}\times\log\left(\frac{N}{df_i}\right)$$

The TF-IDF scores are computed for all the words in the corpus.
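
These definitions can be checked by hand on a tiny corpus. The sketch below is a plain-Python illustration of the textbook formulas, using an invented three-document corpus and whitespace tokenization (both are assumptions made for this example). Note that ```TfidfVectorizer``` in **sklearn** uses a smoothed IDF and L2-normalizes each row by default, so its values will differ from these.

```python
import math
from collections import Counter

# Invented toy corpus, tokenized on whitespace
corpus = [
    "you are rude",
    "you are kind",
    "thanks for the edit",
]
docs = [doc.split() for doc in corpus]
N = len(docs)

# df[word] = number of documents that contain the word at least once
df = Counter(word for doc in docs for word in set(doc))

def tf_idf(word, doc):
    """Textbook TF-IDF of a word that occurs somewhere in the corpus."""
    tf = doc.count(word) / len(doc)   # share of the document taken by the word
    idf = math.log(N / df[word])      # rarer words get a larger weight
    return tf * idf

w_rude = tf_idf("rude", docs[0])  # appears in 1 of 3 documents
w_you = tf_idf("you", docs[0])    # appears in 2 of 3 documents
print(f"rude: {w_rude:.3f}, you: {w_you:.3f}")
```

Both words occur once in the first document, but "rude" is rarer across the corpus and therefore receives the higher weight.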

#### Multi-label classifier

* The strategy used here consists in fitting one classifier per class. For each classifier, the class is fitted against all the other classes. This method is implemented in the ```OneVsRestClassifier``` class in **sklearn**. 
* The chosen estimator for the example is the multinomial Naive Bayes classifier (```MultinomialNB```), but others could be tested and compared.
* The classifier is evaluated using the ROC AUC score computed on a validation set.
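
As an illustration of the one-vs-rest strategy, the sketch below fits ```OneVsRestClassifier``` on a toy corpus with two label columns. The comments and labels are invented for this example, and passing the whole binary label matrix at once is only one possible way to use the class.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented toy comments
toy_comments = [
    "you are an idiot",
    "idiot go away or i will hurt you",
    "i will hurt you",
    "thanks for the helpful edit",
    "nice article thanks",
    "great work on the page",
]
# Invented binary labels, columns: [insult, threat]
toy_labels = np.array([
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
    [0, 0],
    [0, 0],
])

# One MultinomialNB is fitted per label column on top of shared TF-IDF features
toy_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(MultinomialNB())),
])
toy_pipeline.fit(toy_comments, toy_labels)

# One probability per label for each input comment
proba = toy_pipeline.predict_proba(["you idiot"])
print(proba.shape)
```

The output has one column per label, and the columns do not need to sum to 1 because the labels are not mutually exclusive.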

#### Submission

Back to [table of contents](#TOC)

## 2 - Preprocessing <a class="anchor" id="third-bullet"></a>

As explained in the overview, the first step is to clean the comments by removing or replacing certain expressions.

Here is a comment:

```
Sorry if the word 'nonsense' was offensive to you. Anyway, I'm not intending to write anything in the article(wow they would jump on me for vandalism), I'm merely requesting that it be more encyclopedic so one can use it for school as a reference. I have been to the selective breeding page but it's almost a stub. It points to 'animal breeding' which is a short messy article that gives you no info. There must be someone around with expertise in eugenics? 93.161.107.169
```

Several contractions can be seen here. For instance, "I'm" is the contraction of "I am". Left as is, it would be treated as a different word, which we don't want.

The first step is to implement a function that will carry this out. Here are the different steps:

* Convert every comment to lower case
* Remove or replace the unwanted expressions with regular expression operations
* Strip the leading and trailing whitespace

These steps are implemented in the ```clean_one_comment()``` function below.

In [7]:
def clean_one_comment(comment):
    """
    Clean a comment (string) by converting it to lower case, expanding contractions,
    replacing non-word characters with spaces and stripping leading/trailing whitespace.
    
    Arguments:
    comment -- string, one training example from X
    
    Returns:
    cleaned_comment -- cleaned string
    """
    
    # Step 1: Transform the string into lower case words
    comment_lower = comment.lower()
    
    # Step 2: Remove or replace the unwanted expressions (more could be added)
    # Order matters: specific patterns must run before the generic ones that contain them
    comment_inter = re.sub(r"i'm", "i am ", comment_lower)
    comment_inter = re.sub(r"what's", "what is ", comment_inter)
    comment_inter = re.sub(r"'scuse", " excuse ", comment_inter)  # before the generic 's rule
    comment_inter = re.sub(r"'ve", " have ", comment_inter)
    comment_inter = re.sub(r"'s", " ", comment_inter)
    comment_inter = re.sub(r"can't", "can not ", comment_inter)   # before the generic n't rule
    comment_inter = re.sub(r"n't", " not ", comment_inter)
    comment_inter = re.sub(r"'re", " are ", comment_inter)
    comment_inter = re.sub(r"'d", " would ", comment_inter)
    comment_inter = re.sub(r"'ll", " will ", comment_inter)
    comment_inter = re.sub(r"\W", " ", comment_inter)   # non-word characters -> space
    comment_inter = re.sub(r"\s+", " ", comment_inter)  # collapse repeated whitespace
    # ...
    
    # Step 3: Strip leading and trailing whitespace
    cleaned_comment = comment_inter.strip()
    
    return cleaned_comment

In [8]:
clean_comment = clean_one_comment(X_train.iloc[8])
print("Clean comment = \n", clean_comment)

Clean comment = 
 sorry if the word nonsense was offensive to you anyway i am not intending to write anything in the article wow they would jump on me for vandalism i am merely requesting that it be more encyclopedic so one can use it for school as a reference i have been to the selective breeding page but it almost a stub it points to animal breeding which is a short messy article that gives you no info there must be someone around with expertise in eugenics 93 161 107 169


**Expected Output**:

```
Clean comment = 
 sorry if the word nonsense was offensive to you anyway i am not intending to write anything in the article wow they would jump on me for vandalism i am merely requesting that it be more encyclopedic so one can use it for school as a reference i have been to the selective breeding page but it almost a stub it points to animal breeding which is a short messy article that gives you no info there must be someone around with expertise in eugenics 93 161 107 169
```
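
Because the replacements are applied one after the other, their order matters: a generic pattern such as ```n't``` can consume text that a more specific pattern such as ```can't``` was meant to handle. The sketch below illustrates this with a small rule table (```RULES``` and ```expand_contractions``` are names invented for this example, not part of the notebook).

```python
import re

# Hypothetical rule table: specific patterns come before generic ones
RULES = [
    (r"can't", "can not "),
    (r"n't", " not "),
    (r"'re", " are "),
    (r"'s", " "),
]

def expand_contractions(text, rules=RULES):
    """Apply the rules in order, then collapse repeated whitespace."""
    text = text.lower()
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return re.sub(r"\s+", " ", text).strip()

# Intended order: "can't" is expanded before the generic "n't" rule runs
good = expand_contractions("You can't edit, they're admins")

# Reversed order: "n't" fires first and leaves the fragment "ca" behind
bad = expand_contractions("You can't edit", rules=list(reversed(RULES)))

print(good)
print(bad)
```

Keeping the rules in a list makes the ordering explicit and easy to extend.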

Let's apply this function to every comment in the column.

In [9]:
X_train = X_train.apply(clean_one_comment)

Back to [table of contents](#TOC)

## 3 - Implement the multi-label classifier <a class="anchor" id="fourth-bullet"></a>

### 3.1 - Training

Let's now implement the ```multi_head_model()``` function. The steps of this function are the following:
1. Extract the different Y labels into a list.
2. Define the pipeline.
    * Using the ```TfidfVectorizer``` and ```OneVsRestClassifier``` classes.
    * The model used here for the example is a Naive Bayes classifier.
3. For each label, fit the pipeline to the training set and score it on the validation set.
4. Store the results in a table using ```PrettyTable```.

In [10]:
from sklearn.base import clone

def multi_head_model(X, y):
    """
    Train one binary classifier per label and report its validation AUC.

    Arguments:
    X -- input data, pandas Series of comments (strings), of length m
    y -- labels, pandas DataFrame of binary labels, of shape (m, p)

    Returns:
    trained_classifier -- dictionary mapping each label to its fitted pipeline
    result_table -- PrettyTable with the validation AUC of each label
    """

    # Step 1: Extract the different Y labels into a list
    labels = list(y.columns)

    # Step 2: Define the pipeline (with TfidfVectorizer and one classifier with OneVsRestClassifier)
    # stop_words is passed as a list, which TfidfVectorizer accepts
    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(stop_words=list(stop_words))),
        ('clf', OneVsRestClassifier(MultinomialNB(
            fit_prior=True, class_prior=None))),
    ])

    # Initialize the dictionary to store each trained classifier
    trained_classifier = dict()
    
    # Initialize the list to store the validation AUC of each label
    auc_classifier = list()

    # Step 3: For each label, fit the pipeline to the training set and score it on the validation set
    for label in labels:
        X_train, X_valid, y_train, y_valid = train_test_split(X, y[label], random_state=42, test_size=0.2, shuffle=True)
        print('... Training for label: {}'.format(label))
        # Fit a fresh copy so that each label keeps its own fitted pipeline
        # (storing the same object under every key would keep only the last fit)
        label_pipeline = clone(pipeline)
        label_pipeline.fit(X_train, y_train)
        # Note: the AUC is computed on hard 0/1 predictions, not on predict_proba scores
        prediction = label_pipeline.predict(X_valid)
        auc = roc_auc_score(y_valid, prediction)
        print('Validation AUC is {}'.format(auc))
        trained_classifier[label] = label_pipeline
        auc_classifier.append(auc)
        
    # Step 4: Store the result in a table
    result_table = PrettyTable()
    column_names = ["Threat", "AUC"]
    result_table.add_column(column_names[0], labels)
    result_table.add_column(column_names[1], auc_classifier)

    return trained_classifier, result_table

In [11]:
trained_classifier, result_table = multi_head_model(X_train, y_train)

... Training for label: toxic
Validation AUC is 0.5892631582623512
... Training for label: severe_toxic
Validation AUC is 0.4999841742102931
... Training for label: obscene
Validation AUC is 0.5568016430791806
... Training for label: threat
Validation AUC is 0.5067567567567568
... Training for label: insult
Validation AUC is 0.5250434334862517
... Training for label: identity_hate
Validation AUC is 0.4999841877233484


In [18]:
#print(result_table.get_html_string())

<table>
    <tr>
        <th>Threat</th>
        <th>AUC</th>
    </tr>
    <tr>
        <td>toxic</td>
        <td>0.5892631582623512</td>
    </tr>
    <tr>
        <td>severe_toxic</td>
        <td>0.4999841742102931</td>
    </tr>
    <tr>
        <td>obscene</td>
        <td>0.5568016430791806</td>
    </tr>
    <tr>
        <td>threat</td>
        <td>0.5067567567567568</td>
    </tr>
    <tr>
        <td>insult</td>
        <td>0.5250434334862517</td>
    </tr>
    <tr>
        <td>identity_hate</td>
        <td>0.4999841877233484</td>
    </tr>
</table>

The results are not good: an AUC of 0.5 corresponds to random guessing, and all six labels score between 0.50 and 0.59.
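
One thing worth checking is what is passed to the metric. The ROC AUC measures how well scores rank positive examples above negative ones, so it behaves very differently on hard 0/1 predictions than on probabilities. The sketch below uses a hand-written pairwise AUC on invented labels and scores (```pairwise_auc``` is a helper written for this example; for binary labels it computes the same quantity as ```roc_auc_score```).

```python
def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count for 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: the positives get higher scores, but never above 0.5
y_true = [0, 0, 0, 0, 1, 1]
proba = [0.05, 0.10, 0.20, 0.30, 0.25, 0.40]

# Thresholding at 0.5 turns every prediction into 0
hard = [int(p >= 0.5) for p in proba]

auc_hard = pairwise_auc(y_true, hard)    # every pair is a tie
auc_proba = pairwise_auc(y_true, proba)  # 7 of the 8 pairs are ranked correctly
print(auc_hard, auc_proba)
```

With rare labels, a classifier may seldom output a probability above 0.5, so the hard predictions are mostly 0 and the AUC stays close to 0.5 even when the probabilities rank the comments reasonably well. Scoring ```predict_proba(...)[:, 1]``` instead of ```predict(...)``` may therefore give a more informative validation score.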

Back to [table of contents](#TOC)

### 3.2 - Predictions on the test set <a class="anchor" id="fifth-bullet"></a>

Let's now implement the function ```predict_multi_head``` to predict the labels of the comments in the test set. 

In [13]:
def predict_multi_head(X_test, set_of_classifiers):
    """
    Predict the probability of each type of toxicity for the comments in the test set.

    Arguments:
    X_test -- test data, pandas DataFrame with a 'comment_text' column of m comments
    set_of_classifiers -- dictionary mapping each of the p labels to its fitted pipeline

    Returns:
    predictions -- numpy array of positive-class probabilities, of shape (p, m)
    """
    
    # Step 1: Get the labels from the dictionary keys (set_of_classifiers)
    labels = list(set_of_classifiers.keys())
    
    # Initialize an empty array with one column per test comment
    predictions = np.empty((0, len(X_test.comment_text)))
    
    # Step 2: For each label, predict the probability of the positive class
    # Note: the test comments are passed as-is; clean_one_comment() was applied to the training comments only
    for label in labels:
        pred = set_of_classifiers[label].predict_proba(X_test.comment_text)[:,1]
        predictions = np.append(predictions, [pred], axis=0)
    
    return predictions

In [14]:
predictions = predict_multi_head(X_test, trained_classifier)

Let's store the result into a csv file.

In [16]:
submission = pd.DataFrame(predictions.T, columns=y_train.columns, index=X_test.index).reset_index()
submission.to_csv(path_or_buf="../submissions/submission_file.csv", sep=",", index=False)

<table>
    <tr>
        <th>Final score</th>
    </tr>
    <tr>
        <td>-</td>
    </tr>
</table>

So, this is clearly not the best methodology as people on the leaderboard achieve much higher scores (~0.98). 

>Let's try other classical machine learning and deep learning algorithms.

Back to [table of contents](#TOC)

**Note**

* Cleaning the comments does not change the validation AUC.