# Abusive Language Detection Model
By Ezra Abah

<hr>

## Table of Contents
<ol>
    <li>Problem Definition</li>
<li>Data Discovery
    <ul>
        <li> About Data </li>
        <li> Data Exploration </li>
        <li> Text Cleaning</li>
    </ul>
</li>
<li> Pre-processing: One-hot encoding and word embedding
    <ul>
        <li> Label - Onehot Encoding </li>
        <li> Text - Tokenization </li>
        <li> GloVe word Embedding</li>
    </ul>
</li>

<li>Model Development
    <ul>
        <li> Model architecture</li>
        <li> Model training</li>
     </ul>
</li>
<li>Model evaluation and report</li>
</ol>


## 1. Problem Definition
The objective of this project is to develop and evaluate abusive language detection models for the given dataset. Section 2 imports, explores and cleans the data. Section 3 pre-processes the data for modelling. Section 4 starts by creating a baseline model, after which further models are trained while one hyperparameter (the number of convolution filters) is varied. Finally, in Section 5, the trained models are evaluated using accuracy, precision, recall and F1 score, and the best-performing hyperparameter value is selected.


## 2. Data Set

### 2.1 About Data
This dataset, "agr_en_train.csv", is a comma-separated values (.csv) file which contains about 12,000 observations and 3 columns, namely unique_id, text and aggression-level.

<ol>
    <li><b>unique_id</b>: This column contains a unique ID for each observation.</li>
    <li><b>text</b>: The text column contains texts collected from social media.</li>
    <li> <b>aggression-level</b>: This measures the level of aggression in the text. There are three aggression levels:
<ul>
        <li>‘Overtly Aggressive’, denoted as OAG </li>
        <li>‘Covertly Aggressive’, denoted as CAG </li>
        <li>‘Non-aggressive’, denoted as NAG</li></ul></li>
    
</ol>



The full dataset (both the train and test sets) is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 licence (CC BY-NC-SA 4.0).

### 2.2 Data Exploration

In [1]:
#Load Dataset

import numpy as np
import pandas as pd
df=pd.read_csv("agr_en_train.csv", names=["index","text","aggression_level"])

In [2]:
#view first 10 rows
df.head(10)

|   | index | text | aggression_level |
|---|-------|------|------------------|
| 0 | facebook_corpus_msr_1723796 | Well said sonu..you have courage to stand agai... | OAG |
| 1 | facebook_corpus_msr_466073 | Most of Private Banks ATM's Like HDFC, ICICI e... | NAG |
| 2 | facebook_corpus_msr_1493901 | Now question is, Pakistan will adhere to this? | OAG |
| 3 | facebook_corpus_msr_405512 | Pakistan is comprised of fake muslims who does... | OAG |
| 4 | facebook_corpus_msr_1521685 | ??we r against cow slaughter,so of course it w... | NAG |
| 5 | facebook_corpus_msr_462570 | Wondering why Educated Ambassador is strugglin... | CAG |
| 6 | facebook_corpus_msr_465051 | How does inflation react to all the after shoc... | NAG |
| 7 | facebook_corpus_msr_450994 | Not good job.....this guis creating a problem ... | CAG |
| 8 | facebook_corpus_msr_326287 | This is a false news Indian media is simply mi... | NAG |
| 9 | facebook_corpus_msr_430450 | no permanent foes, no permanent friends. inter... | NAG |


In [3]:
#Observe size of dataset
shape = df.shape
print("The size of this data set is "+ str(shape))
df.groupby('aggression_level').count()

The size of this data set is (11999, 3)


| aggression_level | index | text |
|------------------|-------|------|
| CAG              | 4240  | 4240 |
| NAG              | 5051  | 5051 |
| OAG              | 2708  | 2708 |


The dataset is large, and training on all of it would require considerable time and computational power. To save time and memory, a random sample of 3,000 data points is selected for this project. To represent each class equally, 1,000 data points are drawn from each aggression level.

In [4]:
#sample of 3000 datapoints selected and stored as a dataframe named 'data'

#select 1000 datapoints from each category
data_CAG=df[df['aggression_level'] == "CAG"].sample(n=1000, replace=False, random_state=5)
data_NAG=df[df['aggression_level'] == "NAG"].sample(n=1000, replace=False, random_state=5)
data_OAG=df[df['aggression_level'] == "OAG"].sample(n=1000, replace=False, random_state=5)

#Joining all data sets into one
data=pd.concat([data_CAG,data_NAG,data_OAG],sort=False) #Join into one dataset "data"

data.reset_index(drop=True,inplace=True) #used to reset the index
print("The size of this data set is "+ str(data.shape))
data.groupby('aggression_level').count()

The size of this data set is (3000, 3)


| aggression_level | index | text |
|------------------|-------|------|
| CAG              | 1000  | 1000 |
| NAG              | 1000  | 1000 |
| OAG              | 1000  | 1000 |


In [5]:
#view datatypes of individual columns

types = data.dtypes
print(types)

index               object
text                object
aggression_level    object
dtype: object


In [6]:
# Check for empty cells
data.isnull().sum()

index               0
text                0
aggression_level    0
dtype: int64

In [7]:
# Check the length, in words, of the longest text (the text with the most characters)
maxLen = len(max(data['text'], key=len).split(" "))
print("The longest text contains "+ str(maxLen)+" word(s).")

The longest text contains 716 word(s).


In [8]:
# Check the length, in words, of the shortest text (the text with the fewest characters)
minLen = len(min(data['text'], key=len).split())
print("The shortest text contains "+ str(minLen)+" word(s).")

The shortest text contains 1 word(s).


### 2.3 Text Cleaning

In [9]:
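#Remove punctuation (regex=True is needed in pandas >= 2.0, where str.replace matches literally by default)

data['text'] = data['text'].str.replace(r'[^\w\s]', '', regex=True)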
data['text'].head()

0    When they are crossing the lines over humanity...
1    Demonetization was a good step by BJP but not ...
2    who will take a right decision when a common p...
3    In 2015 the educated people voted for educated...
4    Still waiting for Rahul Gandhis grand expose o...
Name: text, dtype: object

In [10]:
# Change everything to Lowercase

data['text'] = data['text'].apply(lambda x: " ".join(x.lower() for x in x.split()))
data['text'].head()

0    when they are crossing the lines over humanity...
1    demonetization was a good step by bjp but not ...
2    who will take a right decision when a common p...
3    in 2015 the educated people voted for educated...
4    still waiting for rahul gandhis grand expose o...
Name: text, dtype: object

In [11]:
#Remove stopwords using nltk

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = stopwords.words('english')
data['text'] = data['text'].apply(lambda x: " ".join(x for x in x.split() if x not in stop))
data['text'].head()


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Admin\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


0    crossing lines humanity makes scarecrow way hu...
1            demonetization good step bjp planned well
2    take right decision common people purchase dru...
3    2015 educated people voted educated minister i...
4    still waiting rahul gandhis grand expose pm sp...
Name: text, dtype: object

In [12]:
# Lemmatization

from textblob import Word
data['text'] = data['text'].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))
data['text'].head()

0    crossing line humanity make scarecrow way huma...
1            demonetization good step bjp planned well
2    take right decision common people purchase dru...
3    2015 educated people voted educated minister i...
4    still waiting rahul gandhi grand expose pm spe...
Name: text, dtype: object
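
The cleaning steps above (punctuation removal, lowercasing, stopword removal) can also be combined into a single function. The sketch below is a minimal illustration using only the standard library. `clean_text` and the small `STOPWORDS` set are hypothetical stand-ins for the pandas calls and the NLTK stopword list used in this notebook, and lemmatization is left out.

```python
import re

# Hypothetical, tiny stopword set for illustration; the notebook uses NLTK's English list.
STOPWORDS = {"a", "an", "the", "was", "is", "by", "but", "not", "for", "of"}

def clean_text(text):
    """Remove punctuation, lowercase, and drop stopwords (no lemmatization)."""
    no_punct = re.sub(r"[^\w\s]", "", text)   # same pattern as in the notebook
    tokens = no_punct.lower().split()          # split() also collapses repeated whitespace
    return " ".join(tok for tok in tokens if tok not in STOPWORDS)

print(clean_text("Demonetization was a good step by BJP, but not planned well!"))
# prints: demonetization good step bjp planned well
```

Keeping the steps in one function makes it easier to apply exactly the same cleaning to new text at prediction time.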

## 3.0 Data Pre-processing
To pre-process the data for modelling, <br>
<ul>
    <li>Aggression levels will be encoded using one hot encoding</li>
    <li>Texts will be tokenized, padded and individual words will be embedded</li>
</ul>

In [13]:
#Encode aggression levels using sklearn's onehot encoder 

from numpy import array
from numpy import argmax
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

values = array(data['aggression_level']) #save aggression levels as an array
print(values)

# encode with integers
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)

# encode using onehot
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
levels_onehot = onehot_encoder.fit_transform(integer_encoded)
print(levels_onehot)

['CAG' 'CAG' 'CAG' ... 'OAG' 'OAG' 'OAG']
[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 ...
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]


In [14]:
# Store the cleaned texts and the encoded levels in two arrays:
texts = data['text'].values
levels = levels_onehot

In [15]:
#split data into training and testing data. 20% of data is used for testing and seed of 5 is assigned

from sklearn.model_selection import train_test_split

text_train, text_test, y_train, y_test = train_test_split(texts, levels, test_size=0.20,random_state=5) 

### Tokenize and pad text

In [16]:
#Tokenize texts using keras tokenizer

from keras.preprocessing.text import Tokenizer

#define tokeniser
tokenizer = Tokenizer(num_words=2000)

#Fit the tokenizer on the training data only, so the test data does not influence the vocabulary
tokenizer.fit_on_texts(text_train)

X_train = tokenizer.texts_to_sequences(text_train)
X_test = tokenizer.texts_to_sequences(text_test)

vocab_size = len(tokenizer.word_index) + 1  # Adding 1 because of reserved 0 index

print(text_train[0])
print(X_train[0])
vocab_size

Using TensorFlow backend.


doesnt befits man started career performing jagrata used sing whole night loudspeaker raise question azaan last 1 3 min also would first time would heard fajr azaan delhi clever man know could also rewarded like akshay kumar future involves religious debate
[162, 48, 418, 988, 1745, 1746, 211, 672, 238, 765, 268, 1167, 203, 313, 212, 136, 177, 853, 15, 49, 40, 10, 49, 542, 313, 93, 1747, 48, 17, 213, 15, 6, 446, 269, 90, 766]


8711

In [17]:
from keras.preprocessing.sequence import pad_sequences

X_train = pad_sequences(X_train, padding='post', maxlen=maxLen)
X_test = pad_sequences(X_test, padding='post', maxlen=maxLen)
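
To make the effect of the padding step concrete, here is a small pure-Python sketch of post-padding to a fixed length. `pad_post` is a hypothetical helper written for illustration, not the Keras implementation. It assumes zero as the padding value and, like the `pad_sequences` default (`truncating='pre'`), keeps the last `maxlen` tokens when a sequence is too long.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence with `value` at the end; keep the last `maxlen` items if too long."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                            # truncate from the front
        padded.append(seq + [value] * (maxlen - len(seq)))   # pad at the end
    return padded

print(pad_post([[162, 48, 418], [7], [1, 2, 3, 4, 5, 6]], maxlen=5))
# prints: [[162, 48, 418, 0, 0], [7, 0, 0, 0, 0], [2, 3, 4, 5, 6]]
```

Index 0 is reserved for padding, which is why `vocab_size` is one larger than the number of words in the tokenizer's word index.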

### Text word embedding
The embedding layer is initialised with 100-dimensional pretrained GloVe vectors trained on a Twitter corpus of 27 billion tokens.

In [18]:
# load the whole embedding into memory
embeddings_index = dict()
# GloVe vectors obtained from https://nlp.stanford.edu/projects/glove/
with open('glove_data/glove.twitter.27B/glove.twitter.27B.100d.txt', encoding="utf8") as f:
    for line in f:
        values = line.split()
        word = values[0]                                   # first field is the word
        coefs = np.asarray(values[1:], dtype=np.float32)   # remaining fields are the vector
        embeddings_index[word] = coefs
print('Loaded %s word vectors.' % len(embeddings_index))

Loaded 1193514 word vectors.


In [19]:
# create a weight matrix for words in training file
embedding_matrix = np.zeros((vocab_size, 100))
for word, i in tokenizer.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
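
A useful check at this point is how many vocabulary words actually received a pretrained vector, since words missing from GloVe keep an all-zero row in the matrix. The sketch below illustrates the idea on toy data. `embedding_coverage`, `toy_word_index` and `toy_embeddings` are hypothetical names; in the notebook the same function could be called with `tokenizer.word_index` and `embeddings_index`.

```python
def embedding_coverage(word_index, embeddings):
    """Return (fraction of words with a pretrained vector, sorted list of missing words)."""
    missing = sorted(w for w in word_index if w not in embeddings)
    found = len(word_index) - len(missing)
    coverage = found / len(word_index) if word_index else 0.0
    return coverage, missing

# Toy stand-ins for tokenizer.word_index and embeddings_index
toy_word_index = {"good": 1, "step": 2, "bjp": 3, "demonetization": 4}
toy_embeddings = {"good": [0.1, 0.2], "step": [0.3, 0.4], "bjp": [0.5, 0.6]}

coverage, missing = embedding_coverage(toy_word_index, toy_embeddings)
print(f"coverage = {coverage:.0%}, missing = {missing}")
# prints: coverage = 75%, missing = ['demonetization']
```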

## 4.0 Model Development

The architecture of the models was adapted from Yoon Kim's study <i>Convolutional Neural Networks for Sentence Classification</i>. In this approach, an end-to-end neural network is used: the embedded sequence enters a single convolutional layer, and the resulting feature maps are passed to a max-pooling layer that keeps the strongest response of each filter. The pooled vector then passes through dropout and a softmax output layer.<br>
This architecture was selected because the study showed that a simple CNN with one convolutional layer performs remarkably well even with little tuning.
Tuning is carried out manually by training and evaluating four models. Across these models, all hyperparameters are held constant except the number of filters. For all models, 10% of the training set is held out for validation, and training runs for 10 epochs with a batch size of 50.

In [20]:
#import libraries
from keras import layers
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

In [21]:
#define CNN text classifier
def conv_classifier(conv_filters):

    model = Sequential()
    model.add(layers.Embedding(vocab_size, 100, weights=[embedding_matrix], input_length=maxLen,trainable=True))
    model.add(layers.Conv1D(conv_filters, 3, activation='relu')) #Kernel size=3, conv_filters will be varied
    model.add(layers.GlobalMaxPooling1D()) # done to reduce the dimensionality of the features 
    model.add(Dropout(0.5))
    model.add(layers.Dense(3, activation='softmax'))
    
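    # Note: for a 3-class softmax with one-hot labels, 'categorical_crossentropy' is the usual loss.
    # 'binary_crossentropy' is kept because the results below were produced with it; under this loss
    # Keras reports 'accuracy' as per-output binary accuracy rather than per-text multiclass accuracy.
    model.compile(optimizer='Adadelta', loss='binary_crossentropy', metrics=['accuracy'])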
    model.summary()
    return model

### 4.1 Model 1
In this model, the hyperparameter values are taken from Yoon Kim (2014). They were chosen because that study found them to be the best-performing values after a grid search.
<br>
<ul>
    <li>Transfer function: rectified linear(ReLU)</li>
    <li>Kernel size: 3 (Kim used kernel sizes [3, 4, 5]; a single size of 3 is used here to reduce training time)</li> 
    <li>Number of filters: 100</li>
    <li>Dropout rate: 0.5</li>
    <li>Batch Size: 50</li>
    <li>Optimizer: Adadelta</li>
    <li>Final activation: Softmax</li>
</ul>

In [22]:
#train model_1, filter=100.
model_1 = conv_classifier(100)
training = model_1.fit(X_train, y_train, epochs=10, verbose=True, validation_split = 0.1, batch_size=50)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 716, 100)          871100    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 714, 100)          30100     
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 100)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 303       
Total params: 901,503
Trainable params: 901,503
Non-trainable params: 0
_________________________________________________________________


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 2160 samples, validate on 240 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
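
The parameter counts in the summary above can be reproduced by hand, which is a quick sanity check on the architecture. The sketch below assumes the standard counting rules for these layer types: one weight per embedding entry; kernel size × input channels × filters plus one bias per filter for the convolution; inputs × outputs plus one bias per output for the dense layer. The function name `cnn_param_count` is hypothetical.

```python
def cnn_param_count(vocab_size, embed_dim, kernel_size, filters, n_classes):
    """Parameter counts for Embedding -> Conv1D -> GlobalMaxPooling1D -> Dropout -> Dense."""
    embedding = vocab_size * embed_dim
    conv = kernel_size * embed_dim * filters + filters   # weights + biases
    dense = filters * n_classes + n_classes              # weights + biases
    return {"embedding": embedding, "conv1d": conv, "dense": dense,
            "total": embedding + conv + dense}

# Values used for model_1: vocabulary of 8711, 100-d embeddings, kernel size 3, 100 filters, 3 classes
print(cnn_param_count(8711, 100, 3, 100, 3))
# prints: {'embedding': 871100, 'conv1d': 30100, 'dense': 303, 'total': 901503}

# Output length of an unpadded convolution: input_length - kernel_size + 1
print(716 - 3 + 1)
# prints: 714
```

The pooling and dropout layers have no trainable parameters, which is why they show 0 in the summary. Only the convolution and dense counts change when the number of filters is varied.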


### 4.2 Model 2
This model uses the same hyperparameters as Model 1, except that the number of filters is increased to 150.

In [23]:
#train model_2, filters=150
model_2 = conv_classifier(150)
training = model_2.fit(X_train, y_train, epochs=10, verbose=True, validation_split = 0.1, batch_size=50)
#details about the model: https://keras.io/models/model/ 

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 716, 100)          871100    
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 714, 150)          45150     
_________________________________________________________________
global_max_pooling1d_2 (Glob (None, 150)               0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 150)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 453       
Total params: 916,703
Trainable params: 916,703
Non-trainable params: 0
_________________________________________________________________


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 2160 samples, validate on 240 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


### 4.3 Model 3
This model uses the same hyperparameters as Model 1, except that the number of filters is increased to 200.

In [24]:
#train model_3, filters=200
model_3 = conv_classifier(200)
training = model_3.fit(X_train, y_train, epochs=10, verbose=True, validation_split = 0.1, batch_size=50)

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 716, 100)          871100    
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 714, 200)          60200     
_________________________________________________________________
global_max_pooling1d_3 (Glob (None, 200)               0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 200)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 603       
Total params: 931,903
Trainable params: 931,903
Non-trainable params: 0
_________________________________________________________________


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 2160 samples, validate on 240 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


### 4.4 Model 4
This model uses the same hyperparameters as Model 1, except that the number of filters is increased to 250.

In [25]:
#train model_4 , filters=250
model_4 = conv_classifier(250)
training = model_4.fit(X_train, y_train, epochs=10, verbose=True, validation_split = 0.1, batch_size=50)

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 716, 100)          871100    
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 714, 250)          75250     
_________________________________________________________________
global_max_pooling1d_4 (Glob (None, 250)               0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 250)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 3)                 753       
Total params: 947,103
Trainable params: 947,103
Non-trainable params: 0
_________________________________________________________________


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 2160 samples, validate on 240 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


The prediction made by the models is an array of class probabilities for each text. To convert each row into a single integer class label, np.argmax is used to return the index of the largest probability in each row. This is illustrated below using model_1.

In [31]:
model_1.predict(X_test)

array([[0.13942306, 0.04752779, 0.8130492 ],
       [0.2528524 , 0.01347537, 0.7336722 ],
       [0.13010319, 0.02773136, 0.84216547],
       ...,
       [0.24501248, 0.5067857 , 0.2482018 ],
       [0.28610128, 0.49950546, 0.21439318],
       [0.1859524 , 0.69867915, 0.11536851]], dtype=float32)

In [34]:
y_pred = np.argmax(model_1.predict(X_test), axis=1)
y_pred

array([2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 2,
       2, 0, 1, 1, 1, 0, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 0, 1, 1, 0, 0, 1,
       2, 2, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 0, 0, 2, 2, 1, 2,
       1, 1, 1, 1, 1, 2, 1, 2, 2, 0, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 1,
       2, 1, 0, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 0, 1, 2, 1, 2, 1, 1, 2, 1,
       1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 0, 1, 2, 0, 2, 2, 1, 1, 1, 0,
       2, 0, 2, 1, 1, 2, 2, 2, 1, 1, 1, 0, 2, 1, 1, 2, 1, 2, 0, 2, 1, 2,
       2, 1, 2, 2, 2, 1, 1, 1, 2, 1, 0, 0, 2, 2, 1, 1, 0, 2, 1, 2, 1, 1,
       0, 1, 1, 2, 2, 0, 0, 2, 1, 2, 2, 0, 1, 2, 1, 0, 2, 2, 2, 2, 2, 2,
       0, 1, 1, 2, 1, 0, 2, 1, 2, 2, 1, 0, 2, 2, 0, 2, 0, 2, 1, 1, 1, 1,
       2, 1, 2, 0, 2, 1, 1, 2, 1, 2, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1,
       2, 2, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 2, 1, 0, 1, 2, 1, 2, 1,
       1, 2, 1, 1, 2, 1, 0, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 0, 0, 1,
       2, 2, 1, 2, 2, 2, 1, 0, 0, 2, 2, 2, 1, 2, 2,
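
The integer classes can be mapped back to the original labels. The sketch below assumes the alphabetical class order produced by `LabelEncoder` (CAG = 0, NAG = 1, OAG = 2), which matches the one-hot output shown in section 3. The helper `decode_predictions` is hypothetical and uses plain Python rather than NumPy so that it runs on its own; in the notebook, `label_encoder.inverse_transform(y_pred)` should give the same mapping.

```python
CLASS_NAMES = ["CAG", "NAG", "OAG"]  # assumed LabelEncoder order (alphabetical)

def decode_predictions(probabilities):
    """Return the label with the highest probability for each row."""
    labels = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])  # index of the largest value
        labels.append(CLASS_NAMES[best])
    return labels

# First and last rows of the model_1 probabilities shown above
probs = [[0.13942306, 0.04752779, 0.8130492],
         [0.1859524, 0.69867915, 0.11536851]]
print(decode_predictions(probs))
# prints: ['OAG', 'NAG']
```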

## 5.0 Model Evaluation
In this section, the quality of models developed will be evaluated using the following metrics:
<ol>
    <li>Confusion Matrix</li>
    <li>Accuracy</li>
    <li>Precision</li>
     <li>Recall score</li>
    <li>F-1 score</li>
</ol>
<b>Confusion Matrix</b> summarises the prediction results: the numbers of correct and incorrect predictions are counted and broken down by class. This makes it easy to see which classes the model confuses.<br>
<b>Accuracy</b> is the proportion of correct predictions out of the total number of input samples. On its own, however, it does not provide enough information to choose between models.<br>
<b>Precision</b> is the proportion of the samples predicted as a given class that truly belong to that class. It indicates how far a prediction can be trusted.<br>
<b>Recall</b> is the proportion of the samples of a given class that the model finds. A low recall indicates many false negatives.<br>
<b>F1 Score</b> is the harmonic mean of precision and recall, so it is high only when both are high. A classifier with high precision but low recall is reliable when it does predict a class, but it misses a significant number of instances of that class.


In [42]:
# Define Model Evaluation Function
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix

def evaluate (model):
    """This function is defined to evaluate a model and print the confusion matrix, accuracy, precision, recall and F-1 score"""
    
    y_pred = np.argmax(model.predict(X_test), axis=1)  #integer encoded y_pred values are used for evaluation computation
    Y_test = np.argmax(y_test, axis=1)  #integer encoded y_test values are used for evaluation computation
    
    loss, accuracy = model.evaluate(X_test, y_test, verbose=False)   #returns the loss and the compiled accuracy metric as a fraction; with binary_crossentropy this is per-output binary accuracy
    precision= precision_score(Y_test, y_pred , average="weighted")*100   #calculates the weighted precision in percentage using output test and prediction values 
    recall=recall_score(Y_test, y_pred , average="weighted")*100   #calculates the weighted recall in percentage using output test and prediction values 
    f1score= f1_score(Y_test, y_pred , average="weighted")*100   #calculates the weighted f1-score in percentage using output test and prediction values 
    Confusion_matrix= confusion_matrix(Y_test, y_pred )        #creates confusion matrix
    
    # Print accuracy, f1, precision, and recall scores
    print("Evaluation metrics for the model: ")
    print('')
    print('Confusion matrix= ')
    print(Confusion_matrix)
    print('')
    print("Testing Accuracy= {:.2f}%".format(accuracy*100))
    print("Precision Score= {:.2f}%".format(precision))
    print("Recall= {:.2f}%".format(recall))
    print("F-1 Score= {:.2f}%".format(f1score))

In [43]:
# Evaluate model_1
evaluate(model_1)

Evaluation metrics for the model: 

Confusion matrix= 
[[ 32  71  94]
 [ 22 137  45]
 [ 28  44 127]]

Testing Accuracy= 70.50%
Precision Score= 47.13%
Recall= 49.33%
F-1 Score= 46.08%
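
The weighted scores printed above can be recomputed directly from the confusion matrix, which makes the metric definitions concrete. The sketch below is a plain-Python illustration (the function name `weighted_scores` is hypothetical). It assumes rows are actual classes and columns are predicted classes, as in scikit-learn's `confusion_matrix`, and weights each class by its number of actual samples.

```python
def weighted_scores(cm):
    """Return accuracy and support-weighted precision, recall and F1 (as fractions)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precision = recall = f1 = 0.0
    for c in range(n):
        tp = cm[c][c]
        actual = sum(cm[c])                           # row sum: samples truly in class c
        predicted = sum(cm[r][c] for r in range(n))   # column sum: samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = actual / total                       # class share of the test set
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(cm[c][c] for c in range(n)) / total
    return accuracy, precision, recall, f1

cm_model_1 = [[32, 71, 94], [22, 137, 45], [28, 44, 127]]
for name, value in zip(("accuracy", "precision", "recall", "f1"), weighted_scores(cm_model_1)):
    print(f"{name}: {value * 100:.2f}%")
```

For this matrix the recomputed precision, recall and F1 match the printed values (47.13%, 49.33% and 46.08%). The per-text accuracy comes out at 49.33%, identical to the weighted recall; the 70.50% printed above is the figure Keras reports under the binary cross-entropy loss.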


In [44]:
# Evaluate model_2
evaluate(model_2)

Evaluation metrics for the model: 

Confusion matrix= 
[[ 22  82  93]
 [  9 148  47]
 [ 19  52 128]]

Testing Accuracy= 69.33%
Precision Score= 48.13%
Recall= 49.67%
F-1 Score= 44.74%


In [45]:
# Evaluate model_3
evaluate (model_3)

Evaluation metrics for the model: 

Confusion matrix= 
[[ 77  58  62]
 [ 43 134  27]
 [ 56  47  96]]

Testing Accuracy= 70.39%
Precision Score= 50.64%
Recall= 51.17%
F-1 Score= 50.71%


In [46]:
# Evaluate model_4
evaluate (model_4)

Evaluation metrics for the model: 

Confusion matrix= 
[[ 44  75  78]
 [ 17 145  42]
 [ 25  49 125]]

Testing Accuracy= 69.67%
Precision Score= 52.05%
Recall= 52.33%
F-1 Score= 49.73%


### Evaluation summary and discussion
The table below summarises the evaluation of the models using Accuracy, Precision, Recall and F-1:

| CNN Model          | Num of filters |Accuracy(%)  |Precision(%)| Recall(%)   |F-1 score(%)|
|--------------------|----------------|-------------|------------|-------------|------------|
| model_1            |    100         |   70.50     | 47.13      | 49.33       | 46.08      |
| model_2            |    150         | 69.33       | 48.13      | 49.67       | 44.74      |
| model_3            |    200         |  70.39      | 50.64      | 51.17       | 50.71      |
| model_4            |    250         |  69.67      | 52.05      | 52.33       | 49.73      |  

1. Confusion matrix: The table below shows how to read the confusion matrix.


|                    |CAG predictions| NAG predictions |OAG predictions|
|--------------------|-------------|-------------------|---------------|
| Actual CAG         | True CAG    |   False NAG       |  False OAG    |
| Actual NAG         | False CAG   |   True NAG        |  False OAG    | 
| Actual OAG         | False CAG   |   False NAG       |  True OAG     | 

As the confusion matrices show, most of the models performed poorly at predicting covertly aggressive (CAG) text. model_3, however, performs considerably better than the rest, with 77 correct CAG predictions. This means that when the aggressiveness of a text is hidden, model_3 is the most likely to detect it. NAG predictions are given less weight here because a misclassified non-aggressive text causes little harm. For OAG, model_2 makes the most correct predictions (128). This is especially important because failing to detect an overtly aggressive text can be detrimental. In this case, model_3 performs the poorest, with 96 correct predictions.

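2. Accuracy:
model_1 shows the highest reported accuracy; however, this is not enough to make a decision. Note also that the models were compiled with a binary cross-entropy loss, so the accuracy Keras reports is computed per output unit rather than per text. The per-text accuracy that follows from the confusion matrix (correct predictions divided by the 600 test samples) equals the weighted recall, for example (32 + 137 + 127) / 600 = 49.33% for model_1.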

3. Precision: 
Based on precision, model_4 is the most precise. A trend can be observed here: precision increases with the number of filters.

4. Recall: 
Similar to precision, model_4 was found to have the highest recall score. A trend can also be observed here: recall increases with the number of filters.

5. F1 Score: 
Based on F1 score, model_2 performs worst (44.74%) and model_3 performs best (50.71%). For a model with a good balance of precision and recall, model_3 should therefore be selected.

### Conclusion
In conclusion, the choice of a suitable model depends on the goal. For detecting covertly aggressive text, the confusion matrices show that model_3 performs best. If we are more interested in detecting overtly aggressive text, model_2 is the best fit. Precision and recall increased with the number of filters; however, for a balance of precision and recall, model_3 performed best.