In this notebook, we will test a neural network model with a recurrent layer and a convolution layer.  
  
This model achieved a public leaderboard score (AUC) of 0.9721 on Kaggle. That is higher than some of the most popular baseline models, so it is a decent starting point. By examining its weaknesses, we hope to learn what to focus on in the future.

# 1. Preparation
We first need to import the required libraries, download the data, and load it into memory.

## 1.1 Import

In [1]:
print('Importing required packages...')

from IPython.display import clear_output, display
import re
import pandas as pd
import numpy as np
np.random.seed()
import matplotlib.pyplot as plt
import nltk
from nltk.tokenize import TweetTokenizer
nltk.download('stopwords')
from nltk.stem.wordnet import WordNetLemmatizer 
nltk.download('wordnet')
from keras.preprocessing import sequence
from keras.preprocessing import text as ktxt
from keras.models import Sequential
from keras.layers import Dense, Embedding, GRU, Dropout
from keras.layers.convolutional import Conv1D, MaxPooling1D
from keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from sklearn import metrics


def hint(message):
    """
    erase previous ipynb output and show new message
    """
    clear_output()
    print(message)

  

Importing required packages...
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ChuanLi\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\ChuanLi\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Using TensorFlow backend.


## 1.2 Loading the Data

In [2]:
hint('loading data...')
train = pd.read_csv('data/train.csv')
train, valid = train_test_split(train, test_size=0.2)

labels = [
    'toxic', 
    'severe_toxic', 
    'obscene', 
    'threat', 
    'insult', 
    'identity_hate'
]

Ytr = train[labels].values
Yva = valid[labels].values

hint('Label distribution between training and validation set:')
print(pd.DataFrame({
    'label': labels,
    'train': [np.mean(train[label]) for label in labels],
    'validation' : [np.mean(valid[label]) for label in labels],
}))

Label distribution between training and validation set:
           label     train  validation
0          toxic  0.095170    0.098543
1   severe_toxic  0.009988    0.010027
2        obscene  0.052916    0.053078
3         threat  0.003016    0.002914
4         insult  0.049210    0.049977
5  identity_hate  0.008875    0.008523


# 2. Pre-processing the Input
There are many ways to pre-process the raw strings into valid input for the model. Here we build a dictionary from all the comments in the training set, map each word to its index in the dictionary, and pad or crop the resulting sequences so that they all have the same length.

## 2.1 Cleaning Input

In [3]:
tkzr = TweetTokenizer(preserve_case=False)
eng_stopwords = (
    'what', 'which', 'who', 'whom', 
    'this', 'that', 'these', 'those', 
    'am', 'is', 'are', 'was', 'were', 
    'be', 'been', 'being', 
    'have', 'has', 'had', 'having', 
    'do', 'does', 'did', 'doing', 
    'a', 'an', 'the', 
    'and', 'but', 'if', 'or', 
    'because', 'as', 'until', 'while', 
    'of', 'at', 'by', 'for', 'with', 
    'about', 'against', 'between', 
    'into', 'through', 'during', 'before', 'after', 
    'above', 'below', 'to', 'from', 
    'up', 'down', 'in', 'out', 'on', 'off', 
    'over', 'under', 'again', 'further', 
    'then', 'once', 'here', 
    'there', 'when', 'where', 'why', 
    'how', 'all', 'any', 'both', 'each', 
    'few', 'more', 'most', 'other', 'some', 
    'such', 'no', 'nor', 'not', 'only', 
    'own', 'same', 'so', 'than', 'too', 'very', 
    'can', 'will', 'just', 'don', 'should', 'now'
)
lmtzr = WordNetLemmatizer()
# contraction -> expansion; each key appears exactly once
appos = {
    "aren't" : "are not",
    "can't" : "cannot",
    "couldn't" : "could not",
    "didn't" : "did not",
    "doesn't" : "does not",
    "don't" : "do not",
    "hadn't" : "had not",
    "hasn't" : "has not",
    "haven't" : "have not",
    "he'd" : "he would",
    "he'll" : "he will",
    "he's" : "he is",
    "i'd" : "I would",
    "i'll" : "I will",
    "i'm" : "I am",
    "isn't" : "is not",
    "it's" : "it is",
    "it'll" : "it will",
    "i've" : "I have",
    "let's" : "let us",
    "mightn't" : "might not",
    "mustn't" : "must not",
    "shan't" : "shall not",
    "she'd" : "she would",
    "she'll" : "she will",
    "she's" : "she is",
    "shouldn't" : "should not",
    "that's" : "that is",
    "there's" : "there is",
    "they'd" : "they would",
    "they'll" : "they will",
    "they're" : "they are",
    "they've" : "they have",
    "we'd" : "we would",
    "we're" : "we are",
    "weren't" : "were not",
    "we've" : "we have",
    "what'll" : "what will",
    "what're" : "what are",
    "what's" : "what is",
    "what've" : "what have",
    "where's" : "where is",
    "who'd" : "who would",
    "who'll" : "who will",
    "who're" : "who are",
    "who's" : "who is",
    "who've" : "who have",
    "won't" : "will not",
    "wouldn't" : "would not",
    "you'd" : "you would",
    "you'll" : "you will",
    "you're" : "you are",
    "you've" : "you have",
    "'re" : " are",
    "wasn't" : "was not",
    "we'll" : "we will"
}

def preprocess(comment):
  
    # credit to the author of this post:
    # https://www.kaggle.com/jagangupta/stop-the-s-toxic-comments-eda

    # replace newlines and tabs with spaces
    comment = re.sub(r'[\n\t]', ' ', comment)

    # replace IP addresses with a placeholder token
    comment = re.sub(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', ' specipaddress ', comment)

    # replace usernames with a placeholder token
    comment = re.sub(r"\[\[User.*\]", ' specusername ', comment)
    comment = re.sub(r"\[\[User.*\|", ' specusername ', comment)

    # tokenization 
    tokens = tkzr.tokenize(comment)

    # apostrophe replacement
    tokens = [appos.get(token, token) for token in tokens]

    # remove stopwords
    tokens = [token for token in tokens if token not in eng_stopwords]

    # lemmatization (every token is treated as a verb)
    tokens = [lmtzr.lemmatize(token, 'v') for token in tokens]

    return " ".join(tokens)
  

hint('Cleaning train set...')
Xtr = train['comment_text'].apply(preprocess)
hint('Cleaning validation set...')
Xva = valid['comment_text'].apply(preprocess)
hint('Done')

Done
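
The two placeholder substitutions in `preprocess` can be tried on a single string. The sketch below uses a made-up sample comment, and `USER_LAZY` is a hypothetical non-greedy alternative that is not part of the notebook. It suggests that the greedy `.*` in the username pattern may swallow the text between two user links on the same line.

```python
import re

# Patterns as used in preprocess(), written as raw strings.
IP_PATTERN = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
USER_GREEDY = r'\[\[User.*\]'
# Hypothetical non-greedy alternative, NOT part of the original notebook.
USER_LAZY = r'\[\[User.*?\]\]'

# Made-up sample comment, for illustration only.
sample = "[[User:Foo]] said hi to [[User:Bar]] from 192.168.0.1"

# Same order as in preprocess(): IP addresses first, then usernames.
with_ip = re.sub(IP_PATTERN, ' specipaddress ', sample)
greedy = re.sub(USER_GREEDY, ' specusername ', with_ip)
lazy = re.sub(USER_LAZY, ' specusername ', with_ip)

print(greedy.split())  # the words between the two user links are lost
print(lazy.split())    # both links are replaced, the words in between survive
```

Whether the greedy behaviour matters in practice depends on how often a comment contains more than one user link, which this notebook does not measure.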


## 2.2 Transforming Comments to Sequences

In [4]:
vocab_max = 20000

hint('Fitting the tokenizer...')
tokenizer = ktxt.Tokenizer(num_words=vocab_max)
tokenizer.fit_on_texts(Xtr)

hint('Tokenizing...')
Xtr = tokenizer.texts_to_sequences(Xtr)
Xva = tokenizer.texts_to_sequences(Xva)

hint('Padding the sequences...')
max_comment_length = 200  # padded/cropped comment length
Xtr = sequence.pad_sequences(Xtr, maxlen=max_comment_length)
Xva = sequence.pad_sequences(Xva, maxlen=max_comment_length)

hint('Done')

Done
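
To make the dictionary, index mapping, and pad/crop steps concrete, here is a dependency-free sketch of the same idea on a toy corpus. The helper names (`build_index`, `to_sequence`, `pad_or_crop`) are hypothetical. The sketch assumes the Keras defaults as I understand them: index 0 is reserved for padding, only the `vocab_max - 1` most frequent words are kept, and both padding and cropping happen at the front of the sequence.

```python
from collections import Counter


def build_index(texts, vocab_max):
    """Map the most frequent words to 1..vocab_max-1 (0 is kept for padding)."""
    counts = Counter(word for text in texts for word in text.split())
    most_common = counts.most_common(vocab_max - 1)
    return {word: i + 1 for i, (word, _) in enumerate(most_common)}


def to_sequence(text, index):
    """Replace each known word by its index and drop unknown words."""
    return [index[word] for word in text.split() if word in index]


def pad_or_crop(seq, maxlen):
    """Keep the last maxlen items, then left-pad with zeros up to maxlen."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq


# Toy corpus, for illustration only.
texts = ["you be fool", "you be kind", "you win"]
index = build_index(texts, vocab_max=3)
seq = to_sequence("you be fool", index)

print(index)                            # {'you': 1, 'be': 2}
print(pad_or_crop(seq, 4))              # [0, 0, 1, 2]
print(pad_or_crop([1, 2, 1, 2, 1], 4))  # [2, 1, 2, 1]
```

With `max_comment_length = 200`, this convention means that a long comment keeps only its last 200 tokens.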


# 3. Training Model

In [5]:
model = Sequential()
model.add(Embedding(vocab_max, 100, input_length=max_comment_length))
model.add(Dropout(0.2))
model.add(Conv1D(filters=64, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(GRU(units=32))
model.add(Dense(16, activation='relu'))
model.add(Dense(len(labels), activation='sigmoid'))
model.compile(
    optimizer='adam', 
    loss='binary_crossentropy', 
    metrics=['accuracy']
)
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 200, 100)          2000000   
_________________________________________________________________
dropout_1 (Dropout)          (None, 200, 100)          0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 200, 64)           19264     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 100, 64)           0         
_________________________________________________________________
gru_1 (GRU)                  (None, 32)                9312      
_________________________________________________________________
dense_1 (Dense)              (None, 16)                528       
_________________________________________________________________
dense_2 (Dense)              (None, 6)                 102       
Total para
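
The per-layer parameter counts in the summary can be checked by hand. The arithmetic below is a sketch that assumes the classic GRU layout (three gates, one bias vector per gate), which is the layout that reproduces the 9312 shown above.

```python
# Hyper-parameters taken from the model definition above.
vocab_max, embed_dim = 20000, 100
conv_filters, kernel_size = 64, 3
gru_units = 32
dense_units, n_labels = 16, 6

# One embed_dim-sized vector per vocabulary entry.
embedding = vocab_max * embed_dim
# kernel_size x input_channels weights per filter, plus one bias per filter.
conv1d = kernel_size * embed_dim * conv_filters + conv_filters
# Three gates, each with input weights, recurrent weights and one bias vector
# (assumption: classic GRU formulation; the GRU input is the 64 conv channels).
gru = 3 * (gru_units * (conv_filters + gru_units) + gru_units)
# Fully connected layers: weights plus biases.
dense_1 = gru_units * dense_units + dense_units
dense_2 = dense_units * n_labels + n_labels

print(embedding, conv1d, gru, dense_1, dense_2)  # 2000000 19264 9312 528 102
```

Almost all of the parameters sit in the embedding layer, so the vocabulary size and embedding dimension dominate the size of the model.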

Now we train the model.

In [7]:
epochs = 2
batch_size = 64

history = model.fit(
    Xtr, Ytr, 
    epochs=epochs, 
    batch_size=batch_size,
    validation_data=(Xva, Yva)
)

Train on 127656 samples, validate on 31915 samples
Epoch 1/2
Epoch 2/2


Making predictions on the validation set.

In [8]:
hint("Making prediction...")
Yva_ = model.predict(Xva)
hint("Done")

Done
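
The leaderboard score quoted in the introduction is an AUC, while the analysis below works with thresholded accuracy. As a reminder of what AUC measures, here is a dependency-free sketch on toy data. It assumes the score is the mean of the per-label ROC AUCs; the function names and the toy arrays are hypothetical. In the notebook itself, `metrics.roc_auc_score(Yva, Yva_)` from scikit-learn would be the usual way to get a comparable number.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count pairwise wins; ties count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_column_auc(Y_true, Y_score):
    """Average the AUC over label columns (assumed leaderboard convention)."""
    n_labels = len(Y_true[0])
    aucs = [
        roc_auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(aucs) / n_labels


# Toy data with two labels and four samples, for illustration only.
Y_true = [[1, 0], [0, 1], [1, 0], [0, 0]]
Y_score = [[0.9, 0.2], [0.3, 0.8], [0.4, 0.1], [0.5, 0.3]]

print(mean_column_auc(Y_true, Y_score))  # 0.875
```

Unlike accuracy, AUC does not depend on a threshold, so the two metrics can rank models differently.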


# 4. Result Analysis
## 4.1 Global Accuracy

In [9]:
total_sample = Xva.shape[0]
print("validation set sample count: %d\n" % total_sample)
prediction_total = total_sample*Yva.shape[1]
best_t = None
best_accuracy = 0
for t in [i*0.1 for i in range(1, 10)]:
    accuracy = np.sum(Yva == (Yva_ >= t))/prediction_total
    if accuracy > best_accuracy: 
        best_t = t
        best_accuracy = accuracy
    print("accuracy for threshold %.1f: %.2f%%" % (t, accuracy*100))
Yva_T = Yva_ >= best_t
correct = Yva == Yva_T
print("\nbest threshold: %.1f" % best_t)
print("best accuracy: %.2f%%" % (best_accuracy*100))

validation set sample count: 31915

accuracy for threshold 0.1: 96.55%
accuracy for threshold 0.2: 97.59%
accuracy for threshold 0.3: 98.00%
accuracy for threshold 0.4: 98.19%
accuracy for threshold 0.5: 98.26%
accuracy for threshold 0.6: 98.26%
accuracy for threshold 0.7: 98.17%
accuracy for threshold 0.8: 98.00%
accuracy for threshold 0.9: 97.66%

best threshold: 0.6
best accuracy: 98.26%
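
Because the labels are heavily imbalanced, it helps to compare this number with a trivial baseline. The sketch below computes the global accuracy of a classifier that predicts "negative" for every label, using only the validation label rates printed in section 1.2. It is a back-of-the-envelope check, not part of the original analysis.

```python
# Validation positive rates, copied from the table in section 1.2.
valid_rates = {
    'toxic': 0.098543,
    'severe_toxic': 0.010027,
    'obscene': 0.053078,
    'threat': 0.002914,
    'insult': 0.049977,
    'identity_hate': 0.008523,
}

# An all-negative predictor is wrong exactly on the positive labels,
# so its global accuracy is one minus the mean positive rate.
baseline = 1 - sum(valid_rates.values()) / len(valid_rates)

print("all-negative baseline accuracy: %.2f%%" % (baseline * 100))
# all-negative baseline accuracy: 96.28%
```

Seen this way, the best accuracy of 98.26% is about two points above a model that never flags anything, which is why the per-class breakdown below is more informative.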


## 4.2 Accuracy by Classes

In [24]:
overview = pd.DataFrame(index=[
    'label‰ of all',
    'total wrong', 
    'P->N', 
    'N->P', 
    'P->N %', 
    'N->P %',
    'avg len',
])

def analyze_class(i):
    wrong = valid[correct[:, i] != 1]
    total_class_error = len(wrong)
    print("%d predicted incorrectly (%.2f%% of all samples)" % (
        total_class_error, 
        100*total_class_error/total_sample
    ))
        
    wrong_seqs = Xva[correct[:, i] != 1]
    lens = [ len(seq[seq != 0]) for seq in wrong_seqs]
    avg_len = np.mean(lens)
    print("Falsely predicted sequences have an average length of %d" % avg_len)

    PpN = valid[(Yva[:, i] == 1) & (Yva_T[:, i] == 0)]
    PpN_count = len(PpN)
    print("\n%d (%.2f%%) positive label were predicted to be negative" % (
        PpN_count, 
        100*PpN_count/total_class_error 
    ))
    if PpN_count > 4:
        print("Samples:")
        for sample in PpN.sample(5)['comment_text']:
            display(sample)
  
    NpP = valid[(Yva[:, i] == 0) & (Yva_T[:, i] == 1)]
    NpP_count = len(NpP)
    print("\n%d (%.2f%%) negative label were predicted to be positive" % (
        NpP_count, 
        100*NpP_count/total_class_error 
    ))
    if NpP_count > 4:
        print("Samples:")
        for sample in NpP.sample(5)['comment_text']:
            display(sample)
  
    overview[labels[i]] = [
        np.mean(Yva[:, i]*1000),
        total_class_error, 
        PpN_count,  
        NpP_count,
        100*PpN_count/total_class_error,
        100*NpP_count/total_class_error,
        avg_len
    ]
  
    print('\n')
  

### 4.2.1 Toxic

In [25]:
analyze_class(0)

1208 predicted incorrectly (3.79% of all samples)
Falsely predicted sequences have an average length of 34

899 (74.42%) positive label were predicted to be negative
Samples:


"Prior to Quickpolls, he would have been perma-blocked by now. Guess I'll remove the block again. As soon as I get time, which will probably be about November. If you've got any brains, you will let that particular troll sleep on. Best."

'Your merge was worthless; You basically removed everything in the article about the actual games.'

'u accuse me of that goodfellas thing again and i will hunt you down!'

'It also could forward to male homosexual orgies, of which there surely are several relevant articles. Of course, if someone finds and RS to figure out which is most relevant, that would help, i.e., all male Daisy chains, Bukkake, Gang bang, etc. per various sources better than urban dictionary or lots of porn sites. Jolly times!   (Talkie-Talkie)'

'Tony, you are unclear, unless your opinion is that the RfC was rubbish, and you have no faults what so ever.  The short quips and arguments from you on the RfC are hardly helpful.  Why is it so damn hard for you to simply answer a freaking question straight?  Are you really that incapable of dialog?'


309 (25.58%) negative label were predicted to be positive
Samples:


'"\n\nLucky...\nI\'m going through the images, and you are damn lucky this image does not say: ""DVD still"" like  does..."'

'"\n\nMessage\n\nHey Coolrunner87, you tedious little shit. Why don\'t you just grow a pair and stop wasting everyone\'s time with your whiny posturing about ""vandalism""? Fact is, RD Reynolds is a vile mound of goo and puss and is undeserving of a wikipedia entry. Maybe you should go and start an ""IWC wiki"" or some suchlike, and put all your overweight friends in there? Wikipedia is supposed to be about serious issues like politics, science, geography etc, not some morbidly obese basement dweller who makes up crap about professional wrestling! His books have no sources and contain more errors per page than there are zits on your face.   "'

'I do the cha cha like a faggot.'

'my dear winner, you lunatic remarks ab the serious points I was discussing makes you worthless as an editor\nand esp to call them vandalism ... you have some kind of moronic deviance that ill suits you to do anything on Wikipedia\nmany thanks for your leaving all edits alone in future with such idiotic diatribes ... thanks again'

'"\n\n Abusive Admins \n\nI was recently banned from wikipedia by  Crum falsely claimed multiple times that I was banned for using the ""epithet"" ""liar"".  This is verifiably false.  In none\nof\nmy\nposts did I use the ""epithet"" ""liar"".  \n\nYes, I did say that SlimVirgin lies.  My defense for making such an accusation is that in the immediate moments before I made that post I saw her lie about me.  But I didn\'t call her a liar.  I merely said she lies.  This happens to be true and is one of the many reasons SlimVirgin was  voted ""wikipedia\'s most abusive admin"".  Crum\'s accusation against me was false and his ban was baseless and abusive.   "'





### 4.2.2 Severe Toxic

In [26]:
analyze_class(1)

303 predicted incorrectly (0.95% of all samples)
Falsely predicted sequences have an average length of 33

279 (92.08%) positive label were predicted to be negative
Samples:


'fuk u retard i hate u i will kill u bettr watch ur bak cuz im teamin up wit my mothafuka boothy and we gonna rise the south on yo ass bitch'

'About fucking time you retarded little bitch!'

'YOURE A FRREAKING JEW, AND PLEAZE COMMIT SUICIDE.'

'fag \n\nyou fucking suck. people need to know the true meaning of thumbing'

'Fuck off!!!!- () Wolfkeeper (Talk)'


24 (7.92%) negative label were predicted to be positive
Samples:


'FUCK YOU I NEVER STOP FUCKING YOU HARRY POTTER FANS!!!!!!! HARRY POTTER SUCKZ! this fucking porn face!!!'

'I should try to track down your IP so I can burn your house down you irritating and loathesome little shit. Who the fuck who dyou think you are? I wish I knew who yo uwere so I could beat the everliving shit out of you, you fucking whiny, annoying loser. Stop watching STar Trek re-runs and get laid you fucking pussy ass pansy. I hope you fucking get AIDS and cancer at the same time.'

'Get off my fucking back.. You God-damn little Indian RETARD \n\nGet off my fucking back.. You God-damn little Indian RETARD. Go and bully around someone else. AND you can shove up your SPI up you Indian ASS.'

'Fuck off Mexican bastard! I will work endelessly to have your Wikipedia account banned indefinitely- you have made my life a misery and I hope you die.'







### 4.2.3 Obscene

In [27]:
analyze_class(2)

568 predicted incorrectly (1.78% of all samples)
Falsely predicted sequences have an average length of 36

408 (71.83%) positive label were predicted to be negative
Samples:


'George Remus \n\nGeorge Remus is not a fictional character. Do some FUCKING research and you will see he was a real person. A fictional book was written about him but that does not make Remus, himself, fictional.'

'Nipple \n\nNipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple Nipple N

'I DID F*** ALL HERE SO F*** OF WIKIPEDIA YOU ARROGANT SELF RIGHTEOUS C***S'



"Go to Hell, fatso \n\nHey, Dickwad! If you ignore me, I'll poke a hole in you and release all the fat! I WILL NOT hesitate to roast you in an oven like the little piggy you are! Unblock everyone you've blocked or suffer the consequences!!!"


160 (28.17%) negative label were predicted to be positive
Samples:


"Yes you stupid asshole, it does count. Just because it's not mentioned on the screen does not mean it's canon. Like WyrmKing said earlier, Nathan Petrelli's sons' names weren't mentioned on the script but the credits showed their names. Does that mean, they didn't have names? 800 series is T-800. Merchan-fucking-dise says it's T-800. The writers says it's T-800. The creature desginer, Stan Winston says it's T-800. It's fucking T-800. T-fucking-800."

'I agree \n\nDayewalker is just another republicant whore who worships Palin McCain'

'that dumbass you got there didnt improve anythign all he did was try and steal my work and block my from it because hekept sayign i had fake stuff which all came from other places and wiki its self so you tell him to go to hell'

"Absurd claims \n\nStop, you dimwit. Your accusations and insinuations may be well-intentioned but you are clueless. Sure, be bold but don't be fucking stupid. -"

"SHUT UP AND LEAVE ME ALONE  \n\nFAGGOT! SHUT UP. GO AHEAD BAN ME I DON'T CARE. JUST SHUT UP AND LEAVE MY ALONE."





### 4.2.4 Threat

In [28]:
analyze_class(3)

93 predicted incorrectly (0.29% of all samples)
Falsely predicted sequences have an average length of 29

93 (100.00%) positive label were predicted to be negative
Samples:


'HOPE YOUR HEAD GETS CUT OFF AND SOMEONE WIPS THERE ASS WITH IT AND THEN STABS YOU IN YOUR HEART'

"you are part of isis \n\nCouple of days ago u threaten me to give u Ebola and if I don't you were going to destroy the white house"

'you just wait \n\nyour death is near'

'I am going to kill you I am going to murder you'

"Who gives a shit? Anthony Bourdain is a tool. I just wish he had children so I could stomp on there testicals. Just kidding but he is a massive tool. He couldn't carry Emerils jockstrap. Next time I see him I'll knock his teeth out...."


0 (0.00%) negative label were predicted to be positive




### 4.2.5 Insult

In [29]:
analyze_class(4)

890 predicted incorrectly (2.79% of all samples)
Falsely predicted sequences have an average length of 29

664 (74.61%) positive label were predicted to be negative
Samples:


"oh my bolagne has a first name it's d-u-m-b, my bbolange has a second name it's s-h-i-t."

"Oh cool, you deleted other part, what is next? You will delete the whole article too? Good job! Can block me again, it's just that you can do. I-D-I-O-T"

'In German and European law you are the fool. Private datas are the property of the individual person.\nYou want to force somebody to open his personal datas. That is a violation of german data protection law. MP'

'hey jerk \n\ngo write about gardening or britney spears or toby keith and FUCKING stop ruining articles on topics you have no damn clue about.'

'what up, my comma deleting nigga?'


226 (25.39%) negative label were predicted to be positive
Samples:


'GO F**K YOURSELF TOO! LIKE I GIVE A SHIT ABOUT A BAN!'

"By the way, I see you can't help but whine to others about how my words are affecting the last few brain cells you seem to have held on to. Fuckin' pussy."

'"\n\nYou are an open source patent buster. You attacked the Collins patents FIRST, right? I give you facts and you spew politics. I don\'t have to stand for that. You are being stupid. Stupidity destroys. You are one idiot of many. You are worthless. Apologize for that and I will for mine. But I still hate you. That is a solid fact. There is something wrong with people like you, but that\'s your problem.   \n\nAlso, unlike Hod Lipson\'s bigoted crazy comments on how ""legal immigrants doing all the work"" (meaning dirty work as he had referred to) you, Feitas, Merkle Roco (who RUNS the ENTIRE nano department at NSF and he\'s an X communist scientist!) are taking our GOOD jobs, in fact STEALING THEM as they were stolen from me and as they are stealing the Collins patent.   \n\n... Oh, and you said you came here ""15 years ago to do work that no American was able to do."" What a pompous ass! You think you are BETTER than ANY American living here? Go to hell!   "'

'"\nFuck Wikipedia and fuck Wales. The article cannot be more of a perfect case of violating WP:BLPCRIME. If I were a defense attorney I would be wetting my tiny white pants. Wales is a media whore. It suits him to have that article here. Chan "'

"Absurd claims \n\nStop, you dimwit. Your accusations and insinuations may be well-intentioned but you are clueless. Sure, be bold but don't be fucking stupid. -"





### 4.2.6 Identity Hate

In [30]:
analyze_class(5)

272 predicted incorrectly (0.85% of all samples)
Falsely predicted sequences have an average length of 26

272 (100.00%) positive label were predicted to be negative
Samples:


'"\n\nGreat, the Ukrainian nationalists again. go back to your filthy hole, UPA dog! Fedor grew up in Russia, he only was in Ukraine for 2 yrs, he has a Russian passport. end of story. stop claiming what doesn\'t belong to you! now you bastards say that ancient RUS dukes were Ukrainian! You say that ancient Rus itself was Ukraine! oh, yeah, the whole world is Ukraine! keep telling yourself that, fucking Orange asshole! I am half-Ukrainian myself, people like you just piss me off! oh, and no, his surname is NOT ""100% Ukrainian""! You know how many people in Krasnodar, Stavropolie, and other areas of RUSSIA, have last names that end with ""-enko""? millions! myself included (Levchenko). fucking uneducated banderovitz nazi, you have no place here, go, save your antisemitic, suppremacist theories for others (hint, maybe your brothers in Poltava will listen to your preachings). You people are the reason for all the hatred between Russians and Ukrainians these days! its people like you that

'HOE \n\nYour A slut hoe basstatard homo rapeist'

'"HOLY SHIT I""M GAY AS HELL.\n\n"'

"Jews and Jews \n\nCan we (meaning male whites) truely co-exist with Jews? I mean niggers and spics are bad enough but the fucking jews go too far. I think there should be a genocide, and there should be an article on wikipedia that encourages the death of jews, for those who agree please message here, for those that don't, eat me. We Came In?"

'"\n\nThat is not a insult. That is a reality, that link is a lint to a racist group. And armenian grow up  their children with racism. They send their children to ""genocide camps"" and teach them how barbarian Turks are making genocide(!). If you dont believe me please ask an Armenian user.  "'


0 (0.00%) negative label were predicted to be positive




### 4.2.7 Overview

In [31]:
overview.astype(int)

               toxic  severe_toxic  obscene  threat  insult  identity_hate
label‰ of all     98            10       53       2      49              8
total wrong     1208           303      568      93     890            272
P->N             899           279      408      93     664            272
N->P             309            24      160       0     226              0
P->N %            74            92       71     100      74            100
N->P %            25             7       28       0      25              0
avg len           34            33       36      29      29             26
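
The overview can also be read as per-class recall and precision. The sketch below derives both from the numbers above. The positive counts are not printed anywhere in the notebook, so they are reconstructed here by rounding the validation label rate times the validation set size, which is an assumption.

```python
total_sample = 31915

# Validation positive rates (section 1.2) and error counts (overview table).
valid_rate = {'toxic': 0.098543, 'severe_toxic': 0.010027, 'obscene': 0.053078,
              'threat': 0.002914, 'insult': 0.049977, 'identity_hate': 0.008523}
false_neg = {'toxic': 899, 'severe_toxic': 279, 'obscene': 408,
             'threat': 93, 'insult': 664, 'identity_hate': 272}
false_pos = {'toxic': 309, 'severe_toxic': 24, 'obscene': 160,
             'threat': 0, 'insult': 226, 'identity_hate': 0}

results = {}
for label, rate in valid_rate.items():
    # Assumption: reconstruct the number of positives from the rounded rate.
    positives = round(rate * total_sample)
    true_pos = positives - false_neg[label]
    predicted_pos = true_pos + false_pos[label]
    recall = true_pos / positives
    # Precision is undefined when the model never predicts the class.
    precision = true_pos / predicted_pos if predicted_pos else float('nan')
    results[label] = {'positives': positives, 'recall': recall,
                      'precision': precision}
    print("%-14s positives=%4d  recall=%.3f  precision=%.3f"
          % (label, positives, recall, precision))
```

Under this reconstruction, recall is roughly 0.71 for toxic and 0.13 for severe_toxic, and exactly 0 for threat and identity_hate, since every positive in those two classes was missed.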


# 5. Observation and Conclusion


*   Classes with more positive labels performed better; the two rarest classes, threat and identity_hate, were never predicted as positive. Giving rare labels more weight during training may help.
*   False negatives are more common than false positives in every class, although the severity of this problem varies among the classes.
*   Quoted text could be a potential cause of error (this needs further testing).
*   The model does not seem to differentiate comments that are filled with toxic content from those that contain only a small portion of it. Using TF-IDF features may help.
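
The first observation suggests giving rare labels more weight. One common choice, used here purely as an illustration and not tested in this notebook, is to weight the positive term of the binary cross-entropy by the ratio of negatives to positives in the training set. The sketch below computes such weights from the training rates in section 1.2; `weighted_bce` is a hypothetical helper for a single label and prediction.

```python
import math

# Training positive rates, copied from the table in section 1.2.
train_rates = {
    'toxic': 0.095170,
    'severe_toxic': 0.009988,
    'obscene': 0.052916,
    'threat': 0.003016,
    'insult': 0.049210,
    'identity_hate': 0.008875,
}

# Assumed weighting scheme: negatives / positives for each label.
pos_weight = {label: (1 - p) / p for label, p in train_rates.items()}


def weighted_bce(y_true, y_pred, w_pos, eps=1e-7):
    """Binary cross-entropy with the positive term scaled by w_pos."""
    y_pred = min(max(y_pred, eps), 1 - eps)  # avoid log(0)
    return -(w_pos * y_true * math.log(y_pred)
             + (1 - y_true) * math.log(1 - y_pred))


for label, w in pos_weight.items():
    print("%-14s weight=%.1f" % (label, w))

# A missed 'threat' now costs far more than a missed 'toxic'.
print(weighted_bce(1, 0.1, pos_weight['threat']))
print(weighted_bce(1, 0.1, pos_weight['toxic']))
```

Plugging this into the model would require a custom loss with one weight per output column. Weights this large may also trade false negatives for many false positives, so a milder scheme might work better.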


