# Data Exploration
We start by printing a few example comments for each of the toxicity labels:
'toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate'.

This gives us a better sense of what each of these labels actually means.

Let's first import the necessary packages and load the data.

In [1]:
# Import packages
%matplotlib inline
import os
import pandas as pd
import numpy as np

In [2]:
# Import dataset (expects data/raw/train.csv under the parent of the working directory)
parent_path = os.path.dirname(os.getcwd())
fname = 'train.csv'
csv_path = os.path.join(parent_path, 'data', 'raw', fname)
data = pd.read_csv(csv_path)
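
Before searching for examples, it can help to check how many comments carry each label. The cell below is a minimal sketch on a small invented DataFrame (the name `toy` and its rows are assumptions for illustration, not part of the dataset); it assumes the six label columns hold 0/1 values, as the printed examples further down suggest.

```python
import pandas as pd

label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

# Invented rows standing in for the real dataset (illustration only).
toy = pd.DataFrame(
    [[1, 1, 1, 0, 1, 1],
     [1, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]],
    columns=label_cols,
)

# Number of comments flagged with each label.
label_counts = toy[label_cols].sum()

# Number of comments that carry no label at all.
n_clean = int((toy[label_cols].sum(axis=1) == 0).sum())

print(label_counts.to_dict())
print(n_clean)
```

Running the same two lines on `data` instead of `toy` should give the per-label counts for the loaded file, provided the column names match.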

Now, let's build the function that searches the loaded data for comments of a given toxicity type.

In [3]:
def find_comments(data, toxic_type, n, shuffle=True):
    """
    Finds n comments of given toxicity type in given data.
    :param data: pandas.DataFrame
        Comments and corresponding classification
    :param toxic_type: str
        One of
            'toxic', 'severe_toxic', 'obscene', 'threat', 
            'insult', 'identity_hate'
    :param n: int
        Number of comments to retrieve
    :param shuffle: bool, default True
        If False, the first n comments of given type are returned.
        Else, a random sample of size n is returned.
    :return: pandas.DataFrame
        Up to n comments of given toxicity type (fewer if less than n exist)
    """
    filt = data[data[toxic_type] == 1]
    if shuffle:
        filt = filt.iloc[np.random.permutation(len(filt))]
    
    return filt.head(n)
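
Because `find_comments` shuffles with `np.random.permutation`, each run prints different comments. If reproducible examples are wanted, one possible variant is sketched below: it uses `DataFrame.sample` with a fixed `random_state`. The function name `sample_comments`, the `seed` parameter, and the `toy` data are hypothetical and only serve the illustration.

```python
import pandas as pd

# Toy stand-in for train.csv; ids and texts are invented for illustration.
toy = pd.DataFrame({
    'id': ['a1', 'b2', 'c3', 'd4'],
    'comment_text': ['first', 'second', 'third', 'fourth'],
    'toxic': [1, 0, 1, 1],
    'threat': [0, 0, 1, 0],
})


def sample_comments(data, toxic_type, n, seed=None):
    """Hypothetical variant of find_comments with a reproducible shuffle."""
    filt = data[data[toxic_type] == 1]
    # sample() raises if n exceeds the number of rows, so cap n first.
    return filt.sample(n=min(n, len(filt)), random_state=seed)


first = sample_comments(toy, 'toxic', n=2, seed=0)
second = sample_comments(toy, 'toxic', n=2, seed=0)

# The same seed selects the same rows on every call.
print(first['id'].tolist() == second['id'].tolist())

# Asking for more rows than exist returns what is available.
print(len(sample_comments(toy, 'threat', n=5, seed=0)))
```

Passing `seed=None` keeps the behaviour random, so the variant can stand in for the shuffled case of `find_comments` when a fixed set of examples is not needed.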

Finally, let's print 5 comments of each toxicity type.

In [4]:
toxic_types = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
for t in toxic_types:
    comments = find_comments(data, t, n=5)
    print('>>> Category: %s <<<' % t)
    for _, r in comments.iterrows():
        print('\n')
        print('[Comment ID: %s]' % r['id'])
        print(r['comment_text'])
        print(r.loc[toxic_types].to_dict())

>>> Category: toxic <<<


[Comment ID: f1d7893688a3e334]
i anal rape little iraqie children while i type crap in on wikipedia.
{'toxic': 1, 'severe_toxic': 1, 'obscene': 1, 'threat': 0, 'insult': 1, 'identity_hate': 1}


[Comment ID: a978ec8cda72adb3]
"

 Comments on Talk:Ashurism 

Hi Elias.  You made a few comments on the above talk page, such as ""Damn Garzo, no offence, and I do mean this in a positive way, but you're truly a pain in the ass )"", ""You are obviously, like most Wikipedians, an enemy of Freedom of Speech, you have issues with someone questioning your motives, and you are also a hypocrite"", and ""That's really, because you are an old man, Garzo. You know, a little stiff. Again, no offence, but you should stop taking yourself too seriously, you know, be a little bit more humble. "".  While none of these are, taken in context, especially offensive, your tone was flippant and unhelpful (and bordering on rude in the second example, with some unfair accusations of racism 