<h1>Bias in data : Analysis 2</h1> 

<h3>Analyze the three comments datasets and answer some of the following questions</h3>
<ul><li>Analyze the words most commonly associated with each of the three types of hostile speech</li>
    <li>Are certain words more likely to be associated with comments labelled as hostile speech? Are there certain words that are frequently associated with one type of hostile speech (like “personal attacks”) but not others (like “toxicity”)?</li>
    <li>Are these words representative of words that you would associate with hostile speech? Do you think these frequently labelled words are a good representation of hostile speech in online discussions outside of Wikipedia? Of offline discussions? Why or why not?</li></ul>


<h2>
Step 1: Background
</h2>

<p>The goal of this assignment is to identify what, if any, sources of bias may exist in these datasets, and to develop testable hypotheses about how these biases might impact the behavior of machine learning models trained on the data, when those models are used for research purposes or to power data-driven applications. The purpose of this assignment is to demonstrate that you are able to perform a self-directed exploratory data analysis and think critically about the implications of your findings.</p>

<p>The corpus we use for the detox project is called the Wikipedia Talk corpus, and it consists of three datasets. Each dataset contains thousands of online discussion posts made by Wikipedia editors who were discussing how to write and edit Wikipedia articles. Crowdworkers labelled these posts for three kinds of hostile speech: “toxicity”, “aggression”, and “personal attacks”. Many posts in each dataset were labelled by multiple crowdworkers for each type of hostile speech, to improve accuracy.</p>

<p>Google data scientists used these <a href='https://figshare.com/projects/Wikipedia_Talk/16731'>annotated datasets</a> to train machine learning models as part of a project called <a href='https://conversationai.github.io/'>Conversation AI</a>. The models have been used in a variety of software products and made freely accessible to anyone through the Perspective API. </p>


<p>All data we have collected and generated for the <a href='https://meta.wikimedia.org/wiki/Research:Detox'>Wikipedia Detox</a> project is available under free licenses on the <a href='https://figshare.com/projects/Wikipedia_Talk/16731'>Wikipedia Talk Corpus on Figshare</a>, per the <a href='https://foundation.wikimedia.org/wiki/Open_access_policy'>open access policy</a>. There are currently two distinct types of data included:</p>
   <ol><li> A corpus of all 95 million user and article talk diffs made between 2001–2015 which can be scored by our personal attacks model.</li>
   <li> An annotated dataset of 1m crowd-sourced annotations that cover 100k talk page diffs (with 10 judgements per diff) for personal attacks, aggression, and toxicity.</li></ol>
<h4>These datasets can be downloaded from the links below</h4>

<ul><li>https://figshare.com/articles/dataset/Wikipedia_Talk_Labels_Aggression/4267550</li>
<li>https://figshare.com/articles/dataset/Wikipedia_Talk_Labels_Toxicity/4563973</li>
<li>https://figshare.com/articles/dataset/Wikipedia_Talk_Labels_Personal_Attacks/4054689</li>
<li>https://figshare.com/articles/dataset/Wikipedia_Talk_Corpus/4264973</li></ul>

<h2>Step 2: Analysis of Personal Attacks, Toxicity and Aggression datasets </h2>
    
    
<h4><em>We start with importing all the packages we need for doing the data analysis. </em></h4>

In [210]:
import pandas as pd
import urllib.request  # the submodule must be imported explicitly for urlretrieve
import matplotlib.pyplot as plt
import numpy as np
import collections
from collections import Counter 

<h4><em>Download the annotated comments, annotations and worker demographics datasets for personal attacks directly from the URLs and save them as .tsv files</em></h4>

In [177]:
# download annotated comments,annotations and demographics for personal attack

personal_attacks_annotations_url = 'https://ndownloader.figshare.com/files/7554637' 
personal_attacks_annotated_comments_url = 'https://ndownloader.figshare.com/files/7554634' 
personal_attacks_worker_demographics_url = 'https://ndownloader.figshare.com/files/7640752'


def download_file(url, fname):
    urllib.request.urlretrieve(url, fname)

                
download_file(personal_attacks_annotations_url , 'personal_attacks_annotations.tsv')
download_file(personal_attacks_annotated_comments_url, 'personal_attacks_annotated_comments.tsv')
download_file(personal_attacks_worker_demographics_url, 'personal_attacks_worker_demographics.tsv')

<h4><em>Download the annotated comments, annotations and worker demographics datasets for toxicity directly from the URLs and save them as .tsv files</em></h4>

In [178]:
# download annotated comments,annotations and demographic details for toxicity

toxicity_annotations_url = 'https://ndownloader.figshare.com/files/7394539' 
toxicity_annotated_comments_url = 'https://ndownloader.figshare.com/files/7394542' 
toxicity_worker_demographics_url = 'https://ndownloader.figshare.com/files/7640581'

def download_file(url, fname):
    urllib.request.urlretrieve(url, fname)
                
download_file(toxicity_annotations_url , 'toxicity_annotations.tsv')
download_file(toxicity_annotated_comments_url, 'toxicity_annotated_comments.tsv')
download_file(toxicity_worker_demographics_url, 'toxicity_worker_demographics.tsv')

<h4><em>Download the annotated comments, annotations and worker demographics datasets for aggression directly from the URLs and save them as .tsv files</em></h4>

In [179]:
# download annotated comments,annotations and demographic details for aggression

aggression_annotations_url = 'https://ndownloader.figshare.com/files/7394506' 
aggression_annotated_comments_url = 'https://ndownloader.figshare.com/files/7038038' 
aggression_worker_demographics_url = 'https://ndownloader.figshare.com/files/7640644'

def download_file(url, fname):
    urllib.request.urlretrieve(url, fname)
                
download_file(aggression_annotations_url , 'aggression_annotations.tsv')
download_file(aggression_annotated_comments_url, 'aggression_annotated_comments.tsv')
download_file(aggression_worker_demographics_url, 'aggression_worker_demographics.tsv')
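<p>The three download cells above repeat the same helper. A minimal sketch of one way to consolidate them is shown here. The dictionary <code>FIGSHARE_FILES</code>, the function <code>download_missing</code> and its <code>fetch</code> parameter are illustrative names that are not part of the original notebook; the URLs are copied from the cells above.</p>

```python
import os
import urllib.request

# URLs copied from the three download cells above; grouping them in one
# dict is an illustrative choice, not part of the original notebook.
FIGSHARE_FILES = {
    'personal_attacks_annotations.tsv': 'https://ndownloader.figshare.com/files/7554637',
    'personal_attacks_annotated_comments.tsv': 'https://ndownloader.figshare.com/files/7554634',
    'personal_attacks_worker_demographics.tsv': 'https://ndownloader.figshare.com/files/7640752',
    'toxicity_annotations.tsv': 'https://ndownloader.figshare.com/files/7394539',
    'toxicity_annotated_comments.tsv': 'https://ndownloader.figshare.com/files/7394542',
    'toxicity_worker_demographics.tsv': 'https://ndownloader.figshare.com/files/7640581',
    'aggression_annotations.tsv': 'https://ndownloader.figshare.com/files/7394506',
    'aggression_annotated_comments.tsv': 'https://ndownloader.figshare.com/files/7038038',
    'aggression_worker_demographics.tsv': 'https://ndownloader.figshare.com/files/7640644',
}


def download_missing(files, target_dir='.', fetch=urllib.request.urlretrieve):
    """Fetch every file that is not already present in target_dir.

    `fetch` defaults to urllib.request.urlretrieve; it is a parameter only
    so the helper can be exercised without network access.
    """
    fetched = []
    for fname, url in files.items():
        path = os.path.join(target_dir, fname)
        if not os.path.exists(path):
            fetch(url, path)
            fetched.append(fname)
    return fetched
```

<p>Calling <code>download_missing(FIGSHARE_FILES)</code> would fetch all nine files into the working directory, and a second call would skip files that already exist.</p>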

<h4><em>Read the tab-separated .tsv files for personal attacks, toxicity and aggression and load them into pandas DataFrames</em></h4>

In [180]:
personal_attacks_comments = pd.read_csv('personal_attacks_annotated_comments.tsv', sep = '\t', index_col = 0)
personal_attacks_demographics = pd.read_csv('personal_attacks_worker_demographics.tsv',  sep = '\t')
personal_attacks_annotations = pd.read_csv('personal_attacks_annotations.tsv',  sep = '\t')

In [181]:
toxicity_comments = pd.read_csv('toxicity_annotated_comments.tsv', sep = '\t', index_col = 0)
toxicity_demographics = pd.read_csv('toxicity_worker_demographics.tsv',  sep = '\t')
toxicity_annotations = pd.read_csv('toxicity_annotations.tsv',  sep = '\t')

In [182]:
aggression_comments = pd.read_csv('aggression_annotated_comments.tsv', sep = '\t', index_col = 0)
aggression_demographics = pd.read_csv('aggression_worker_demographics.tsv',  sep = '\t')
aggression_annotations = pd.read_csv('aggression_annotations.tsv',  sep = '\t')

<strong>Let's understand the different columns in the comments dataset</strong>

<ul><li><strong>rev_id: </strong> MediaWiki revision id of the edit that added the comment to a talk page (i.e. discussion).</li>
<li><strong>comment: </strong> Comment text. Consists of the concatenation of content added during a revision/edit of a talk page. MediaWiki markup and HTML have been stripped out. To simplify tsv parsing, \n has been mapped to NEWLINE_TOKEN, \t has been mapped to TAB_TOKEN and " has been mapped to `.</li>
<li><strong>year: </strong> The year the comment was posted in.</li>
<li><strong>logged_in: </strong> Indicator for whether the user who made the comment was logged in. Takes on values in {0, 1}.</li>
<li><strong>ns: </strong>Namespace of the discussion page the comment was made in. Takes on values in {user, article}.</li>
<li><strong>sample: </strong> Indicates whether the comment came via random sampling of all comments, or whether it came from random sampling of the 5 comments around a block event for violating WP:npa or WP:HA. Takes on values in {random, blocked}.</li>
<li><strong>split: </strong> For model building in our paper we split comments into train, dev and test sets. Takes on values in {train, dev, test}.</li></ul>

<h3> Potential sources of bias:</h3>

<ul><li><strong>Features with missing values: </strong> DataFrame.count() returns the number of non-null records in each column. If a feature were missing for a large share of the examples, that could indicate that certain characteristics of the data are under-represented. Here, no values are missing for any feature in the comments datasets.</li>
    
<p><li><strong>Omitted variable bias: </strong> The research paper highlights that the models are trained purely on features extracted from the comment text, without features based on the authors’ past behavior or the discussion context. This means that we may be omitting features that are directly correlated with the response variable. For example, people’s cultural backgrounds and personal sensibilities play a significant role in whether they perceive content as a personal attack, so information beyond the text, such as demographic information about the speaker, could improve the accuracy of personal attack detection. Background information about the author of a post may also be very predictive: a user who is known to write hate speech may do so again, while a user with no such history is less likely to. We do not have this information available. In addition, without taking context into account, the models will not be trained to generalize to unseen examples.</li></p></ul>

<strong>The code below does the following:</strong>
<ul> 
    <li>Preview the comments dataset with DataFrame.head()</li>
    <li>Count the non-null values in each column with DataFrame.count(). Every column reports the same count, from which we infer that no values are missing for any feature</li>
    <li>Repeat both steps for all three comments datasets</li>
</ul>

In [183]:
personal_attacks_comments.head()

rev_id,comment,year,logged_in,ns,sample,split
37675,`-NEWLINE_TOKENThis is not ``creative``. Thos...,2002,False,article,random,train
44816,`NEWLINE_TOKENNEWLINE_TOKEN:: the term ``stand...,2002,False,article,random,train
49851,"NEWLINE_TOKENNEWLINE_TOKENTrue or false, the s...",2002,False,article,random,train
89320,"Next, maybe you could work on being less cond...",2002,True,article,random,dev
93890,This page will need disambiguation.,2002,True,article,random,train


In [184]:
toxicity_comments.head()

rev_id,comment,year,logged_in,ns,sample,split
2232.0,This:NEWLINE_TOKEN:One can make an analogy in ...,2002,True,article,random,train
4216.0,`NEWLINE_TOKENNEWLINE_TOKEN:Clarification for ...,2002,True,user,random,train
8953.0,Elected or Electoral? JHK,2002,False,article,random,test
26547.0,`This is such a fun entry. DevotchkaNEWLINE_...,2002,True,article,random,train
28959.0,Please relate the ozone hole to increases in c...,2002,True,article,random,test


In [185]:
aggression_comments.head()

rev_id,comment,year,logged_in,ns,sample,split
37675,`-NEWLINE_TOKENThis is not ``creative``. Thos...,2002,True,article,random,train
44816,`NEWLINE_TOKENNEWLINE_TOKEN:: the term ``stand...,2002,True,article,random,train
49851,"NEWLINE_TOKENNEWLINE_TOKENTrue or false, the s...",2002,True,article,random,train
89320,"Next, maybe you could work on being less cond...",2002,True,article,random,dev
93890,This page will need disambiguation.,2002,True,article,random,train


In [186]:
personal_attacks_comments.count()

comment      115864
year         115864
logged_in    115864
ns           115864
sample       115864
split        115864
dtype: int64

In [187]:
toxicity_comments.count()

comment      159686
year         159686
logged_in    159686
ns           159686
sample       159686
split        159686
dtype: int64

In [188]:
aggression_comments.count()

comment      115864
year         115864
logged_in    115864
ns           115864
sample       115864
split        115864
dtype: int64
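<p>DataFrame.count() reports non-null counts, so missing values have to be inferred by comparing columns. A more direct check is sketched below under the assumption that pandas is available; <code>missing_summary</code> is a hypothetical helper name, not something defined elsewhere in this notebook.</p>

```python
import pandas as pd


def missing_summary(df):
    """Per-column count and share of missing (NaN/None) values."""
    n_missing = df.isna().sum()
    return pd.DataFrame({
        'n_missing': n_missing,
        'share_missing': n_missing / len(df),
    })
```

<p>For example, <code>missing_summary(personal_attacks_comments)</code> would list every column with its number of missing entries, which should all be zero if the inference above holds.</p>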

<strong>The code below does the following:</strong>
<ul><li>Remove newline and tab tokens from the <strong>personal_attacks_comments</strong> dataset</li>
    <li>Label a comment as an attack if the majority of annotators did so. We use a threshold of 0.5: a comment is considered an attack when more than 50% of its annotators labelled it as one</li>
    <li>Join the labels to the comments</li>
    <li>Comments where attack is True can then be selected with DataFrame.query</li>
    <li>Display the first rows with DataFrame.head()</li>
</ul>

In [189]:
personal_attacks_comments['comment'] = personal_attacks_comments['comment'].apply(lambda x: x.replace("NEWLINE_TOKEN", " "))
personal_attacks_comments['comment'] = personal_attacks_comments['comment'].apply(lambda x: x.replace("TAB_TOKEN", " "))
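<p>The same two replacements are applied to each of the three datasets. As a small sketch, they could be folded into one helper that handles both tokens in a single pass; <code>clean_comment</code> is an illustrative name and is not used elsewhere in this notebook.</p>

```python
import re

# Placeholder tokens named in the column description above.
_TOKEN_RE = re.compile(r'NEWLINE_TOKEN|TAB_TOKEN')


def clean_comment(text):
    """Replace each placeholder token with one space, as the cell above does."""
    return _TOKEN_RE.sub(' ', text)
```

<p>It could be applied with, for instance, <code>personal_attacks_comments['comment'].map(clean_comment)</code>.</p>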

In [190]:
labels = personal_attacks_annotations.groupby('rev_id')['attack'].mean() > 0.5
personal_attacks_comments['attack'] = labels
personal_attacks_comments.head()

rev_id,comment,year,logged_in,ns,sample,split,attack
37675,`- This is not ``creative``. Those are the di...,2002,False,article,random,train,False
44816,` :: the term ``standard model`` is itself le...,2002,False,article,random,train,False
49851,"True or false, the situation as of March 200...",2002,False,article,random,train,False
89320,"Next, maybe you could work on being less cond...",2002,True,article,random,dev,False
93890,This page will need disambiguation.,2002,True,article,random,train,False
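<p>To make the majority rule concrete, the sketch below applies the same groupby-and-threshold step to a toy annotation table. The rev_id values and votes are made up for illustration. Note that the strict inequality means a tie is not labelled as an attack.</p>

```python
import pandas as pd

# Toy annotations: three comments, each judged by four hypothetical workers.
toy_annotations = pd.DataFrame({
    'rev_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'attack': [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1],
})

# Fraction of annotators who flagged each comment.
attack_share = toy_annotations.groupby('rev_id')['attack'].mean()

# Strict majority: a 2-2 tie (rev_id 2, share 0.5) is NOT labelled an attack.
majority_label = attack_share > 0.5
```

<p>Because the resulting Series is indexed by rev_id, assigning it to a column of a comments frame that is also indexed by rev_id aligns the labels with the right comments.</p>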


<strong>The code below does the following:</strong>
<ul><li>Remove newline and tab tokens from the <strong>toxicity_comments</strong> dataset</li>
    <li>Label a comment as toxic if the majority of annotators did so. We use a threshold of 0.5: a comment is considered toxic when more than 50% of its annotators labelled it as toxic</li>
    <li>Join the labels to the comments</li>
    <li>Comments where toxicity is True can then be selected with DataFrame.query</li>
    <li>Display the first rows with DataFrame.head()</li>
</ul>

In [191]:
toxicity_comments['comment'] = toxicity_comments['comment'].apply(lambda x: x.replace("NEWLINE_TOKEN", " "))
toxicity_comments['comment'] = toxicity_comments['comment'].apply(lambda x: x.replace("TAB_TOKEN", " "))

In [192]:
labels = toxicity_annotations.groupby('rev_id')['toxicity'].mean() > 0.5
toxicity_comments['toxicity'] = labels
toxicity_comments.head()

rev_id,comment,year,logged_in,ns,sample,split,toxicity
2232.0,This: :One can make an analogy in mathematical...,2002,True,article,random,train,False
4216.0,` :Clarification for you (and Zundark's righ...,2002,True,user,random,train,False
8953.0,Elected or Electoral? JHK,2002,False,article,random,test,False
26547.0,`This is such a fun entry. Devotchka I once...,2002,True,article,random,train,False
28959.0,Please relate the ozone hole to increases in c...,2002,True,article,random,test,False


<strong>The code below does the following:</strong>
<ul><li>Remove newline and tab tokens from the <strong>aggression_comments</strong> dataset</li>
    <li>Label a comment as aggressive if the majority of annotators did so. We use a threshold of 0.5: a comment is considered aggressive when more than 50% of its annotators labelled it as aggressive</li>
    <li>Join the labels to the comments</li>
    <li>Comments where aggression is True can then be selected with DataFrame.query</li>
    <li>Display the first rows with DataFrame.head()</li>
</ul>

In [193]:
aggression_comments['comment'] = aggression_comments['comment'].apply(lambda x: x.replace("NEWLINE_TOKEN", " "))
aggression_comments['comment'] = aggression_comments['comment'].apply(lambda x: x.replace("TAB_TOKEN", " "))

In [194]:
labels = aggression_annotations.groupby('rev_id')['aggression'].mean() > 0.5
aggression_comments['aggression'] = labels
aggression_comments.head()

rev_id,comment,year,logged_in,ns,sample,split,aggression
37675,`- This is not ``creative``. Those are the di...,2002,True,article,random,train,False
44816,` :: the term ``standard model`` is itself le...,2002,True,article,random,train,False
49851,"True or false, the situation as of March 200...",2002,True,article,random,train,False
89320,"Next, maybe you could work on being less cond...",2002,True,article,random,dev,False
93890,This page will need disambiguation.,2002,True,article,random,train,False


<p>The function <strong>kFreqWord</strong> takes the cleaned comments labelled as hostile and an integer k, and returns the k most frequent tokens together with their counts. As written, it counts only the first whitespace-separated token of each comment.</p>

In [228]:
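def kFreqWord(str_list, k):
    # NOTE: i.split()[0] keeps only the first whitespace-separated token of
    # each comment, so the counts are over comment-opening tokens.
    l = str_list.tolist()
    # Skip whitespace-only comments, which would otherwise raise IndexError.
    split_it = [i.split()[0] for i in l if i.strip()]
    count = Counter(split_it)
    most_occur = count.most_common(k)
    return most_occur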
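<p>Since kFreqWord above looks only at the first token of each comment, a count over every token may be closer to what the question asks for. The following is a minimal sketch under the assumption that lower-casing and a simple regular-expression tokenizer are acceptable; <code>top_k_tokens</code> is a hypothetical name, and it is not the function that produced the outputs shown in this notebook.</p>

```python
import re
from collections import Counter


def top_k_tokens(comments, k):
    """Return the k most common lower-cased word tokens across all comments."""
    counts = Counter()
    for text in comments:
        # [\w']+ matches runs of word characters and apostrophes, so
        # contractions such as "you're" stay in one piece and markup such
        # as "==" is dropped.
        counts.update(re.findall(r"[\w']+", text.lower()))
    return counts.most_common(k)
```

<p>Very common function words would still dominate such a list, so a stop-word filter may be worth adding before interpreting the result.</p>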

<p>The function <strong>identityInComment</strong> takes two arguments: the list of comments perceived as personal attacks and a list of identity words, which represent frequently targeted groups. It returns the total number of identity-word matches across the comments, the number of comments examined, and the distinct words that matched. Each word is counted at most once per comment; for example, a comment containing both "black" and "woman" adds 2 to the count.</p>

In [229]:
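def identityInComment(str_list, identity_list):
    """Return (match count, comment count, unique matched words).

    Matching is a case-sensitive substring test, so 'hate' also matches
    'whatever'. Each word is counted at most once per comment.
    """
    count_wrd = 0
    count_comment = 0
    wrd_list = []
    for s in str_list:
        count_comment += 1
        for wrd in identity_list:
            if wrd in s:
                wrd_list.append(wrd)
                count_wrd += 1

    return count_wrd, count_comment, list(set(wrd_list))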
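<p>The substring test in identityInComment can over-count (for example, 'kill' inside 'skill') and misses capitalised forms. A hedged alternative is sketched below that matches whole words case-insensitively. <code>count_term_matches</code> is an illustrative name, and its counts would differ from the figures printed later in this notebook.</p>

```python
import re


def count_term_matches(comments, terms):
    """Count whole-word, case-insensitive matches of each term.

    Returns (total_matches, comments_with_a_match, sorted_matched_terms).
    """
    patterns = {t: re.compile(r'\b' + re.escape(t) + r'\b', re.IGNORECASE)
                for t in terms}
    total = 0
    comments_hit = 0
    matched = set()
    for text in comments:
        hit = False
        for term, pattern in patterns.items():
            n = len(pattern.findall(text))
            if n:
                total += n
                matched.add(term)
                hit = True
        comments_hit += hit  # True counts as 1, False as 0
    return total, comments_hit, sorted(matched)
```

<p>Unlike the original function, this sketch counts repeated occurrences within one comment and reports how many comments contain at least one term, which makes per-comment rates easier to compute.</p>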

<h3>Analyze the words most commonly associated with each of the three types of hostile speech</h3>

<ul><p><li>We call the function <strong>identityInComment</strong>, passing in the cleaned comments that are perceived as personal attacks together with a list of identity words representing frequently targeted groups.

From the output of the code below, we can infer that the searched words appearing in comments labelled with each of the three types of hostile speech are <strong>'deaf', 'blind', 'muslim', 'gay', 'black', 'woman', 'sexuality', 'feminist', 'destroy', 'loser', 'kill', 'hate', 'attack'</strong></li></p>

<p><li>The function <strong>kFreqWord</strong> takes two arguments and provides the list of the top k most frequent tokens in the comments.</li></p>
</ul>

<strong>The code below does the following:</strong>
<ul><li>Select the comments where the attack column is True with DataFrame.query and store them in 'str_list'</li>
    <li>Define 'identity_list', a list of identity words</li>
    <li>Define 'hatespeech_list', a list of words associated with hate speech</li>
    <li>Call identityInComment with str_list and identity_list to get the number of identity-word matches, the number of comments searched, and the identity words that matched</li>
    <li>Call identityInComment with str_list and hatespeech_list to get the same three values for the hate-speech words</li>
    <li>Call kFreqWord to list the top 50 tokens in the comments</li>
</ul>

In [235]:
str_list = personal_attacks_comments.query('attack')['comment']
identity_list = ['black', 'muslim', 'feminist', 'woman', 'gay','deaf','blind','African','Asian','sexuality']
hatespeech_list = ['kill','hate','loser','destroy','attack','terror']
print()
count_wrd,count_comment,lst = identityInComment(str_list,identity_list)  
print("Number of times identity words appeared in comments that are perceived as attacks ==>", count_wrd , " times identity words in" , count_comment, " comments ") 
print("Identity words that appeared in the list: ",lst)
print()
count_wrd_hate,count_comment_hate,lst = identityInComment(str_list,hatespeech_list) 
print("Number of times hate-speech words appeared in comments that are perceived as attacks ==>", count_wrd_hate , " times hateful words in" , count_comment_hate, " comments ") 
print("Hate-speech words that appeared in the list: ",lst)
print()
topk_occur = kFreqWord(str_list,50)
print("Most frequently occuring words in comments: ",topk_occur) 


Number of times identity words appeared in comments that are perceived as attacks ==> 717  times identity words in 13590  comments 
Identity words that appeared in the list:  ['African', 'Asian', 'deaf', 'blind', 'muslim', 'gay', 'black', 'woman', 'sexuality', 'feminist']

Number of times hate-speech words appeared in comments that are perceived as attacks ==> 1707  times hateful words in 13590  comments 
Hate-speech words that appeared in the list:  ['destroy', 'loser', 'kill', 'terror', 'hate', 'attack']

Most frequently occuring words in comments:  [('==', 3499), ('`', 1622), ('I', 408), ('You', 303), ('Fuck', 120), ('you', 114), ('YOU', 90), ('fuck', 77), ('FUCK', 74), ('This', 74), ('What', 67), (':', 67), ('Hey', 65), ('.', 65), ('i', 64), ('::', 63), (',', 63), ('Why', 62), ('Go', 60), ("You're", 57), ('Your', 52), ('hey', 42), ('Oh', 41), ('The', 40), ("I'm", 35), ('and', 35), ('How', 35), ('Please', 34), ('Are', 34), ('If', 31), ('And', 31), ('Stop', 30), ('Do', 29), ('==You'

<strong>The code below does the following:</strong>
<ul><li>Select the comments where the toxicity column is True with DataFrame.query and store them in 'str_list'</li>
    <li>Define 'identity_list', a list of identity words</li>
    <li>Define 'hatespeech_list', a list of words associated with hate speech</li>
    <li>Call identityInComment with str_list and identity_list to get the number of identity-word matches, the number of comments searched, and the identity words that matched</li>
    <li>Call identityInComment with str_list and hatespeech_list to get the same three values for the hate-speech words</li>
    <li>Call kFreqWord to list the top 50 tokens in the comments</li>
</ul>

In [236]:
str_list = toxicity_comments.query('toxicity')['comment']
identity_list = ['black', 'muslim', 'feminist', 'woman', 'gay','deaf','blind','African','Asian','sexuality']
hatespeech_list = ['kill','hate','loser','destroy','attack','terror']
print()
count_wrd,count_comment,lst = identityInComment(str_list,identity_list)
print("Number of times identity words appeared in comments that are perceived as toxic ==>", count_wrd , " times identity words in" , count_comment, " comments ")  
print("Identity words that appeared in the list: ",lst)
print()
count_wrd_hate,count_comment_hate,lst = identityInComment(str_list,hatespeech_list) 
print("Number of times hate-speech words appeared in comments that are perceived as toxic ==>", count_wrd_hate , " times hateful words in" , count_comment_hate, " comments ")
print("Hate-speech words that appeared in the list: ",lst)
print()
topk_occur = kFreqWord(str_list,50)
print("Most frequently occuring words in comments: ",topk_occur) 


Number of times identity words appeared in comments that are perceived as toxic ==> 812  times identity words in 15362  comments 
Identity words that appeared in the list:  ['African', 'Asian', 'deaf', 'blind', 'muslim', 'gay', 'black', 'woman', 'sexuality', 'feminist']

Number of times hate-speech words appeared in comments that are perceived as toxic ==> 1891  times hateful words in 15362  comments 
Hate-speech words that appeared in the list:  ['destroy', 'loser', 'kill', 'terror', 'hate', 'attack']

Most frequently occuring words in comments:  [('==', 3888), ('`', 1861), ('I', 454), ('You', 313), ('you', 122), ('Fuck', 119), ('YOU', 101), ('This', 85), ('What', 81), ('fuck', 81), (':', 80), ('.', 76), ('FUCK', 74), ('Hey', 71), ('::', 69), (',', 67), ('i', 66), ('Why', 63), ("You're", 63), ('Go', 62), ('The', 51), ('Your', 50), ("I'm", 46), ('Oh', 44), ('If', 41), ('and', 40), ('hey', 40), ('And', 39), ('Stop', 39), (':I', 35), (':::', 33), ('How', 33), ('Are', 33), ('Please', 31)

<strong>The code below does the following:</strong>
<ul><li>Select the comments where the aggression column is True with DataFrame.query and store them in 'str_list'</li>
    <li>Define 'identity_list', a list of identity words</li>
    <li>Define 'hatespeech_list', a list of words associated with hate speech</li>
    <li>Call identityInComment with str_list and identity_list to get the number of identity-word matches, the number of comments searched, and the identity words that matched</li>
    <li>Call identityInComment with str_list and hatespeech_list to get the same three values for the hate-speech words</li>
    <li>Call kFreqWord to list the top 50 tokens in the comments</li>
</ul>

In [237]:
str_list = aggression_comments.query('aggression')['comment']
identity_list = ['black', 'muslim', 'feminist', 'woman', 'gay','deaf','blind','sexuality']
hatespeech_list = ['kill','hate','loser','destroy','attack'] 
print()
count_wrd,count_comment,lst = identityInComment(str_list,identity_list) 
print("Number of times identity words appeared in comments that are perceived as aggressive ==>", count_wrd , " identity words in" , count_comment, " comments")
print("Identity words that appeared in the list: ",lst)
print()
count_wrd_hate,count_comment_hate,lst = identityInComment(str_list,hatespeech_list) 
print("Number of times hate-speech words appeared in comments that are perceived as aggressive ==>", count_wrd_hate , " hateful words in" , count_comment_hate, " comments")
print("Hate-speech words that appeared in the list: ",lst)
print()
topk_occur = kFreqWord(str_list,50)
print("Most frequently occuring words in comments: ",topk_occur) 


Number of times identity words appeared in comments that are perceived as aggressive ==> 712  identity words in 14782  comments
Identity words that appeared in the list:  ['deaf', 'blind', 'muslim', 'gay', 'black', 'woman', 'sexuality', 'feminist']

Number of times hate-speech words appeared in comments that are perceived as aggressive ==> 1841  hateful words in 14782  comments
Hate-speech words that appeared in the list:  ['destroy', 'loser', 'kill', 'hate', 'attack']

Most frequently occuring words in comments:  [('==', 3705), ('`', 1889), ('I', 452), ('You', 321), ('Fuck', 120), ('you', 114), ('YOU', 91), ('This', 81), ('fuck', 77), ('::', 76), ('What', 75), ('FUCK', 74), (':', 73), (',', 71), ('.', 71), ('Why', 70), ('Hey', 67), ('i', 63), ("You're", 61), ('Go', 61), ('Your', 55), ('The', 46), ('hey', 42), ('Oh', 42), ("I'm", 41), ('Please', 40), ('Are', 37), ('And', 36), ('How', 36), ('Stop', 35), ('and', 35), ('If', 35), (':::', 32), ('Who', 30), ('*', 29), ('==You', 28), ('Do',

<h3>Are certain words more likely to be associated with comments labelled as hostile speech? Are there certain words that are frequently associated with one type of hostile speech (like “personal attacks”) but not others (like “toxicity”)?</h3>

<p>Apart from a few profanities, the top 50 tokens reported by kFreqWord are not hate-speech words. This suggests that hate-speech words occur much less frequently than ordinary words. Among the words we searched for, however, identity terms such as 'deaf', 'blind', 'muslim', 'gay', 'black', 'woman', 'sexuality' and 'feminist' appear multiple times in comments labelled as hostile.</p>
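<p>The raw match counts printed above are easier to compare once they are divided by the number of labelled comments in each dataset. The sketch below does that arithmetic with the printed figures. The helper name <code>matches_per_100</code> is illustrative, and the aggression run used shorter word lists, so its rates are not strictly like-for-like.</p>

```python
# Counts copied from the printed outputs above:
# (identity-word matches, hate-speech-word matches, labelled comments).
printed_counts = {
    'personal attacks': (717, 1707, 13590),
    'toxicity': (812, 1891, 15362),
    'aggression': (712, 1841, 14782),  # searched with shorter word lists
}


def matches_per_100(matches, n_comments):
    """Matches per 100 labelled comments."""
    return 100.0 * matches / n_comments


rates = {
    name: (matches_per_100(identity, n), matches_per_100(hate, n))
    for name, (identity, hate, n) in printed_counts.items()
}

for name, (identity_rate, hate_rate) in rates.items():
    print(f'{name}: identity {identity_rate:.2f}, '
          f'hate-speech {hate_rate:.2f} per 100 comments')
```

<p>A match here is one searched word found in one comment, so a rate of about 5 per 100 does not mean that 5% of comments contain an identity word.</p>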

<p>The words <strong>'African'</strong> and <strong>'Asian'</strong> appear among the identity terms matched for personal attacks and toxicity, but not for aggression. This difference should be read with care, because the identity list used for the aggression comments did not include these two words. As this <a href='https://cdt.org/wp-content/uploads/2017/12/FAT-conference-draft-2018.pdf'>article</a> points out, there is very low agreement between coders’ annotations of text as hate speech. The hate speech detection literature gives very few details about how texts have been annotated, which makes it difficult to evaluate where error or bias may be occurring. Often, context and minor semantic differences separate hate speech from benign speech, so we need clear, consistent definitions of the type of speech to be identified. Translating an abstract definition into a clearer and more concrete one can make annotation easier, but models trained on these narrow definitions may miss some of the targeted speech, may be easier to evade, and may disproportionately target one or more subtypes of the targeted speech.</p>


<p><li><strong>Unintended bias: </strong> Frequently targeted groups, represented by identity words such as “black”, “muslim”, “feminist”, “woman” and “gay”, are over-represented in abusive and toxic comments, and it is much rarer for these words to appear in positive, affirming statements. The training data exhibit the same trend, so machine learning models trained on these comments adopt the biases that exist in the underlying distribution.</li></p>

<p><li><strong>False positives: </strong> Flagging identity terms as hate speech results in false positives. There is little agreement on what actually constitutes hate speech. Translating an abstract definition into a clearer and more concrete one can make annotation easier, but doing so comes with its own risks: tools that rely on narrow definitions will miss some of the targeted speech, may be easier to evade, and may be more likely to disproportionately target one or more subtypes of the targeted speech. A general rule holds that false negatives and false positives should be balanced. However, this assumption ignores the particular stakes of decisions that affect a person’s human rights, liberty interests, or access to benefits.</li></p>
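<p>One way to check the over-representation claim above is to compare the share of hostile labels among comments that mention a term with the overall share. The sketch below is a minimal illustration. <code>hostile_share_by_term</code> is a hypothetical helper, it uses a simple case-insensitive substring test, and the toy comments in the usage note are placeholders rather than real data.</p>

```python
def hostile_share_by_term(comments, labels, term):
    """Compare the hostile share among comments containing `term`
    with the overall hostile share.

    comments: iterable of str; labels: iterable of bool (True = hostile).
    Returns (share_with_term, overall_share); share_with_term is None
    when no comment contains the term.
    """
    pairs = list(zip(comments, labels))
    if not pairs:
        raise ValueError('at least one comment is required')
    needle = term.lower()
    with_term = [label for text, label in pairs if needle in text.lower()]
    overall = sum(label for _, label in pairs) / len(pairs)
    share = sum(with_term) / len(with_term) if with_term else None
    return share, overall


# Placeholder texts and labels, made up purely to show the call.
toy_comments = ['neutral text mentioning woman', 'hostile text mentioning woman',
                'hostile text', 'neutral text', 'neutral text']
toy_labels = [False, True, True, False, False]
toy_result = hostile_share_by_term(toy_comments, toy_labels, 'woman')
```

<p>On the real data the call might look like <code>hostile_share_by_term(toxicity_comments['comment'], toxicity_comments['toxicity'], 'gay')</code>. A share well above the overall rate would support the claim for that term.</p>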



<h3>Are these words representative of words that you would associate with hostile speech? Do you think these frequently labelled words are a good representation of hostile speech in online discussions outside of Wikipedia? Of offline discussions? Why or why not?</h3>

<ul><li>Identity words such as “black”, “muslim”, “feminist”, “woman” and “gay” are not, in themselves, words that you would associate with hostile speech. They name frequently targeted groups, however, and are over-represented in abusive and toxic comments. The training data exhibit the same trend, so machine learning models trained on these comments adopt the biases that exist in the underlying distribution. These identity terms appear far more often in abusive comments than in positive, affirming statements.</li>

<p><li>The frequently labelled words are not a good representation of hostile speech outside Wikipedia. The raw data used for training is limited and does not cover all the words that can be used to express hatefulness. According to this <a href='https://cdt.org/wp-content/uploads/2017/12/FAT-conference-draft-2018.pdf'>article</a>, white supremacists have used innocuous terms, including the names of companies (“Google”, “Skype”, and “Yahoo”), as stand-ins for racial and ethnic slurs. Users seeking to convey hateful messages can quickly adapt and begin using novel terms and phrases. Non-English languages are under-represented in the training data, so the models have fewer examples to learn from and are less accurate for those languages. Because the corpus covers very few languages, these words are not a good representation outside it. Finally, the models were trained on a Wikipedia corpus that consists mostly of informative discussion, so they may not perform well outside this domain.</li></ul></p>
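
<p>One way to check the over-representation claim is to compare the share of toxic comments among those that mention an identity term with the share across all comments. The sketch below does this on a small hypothetical frame; the column names <code>comment</code> and <code>toxicity</code>, the toy rows, and the two-term list are assumptions made for illustration, not values from the datasets.</p>

```python
import pandas as pd

# Hypothetical toy frame; column names and rows are assumptions for illustration.
toy = pd.DataFrame({
    "comment": [
        "gay people like you should be banned",
        "this muslim editor is a vandal",
        "the article on gay rights needs a citation",
        "please add a source for this claim",
        "thanks for the quick fix",
        "you are a worthless idiot",
        "stop your gay agenda",
        "i reverted the edit, see the talk page",
    ],
    "toxicity": [True, True, False, False, False, True, True, False],
})

IDENTITY_TERMS = ["gay", "muslim"]  # assumed subset of the terms named in the text

# Share of all comments labelled toxic.
base_rate = toy["toxicity"].mean()

rows = []
for term in IDENTITY_TERMS:
    # Whole-word, case-insensitive match for the term.
    has_term = toy["comment"].str.contains(rf"\b{term}\b", case=False, regex=True)
    rows.append({
        "term": term,
        "n_comments": int(has_term.sum()),
        "toxic_share": toy.loc[has_term, "toxicity"].mean(),
    })

summary = pd.DataFrame(rows)
# A ratio above 1 means the term appears in toxic comments more often than the base rate.
summary["ratio_to_base_rate"] = summary["toxic_share"] / base_rate
print(summary)
```

<p>Applied to the real comment frames, a ratio well above 1 for a neutral identity term would suggest that a model trained on the data could learn to treat the term itself as a signal of toxicity.</p>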

<h3> Other sources of bias</h3>

<p><li><strong>Implicit or experimenter's bias: </strong>Training classifiers requires a corpus that contains a sufficient number and variety of examples of personal attacks. To balance representativeness against the overall prevalence of personal attacks, comments are sampled randomly from the full corpus and also from the blocked dataset, which contains comments made by users who were blocked for violating Wikipedia’s policy on personal attacks. This can introduce experimenter's bias, because the experimenter assumes that comments from the blocked dataset are indeed attacks. On closer inspection, the blocked dataset may itself be biased: automation tools could have scored its comments and introduced some sort of bias into the data. Moreover, even where the comments are attacks, they need not be personal attacks. Personal attacks are also likely to be under-represented in the randomly sampled subset of the corpus.</li></p>

<p><li><strong>Selection bias and false positives: </strong>This type of bias occurs when a model itself influences the generation of the data used to train it. The blocked dataset contains comments made by users who were blocked for violating Wikipedia’s policy on personal attacks, but some of these comments could have been wrongly scored by automation tools whose machine learning models carry induced biases, making them false positives. The counts below indicate that not all comments from the blocked dataset are necessarily attacks: 65126 comments from the blocked dataset may not be attacks at all.</li></p></ul>

In [None]:
# Restrict the personal-attacks comments to the blocked-user sample.
df1 = personal_attacks_comments
blocked = df1[df1['sample'] == 'blocked']
# Count blocked-sample comments by attack label.
print("# records from blocked dataset considered an attack: ", (blocked['attack'] == True).sum())
print("# records from blocked dataset considered not an attack: ", (blocked['attack'] == False).sum())

In [None]:
# Restrict the toxicity comments to the blocked-user sample.
df1 = toxicity_comments
blocked = df1[df1['sample'] == 'blocked']
# Count blocked-sample comments by toxicity label.
print("# records from blocked dataset considered toxic: ", (blocked['toxicity'] == True).sum())
print("# records from blocked dataset considered not toxic: ", (blocked['toxicity'] == False).sum())

In [None]:
# Restrict the aggression comments to the blocked-user sample.
df1 = aggression_comments
blocked = df1[df1['sample'] == 'blocked']
# Count blocked-sample comments by aggression label.
print("# records from blocked dataset considered aggressive: ", (blocked['aggression'] == True).sum())
print("# records from blocked dataset considered not aggressive: ", (blocked['aggression'] == False).sum())
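
<p>The three cells above each report two counts for the blocked sample only. As a possible complement, a cross-tabulation shows the counts and the within-sample shares for every sample at once, which makes it easier to compare the blocked sample against the random one. The frame below is a hypothetical stand-in that only mirrors the <code>sample</code> and <code>attack</code> columns used above; its rows are invented for illustration.</p>

```python
import pandas as pd

# Hypothetical toy frame mirroring the `sample` and `attack` columns; rows are invented.
toy = pd.DataFrame({
    "sample": ["blocked", "blocked", "blocked", "random", "random", "random", "random"],
    "attack": [True, False, False, False, False, True, False],
})

# Number of comments for every (sample, label) combination.
counts = pd.crosstab(toy["sample"], toy["attack"])

# Row-normalised shares: fraction of each sample labelled not-attack / attack.
shares = pd.crosstab(toy["sample"], toy["attack"], normalize="index")

print(counts)
print(shares.round(2))
```

<p>Replacing <code>toy</code> with one of the comment frames, and <code>attack</code> with the matching label column, would give the same breakdown for the real data.</p>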