### Perspective API Exploration

First, we have a dataset of Wikipedia comments made available by Jigsaw, a subsidiary of Google that created the Perspective tool. The dataset includes a unique comment id, the text of the comment, and a series of binary labels applied by human raters: "toxic," "severe_toxic," "obscene," "threat," "insult," and "identity_hate." I have appended the "score" column, which represents the toxicity score assigned to the comment text by the live version of the Perspective API. The data is available under a CC0 license.

In [1]:
import pandas as pd
import time
import re
import numpy as np
import itertools
from itertools import compress

df = pd.read_csv('labeled_and_scored_comments.csv')

In [2]:
df.sort_values(['score'])

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
1158,61a85266de70cca0,Можешь говорить по русски.,0.000004,0,0,0,0,0,0
5246,030a9ba0b32b4456,Rodzina Krusenstern Potomkowie Admirała Adama ...,0.000009,0,0,0,0,0,0
11040,a6814193ac6b4705,The opening paragraph states that the game is ...,0.000091,0,0,0,0,0,0
24192,f0d6dee96e057a5b,The book is used as a source in a number of ar...,0.000116,0,0,0,0,0,0
2332,cbb9268d3c7429c1,Vals Vienés \n |Rumba \n |Cha Cha Chá(ciclo 7),0.000131,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
19994,1429a2c930688317,you fucking piece of shit,0.997278,1,1,1,0,1,0
28728,d82e45eb103ad07b,fuck off you stupid aspy asshole,0.997736,1,1,1,0,1,0
25535,e9b92c6c6b01aeef,FUCK YOU YOU PIECE OF SHIT \n FUCK YOU YOU PIE...,0.997982,1,1,1,0,1,0
25945,e783fd267f3a9d3b,FUCK WIKIPEDIA ON WHEELS! \n\nFuck off wikiped...,0.998136,1,1,1,0,1,0


I've also included a function to make calls to the Perspective API for your own testing. You will need to generate your own API key according to the instructions in the assignment.

In [2]:
from googleapiclient.discovery import build
import json

def get_toxicity_score(comment):
    """Return the Perspective API TOXICITY summary score for a comment."""
    API_KEY = 'YOUR_API_KEY'  # Put your API key here

    # Build a client for the Comment Analyzer (Perspective) API
    client = build(
        "commentanalyzer",
        "v1alpha1",
        developerKey=API_KEY,
        discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
        static_discovery=False,
    )

    # Request only the TOXICITY attribute
    analyze_request = {
        'comment': {'text': comment},
        'requestedAttributes': {'TOXICITY': {}},
    }

    response = client.comments().analyze(body=analyze_request).execute()
    toxicity_score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    return toxicity_score
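Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is stored in an environment variable; the variable name `PERSPECTIVE_API_KEY` and the helper `load_api_key` are hypothetical and not part of the assignment or the API:

```python
import os

def load_api_key(env=None, name="PERSPECTIVE_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error.

    `name` is a made-up variable name used only for this illustration.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

Inside `get_toxicity_score`, the assignment could then read `API_KEY = load_api_key()`.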

We can call this function with original comments like this:

In [4]:
get_toxicity_score("I love you")

0.05566647

The API allows one request per second, so we insert a time.sleep(1) call between consecutive requests to pause execution for one second.

In [6]:
comment_list = ['hello', 'how are you', 'fine thanks']

for comment in comment_list:
    score = get_toxicity_score(comment)
    print(comment, score)
    time.sleep(1)

hello 0.054463096
how are you 0.07340407
fine thanks 0.031898104
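The loop above stops at the first failed request. A minimal sketch of a reusable helper that pauses between requests and records failures instead of stopping; `score_comments` and its parameters are hypothetical names, and a stand-in scorer replaces the real API call so the example runs without a key:

```python
import time

def score_comments(comments, scorer, delay=1.0, sleep=time.sleep):
    """Score each comment with `scorer`, pausing `delay` seconds between calls.

    Failed calls are recorded as None instead of stopping the loop.
    """
    results = {}
    for i, comment in enumerate(comments):
        if i > 0:
            sleep(delay)  # wait only between consecutive requests
        try:
            results[comment] = scorer(comment)
        except Exception as err:  # e.g. an HttpError raised by the API client
            print(f"skipped {comment!r}: {err}")
            results[comment] = None
    return results

# Stand-in scorer for illustration; in the notebook this would be get_toxicity_score
fake_scorer = lambda text: len(text) / 100
print(score_comments(['hello', 'how are you'], fake_scorer, delay=0))
```

Passing `sleep` as a parameter is only a convenience for testing the helper without actually waiting.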


### Exploring the unlabeled data 

We can observe that the highest-scoring comments contain easily identifiable words such as 'fuck':

In [8]:
df.sort_values(['score']).tail(10)

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
22535,f9634e62820eed00,"FUCK YOU!!!!=\n FUCK YOU, YOU STUPID BASTARD...",0.996251,1,1,1,0,1,0
33772,bd4b78896967f1e5,"Hello ScrapIronIV, where is you problem you mo...",0.996289,1,1,1,0,1,0
18792,26fed2949c2dadc1,you son of a bitch RIP you motherfucker,0.996669,1,0,1,0,1,0
35203,b567469c10620312,FUCK YOU FUCK YOU FUCK YOUa,0.996707,1,1,1,0,1,0
6964,e88321c89a619419,Stupid fucking bitch rot in hell piece of shit,0.996872,1,1,1,0,1,0
19994,1429a2c930688317,you fucking piece of shit,0.997278,1,1,1,0,1,0
28728,d82e45eb103ad07b,fuck off you stupid aspy asshole,0.997736,1,1,1,0,1,0
25535,e9b92c6c6b01aeef,FUCK YOU YOU PIECE OF SHIT \n FUCK YOU YOU PIE...,0.997982,1,1,1,0,1,0
25945,e783fd267f3a9d3b,FUCK WIKIPEDIA ON WHEELS! \n\nFuck off wikiped...,0.998136,1,1,1,0,1,0
17833,364b6c24210e2345,Fuck you you fucking pig!\n\nYou motherfucking...,0.998329,1,0,1,0,1,0


Let's explore what the comments with no labels but high scores look like:

In [3]:
df_no_label = df[(df['toxic'] == 0) & (df['severe_toxic'] == 0) & (df['obscene'] == 0) & (df['threat'] == 0) & (df['insult'] == 0) & (df['identity_hate'] == 0)]
df_no_label.sort_values(['score']).tail(10)

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
16945,4459bdf71635b1c3,LOL PENIS! \n\nLOL!\n\nI AM TALKING TO YOU SO ...,0.948086,0,0,0,0,0,0
520,281a24af9612bca8,mattythewhite is a prick.,0.948086,0,0,0,0,0,0
1359,71b808d9d85cd9c0,""" \n\n == Propose rename to """"Pretentious assh...",0.948338,0,0,0,0,0,0
18710,286c1d0f6a3e9e85,Khurki is a dumb Assk Jeeves?,0.949928,0,0,0,0,0,0
40576,988d4bc7265812f7,I'm going to go with both but I could be wrong...,0.949928,0,0,0,0,0,0
3075,9ed499e6a13328e2,jesus is completely fake plz go away fags,0.955174,0,0,0,0,0,0
2426,d72fbec157ade08e,Shout out to https://en.wikipedia.org/wiki/Use...,0.959467,0,0,0,0,0,0
4436,59581cdba9297d11,== YOU suck == \n YOU SUCK ~~~~Insert non-form...,0.967137,0,0,0,0,0,0
3884,e4c69d7040f12681,"""Whatever everyone knowz """"B"""" is his ONE&ONLY...",0.973927,0,0,0,0,0,0
1057,56902f9d41d0696c,fuck y'all all of yall,0.977703,0,0,0,0,0,0
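The six-way boolean condition can also be written by summing the label columns. A minimal sketch, assuming every label is 0 or 1 as in this dataset; the helper name `unlabeled` and the toy rows are made up for illustration:

```python
import pandas as pd

LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def unlabeled(frame, labels=LABELS):
    """Return the rows where every label column is 0."""
    return frame[frame[labels].sum(axis=1) == 0]

# Tiny made-up frame, not taken from the real dataset
toy = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'score': [0.9, 0.1, 0.8],
    'toxic': [0, 0, 1], 'severe_toxic': [0, 0, 0], 'obscene': [0, 0, 0],
    'threat': [0, 0, 0], 'insult': [0, 0, 0], 'identity_hate': [0, 0, 0],
})
print(unlabeled(toy).sort_values('score').tail(10))
```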


In [62]:
with pd.option_context('display.max_colwidth', None):
  display(df['comment_text'][(df['id'] == '988d4bc7265812f7')])


40576    I'm going to go with both but I could be wrong.\nPeople like you are the one's you allow 32 link-backs into Chavez's site and turning into a roving propaganda mobile that even Jimbo Whales was ashamed of.\nDude, look at the number of edits you have. You're a fucking capitol loser, there's nothing else to say.
Name: comment_text, dtype: object

Some toxic comments were not labeled but were still scored accurately by the algorithm. To explore this in detail, we take the subset of unlabeled comments with a *score* above 0.6 (1490 comments) and build an auxiliary Data Frame with the words that individually receive a score over 0.5, to see examples of the words the algorithm is reacting to.

In the auxiliary Data Frame, we save each word that gets a score over 0.5 together with its score and the *id* of the comment where it appears.

In [67]:
%%time
aux = pd.DataFrame(columns=['id','word', 'score'])

for ident in df_no_label['id'][(df['score'] > 0.6)]:
    for comment in df_no_label['comment_text'][(df['id'] == ident)]:
        word_list = re.findall(r'\w+', comment)
        for word in word_list:
            if len(word)>3:
                try:
                    score = get_toxicity_score(word)
                    if score > 0.5:
                        to_append = [ident,word, score]
                        i=len(aux)+1
                        aux.loc[i]=to_append
                    time.sleep(1)
                except Exception:
                    # Skip words the API rejects (for example, when it cannot detect the language)
                    pass
print(aux)
aux.to_csv('wordsWoLabel.csv', index=False, header=True)

                   id        word     score
1    0002f87b16116a7f        crap  0.835521
2    001068b809feee6b       bitch  0.970205
3    001068b809feee6b     bitchin  0.853505
4    001068b809feee6b       bitch  0.970205
5    001068b809feee6b       bitch  0.970205
..                ...         ...       ...
757  95de918f08e76af7        kill  0.806061
758  954a400ad9b1d60b       idiot  0.950550
759  954a400ad9b1d60b  homoerotic  0.601283
760  954a400ad9b1d60b  homoerotic  0.601282
761  954a400ad9b1d60b  homoerotic  0.601282

[761 rows x 3 columns]
CPU times: total: 1h 39min 23s
Wall time: 4h 44min 54s
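The output above shows the same word being scored several times ('bitch' appears three times for one comment), and each repeat costs one request plus one second of waiting. A minimal sketch of caching scores per distinct word; the helper `score_unique_words` is a hypothetical name, and `scorer` stands for any function such as `get_toxicity_score`:

```python
import re

def score_unique_words(comments, scorer, min_len=4):
    """Score each distinct word once and reuse the result for repeats.

    `comments` maps a comment id to its text; `scorer` is any function
    that returns a toxicity score for a string.
    """
    cache = {}   # word -> score, filled the first time a word is seen
    rows = []    # (id, word, score) for every occurrence
    for ident, text in comments.items():
        for word in re.findall(r'\w+', text):
            if len(word) < min_len:
                continue  # same filter as len(word) > 3 in the loop above
            if word not in cache:
                cache[word] = scorer(word)
            rows.append((ident, word, cache[word]))
    return rows, cache
```

With the real API, the one-second pause would only be needed on a cache miss. The cache is case-sensitive, like the original loop, so 'Fuck' and 'fuck' are still scored separately.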


In [48]:
len(df_no_label['comment_text'][(df['score'] > 0.6)])

1490

We can load the saved file into a new Data Frame:

In [4]:
words_no_label = pd.read_csv('wordsWoLabel.csv')

Some of the comments without labels clearly contain insults, so the human raters missed them.

In [38]:
words_no_label.sort_values(['score'], ascending=False).head()

Unnamed: 0,id,word,score
201,df4bcee7d70a0aaf,motherfucker,0.9935
42,3b86539670d4da49,motherfucker,0.9935
17,1a63a8cf1d674778,fucked,0.980861
674,bf78970ee3f5a620,fucked,0.980861
360,f1ac66faba2e5bae,fucked,0.980861


### Cleaning the data 

As we noticed, some toxic comments did not receive any label. To improve the accuracy of this analysis, we need to clean them out. First, we need a threshold to decide which comments in df_no_label should be excluded when testing the toxicity algorithm. We choose 0.84 so that words like 'dumb' are included but 'homosexual' and 'kill' are not.

In [6]:
comment_list = ['idiot','dumb', 'penis', ' homosexual', 'kill']

for comment in comment_list:
    score = get_toxicity_score(comment)
    print(comment, score)
    time.sleep(1)

idiot 0.9505499
dumb 0.84548205
penis 0.89693767
 homosexual 0.642407
kill 0.8060606


We can take a look at the words that received a score over 0.84, most of them insults:

In [10]:
words_no_label['word'][(words_no_label['score'] > 0.84)].unique()

array(['bitch', 'bitchin', 'bitches', 'stupidity', 'stupid', 'bullshit',
       'fucked', 'morons', 'cocks', 'fucks', 'fucking', 'Anus', 'vaginas',
       'DICKS', 'idiot', 'nigger', 'motherfucker', 'faggot', 'Fools',
       'fool', 'Cock', 'asshole', 'moronic', 'dick', 'penis', 'sucks',
       'loser', 'dumb', 'shit', 'Fuck', 'fuck', 'Bullshit', 'jackass',
       'jerk', 'fuckin', 'Dumbass', 'suck', 'Idiotic', 'COCK', 'Anal',
       'assholes', 'shits', 'Fucking', 'idiotic', 'anal', 'idiots',
       'Bitch', 'goddamn', 'Cunt', 'SHIT', 'whore', 'sucking', 'bitchy',
       'Assholes', 'Stupid', 'dammit', 'Crap', 'FUCK', 'Ignorant',
       'scumbag', 'moron', 'imbeciles', 'ASSYRIAN', 'pigfucks', 'blowjob',
       'jerks', 'scum', 'faggotMONGO', 'STUPID', 'anus', 'Shit', 'idiocy',
       'douchebag', 'Dickheads', 'shithead', 'MassiveFaggotHater',
       'Shitty', 'shitgirl', 'cunt', 'Nigger', 'Stupidity'], dtype=object)

We can get the *id* of the unlabeled comments that contain these words and exclude them from the test as follows:

In [5]:
to_substract = words_no_label['id'][(words_no_label['score'] > 0.84)]
to_substract = to_substract.unique()
len(to_substract)

226

In [29]:
len(df)

41338

Finally, we have a data frame called *data* to test the toxicity algorithm.

In [6]:
data = df[~df.id.isin(to_substract)]
data.head()

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0001ea8717f6de06,Thank you for understanding. I think very high...,0.075638,0,0,0,0,0,0
1,000247e83dcc1211,:Dear god this site is horrible.,0.450459,0,0,0,0,0,0
2,0002f87b16116a7f,"""::: Somebody will invariably try to add Relig...",0.667964,0,0,0,0,0,0
3,0003e1cccfd5a40a,""" \n\n It says it right there that it IS a typ...",0.068434,0,0,0,0,0,0
4,00059ace3e3e9a53,""" \n\n == Before adding a new product to the l...",0.151724,0,0,0,0,0,0


In [31]:
len(data)

41112

### Toxic comments

Notice that some 'toxic' comments are not 'severe_toxic', but every 'severe_toxic' comment is also 'toxic'.

In [117]:
len(data[(data['toxic']>0) & (data['severe_toxic']==0)])

3564

In [39]:
len(data[(data['toxic']==0) & (data['severe_toxic']>0)])

0

To explore all the combinations of labels, we create a new column *cat* in our Data Frame *data* that joins the six category labels into a single string per row:

In [7]:
# assign() returns a new Data Frame, which avoids pandas' SettingWithCopyWarning
data = data.assign(cat=data['toxic'].apply(str) + data['severe_toxic'].apply(str) + data['obscene'].apply(str) + data['threat'].apply(str) + data['insult'].apply(str) + data['identity_hate'].apply(str))
data.tail()

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate,cat
41333,9480bff99f91a69e,Lesbian Teenager \nHow about the fact she was ...,0.797672,1,0,0,0,0,0,100000
41334,9480115b93762c35,"Crack, also called cocaine is very good for yo...",0.240977,0,0,0,0,0,0,0
41335,947f206c14a85d9d,REDIRECT Talk:The Test (The O.C.),0.012671,0,0,0,0,0,0,0
41336,947e3f5f2d613605,A question concerning your edit to my user-pag...,0.227474,0,0,0,0,0,0,0
41337,94768f82798de227,Cited sources? I looked in Category:Abnormal p...,0.341041,0,0,0,0,0,0,0


We have 34 unique combinations of categories:

In [123]:
len(data['cat'].unique())

34

In [122]:
data['cat'].unique()

array(['000000', '100000', '101011', '101010', '101000', '101111',
       '000001', '100011', '000010', '100010', '111010', '111011',
       '111111', '100100', '101001', '111110', '100001', '001010',
       '001000', '101110', '111000', '100110', '101100', '000011',
       '110000', '100101', '110011', '001011', '001001', '000100',
       '110010', '110101', '110100', '000110'], dtype=object)
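Besides listing the combinations, it can help to see how many comments fall in each one. A minimal sketch, assuming the six label columns hold 0/1 integers; the helper `label_combinations` is a hypothetical name, and it rebuilds the code string itself rather than relying on the *cat* column:

```python
import pandas as pd

LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def label_combinations(frame, labels=LABELS):
    """Count comments per label combination, encoded as a string like '101010'."""
    codes = frame[labels].astype(str).agg(''.join, axis=1)
    return codes.value_counts()
```

Calling `label_combinations(data)` would return one row per combination, sorted from the most to the least frequent.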

### Analysis

To start our analysis of the 34 observed label combinations, we create a new Data Frame with the mean score per category using groupby:

In [8]:
df_analysis = pd.DataFrame(data.groupby('cat')['score'].mean())
df_analysis.head(3)

cat,score
0,0.175321
1,0.670554
10,0.690323


We can add more information to that Data Frame, for example the number of unique comments that the algorithm recognizes as toxic at a given threshold. In the next cell we count how many comments (using their id) in each category received a score higher than *j*, a float that takes values from 0 to 0.95 in steps of 0.05.

In [9]:
a= 0
b= 1
c= 0.05

for j in np.arange(a,b,c):
    df_analysis = pd.merge(df_analysis,data[(data['score']>j)].groupby('cat', as_index=False)['id'].nunique(), how='left', on = 'cat', suffixes=['_thr_' + "{:.2f}".format(j-c), '_thr_' + "{:.2f}".format(j)])

df_analysis.head(3)

Unnamed: 0,cat,score,id_thr_0.00,id_thr_0.05,id_thr_0.10,id_thr_0.15,id_thr_0.20,id_thr_0.25,id_thr_0.30,id_thr_0.35,...,id_thr_0.50,id_thr_0.55,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95
0,0,0.175321,36934,33640,20628,14308,9915,7811,6385,4833,...,2281,1690,1264.0,941.0,613.0,391.0,232.0,108.0,30.0,4.0
1,1,0.670554,9,9,9,9,9,9,9,9,...,8,7,6.0,6.0,5.0,3.0,2.0,,,
2,10,0.690323,79,79,79,79,79,77,76,75,...,68,63,57.0,51.0,40.0,32.0,24.0,16.0,6.0,
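The merge loop above depends on pandas applying the suffixes only when two `id` columns collide, so the columns get their names in pairs. A minimal sketch of an alternative that names one column per threshold directly; the helper `counts_above` and the toy rows are made up for illustration:

```python
import numpy as np
import pandas as pd

def counts_above(frame, thresholds):
    """For each category, count the comments scoring above each threshold."""
    columns = {}
    for thr in thresholds:
        above = frame[frame['score'] > thr]
        columns[f'id_thr_{thr:.2f}'] = above.groupby('cat')['id'].nunique()
    # Categories with no comment above a threshold come out as NaN
    return pd.DataFrame(columns).reindex(sorted(frame['cat'].unique()))

# Tiny made-up frame, not taken from the real dataset
toy = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'cat': ['000000', '000000', '100000', '100000'],
    'score': [0.02, 0.40, 0.70, 0.97],
})
print(counts_above(toy, np.arange(0, 1, 0.05)))
```

The result is indexed by *cat*, so it could be joined to the mean scores with `df_analysis.join(...)` if `df_analysis` still had *cat* as its index.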


We can add the count of unique comments per category to the same Data Frame and rename the columns to make them easier to understand:

In [10]:
df_analysis=pd.merge(df_analysis,data.groupby('cat', as_index=False)['id'].nunique(), on=['cat'], how='inner')

In [11]:
df_analysis = df_analysis.rename({'score': 'mean_score', 'id': 'unique_id_count'}, axis=1)
df_analysis.head()

Unnamed: 0,cat,mean_score,id_thr_0.00,id_thr_0.05,id_thr_0.10,id_thr_0.15,id_thr_0.20,id_thr_0.25,id_thr_0.30,id_thr_0.35,...,id_thr_0.55,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95,unique_id_count
0,0,0.175321,36934,33640,20628,14308,9915,7811,6385,4833,...,1690,1264.0,941.0,613.0,391.0,232.0,108.0,30.0,4.0,36934
1,1,0.670554,9,9,9,9,9,9,9,9,...,7,6.0,6.0,5.0,3.0,2.0,,,,9
2,10,0.690323,79,79,79,79,79,77,76,75,...,63,57.0,51.0,40.0,32.0,24.0,16.0,6.0,,79
3,11,0.715606,7,7,7,7,7,7,7,7,...,6,6.0,5.0,4.0,3.0,2.0,1.0,1.0,,7
4,100,0.612155,7,7,7,7,7,7,6,6,...,4,4.0,4.0,4.0,2.0,1.0,,,,7


Note that in some categories no comment received a score above a certain threshold. This is useful information, so we need to pay attention to the NaN values in our Data Frame. To do this, we create a new column *Total_NA* that counts these values per category:

In [12]:
df_analysis['Total_NA'] = df_analysis.isna().sum(axis=1)  # number of NaN cells per category
df_analysis.head()

Unnamed: 0,cat,mean_score,id_thr_0.00,id_thr_0.05,id_thr_0.10,id_thr_0.15,id_thr_0.20,id_thr_0.25,id_thr_0.30,id_thr_0.35,...,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95,unique_id_count,Total_NA
0,0,0.175321,36934,33640,20628,14308,9915,7811,6385,4833,...,1264.0,941.0,613.0,391.0,232.0,108.0,30.0,4.0,36934,0
1,1,0.670554,9,9,9,9,9,9,9,9,...,6.0,6.0,5.0,3.0,2.0,,,,9,3
2,10,0.690323,79,79,79,79,79,77,76,75,...,57.0,51.0,40.0,32.0,24.0,16.0,6.0,,79,1
3,11,0.715606,7,7,7,7,7,7,7,7,...,6.0,5.0,4.0,3.0,2.0,1.0,1.0,,7,1
4,100,0.612155,7,7,7,7,7,7,6,6,...,4.0,4.0,4.0,2.0,1.0,,,,7,3


#### Data with NA values

We create a new Data Frame with the categories that have NaN values:

In [13]:
df_NA = df_analysis[(df_analysis['Total_NA'] >0)]

We can reorder the columns, replace the NaN values with 0, and sort the rows by mean score:

In [14]:
cols = df_analysis.columns.tolist()
cols = cols[:2] + cols[-1:] + cols[-2:-1] + cols[14:-2]
df_NA= df_NA[cols].fillna(0)
df_NA.sort_values(['mean_score'])

Unnamed: 0,cat,mean_score,Total_NA,unique_id_count,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95
5,110,0.587106,8,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,100,0.612155,3,7,4.0,4.0,4.0,2.0,1.0,0.0,0.0,0.0
15,100101,0.618837,1,2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
1,1,0.670554,3,9,6.0,6.0,5.0,3.0,2.0,0.0,0.0,0.0
2,10,0.690323,1,79,57.0,51.0,40.0,32.0,24.0,16.0,6.0,0.0
3,11,0.715606,1,7,6.0,5.0,4.0,3.0,2.0,1.0,1.0,0.0
6,1000,0.725935,1,81,68.0,60.0,53.0,46.0,29.0,17.0,6.0,0.0
11,100001,0.782687,1,40,38.0,37.0,31.0,24.0,19.0,10.0,5.0,0.0
13,100011,0.822567,1,49,47.0,45.0,44.0,37.0,30.0,25.0,17.0,0.0
9,1011,0.8324,1,7,7.0,7.0,6.0,6.0,5.0,4.0,2.0,0.0


To make the table easier to read, we can compute the proportion of comments above each threshold as follows:

In [15]:
for column in df_NA.columns[4:]:
    df_NA[column] =  df_NA[column]/ df_NA['unique_id_count']
df_NA.head(3)   

Unnamed: 0,cat,mean_score,Total_NA,unique_id_count,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95
1,1,0.670554,3,9,0.666667,0.666667,0.555556,0.333333,0.222222,0.0,0.0,0.0
2,10,0.690323,1,79,0.721519,0.64557,0.506329,0.405063,0.303797,0.202532,0.075949,0.0
3,11,0.715606,1,7,0.857143,0.714286,0.571429,0.428571,0.285714,0.142857,0.142857,0.0


Finally, let's add a new column called *category* that spells out the labels encoded in *cat*, using the label names from the original Data Frame.
To make the table easier to read, we sort the categories (rows) by their mean score:

In [16]:
columns = df.columns.tolist()
columns = columns[3:]

aux_list = []
for index, value in df_NA['cat'].items():
    aux_list.append('_'.join(list(compress(columns,list(map(int, value))))))
    
df_NA['category'] = aux_list

df_NA.sort_values(['mean_score'])

Unnamed: 0,cat,mean_score,Total_NA,unique_id_count,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95,category
5,110,0.587106,8,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,threat_insult
4,100,0.612155,3,7,0.571429,0.571429,0.571429,0.285714,0.142857,0.0,0.0,0.0,threat
15,100101,0.618837,1,2,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,toxic_threat_identity_hate
1,1,0.670554,3,9,0.666667,0.666667,0.555556,0.333333,0.222222,0.0,0.0,0.0,identity_hate
2,10,0.690323,1,79,0.721519,0.64557,0.506329,0.405063,0.303797,0.202532,0.075949,0.0,insult
3,11,0.715606,1,7,0.857143,0.714286,0.571429,0.428571,0.285714,0.142857,0.142857,0.0,insult_identity_hate
6,1000,0.725935,1,81,0.839506,0.740741,0.654321,0.567901,0.358025,0.209877,0.074074,0.0,obscene
11,100001,0.782687,1,40,0.95,0.925,0.775,0.6,0.475,0.25,0.125,0.0,toxic_identity_hate
13,100011,0.822567,1,49,0.959184,0.918367,0.897959,0.755102,0.612245,0.510204,0.346939,0.0,toxic_insult_identity_hate
9,1011,0.8324,1,7,1.0,1.0,0.857143,0.857143,0.714286,0.571429,0.285714,0.0,obscene_insult_identity_hate
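The *cat* codes in the table appear without their leading zeros (for example 110 for '000110'), which makes them hard to read by eye. A minimal sketch of a standalone decoder that pads the code before translating it; `decode_cat` is a hypothetical helper built on the same `compress` idea as the cell above:

```python
from itertools import compress

LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def decode_cat(cat, labels=LABELS):
    """Turn a code such as '100101' into 'toxic_threat_identity_hate'."""
    # Pad on the left so a code printed as '110' is read as '000110'
    flags = [ch == '1' for ch in str(cat).zfill(len(labels))]
    return '_'.join(compress(labels, flags))

print(decode_cat('100101'))
print(decode_cat('110'))
```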


In this Data Frame, six of the ten categories shown include the identity_hate label, which suggests that the Perspective API is less likely to give very high scores to comments that include statements of identity hate.

Let's look at the two comments in the category 'toxic_threat_identity_hate', which is the category with the lowest mean score in this Data Frame that combines identity hate and toxicity:

In [25]:
with pd.option_context('display.max_colwidth', None):
  display(data[['comment_text', 'score']][(data['cat'] == '100101')])

Unnamed: 0,comment_text,score
8031,of killing all the jews,0.926779
31352,Mmmm... yes... a pint of palestinian blood would be a fine dessert right now. - talk/email,0.310894


Most of us would agree that the second comment, about 'palestinian blood', should have received a score similar to the comment about Jews, because the intent of the two comments is almost the same, but the Perspective API misses this. We expected the word 'blood' to receive a lower score than 'killing', but we also notice that the algorithm gives a higher score to the word 'jews' than to the word 'palestinian'. This hints at a bias that might influence the algorithm.

We can suppose that the data used to train this algorithm contains many more toxic comments against Jews than against Palestinians. The Perspective API reacted almost the same when the word 'killing' was followed by 'jews' or by 'palestinian' (the scores differ by about 0.03). But when we compared 'palestinian blood' with 'jewish blood', the scores differed by about 0.24!

In [33]:
comment_list = ['killing', 'jews', 'killing jews', 'killing palestinian', 'palestinian', 'blood', 'palestinian blood', 'jewish blood']

for comment in comment_list:
    score = get_toxicity_score(comment)
    print(comment, score)
    time.sleep(1)

killing 0.6350774
jews 0.31089434
killing jews 0.89104855
killing palestinian 0.8594624
palestinian 0.105334
blood 0.31089434
palestinian blood 0.36487597
jewish blood 0.60506815
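To make the comparison between matched phrases explicit, the gaps can be computed directly from the printed scores. A minimal sketch; the dictionary copies four values from the output above, and `score_gap` is a hypothetical helper:

```python
# Scores copied from the output above
scores = {
    'killing jews': 0.89104855,
    'killing palestinian': 0.8594624,
    'jewish blood': 0.60506815,
    'palestinian blood': 0.36487597,
}

def score_gap(scores, first, second):
    """Return how much higher `first` scores than `second`."""
    return scores[first] - scores[second]

for a, b in [('killing jews', 'killing palestinian'),
             ('jewish blood', 'palestinian blood')]:
    print(f'{a!r} vs {b!r}: {score_gap(scores, a, b):+.3f}')
```

The first gap is roughly 0.03 and the second roughly 0.24. Two pairs are too few to measure a bias, but the same helper could be applied to a longer list of matched phrases.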


One possible explanation, which this data cannot confirm, is that the bias is cultural and related to the home country of the researchers and other people involved in designing and training the Perspective API algorithm. Historically, Jews suffered violent persecution and discrimination in several European countries, so hostile language about them may be better represented in the training data than hostile language about Palestinians.

#### Data with scores higher than 0.95

Let's create a Data Frame with the categories that do not have any NaN value, in other words, the categories with at least one comment scoring above 0.95:

In [17]:
df_fullDet = df_analysis[(df_analysis['Total_NA'] == 0)] 
df_fullDet.head(3)

Unnamed: 0,cat,mean_score,id_thr_0.00,id_thr_0.05,id_thr_0.10,id_thr_0.15,id_thr_0.20,id_thr_0.25,id_thr_0.30,id_thr_0.35,...,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95,unique_id_count,Total_NA
0,0,0.175321,36934,33640,20628,14308,9915,7811,6385,4833,...,1264.0,941.0,613.0,391.0,232.0,108.0,30.0,4.0,36934,0
8,1010,0.829294,42,42,42,42,42,42,42,42,...,39.0,38.0,37.0,34.0,29.0,24.0,11.0,5.0,42,0
10,100000,0.700379,1419,1419,1416,1412,1403,1380,1364,1333,...,1040.0,952.0,828.0,699.0,544.0,358.0,152.0,29.0,1419,0


We can repeat some of the steps from the previous Data Frame to make this one easier to understand. We start by dividing the threshold columns by unique_id_count to get proportions:

In [18]:
df_fullDet = df_fullDet.copy()  # work on a copy to avoid SettingWithCopyWarning
for column in df_fullDet.columns[2:-2]:
    df_fullDet[column] = df_fullDet[column] / df_fullDet['unique_id_count']
df_fullDet.head(3)

Unnamed: 0,cat,mean_score,id_thr_0.00,id_thr_0.05,id_thr_0.10,id_thr_0.15,id_thr_0.20,id_thr_0.25,id_thr_0.30,id_thr_0.35,...,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95,unique_id_count,Total_NA
0,0,0.175321,1.0,0.910814,0.55851,0.387394,0.268452,0.211485,0.172876,0.130855,...,0.034223,0.025478,0.016597,0.010586,0.006281,0.002924,0.000812,0.000108,36934,0
8,1010,0.829294,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.928571,0.904762,0.880952,0.809524,0.690476,0.571429,0.261905,0.119048,42,0
10,100000,0.700379,1.0,1.0,0.997886,0.995067,0.988724,0.972516,0.96124,0.939394,...,0.732911,0.670895,0.58351,0.4926,0.383369,0.25229,0.107118,0.020437,1419,0


Then we can select the thresholds we are interested in, for example 0.55 and above, and add the column with the full category name:

In [19]:
cols = df_fullDet.columns.tolist()
cols = cols[:2] + cols[-2:-1] + cols[13:-2]
df_Det = df_fullDet[cols].copy()  # copy so the new column does not trigger SettingWithCopyWarning

# Translate each binary code in 'cat' into the names of its active labels
aux_list = []
for index, value in df_Det['cat'].items():
    aux_list.append('_'.join(list(compress(columns, list(map(int, value))))))

df_Det['category'] = aux_list

df_Det.sort_values(['mean_score'])


Unnamed: 0,cat,mean_score,unique_id_count,id_thr_0.55,id_thr_0.60,id_thr_0.65,id_thr_0.70,id_thr_0.75,id_thr_0.80,id_thr_0.85,id_thr_0.90,id_thr_0.95,category
0,0,0.175321,36934,0.045757,0.034223,0.025478,0.016597,0.010586,0.006281,0.002924,0.000812,0.000108,
10,100000,0.700379,1419,0.786469,0.732911,0.670895,0.58351,0.4926,0.383369,0.25229,0.107118,0.020437,toxic
8,1010,0.829294,42,0.952381,0.928571,0.904762,0.880952,0.809524,0.690476,0.571429,0.261905,0.119048,obscene_insult
14,100100,0.831654,30,0.9,0.9,0.9,0.9,0.8,0.666667,0.533333,0.433333,0.1,toxic_threat
12,100010,0.840549,319,0.965517,0.949843,0.9279,0.884013,0.827586,0.758621,0.589342,0.357367,0.07837,toxic_insult
16,100110,0.851562,4,1.0,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.25,toxic_threat_insult
17,101000,0.877162,471,0.989384,0.978769,0.968153,0.940552,0.904459,0.842887,0.721868,0.526539,0.159236,toxic_obscene
24,110000,0.881349,11,0.909091,0.909091,0.909091,0.909091,0.818182,0.818182,0.818182,0.727273,0.363636,toxic_severe_toxic
18,101001,0.904922,5,1.0,1.0,1.0,1.0,1.0,0.8,0.8,0.6,0.2,toxic_obscene_identity_hate
19,101010,0.928619,997,0.994985,0.992979,0.990973,0.97994,0.967904,0.943831,0.901705,0.776329,0.467402,toxic_obscene_insult


One category combines the toxic, obscene, and identity_hate labels. Let's explore its comments to get some ideas about other possible sources of bias:

In [39]:
with pd.option_context('display.max_colwidth', None):
  display(data[['comment_text', 'score']][(data['cat'] == '101001')])

Unnamed: 0,comment_text,score
417,YAY FAGGOTS FAAAAGGGGOOOTSSS ARE AWESOME YAY YAY YAY FAGOT FAGOT FAGOT AWESOME FUN YAY YAY FOR FAGGOTS! YAY YAY YAY!,0.932329
12179,".\nSand-nigger, referring to the fact that they live in the desert",0.795681
21176,even if we hate furfags and llolis and kid fuckers,0.930538
23117,"We should point out her filthy political philosophy\n\nWho gives a damn if she's shagging some toyboy? We should be pointing out that she believes homosexuals deserve instantaneous and painful death, much like in Uganda.",0.892208
37816,IRISH BIGOT PROTESTANT BASTARD...WE HAVE OUR EYE ON YOU...YOU'RE NOTHING BUT A CUNT..A BIGOT DISCRIMINATING GAY BASTARD...GO FUCK YOURSELF IN THE EAR..BIGOT PROTESTANT ORANGE CUNT,0.973856


Let's focus on comment 21176. It contains vocabulary related to pedophilia and other paraphilias. Interestingly, the phrase 'kid fuckers' on its own received a higher score than the complete comment, even though the rest of the comment is also offensive.

In [71]:
get_toxicity_score('kid fuckers')

0.96950203

That means that the other main objects of the sentence, 'llolis' and 'furfags', are receiving a lower score from the Perspective API. The word 'llolis' is misspelled (when we request a score for 'llolis', the API call fails); it should be 'lolis', which refers to "young girls, usually underage ones whose appearance is minor and childlike". The word comes from the novel Lolita, which recounts the obsession of a middle-aged man with a 12-year-old girl, and 'lolita' is commonly associated with pedophilia in pop culture.
The male counterparts of 'lolita' and 'lolicon' (a person sexually attracted to young girls) are 'shota' (the young boy) and 'shotacon' (a person attracted to such boys).

In [77]:
comment_list = ['lolita','lolis', 'lolicon','shotacon', 'toyboy']

for comment in comment_list:
    score = get_toxicity_score(comment)
    print(comment, score)
    time.sleep(1)

lolita 0.070014544
lolis 0.12140233
lolicon 0.06714418
shotacon 0.17853539
toyboy 0.24017887


From this, we can conclude that the API fails to recognize some vocabulary. We might expect that if these words were part of a specialized vocabulary used only by aggressors, but they are not: this vocabulary is relatively easy to find in pop culture among younger generations.

A possible hypothesis for the low score of words like 'lolita' is that training comments with this kind of vocabulary, which refers to children as sexualized objects, rarely contain violence or insults on mainstream websites (illegal pornography sites may be the exception), and obscene language without aggressive words can make it difficult for the algorithm to identify such content. 'Obscene' is a category that appeared in the *df_NA* Data Frame, in which just 7% of the comments received a score above 0.90 and no comment received a score above 0.95.

So the kind of bias we are showing here could originate in the sources used to collect the training data.

In [74]:
get_toxicity_score('furfags')

0.7093121

Also, the API call might fail for the singular or plural form of some words. For example, 'furfag' refers to 'A member of the furry subculture, especially a homosexual male'. The error message says that the TOXICITY attribute does not support the request language, which suggests that the API could not assign this input to a supported language. One reason could be that the word is uncommon and there was not enough training data containing both the plural and singular forms:

In [69]:
try:
    get_toxicity_score('furfag')
except Exception as err:
    print(err)  # show the error returned by the API instead of a hard-coded message

<HttpError 400 when requesting https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=YOUR_API_KEY&alt=json returned Attribute TOXICITY does not support request languages:...
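Since the error mentions request languages, one thing to try is stating the language explicitly instead of leaving it to automatic detection. A minimal sketch that only builds the request body, assuming the analyze request accepts a `languages` list of ISO 639-1 codes; the helper `build_analyze_request` is hypothetical, and whether this avoids the error for this particular word has not been tested here:

```python
def build_analyze_request(text, languages=None):
    """Build the request body passed to comments().analyze().

    `languages` is an optional list of ISO 639-1 codes such as ['en'].
    When it is omitted, the API tries to detect the language itself.
    """
    request = {
        'comment': {'text': text},
        'requestedAttributes': {'TOXICITY': {}},
    }
    if languages:
        request['languages'] = list(languages)
    return request

print(build_analyze_request('furfag', languages=['en']))
```

The resulting dictionary would take the place of `analyze_request` inside `get_toxicity_score`.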


A final observation is that the Perspective API also supports Spanish, but it might give different scores for the same word in different languages, or when the word is misspelled (in this case, without the accent mark):

In [72]:
comment_list = ['pedophile', 'pedófilo', 'pedofilo']

for comment in comment_list:
    score = get_toxicity_score(comment)
    print(comment, score)
    time.sleep(1)

pedophile 0.75314766
pedófilo 0.800372
pedofilo 0.5750231
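To check this kind of spelling sensitivity systematically, the unaccented variant of a word can be generated automatically and both forms scored. A minimal sketch using only the standard library; `strip_accents` and `spelling_variants` are hypothetical helper names:

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks, e.g. 'pedófilo' -> 'pedofilo'."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def spelling_variants(word):
    """Return the word plus its unaccented form, without duplicates."""
    return list(dict.fromkeys([word, strip_accents(word)]))

print(spelling_variants('pedófilo'))
```

Each variant could then be passed to `get_toxicity_score` in a loop like the one above. This approach also turns 'ñ' into 'n', which changes the word in Spanish, so the variants are best treated as misspellings to test rather than as equivalents.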
