<font size=5>Perspective API Assignment


<font size=4>The purpose of this assignment is to analyze bias by querying an existing NLP model. I examine a dataset of Wikipedia comments, each paired with a toxicity score from Google's Perspective API. This notebook records my thought process and hypotheses about how the Perspective API works, along with my own small tests. The data is available under a CC0 license.

In [124]:
import pandas as pd
import time

df = pd.read_csv('labeled_and_scored_comments.csv')
df.sort_values(['score'])

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
1158,61a85266de70cca0,Можешь говорить по русски.,0.000004,0,0,0,0,0,0
5246,030a9ba0b32b4456,Rodzina Krusenstern Potomkowie Admirała Adama ...,0.000009,0,0,0,0,0,0
11040,a6814193ac6b4705,The opening paragraph states that the game is ...,0.000091,0,0,0,0,0,0
24192,f0d6dee96e057a5b,The book is used as a source in a number of ar...,0.000116,0,0,0,0,0,0
2332,cbb9268d3c7429c1,Vals Vienés \n |Rumba \n |Cha Cha Chá(ciclo 7),0.000131,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
19994,1429a2c930688317,you fucking piece of shit,0.997278,1,1,1,0,1,0
28728,d82e45eb103ad07b,fuck off you stupid aspy asshole,0.997736,1,1,1,0,1,0
25535,e9b92c6c6b01aeef,FUCK YOU YOU PIECE OF SHIT \n FUCK YOU YOU PIE...,0.997982,1,1,1,0,1,0
25945,e783fd267f3a9d3b,FUCK WIKIPEDIA ON WHEELS! \n\nFuck off wikiped...,0.998136,1,1,1,0,1,0


<font size=4> Below is the provided method to return a toxicity score for a given comment. 

In [125]:
from googleapiclient.discovery import build
import json

def get_toxicity_score(comment):
    """Return the Perspective API TOXICITY summary score for a comment."""
    API_KEY = '**********************************'

    # Build a client for the Comment Analyzer (Perspective) API.
    client = build(
        "commentanalyzer",
        "v1alpha1",
        developerKey=API_KEY,
        discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
        static_discovery=False,
    )

    # Request only the TOXICITY attribute for this comment.
    analyze_request = {
        'comment': {'text': comment},
        'requestedAttributes': {'TOXICITY': {}},
    }

    response = client.comments().analyze(body=analyze_request).execute()
    toxicity_score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    return toxicity_score
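
<font size=4>The function above reads the score from a nested path in the response. As a minimal sketch of that lookup, the helper below (a hypothetical name, not part of the provided code) walks the same keys and returns None when they are missing. The sample response is hand-written for illustration and is not real API output.

```python
def extract_summary_score(response, attribute="TOXICITY"):
    """Return the summary score for `attribute`, or None if it is missing."""
    try:
        return response["attributeScores"][attribute]["summaryScore"]["value"]
    except (KeyError, TypeError):
        # The attribute was not requested or the response has another shape.
        return None


# Hand-written stand-in for an API response (assumed shape, made-up value).
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.25}}
    }
}

print(extract_summary_score(sample_response))
print(extract_summary_score({}))
```

Returning None instead of raising makes it easier to score a batch of comments and inspect the failures afterwards.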

<font size=4>After manually inspecting the CSV file of comments, I have an initial hypothesis about how the API scores comments that mention identity terms. I believe such comments are prone to false positives (non-toxic comments scored as toxic) because identity terms often appear in toxic comments.

<font size=4>Below are the comments in the dataset that were manually labeled 1 for identity hate.

In [126]:
new_df = df[(df['identity_hate']==1)]
new_df.sort_values(['score'])

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
31352,ca322730a9b08795,Mmmm... yes... a pint of palestinian blood wou...,0.310894,1,0,0,1,0,1
15076,63c44599fce8f0c2,"Basically, homosexuality lowers productivity, ...",0.444456,0,0,0,0,0,1
11524,9dd29a3d27d96473,Fighting freedom-hating wikiterrorist/admins i...,0.513338,1,0,0,0,1,1
23915,f24a0fcc1f403334,yeah after few days some jews will say nasser ...,0.534903,0,0,0,0,0,1
40496,9903cd41f8b37abb,REDIRECT Talk:Bolivian chinchilla rat,0.534903,0,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...
25351,eac84ebd5a2b6393,FUCK U DIRTY INDIAN \n\nI HATE INDIANS...YOU G...,0.992921,1,0,1,0,1,1
7304,e297dd3d3e70c9e9,Fuck u bitch \n\nFuck u Bitch,0.993482,1,0,1,0,1,1
16436,4ebb0ac23020ce07,your gay \n\nyou're a fucking faggot,0.994043,1,1,1,0,1,1
21407,ff3c218c3be4d2d4,fucking germanian piece of shit,0.994861,1,1,1,0,1,1


<font size=4>My threshold for toxicity is 0.5, meaning that any comment with a score above 0.5 is considered toxic. Because the table above is sorted by score, we can see that only two identity hate comments fall below the threshold, which suggests that the API reliably flags hateful identity comments as toxic.
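
<font size=4>The same check can be done numerically instead of by reading the table. The sketch below computes the fraction of comments with a given manual label whose score exceeds the threshold. The helper name `share_flagged` is hypothetical and the toy scores are made up; only the column names are taken from the dataset.

```python
import pandas as pd

THRESHOLD = 0.5


def share_flagged(frame, label, threshold=THRESHOLD):
    """Fraction of comments with `label` == 1 whose score exceeds `threshold`."""
    labeled = frame[frame[label] == 1]
    if labeled.empty:
        return float("nan")
    return (labeled["score"] > threshold).mean()


# Toy data: made-up scores, same column names as the dataset.
toy = pd.DataFrame({
    "score": [0.31, 0.44, 0.51, 0.99],
    "identity_hate": [1, 1, 1, 1],
})

print(share_flagged(toy, "identity_hate"))
```

On the real data, the call would be `share_flagged(df, 'identity_hate')`.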

In [127]:
toxic_df = df[(df['toxic']==1)&(df['severe_toxic']==1)&(df['identity_hate']==1)]
toxic_df.sort_values(['score']).head(10)

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
28910,d73516d5de7f7384,big>I HATE Nl33ERS\nI HATE Nl33ERS\nI HATE Nl3...,0.715893,1,1,1,0,1,1
25371,eab35a94edfe8946,Remeber rahmatul lil alameen? You Sunni monkey...,0.768935,1,1,0,0,1,1
19145,21a8d6268f06bd50,YOu Are A SOn of Bitch Turk\n\nJust to let you...,0.798335,1,1,1,0,1,1
10478,af9e849c953035b1,You're gay you're gay you're gay you're gay yo...,0.878531,1,1,0,0,1,1
20728,08ad981dc1154f38,you can't keep me down nigger \nyou can't keep...,0.878531,1,1,0,0,1,1
40423,9973cc45a10db3f5,"""\n\nhaha you are a dumb shit. what's so wrong...",0.892599,1,1,1,0,1,1
20064,12f877656b9f82c9,"You Dick \n\nYou're a big, anti-Japanese dick,...",0.896489,1,1,1,0,1,1
26939,e21d4a0f9fc3f335,Hey u crowbeat ... Kshatriyas rule the world....,0.901734,1,1,1,0,1,1
25379,eaa76f9c074e3aa5,Daedalus is nothing but a filthy n!gger and a ...,0.901749,1,1,1,0,1,1
30403,cf012f8122791d7e,DEATH TO ARABS! FREE THE JEWISH HOMELAND OF JU...,0.901752,1,1,0,1,0,1


<font size=4>The purpose of the code above was to better understand what makes an identity hate comment also earn a 1 from manual reviewers in the severe_toxic category, and how those labels relate to the Perspective API's scores.

<font size=4>I chose 0.5 as the threshold because, after manually inspecting various comments in the dataset, I found that comments with a score greater than 0.5 were nearly always toxic, while comments with scores below 0.5 were more ambiguous.
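
<font size=4>A manual choice like this could be cross-checked by measuring error rates at several thresholds against the manual `toxic` label. The following is only a sketch: `error_rates` is a hypothetical helper and the toy scores are made up, so the printed numbers say nothing about the real dataset.

```python
import pandas as pd


def error_rates(frame, threshold, label="toxic"):
    """Return (false positive rate, false negative rate) at `threshold`."""
    predicted = frame["score"] > threshold
    actual = frame[label] == 1
    false_pos = (predicted & ~actual).sum()
    false_neg = (~predicted & actual).sum()
    negatives = (~actual).sum()
    positives = actual.sum()
    fpr = false_pos / negatives if negatives else float("nan")
    fnr = false_neg / positives if positives else float("nan")
    return fpr, fnr


# Toy data: made-up scores, same column names as the dataset.
toy = pd.DataFrame({
    "score": [0.1, 0.4, 0.6, 0.3, 0.7, 0.9],
    "toxic": [0, 0, 0, 1, 1, 1],
})

for t in (0.25, 0.5, 0.75):
    fpr, fnr = error_rates(toy, t)
    print(f"threshold={t}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

Raising the threshold trades false positives for false negatives, which is why a single cut-off is always a compromise.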

<font size=4>I chose identity_hate as my primary label because my hypothesis is about the Perspective API's scoring of identity hate comments. I also think identity_hate implies some of the other labels (toxic, severe_toxic, insult, etc.), but the same cannot be said of the other labels.

<font size=4>To test my hypothesis, I read a text file into a list. The text file has 12 comments that use identity terms in a positive way. I then use the Perspective API to score each comment and check whether my hypothesis holds.

In [128]:
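# Read the comments, skipping blank lines (such as an empty line at the end of the file).
with open('identity_comments.txt', 'r') as f:
    comment_list = [line for line in f if line.strip()]

# Score every comment, pausing one second between requests to avoid exceeding the API quota.
score_list = []
for comment in comment_list:
    score = get_toxicity_score(comment)
    score_list.append(score)
    time.sleep(1)

data = zip(comment_list, score_list)
identity_df = pd.DataFrame(data, columns=['comment', 'score'])
identity_df.head()
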
Unnamed: 0,comment,score
0,I respect gay muslims.\n,0.233071
1,I think that black women deserve more respect.\n,0.169385
2,I know proud gay men who are very confident.\n,0.194094
3,Muslims have a very unique culture and get too...,0.401485
4,Black women have contributed so much to rap cu...,0.347574


In [129]:
identity_df.describe()

Unnamed: 0,score
count,12.0
mean,0.257921
std,0.090786
min,0.146031
25%,0.187917
50%,0.238751
75%,0.335688
max,0.401485


<font size=4>Comparing the scores of these comments to the threshold, I find that all of them fall below 0.5 (the maximum is 0.401485), meaning that none is classified as toxic. This contradicts my initial hypothesis, as the Perspective API did not score any of my positive identity comments above 0.5.

<font size=4>Obviously, a sample size of 12 comments is too small to accurately test my hypothesis, yet it was still surprising to me that not even one comment had a toxicity score over 0.5.
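
<font size=4>To put a number on how little 12 comments can show, one option is a Wilson score interval for the false positive rate. This is a sketch under the assumption that the 12 comments behave like independent draws, which hand-written examples may not. With 0 of 12 comments flagged, the 95% interval runs from 0 up to roughly 0.24, so a false positive rate well above zero is still compatible with this result.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# 0 of the 12 positive identity comments scored above the 0.5 threshold.
low, high = wilson_interval(0, 12)
print(f"{low:.3f} to {high:.3f}")
```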

<font size=4>***** Analysis on GITHUB ***** 