### Perspective API Exploration

First, we have a dataset of Wikipedia comments made available by Jigsaw, a subsidiary of Google that created the Perspective tool. The dataset includes a unique comment id, the text of the comment, and a series of binary labels applied by human raters: "toxic," "severe_toxic," "obscene," "threat," "insult," and "identity_hate." I have appended the "score" column, which represents the toxicity score assigned to the comment text by the live version of the Perspective API. The data is available under a CC0 license.

Reading in the CSV file of comments.

In [2]:
import pandas as pd
import time

df = pd.read_csv('labeled_and_scored_comments.csv')

Sorting the dataset of comments by score, from lowest to highest.

In [3]:
df.sort_values(['score'])

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
1158,61a85266de70cca0,Можешь говорить по русски.,0.000004,0,0,0,0,0,0
5246,030a9ba0b32b4456,Rodzina Krusenstern Potomkowie Admirała Adama ...,0.000009,0,0,0,0,0,0
11040,a6814193ac6b4705,The opening paragraph states that the game is ...,0.000091,0,0,0,0,0,0
24192,f0d6dee96e057a5b,The book is used as a source in a number of ar...,0.000116,0,0,0,0,0,0
2332,cbb9268d3c7429c1,Vals Vienés \n |Rumba \n |Cha Cha Chá(ciclo 7),0.000131,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
19994,1429a2c930688317,you fucking piece of shit,0.997278,1,1,1,0,1,0
28728,d82e45eb103ad07b,fuck off you stupid aspy asshole,0.997736,1,1,1,0,1,0
25535,e9b92c6c6b01aeef,FUCK YOU YOU PIECE OF SHIT \n FUCK YOU YOU PIE...,0.997982,1,1,1,0,1,0
25945,e783fd267f3a9d3b,FUCK WIKIPEDIA ON WHEELS! \n\nFuck off wikiped...,0.998136,1,1,1,0,1,0


I've also included a function to make calls to the Perspective API for your own testing. You will need to generate your own API key according to the instructions in the assignment.

In [4]:
from googleapiclient.discovery import build
import json

def get_toxicity_score(comment):
    """Return the Perspective TOXICITY summary score for one comment string."""
    API_KEY = 'XXXXXX'  # Put your API key here

    # Build a client for the Comment Analyzer (Perspective) API.
    client = build(
        "commentanalyzer",
        "v1alpha1",
        developerKey=API_KEY,
        discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1"
    )

    # Request only the TOXICITY attribute for this comment.
    analyze_request = {
        'comment': {'text': comment},
        'requestedAttributes': {'TOXICITY': {}}
    }

    response = client.comments().analyze(body=analyze_request).execute()
    toxicity_score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    return toxicity_score
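
The nested lookup on the response can be pulled into a small helper so that a missing attribute fails with a clear message. This is a minimal sketch: `extract_summary_score` is a hypothetical helper, and `sample_response` is a made-up dictionary that only mirrors the key path used in `get_toxicity_score` above. It is not a real API response.

```python
def extract_summary_score(response, attribute="TOXICITY"):
    """Return the summary score for one attribute from a response-shaped dict."""
    try:
        return response["attributeScores"][attribute]["summaryScore"]["value"]
    except KeyError as missing:
        raise ValueError(f"response has no summary score for {attribute!r}") from missing


# Made-up response with the same key path as the function above (illustrative value).
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.25}}
    }
}

toxicity = extract_summary_score(sample_response)
print(toxicity)
```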

Extracting all comments with the "toxic" label from the dataset. This label was chosen because it is the broadest of the six labels. This analysis only compares how the model scores comments for toxicity with how human raters labeled comments as toxic.

In [5]:
toxic_comments = df[(df['toxic']==1)]

Determining a threshold for the model

In [6]:
toxic_comments.describe()

Unnamed: 0,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
count,3943.0,3943.0,3943.0,3943.0,3943.0,3943.0,3943.0
mean,0.833664,1.0,0.09612,0.522698,0.029673,0.4877,0.095612
std,0.168672,0.0,0.294793,0.499548,0.169705,0.499912,0.294096
min,0.054399,1.0,0.0,0.0,0.0,0.0,0.0
25%,0.771053,1.0,0.0,0.0,0.0,0.0,0.0
50%,0.890928,1.0,0.0,1.0,0.0,0.0,0.0
75%,0.954737,1.0,0.0,1.0,0.0,1.0,0.0
max,0.998329,1.0,1.0,1.0,1.0,1.0,1.0


A threshold of 0.7 was chosen because about 75% of the human-labeled toxic comments score above 0.77 (the 25th percentile), so a cutoff of 0.7 captures most of them.
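
One way to sanity-check a threshold is to compute the share of human-labeled toxic comments that score above several candidate values. The sketch below uses a tiny made-up DataFrame with the same `score` and `toxic` column names as the dataset. The values are illustrative and are not taken from the real data.

```python
import pandas as pd

# Made-up scores for comments that humans labeled toxic (not real data).
sample = pd.DataFrame({
    "score": [0.05, 0.56, 0.64, 0.77, 0.85, 0.89, 0.95, 0.99],
    "toxic": [1, 1, 1, 1, 1, 1, 1, 1],
})

labeled_toxic = sample[sample["toxic"] == 1]

# For each candidate threshold, the fraction of labeled-toxic comments scoring above it.
share_above = {
    threshold: float((labeled_toxic["score"] > threshold).mean())
    for threshold in (0.5, 0.7, 0.9)
}

for threshold, share in share_above.items():
    print(f"threshold {threshold}: {share:.3f} of labeled-toxic comments flagged")
```

A lower threshold flags more of the labeled-toxic comments, but on the full dataset it would also flag more comments that humans did not label as toxic.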

In [7]:
toxic_comments.head()

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
8,00091c35fa9d0465,"== Arabs are committing genocide in Iraq, but ...",0.635929,1,0,0,0,0,0
24,0013fed3aeae76b7,DJ Robinson is gay as hell! he sucks his dick ...,0.989706,1,0,1,0,1,1
29,0017d4d47894af05,":Fuck off, you anti-semitic cunt. |",0.959459,1,0,1,0,1,0
37,001d739c97bc2ae4,How dare you vandalize that page about the HMS...,0.564152,1,0,0,0,0,0
38,001eff4007dbb65b,"::No, he is an arrogant, self serving, immatur...",0.956893,1,0,1,0,1,0


From the first few rows, I could immediately see that the two comments with the highest scores (rows 24 and 29) contain words that are sexual or sexuality-related.

Checking the scores of these words on their own using the API. Since none of the scores is low (all are above 0.5), I think these words are important to consider in my analysis.

In [8]:
comment_list = ['gay', 'dick', 'cunt']
score_list = []

for comment in comment_list:
    score = get_toxicity_score(comment)
    time.sleep(1)
    score_list.append(score)

print(score_list)

[0.61826205, 0.59863794, 0.5026305]
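
The scoring loop can also be wrapped in a reusable helper that takes the scoring function as a parameter, which makes it possible to try the loop without an API key. This is only a sketch: `score_comments` and `fake_scorer` are hypothetical names, and `fake_scorer` returns made-up values without calling Perspective.

```python
import time


def score_comments(comments, scorer, delay_seconds=1.0):
    """Score each comment with `scorer`, pausing between calls to limit the request rate."""
    scores = {}
    for index, comment in enumerate(comments):
        if index > 0:
            time.sleep(delay_seconds)  # pause only between calls, not before the first
        scores[comment] = scorer(comment)
    return scores


def fake_scorer(comment):
    """Stand-in for get_toxicity_score; returns a made-up value based on text length."""
    return min(len(comment) / 10, 1.0)


scores = score_comments(["a", "abcde"], fake_scorer, delay_seconds=0)
print(scores)
```

With a real key, `get_toxicity_score` could be passed in place of `fake_scorer`.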


In [9]:
sum(score_list)

1.71953049

This allows me to form my hypothesis, stated below:

## ***Hypothesis: Comments containing sexual or sexuality-related terms are more likely to be scored as "toxic" by the Perspective model. Perspective is more biased when scoring comments containing sexual or sexuality-related terms.***

Extracting all comments from the original dataset with sexual/sexuality-related terms.

In [10]:
contain_certain_words = df.loc[df.comment_text.str.contains(r'\b(?:gay|queer|homo|binary|lesbian|homosexual|homophobic|asexual|cunt|pansexual|hetero|heterosexual|transsexual|dick|anus|LGBTQ|vag|vagina|demisexual|polysexual|penis|ass|butt|cock|sack|transgender|sex|sexual|sexuality|sexually|sexual orientation|pangender|homophobia|intergender|gender-fluid|femme|cisgender|biphobia|bigender|abrosexual|agender|arrse|arse|orgasm|fag|faggot|balls|bellend|dickhead|knobend|slut|whore|tits|boobs|boobies)\b')]

Terms were chosen by looking at the first few rows of the toxic comments (shown below), and by looking at websites such as https://www.portlandoregon.gov/article/730061 and https://www.fluentin3months.com/dirty-words/. Only English terms were considered. 
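
Two properties of the pattern affect which comments are selected: `\b` only matches whole words, and `str.contains` is case-sensitive by default. The sketch below illustrates both with Python's `re` module, using a shortened version of the pattern and made-up strings. The shortened term list is chosen purely for illustration.

```python
import re

# Shortened version of the pattern above, kept small for illustration.
pattern = re.compile(r"\b(?:gay|ass|sex)\b")

examples = [
    "he is gay",            # whole-word match
    "GAY PRIDE",            # upper case: no match when case-sensitive
    "first class ticket",   # 'ass' inside 'class': blocked by \b
    "sexual orientation",   # 'sex' inside 'sexual': blocked by \b
]

matches = {text: bool(pattern.search(text)) for text in examples}
for text, matched in matches.items():
    print(f"{matched!s:5}  {text}")

# The same pattern with re.IGNORECASE also catches the upper-case example.
insensitive = re.compile(pattern.pattern, re.IGNORECASE)
print(bool(insensitive.search("GAY PRIDE")))
```

In pandas, passing `case=False` to `str.contains` has the same effect as `re.IGNORECASE`. Doing so on the real data would change which rows are selected.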

In [11]:
toxic_comments.head(60)

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
8,00091c35fa9d0465,"== Arabs are committing genocide in Iraq, but ...",0.635929,1,0,0,0,0,0
24,0013fed3aeae76b7,DJ Robinson is gay as hell! he sucks his dick ...,0.989706,1,0,1,0,1,1
29,0017d4d47894af05,":Fuck off, you anti-semitic cunt. |",0.959459,1,0,1,0,1,0
37,001d739c97bc2ae4,How dare you vandalize that page about the HMS...,0.564152,1,0,0,0,0,0
38,001eff4007dbb65b,"::No, he is an arrogant, self serving, immatur...",0.956893,1,0,1,0,1,0
59,03c1c7a2649a7d87,":::Jeez Ed, you seem like a fucking shitty dou...",0.992469,1,0,1,0,1,0
61,03c45b72d38c1f75,FUCK OFF FUCK OFF FUCK OFF FUCK OFF FUCK OFF F...,0.989706,1,0,1,0,0,0
62,03c6344729c4e665,Azerbaijanis Proud of Ramil Safarov \n If I h...,0.980994,1,0,1,1,1,1
71,03d4fd1aab9247a2,who is DJMASACRE? \n\n what the heck is a sock...,0.851035,1,0,0,0,0,0
80,03df07faf434d387,""" \n :Yeah your right. And dude I'm not fuckin...",0.878531,1,0,0,0,0,0


In [12]:
contain_certain_words.describe()

Unnamed: 0,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
count,1306.0,1306.0,1306.0,1306.0,1306.0,1306.0,1306.0
mean,0.77028,0.643185,0.107198,0.496937,0.028331,0.421899,0.1317
std,0.245123,0.479243,0.309483,0.500182,0.16598,0.494052,0.338294
min,0.0199,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.627064,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.881638,1.0,0.0,0.0,0.0,0.0,0.0
75%,0.959269,1.0,0.0,1.0,0.0,1.0,0.0
max,0.996048,1.0,1.0,1.0,1.0,1.0,1.0


Determining how many of these comments the model considers toxic at our threshold. From the results, we can see that **918** comments score above the threshold and therefore count as toxic.

In [13]:
threshold = 0.7

prediction = contain_certain_words['score'] > threshold
prediction.value_counts()

True     918
False    388
Name: score, dtype: int64

From the comments that contain these terms, extracting the ones that human raters actually labeled as "toxic."

In [14]:
toxic_by_humans = contain_certain_words[(contain_certain_words['toxic']==1)]

From the output below, we can see that **840** comments are toxic according to the human raters.

In [15]:
toxic_by_humans.describe()

Unnamed: 0,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
count,840.0,840.0,840.0,840.0,840.0,840.0,840.0
mean,0.908398,1.0,0.166667,0.734524,0.044048,0.629762,0.19881
std,0.105231,0.0,0.3729,0.44185,0.205323,0.483156,0.399342
min,0.082855,1.0,0.0,0.0,0.0,0.0,0.0
25%,0.88026,1.0,0.0,0.0,0.0,0.0,0.0
50%,0.948086,1.0,0.0,1.0,0.0,1.0,0.0
75%,0.97457,1.0,0.0,1.0,0.0,1.0,0.0
max,0.996048,1.0,1.0,1.0,1.0,1.0,1.0


Based on this, we can say that the model overestimated the number of toxic comments among those containing sexual/sexuality-related terms. The model flagged 918 comments as toxic and human raters labeled 840, a difference of 78. These results **support** my initial **hypothesis** that **the Perspective model is slightly biased towards scoring comments containing sexual/sexuality-related terms as toxic**.
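
Comparing the two totals does not show how many individual comments the model and the human raters disagree on. A cross-tabulation of model prediction against human label makes that explicit. This is a sketch on a small made-up DataFrame that reuses the `score` and `toxic` column names and the 0.7 threshold. The values are illustrative, not results from the dataset.

```python
import pandas as pd

threshold = 0.7

# Made-up rows (not real data): model score and human "toxic" label.
sample = pd.DataFrame({
    "score": [0.95, 0.88, 0.75, 0.72, 0.40, 0.65, 0.10, 0.30],
    "toxic": [1, 1, 0, 0, 1, 0, 0, 0],
})

predicted_toxic = sample["score"] > threshold
labeled_toxic = sample["toxic"] == 1

# Count each combination of model prediction and human label.
true_positives = int((predicted_toxic & labeled_toxic).sum())
false_positives = int((predicted_toxic & ~labeled_toxic).sum())
false_negatives = int((~predicted_toxic & labeled_toxic).sum())
true_negatives = int((~predicted_toxic & ~labeled_toxic).sum())

# Share of comments humans labeled non-toxic that the model still flagged.
false_positive_rate = false_positives / (false_positives + true_negatives)

print(pd.crosstab(labeled_toxic, predicted_toxic,
                  rownames=["human: toxic"], colnames=["model: toxic"]))
print(f"false positive rate: {false_positive_rate:.2f}")
```

Computing the same rate for comments that do not contain the selected terms would give a baseline to compare against.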