# HCD Assignment 2 

## Setting up a Perspective AI Key

First, we have a dataset of Wikipedia comments made available by Jigsaw, a subsidiary of Google that created the Perspective tool. The dataset includes a unique comment id, the text of the comment, and a series of binary labels applied by human raters: "toxic," "severe_toxic," "obscene," "threat," "insult," and "identity_hate." I have appended the "score" column, which represents the toxicity score assigned to the comment text by the live version of the Perspective API. The data is available under a CC0 license.

In [59]:
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

df = pd.read_csv('labeled_and_scored_comments.csv')

In [2]:
df.sort_values(['score'])

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
1158,61a85266de70cca0,Можешь говорить по русски.,0.000004,0,0,0,0,0,0
5246,030a9ba0b32b4456,Rodzina Krusenstern Potomkowie Admirała Adama ...,0.000009,0,0,0,0,0,0
11040,a6814193ac6b4705,The opening paragraph states that the game is ...,0.000091,0,0,0,0,0,0
24192,f0d6dee96e057a5b,The book is used as a source in a number of ar...,0.000116,0,0,0,0,0,0
2332,cbb9268d3c7429c1,Vals Vienés \n |Rumba \n |Cha Cha Chá(ciclo 7),0.000131,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
19994,1429a2c930688317,you fucking piece of shit,0.997278,1,1,1,0,1,0
28728,d82e45eb103ad07b,fuck off you stupid aspy asshole,0.997736,1,1,1,0,1,0
25535,e9b92c6c6b01aeef,FUCK YOU YOU PIECE OF SHIT \n FUCK YOU YOU PIE...,0.997982,1,1,1,0,1,0
25945,e783fd267f3a9d3b,FUCK WIKIPEDIA ON WHEELS! \n\nFuck off wikiped...,0.998136,1,1,1,0,1,0


I've also included a function to make calls to the Perspective API for your own testing. You will need to generate your own API key according to the instructions in the assignment.

In [3]:
from googleapiclient.discovery import build
import json

def get_toxicity_score(comment):
    """Return the Perspective API TOXICITY summary score for a comment string."""
    API_KEY = 'YOUR_API_KEY'  # Put your API key here

    # Build a client for the Comment Analyzer (Perspective) API
    client = build(
        "commentanalyzer",
        "v1alpha1",
        developerKey=API_KEY,
        discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    )

    # Request only the TOXICITY attribute for this comment
    analyze_request = {
        'comment': {'text': comment},
        'requestedAttributes': {'TOXICITY': {}}
    }

    response = client.comments().analyze(body=analyze_request).execute()
    toxicity_score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    return toxicity_score

We can call this function on original comments, as the cells in the next section do.
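When scoring more than a handful of original comments, a small wrapper that pauses between requests may help stay within the API's request quota. The sketch below is only an illustration: `score_comments`, `fake_scorer`, and the one-second default delay are hypothetical names and values, not part of the assignment code. In real use, `get_toxicity_score` would be passed as `scorer`.

```python
import time

def score_comments(comments, scorer, delay_seconds=1.0):
    """Score each comment with `scorer`, pausing between calls.

    `scorer` is any function that maps text to a float, for example
    get_toxicity_score. The default delay is an assumed value.
    """
    scores = {}
    for i, text in enumerate(comments):
        if i > 0:
            time.sleep(delay_seconds)  # wait before every call after the first
        scores[text] = scorer(text)
    return scores

# Stand-in scorer so the sketch runs without an API key (made-up values).
def fake_scorer(text):
    return 0.9 if "idiot" in text else 0.1

results = score_comments(["hello", "you idiot"], fake_scorer, delay_seconds=0)
print(results)
```

Passing the scorer as an argument keeps the pacing logic separate from the API call, so the loop can be tried out without spending quota.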

## Exploring the Scored Dataset

After downloading the CSV file from Canvas, I experimented a little by checking the toxicity scores of some ordinary phrases, such as "hello" and "How's it going".

instruction: Compare the scores to the labels assigned by manual reviewers. You may wish to do some data parsing to find out things like the most common words used in comments where toxicity or abuse was present. Where did the model make mistakes? Visual inspection of the comments may spark some ideas for how to test the API for potential biases.  

In [4]:
get_toxicity_score("I Love You")

0.038053125

In [5]:
get_toxicity_score("hello")

0.02397547

In [6]:
get_toxicity_score("How's it going")

0.024662184

In [7]:
get_toxicity_score("I hate women")

0.8086813

In [8]:
get_toxicity_score("I hate men")

0.67902285

In [9]:
get_toxicity_score("fuck you")

0.95473075

I wanted to see which score occurs most frequently. Surprisingly, the most common score was 0.310894, which occurs 476 times and is below 0.5.

In [10]:
df['score'].value_counts().head()

0.310894    476
0.515460    163
0.534903    144
0.860626    141
0.695427    141
Name: score, dtype: int64

In [11]:
df['comment_text'].value_counts().head()

Thank you for understanding. I think very highly of you and would not revert without discussion.

In [12]:
df['comment_text'].value_counts().head(1)

Thank you for understanding. I think very highly of you and would not revert without discussion.    1
Name: comment_text, dtype: int64

I also wanted to see which comment text was the most common, but this was not informative: the top entry appears only once, so every comment in the dataset is unique. Instead, I pulled out the most common words used in the comments.

In [13]:
from collections import Counter
cnt = Counter()
for text in df['comment_text']:
    for word in text.split():
        cnt[word] += 1
        
cnt.most_common(150)

[('the', 115302),
 ('to', 74441),
 ('of', 56293),
 ('and', 53227),
 ('a', 52575),
 ('I', 47409),
 ('is', 42646),
 ('you', 40143),
 ('that', 36143),
 ('in', 33849),
 ('for', 24438),
 ('it', 23903),
 ('on', 21678),
 ('"', 21308),
 ('not', 21165),
 ('be', 20857),
 ('this', 18404),
 ('have', 17832),
 ('as', 17810),
 ('are', 17031),
 ('your', 14407),
 ('with', 14380),
 ('was', 13569),
 ('or', 12136),
 ('but', 10943),
 ('an', 10687),
 ('The', 10233),
 ('from', 10055),
 ('article', 9994),
 ('my', 9899),
 ('by', 9884),
 ('at', 9510),
 ('about', 8879),
 ('do', 8229),
 ('if', 8032),
 ('can', 7837),
 ('has', 7701),
 ('would', 7319),
 ('will', 7289),
 ('page', 6923),
 ('me', 6629),
 ('what', 6609),
 ('so', 6533),
 ('like', 6480),
 ('If', 6431),
 ('been', 6280),
 ('just', 6191),
 ('all', 6098),
 ('any', 6082),
 ('-', 6066),
 ('they', 6014),
 ('should', 5969),
 ('You', 5775),
 ('which', 5755),
 ('one', 5601),
 ('there', 5515),
 ('more', 5499),
 ('he', 5483),
 ('no', 5387),
 ('some', 5331),
 ('other'

This list is dominated by pronouns, prepositions, and conjunctions, which are not useful for determining whether a comment is toxic. However, I also saw the word "fuck", whose toxicity score was over 0.9.
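One way to make the word counts more informative is to lowercase the text, strip punctuation, and skip common function words before counting. The following is a minimal sketch under those assumptions; `STOP_WORDS`, `top_words`, and the two sample sentences are made up for illustration and are not taken from the dataset.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; extend as needed).
STOP_WORDS = {"the", "to", "of", "and", "a", "i", "is", "you", "that", "in",
              "for", "it", "on", "not", "be", "this", "have", "as", "are", "your"}

def top_words(texts, n=10):
    """Count lowercased tokens, skipping stop words."""
    counts = Counter()
    for text in texts:
        # Lowercasing merges "The" with "the"; the regex drops punctuation.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)

# Made-up sample sentences, for illustration only.
sample = ["The article is fine.", "You are the WORST, the worst!"]
print(top_words(sample, n=3))
```

To focus on comments the human raters flagged, the same function could be applied to `df[df['toxic'] == 1]['comment_text']`.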

## Designing and Performing Tests

In this case, a simple cutoff of 0.5 would not be an accurate threshold for deciding whether a comment is toxic or abusive.

In [14]:
df.describe()

Unnamed: 0,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
count,41338.0,41338.0,41338.0,41338.0,41338.0,41338.0,41338.0
mean,0.244467,0.095384,0.009168,0.05305,0.003024,0.049809,0.009725
std,0.257221,0.293749,0.095313,0.224137,0.054907,0.217553,0.098134
min,4e-06,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.074772,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.128969,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.310894,0.0,0.0,0.0,0.0,0.0,0.0
max,0.998329,1.0,1.0,1.0,1.0,1.0,1.0


I would set the threshold at 0.8, so that comments scoring above 0.8 are considered toxic or abusive. One example of a toxic comment is anything that contains the word "fuck".


In [15]:
fword = df[df['comment_text'] == 'you fucking piece of shit']
print(fword)

                     id               comment_text     score  toxic  \
19994  1429a2c930688317  you fucking piece of shit  0.997278      1   

       severe_toxic  obscene  threat  insult  identity_hate  
19994             1        1       0       1              0  


In [16]:
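# Keep only comments containing "fuck" (case-sensitive match).
# Note: this overwrites df, so each later filter applies to this subset,
# not to the full dataset.
df = df[df['comment_text'].str.contains('fuck')]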
print(df)

                     id                                       comment_text  \
59     03c1c7a2649a7d87  :::Jeez Ed, you seem like a fucking shitty dou...   
62     03c6344729c4e665  Azerbaijanis Proud of Ramil Safarov  \n If I h...   
80     03df07faf434d387  " \n :Yeah your right. And dude I'm not fuckin...   
149    0bd23029a3cbda62  :I'll have a look presently. Half the academic...   
304    16e010a6cdefef2c  " \n You're a fucking idiot. Just because it t...   
...                 ...                                                ...   
41134  957c327ace7d9896  how can it be almost certain when it has been ...   
41181  95390a4e0cd90ad3  Fuck You Asshole \n\nFuck You, you cum-guzzlin...   
41221  950bb8240a93ef11  "\n\n Censorship \n\nWas Final Fantasy VII cen...   
41299  94a5024323152cd1  ==Why does it bother you, fuckface?89.123.100....   
41332  9481cd7393b583c9  RE: \n\nIt's a fucking album cover, how the fu...   

          score  toxic  severe_toxic  obscene  threat  insult  

In [17]:
df.head()

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
59,03c1c7a2649a7d87,":::Jeez Ed, you seem like a fucking shitty dou...",0.992469,1,0,1,0,1,0
62,03c6344729c4e665,Azerbaijanis Proud of Ramil Safarov \n If I h...,0.980994,1,0,1,1,1,1
80,03df07faf434d387,""" \n :Yeah your right. And dude I'm not fuckin...",0.878531,1,0,0,0,0,0
149,0bd23029a3cbda62,:I'll have a look presently. Half the academic...,0.462907,0,0,0,0,0,0
304,16e010a6cdefef2c,""" \n You're a fucking idiot. Just because it t...",0.977299,1,0,1,0,1,0


Hypothesis

H0 = Perspective will classify content as abusive and toxic (score > 0.8) when the content includes the toxic words "fuck" and "shit".

H1 = Perspective will NOT classify content as abusive and toxic (score > 0.8) when the content includes the toxic words "fuck" and "shit".
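A direct way to look at this hypothesis in the labeled dataset is to measure what share of the matching comments scores above the threshold. The sketch below is one possible approach, not the method used in this notebook: `share_above_threshold` is a hypothetical helper, and the `toy` rows use made-up scores. It matches case-insensitively and leaves the original frame unfiltered.

```python
import pandas as pd

THRESHOLD = 0.8  # cutoff chosen in this notebook

def share_above_threshold(frame, pattern, threshold=THRESHOLD):
    """Return (number of matching comments, share of them scoring above threshold)."""
    # case=False also matches "Fuck" and "FUCK"; na=False treats missing text as no match.
    mask = frame["comment_text"].str.contains(pattern, case=False, na=False)
    subset = frame[mask]  # separate variable, so `frame` itself stays unfiltered
    if subset.empty:
        return 0, float("nan")
    return len(subset), float((subset["score"] > threshold).mean())

# Toy rows with made-up scores, for illustration only.
toy = pd.DataFrame({
    "comment_text": ["FUCK this", "holy shit", "nice edit", "what the fuck"],
    "score": [0.95, 0.75, 0.05, 0.90],
})
n, share = share_above_threshold(toy, r"fuck|shit")
print(n, round(share, 3))
```

On the real data the call would be `share_above_threshold(df, r"fuck|shit")`, run against the full, unfiltered DataFrame.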


In [18]:
df = df[df['comment_text'].str.contains('fucking')]
print(df)

                     id                                       comment_text  \
59     03c1c7a2649a7d87  :::Jeez Ed, you seem like a fucking shitty dou...   
80     03df07faf434d387  " \n :Yeah your right. And dude I'm not fuckin...   
149    0bd23029a3cbda62  :I'll have a look presently. Half the academic...   
304    16e010a6cdefef2c  " \n You're a fucking idiot. Just because it t...   
332    1a44cad92a0c1c96  ==Son of a bitch== \n\n Hey you fucking neo-Na...   
...                 ...                                                ...   
41062  95ddf8be1b12c559  In defence of China! \n\nHey Koreans, who the ...   
41102  95a54e43c59f1dc9  "\nThis article as a whole is subject to a con...   
41116  958e4d243a743156  "\n\n idiot \n\n  i do not understand what use...   
41181  95390a4e0cd90ad3  Fuck You Asshole \n\nFuck You, you cum-guzzlin...   
41332  9481cd7393b583c9  RE: \n\nIt's a fucking album cover, how the fu...   

          score  toxic  severe_toxic  obscene  threat  insult  

In [19]:
df = df[df['comment_text'].str.contains('fuck you')]
print(df)

                     id                                       comment_text  \
2593   eca0e12c9c0311cd  ===Apology=== \n I would like to sincerely apo...   
7019   e778d4f09a32b559  Fuck You \n\nNigga, fuck you.\n\nFlocka aint a...   
7847   da399c0707552a86  "\nI'm sorry you don't see that what you said ...   
8923   c9286243b103a14b  CaliforniaAliBaba is a Bitch\nGo fuck yourself...   
14226  7213c3afe948d001  2010's \n\nArthur Rubin i want to thank but i ...   
18810  26b8716d9cef3664  i lost my civility when you started being outr...   
20127  11f276328553138d  I hate this site.  But you know, I'll do every...   
23089  f69a0d56ea0d6bd1  "Fuck You==\nhey fuck you, who the fuck checks...   
24983  eca8f3b9e049f84e  you shut your whore mouth \n\n An open letter ...   
27074  e163ed263c824a2a  "\nEveryone who doesn't think that what I did ...   
28085  dbb2d936e3756f67  fuck you! you fucking twat dont u dare insult ...   
28687  d870a866ad68e393  what the hell man? why did you delete m

In [20]:
df = df[df['comment_text'].str.contains('fucking shit')]
print(df)

                     id                                       comment_text  \
14226  7213c3afe948d001  2010's \n\nArthur Rubin i want to thank but i ...   

          score  toxic  severe_toxic  obscene  threat  insult  identity_hate  
14226  0.981304      1             0        1       0       1              0  


In [34]:
# Below are my test results: checking 20 comments that contain 'fuck' or 'shit'
get_toxicity_score("shit")


0.7007861

In [35]:
get_toxicity_score("as fuck")

0.8960455

In [63]:
get_toxicity_score("to fuck for")

0.8546526

In [37]:
get_toxicity_score("fuck it!")

0.9107452

In [39]:
get_toxicity_score("i don't give a fuck")

0.86967754

In [40]:
get_toxicity_score("fuck my life")

0.9189384

In [43]:
get_toxicity_score("mf")

0.355017

In [42]:
get_toxicity_score("wtf")

0.5535644

In [44]:
get_toxicity_score("holy fuck")

0.8269591

In [45]:
get_toxicity_score("holy shit")

0.8129004

In [46]:
get_toxicity_score("shut the fuck up")

0.9390311

In [47]:
get_toxicity_score("fuck yeah")

0.83020335

In [48]:
get_toxicity_score("fuck a duck")

0.95068264

In [49]:
get_toxicity_score("fucking a right")

0.867151

In [50]:
get_toxicity_score("fuck that shit")


0.9850823

In [51]:
get_toxicity_score("fucking idiot")

0.9863082

In [52]:
get_toxicity_score("a fucking shitty")

0.97926265

In [53]:
get_toxicity_score("you idiot")

0.9320454

In [54]:
get_toxicity_score("fuck")

0.9017833

In [27]:
import scipy.stats as stats
import pandas as pd

In [28]:
data = pd.read_csv('labeled_and_scored_comments.csv')
data.head()

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0001ea8717f6de06,Thank you for understanding. I think very high...,0.075638,0,0,0,0,0,0
1,000247e83dcc1211,:Dear god this site is horrible.,0.450459,0,0,0,0,0,0
2,0002f87b16116a7f,"""::: Somebody will invariably try to add Relig...",0.667964,0,0,0,0,0,0
3,0003e1cccfd5a40a,""" \n\n It says it right there that it IS a typ...",0.068434,0,0,0,0,0,0
4,00059ace3e3e9a53,""" \n\n == Before adding a new product to the l...",0.151724,0,0,0,0,0,0


In [29]:
t_statistic, p_value = stats.ttest_1samp(a = data["toxic"], popmean=0.8)

In [30]:
print(t_statistic , p_value)

-487.69807347653835 0.0


In [31]:
t_statistic, p_value = stats.ttest_1samp(a = data["score"], popmean=0.5)

In [32]:
print(t_statistic , p_value)

-201.98347260135824 0.0


In [33]:
print(t_statistic)

-201.98347260135824


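The p-value prints as 0.0, on which basis I rejected H0: Perspective will classify content as abusive and toxic (score > 0.8) when the content includes the toxic words "fuck" and "shit". Note, however, that both t-tests compare a mean over all 41,338 comments against a fixed value (0.8 for the `toxic` label, 0.5 for `score`), so they do not isolate the comments that contain those words. A test restricted to that subset would address H0 more directly.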
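As a rough complement, the scores from the original queries above can be summarized against the 0.8 threshold. The sketch below uses only those 19 query scores (including "mf", "wtf", and "you idiot", which do not literally contain the two words) and computes a one-sample t statistic by hand. It is an illustration rather than a rigorous test, and the variable names are my own.

```python
import math
import statistics

THRESHOLD = 0.8

# Scores returned for the original queries in the cells above, in order.
query_scores = [
    0.7007861, 0.8960455, 0.8546526, 0.9107452, 0.86967754, 0.9189384,
    0.355017, 0.5535644, 0.8269591, 0.8129004, 0.9390311, 0.83020335,
    0.95068264, 0.867151, 0.9850823, 0.9863082, 0.97926265, 0.9320454,
    0.9017833,
]

n = len(query_scores)
n_above = sum(score > THRESHOLD for score in query_scores)
mean_score = statistics.mean(query_scores)
sample_sd = statistics.stdev(query_scores)  # sample standard deviation (n - 1)

# One-sample t statistic for "mean score equals THRESHOLD".
t_stat = (mean_score - THRESHOLD) / (sample_sd / math.sqrt(n))

print(f"{n_above} of {n} queries scored above {THRESHOLD}")
print(f"mean = {mean_score:.4f}, t = {t_stat:.3f}")
```

A p-value for this statistic could be obtained with `scipy.stats.ttest_1samp(query_scores, popmean=0.8)`, keeping in mind that 19 hand-picked queries are a small and non-random sample.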



## Analyzing the Data

instruction: Analyze your results. If you looked at performance on the labeled dataset, how did false positive rates and false negative rates compare against different types of content? Were there any cases where you disagreed with the labels? Were there any trends that you noticed in the model scores? What about any original queries that you submitted?


A false positive is a case where the actual label is 0 but the predicted label is 1. A false negative, on the other hand, is a case where the actual label is 1 but the predicted label is 0.
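These definitions can be turned into rates by treating a score above the threshold as a predicted 1 and comparing it with the human `toxic` label. The following is a minimal sketch under that assumption; `error_rates` is a hypothetical helper, and the five score and label pairs are copied from rows shown in the outputs above.

```python
def error_rates(scores, labels, threshold=0.8):
    """Return (false_positive_rate, false_negative_rate) for a score cutoff."""
    fp = fn = tp = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 0:
            fp += 1  # flagged by the model, not by the raters
        elif predicted == 0 and label == 1:
            fn += 1  # flagged by the raters, missed by the model
        elif predicted == 1 and label == 1:
            tp += 1
        else:
            tn += 1
    # Guard against division by zero when a class is absent.
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr

# (score, toxic) pairs taken from rows printed earlier in this notebook.
scores = [0.992469, 0.462907, 0.878531, 0.450459, 0.667964]
labels = [1, 0, 1, 0, 0]

print(error_rates(scores, labels, threshold=0.8))
print(error_rates(scores, labels, threshold=0.5))
```

Because the function accepts any two iterables, the full columns `data['score']` and `data['toxic']` can be passed in directly to compare thresholds on the whole dataset.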

The toxicity score of the comment "wtf" was only 0.5535644, which I disagreed with. I expected the score to be high even when the comment was an abbreviation. I noticed that when a comment contains the full word "fuck", the score tends to be higher.

However, the small sample size makes it difficult to determine whether the toxicity scores are actually high because of the specific word choice in the comments.
