# Testing Perspective API for Potential Bias With Regard to Profanity Censorship

## 0. Set Up Environment
### 0.1 Set up access to API

To set up access to the API, I installed and imported googleapiclient for Python; documentation is accessible [here](https://googleapis.github.io/google-api-python-client/docs/). 
To use the Perspective API, I created a Google Cloud account and requested access to the API; documentation for the API is accessible [here](https://developers.perspectiveapi.com/s/docs?language=en_US). 
Once access to the API was granted, I authorized requests by creating an API key as a credential.
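The notebook hard-codes a placeholder key (`API_KEY = 'my-api-key'`). A common alternative is to read the key from an environment variable so it never ends up in the saved notebook. The following is a minimal sketch, assuming a variable named `PERSPECTIVE_API_KEY`; that name and the helper `load_api_key` are hypothetical choices, not something the API requires.

```python
import os


def load_api_key(var_name="PERSPECTIVE_API_KEY"):
    """Return the API key stored in an environment variable.

    PERSPECTIVE_API_KEY is a hypothetical variable name; use whatever
    name you export in your shell.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

With the variable exported, `API_KEY = load_api_key()` could replace the hard-coded string before `discovery.build` is called.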

In [85]:
try:
    from googleapiclient import discovery
except ImportError:
    # install the client libraries if they are missing, then retry the import
    !pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
    from googleapiclient import discovery

In [86]:
API_KEY = 'my-api-key'

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

### 0.2 Import relevant data science libraries

Import the following libraries for data handling, computation, and evaluation.
- **json** will be used to pretty-print dictionaries of API scores. JSON documentation can be found [here](https://docs.python.org/3/library/json.html#module-json).
- **pandas** will be used to store our data in a DataFrame. pandas documentation can be found [here](https://pandas.pydata.org/docs/).
- **NumPy** will be used to perform computations on our data. NumPy documentation can be found [here](https://numpy.org/doc/stable/).
- From **scikit-learn**, `accuracy_score` will be imported from `sklearn.metrics` to compute the accuracy of the API's predictions. scikit-learn metrics documentation can be found [here](https://scikit-learn.org/stable/modules/model_evaluation.html#model-evaluation).

In [87]:
import json
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score

## 1. Testing 

### 1.1 Determine a threshold for toxicity based on score

To determine the threshold of toxicity based on the API score, I ran some pre-labeled comments through the API and printed the scores. The comments used to set the threshold came from sample pre-labeled data. 

In [88]:
#accepts a list of comments and returns a dictionary with key: comment and value: api toxicity score
def toxicity_of_list_as_dict(given_comments):
    comment_with_toxicity = {}
    for comment in given_comments:
       
        analyze_request = {
            'comment': {'text': comment},
            'requestedAttributes': {'TOXICITY': {}}
        }
        response = client.comments().analyze(body=analyze_request).execute()
        toxicity_value = response.get('attributeScores').get('TOXICITY').get('summaryScore').get('value')
        comment_with_toxicity[comment] = toxicity_value
        
    return comment_with_toxicity

In [89]:
comments_to_determine_threshold = [
    'no hes not he have his own show whihc is a cheap copy of ragu romeo show and hes only popular because of stupid people who auditions and watch this gutter show',
    'GIVE BLOWJOBS TO HOT WOMEN FROM ALDERLEY EDGE CHESHIRE',
    'cause shes just a devil with evil on her mind and my penis in Materialscientists birth place',
    'Thats helpful MOS be damned Thecodingproject thinks its 10x worse ‚Äî',
    'Intolerance in India    india is a Generator of liars like you',
    'wat the heck   what the hell is wrong with you  Next time you vandalize my page im going over there to Fayetteville Arkansas and beating the crap out of you',
    'REDIRECT TalkBlowing Smoke Mad Men',
    'Thank you and Merry Christmas as well and congratulations on both of your teams making it  I hope all is well with you and your family',
    'Source  Source', 
    'shes gotta be the most annoying singer ive ever heard',
]

print(json.dumps(toxicity_of_list_as_dict(comments_to_determine_threshold),indent=1))

{
 "no hes not he have his own show whihc is a cheap copy of ragu romeo show and hes only popular because of stupid people who auditions and watch this gutter show": 0.7105462,
 "GIVE BLOWJOBS TO HOT WOMEN FROM ALDERLEY EDGE CHESHIRE": 0.7675452,
 "cause shes just a devil with evil on her mind and my penis in Materialscientists birth place": 0.8540474,
 "Thats helpful MOS be damned Thecodingproject thinks its 10x worse \u201a\u00c4\u00ee": 0.51869136,
 "Intolerance in India    india is a Generator of liars like you": 0.5716857,
 "wat the heck   what the hell is wrong with you  Next time you vandalize my page im going over there to Fayetteville Arkansas and beating the crap out of you": 0.88599813,
 "REDIRECT TalkBlowing Smoke Mad Men": 0.3972142,
 "Thank you and Merry Christmas as well and congratulations on both of your teams making it  I hope all is well with you and your family": 0.008859074,
 "Source  Source": 0.008921904,
 "shes gotta be the most annoying singer ive ever heard": 0

Based on these results, I set the threshold at 0.5. The comment "Thats helpful MOS be damned Thecodingproject thinks its 10x worse" (which ends in a mis-encoded dash) is manually labeled as toxic and has an API score of 0.52, while "shes gotta be the most annoying singer ive ever heard" is manually labeled as non-toxic and has an API score of 0.498, so 0.5 separates the two. A comment with a score in [0, 0.5) is therefore classified as non-toxic, and a comment with a score in [0.5, 1] is classified as toxic.

### 1.2 Hypothesis

I hypothesize that the Perspective API will not mark content with profanity as toxic if the profane words are censored using asterisks. 

### 1.3 Perform Tests

In [90]:
#accepts a comment and returns the toxicity score from the API
def toxicity(comment):
    analyze_request = {
        'comment': {'text': comment},
        'requestedAttributes': {'TOXICITY': {}}
    }

    response = client.comments().analyze(body=analyze_request).execute()
    toxicity_value = response.get('attributeScores').get('TOXICITY').get('summaryScore').get('value')

    return toxicity_value

In [91]:
#accepts a list of comments and returns a list of corresponding toxicity scores
def toxicity_as_list(given_comments):
    comment_toxicity_list = []
    for comment in given_comments:

        toxicity_value = toxicity(comment)
       
        comment_toxicity_list.append(toxicity_value)
        
    return comment_toxicity_list
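`toxicity_as_list` sends one request per comment back to back, and the test cell below calls it (and `categorized_toxicity_as_list`) on several lists in a row. If the project's quota is low, rapid calls can be rejected. A minimal sketch of a throttling wrapper follows, assuming a quota of roughly one request per second; that interval is an assumption, so check the quota shown for your own project. The name `throttled` is hypothetical.

```python
import time


def throttled(score_fn, min_interval=1.0):
    """Wrap score_fn so consecutive calls start at least min_interval seconds apart.

    min_interval=1.0 is an assumed quota, not a documented API limit.
    """
    last_call = None  # time.monotonic() of the previous call, if any

    def wrapper(comment):
        nonlocal last_call
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)  # pause until the interval has passed
        last_call = time.monotonic()
        return score_fn(comment)

    return wrapper
```

In the notebook, `slow_toxicity = throttled(toxicity)` could be used inside the loop in place of `toxicity`.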

In [92]:
#accepts a list of comments and returns a list of corresponding categorical toxicity values based on threshold
def categorized_toxicity_as_list(given_comments):
    comment_toxicity_list = []
    for comment in given_comments:

        toxicity_value = toxicity(comment)
        
        if (toxicity_value >= 0.5):
            comment_toxicity_list.append('TOXIC')
        else:
            comment_toxicity_list.append('NONTOXIC')
        
    return comment_toxicity_list
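`categorized_toxicity_as_list` calls `toxicity` again for every comment, so the test cell sends each comment to the API twice: once through `toxicity_as_list` for the score and once for the label. The sketch below is an alternative that derives the labels from scores that were already fetched. This way each label comes from the same response as its score, and each comment costs one request. The names `THRESHOLD` and `categorize_scores` are hypothetical.

```python
THRESHOLD = 0.5  # threshold chosen in section 1.1


def categorize_scores(scores, threshold=THRESHOLD):
    """Map API scores to 'TOXIC' / 'NONTOXIC' without calling the API again.

    Scores in [threshold, 1] are TOXIC; scores in [0, threshold) are NONTOXIC.
    """
    return ['TOXIC' if score >= threshold else 'NONTOXIC' for score in scores]
```

For example, `scores = toxicity_as_list(profane_comments)` followed by `categorize_scores(scores)` would fill both the `'api_score'` and `'predicted toxicity'` columns with half as many requests.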

To test my hypothesis, I created a test set of comments, all of which are toxic and contain profanity. I then made a second version of each comment with the profane words censored using asterisks, keeping the first and last letter of each word. I built a DataFrame containing each test comment, whether it is profane or censored, its toxicity score from the API, the predicted category (TOXIC or NONTOXIC) based on the 0.5 threshold from 1.1, and the actual toxicity (i.e., my own manual label).
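The censored comments follow a consistent rule: each profane word keeps its first and last letter, and every letter in between becomes an asterisk. A minimal sketch of that rule is below. The helper names and the `PROFANE_WORDS` set are hypothetical; the set simply lists the words censored in this test set.

```python
import re

# hypothetical word list: the profane words that appear in this test set
PROFANE_WORDS = {"asshole", "bitch", "fucking", "cunt", "bullshit", "hell", "shit", "bastard"}


def censor_word(word):
    """Keep the first and last character and mask everything in between."""
    if len(word) <= 2:
        return word
    return word[0] + "*" * (len(word) - 2) + word[-1]


def censor(text, profane_words):
    """Censor every alphabetic word whose lowercase form is in profane_words."""
    def replace(match):
        word = match.group(0)
        return censor_word(word) if word.lower() in profane_words else word

    return re.sub(r"[A-Za-z]+", replace, text)


print(censor("You are a complete asshole", PROFANE_WORDS))  # You are a complete a*****e
```

Generating the censored versions this way guarantees that both lists stay aligned when comments are added.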

In [93]:
#profane comments to test
profane_comments = [
    'You are a complete asshole',
    'The person in this video is a crazy bitch',
    'Whoever commented before me is a fucking cunt',
    'This article is complete bullshit',
    'Go to hell you bitch, nobody wants to hear your shit',
    'THIS PERSON IS A HORRID BASTARD'
]

#same as profane comments except profanity is censored using asterisks
censored_comments = [
    'You are a complete a*****e',
    'The person in this video is a crazy b***h',
    'Whoever commented before me is a f*****g c**t',
    'This article is complete b******t',
    'Go to h**l you b***h, nobody wants to hear your s**t',
    'THIS PERSON IS A HORRID B*****D'
]

#dictionary for profane comments and attributes
profane = {
    'comment': profane_comments,
    'profane or censored': 'profane',
    'api_score': toxicity_as_list(profane_comments),
    'predicted toxicity': categorized_toxicity_as_list(profane_comments),
    'actual': 'TOXIC'
}

#dictionary for censored comments and attributes
censored = {
    'comment': censored_comments,
    'profane or censored': 'censored',
    'api_score': toxicity_as_list(censored_comments),
    'predicted toxicity': categorized_toxicity_as_list(censored_comments),
    'actual': 'TOXIC'
}

#convert dictionaries into dataframes and concatenate
df_profane = pd.DataFrame(data=profane)
df_censored = pd.DataFrame(data=censored)
test_df = pd.concat([df_profane,df_censored], ignore_index=True, sort=False)

#save test comments, scores, and attributes as csv
#index=False keeps the row index from being saved as an extra unnamed column
test_df.to_csv("Test_Comments_with_API_Scores.csv", index=False)



### 1.4 Accuracy

To determine the accuracy of the model, I visually compared the predicted and actual toxicity columns of the DataFrame. I also computed the accuracy with scikit-learn's `accuracy_score`, although this step is somewhat redundant here: every comment is labeled TOXIC, so accuracy is simply the fraction of comments the API flagged as toxic.
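Because every label in `test_df['actual']` is TOXIC, there can be no false positives or true negatives, so accuracy reduces to recall on the toxic class. The stdlib sketch below makes that explicit. The helper `confusion_counts` is hypothetical; `sklearn.metrics.confusion_matrix` produces the same counts.

```python
def confusion_counts(actual, predicted, positive="TOXIC"):
    """Return (tp, fp, tn, fn) treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# hypothetical labels: all actual labels toxic, one comment missed by the model
actual = ["TOXIC"] * 4
predicted = ["TOXIC", "NONTOXIC", "TOXIC", "TOXIC"]
tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)
print(accuracy, recall)  # 0.75 0.75
```

With `fp` and `tn` fixed at zero, this test set cannot show whether the API over-flags censored text; that would require non-toxic comments in the test set.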

In [95]:
#display dataframe
display(test_df)

#encode labels as 1 (TOXIC) or 0 (NONTOXIC)
actual_toxicity = [1 if label == 'TOXIC' else 0 for label in test_df['actual']]
api_prediction = [1 if label == 'TOXIC' else 0 for label in test_df['predicted toxicity']]

accuracy = accuracy_score(actual_toxicity,api_prediction)
print(f'Accuracy of model: {accuracy}')

|   | comment | profane or censored | api_score | predicted toxicity | actual |
|---|---------|---------------------|-----------|--------------------|--------|
| 0 | You are a complete asshole | profane | 0.939145 | TOXIC | TOXIC |
| 1 | The person in this video is a crazy bitch | profane | 0.956375 | TOXIC | TOXIC |
| 2 | Whoever commented before me is a fucking cunt | profane | 0.964267 | TOXIC | TOXIC |
| 3 | This article is complete bullshit | profane | 0.924899 | TOXIC | TOXIC |
| 4 | Go to hell you bitch, nobody wants to hear you... | profane | 0.974994 | TOXIC | TOXIC |
| 5 | THIS PERSON IS A HORRID BASTARD | profane | 0.939145 | TOXIC | TOXIC |
| 6 | You are a complete a\*\*\*\*\*e | censored | 0.55597 | TOXIC | TOXIC |
| 7 | The person in this video is a crazy b\*\*\*h | censored | 0.853334 | TOXIC | TOXIC |
| 8 | Whoever commented before me is a f\*\*\*\*\*g c\*\*t | censored | 0.869671 | TOXIC | TOXIC |
| 9 | This article is complete b\*\*\*\*\*\*t | censored | 0.687436 | TOXIC | TOXIC |


Accuracy of model: 1.0


## 2. Results and Insights

In 1.4, I visually inspected the DataFrame and observed that, for every comment in the test set, the predicted toxicity matched the actual toxicity. The accuracy score confirmed this: the API's predictions were 100% accurate (accuracy = 1.0) on this test set.

These results do not support my hypothesis. On this test set, the Perspective API classified every comment as toxic, whether its profanity was written out or censored. This surprised me a little, because my hypothesis rested on the reasoning that a censored word can stand for other, non-profane words that fit the same pattern. For example, "b\*\*\*\*\*\*t" could just as easily represent "backseat" or "bedsheet". Although the experiment did not support the hypothesis, that reasoning should not be discarded entirely. Comparing "This article is complete bullshit" with "This article is complete b\*\*\*\*\*\*t", the toxicity score drops by about 0.24, from 0.925 to 0.687. The drop is even larger for "You are a complete asshole" versus "You are a complete a\*\*\*\*\*e": about 0.38, from 0.939 to 0.556, which leaves the censored comment only slightly above the 0.5 threshold.
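The drops can be computed directly from the table in 1.4. The sketch below copies the scores for the four profane/censored pairs whose censored rows appear in that output (rows 0 to 3 against rows 6 to 9). The last two censored comments are not shown in the displayed table, so they are left out. The name `score_drops` is hypothetical.

```python
# API scores copied from the results table in 1.4: (profane score, censored score)
pairs = {
    "asshole": (0.939145, 0.55597),
    "bitch": (0.956375, 0.853334),
    "fucking cunt": (0.964267, 0.869671),
    "bullshit": (0.924899, 0.687436),
}


def score_drops(pairs):
    """Return {word: profane score - censored score}, largest drop first."""
    drops = {word: round(p - c, 3) for word, (p, c) in pairs.items()}
    return dict(sorted(drops.items(), key=lambda item: item[1], reverse=True))


for word, drop in score_drops(pairs).items():
    print(f"{word:>13}: {drop:.3f}")
```

Ordered this way, "asshole" and "bullshit" show drops of roughly 0.38 and 0.24. The two pairs built around "bitch" and "fucking cunt" drop by only about 0.1, which is consistent with the context explanation discussed below.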

One possible explanation is that the API takes into account a word's length and its first and last letters. That would let it recognize words that resemble profanity without matching it exactly.
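This is speculation about the API's internals, not documented behavior. Still, a small hypothetical matcher shows why length plus first and last letters carry information, and why they still leave ambiguity. The sketch turns a censored token into a pattern in which each asterisk matches any single letter, then filters a word list. The helper names and the word list are made up for illustration.

```python
import re


def censored_pattern(token):
    """Compile a regex for a censored token such as 'b******t'.

    Each '*' matches any single letter; visible letters must match exactly
    (comparison is done in lowercase).
    """
    parts = ["[a-z]" if ch == "*" else re.escape(ch.lower()) for ch in token]
    return re.compile("".join(parts))


def candidates(token, vocabulary):
    """Return the words in vocabulary that are consistent with the censored token."""
    regex = censored_pattern(token)
    return [word for word in vocabulary if regex.fullmatch(word.lower())]


vocab = ["bullshit", "backseat", "bedsheet", "bathtub", "bitch"]
print(candidates("b******t", vocab))  # ['bullshit', 'backseat', 'bedsheet']
```

The pattern rules out most words, but it cannot choose between "bullshit", "backseat", and "bedsheet" on its own. Surrounding context would have to break the tie.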

Another possible reason the API still classifies the censored comments correctly is the context provided by the other words in the comment. For example, "The person in this video is a crazy" may be toxic on its own, regardless of whether the word after "crazy" is written out or censored. Further tests could measure how much profanity itself contributes to the toxicity score.
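One way to run such a test is leave-one-word-out ablation. Score the full comment, then score it again with each word removed in turn, and see which removal lowers the score most. This is a simple occlusion technique, not a Perspective API feature. The sketch below is hypothetical: `ablation_deltas` accepts any scoring function, such as the notebook's `toxicity`, and so makes one API call per word plus one for the full comment.

```python
def ablation_deltas(comment, score_fn):
    """Score the comment with each word removed in turn.

    Returns (word, full_score - score_without_word) pairs; a large positive
    delta means removing that word lowered the score the most. Words are
    split on whitespace, so punctuation stays attached to its word.
    """
    words = comment.split()
    full_score = score_fn(comment)
    deltas = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        deltas.append((word, full_score - score_fn(reduced)))
    return deltas
```

For example, `ablation_deltas("The person in this video is a crazy bitch", toxicity)` would show whether "crazy" or "bitch" drives the score. Running the same call on the censored version would show how much of that effect survives censoring.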

It is also important to consider the test data used in this experiment. I wrote all of the comments myself and labeled all of them as toxic, regardless of profanity. Another person might label "This article is complete b\*\*\*\*\*\*t" as non-toxic, which would lower the measured accuracy. Because every comment is labeled toxic, the test also cannot reveal false positives, i.e., non-toxic comments that the API wrongly flags. Another limitation is the size of the test set: only 12 comments were used, and they were very similar in composition, which limits how far these results generalize. A larger, more diverse test set might produce different results.

We might also investigate what the model treats as profane, which words it considers acceptable, and how large a factor profanity is in the toxicity score. In connection to HCDS and model interpretability, would the LIME tool be applicable to this API?

In terms of bias, the model is trained on comments from online forums, most of which, in my opinion, skew negative. The vernacular that online users write in may also differ from other forms of text, so the syntax and other features of the language in the training data may not be representative of all types of speech. The platform from which the training comments were collected matters as well, since commenters on different platforms may behave differently or follow different norms of decorum.