Run the next code cell without changes to use the data to train a simple model.  The output shows the accuracy of the model on some test data.

In [None]:
# Set up feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.ethics.ex3 import *

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

# Get the same results each time
np.random.seed(0)

# Load the (full) training data
full_data = pd.read_csv("../input/jigsaw-unintended-bias-in-toxicity-classification/train.csv")

# Work with a small subset of the data: if target > 0.7, toxic.  If target < 0.3, non-toxic
full_toxic = full_data[full_data["target"]>0.7]
full_nontoxic = full_data[full_data["target"]<0.3].sample(len(full_toxic))
data = pd.concat([full_toxic, full_nontoxic], ignore_index=True)
comments = data["comment_text"]
target = (data["target"]>0.7).astype(int)

# Break into training and test sets
comments_train, comments_test, y_train, y_test = train_test_split(comments, target, test_size=0.30, stratify=target)

# Get vocabulary from training data
vectorizer = CountVectorizer()
vectorizer.fit(comments_train)

# Get word counts for training and test sets
X_train = vectorizer.transform(comments_train)
X_test = vectorizer.transform(comments_test)

# Preview the dataset
print("Data successfully loaded!\n")
print("Sample toxic comment:", comments_train.iloc[18])
print("Sample not-toxic comment:", comments_train.iloc[3])

In [None]:
from sklearn.linear_model import LogisticRegression

# Train a model and evaluate performance on test dataset
classifier = LogisticRegression(max_iter=2000)
classifier.fit(X_train, y_train)
score = classifier.score(X_test, y_test)
print("Accuracy:", score)

# Function to classify any string
def classify_string(string, investigate=False):
    prediction = classifier.predict(vectorizer.transform([string]))[0]
    if prediction == 0:
        print("NOT TOXIC:", string)
    else:
        print("TOXIC:", string)

Roughly 93% of the comments in the test data are classified correctly!



In [None]:
# Comment to pass through the model
my_comment = "i hate orange"

# Do not change the code below
classify_string(my_comment)
q_1.check()

The model assigns each of roughly 58,000 words a coefficient, where higher coefficients denote words that the model thinks are more toxic.  The next code cell outputs the ten words that are considered most toxic, along with their coefficients.

In [None]:
coefficients = pd.DataFrame({"word": sorted(list(vectorizer.vocabulary_.keys())), "coeff": classifier.coef_[0]})
coefficients.sort_values(by=['coeff']).tail(10)

None of the words are surprising. They are all clearly toxic.
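
To make the role of the coefficients concrete, here is a minimal sketch of how a logistic regression over word counts turns them into a prediction. The coefficient values and intercept are hypothetical, chosen only for illustration, and the tokenization is a crude whitespace split (by default, `CountVectorizer` also splits on punctuation and drops single-character tokens).

```python
import math

# Hypothetical coefficients and intercept, chosen only for illustration;
# they are not taken from the trained classifier.
toy_coefficients = {"hate": 1.5, "friend": -0.8, "orange": 0.1}
toy_intercept = -0.5


def toxicity_probability(comment, coefficients, intercept):
    """Return the logistic-regression probability that a comment is toxic."""
    # Crude tokenization: lowercase and split on whitespace.
    words = comment.lower().split()
    # Each occurrence of a known word adds its coefficient to the score;
    # unknown words contribute nothing.
    score = intercept + sum(coefficients.get(word, 0.0) for word in words)
    # The sigmoid maps the score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-score))


for comment in ["i hate orange", "my friend likes orange"]:
    probability = toxicity_probability(comment, toy_coefficients, toy_intercept)
    label = "TOXIC" if probability > 0.5 else "NOT TOXIC"
    print(f"{label} ({probability:.2f}): {comment}")
```

A word with a large positive coefficient pushes the score, and therefore the probability, toward "toxic" every time it appears, regardless of the context around it.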

# A closer investigation

We'll take a closer look at how the model classifies comments.


In [None]:
# Set the value of new_comment
new_comment = "I have a christian friend"

# Do not change the code below
classify_string(new_comment)
coefficients[coefficients.word.isin(new_comment.split())]


# Identify bias

Let's run the comment "I have a muslim friend" and see what the model predicts.

In [None]:
# Set the value of new_comment
new_comment = "I have a muslim friend"

# Do not change the code below
classify_string(new_comment)
coefficients[coefficients.word.isin(new_comment.split())]


In [None]:
new_comment = "My friend is black"

# Do not change the code below
classify_string(new_comment)

In [None]:
new_comment = "I'm gay"

# Do not change the code below
classify_string(new_comment)

These examples show how biased the model is: comments that differ only in the identity they mention can receive different predictions.
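
One way to make this kind of check systematic is to fill a single template with different identity terms and compare the predictions. The sketch below assumes a hypothetical helper `compare_identity_terms` and a made-up stand-in model `toy_predict`; neither is part of the exercise, and the stand-in's behavior is invented purely so the example runs on its own.

```python
def compare_identity_terms(template, terms, predict):
    """Fill the template with each term and collect the predicted labels."""
    return {term: predict(template.format(term)) for term in terms}


# Hypothetical stand-in for the trained model: it flags a comment as toxic (1)
# whenever the comment contains a word from a made-up list.
def toy_predict(comment):
    flagged_words = {"muslim"}
    return int(any(word in flagged_words for word in comment.lower().split()))


results = compare_identity_terms("I have a {} friend", ["christian", "muslim"], toy_predict)
for term, label in results.items():
    print(term, "->", "TOXIC" if label == 1 else "NOT TOXIC")

# Terms whose prediction differs from the prediction for the first term.
baseline = results["christian"]
flipped = [term for term, label in results.items() if label != baseline]
print("Prediction changes for:", flipped)
```

In the notebook, the stand-in could be replaced with `lambda s: classifier.predict(vectorizer.transform([s]))[0]` to test the trained model itself.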

Comments that refer to Islam are more likely to be classified as toxic because of the flawed state of the online community where the data was collected. This introduces historical bias.

Besides, suppose a model is being trained to classify online comments as toxic, and any comments that are not in English are first translated into English with a separate tool. This can introduce measurement bias, since non-English comments will often not be translated perfectly.

Additionally, if the model is evaluated on comments from users in the United Kingdom and deployed to users in Australia, this will lead to evaluation bias and deployment bias. The model will also have representation bias, because it was built to serve users in Australia but was trained with data from users based in the United Kingdom.
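
A single overall accuracy can hide this kind of gap, so one possible check is to compute accuracy separately for each group of users. The sketch below uses made-up evaluation records and a hypothetical helper `accuracy_by_group`; the numbers illustrate the calculation only and are not results from the model.

```python
from collections import defaultdict

# Hypothetical evaluation records: (user country, true label, predicted label).
# The values are made up purely to illustrate the calculation.
records = [
    ("UK", 1, 1), ("UK", 0, 0), ("UK", 1, 1), ("UK", 0, 0),
    ("Australia", 1, 0), ("Australia", 0, 0), ("Australia", 1, 1), ("Australia", 0, 1),
]


def accuracy_by_group(rows):
    """Return the fraction of correct predictions for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in rows:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


for group, accuracy in accuracy_by_group(records).items():
    print(f"{group}: {accuracy:.2f}")
```

If the evaluation set contains only one group, the per-group breakdown has nothing to compare, which is itself a sign that the evaluation data may not match the users the model will serve.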