In [48]:
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
import model_bias_analysis


In [2]:
## Commented out because I believe we're not using these files right now. 

# toxicity_debiasing_data = pd.read_csv("toxicity_debiasing_data.tsv", sep='\t')
# # review_id, comment, is toxic or not, split (train or test)

# toxicity_debiasing_data_random = pd.read_csv("toxicity_debiasing_data_random.tsv", sep = '\t')
# # review_id, comment, is toxic or not, split (train or test)

# wiki_debias_dev = pd.read_csv("wiki_debias_dev.csv") 
# # comment,is_toxic,logged_in,ns,rev_id,sample,split,toxicity,year

# wiki_debias_random_dev = pd.read_csv("wiki_debias_random_dev.csv")
# # comment,is_toxic,logged_in,ns,rev_id,sample,split,toxicity,year

# wiki_debias_random_test = pd.read_csv("wiki_debias_random_test.csv")
# # comment,is_toxic,logged_in,ns,rev_id,sample,split,toxicity,year

#toxicity_debiasing_data.shape#.head
#toxicity_debiasing_data_random.shape
#wiki_debias_dev.shape
#wiki_debias_random_dev.shape
#wiki_debias_random_test.shape
#sum(toxicity_debiasing_data['split'] == "test")/sum(toxicity_debiasing_data['split'] == "train")

This data set (https://figshare.com/articles/Wikipedia_Talk_Labels_Toxicity/4563973) includes over 100k labeled discussion comments from English Wikipedia. Each comment was labeled by multiple annotators via Crowdflower on whether it is a toxic or healthy contribution. We also include some demographic data for each crowd-worker. See our wiki for documentation of the schema of each file and our research paper for documentation on the data collection and modeling methodology. For a quick demo of how to use the data for model building and analysis, check out this ipython notebook.

In [6]:
comments = pd.read_csv("toxicity_annotated_comments.tsv", sep="\t")

Copied from documentation: <br>
"Schema for {attack/aggression/toxicity}_annotated_comments.tsv
The comment text and metadata for comments with attack/aggression/toxicity labels generated by crowd-workers. The actual labels are in the corresponding {attack/aggression/toxicity}_annotations.tsv since each comment was labeled multiple times.

rev_id: MediaWiki revision id of the edit that added the comment to a talk page (i.e. discussion). <br>
comment: Comment text. Consists of the concatenation of content added during a revision/edit of a talk page. MediaWiki markup and HTML have been stripped out. To simplify tsv parsing, \n has been mapped to NEWLINE_TOKEN, \t has been mapped to TAB_TOKEN and " has been mapped to `. <br>
year: The year the comment was posted in. <br>
logged_in: Indicator for whether the user who made the comment was logged in. Takes on values in {0, 1}. <br>
ns: Namespace of the discussion page the comment was made in. Takes on values in {user, article}. <br>
sample: Indicates whether the comment came via random sampling of all comments, or whether it came from random sampling of the 5 comments around a block event for violating WP:npa or WP:HA. Takes on values in {random, blocked}. <br>
split: For model building in our paper we split comments into train, dev and test sets. Takes on values in {train, dev, test}."
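
The schema above says newlines, tabs and double quotes in the comment text were replaced by NEWLINE_TOKEN, TAB_TOKEN and a backtick. A minimal sketch of undoing that mapping follows. The helper name `restore_comment_text` is my own (hypothetical), and restoring quotes is off by default because the backtick mapping is lossy: a backtick that was genuinely in the comment cannot be told apart from a mapped quote.

```python
def restore_comment_text(text, restore_quotes=False):
    """Undo the token mapping described in the schema (hypothetical helper)."""
    text = text.replace("NEWLINE_TOKEN", "\n").replace("TAB_TOKEN", "\t")
    if restore_quotes:
        # Lossy: a backtick that was in the original comment also becomes a quote.
        text = text.replace("`", '"')
    return text


sample = "`This is such a fun entry.NEWLINE_TOKENTAB_TOKENDevotchka`"
print(repr(restore_comment_text(sample)))
print(repr(restore_comment_text(sample, restore_quotes=True)))
```

On the loaded frame this could be applied as `comments['comment'].map(restore_comment_text)`.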
<br>

My notes: <br> 
I don't know much about how natural language processing works, but from what I do know, I suspect that very long comments are hard to classify well. I also wonder how toxicity in bigrams is handled and whether the training data accounts for it (e.g. "nasty woman" and "nasty" are problematic for different reasons). What features is the classifier using, and why does that work? Figure 1 in the Dixon paper shows comment length, so we should be able to filter on it. I wonder whether the phrase-templating approach covers the two problems I raise here, long comments and bigrams. Is this an issue we should be dealing with? <br>
Two easier questions: what do ns and sample really mean?
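
On the bigram question: when a model is fed both unigram and bigram counts, "nasty" and "nasty woman" become separate features, each with its own weight. The sketch below is a simplified, hand-rolled stand-in for that feature extraction (lowercasing, keeping word tokens of two or more characters, pairing adjacent tokens). It is not scikit-learn's implementation, and the function name is hypothetical.

```python
import re


def unigrams_and_bigrams(text):
    """Return lowercase word tokens followed by adjacent-token bigrams."""
    # Tokens of two or more word characters; single letters are dropped.
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


print(unigrams_and_bigrams("Nasty woman here"))
```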

In [7]:
comments.head()

Unnamed: 0,rev_id,comment,year,logged_in,ns,sample,split
0,2232.0,This:NEWLINE_TOKEN:One can make an analogy in ...,2002,True,article,random,train
1,4216.0,`NEWLINE_TOKENNEWLINE_TOKEN:Clarification for ...,2002,True,user,random,train
2,8953.0,Elected or Electoral? JHK,2002,False,article,random,test
3,26547.0,`This is such a fun entry. DevotchkaNEWLINE_...,2002,True,article,random,train
4,28959.0,Please relate the ozone hole to increases in c...,2002,True,article,random,test


In [8]:
annotations = pd.read_csv("toxicity_annotations.tsv", sep="\t")

Copied from documentation: <br>
Schema for toxicity_annotations.tsv <br>
Toxicity labels from several crowd-workers for each comment in toxicity_annotated_comments.tsv. It can be joined with toxicity_annotated_comments.tsv on rev_id.

rev_id: MediaWiki revision id of the edit that added the comment to a talk page (i.e. discussion). <br>
worker_id: Anonymized crowd-worker id.<br>
toxicity_score: Categorical variable ranging from very toxic (-2), to neutral (0), to very healthy (2). <br>
toxicity: Indicator variable for whether the worker thought the comment is toxic. The annotation takes on the value 1 if the worker considered the comment toxic (i.e worker gave a toxicity_score less than 0) and value 0 if the worker considered the comment neutral or healthy (i.e worker gave a toxicity_score greater or equal to 0). Takes on values in {0, 1}.

My notes:
One thing to explore: how many people rated each comment? The paper says 10, but I would like to confirm this.
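
A minimal sketch of how the raters-per-comment check could look. The toy frame below is made up for illustration and only borrows the column names from the schema. On the real data, `annotations` would take the place of `toy_annotations`.

```python
import pandas as pd

# Toy stand-in for the annotations table (values are made up for illustration).
toy_annotations = pd.DataFrame({
    "rev_id": [2232.0, 2232.0, 2232.0, 4216.0, 4216.0],
    "worker_id": [723, 4000, 3989, 500, 501],
    "toxicity": [0, 0, 0, 1, 0],
})

# Number of distinct workers who labeled each comment.
raters_per_comment = toy_annotations.groupby("rev_id")["worker_id"].nunique()
print(raters_per_comment)

# Distribution of that count: how many comments had 2 raters, 3 raters, and so on.
print(raters_per_comment.value_counts().sort_index())
```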

In [15]:
annotations.head()

Unnamed: 0,rev_id,worker_id,toxicity,toxicity_score
0,2232.0,723,0,0.0
1,2232.0,4000,0,0.0
2,2232.0,3989,0,1.0
3,2232.0,3341,0,0.0
4,2232.0,1574,0,1.0


In [10]:
toxicity_worker_demographics = pd.read_csv("toxicity_worker_demographics.tsv", sep="\t")

My comment:
We won't really be using this file until much later, if and when we decide to do a perturbation using demographic data. First I would want to check what correlations exist between gender and rating and see, for example, how different groups rate comments about women.
    

Copied from documentation:

Schema for {attack/aggression/toxicity}_worker_demographics.tsv
Demographic information about the crowdworkers. This information was obtained by an optional demographic survey administered after the labelling task. It is meant to be joined with {attack/aggression/toxicity}_annotations.tsv on worker_id. Some fields may be blank if left unanswered.

worker_id: Anonymized crowd-worker id. <br>
gender: The gender of the crowd-worker. Takes a value in {'male', 'female', and 'other'}. <br>
english_first_language: Does the crowd-worker describe English as their first language. Takes a value in {0, 1}.<br>
age_group: The age group of the crowd-worker. Takes on values in {'Under 18', '18-30', '30-45', '45-60', 'Over 60'}.<br>
education: The highest education level obtained by the crowd-worker. Takes on values in {'none', 'some', 'hs', 'bachelors', 'masters', 'doctorate', 'professional'}. Here 'none' means no schooling, some means 'some schooling', 'hs' means high school completion, and the remaining terms indicate completion of the corresponding degree type.
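
The documentation says the demographics file is meant to be joined with the annotations on worker_id, with some workers missing because the survey was optional. A minimal sketch of that join and a per-gender toxic rate, using made-up toy frames rather than the real files:

```python
import pandas as pd

# Toy stand-ins; worker 999 did not fill in the optional survey.
toy_annotations = pd.DataFrame({
    "rev_id": [1.0, 1.0, 2.0, 2.0],
    "worker_id": [85, 311, 85, 999],
    "toxicity": [1, 0, 0, 1],
})
toy_demographics = pd.DataFrame({
    "worker_id": [85, 311],
    "gender": ["female", "male"],
})

# A left merge keeps annotations from workers without survey answers;
# their demographic fields come back as NaN.
labeled = toy_annotations.merge(toy_demographics, on="worker_id", how="left")

# Share of annotations marked toxic per gender (rows with NaN gender are dropped by groupby).
toxic_rate_by_gender = labeled.groupby("gender")["toxicity"].mean()
print(toxic_rate_by_gender)
```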


In [11]:
toxicity_worker_demographics.head()

Unnamed: 0,worker_id,gender,english_first_language,age_group,education
0,85,female,0,18-30,bachelors
1,1617,female,0,45-60,bachelors
2,1394,female,0,,bachelors
3,311,male,0,30-45,bachelors
4,1980,male,0,45-60,masters


In [30]:
#  Stuff I tried that didn't work and I might want later

# This code is copy/pasted from https://github.com/conversationai/unintended-ml-bias-analysis/blob/master/unintended_ml_bias/Dataset_bias_analysis.ipynb
# grouped_ds = annotations.groupby('rev_id')['toxicity'].mean()
# df = comments
# df['toxic'] = annotations.groupby('rev_id')['toxicity'].mean() # > 0.5

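# Mean toxicity label per comment, across all workers who rated it.
grouped_annotations = annotations.groupby('rev_id',as_index=False)['toxicity'].mean()
# NOTE: DataFrame.join aligns on the index, not on rev_id. Both frames have a default
# 0..n-1 index, so rows line up only if comments is already sorted by rev_id.
joined_tox = grouped_annotations.join(comments, lsuffix='rev_id', rsuffix='rev_id', how='left', sort=True) 
# A comment counts as toxic when at least half of its raters marked it toxic.
joined_tox['binary_tox'] = np.where(joined_tox['toxicity']>=.5, 1, 0)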
# Stuff I might want later

# # remove newline and tab tokens
# comments['comment'] = comments['comment'].apply(lambda x: x.replace("NEWLINE_TOKEN", " "))
# comments['comment'] = comments['comment'].apply(lambda x: x.replace("TAB_TOKEN", " "))
# comments['length'] = comments['comment'].str.len()
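
The join in this cell matches rows by index position, which is fragile. A sketch of a key-based alternative using `merge` on rev_id, shown on made-up toy frames whose comments are deliberately out of order:

```python
import numpy as np
import pandas as pd

toy_comments = pd.DataFrame({
    "rev_id": [8953.0, 2232.0, 4216.0],  # deliberately not sorted
    "comment": ["c", "a", "b"],
})
toy_annotations = pd.DataFrame({
    "rev_id": [2232.0, 2232.0, 4216.0, 4216.0, 8953.0, 8953.0],
    "toxicity": [0, 0, 1, 1, 1, 0],
})

mean_tox = toy_annotations.groupby("rev_id", as_index=False)["toxicity"].mean()

# Key-based merge: rows are matched on rev_id regardless of row order.
# validate raises if rev_id is duplicated on either side.
merged = mean_tox.merge(toy_comments, on="rev_id", how="left", validate="one_to_one")
merged["binary_tox"] = np.where(merged["toxicity"] >= 0.5, 1, 0)
print(merged)
```

This also leaves a single rev_id column instead of two suffixed copies.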

In [38]:
joined_tox['toxicity'].describe()

count    159686.000000
mean          0.145049
std           0.253866
min           0.000000
25%           0.000000
50%           0.000000
75%           0.200000
max           1.000000
Name: toxicity, dtype: float64

In [40]:
joined_tox['binary_tox'].value_counts()

0    141289
1     18397
Name: binary_tox, dtype: int64

In [41]:
test_comments = joined_tox.query("split == 'test' ")
train_comments = joined_tox.query("split == 'train' ")

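# Unigram and bigram counts (10,000 most frequent features), TF-IDF weighted,
# fed to a logistic regression.
clf = Pipeline([
    ('vect', CountVectorizer(max_features = 10000, ngram_range = (1,2))),
    ('tfidf', TfidfTransformer(norm = 'l2')),
    ('clf', LogisticRegression()),
])
clf = clf.fit(train_comments['comment'], train_comments['binary_tox'])
# Score the test split with the predicted probability of the toxic class.
auc = roc_auc_score(test_comments['binary_tox'], clf.predict_proba(test_comments['comment'])[:, 1])
print('Test ROC AUC: %.3f' % auc)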

Test ROC AUC: 0.951


In [53]:
# assign() returns a new DataFrame; writing a column straight into the result
# of query() triggers pandas' SettingWithCopyWarning.
test_comments = test_comments.assign(predicted=clf.predict(test_comments['comment']))


In [75]:
test_comments.loc[test_comments['predicted'] == 1, 'comment'].loc[1649]

"Wait a minute.  I looked at your background and I see that you are French.  I sorry.  What's it like living in a former world power that is nothing but an impotent, second-rate country with an inferiority complex???  Wait don't tell me, from your comment I already know.  It tees you off that no one and I mean no one pays any damn attention to you any more, so much so that you have to buy Saddam's oil off of the black market and invade third world countries like the Ivory Coast and then lecture Americans.  But Americans do care!!!!  We just laugh at you.  Have Frog Day!!! NEWLINE_TOKENNEWLINE_TOKEN"

In [76]:
test_comments[test_comments['predicted'] == 1]

Unnamed: 0,rev_idrev_id,toxicity,rev_idrev_id.1,comment,year,logged_in,ns,sample,split,binary_tox,predicted
829,5815399.0,0.7,5815399.0,NEWLINE_TOKENNEWLINE_TOKENIt was anonymous on ...,2004,True,article,random,test,1,1
1227,8357736.0,0.3,8357736.0,NEWLINE_TOKENNEWLINE_TOKENYou are a child.,2004,True,user,blocked,test,0,1
1228,8359431.0,0.5,8359431.0,`NEWLINE_TOKENNEWLINE_TOKEN::You are not worth...,2004,True,user,blocked,test,1,1
1348,8845700.0,1.0,8845700.0,NEWLINE_TOKENNEWLINE_TOKENi am using the sandb...,2004,False,user,blocked,test,1,1
1521,9664203.0,0.8,9664203.0,NEWLINE_TOKENNEWLINE_TOKEN== whoa ==NEWLINE_TO...,2005,False,user,blocked,test,1,1
1527,9679297.0,1.0,9679297.0,NEWLINE_TOKENNEWLINE_TOKEN== FUCK YOU THUE ==N...,2005,True,user,blocked,test,1,1
1627,10144987.0,0.8,10144987.0,"NEWLINE_TOKENNEWLINE_TOKEN:: First of all, who...",2005,True,user,blocked,test,1,1
1649,10255042.0,0.4,10255042.0,Wait a minute. I looked at your background an...,2005,True,article,blocked,test,0,1
1985,11720728.0,0.6,11720728.0,`NEWLINE_TOKENNEWLINE_TOKEN== Personal attacks...,2005,True,article,blocked,test,1,1
2173,12566441.0,0.5,12566441.0,NEWLINE_TOKENNEWLINE_TOKENbtw there needs to b...,2005,False,article,blocked,test,1,1


In [42]:
clf.predict(['Thanks for you contribution, you did a great job!'])

array([0])

In [43]:
clf.predict(['People as stupid as you should not edit Wikipedia!'])

array([1])

In [45]:
clf.predict(['I am a proud gay'])

array([1])

In [20]:
# test_comm.head()

In [21]:
# train_comm.head()

In [46]:
TEST_TERMS = ['tall', 'wikipedia', 'teacher', 'music', 'box',
              'lesbian', 'gay', 'bisexual', 'transgender', 'queer',
              'homosexual', 'heterosexual', 'straight',
              'muslim', 'jewish', 'jew', 'christian',
              'feminist', 'black', 'white']
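
One way these terms could be used is to drop each one into a few fixed sentence templates and compare the model's average score per term. The sketch below assumes hypothetical templates and a hypothetical `score_terms` helper. The toy scorer is only there so the example runs by itself and is not the trained model.

```python
from statistics import mean

# Hypothetical templates; any neutral sentence frames would do.
TEMPLATES = ["I am {term}", "Being {term} is fine", "{term} people edit Wikipedia"]


def score_terms(score_fn, terms, templates=TEMPLATES):
    """Return {term: mean score over all templates}.

    score_fn takes a list of sentences and returns one float per sentence.
    """
    results = {}
    for term in terms:
        sentences = [t.format(term=term) for t in templates]
        results[term] = mean(score_fn(sentences))
    return results


# Stand-in scorer: flags any sentence containing the word "gay".
def toy_score_fn(sentences):
    return [1.0 if "gay" in s.split() else 0.0 for s in sentences]


term_scores = score_terms(toy_score_fn, ["tall", "gay"])
print(term_scores)
```

With the fitted pipeline, `score_fn` could be something like `lambda s: clf.predict_proba(s)[:, 1].tolist()`, with TEST_TERMS passed as `terms`.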

In [None]:
# true_label = []
# temp_true = 0
# for i in TEST_TERMS:
#     for row in joined_tox:
#         if i in row['comment'] and binary_tox == 1:
#             temp_true += 1
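
# Note on the draft loop above: iterating over a DataFrame yields column names,
# not rows, so it would not work as written. A sketch of what I think it was
# aiming for (counting toxic-labeled comments that mention each term) on a
# made-up toy frame; the helper name is hypothetical.

```python
import re

import pandas as pd

toy = pd.DataFrame({
    "comment": ["I am gay", "gay people are awful", "the box is tall", "Tall tale"],
    "binary_tox": [0, 1, 0, 0],
})


def term_label_counts(df, terms):
    """Per term: number of comments mentioning it, and how many of those are labeled toxic."""
    rows = []
    for term in terms:
        # Word boundaries so that e.g. "tall" does not match "install".
        pattern = r"\b" + re.escape(term) + r"\b"
        has_term = df["comment"].str.contains(pattern, case=False, regex=True, na=False)
        rows.append({
            "term": term,
            "n_comments": int(has_term.sum()),
            "n_toxic": int((has_term & (df["binary_tox"] == 1)).sum()),
        })
    return pd.DataFrame(rows)


counts = term_label_counts(toy, ["gay", "tall"])
print(counts)
```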
    

In [49]:
madlibs_terms = model_bias_analysis.read_identity_terms('adjectives_people.txt')

In [None]:
xx = model_bias_analysis.per_subgroup_negative_rates(joined_tox, madlibs_terms, )
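
I have not confirmed the arguments that `per_subgroup_negative_rates` expects, so the call above is left as is. As a rough, hand-rolled illustration of the idea (not the model_bias_analysis implementation), the sketch below computes false positive and false negative rates among comments that mention each term. The function name, column names and toy data are all assumptions.

```python
import pandas as pd


def subgroup_error_rates(df, terms, label_col="label", pred_col="predicted", text_col="comment"):
    """False positive and false negative rate among comments mentioning each term."""
    rows = []
    for term in terms:
        # Plain case-insensitive substring match; no word boundaries.
        sub = df[df[text_col].str.contains(term, case=False, regex=False, na=False)]
        negatives = sub[sub[label_col] == 0]
        positives = sub[sub[label_col] == 1]
        rows.append({
            "subgroup": term,
            "n": len(sub),
            # Mean of a boolean Series is the fraction of True; NaN if the group is empty.
            "fpr": (negatives[pred_col] == 1).mean(),
            "fnr": (positives[pred_col] == 0).mean(),
        })
    return pd.DataFrame(rows)


toy = pd.DataFrame({
    "comment": ["I am gay", "gay and proud", "gay idiots", "tall idiot", "tall tree"],
    "label": [0, 0, 1, 1, 0],
    "predicted": [1, 0, 1, 0, 0],
})
rates = subgroup_error_rates(toy, ["gay", "tall"]).set_index("subgroup")
print(rates)
```

On the notebook's data the label column would be `binary_tox`, and the predictions would need to be added as a column first.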