This notebook builds the dataset that we use to train an ILM model to explain positive predictions of a toxicity classifier. The datasets we chose all target toxic/abusive language detection, have closely related task definitions, and come from a variety of sources.

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
pd.set_option('display.max_colwidth', None)
#import preprocessor
import pickle
import wordsegment as ws
from html import unescape
import re
import string
ws.load() # load vocab for word segmentation

random_seed = 42

# Cleaning functions from hatecheck-experiments
# Define helper function for segmenting hashtags found through regex
def regex_match_segmentation(match):
    return ' '.join(ws.segment(match.group(0)))

# Define function for cleaning text
def clean_text(text):
    
    # convert HTML codes
    text = unescape(text)
    
    # lowercase text
    text = text.lower()
    
    # replace mentions (Twitter @user, Reddit u/user) and URLs with special tokens; emojis are left as they are
    text = re.sub(r"@[A-Za-z0-9_-]+",'[USER]',text)
    text = re.sub(r"u/[A-Za-z0-9_-]+",'[USER]',text)
    text = re.sub(r"http\S+",'[URL]',text)
    
    # find and split hashtags into words
    text = re.sub(r"#[A-Za-z0-9]+", regex_match_segmentation, text)

    # remove punctuation at beginning of string (quirk in Davidson data)
    text = text.lstrip("!")
    text = text.lstrip(":")
    
    # remove newline and tab characters
    text = text.replace('\n',' ')
    text = text.replace('\t',' ')
    text = text.replace('[linebreak]', ' ')
    
    return text
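
As a quick sanity check of the cleaning order above, here is a minimal, dependency-free sketch. It is not the notebook's function: `clean_text_sketch` and the example string are hypothetical, and the hashtag segmentation step is left out so that `wordsegment` is not required.

```python
import re
from html import unescape

def clean_text_sketch(text):
    # Same order as above: HTML entities, lowercase, placeholders, strip, whitespace.
    text = unescape(text)
    text = text.lower()
    text = re.sub(r"@[A-Za-z0-9_-]+", "[USER]", text)   # Twitter-style mentions
    text = re.sub(r"u/[A-Za-z0-9_-]+", "[USER]", text)  # Reddit-style handles
    text = re.sub(r"http\S+", "[URL]", text)
    text = text.lstrip("!").lstrip(":")
    for token in ("\n", "\t", "[linebreak]"):
        text = text.replace(token, " ")
    return text

# Made-up input that exercises each step.
example = "!!! RT @Some_User: check https://example.com &amp; u/other_user\tnow [linebreak] ok"
cleaned = clean_text_sketch(example)
print(repr(cleaned))
```

Because the placeholders are inserted after lowercasing, `[USER]` and `[URL]` stay uppercase in the output, which keeps them distinct from ordinary lowercased words.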

## Founta

The first dataset we consider is from [Founta et al. 2018](https://arxiv.org/pdf/1802.00393.pdf), which is a dataset sampled from Twitter. We split this into train, valid and test sets here, and only use the neutral tweets in the train split to train the ILM. We will use the same splits later when training a BERT classifier. 

In [4]:
#df_texts = pd.read_csv("../Founta/hatespeech_text_label_vote.csv",names=['text', 'label', 'count_label_votes'], delimiter='\t')
#df_texts.drop_duplicates(subset='text', inplace=True)
#founta_train, founta_valtest = train_test_split(df_texts, test_size=0.2, stratify=df_texts.label, random_state=123)
#founta_val, founta_test = train_test_split(founta_valtest, test_size=0.5, stratify=founta_valtest.label, random_state=123)
#founta_train_neutral = founta_train[founta_train['label'] == 'normal']

#founta_train.to_csv("Data/Founta/train.csv")
#founta_val.to_csv("Data/Founta/valid.csv")
#founta_test.to_csv("Data/Founta/test.csv")

#founta_train_neutral[:10]

## CAD

Next, we get the neutral posts from the CAD dataset, which was introduced in [Vidgen et al. 2021](https://aclanthology.org/2021.naacl-main.182.pdf) and can be obtained from [here](https://zenodo.org/record/4881008#.YnvpkvPMK3I). This dataset is sourced from Reddit, and posts are annotated within their context using hierarchical labels. For our task, we keep only the posts with the Neutral label. 

In [5]:
cad_train = pd.read_csv("Data/data_ilm/cad_naacl2021/cad_v1_1_train.tsv", sep="\t")
cad_train_neutral = cad_train[cad_train.labels == 'Neutral']
cad_train_neutral[:3]

Unnamed: 0,id,text,labels
0,an4gkh-post,"I just got laid off. I don't even know what to think, feel, say, do. I've worked for this company for over a year and I got a promotion a few months back and for the first time in a long time i was feeling amazing, not just with my job, but my mental health was the best it has been in a long time and for it to just.. disappear and get that rug pulled from under me so suddenly is just... i dont know. The only word that explains what i feel this exact second is just ""Numbing"". I just feel numb with this whole situation. I already cried, felt sorry for myself, vented with my SO. Now I got home, and im just sitting in my couch writing this and I just feel. Numb. [linebreak] [linebreak] Sorry for the grammar. Im not thinking, just, venting it out to get it out of my system.",Neutral
1,anmla4-post,"My best friend, who I grew up with and had a HUGE crush on, and I began talking again after almost 10 years. Back in the day it was the fantasy childhood sweet hearts. We were two peas in a pod and loved being around eachother. Well fast forward a few years we started being intimate. Our family eventually found out and she left me and I left home. Now 10 years later she reached out to me out of the blue and we started talking. Side note, 2018 as a whole was the worst year of my life. When I say the whole year, every month, every day was emotionally and physically taxing to a degree where my hair is literally thining and falling out I mean it(my hair used to be thick and healthy). [linebreak] [linebreak] So we start talking and it feels as though no time has passed at all. Its great! I cant believe she is talking to me again. But in my gut I know its just a rebound text(my mom told me she recent broke up with a 2yr relationship). Well she isnt doing to well either and she tries to kill herself so I call out of work for a whole week and I am across the whole country the next day to see her in the hospital. She is finally released and we have an amazing week together! Doing everything, always laughing and just enjoying eachothers company. But theres an issue, she constantly lied to me, about stupid stuff. She is a terrible liar but I never called her out. Even her mom told me she was sleeping with multiple people and when I asked she lied about her previous relationship even though she brought that up. There were so many red flags that I ignored, so many times she made herself to he victim to everything and I knew was trying to get me to feel sorry for her, and I ignored them all because I wanted to support her. On the last night we are snuggled up in her bed as we did everynight and she keeps pulling me on top of her to just ""get closer"" as she says (yeah okay). 
We start kissing and I ask over and over if its what she wants (i dont want to have sex but i wanted her to be happy and she seemed to really want to) and you can guess what happened. Well I drive home that night and cry all the way home because I know I was just used. [linebreak] [linebreak] Over the next couple weeks she slowly talks to me less and less and barely wants to see me after I move back home after almost 10 years away. I finally call her and confront her and she tells me nothing has changed and hasnt talked to me since. [linebreak] [linebreak] I am so disgusted with myself for being so easy and ignoring all the signs that she was manipualting me. I cant even look at her because she isnt even the same person she used to be and it hurts so much. Not a single apology or an attempt to say anything from her. I dont know why she used me and I am so broken up about it. I told her all the shit that went down even about me losing my daughter that year (2018) and she still did this. Idk what to do. I want peace but she wont talk to me anymore. [linebreak] [linebreak] P.S. There were so many more flags and signs i ignored from family and friends but this post is long enough. [linebreak] [linebreak] TL;DR: My best friend/sweet heart from child hood called me up after 10 years. She was struggling so I went to help her out. My time with her was great but there were alot of red flags I ignored. I didnt want to sleep with her but she kept pushing so I did then she started to distance herself from me. I feel so used and disgusted with myself and I dont know why she did it.",Neutral
2,aobe00-post,"Today wasn't terrible. I suppose that would hinge on your definition of terrible. At any rate, I happened upon an article. Ronda Rousey is helping set up a suicide prevention center or something. Something like that. As I read through the article, I came to learn that her father and grandfather committed suicide. [linebreak] [linebreak] I appreciate the kind thought and I don't believe you meant to get me going, but you got me going. [linebreak] [linebreak] How stupid do you think I am? Do you think I don't fucking realize how goddamn horrible suicide is? Do you think that I'm some naive apelike simpleton that needs to be sent to a mental hospital for therapy, medication and other costly 'treatments'? [linebreak] [linebreak] It's nice to know that you care about other suicidal people, but for me, I don't take too kindly to being told that I don't have the right to commit suicide. I don't take kindly to platitudes and appeals to my emotions when I am already fully cognizant of the horrors of suicide. [linebreak] [linebreak] Why is it that when I get to be a little too lonely and desperate, the *first* thing you're going to tell me is ""Get help.""? What do you think this help is going to give me that I haven't already experienced? It'll give me more goddamn bills and I'll get somebody on the phone asking me if I want to sign up for cognitive behavioral therapy. [linebreak] [linebreak] I know how to fight this. The reason that I'm sitting here typing this is all the proof I need. [linebreak] [linebreak] When I see these anti-suicide things, I just feel so irked. The only time you give a fuck about me is when I'm in a suicidal mindset. Beside that, I'm just a blank face. A mere statistic. [linebreak] [linebreak] Believe me. Seriously. Believe me. I'm not yanking your chain. I know how bad suicide is. You don't need to remind me of that. [linebreak] [linebreak] I can't stop you from trying to prevent suicide, but I can state my opinion and so, I have. 
[linebreak] [linebreak]",Neutral


## Wikipedia Toxicity

The next dataset we use is the Wikipedia Toxicity dataset from [Wulczyn et al. 2017](https://arxiv.org/abs/1610.08914), which can be downloaded [here](https://figshare.com/articles/dataset/Wikipedia_Talk_Labels_Toxicity/4563973). As shown in [Nejadgholi and Kiritchenko 2020](https://aclanthology.org/2020.alw-1.20.pdf), the neutral class of this dataset is dominated by Wikipedia-specific topics such as edits and formatting. We use the topic clusters found in that work to remove these domain-specific instances from the training set before sampling: only comments in topic categories 1 and 2 are kept, and category 3 is dropped.
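
The labelling and pruning logic can be sketched on toy data as follows. This is only an illustration under stated assumptions: every value is made up, and `kept_topics` is a hypothetical subset of topic ids, not the grouping used in this notebook.

```python
import pandas as pd

# Toy stand-ins for the annotation and comment tables (all values are made up).
annotations = pd.DataFrame({
    "rev_id": [1, 1, 1, 2, 2, 3, 3],
    "toxicity": [1, 1, 0, 0, 0, 0, 1],
})
comments = pd.DataFrame(
    {"comment": ["a", "b", "c"], "split": ["train", "train", "test"], "wiki_topic": [0, 1, 2]},
    index=pd.Index([1, 2, 3], name="rev_id"),
)

# A comment counts as toxic when more than half of its annotators marked it toxic.
comments["toxicity"] = annotations.groupby("rev_id")["toxicity"].mean() > 0.5

kept_topics = [0, 1, 2, 7]  # hypothetical subset of topic ids to keep

# Build one mask over the full frame so every condition shares the same index.
mask = (comments["split"] == "train") & comments["wiki_topic"].isin(kept_topics)
train_neutral = comments[mask & ~comments["toxicity"]]
print(train_neutral)
```

Combining the conditions into a single mask before indexing avoids filtering an already-filtered frame with a mask computed on the full one.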

In [6]:
comments = pd.read_csv('Data/data_ilm/cross_dataset_toxicity/toxicity_annotated_comments.tsv', sep = '\t', index_col = 0)  #from https://figshare.com/articles/dataset/Wikipedia_Talk_Labels_Toxicity/4563973
annotations = pd.read_csv('Data/data_ilm/cross_dataset_toxicity/toxicity_annotations.tsv',  sep = '\t')
# join labels and comments
comments['toxicity'] = annotations.groupby('rev_id')['toxicity'].mean() > 0.5

# remove newline and tab tokens
comments['comment'] = comments['comment'].apply(lambda x: x.replace("NEWLINE_TOKEN", " "))
comments['comment'] = comments['comment'].apply(lambda x: x.replace("TAB_TOKEN", " "))

wiki_topics = pd.read_csv('Data/data_ilm/cross_dataset_toxicity/wiki_toxicity_topics.csv', index_col=[0]) #from this repo

data = comments.merge(wiki_topics, on='rev_id')  #merge the two datasets

#pruned Wiki-toxic 
topic_categories={1:[0,1],
                  2:[2,7,8,9,12,14,16],
                  3:[3,4,5,6,10,11,13,15,17,18,19]}


# combine both conditions into one mask; chaining data[...][...] makes pandas reindex the second mask and emit a UserWarning
train_mask = (data['split'] == 'train') & data['wiki_topic'].isin(topic_categories[1] + topic_categories[2])
toxic_train_pruned = data[train_mask]
wiki_train_neutral = toxic_train_pruned[toxic_train_pruned.toxicity == False]


In [7]:
wiki_train_neutral[:3]

Unnamed: 0,rev_id,comment,year,logged_in,ns,sample,split,toxicity,wiki_topic,wiki_topic_prob
3,26547.0,"`This is such a fun entry. Devotchka I once had a coworker from Korea and not only couldn't she tell the difference between USA-English and British English, she had trouble telling the difference between different European languages. (Kind of keeps things in perspective, eh?) -) :Not suprising. While I can easily tell the difference between French, German, Italian, Spanish, Dutch, etc., put me in a room with a Chinese, Japanese, Korean, Vietnamese and a Thai speaker and I probably couldn't tell the difference. (If I saw it written I'd probably have somewhat more luck though.) SJK Vietnamese has more syllable-final consonants than Japanese, I think you can tell them apart that way, maybe. Is this right? - Juuitchan Someone suggested: ``Heath Robinson`` and ``Rube Goldberg`` as a vocabulary difference. It's certainly an interesting parallel, but I don't think it really belongs here. They were both artists with their own style, and both are known on both sides of the pond although their use as descriptive adjectives is split as suggested. At any rate, they can't quite be considered translations, because as an adjective, ``Rube Goldberg`` is more specific, describing an overly complex mechanical device or a complex series of interdependent actions; Heath Robinson, in contrast, is more surrealistic or fantasy-oriented. LDC As an American, I would like to say that to me a bum is a homeless person as much as the butt, a flat is an apartment, and rubbish certainly is trash. Granted, I agree that a fag is not a cigarette, and underground is not a subway. I may do some actual research, and come back and fiddle with that list. - Eean. :I think Americans certainly understand the use of ``bum`` for ``butt``, ``rubbish`` for ``trash``, and (to a lesser degree) ``flat`` for ``apartment``. But we don't use those terms much. Point to a container for discarded things, and an American will say ``that's a trash can``; a Brit will say ``that's a rubbish bin``. 
Americans are more likely to use ``rubbish`` in the sense of ``bullshit``. LDC I deleted the following pair: ``limited (Ltd)`` and ``incorporated``, since they actually mean different things. ``Incorporated`` means a corporation; ``limited`` means a limited liability corporation (you can also have unlimited liability corporations, and no liability corporations). British (and Australian also) Ltd is roughly equivalent to American LLC. SJK I would say 'torch' was much more common than 'pocket lamp' which sounds quite old-fashioned. 'Flashlight' would be more easily recognised than the latter. Yes, I'd call it a ``torch``, and it would probably be labeled as a ``flashlight`` in its manufacturer's packaging. IMO, 'torch' is colloquial British English The Anome Oh, so ``flashlight`` is correct British usage? (My dictionary said [Am.] and the Oxford English Dictionary carried ``flashlight`` only in the meaning of photography.) Then I'll remove the entry again. AxelBoldt `",2002,True,article,random,train,False,7,0.314508
12,91460.0,"`The actual idea behind time-out is to get the parent to cool-off. They are the real problem in a confrontation. It's rare that children need to ``cool off``. The theory behind adult time-outs is that you deprive the child of your attention. Of course, in our electronic gadget society where children hardly ever see their parents anyways, that accomplishes nothing. So as a replacement for spending time with one's child and paying attention to them most of the time, someone invented the ``child time-out`` as a form of punishment instead. This is not an acceptable trade-off and that's why child time-outs are bad. They're still better than physical abuse of course. Ark`",2002,True,article,random,train,False,1,0.320639
31,198261.0,I think it is 1861. James Clerk Maxwell used a color separation method to take three b/w photos through red green and blue filters. Examples of photos using this technique by Prokudin-Gorskii can be seen at http://www.loc.gov/exhibits/empire/,2002,True,article,random,train,False,2,0.274429


## Civil Comments

Next, we get the Civil Comments dataset from [Kaggle](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data). This dataset consists of comments made on a number of news platforms in the years 2015-2017 and later annotated by Jigsaw. As neutral comments, we pick those whose `target` score is 0 (implemented below as `target < 0.0001`). 

In [8]:
civil_comments_train = pd.read_csv('Data/data_ilm/civil_comments/train.csv')
civil_comments_neutral = civil_comments_train[(civil_comments_train['target'] < 0.0001)]

In [9]:
civil_comments_neutral[:3]

Unnamed: 0,id,target,comment_text,severe_toxicity,obscene,identity_attack,insult,threat,asian,atheist,...,article_id,rating,funny,wow,sad,likes,disagree,sexual_explicit,identity_annotator_count,toxicity_annotator_count
0,59848,0.0,"This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!",0.0,0.0,0.0,0.0,0.0,,,...,2006,rejected,0,0,0.0,0.0,0.0,0.0,0.0,4.0
1,59849,0.0,"Thank you!! This would make my life a lot less anxiety-inducing. Keep it up, and don't let anyone get in your way!",0.0,0.0,0.0,0.0,0.0,,,...,2006,rejected,0,0,0.0,0.0,0.0,0.0,0.0,4.0
2,59852,0.0,This is such an urgent design problem; kudos to you for taking it on. Very impressive!,0.0,0.0,0.0,0.0,0.0,,,...,2006,rejected,0,0,0.0,0.0,0.0,0.0,0.0,4.0


## Putting it all together

In [10]:
# comparing the sizes of different datasets
#len(founta_train_neutral)

In [11]:
cad_train_neutral.shape[0]

11073

In [12]:
wiki_train_neutral.shape[0]

36121

In [13]:
civil_comments_neutral.shape[0]

105303

In [14]:
# sample 30K comments from civil_comments, and take others as is. 
civil_comments_sampled = civil_comments_neutral.sample(n=30000, random_state=random_seed)
civil_comments_sampled.shape

(30000, 45)

In [15]:
civil_comments_sampled['comment_text'] = civil_comments_sampled['comment_text']

In [16]:
#founta_texts = [clean_text(tt) for tt in founta_train_neutral['text'].tolist()]
cad_texts = [clean_text(tt) for tt in cad_train_neutral['text'].tolist() if isinstance(tt, str)]
wiki_texts = [clean_text(tt) for tt in wiki_train_neutral['comment'].tolist()]
civil_texts = [clean_text(tt) for tt in civil_comments_sampled['comment_text'].tolist()]

In [17]:
civil_texts

["the governor and the alaska's legislature needs to pass a bill to stop this kind of insanity so it never happens again!",
 "that's special.  how much in public services do they consume?",
 'they should have left this trail the way it was in the late 90\'s, rooty, muddy, narrow and somewhat overgrown. that limited the amount of people using it. the "improved" smoothed out gravel path they created only served to attract more people to the trail thus more conflicts with bears. the best thing to do is avoid the trail while the fish are running.',
 'awesome',
 'thanks for the letter, and for sharing the lies from the city.  the city staff and city manager are like a toilet that is in dire need of a thorough flushing.     i\'d like to know the city planner who made such an arrogant statement, "it’s unfortunate this has been such a bad experience for you.”  they should be the first one  to go down the vortex!',
 "it's clear that these two former commissioners are firmly in the tank for 

We split the cleaned texts of each source again, this time into train and valid sets for the ILM training (5% of each source is held out for validation).
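
To make the per-source split concrete, a minimal dependency-free sketch follows. It stands in for `train_test_split` with a seeded shuffle and a slice, so the exact partition will differ from scikit-learn's; `split_texts` and the toy lists are hypothetical.

```python
from random import Random

def split_texts(texts, valid_fraction=0.05, seed=42):
    """Shuffle a copy of `texts` with a fixed seed and hold out a validation slice."""
    shuffled = list(texts)
    Random(seed).shuffle(shuffled)
    n_valid = max(1, round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]

# Made-up texts standing in for the three cleaned sources.
sources = {
    "cad": [f"cad post {i}" for i in range(40)],
    "wiki": [f"wiki comment {i}" for i in range(60)],
    "civil": [f"civil comment {i}" for i in range(100)],
}

compound_train, compound_valid = [], []
for offset, (name, texts) in enumerate(sources.items(), start=2):
    # Each source is split from its own list with its own seed, then concatenated.
    train, valid = split_texts(texts, seed=42 + offset)
    compound_train += train
    compound_valid += valid

print(len(compound_train), len(compound_valid))
```

Splitting each source separately before concatenating keeps every source represented in the validation set in proportion to its size.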

In [18]:
from sklearn.model_selection import train_test_split
from random import Random

#founta_train, founta_valid = train_test_split(founta_texts, test_size=0.05, random_state=random_seed+1)
cad_train, cad_valid = train_test_split(cad_texts, test_size=0.05, random_state=random_seed+2)
wiki_train, wiki_valid = train_test_split(wiki_texts, test_size=0.05, random_state=random_seed+3)
civil_train, civil_valid = train_test_split(civil_texts, test_size=0.05, random_state=random_seed+4)  # split civil_texts, not wiki_texts

In [19]:
#compound_train = founta_train + cad_train + wiki_train + civil_train
#compound_valid = founta_valid + cad_valid + wiki_valid + civil_valid
compound_train = cad_train + wiki_train + civil_train
compound_valid = cad_valid + wiki_valid + civil_valid
#Random(random_seed+5).shuffle(compound_train)
#Random(random_seed+6).shuffle(compound_valid)

In [20]:
compound_train

['see this is good drama. not some energy drink bullshit.',
 'telling someone they are too selfish to commit suicide isn\'t "support".',
 "i warn you, it's as big as the first 6 hp books put together.",
 '1000 is starting out, you go up or down when you lose',
 'jake',
 'occasionally stemm gets used when people want to add medicine in.',
 'smollet should be in jail, but i don\'t see how the city could win this lawsuit.  on paper, he was charged for something, and the charges were dropped. i cant see how it would be legal for the city to charge back investigation costs for investigating a crime, which the legal system has deemed him "innocent" of, or at least innocent enough to not charge.',
 'bipoc? as in bisexual people of color, or...?',
 "it's not their fault they managed to attract all the boomers. if you ignore all the boomer memes, the right's still way ahead of the left.",
 'that dude is definitely not military either',
 'hey, can you sell me a schedule i controlled substance? d

In [21]:
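# write one document per block, separated by three newlines; fix the encoding since the texts contain non-ASCII characters
with open("Data/data_ilm/compound_dataset/train.txt", "w", encoding="utf-8") as ff:
    ff.write("\n\n\n".join(compound_train))
    
with open("Data/data_ilm/compound_dataset/valid.txt", "w", encoding="utf-8") as ff:
    ff.write("\n\n\n".join(compound_valid))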
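
The three-newline separator is unambiguous here because `clean_text` replaces every newline inside a document with a space. A minimal round-trip check, using a temporary file and made-up documents rather than the real output paths, might look like this:

```python
import os
import tempfile

# Made-up documents; like the cleaned texts, none of them contains a newline.
docs = ["first document", "second document with no line breaks", "third"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.txt")
    with open(path, "w", encoding="utf-8") as ff:
        ff.write("\n\n\n".join(docs))
    # Reading the file back and splitting on the separator recovers the documents.
    with open(path, encoding="utf-8") as ff:
        recovered = ff.read().split("\n\n\n")

print(recovered == docs)
```

The same split applied to the real `train.txt` should yield as many items as `compound_train` has entries, which is a cheap way to confirm that no document was merged or cut.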