# Experiment with a dictionary-based model

The idea behind this model is very simple. I will build a dictionary of toxic words and their replacements in the format `{<toxic_word>: <neutral_word>}`. I would like the replacements to preserve the context of the words, but I am not sure how well this will work. To build the dictionary, I filter the given data to keep only the pairs of texts that are most similar in terms of cosine similarity and length. Additionally, before adding a pair to the dictionary, I double-check both words with a simple toxicity classifier from the HuggingFace hub, namely [`s-nlp/roberta_toxicity_classifier`](https://huggingface.co/s-nlp/roberta_toxicity_classifier). The classifier was developed by the "s-nlp" team (referenced in the assignment description) and is a fine-tuned Transformer-based model (RoBERTa), an architecture that achieved state-of-the-art results at one point.

### Load data

In [38]:
import pandas as pd


# The data should be downloaded, use 1.0-download-raw-data.ipynb notebook
raw_data_path = '../data/raw/filtered.tsv'
df = pd.read_csv(raw_data_path, sep='\t', index_col=0)
df.head(15)

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
0,"If Alkar is flooding her with psychic waste, t...","if Alkar floods her with her mental waste, it ...",0.785171,0.010309,0.014195,0.981983
1,Now you're getting nasty.,you're becoming disgusting.,0.749687,0.071429,0.065473,0.999039
2,"Well, we could spare your life, for one.","well, we can spare your life.",0.919051,0.268293,0.213313,0.985068
3,"Ah! Monkey, you've got to snap out of it.","monkey, you have to wake up.",0.664333,0.309524,0.053362,0.994215
4,I've got orders to put her down.,I have orders to kill her.,0.726639,0.181818,0.009402,0.999348
5,I'm not gonna have a child... ...with the same...,I'm not going to breed kids with a genetic dis...,0.703185,0.206522,0.950956,0.035846
6,"They're all laughing at us, so we'll kick your...",they're laughing at us. We'll show you.,0.618866,0.230769,0.999492,0.000131
7,Maine was very short on black people back then.,there wasn't much black in Maine then.,0.720482,0.1875,0.96368,0.14871
8,"Briggs, what the hell's happening?","Briggs, what the hell is going on?",0.920373,0.0,0.159096,0.841071
9,"Another one simply had no clue what to do, so ...","another simply didn't know what to do, so when...",0.87754,0.101695,0.055371,0.930472


### Explore number of samples filtering by length difference and similarity

In [39]:
df[df.lenght_diff == 0.0]

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
8,"Briggs, what the hell's happening?","Briggs, what the hell is going on?",0.920373,0.0,0.159096,0.841071
16,"I'm famous, and you're done.","I'm famous, and you're dead.",0.817253,0.0,0.000926,0.979738
19,Murder for hire.,murder to order.,0.697667,0.0,0.074589,0.962326
47,"As long as you work for these goombahs, you're...","as long as you work for these patrons, you're ...",0.770949,0.0,0.018583,0.998543
88,What the heck was that all about?,what the fuck was that all about?,0.915779,0.0,0.027907,0.998830
...,...,...,...,...,...,...
577667,And quit shooting up the lawn sign.,and stop shooting at the damn sign.,0.723342,0.0,0.013603,0.999410
577692,"I was just telling Nina about the ""hooked and ...","I was just telling Nina about the ""caught and ...",0.823564,0.0,0.000053,0.991810
577712,You think Hal's banging her?,you think Hal's fucking her?,0.908113,0.0,0.000241,0.998315
577738,We're all doommed.,we are all cursed!,0.611297,0.0,0.001918,0.844721


In [40]:
df[df.similarity > 0.94]

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
161,"Calm down, the more you cry out, the faster yo...","calm down, the more you scream, the faster you...",0.943573,0.068966,0.038697,0.588769
203,But this is absurd!,but this is ridiculous!,0.940884,0.166667,0.004456,0.981858
240,The greedy and the selfish.,greedy and selfish.,0.948259,0.285714,0.007085,0.990820
343,A Revolutionary would already put a bullet in ...,a revolutionary would put a bullet in my head ...,0.944253,0.125000,0.174924,0.966754
376,What were we supposed to do?,what the hell were we supposed to do?,0.944460,0.236842,0.000045,0.723566
...,...,...,...,...,...,...
577493,They're the rats of the sea.,it's the rats of the sea.,0.949521,0.103448,0.996255,0.021550
577522,"If bin Laden would be caught or killed right now,","if Bin Laden was caught or killed now,",0.948106,0.220000,0.643583,0.011261
577560,"It's silly of me, but I'm afraid of them... an...","it's stupid of me, but I'm scared of them........",0.946580,0.087379,0.088424,0.994271
577687,Have you ever... seen breasts?,have you ever seen boobs?,0.945825,0.161290,0.165173,0.941499


In [41]:
df[(df.lenght_diff < 0.1) & (df.similarity > 0.9)]

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
8,"Briggs, what the hell's happening?","Briggs, what the hell is going on?",0.920373,0.000000,0.159096,0.841071
23,I think you are the strangest man I've ever met.,I think you're the weirdest person I've ever met.,0.934353,0.020000,0.003785,0.962527
39,The reward on his head is substantial.,there is considerable reward for his head.,0.908223,0.093023,0.035881,0.910780
43,"I swear to God, the best thing I ever did in m...","I swear to God, the best thing I've ever done ...",0.932305,0.022472,0.999071,0.000900
88,What the heck was that all about?,what the fuck was that all about?,0.915779,0.000000,0.027907,0.998830
...,...,...,...,...,...,...
577576,"Your car is going to flip, shattering your spi...","your car's gonna flip, you'll crush the spinal...",0.915279,0.053571,0.766299,0.117041
577663,Excuse me! Clown?,"excuse me, clown?",0.939452,0.000000,0.009608,0.993706
577703,You're a vulture Corso!,"you're a vulture, Corso.",0.900430,0.040000,0.171164,0.920206
577712,You think Hal's banging her?,you think Hal's fucking her?,0.908113,0.000000,0.000241,0.998315


So, with around 20k pairs of texts that are relatively similar in terms of cosine similarity and length, it should be easier to build a dictionary of toxic words from them.

## Building dictionary

### Define preprocessing functions for text

In [42]:
# Reference: PMLDL Lab 3 notebook
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
import re


def lower_text(text: str):
    return text.lower()

def remove_numbers(text: str):
    text_nonum = re.sub(r'\d+', ' ', text)
    return text_nonum

def remove_punctuation(text: str):
    # Keep only lowercase letters and whitespace (expects lowercased input)
    text_nopunct = re.sub(r'[^a-z\s]+', '', text)
    return text_nopunct

def remove_multiple_spaces(text: str):
    text_no_doublespace = re.sub(r'\s+', ' ', text).strip()
    return text_no_doublespace

def remove_contracted_forms(text: str):
    # Drops the whole contracted word, e.g. "you're" is removed entirely
    text_no_contracted_forms = re.sub(r"(\w+)'(\w+)", '', text)
    return text_no_contracted_forms

def tokenize_text(text: str) -> list[str]:
    return word_tokenize(text)

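def remove_stop_words(tokenized_text: list[str]) -> list[str]:
    # Build the stop-word set once instead of reloading the list for every token
    stop_words = set(stopwords.words('english'))
    return [token for token in tokenized_text if token not in stop_words]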
    

stemmer = PorterStemmer()


def stem_words(tokenized_text: list[str]) -> list[str]:
    return [stemmer.stem(token) for token in tokenized_text]

In [54]:
def clean(text):
    _text = lower_text(text)
    _text = remove_numbers(_text)
    _text = remove_contracted_forms(_text)
    _text = remove_punctuation(_text)
    _text = remove_multiple_spaces(_text)
    return _text

def tokenize_and_stem(text, inference=False):
    tokenized = tokenize_text(text)
    if not inference:
        tokenized = remove_stop_words(tokenized)
        stemmed = stem_words(tokenized)
        return tokenized, stemmed
    else:
        return tokenized
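
To see what the cleaning step does to a sentence, here is a minimal, self-contained sketch that restates the regex steps of `clean` in a single function. The name `clean_sketch` is hypothetical and used only for illustration. Note that the contraction pattern removes the whole word, so "you're" disappears rather than being expanded.

```python
import re


def clean_sketch(text: str) -> str:
    """Condensed restatement of the regex-based cleaning steps (illustration only)."""
    text = text.lower()                        # lower_text
    text = re.sub(r'\d+', ' ', text)           # remove_numbers
    text = re.sub(r"(\w+)'(\w+)", '', text)    # remove_contracted_forms: drops the whole word
    text = re.sub(r'[^a-z\s]+', '', text)      # remove_punctuation
    return re.sub(r'\s+', ' ', text).strip()   # remove_multiple_spaces


print(clean_sketch("Now you're getting nasty."))  # now getting nasty
print(clean_sketch("I've got 2 orders!"))         # got orders
```

Since contracted words are dropped before tokenization, they can never end up as keys or values in the dictionary.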

### Define a text toxicity classifier from HuggingFace

In [20]:
from transformers import pipeline


text_classifier = pipeline("text-classification", model="s-nlp/roberta_toxicity_classifier")
result = text_classifier('stupid')
result

Some weights of the model checkpoint at s-nlp/roberta_toxicity_classifier were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'label': 'toxic', 'score': 0.9994775652885437}]

### Algorithm


The algorithm is as follows. Iterating over the filtered samples: (1) apply text preprocessing, (2) tokenize and stem the text using the `nltk` library, (3) skip words whose stems occur in both the reference and the translation, (4) apply the text toxicity classifier to the remaining pairs of words, (5) if the first word is classified as toxic and the second as neutral, add the pair to the dictionary.

In [44]:
tox_dict = {}
data_for_dict = df[(df.lenght_diff < 0.1) & (df.similarity > 0.9)]

In [45]:
from tqdm import tqdm


for _, row in tqdm(data_for_dict.iterrows(), desc='Building dictionary of toxic words and their replacements', total=len(data_for_dict)):
    ref_cleaned = clean(row.reference)
    trn_cleaned = clean(row.translation)
    ref_tokenized, ref_stemmed = tokenize_and_stem(ref_cleaned)
    trn_tokenized, trn_stemmed = tokenize_and_stem(trn_cleaned)
    # Make sure the ref_* variables hold the more toxic text of the pair
    if row.ref_tox < row.trn_tox:
        ref_tokenized, trn_tokenized = trn_tokenized, ref_tokenized
        ref_stemmed, trn_stemmed = trn_stemmed, ref_stemmed

    # Mask words whose stems occur in both texts: they are not replacements
    for i in range(len(ref_stemmed)):
        w = ref_stemmed[i]
        if w in trn_stemmed:
            j = trn_stemmed.index(w)  # first occurrence only
            ref_tokenized[i] = None
            trn_tokenized[j] = None

    # Walk both token lists in parallel and pair up the remaining words
    j = 0
    for i in range(len(ref_tokenized)):
        w1 = ref_tokenized[i]
        w2 = trn_tokenized[j] if j < len(trn_tokenized) else None
        if w1 is not None:
            if w2 is not None:
                # Inference of text toxicity classifier
                w1_toxicity, w2_toxicity = text_classifier(w1), text_classifier(w2)
                if w1 not in tox_dict and w1_toxicity[0]['label'] == 'toxic' and w2_toxicity[0]['label'] == 'neutral':
                    tox_dict[w1] = w2
                j += 1
        else:
            j += 1
print(f'Dictionary size: {len(tox_dict)}')

Building dictionary of toxic words and their replacements: 100%|██████████| 19756/19756 [1:41:12<00:00,  3.25it/s]   

Dictionary size: 389
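
To make the pairing step easier to follow, here is a toy walk-through on one pair of sentences from the data ("you think Hal's fucking her?" and "You think Hal's banging her?"). It is only a sketch under assumptions: the token lists are written out by hand, roughly as they would look after cleaning and stop-word removal, stemming is skipped (each token is treated as its own stem), and `is_toxic` is a hypothetical set lookup standing in for the RoBERTa classifier.

```python
def extract_pairs(toxic_tokens, neutral_tokens, is_toxic):
    """Toy version of the pairing loop; each token is treated as its own stem."""
    tox_stems, neu_stems = list(toxic_tokens), list(neutral_tokens)
    tox, neu = list(tox_stems), list(neu_stems)

    # Mask words that occur in both texts
    for i, stem in enumerate(tox_stems):
        if stem in neu_stems:
            tox[i] = None
            neu[neu_stems.index(stem)] = None

    # Walk both lists in parallel and pair up the remaining words
    pairs = {}
    j = 0
    for w1 in tox:
        w2 = neu[j] if j < len(neu) else None
        if w1 is None:
            j += 1
        elif w2 is not None:
            if w1 not in pairs and is_toxic(w1) and not is_toxic(w2):
                pairs[w1] = w2
            j += 1
    return pairs


TOXIC_WORDS = {"fucking"}  # hypothetical stand-in for the classifier


def is_toxic(word):
    return word in TOXIC_WORDS


pairs = extract_pairs(["think", "fucking"], ["think", "banging"], is_toxic)
print(pairs)  # {'fucking': 'banging'}
```

Because the pairing is purely positional, it presumably works best when the two sentences differ in only a word or two, which fits the choice of highly similar pairs with a small length difference.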





Save the built dictionary to the internal data directory.

In [49]:
import pickle


tox_dict_dir = '../data/internal/toxic_dict.pkl' 
with open(tox_dict_dir, 'wb') as f:
    pickle.dump(tox_dict, f)

## Try the built dictionary in action (inference)

Load the toxic dictionary.

In [50]:
import pickle


tox_dict_dir = '../data/internal/toxic_dict.pkl' 
with open(tox_dict_dir, 'rb') as f:
    tox_dict = pickle.load(f)

Load the same text toxicity classifier for evaluation purposes.

In [56]:
from transformers import pipeline


text_classifier = pipeline("text-classification", model="s-nlp/roberta_toxicity_classifier")
result = text_classifier('stupid bitch')
result



[{'label': 'toxic', 'score': 0.9996129870414734}]

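Let's sample some data using looser thresholds than the ones used to build the dictionary. Note that the two conditions are combined with OR, so a row is kept if it satisfies either of them.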

In [63]:
test_data = df[(df.similarity > 0.8) | (df.lenght_diff < 0.2)]
n = 50
random_test_data = test_data.sample(n=n, random_state=42)

Apply manual testing (model inference). The idea here is to replace toxic words whenever they are found in our dictionary.
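
The replacement step itself is a plain dictionary lookup, sketched below. The function name `replace_toxic`, the `lowercase` flag and the toy dictionary are hypothetical and not part of the notebook. The sketch also illustrates one property of exact-match lookup: the dictionary keys come from lowercased text, so a capitalised token is not replaced unless it is lowercased first.

```python
def replace_toxic(tokens, tox_dict, lowercase=False):
    """Replace every token found in tox_dict; other tokens are kept unchanged."""
    replaced = []
    for token in tokens:
        # Optionally normalise the lookup key; the original token is kept on a miss
        key = token.lower() if lowercase else token
        replaced.append(tox_dict.get(key, token))
    return ' '.join(replaced)


toy_dict = {"stupid": "silly"}  # hypothetical entry, not taken from the built dictionary

print(replace_toxic(["he", "is", "stupid"], toy_dict))              # he is silly
print(replace_toxic(["Stupid", "idea"], toy_dict))                  # Stupid idea
print(replace_toxic(["Stupid", "idea"], toy_dict, lowercase=True))  # silly idea
```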

In [64]:
count_success = 0

for _, row in random_test_data.iterrows():
    ref_tokenized = tokenize_and_stem(row.reference, inference=True)
    trn_tokenized = tokenize_and_stem(row.translation, inference=True)
    if row.ref_tox < row.trn_tox:
        ref_tokenized, trn_tokenized = trn_tokenized, ref_tokenized
    
    answer = []
    for i in range(len(ref_tokenized)):
        w = ref_tokenized[i]
        if w in tox_dict:
            answer.append(tox_dict[w])
        else:
            answer.append(w)
    answer = ' '.join(answer)
    
    if row.ref_tox < row.trn_tox:
        print(f'INIT {text_classifier(row.translation)}: {row.translation}')
        print(f'GOLD {text_classifier(row.reference)}: {row.reference}')
    else:
        print(f'INIT {text_classifier(row.reference)}: {row.reference}')
        print(f'GOLD {text_classifier(row.translation)}: {row.translation}')

    answer_toxicity = text_classifier(answer)
    count_success += answer_toxicity[0]['label'] == 'neutral'
    print(f'PRED {answer_toxicity}: {answer}')
    print()
print(f'Success in {count_success} / {n} samples, i.e., turned the text into a neutral one. (No context preserving check!)')

INIT [{'label': 'toxic', 'score': 0.9995489716529846}]: he's a stupid, lovable kid.
GOLD [{'label': 'neutral', 'score': 0.9982800483703613}]: It's a clueless kid in love.
PRED [{'label': 'neutral', 'score': 0.9994958639144897}]: he 's a linnet , lovable kid .

INIT [{'label': 'toxic', 'score': 0.9943583607673645}]: This tramp is trying to tear apart our family.
GOLD [{'label': 'neutral', 'score': 0.8928444981575012}]: the vagrant is trying to tear our family apart.
PRED [{'label': 'toxic', 'score': 0.9942286014556885}]: This tramp is trying to tear apart our family .

INIT [{'label': 'toxic', 'score': 0.9859511256217957}]: oh, crap. Excuse me.
GOLD [{'label': 'neutral', 'score': 0.9999043941497803}]: Oh, God. Excuse me.
PRED [{'label': 'neutral', 'score': 0.9709659814834595}]: oh , hell . Excuse me .

INIT [{'label': 'toxic', 'score': 0.9915751814842224}]: that will kill him in the end: Thirst.
GOLD [{'label': 'neutral', 'score': 0.9907763600349426}]: That was what would kill him: his 

In [67]:
from tqdm import tqdm


count_success = 0
random_test_data_auto = test_data.sample(n=5000, random_state=42)


for _, row in tqdm(random_test_data_auto.iterrows(), desc='Automatic testing with text toxicity classifier', total=len(random_test_data_auto)):
    ref_tokenized = tokenize_and_stem(row.reference, inference=True)
    trn_tokenized = tokenize_and_stem(row.translation, inference=True)
    if row.ref_tox < row.trn_tox:
        ref_tokenized, trn_tokenized = trn_tokenized, ref_tokenized
    
    answer = []
    for i in range(len(ref_tokenized)):
        w = ref_tokenized[i]
        if w in tox_dict:
            answer.append(tox_dict[w])
        else:
            answer.append(w)
    answer = ' '.join(answer)
    answer_toxicity = text_classifier(answer)
    count_success += answer_toxicity[0]['label'] == 'neutral'

print(f'Success in {count_success} / {len(random_test_data_auto)} samples ({count_success / len(random_test_data_auto) * 100 :.2f}), i.e., turned the text into a neutral one. (No context preserving check!)')

Automatic testing with text toxicity classifier:   0%|          | 0/5000 [00:00<?, ?it/s]

Automatic testing with text toxicity classifier: 100%|██████████| 5000/5000 [19:29<00:00,  4.28it/s]

Success in 2851 / 5000 samples (57.02), i.e., turned the text into a neutral one. (No context preserving check!)





So, the resulting dictionary is relatively small (389 pairs). To increase its size, we could simply use a bigger data sample to build it. As for the metric I used, the method reached a success rate of 57.02%, meaning that more than half of the samples were classified as neutral after replacement (with no check of meaning preservation).

As this is a first attempt and a simple, straightforward solution, its quality is low and many things can be improved. For example, we might change the building algorithm, or take dedicated corpora of toxic and neutral words and build a model that chooses the closest neutral word as a replacement.