# Exploratory notebook

In [1]:
import pandas as pd

df = pd.read_csv("../data/raw/filtered.tsv", sep="\t", index_col=0)

df.head()

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
0,"If Alkar is flooding her with psychic waste, t...","if Alkar floods her with her mental waste, it ...",0.785171,0.010309,0.014195,0.981983
1,Now you're getting nasty.,you're becoming disgusting.,0.749687,0.071429,0.065473,0.999039
2,"Well, we could spare your life, for one.","well, we can spare your life.",0.919051,0.268293,0.213313,0.985068
3,"Ah! Monkey, you've got to snap out of it.","monkey, you have to wake up.",0.664333,0.309524,0.053362,0.994215
4,I've got orders to put her down.,I have orders to kill her.,0.726639,0.181818,0.009402,0.999348


In [2]:
idx = 1

df.iloc[idx].reference, df.iloc[idx].translation

("Now you're getting nasty.", "you're becoming disgusting.")

# Try loading BERT to suggest similar words in place of the toxic one

In [3]:
from transformers import pipeline

text = df.iloc[idx].reference

# Assume we already know that "nasty" is the toxic word and replace it with the mask token
text = text.replace("nasty", "[MASK]")

unmasker = pipeline("fill-mask", model="bert-base-uncased")
bert_result = unmasker(text)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
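
The plain `str.replace` call in In [3] replaces every occurrence of the substring, including ones inside longer words. A minimal sketch of a stricter masking step is below; `mask_word` is a hypothetical helper name, and the `[MASK]` default assumes a BERT-style tokenizer (other models may use a different mask token).

```python
import re


def mask_word(text: str, word: str, mask_token: str = "[MASK]") -> str:
    """Replace the first whole-word occurrence of `word` with `mask_token`.

    Matching is case-insensitive and respects word boundaries, so
    "nasty" does not match inside "nastygram".
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    # A callable replacement avoids any escape processing of mask_token.
    return pattern.sub(lambda _match: mask_token, text, count=1)


masked = mask_word("Now you're getting nasty.", "nasty")
print(masked)
```

The returned string can be passed to `unmasker` in place of `text`.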


In [4]:
bert_result

[{'score': 0.8188696503639221,
  'token': 1000,
  'token_str': '"',
  'sequence': '" you\'re getting nasty.'},
 {'score': 0.10920146852731705,
  'token': 1005,
  'token_str': "'",
  'sequence': "' you're getting nasty."},
 {'score': 0.024344492703676224,
  'token': 1998,
  'token_str': 'and',
  'sequence': "and you're getting nasty."},
 {'score': 0.012281644158065319,
  'token': 2085,
  'token_str': 'now',
  'sequence': "now you're getting nasty."},
 {'score': 0.009781565517187119,
  'token': 2021,
  'token_str': 'but',
  'sequence': "but you're getting nasty."}]

Maybe the person is really getting `married` instead of `nasty`, but a suggestion like that changes the meaning of the sentence. So, among the BERT results, we have to choose the word with the highest similarity to the toxic one.
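
Before comparing similarities, it may help to drop candidates that are not words at all, such as the quote characters in the output above. The sketch below assumes the result format of the `fill-mask` pipeline (a list of dicts with a `token_str` key); `keep_word_candidates` is a hypothetical helper, not part of `transformers`.

```python
def keep_word_candidates(results, exclude=()):
    """Return candidate token strings that are alphabetic and not excluded."""
    excluded = {word.lower() for word in exclude}
    words = []
    for res in results:
        token = res["token_str"].strip()
        # isalpha() rejects punctuation and WordPiece continuations like "##ing".
        if token.isalpha() and token.lower() not in excluded:
            words.append(token)
    return words


# Token strings taken from the bert_result output shown above.
sample = [
    {"token_str": '"'},
    {"token_str": "'"},
    {"token_str": "and"},
    {"token_str": "now"},
    {"token_str": "but"},
]
print(keep_word_candidates(sample, exclude=["nasty"]))
```

Passing the toxic word in `exclude` keeps BERT from simply suggesting the same word back.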

## Compare similarity of BERT results with the toxic word

After a quick search I found the [spaCy](https://spacy.io/) library for Python, which can compute a similarity score between words.

```bash
pip install -U spacy
python -m spacy download en_core_web_md
```

In [5]:
import spacy

# Load the medium English pipeline (tokenizer, tagger, parser, NER), which also ships the word vectors that similarity() relies on
nlp = spacy.load("en_core_web_md")

In [6]:
toxic = nlp("nasty")

for res in bert_result:
    doc = nlp(res["token_str"])
    print(res["token_str"], doc.similarity(toxic))

" 0.07010686360192274
' 0.08257266716177084
and -0.11500005374241065
now 0.1198374419900923
but 0.25597087802656404
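
To read these numbers: as far as I know, spaCy's default `similarity` is the cosine similarity of the word vectors, so scores range from -1 to 1 and can be negative, as for `and`. The sketch below shows the formula on plain Python lists; the vectors are toy values, not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: zero vectors score 0.
        return 0.0
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite vectors
```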


## Hooray!

Yes, as we see from the scores, `angry` is definitely more similar to `nasty` than `married` is. So we can use this approach to find the candidate most similar to the toxic word and replace the toxic word with it.
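
A rough sketch of that last step, under some assumptions: `pick_replacement` and `replace_word` are hypothetical helpers, and the scores in the example are placeholder values chosen for illustration, not measured ones. In the notebook they would come from `nlp(word).similarity(toxic)`.

```python
import re


def pick_replacement(scores):
    """Return the candidate word with the highest similarity score."""
    return max(scores, key=scores.get)


def replace_word(text, toxic, replacement):
    """Swap the first whole-word occurrence of `toxic` for `replacement`."""
    pattern = re.compile(rf"\b{re.escape(toxic)}\b", flags=re.IGNORECASE)
    return pattern.sub(lambda _match: replacement, text, count=1)


# Placeholder similarity scores (illustrative only, not real spaCy output).
scores = {"angry": 0.6, "married": 0.2}

best = pick_replacement(scores)
print(replace_word("Now you're getting nasty.", "nasty", best))
```

Highest similarity alone does not guarantee a non-toxic result, so the chosen word would probably still need a toxicity check before it is used.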