This notebook reproduces the creation of the CondBERT vocabularies.

The files `positive-words.txt`, `negative-words.txt` and `toxic_words.txt` are not reproduced exactly, because of internal issues on our side.

However, all other files (`token_toxicities.txt` and `word2coef.pkl`) are reproduced accurately.

# 0. Prerequisites

In [1]:
VOCAB_DIRNAME = 'vocabularies' 

In [3]:
from condbert import CondBertRewriter
from choosers import EmbeddingSimilarityChooser
from multiword.masked_token_predictor_bert import MaskedTokenPredictorBert

# 1. Loading BERT

In [6]:
import torch
from transformers import BertTokenizer, BertForMaskedLM
import numpy as np
import pickle
import os
from tqdm.auto import tqdm, trange

In [7]:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# Use the first visible GPU if there is one, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

In [9]:
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)

In [10]:
model = BertForMaskedLM.from_pretrained(model_name)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [11]:
model.to(device);

# 2. Preparing the vocabularies

This section creates the following files:

- negative-words.txt
- positive-words.txt
- word2coef.pkl
- token_toxicities.txt

These files need to be prepared only once.

In [12]:
tox_corpus_path = '../../data/train/train_toxic'
norm_corpus_path = '../../data/train/train_normal'

In [13]:
if not os.path.exists(VOCAB_DIRNAME):
    os.makedirs(VOCAB_DIRNAME)

## 2.1 Preparing the DRG-like vocabularies

In [14]:
import os
import numpy as np
from tqdm import tqdm
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer



class NgramSalienceCalculator():
    def __init__(self, tox_corpus, norm_corpus, use_ngrams=False):
        # Count unigrams only, or all 1- to 3-grams when use_ngrams is set.
        ngram_range = (1, 3) if use_ngrams else (1, 1)
        self.vectorizer = CountVectorizer(ngram_range=ngram_range)

        tox_count_matrix = self.vectorizer.fit_transform(tox_corpus)
        self.tox_vocab = self.vectorizer.vocabulary_
        self.tox_counts = np.sum(tox_count_matrix, axis=0)

        norm_count_matrix = self.vectorizer.fit_transform(norm_corpus)
        self.norm_vocab = self.vectorizer.vocabulary_
        self.norm_counts = np.sum(norm_count_matrix, axis=0)

    def salience(self, feature, attribute='tox', lmbda=0.5):
        assert attribute in ['tox', 'norm']
        if feature not in self.tox_vocab:
            tox_count = 0.0
        else:
            tox_count = self.tox_counts[0, self.tox_vocab[feature]]

        if feature not in self.norm_vocab:
            norm_count = 0.0
        else:
            norm_count = self.norm_counts[0, self.norm_vocab[feature]]

        if attribute == 'tox':
            return (tox_count + lmbda) / (norm_count + lmbda)
        else:
            return (norm_count + lmbda) / (tox_count + lmbda)
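
As a worked illustration of the smoothed ratio that `salience` computes, the sketch below applies the same formula to made-up counts. The helper name `smoothed_salience` and the counts are assumptions for illustration only; they are not part of the notebook.

```python
def smoothed_salience(target_count, other_count, lmbda=0.5):
    """Ratio of smoothed counts, mirroring NgramSalienceCalculator.salience."""
    return (target_count + lmbda) / (other_count + lmbda)


# Hypothetical word: seen 9 times in the toxic corpus, never in the normal one.
tox_salience = smoothed_salience(9, 0)   # (9 + 0.5) / (0 + 0.5)
norm_salience = smoothed_salience(0, 9)  # (0 + 0.5) / (9 + 0.5)

print(tox_salience, norm_salience)
```

The smoothing term `lmbda` keeps the ratio finite when a word is missing from one corpus, and the two saliences are reciprocals of each other, so at most one of them can exceed a threshold greater than 1.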


In [15]:
from collections import Counter
c = Counter()

for fn in [tox_corpus_path, norm_corpus_path]:
    with open(fn, 'r') as corpus:
        for line in corpus.readlines():
            for tok in line.strip().split():
                c[tok] += 1

print(len(c))

88645


In [16]:
vocab = {w for w, count in c.most_common() if count > 0}  # keeping only words with > 1 occurrences would roughly halve the vocabulary, but this size is manageable
print(len(vocab))

88645


In [17]:
with open(tox_corpus_path, 'r') as tox_corpus, open(norm_corpus_path, 'r') as norm_corpus:
    corpus_tox = [' '.join([w if w in vocab else '<unk>' for w in line.strip().split()]) for line in tox_corpus.readlines()]
    corpus_norm = [' '.join([w if w in vocab else '<unk>' for w in line.strip().split()]) for line in norm_corpus.readlines()]

In [18]:
neg_out_name = VOCAB_DIRNAME + '/negative-words.txt'
pos_out_name = VOCAB_DIRNAME + '/positive-words.txt'

In [20]:
threshold = 4

In [21]:
sc = NgramSalienceCalculator(corpus_tox, corpus_norm, False)
seen_grams = set()

with open(neg_out_name, 'w') as neg_out, open(pos_out_name, 'w') as pos_out:
    for gram in set(sc.tox_vocab.keys()).union(set(sc.norm_vocab.keys())):
        if gram not in seen_grams:
            seen_grams.add(gram)
            toxic_salience = sc.salience(gram, attribute='tox')
            polite_salience = sc.salience(gram, attribute='norm')
            if toxic_salience > threshold:
                neg_out.write(f'{gram}\n')
            elif polite_salience > threshold:
                pos_out.write(f'{gram}\n')

## 2.2 Evaluating word toxicities with a logistic regression

In [22]:
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

In [23]:
X_train = corpus_tox + corpus_norm
y_train = [1] * len(corpus_tox) + [0] * len(corpus_norm)
pipe.fit(X_train, y_train);

In [24]:
coefs = pipe[1].coef_[0]
coefs.shape

(88519,)

In [25]:
word2coef = {w: coefs[idx] for w, idx in pipe[0].vocabulary_.items()}
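
A quick way to sanity-check a word-to-coefficient mapping like `word2coef` is to list the words with the largest coefficients, since a positive coefficient pushes the classifier towards the toxic class (label 1). The following is a minimal sketch on a toy dictionary; the helper `top_words` and the values in `toy_word2coef` are hypothetical and do not come from the trained model.

```python
def top_words(word2coef, k=2):
    """Return the k words with the largest coefficients."""
    return sorted(word2coef, key=word2coef.get, reverse=True)[:k]


# Hypothetical coefficients, for illustration only.
toy_word2coef = {'idiot': 2.3, 'hello': -0.8, 'stupid': 1.9, 'the': 0.01}

most_toxic = top_words(toy_word2coef)
print(most_toxic)
```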

In [26]:
import pickle
with open(VOCAB_DIRNAME + '/word2coef.pkl', 'wb') as f:
    pickle.dump(word2coef, f)

## 2.3 Labelling BERT tokens by toxicity

In [27]:
from collections import defaultdict
toxic_counter = defaultdict(lambda: 1)
nontoxic_counter = defaultdict(lambda: 1)

for text in tqdm(corpus_tox):
    for token in tokenizer.encode(text):
        toxic_counter[token] += 1
for text in tqdm(corpus_norm):
    for token in tokenizer.encode(text):
        nontoxic_counter[token] += 1

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 135390/135390 [00:43<00:00, 3088.85it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 135390/135390 [00:45<00:00, 2977.70it/s]


In [28]:
token_toxicities = [toxic_counter[i] / (nontoxic_counter[i] + toxic_counter[i]) for i in range(len(tokenizer.vocab))]
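
Because both counters are created with `defaultdict(lambda: 1)`, every token starts from a count of 1 in each class, which acts as add-one smoothing. The sketch below replays the same computation on made-up token ids (7 and 8 are arbitrary) to show the effect; the helper `toxicity` is an assumption introduced for illustration.

```python
from collections import defaultdict

toxic_counter = defaultdict(lambda: 1)
nontoxic_counter = defaultdict(lambda: 1)

# Hypothetical data: token 7 occurs three times in toxic text and once in normal text.
for token in [7, 7, 7]:
    toxic_counter[token] += 1
for token in [7]:
    nontoxic_counter[token] += 1


def toxicity(i):
    """Share of (smoothed) occurrences of token i that come from toxic text."""
    return toxic_counter[i] / (nontoxic_counter[i] + toxic_counter[i])


print(toxicity(7))  # (3 + 1) / ((1 + 1) + (3 + 1))
print(toxicity(8))  # token never seen in either corpus: 1 / (1 + 1)
```

A token that never occurs in either corpus therefore gets a neutral toxicity of 0.5 rather than causing a division by zero.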

In [29]:
with open(VOCAB_DIRNAME + '/token_toxicities.txt', 'w') as f:
    for t in token_toxicities:
        f.write(str(t))
        f.write('\n')

# 3. Setting up the model

## 3.1 Loading the vocabularies

In [30]:
with open(VOCAB_DIRNAME + "/negative-words.txt", "r") as f:
    # Strip only the trailing newline, so a final line without one is not truncated.
    negative_words = [line.rstrip('\n') for line in f]

with open(VOCAB_DIRNAME + "/positive-words.txt", "r") as f:
    positive_words = [line.rstrip('\n') for line in f]

In [31]:
import pickle
with open(VOCAB_DIRNAME + '/word2coef.pkl', 'rb') as f:
    word2coef = pickle.load(f)

In [32]:
token_toxicities = []
with open(VOCAB_DIRNAME + '/token_toxicities.txt', 'r') as f:
    for line in f.readlines():
        token_toxicities.append(float(line))
token_toxicities = np.array(token_toxicities)
token_toxicities = np.maximum(0, np.log(1/(1/token_toxicities-1)))   # log odds ratio

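# discourage meaningless tokens
# tokenizer.encode returns [CLS], the token id, [SEP]; position 1 is the token itself.
# Index the id first: assigning through a list index would only modify a temporary copy.
for tok in ['.', ',', '-']:
    token_toxicities[tokenizer.encode(tok)[1]] = 3

for tok in ['you']:
    token_toxicities[tokenizer.encode(tok)[1]] = 0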
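
The log-odds line in the cell above maps a toxicity share `p` to `max(0, log(p / (1 - p)))`, because `1 / (1/p - 1)` simplifies to `p / (1 - p)`. The sketch below evaluates that expression for a few hypothetical shares using only the standard library; the function name `clipped_log_odds` is an assumption introduced for illustration.

```python
import math


def clipped_log_odds(p):
    """max(0, log(p / (1 - p))): zero for p <= 0.5, growing as p approaches 1."""
    return max(0.0, math.log(p / (1.0 - p)))


# Hypothetical toxicity shares: mostly normal, neutral, mostly toxic.
for p in (0.1, 0.5, 0.9):
    print(p, clipped_log_odds(p))
```

Tokens that are at most as frequent in toxic text as in normal text get a score of 0, so only tokens skewed towards the toxic corpus are penalised later.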

In [38]:
def adjust_logits(logits, label=0):
    return logits - token_toxicities * 100 * (1 - 2 * label)

predictor = MaskedTokenPredictorBert(model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0, logits_postprocessor=adjust_logits)

editor = CondBertRewriter(
    model=model,
    tokenizer=tokenizer,
    device=device,
    neg_words=negative_words,
    pos_words=positive_words,
    word2coef=word2coef,
    token_toxicities=token_toxicities,
    predictor=predictor,
)
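
To see what `adjust_logits` does to individual candidates, the sketch below applies the same arithmetic to two made-up tokens using plain Python lists. The function `adjust_logits_sketch`, its `scale` argument and all numbers are assumptions for illustration; the notebook's version operates on the full `token_toxicities` array.

```python
def adjust_logits_sketch(logits, toxicities, label=0, scale=100):
    """Element-wise version of adjust_logits for plain Python lists."""
    sign = 1 - 2 * label  # +1 for label=0, -1 for label=1
    return [logit - tox * scale * sign for logit, tox in zip(logits, toxicities)]


logits = [2.0, 2.0]      # hypothetical scores of two candidate tokens
toxicities = [0.0, 1.5]  # hypothetical: the first is neutral, the second is toxic

with_label_0 = adjust_logits_sketch(logits, toxicities, label=0)
with_label_1 = adjust_logits_sketch(logits, toxicities, label=1)
print(with_label_0, with_label_1)
```

With `label=0` the toxic candidate's score is pushed far down, while `label=1` flips the sign and pushes it up; a neutral candidate with toxicity 0 is unaffected either way.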

The model below is used to rerank BERT hypotheses; it helps increase semantic similarity by choosing the hypotheses whose embeddings are similar to those of the original words.

In [34]:
chooser = EmbeddingSimilarityChooser(sim_coef=10, tokenizer=tokenizer)
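
This notebook does not show how `EmbeddingSimilarityChooser` computes its scores, so the following is only a hypothetical illustration of the general idea of similarity-weighted reranking, not the library's implementation. The scoring rule (language-model score plus `sim_coef` times cosine similarity), the functions `cosine` and `rerank`, and the toy vectors are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rerank(original_vec, hypotheses, sim_coef=10):
    """Sort (word, lm_score, vector) hypotheses by an assumed combined score."""
    scored = [(lm_score + sim_coef * cosine(original_vec, vec), word)
              for word, lm_score, vec in hypotheses]
    return [word for _, word in sorted(scored, reverse=True)]


# Toy data: 'silly' has the same vector as the original word, 'beautiful' is orthogonal.
original = [1.0, 0.0]
hyps = [('beautiful', 1.0, [0.0, 1.0]), ('silly', 0.5, [1.0, 0.0])]

print(rerank(original, hyps, sim_coef=10))
print(rerank(original, hyps, sim_coef=0))
```

Under this assumed rule, a larger `sim_coef` favours hypotheses close to the original word, while `sim_coef=0` leaves the ordering to the language-model score alone.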

# 4. Finally, the inference

Parallel application of the model to all tokens: fast, but dirty.

In [35]:
print(editor.translate('You are idiot!', prnt=False))

you are mistake !


Sequential application of the model to all tokens, in the multiword mode.

In [39]:
print(editor.replacement_loop('You are stupid!', verbose=False, chooser=chooser, n_tokens=(1, 2, 3), n_top=10))

you are very beautiful !


Parameters that could be tuned:
* The coefficient in `adjust_logits` (100 above): the larger it is, the more the model avoids toxic words
* The coefficient `sim_coef` in `EmbeddingSimilarityChooser`: the larger it is, the more the model tries to preserve content
* `n_tokens`: how many words can be generated from one
* `n_top`: how many BERT hypotheses are reranked