Now, let us run inference with the paraphrasing model fine-tuned in `3.0-t5-finetuning.ipynb` and apply the toxicity metric implemented in `2.0-toxicity-metric.ipynb`.

The pipeline is simple:

1. Given an input sentence, generate several paraphrases.
2. Measure the toxicity level of each paraphrase.
3. Choose the one with the minimum toxicity level.
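
The three steps above can be sketched without any model at all. The snippet below is only an illustration: `pick_least_toxic`, `toy_generate`, and `toy_toxicity` are hypothetical stand-ins, not part of this notebook, and the toy scorer simply counts words from a tiny made-up list. Here a lower toxicity value is better.

```python
from typing import Callable, List, Tuple


def pick_least_toxic(
    sentence: str,
    generate: Callable[[str], List[str]],
    toxicity: Callable[[str], float],
) -> Tuple[str, float]:
    """Generate candidates, score each one, and return the least toxic."""
    candidates = generate(sentence)                    # step 1: paraphrase
    if not candidates:
        raise ValueError("the generator returned no candidates")
    scored = [(toxicity(c), c) for c in candidates]    # step 2: measure
    best_score, best = min(scored, key=lambda pair: pair[0])  # step 3: choose
    return best, best_score


# Toy stand-ins (assumptions for illustration only).
BAD_WORDS = {"damn", "stupid"}


def toy_generate(sentence: str) -> List[str]:
    # Pretend "paraphraser": the original plus two simple rewrites.
    return [
        sentence,
        sentence.replace("damn ", ""),
        sentence.replace("damn", "very"),
    ]


def toy_toxicity(text: str) -> float:
    # Fraction of words that appear in the made-up bad-word list.
    words = text.lower().split()
    hits = sum(w.strip(".,!?") in BAD_WORDS for w in words)
    return hits / max(len(words), 1)


best, score = pick_least_toxic("what a damn rainy day", toy_generate, toy_toxicity)
print(best, score)
```

In the rest of the notebook, the generator is the fine-tuned T5 model and the scorer is the RoBERTa toxicity classifier.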

In [1]:
import torch

from transformers import RobertaTokenizer, RobertaForSequenceClassification, AutoTokenizer, AutoModelForSeq2SeqLM
from typing import List

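# use the first GPU when available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')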

# Paraphrasing Model Loading

In [6]:
paraphrase_tokenizer = AutoTokenizer.from_pretrained("ceshine/t5-paraphrase-paws-msrp-opinosis")
paraphrase_model = AutoModelForSeq2SeqLM.from_pretrained("ceshine/t5-paraphrase-paws-msrp-opinosis")

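# load the fine-tuned weights saved by 3.0-t5-finetuning.ipynb
paraphrase_model.load_state_dict(torch.load('../models/t5-paraphrase/pytorch_model.bin', map_location=device))

paraphrase_model.to(device)
pass  # suppress the cell output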

# Toxicity Metric Loading

In [3]:
toxicity_classifier_tokenizer = RobertaTokenizer.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')
toxicity_classifier = RobertaForSequenceClassification.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')

toxicity_classifier.to(device)

def measure_toxicity(texts: List[str], batch_size: int = 32):
    """Return (labels, scores) for the given texts.

    A label of 1.0 means toxic. The score is the non-toxic logit minus
    the toxic logit, so a higher score means a less toxic text.
    """
    res_labels = []
    res_scores = []

    for i in range(0, len(texts), batch_size):
        batch = toxicity_classifier_tokenizer(texts[i:i + batch_size], return_tensors='pt', padding=True)
        batch = batch.to(device)

        # run the classifier once per batch, without tracking gradients
        with torch.no_grad():
            logits = toxicity_classifier(**batch)['logits'].float()

        # predicted class: 0 = non-toxic, 1 = toxic
        res_labels.extend(logits.argmax(1).float().tolist())

        # score = logit(non-toxic) - logit(toxic), kept as one-element lists
        margins = logits[:, 0] - logits[:, 1]
        res_scores.extend(margins.view(-1, 1).tolist())

    return res_labels, res_scores

Some weights of the model checkpoint at SkolkovoInstitute/roberta_toxicity_classifier were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
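
Because the score is the difference between the two class logits, it can be mapped to a probability: for a two-class softmax, the probability of the toxic class equals the sigmoid of the negated score. The helper below is a minimal sketch of that conversion. The name `toxic_probability` is hypothetical and not part of the notebook, and the sketch assumes the score is computed exactly as in `measure_toxicity` (non-toxic logit minus toxic logit).

```python
import math


def toxic_probability(score: float) -> float:
    """Convert a (non-toxic minus toxic) logit margin into P(toxic).

    For two classes, softmax gives P(toxic) = 1 / (1 + exp(score)),
    which is sigmoid(-score). Both branches avoid overflow in exp().
    """
    if score >= 0:
        z = math.exp(-score)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(score))


# A margin of 0 means the classifier is undecided.
print(toxic_probability(0.0))
# Large positive margins map to probabilities close to 0,
# large negative margins to probabilities close to 1.
print(toxic_probability(10.05), toxic_probability(-5.59))
```

This makes the sign convention easy to remember: positive scores mean the classifier leans non-toxic, negative scores mean it leans toxic.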


# Inference

In [4]:
def detoxify(
        sentence: str,
        num_paraphrases: int = 10,
        max_length: int = 64,
        num_beams: int = 10,
        verbose: bool = True
    ):

    # encode the inputs to the paraphraser model
    encoded_inputs = paraphrase_tokenizer(sentence, return_tensors='pt')
    encoded_inputs = {k: v.to(device) for k, v in encoded_inputs.items()}

    # generate paraphrases
    encoded_outputs = paraphrase_model.generate(
                       **encoded_inputs,
                       do_sample=False,
                       num_return_sequences=num_paraphrases,
                       max_length=max_length,
                       num_beams=num_beams
                    )
    
    # decode the model outputs
    paraphrases = [paraphrase_tokenizer.decode(out, skip_special_tokens=True) for out in encoded_outputs]

    # measure toxicity
    toxic_labels, toxic_scores = measure_toxicity(paraphrases)

    # sort from least to most toxic: the score is the non-toxic margin,
    # so a higher score means lower toxicity (hence reverse=True)
    sorted_indices = sorted(range(len(toxic_scores)), key=lambda k: toxic_scores[k], reverse=True)
    toxic_labels = [toxic_labels[i] for i in sorted_indices]
    toxic_scores = [toxic_scores[i] for i in sorted_indices]
    paraphrases = [paraphrases[i] for i in sorted_indices]

    if verbose:
        # output the results
        print("# | toxic score\t| toxicity flag\t| paraphrase")
        print("==|=============|===============|==========================================")
        for i, paraphrase in enumerate(paraphrases):
            toxic_label = "True" if toxic_labels[i] else "False"
            print(f"{i} |\t{toxic_scores[i][0]:.2f}\t|\t{toxic_label}\t| {paraphrase}")

    return paraphrases, toxic_labels, toxic_scores
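
One detail worth keeping in mind when calling `detoxify`: with beam search (`do_sample=False`), `generate` can return at most as many sequences as there are beams, and to my knowledge it raises an error if `num_return_sequences` exceeds `num_beams`. The guard below is a small sketch of an explicit check. `check_beam_args` is a hypothetical helper, not part of the notebook or of the `transformers` library.

```python
def check_beam_args(num_paraphrases: int, num_beams: int) -> None:
    """Validate the beam-search arguments before calling generate().

    Raises ValueError when more paraphrases are requested than
    beam search can return.
    """
    if num_paraphrases < 1:
        raise ValueError("num_paraphrases must be at least 1")
    if num_paraphrases > num_beams:
        raise ValueError(
            f"num_paraphrases ({num_paraphrases}) must not exceed "
            f"num_beams ({num_beams})"
        )


# The defaults used by detoxify (10 paraphrases, 10 beams) pass the check.
check_beam_args(num_paraphrases=10, num_beams=10)
```

Calling such a check at the top of `detoxify` would surface a mismatch with a clearer message than the one raised deep inside generation.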

In [5]:
samples = [
    "What a great sunny day, I miss it.",
    "I didn't fuck him",
    "I'm going to hit you in all directions, civil and criminal, on all counts.",
    "What a fucked rainy day, goddamnit.",
    "hello there! i'm a piece of shit :)",
    "i'm gonna un-fuck your shit uptight, motherfucker!",
]

# measure the toxicity of the original samples
toxicity_labels, toxicity_scores = measure_toxicity(samples)

# inference
for sample, toxicity_label, toxicity_score in zip(samples, toxicity_labels, toxicity_scores):
    
    toxicity_label = "toxic" if toxicity_label else "non toxic"

    print(f"Sample:\n'{sample}'")
    print(f"toxicity score: {toxicity_score[0]:.2f} (label: {toxicity_label})" + "\n")

    detoxify(sample)
    print("\n" * 3)

Sample:
'What a great sunny day, I miss it.'
toxicity score: 10.05 (label: non toxic)

# | toxic score	| toxicity flag	| paraphrase
0 |	10.17	|	False	| what a great sunny day.
1 |	10.12	|	False	| it was a great sunny day, I miss it.
2 |	10.11	|	False	| what a beautiful sunny day, I miss it.
3 |	10.10	|	False	| what a lovely sunny day, I miss it.
4 |	10.07	|	False	| what a wonderful sunny day, I miss it.
5 |	10.06	|	False	| what a nice sunny day, I miss it.
6 |	10.04	|	False	| what a great sunny day, I miss it.
7 |	10.02	|	False	| what a great sunny day, I miss him.
8 |	9.86	|	False	| it's a great sunny day, I miss it.
9 |	9.73	|	False	| what a great day, I miss it.




Sample:
'I didn't fuck him'
toxicity score: -5.59 (label: toxic)

# | toxic score	| toxicity flag	| paraphrase
0 |	9.97	|	False	| I didn't get him.
1 |	9.65	|	False	| I didn't mess with him.
2 |	8.62	|	False	| I didn't poke him.
3 |	8.34	|	False	| I didn't screw it up.
4 |	6.11	|	False	| I didn't kick him.
5 |	5.84	|	Fal

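The results are encouraging. For the samples considered, the top-1 or top-2 paraphrase contains little or no toxicity while largely preserving the intent of the original sentence. Note that a higher score means lower toxicity, because the score is the non-toxic logit minus the toxic logit.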