# Tune GPT2 to generate spammy reviews
> Optimise GPT2 to generate phishing-like, spammy IMDB movie reviews using a BERT phishing classifier as a reward function.  
> By David Higley  
> Original example from HuggingFace

## Setup

### Import dependencies

In [1]:
import torch
from tqdm import tqdm
import pandas as pd

tqdm.pandas()

from transformers import pipeline, AutoTokenizer
from datasets import load_dataset

from trl import PPOTrainer, PPOConfig, AutoModelForCausalLMWithValueHead
from trl.core import LengthSampler

### Configuration

In [3]:
config = PPOConfig(
    model_name="lvwerra/gpt2-imdb",
    learning_rate=1.41e-5,
)

sent_kwargs = {"return_all_scores": True, "function_to_apply": "none", "batch_size": 16}

## Load data and models

### Load IMDB dataset

In [5]:
def build_dataset(config, dataset_name="imdb", revision="main", input_min_text_length=4, input_max_text_length=12):
    """
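    Build the dataset for training. This loads the data with `load_dataset`; customize
    this function to train the model on your own dataset.

    Args:
        config (`PPOConfig`):
            Configuration whose `model_name` selects the tokenizer.
        dataset_name (`str`):
            The name of the dataset to be loaded.
        revision (`str`):
            The dataset revision to load.
        input_min_text_length (`int`), input_max_text_length (`int`):
            Bounds for the sampled query length, in tokens.

    Returns:
        ds (`datasets.Dataset`):
            The tokenized dataset with `input_ids` and `query` columns added.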
    """
    tokenizer = AutoTokenizer.from_pretrained(config.model_name)
    tokenizer.pad_token = tokenizer.eos_token
    # load imdb with datasets
    ds = load_dataset(dataset_name, split="train", revision=revision)

    ds = ds.filter(lambda x:  (len(x["text"]) > 200 if x["text"] is not None else False), batched=False)

    input_size = LengthSampler(input_min_text_length, input_max_text_length)

    def tokenize(sample):
        sample["input_ids"] = tokenizer.encode(sample["text"])[: input_size()]
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample

    ds = ds.map(tokenize, batched=False)
    ds.set_format(type="torch")
    return ds
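
Each review is cut down to a short query whose length is drawn at random. As a rough illustration of what `LengthSampler(4, 12)` does in `build_dataset`, the sketch below uses a hypothetical stand-in class (`UniformLengthSampler`, not part of TRL) built on the standard library. It assumes the upper bound is exclusive, which matches the TRL versions we are aware of but is worth checking against the installed one.

```python
import random


class UniformLengthSampler:
    """Hypothetical stand-in for trl.core.LengthSampler (assumption: max is exclusive)."""

    def __init__(self, min_value, max_value, seed=None):
        self.values = list(range(min_value, max_value))
        self.rng = random.Random(seed)

    def __call__(self):
        # Pick one of the allowed lengths uniformly at random.
        return self.rng.choice(self.values)


sampler = UniformLengthSampler(4, 12, seed=0)
token_ids = list(range(100, 130))    # made-up token ids standing in for one review
query_ids = token_ids[: sampler()]   # keep only the first few tokens as the query
print(len(query_ids))                # a value between 4 and 11
```

With these bounds every query is between 4 and 11 tokens long, which is consistent with the 9-token query shown for `dataset[900]`.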

In [6]:
dataset = build_dataset(config)


def collator(data):
    return dict((key, [d[key] for d in data]) for key in data[0])
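
The collator keeps every field as a plain Python list instead of stacking tensors, which matters here because the queries have different lengths. A small illustration with made-up samples (the token ids and texts are illustrative only):

```python
def collator(data):
    # Same logic as above: list of per-sample dicts -> dict of per-field lists.
    return {key: [d[key] for d in data] for key in data[0]}


samples = [
    {"input_ids": [11, 12, 13], "query": "first toy query"},
    {"input_ids": [21, 22], "query": "second one"},
]
batch = collator(samples)
print(batch["query"])      # ['first toy query', 'second one']
print(batch["input_ids"])  # [[11, 12, 13], [21, 22]]
```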

In [7]:
dataset[900]

{'text': "How can there be that many corrupt cops without any one of them slipping up? With enough cops to run a mini-war that include such weapons as flamethrowers, you would think they would have been caught before someone writing for a weekly coupon newspaper overheard someone saying 'thanks' to a corrupt cop.<br /><br />You will never get your 90ish minutes back. Life is too precious to rent this movie.<br /><br />I feel bad for the big named actors that made the mistake of making this movie.<br /><br />If you like Justin Timberlake, feel free to rent this movie. He does have a very major part in it, so fans might enjoy seeing him. <br /><br />However, I believe most of his fans are young girls, who may be turned off by the violence in this movie.",
 'label': tensor(0),
 'input_ids': tensor([ 2437,   460,   612,   307,   326,   867, 10622, 14073,  1231]),
 'query': 'How can there be that many corrupt cops without'}

In [8]:
dataset[800]

{'text': 'Many King fans hate this because it departed from the book, but film is a different medium and books should change when they make the jump. That notwithstanding, the movie does fail completely, but it fails entirely on film terms. I\'d like to smack the people who tell me it\'s the scariest movie ever made. I always follow up with the question "Really... exactly what scene scared you?" Every fan I\'ve asked, goes silent. Occasionally someone, at a loss for a decent scare (There are none...), names the "Grape-juice-shooting-out-of-elevators" shtick. If you\'re afraid of that, I don\'t know what to tell you, except maybe that you\'re easily scared. I just rolled my eyes watching these z-grade horror ideas play out in this schlocky, incoherent movie.<br /><br />One place it diverts from the book and really is insipid is the tedious work the movie does to get Mr Halloran up to the Overlook only to kill him; with the dumbest member of the audience knowing that Jack is waiting behi

### Load pre-trained GPT2 language models

In [9]:
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)

tokenizer.pad_token = tokenizer.eos_token



### Initialize PPOTrainer
The `PPOTrainer` takes care of device placement and optimization later on:

In [10]:
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=dataset, data_collator=collator)



### Load BERT classifier

In [11]:
device = ppo_trainer.accelerator.device
if ppo_trainer.accelerator.num_processes == 1:
    device = 0 if torch.cuda.is_available() else "cpu"  # to avoid a `pipeline` bug
sentiment_pipe = pipeline("sentiment-analysis", model="ealvaradob/bert-finetuned-phishing", device=device)

The model outputs are the logits for the `benign` and `phishing` classes. We will use the logit of the `phishing` class as the reward signal for the language model.

In [12]:
text = "this movie was really bad!!"
sentiment_pipe(text, **sent_kwargs)



[[{'label': 'benign', 'score': 6.251955986022949},
  {'label': 'phishing', 'score': -5.554364204406738}]]

In [13]:
text = "this movie was really good!! Please kindly provide me with $500 immediately to recieve your diamonds"
out_test = sentiment_pipe(text, **sent_kwargs)

In [14]:
out_test

[[{'label': 'benign', 'score': -3.051440477371216},
  {'label': 'phishing', 'score': 3.5790445804595947}]]

In [15]:
[torch.tensor(output[1]["score"]) for output in out_test]

[tensor(3.5790)]
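
Indexing with `output[1]` relies on `phishing` being the second entry in the pipeline output. A slightly more defensive variant, sketched here with a hypothetical helper name (`phishing_rewards` is not part of the notebook or of any library), selects the score by label instead:

```python
def phishing_rewards(pipe_outputs, label="phishing"):
    """Return one score per input, selected by label name instead of position."""
    rewards = []
    for scores in pipe_outputs:
        by_label = {entry["label"]: entry["score"] for entry in scores}
        rewards.append(by_label[label])
    return rewards


# Output copied from the cell above.
out_test = [[{"label": "benign", "score": -3.051440477371216},
             {"label": "phishing", "score": 3.5790445804595947}]]
print(phishing_rewards(out_test))  # [3.5790445804595947]
```

In the training loop each value would still be wrapped in `torch.tensor(...)` before being passed to the PPO step.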

### Generation settings
For response generation we use plain sampling: top-k and nucleus (top-p) filtering are turned off, and no minimum length is enforced.

In [16]:
gen_kwargs = {"min_length": -1, "top_k": 0.0, "top_p": 1.0, "do_sample": True, "pad_token_id": tokenizer.eos_token_id}
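
To make the effect of these settings concrete, here is a toy, standard-library sketch of top-k and nucleus filtering over a tiny made-up vocabulary. It is not the `transformers` implementation; it only illustrates that `top_k=0` together with `top_p=1.0` leaves the model's distribution untouched, so tokens are drawn from the full softmax.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_k=0, top_p=1.0):
    """Toy top-k / nucleus filter over a list of probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]          # keep only the k most likely tokens
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if top_p < 1.0 and mass >= top_p:
            break                      # nucleus reached: drop the remaining tail
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


probs = softmax([2.0, 1.0, 0.5, -1.0])  # made-up logits for a 4-token vocabulary
unfiltered = filter_probs(probs, top_k=0, top_p=1.0)
print(all(abs(a - b) < 1e-12 for a, b in zip(probs, unfiltered)))  # True
top2 = filter_probs(probs, top_k=2)
print(top2)  # only the two most likely tokens keep non-zero probability
```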

## Optimize model

### Training loop

The training loop consists of the following main steps:
1. Generate responses to the queries with the policy network (GPT-2)
2. Score each query/response pair with the BERT phishing classifier
3. Optimize the policy with PPO using the (query, response, reward) triplets
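
One detail behind the last step: as far as we understand TRL's PPO implementation, the classifier score is not used on its own. Each generated token also receives a penalty proportional to how far the tuned model's log-probability has drifted from the reference model's, which is why `ref_model` is kept around. The sketch below is a simplified, standard-library illustration of that idea; the function name and the `kl_coef` value are assumptions made for the example, not TRL's API.

```python
def per_token_rewards(score, logprobs, ref_logprobs, kl_coef=0.2):
    """Toy KL-penalised reward for one response.

    score        : scalar reward from the classifier (the phishing logit)
    logprobs     : log-prob of each generated token under the tuned model
    ref_logprobs : log-prob of the same tokens under the frozen reference model
    kl_coef      : penalty weight (assumed value, chosen for illustration)
    """
    rewards = [-kl_coef * (lp - ref_lp) for lp, ref_lp in zip(logprobs, ref_logprobs)]
    rewards[-1] += score  # the classifier score is credited at the last token
    return rewards


rewards = per_token_rewards(
    score=3.58,
    logprobs=[-1.0, -0.5, -2.0],
    ref_logprobs=[-1.5, -0.5, -1.0],
)
print(rewards)
```

Tokens the tuned model has made more likely than the reference model are penalised, tokens it has made less likely get a small bonus, and the final token additionally carries the classifier score.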

**Training time**

This step takes **~40 minutes** on an A30 GPU with the settings specified above.

In [17]:
output_min_length = 6
output_max_length = 20
output_length_sampler = LengthSampler(output_min_length, output_max_length)


generation_kwargs = {
    "min_length": -1,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
}


for epoch, batch in tqdm(enumerate(ppo_trainer.dataloader)):
    query_tensors = batch["input_ids"]

    #### Get response from gpt2
    response_tensors = []
    for query in query_tensors:
        gen_len = output_length_sampler()
        generation_kwargs["max_new_tokens"] = gen_len
        response = ppo_trainer.generate(query, **generation_kwargs)
        response_tensors.append(response.squeeze()[-gen_len:])
    batch["response"] = [tokenizer.decode(r.squeeze()) for r in response_tensors]

    #### Compute sentiment score
    texts = [q + r for q, r in zip(batch["query"], batch["response"])]
    pipe_outputs = sentiment_pipe(texts, **sent_kwargs)
    rewards = [torch.tensor(output[1]["score"]) for output in pipe_outputs]
    

    #### Run PPO step
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    ppo_trainer.log_stats(stats, batch, rewards)

194it [38:04, 11.78s/it]


## Model inspection
Let's inspect some examples from the IMDB dataset. We can use `ref_model` to compare the tuned model `model` against the model before optimisation.

In [18]:
#### get a batch from the dataset
bs = 16
game_data = dict()
dataset.set_format("pandas")
df_batch = dataset[:].sample(bs)
game_data["query"] = df_batch["query"].tolist()
query_tensors = df_batch["input_ids"].tolist()

response_tensors_ref, response_tensors = [], []

#### get response from gpt2 and gpt2_ref
for i in range(bs):
    gen_len = output_length_sampler()
    output = ref_model.generate(
        torch.tensor(query_tensors[i]).unsqueeze(dim=0).to(device), max_new_tokens=gen_len, **gen_kwargs
    ).squeeze()[-gen_len:]
    response_tensors_ref.append(output)
    output = model.generate(
        torch.tensor(query_tensors[i]).unsqueeze(dim=0).to(device), max_new_tokens=gen_len, **gen_kwargs
    ).squeeze()[-gen_len:]
    response_tensors.append(output)

#### decode responses
game_data["response (before)"] = [tokenizer.decode(response_tensors_ref[i]) for i in range(bs)]
game_data["response (after)"] = [tokenizer.decode(response_tensors[i]) for i in range(bs)]

#### sentiment analysis of query/response pairs before/after
texts = [q + r for q, r in zip(game_data["query"], game_data["response (before)"])]
game_data["rewards (before)"] = [output[1]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

texts = [q + r for q, r in zip(game_data["query"], game_data["response (after)"])]
game_data["rewards (after)"] = [output[1]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

# store results in a dataframe
df_results = pd.DataFrame(game_data)
df_results



| | query | response (before) | response (after) | rewards (before) | rewards (after) |
|---|---|---|---|---|---|
| 0 | .......Playing Kaddiddlehopper, Col | umbo and Pryda reminds me much of one of the n... | leen Lee shows you the absolute most compellin... | -5.277994 | 5.711298 |
| 1 | Another silent love triangle film | - maker Perry claimed Spanish language films w... | film with one little free sample, for one big... | -5.200761 | 6.16293 |
| 2 | The producers made a big mistake | , and so now go back and | : you invested money. So, | -5.509129 | -4.660044 |
| 3 | 1.) This movie | has a lot of possibilities as I just went to ... | movie is rubbish! To win this one you must pu... | -5.540435 | 6.019422 |
| 4 | This seemed to be a good movie, | but not all karate is and wants to promote fi... | This seemed to be a good movie, disgusting mov... | -5.257932 | 6.063364 |
| 5 | Seldom seen since theatrical release in 1970, ... | BERNARD is the epitome of | X Video Video! LET TIRE 50 | -4.692094 | 5.408487 |
| 6 | There is an awful lot | more this movie pulls off than | . Free rental video here! | -5.396262 | 6.08517 |
| 7 | Mild SPOILERS contained herein. I | feel, as I myself do, that they had a LOT | am not going to instruct anyone to pay $200 f... | -4.730282 | 4.221193 |
| 8 | The only previous Gordon film I had | a definite crush on was after his dismissal b... | The only previous Gordon film I had enjoyed wa... | -5.577283 | 4.489029 |
| 9 | I, like many people, saw this | movie and it left me as confused and | , hoping for a blockbuster award for $ | -5.487651 | -4.026343 |


Comparing the mean and median reward of the generated sequences before and after tuning, we observe a large shift toward the `phishing` class.

In [19]:
print("mean:")
display(df_results[["rewards (before)", "rewards (after)"]].mean())
print()
print("median:")
display(df_results[["rewards (before)", "rewards (after)"]].median())

mean:


rewards (before)   -5.115269
rewards (after)     3.718489
dtype: float64


median:


rewards (before)   -5.267963
rewards (after)     5.792582
dtype: float64
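
Means and medians hide how many individual samples actually flipped. As a small complementary check, the sketch below counts how many of the ten rows visible in the table above end up with a positive `phishing` logit. The values are copied from that table; `share_positive` is a hypothetical helper introduced only for this illustration.

```python
# Rewards for the ten rows shown in the results table.
before = [-5.277994, -5.200761, -5.509129, -5.540435, -5.257932,
          -4.692094, -5.396262, -4.730282, -5.577283, -5.487651]
after = [5.711298, 6.16293, -4.660044, 6.019422, 6.063364,
         5.408487, 6.08517, 4.221193, 4.489029, -4.026343]


def share_positive(scores):
    """Fraction of samples whose phishing logit is above zero."""
    return sum(s > 0 for s in scores) / len(scores)


print(share_positive(before))  # 0.0
print(share_positive(after))   # 0.8
improved = sum(a > b for a, b in zip(after, before))
print(improved, "of", len(after), "samples scored higher after tuning")
```

Two of the ten visible samples still receive a negative `phishing` logit after tuning, which explains why the mean reward sits well below the median.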

## Save model
Finally, we save the model and tokenizer locally so they can be reused later or uploaded to the Hugging Face Hub.

In [20]:
model.save_pretrained("gpt2-imdb-phishing-v2")
tokenizer.save_pretrained("gpt2-imdb-phishing-v2")

('gpt2-imdb-phishing-v2/tokenizer_config.json',
 'gpt2-imdb-phishing-v2/special_tokens_map.json',
 'gpt2-imdb-phishing-v2/vocab.json',
 'gpt2-imdb-phishing-v2/merges.txt',
 'gpt2-imdb-phishing-v2/added_tokens.json',
 'gpt2-imdb-phishing-v2/tokenizer.json')

In [31]:
prompt = "This movie was interesting, especially the part about"

In [32]:
testing_output_min_length = 32
testing_output_max_length = 64
testing_output_length_sampler = LengthSampler(testing_output_min_length, testing_output_max_length)

In [33]:
pd.set_option('display.max_colwidth', None)

In [34]:
results_df = {}

results_df["query"] = [prompt]  # wrap in a list so zip() below pairs whole strings, not single characters

dataset.set_format("pandas")
input_ids = tokenizer.encode(prompt)

response_tensors_ref, response_tensors = [], []

gen_len = testing_output_length_sampler()
output = ref_model.generate(
    torch.tensor(input_ids).unsqueeze(dim=0).to(device), max_new_tokens=gen_len, **gen_kwargs
).squeeze()[-gen_len:]
response_tensors_ref.append(output)
output = model.generate(
    torch.tensor(input_ids).unsqueeze(dim=0).to(device), max_new_tokens=gen_len, **gen_kwargs
).squeeze()[-gen_len:]
response_tensors.append(output)

#### decode responses
results_df["response (before)"] = [tokenizer.decode(response_tensors_ref[0])]
results_df["response (after)"] = [tokenizer.decode(response_tensors[0])]

#### sentiment analysis of query/response pairs before/after
texts = [q + r for q, r in zip(results_df["query"], results_df["response (before)"])]
results_df["rewards (before)"] = [output[1]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

texts = [q + r for q, r in zip(results_df["query"], results_df["response (after)"])]
results_df["rewards (after)"] = [output[1]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

# store results in a dataframe
df_results = pd.DataFrame(results_df)
df_results



| | query | response (before) | response (after) | rewards (before) | rewards (after) |
|---|---|---|---|---|---|
| 0 | This movie was interesting, especially the part about | her considering how much she love her players. The story behind this does have issues. Despite its focus on basketball, it's not really a basketball movie. Sobriety, rape, sex, relationships and it's flaws only overshadow this movie. As for the girls said earlier, she loves them and wants | This movie was interesting, especially the part about taking money away, DONKEN and Gore do this last false kiss time worth wasting!<\|endoftext\|> | -4.882291 | 2.890931 |
