<a href="https://colab.research.google.com/github/shaozw/odsc-2023-llm-alignment/blob/main/ODSC_sawyer_2_train_reward_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Step 2: Training SAWYER's Reward Model using Human Preferences

In [None]:
!pip install datasets transformers[torch] wandb evaluate

In [None]:
import os
from random import sample
from tqdm import tqdm

import torch
# import evaluate
import numpy as np
import torch.nn as nn
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union
from datasets import load_dataset
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    PreTrainedTokenizerBase,
    Trainer,
    TrainingArguments,
    set_seed,
)
from transformers.utils import PaddingStrategy
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()


Using device: cuda



`comparison_data_v2.json` contains responses from several LLMs, including:

1. GPT-4
2. GPT-3.5
3. OPT-IML
4. DaVinci (InstructGPT)

Each response was scored for quality by GPT-4.

Each data element has the keys:

- user_input: str, the prompt used to query the LLMs.
- responses_and_scores: list[dict], one entry per response, each with:
    - response: the response from the LLM
    - source: the LLM that generated the response
    - score: the score given to the response (by GPT-4)
    
    
See more info [here](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/tree/main#how-good-is-the-data)
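
To make the schema concrete, here is a minimal sketch of a single record and how its fields are read. The prompt, responses, source names, and scores are invented placeholders, not rows from `comparison_data_v2.json`.

```python
# Hypothetical record following the schema described above (all values are made up).
record = {
    "user_input": "### Instruction:\nName a primary color.",
    "responses_and_scores": [
        {"response": "Red is a primary color.", "source": "model_a", "score": 9.0},
        {"response": "Red.", "source": "model_b", "score": 7.0},
        {"response": "Colors are nice.", "source": "model_c", "score": 3.0},
    ],
}

# Pick the highest-scored response for this prompt.
best = max(record["responses_and_scores"], key=lambda r: r["score"])
print(best["source"], best["score"])  # model_a 9.0
```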

In [None]:
import huggingface_hub, os
huggingface_hub.login(token=os.environ['HF_API_KEY'])

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
import pandas

comparison_data = pandas.read_json('https://raw.githubusercontent.com/sinanuozdemir/quick-start-guide-to-llms/main/data/comparison_data_v2.json')

In [None]:
comparison_data.iloc[0]

user_input              Below is an instruction that describes a task....
responses_and_scores    [{'response': '1.Eat a balanced diet and make ...
Name: 0, dtype: object

In [None]:
comparison_data.head()

Unnamed: 0,user_input,responses_and_scores
0,Below is an instruction that describes a task....,[{'response': '1.Eat a balanced diet and make ...
1,Below is an instruction that describes a task....,[{'response': 'The three primary colors are re...
2,Below is an instruction that describes a task....,[{'response': 'An atom is made up of a nucleus...
3,Below is an instruction that describes a task....,[{'response': 'There are a number of ways to r...
4,Below is an instruction that describes a task....,[{'response': 'I had to make a difficult decis...


In [None]:
def get_score_tuples(dictionary):
    responses = dictionary['responses_and_scores']
    tuples = []

    for i in range(len(responses)):
        for j in range(i + 1, len(responses)):
            response_i = responses[i]
            response_j = responses[j]
            score_i = response_i['score']
            score_j = response_j['score']

            if score_i > score_j:
                score_difference = score_i - score_j
                tuples.append(((response_i['response'], score_i), (response_j['response'], score_j), score_difference))

    return tuples
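
As a quick check of the pairing rule, the sketch below re-implements the same logic compactly and runs it on a toy record. The helper name `ordered_pairs` and the scores are made up for illustration.

```python
from itertools import combinations

# Hypothetical record: four responses with made-up scores.
record = {
    "responses_and_scores": [
        {"response": "A", "score": 9.0},
        {"response": "B", "score": 7.0},
        {"response": "C", "score": 7.0},
        {"response": "D", "score": 8.0},
    ]
}

def ordered_pairs(record):
    """Mirror get_score_tuples: keep (i, j) with i before j only if score_i > score_j."""
    responses = record["responses_and_scores"]
    return [
        (a["response"], b["response"], a["score"] - b["score"])
        for a, b in combinations(responses, 2)
        if a["score"] > b["score"]
    ]

pairs = ordered_pairs(record)
print(pairs)  # [('A', 'B', 2.0), ('A', 'C', 2.0), ('A', 'D', 1.0)]
```

Ties (B vs C) are skipped, and so are pairs where the later response scores higher (D over B and C), because the loop only tests `score_i > score_j`. If the responses are not already sorted by score, flipping those pairs instead of dropping them would yield more training examples.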


In [None]:
new_examples = []
for i, row in tqdm(comparison_data.iterrows(), total=comparison_data.shape[0]):
    for pair in get_score_tuples(row):
        new_examples.append({
            'instruction': row['user_input'].split('### Instruction:\n')[-1].replace('### Input:\n', ''),
            'text_j': pair[0][0],
            'text_k': pair[1][0],
            'score_diff': pair[2]
        })

100%|██████████| 52001/52001 [00:03<00:00, 15617.37it/s]


0

In [None]:
len(new_examples)

95147

In [None]:
sample(new_examples, 1)

[{'instruction': 'Write a letter to a customer to apologize for a mistake.',
  'text_j': 'Dear Customer, \n\nWe are truly sorry for the inconvenience you experienced due to our mistake. We recognize that mistakes like this can be very frustrating, and we apologize for this matter. We take your feedback seriously and will take steps to ensure that a situation similar to this does not arise again in the future. Once more, we apologize for any frustration and inconvenience.\n\nSincerely, \nCustomer Service Team',
  'text_k': 'I am sorry for the mistake.',
  'score_diff': 5.0}]
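
The `instruction` field is produced by stripping the prompt template from `user_input`. The sketch below applies the same string operations to an assumed Alpaca-style prompt; the exact template wording in the dataset may differ.

```python
# Assumed Alpaca-style prompt; the real template text may differ.
user_input = (
    "Below is an instruction that describes a task, paired with an input.\n\n"
    "### Instruction:\nTranslate the sentence to French.\n\n"
    "### Input:\nGood morning."
)

# Keep everything after the instruction header, then drop the input header.
instruction = user_input.split("### Instruction:\n")[-1].replace("### Input:\n", "")
print(repr(instruction))  # 'Translate the sentence to French.\n\nGood morning.'
```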

In [None]:
from datasets import Dataset

pairs_dataset = Dataset.from_list(new_examples)
pairs_dataset = pairs_dataset.train_test_split(train_size=.8, seed=42)
pairs_dataset

DatasetDict({
    train: Dataset({
        features: ['instruction', 'text_j', 'text_k', 'score_diff'],
        num_rows: 76117
    })
    test: Dataset({
        features: ['instruction', 'text_j', 'text_k', 'score_diff'],
        num_rows: 19030
    })
})

In [None]:
pairs_dataset['test'][0]

{'instruction': 'How did the Battle of Gettysburg change the course of the American Civil War?',
 'text_j': 'The Battle of Gettysburg, fought from July 1 to July 3 1863, is considered one of the most important and decisive battles in the American Civil War as it marked a major turning point in the conflict. Before the battle, the Confederate army, commanded by General Robert E. Lee, had been enjoying a string of victories and launched an invasion of the Northern states, hoping that a major victory on Northern soil would demoralize the Union and force them to seek peace. However, the Union army, led by General George G. Meade, was able to successfully repel the Confederate attack in a bloody and costly battle, with an estimated 23,000 Union and 28,000 Confederate casualties.\n\nThe Union victory at Gettysburg, along with the capture of the Confederate stronghold of Vicksburg on July 4 1863, changed the momentum of the war in favor of the Union. The Confederate army was forced to retreat

In [None]:
# Using a cross-encoder to encode question and answer together to produce a score
#  This is an expected use-case for a cross-encoder

model_name = 'roberta-base'
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1,
)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
dolly = load_dataset('databricks/databricks-dolly-15k')


In [None]:
lens = []
for i in range(len(dolly['train'])):
    lens.append(len(tokenizer.encode(dolly['train'][i]['instruction'] +'\n\n'+dolly['train'][i]['context'])))

Token indices sequence length is longer than the specified maximum sequence length for this model (1071 > 512). Running this sequence through the model will result in indexing errors


In [None]:
import pandas as pd

pd.Series(lens).describe()


count    15011.000000
mean        96.733262
std        215.897138
min          4.000000
25%         13.000000
50%         19.000000
75%        112.000000
max       5582.000000
dtype: float64
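
The summary shows a long tail: the median prompt is 19 tokens, but the maximum is 5582, far above the 512-token limit reported in the tokenizer warning. A small sketch of how one might measure the share of over-length prompts follows; `toy_lens` is a made-up stand-in for `lens`.

```python
# Hypothetical token counts standing in for `lens`.
toy_lens = [4, 13, 19, 112, 600, 1071]
max_len = 512  # model limit taken from the tokenizer warning above

too_long = sum(n > max_len for n in toy_lens)
share = too_long / len(toy_lens)
print(f"{too_long} of {len(toy_lens)} prompts exceed {max_len} tokens ({share:.0%})")
```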

In [None]:
# Turn the dataset into pairs of input + output, where text_j is the preferred question + answer and text_k is the other.
# Then tokenize the dataset.
def preprocess_function(example):
    new_examples = {
        "input_ids_j": [],
        "attention_mask_j": [],
        "input_ids_k": [],
        "attention_mask_k": [],
        "score_diff": []
    }

    new_examples['score_diff'].append(example['score_diff'])
    question = example["instruction"]
    tokenized_j = tokenizer(question, example['text_j'], truncation=True)
    tokenized_k = tokenizer(question, example['text_k'], truncation=True)

    new_examples["input_ids_j"].append(tokenized_j["input_ids"])
    new_examples["attention_mask_j"].append(tokenized_j["attention_mask"])
    new_examples["input_ids_k"].append(tokenized_k["input_ids"])
    new_examples["attention_mask_k"].append(tokenized_k["attention_mask"])

    return new_examples

# Preprocess the dataset. Note: truncation=True truncates over-length QA pairs to the model's max length; nothing is filtered out.
pairs_dataset = pairs_dataset.map(preprocess_function, batched=False)


In [None]:
pairs_dataset.set_format('pt')

In [None]:
pairs_dataset

DatasetDict({
    train: Dataset({
        features: ['instruction', 'text_j', 'text_k', 'score_diff', 'input_ids_j', 'attention_mask_j', 'input_ids_k', 'attention_mask_k'],
        num_rows: 76117
    })
    test: Dataset({
        features: ['instruction', 'text_j', 'text_k', 'score_diff', 'input_ids_j', 'attention_mask_j', 'input_ids_k', 'attention_mask_k'],
        num_rows: 19030
    })
})

In [None]:
pairs_dataset['train'][5]

{'instruction': 'Write an article about climate change.',
 'text_j': "Climate change is one of the most pressing and urgent issues of the modern world. It is an ever-evolving environmental threat that threatens the planet's future, especially with regards to the natural environment that we as humans depend upon. Over the decades, the scientific community has recorded a consistent and steady rise in global temperatures that can only be attributed to the increasing concentrations of Greenhouse gases in the atmosphere due to our continued burning of fossil fuels. This has sparked a chain reaction of catastrophic events that is wreaking havoc on the planet, such as extreme weather events, melting of polar ice caps and the ocean rising to unprecedented levels, threatening the future of entire cities and coastal areas.\n\nThe urgency of this global crisis is only compounded by the fact that climate change is happening much faster than predicted. This means that in order to maintain a livable

In [None]:
# We need to define a special data collator that batches the data in our j vs k format.
import evaluate

@dataclass
class RewardDataCollatorWithPadding:
    tokenizer: PreTrainedTokenizerBase
    padding: Union[bool, str, PaddingStrategy] = True
    max_length: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None
    return_tensors: str = "pt"

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
        features_j = []
        features_k = []
        for feature in features:
            features_j.append(
                {
                    "input_ids": feature["input_ids_j"].squeeze(),
                    "attention_mask": feature["attention_mask_j"].squeeze(),
                }
            )
            features_k.append(
                {
                    "input_ids": feature["input_ids_k"].squeeze(),
                    "attention_mask": feature["attention_mask_k"].squeeze(),
                }
            )
        batch_j = self.tokenizer.pad(
            features_j,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors=self.return_tensors,
        )
        batch_k = self.tokenizer.pad(
            features_k,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors=self.return_tensors,
        )
        batch = {
            "input_ids_j": batch_j["input_ids"],
            "attention_mask_j": batch_j["attention_mask"],
            "input_ids_k": batch_k["input_ids"],
            "attention_mask_k": batch_k["attention_mask"],
            "score_diff": [feature['score_diff'] for feature in features],
            "return_loss": True,
        }
        return batch

# Define the metric that we'll use for validation.
accuracy = evaluate.load("accuracy")
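
The collator pads the `j` and `k` sides separately, so the two halves of a batch can end up with different widths. The sketch below shows the idea in plain Python; `pad_batch` is a hypothetical helper, the token ids are made up, and `pad_id=1` is an assumption (check `tokenizer.pad_token_id`).

```python
def pad_batch(sequences, pad_id=1):
    """Right-pad token-id lists to the longest one and build attention masks.

    pad_id=1 is an assumption for illustration; use tokenizer.pad_token_id in practice.
    """
    width = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in sequences]
    return input_ids, attention_mask

# Made-up token ids for two "preferred" sequences of different lengths.
ids_j, mask_j = pad_batch([[0, 713, 2], [0, 713, 16, 10, 2]])
print(ids_j)   # [[0, 713, 2, 1, 1], [0, 713, 16, 10, 2]]
print(mask_j)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```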


In [None]:
def compute_metrics(eval_pred):
    predictions, _ = eval_pred
    # Here, predictions is rewards_j and rewards_k.
    # We want to see how much of the time rewards_j > rewards_k.
    predictions = np.argmax(predictions, axis=0)
    labels = np.zeros(predictions.shape)
    return accuracy.compute(predictions=predictions, references=labels)

# We are subclassing the Hugging Face Trainer class to customize the loss computation
class RewardTrainer(Trainer):
    # Overriding the compute_loss function to define how to compute the loss for our specific task
    def compute_loss(self, model, inputs, return_outputs=False):
        # Calculate the reward for a preferred response y_j using the model. The input IDs and attention masks for y_j are provided in inputs.
        rewards_j = model(input_ids=inputs["input_ids_j"], attention_mask=inputs["attention_mask_j"])[0]

        # Similarly, calculate the reward for a lesser preferred response y_k.
        rewards_k = model(input_ids=inputs["input_ids_k"], attention_mask=inputs["attention_mask_k"])[0]

        # Calculate the loss using the negative log-likelihood function.
        # We take the difference of rewards (rewards_j - rewards_k) and multiply it by the score difference provided in the inputs.
        # Then, we apply the log-sigmoid function (via torch.nn.functional.logsigmoid) and negate the result.
        # The mean loss is calculated across all examples in the batch.
        # Alternative weighting (not used): the squared score difference.
        # loss = -nn.functional.logsigmoid((rewards_j - rewards_k) * torch.pow(torch.tensor(inputs['score_diff'], device=rewards_j.device), 2)).mean()
        loss = -nn.functional.logsigmoid((rewards_j - rewards_k) * torch.tensor(inputs['score_diff'], device=rewards_j.device)).mean()

        # If we also want to return the outputs (rewards for y_j and y_k) along with the loss, we do so.
        if return_outputs:
            return loss, {"rewards_j": rewards_j, "rewards_k": rewards_k}

        # Otherwise, we simply return the computed loss.
        return loss
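
To see how the score-difference weight shapes the loss, here is a scalar sketch of the same formula, `-log(sigmoid((r_j - r_k) * score_diff))`, in plain Python. The function names and the reward values are made up for illustration.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

def weighted_pair_loss(reward_j, reward_k, score_diff):
    """-log sigmoid((r_j - r_k) * score_diff) for a single preference pair."""
    return -log_sigmoid((reward_j - reward_k) * score_diff)

# Equal rewards give log(2), about 0.693, whatever the weight.
print(weighted_pair_loss(0.0, 0.0, 5.0))
# A correct ranking costs less when the score gap is larger ...
print(weighted_pair_loss(1.0, 0.0, 1.0), weighted_pair_loss(1.0, 0.0, 5.0))
# ... and a wrong ranking costs more.
print(weighted_pair_loss(0.0, 1.0, 1.0), weighted_pair_loss(0.0, 1.0, 5.0))
```

The `log(2)` starting point is close to the `eval_loss` of about 0.68 that the notebook reports before training. When applying the formula to tensors, it is worth checking that the weight has the same shape as the reward difference, for example `(batch, 1)`; a `(batch,)` weight against a `(batch, 1)` difference would broadcast to `(batch, batch)`.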


In [None]:
import wandb
# Set up Weights and Biases integration
wandb.init(project="odsc-sawyer-reward")


wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [None]:
training_args = TrainingArguments(
    output_dir='sawyer_rm',
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=32,
    num_train_epochs=3,

    evaluation_strategy="epoch",
    save_strategy="epoch",

    remove_unused_columns=False,
    label_names=[],
    fp16=device.type == 'cuda',
    load_best_model_at_end=True,
    logging_strategy="steps",
    logging_steps=10,
    learning_rate=1e-6,
    warmup_ratio=0.1,
    push_to_hub=True,
    hub_model_id="profoz/odsc-sawyer-reward",
    hub_strategy="every_save",
)

# Train the model, woohoo.
trainer = RewardTrainer(
    model=model,
    args=training_args,
    train_dataset=pairs_dataset['train'],
    eval_dataset=pairs_dataset['test'],
    compute_metrics=compute_metrics,
    data_collator=RewardDataCollatorWithPadding(
        tokenizer=tokenizer),
)

# Baseline: evaluate before any fine-tuning to see where the untrained reward head starts.
trainer.evaluate()


You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'eval_loss': 0.6809301972389221,
 'eval_accuracy': 0.7332107199159222,
 'eval_runtime': 288.3886,
 'eval_samples_per_second': 65.987,
 'eval_steps_per_second': 2.063}
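
The `eval_accuracy` here is a pairwise ranking accuracy: the share of test pairs where the preferred response gets at least as high a reward as the other one. The sketch below replays the `argmax` logic from `compute_metrics` on made-up rewards.

```python
import numpy as np

# Made-up rewards for four pairs; each array has shape (N, 1) like the model logits.
rewards_j = np.array([[2.1], [0.3], [-0.5], [1.0]])
rewards_k = np.array([[0.4], [0.9], [-1.2], [1.0]])

# Stack to shape (2, N, 1); argmax over axis 0 is 0 when rewards_j >= rewards_k.
winner = np.argmax(np.stack([rewards_j, rewards_k]), axis=0)
accuracy = float((winner == 0).mean())
print(winner.ravel(), accuracy)  # [0 1 0 0] 0.75
```

Note that `argmax` returns the first index on ties, so a pair with identical rewards (the last one here) counts as correct.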

In [None]:
trainer.train()


Could not estimate the number of tokens of the input, floating-point operations will not be computed


Epoch,Training Loss,Validation Loss,Accuracy
0,0.0936,0.104716,0.968418
1,0.097,0.092682,0.9732
2,0.1129,0.095898,0.973831


TrainOutput(global_step=7134, training_loss=0.15820913455851912, metrics={'train_runtime': 8784.5111, 'train_samples_per_second': 25.995, 'train_steps_per_second': 0.812, 'total_flos': 0.0, 'train_loss': 0.15820913455851912, 'epoch': 3.0})
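
The `global_step=7134` figure can be reproduced from the training arguments. This is a back-of-the-envelope sketch that assumes a single GPU and reflects my understanding of how `Trainer` counts optimizer updates, not an authoritative description of its internals.

```python
import math

train_rows = 76117      # size of pairs_dataset['train']
per_device_batch = 4    # per_device_train_batch_size
grad_accum = 8          # gradient_accumulation_steps
epochs = 3              # num_train_epochs
num_devices = 1         # assumption: a single GPU

batches_per_epoch = math.ceil(train_rows / (per_device_batch * num_devices))
updates_per_epoch = batches_per_epoch // grad_accum
total_updates = updates_per_epoch * epochs

print(per_device_batch * grad_accum * num_devices)          # effective batch size: 32
print(batches_per_epoch, updates_per_epoch, total_updates)  # 19030 2378 7134
```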

In [None]:
# !huggingface-cli login
# !huggingface-cli repo create odsc-sawyer-reward


In [None]:
trainer.evaluate()

{'eval_loss': 0.0926821231842041,
 'eval_accuracy': 0.9732002101944298,
 'eval_runtime': 288.4531,
 'eval_samples_per_second': 65.973,
 'eval_steps_per_second': 2.063,
 'epoch': 3.0}

In [None]:
username, repo_name = 'profoz', 'odsc-sawyer-reward'

# Push model and tokenizer to Hugging Face Hub
trainer.model.push_to_hub(f"{username}/{repo_name}")
tokenizer.push_to_hub(f"{username}/{repo_name}")

CommitInfo(commit_url='https://huggingface.co/profoz/odsc-sawyer-reward/commit/9f7f2f80f01cf229575f8aa7980bcdaaa0117a09', commit_message='Upload tokenizer', commit_description='', oid='9f7f2f80f01cf229575f8aa7980bcdaaa0117a09', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
# Reload the trained reward model and tokenizer from the Hub to sanity-check its rewards
from transformers import AutoModelForSequenceClassification, AutoTokenizer

username, repo_name = 'profoz', 'odsc-sawyer-reward'

trained_model = AutoModelForSequenceClassification.from_pretrained(f"{username}/{repo_name}")
tokenizer = AutoTokenizer.from_pretrained(f"{username}/{repo_name}")

In [None]:
# an example where I'd expect positive, uh oh
outputs = trained_model(**tokenizer('how do I greet someone?', 'Tell them Hello!', return_tensors='pt')).logits
outputs

tensor([[-0.8382]], grad_fn=<AddmmBackward0>)

In [None]:
# Hmm, longer seems to be more rewarded?
outputs = trained_model(**tokenizer('how do I greet someone?', 'To greet someone, try telling them Hello!', return_tensors='pt')).logits
outputs

tensor([[-0.3828]], grad_fn=<AddmmBackward0>)

In [None]:
# A chattier model will get more rewards it seems, bias alert!
outputs = trained_model(**tokenizer('how do I greet someone?', 'To greet someone, try telling them Hello! If you want more information, '
                                    'here are three more ways to greet someone.', return_tensors='pt')).logits
outputs

tensor([[1.4566]], grad_fn=<AddmmBackward0>)

In [None]:
# the more I ramble the more I seem to get rewarded. Let's keep this in mind
outputs = trained_model(**tokenizer('how do I greet someone?', 'To greet someone, try telling them Hello! If you want more information, '
                                    'here are three more ways to greet someone:\n1. Ask how their day is\n2. Comment on the weather\n3. '
                                    'Tell them they look nice today', return_tensors='pt')).logits
outputs


tensor([[1.9698]], grad_fn=<AddmmBackward0>)
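
To put the length hunch on slightly firmer footing, the sketch below lines up the four greeting responses and their rewards, copied from the cells above, by word count. Four points are only anecdotal, so treat this as a prompt for a proper check on held-out data rather than evidence of bias.

```python
# Responses and rewards copied from the four cells above.
probes = [
    ("Tell them Hello!", -0.8382),
    ("To greet someone, try telling them Hello!", -0.3828),
    ("To greet someone, try telling them Hello! If you want more information, "
     "here are three more ways to greet someone.", 1.4566),
    ("To greet someone, try telling them Hello! If you want more information, "
     "here are three more ways to greet someone:\n1. Ask how their day is\n"
     "2. Comment on the weather\n3. Tell them they look nice today", 1.9698),
]

by_length = sorted(probes, key=lambda p: len(p[0].split()))
lengths = [len(text.split()) for text, _ in by_length]
rewards = [reward for _, reward in by_length]
for n_words, reward in zip(lengths, rewards):
    print(f"{n_words:>3} words -> reward {reward:+.4f}")

# True if every longer response received a strictly higher reward.
monotonic = all(a < b for a, b in zip(rewards, rewards[1:]))
print(monotonic)  # True
```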

In [None]:
# another example where I'd expect negative but close to 0 is.. acceptable
outputs = trained_model(**tokenizer('how do I greet someone?', 'Tell them to frick off!', return_tensors='pt')).logits
outputs

tensor([[-0.9031]], grad_fn=<AddmmBackward0>)

In [None]:
# another example where I'd expect negative for being irrelevant
outputs = trained_model(**tokenizer('Who throws the football the most often?', 'Tell them Hello!', return_tensors='pt')).logits
outputs

tensor([[-0.3358]], grad_fn=<AddmmBackward0>)

In [None]:
# another example where I'd expect negative because it's irrelevant. That's why I wanted those synthetic examples in there
outputs = trained_model(**tokenizer('Who throws the football the most often?', 'Football could refer to many things.', return_tensors='pt')).logits
outputs

tensor([[-0.7123]], grad_fn=<AddmmBackward0>)

In [None]:
outputs = trained_model(**tokenizer('What is an option in finance?', 'What even is a car?', return_tensors='pt')).logits
outputs

tensor([[-1.4286]], grad_fn=<AddmmBackward0>)

In [None]:
p = pairs_dataset['test'][10]

print(p['instruction'])
print('J\n------')
print(p['text_j'])
print('K\n-------')
print(p['text_k'])
print(p['score_diff'])




Write a list of creative holiday gift ideas for someone who already has a lot of things.
J
------
1. Customized photo album or scrapbook: Fill it with personal memories and favorite moments from the past year.

2. Experience gift: Treat them to a special outing or adventure, such as tickets to a concert, hot air balloon ride, or a cooking class.

3. Personalized gift: Consider a monogrammed item such as a piece of jewelry, luggage tag, or mug.

4. Gourmet food or drink: Indulge their taste buds with a basket of fine cheeses, artisan chocolates, or a selection of craft beers.

5. Subscription service: Gift them a subscription to a monthly book, coffee, or beauty box.

6. Handmade item: Give them a one-of-a-kind item such as a hand-knitted scarf, homemade bath products, or a piece of original artwork.

7. Charitable donation: Make a donation in their name to a charity or cause that is close to their heart.

8. Relaxation gift: Help them unwind with a gift certificate for a massage, spa d

In [None]:
trained_model(**tokenizer(p['instruction'], p['text_j'], return_tensors='pt')).logits

tensor([[3.3342]], grad_fn=<AddmmBackward0>)

In [None]:
trained_model(**tokenizer(p['instruction'], p['text_k'], return_tensors='pt')).logits

tensor([[-0.8758]], grad_fn=<AddmmBackward0>)
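
One way to read the two logits together is through their gap. The sketch below computes the margin and a Bradley-Terry style preference probability from the values printed above. Because training weighted each pair by `score_diff`, this probability is only a rough, uncalibrated indication.

```python
import math

reward_j = 3.3342    # logit for the preferred response (text_j) above
reward_k = -0.8758   # logit for the other response (text_k) above

margin = reward_j - reward_k
# Bradley-Terry style reading: sigmoid of the reward gap.
p_j_preferred = 1.0 / (1.0 + math.exp(-margin))
print(f"margin={margin:.4f}, implied P(j preferred)={p_j_preferred:.3f}")
```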