Reward Modeling

TRL supports custom reward modeling, allowing anyone to train a reward model on their own dataset and model.

Check out a complete, flexible example at examples/scripts/reward_modeling.py.

Expected dataset format

The [RewardTrainer] expects a very specific format for the dataset, since the model is trained on pairs of examples to predict which of the two is preferred. The Anthropic/hh-rlhf dataset, which pairs a chosen response with a rejected one for each prompt, is a typical example.

Therefore, if you use the default [RewardDataCollatorWithPadding] data collator, the final dataset object should contain at least these four entries:

  • input_ids_chosen
  • attention_mask_chosen
  • input_ids_rejected
  • attention_mask_rejected
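
As an illustration, a minimal preprocessing sketch could look like the following. It assumes the raw dataset provides plain-text chosen and rejected columns, as Anthropic/hh-rlhf does; the tokenizer choice and truncation settings are placeholders, not part of the TRL API.

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

raw_dataset = load_dataset("Anthropic/hh-rlhf", split="train")

def preprocess(examples):
    # Tokenize the preferred ("chosen") and dispreferred ("rejected") texts separately
    chosen = tokenizer(examples["chosen"], truncation=True)
    rejected = tokenizer(examples["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

dataset = raw_dataset.map(preprocess, batched=True)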

Using the RewardTrainer

After preparing your dataset, you can use the [RewardTrainer] in the same way as the Trainer class from 🤗 Transformers. You should pass an AutoModelForSequenceClassification model to the [RewardTrainer], along with a [RewardConfig] that configures the training hyperparameters.
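
For example, a minimal setup might look like the sketch below. It assumes dataset has already been preprocessed into the four entries above; the model name, output directory, and batch size are placeholder values.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

# A reward model outputs a single scalar score per sequence
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

training_args = RewardConfig(output_dir="reward_model", per_device_train_batch_size=8)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=dataset,  # the preprocessed dataset from the previous section
)
trainer.train()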

Leveraging 🤗 PEFT to train a reward model

Just pass a peft_config in the keyword arguments of [RewardTrainer], and the trainer should automatically take care of converting the model into a PEFT model!

from peft import LoraConfig, TaskType
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardTrainer, RewardConfig

# A reward model should output a single scalar score, hence num_labels=1
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

# Load the tokenizer, prepare the dataset, and define training_args (a RewardConfig) here
...

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
)

trainer.train()

Adding a margin to the loss

As in the Llama 2 paper, you can add a margin to the loss by adding a margin column to the dataset. The reward collator will automatically pass it through and the loss will be computed accordingly.

def add_margin(row):
    # Assume the dataset has score_chosen and score_rejected columns that you want to use to compute the margin
    return {'margin': row['score_chosen'] - row['score_rejected']}

dataset = dataset.map(add_margin)
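
For reference, the pairwise loss with an optional margin follows the Llama 2 formulation; the sketch below is illustrative, and the function and variable names are not TRL internals.

import torch.nn.functional as F

def pairwise_loss(rewards_chosen, rewards_rejected, margin=None):
    # Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)),
    # optionally shifted by a per-example margin as in the Llama 2 paper
    diff = rewards_chosen - rewards_rejected
    if margin is not None:
        diff = diff - margin
    return -F.logsigmoid(diff).mean()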

RewardConfig

[[autodoc]] RewardConfig

RewardTrainer

[[autodoc]] RewardTrainer