Skip to content

Latest commit

 

History

History
100 lines (70 loc) · 3.91 KB

bco_trainer.mdx

File metadata and controls

100 lines (70 loc) · 3.91 KB

BCO Trainer

TRL supports the Binary Classifier Optimization (BCO). The BCO authors train a binary classifier whose logit serves as a reward so that the classifier maps {prompt, chosen completion} pairs to 1 and {prompt, rejected completion} pairs to 0. For a full example have a look at [examples/scripts/bco.py].

Expected dataset type

The [BCOTrainer] requires an unpaired preference dataset. The [BCOTrainer] supports both conversational and standard dataset format. When provided with a conversational dataset, the trainer will automatically apply the chat template to the dataset.

Expected model format

The BCO trainer expects a model of AutoModelForCausalLM, compared to PPO that expects AutoModelForCausalLMWithValueHead for the value function.

Using the BCOTrainer

For a detailed example have a look at the examples/scripts/bco.py script. At a high level we need to initialize the BCOTrainer with a model we wish to train and a reference ref_model which we will use to calculate the implicit rewards of the preferred and rejected response.

The beta refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the model and ref_model need to have the same architecture (ie decoder only or encoder-decoder).

training_args = BCOConfig(
    beta=0.1,
)

bco_trainer = BCOTrainer(
    model,
    model_ref,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)

After this one can then call:

bco_trainer.train()

Underlying Distribution matching (UDM)

In practical scenarios, the thumbs-up and thumbs-down datasets are likely to have divergent underlying distributions of prompts. Consider an LLM deployed for user feedback: if the model excels in writing tasks but underperforms in coding, the thumbs-up dataset will be dominated by writing-related prompts, while the thumbs-down dataset will contain mostly coding-related prompts.
If the prompts in your desired and undesired datasets differ a lot, it is useful to enable UDM.

Choose an embedding model and tokenizer:

embedding_model = AutoModel.from_pretrained(your_model_id)
embedding_tokenizer = AutoTokenizer.from_pretrained(your_model_id)

# customize this function depending on your embedding model
def embed_prompt(input_ids, attention_mask, model):
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    return outputs.last_hidden_state.mean(dim=1)

embedding_model = Accelerator().prepare_model(self.embedding_model)
embedding_func = partial(embed_prompt, model=embedding_model)

Set prompt_sample_size to defined how many prompts are selected to train the UDM classifier and start the training with the provided embedding function:

training_args = BCOConfig(
    beta=0.1,
    prompt_sample_size=512,
)

bco_trainer = BCOTrainer(
    model,
    model_ref,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    embedding_func=embedding_func,
    embedding_tokenizer=self.embedding_tokenizer,
)

bco_trainer.train()

For Mixture of Experts Models: Enabling the auxiliary loss

MOEs are the most efficient if the load is about equally distributed between experts.
To ensure that we train MOEs similarly during preference-tuning, it is beneficial to add the auxiliary loss from the load balancer to the final loss.

This option is enabled by setting output_router_logits=True in the model config (e.g. MixtralConfig).
To scale how much the auxiliary loss contributes to the total loss, use the hyperparameter router_aux_loss_coef=... (default: 0.001).

BCOTrainer

[[autodoc]] BCOTrainer

BCOConfig

[[autodoc]] BCOConfig