Hi, quick question: DataCollatorForCompletionOnlyLM trains only on the responses by loss-masking the prompt tokens.
Does it work the same way with DPOTrainer (DPODataCollatorWithPadding)? Looking at the code, it does look like it trains on the prompts too. But maybe that doesn't matter with the DPO loss.
The reason I ask is that my dataset has long prompts but short responses, and the resulting model trained by DPO is barely different from the reference model (even with 90%+ reward accuracy). So loss-masking the prompts might help focus the learning, but that's just a guess.
Thanks!
I am having the same problem: DPO training does not improve on the baseline model.
Nevertheless, DPODataCollatorWithPadding does seem to assign the ignore index to the label positions corresponding to the prompt, so the cause is probably elsewhere.
Here is the relevant piece of code that does that:
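For anyone landing here later, the masking logic is roughly the following (a simplified, self-contained sketch, not the exact TRL source; the function name `mask_prompt_labels` and the variable names are mine, and the exact code differs across TRL versions). Positions set to the label pad token id (-100 by default) are ignored by the loss, so only the completion tokens contribute:

```python
# Sketch of DPOTrainer-style label construction: concatenate prompt and
# answer token ids, then overwrite the prompt positions in the labels with
# label_pad_token_id so the cross-entropy/log-prob computation skips them.
LABEL_PAD_TOKEN_ID = -100  # HF's default ignore index

def mask_prompt_labels(prompt_input_ids, answer_input_ids,
                       label_pad_token_id=LABEL_PAD_TOKEN_ID):
    """Return (input_ids, labels) where prompt positions are masked out."""
    input_ids = prompt_input_ids + answer_input_ids
    labels = input_ids[:]  # copy, then mask the prompt span
    labels[: len(prompt_input_ids)] = [label_pad_token_id] * len(prompt_input_ids)
    return input_ids, labels
```

With this, the per-sequence log-probabilities used in the DPO loss are summed only over the answer tokens, regardless of how long the prompt is.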
Gentlemen, If I understood your discussion correctly (and the code snippet mentioned), DPOTrainer does in fact mask out the prompt tokens by default? Were you able to figure out why your models did not improve?