Conversation
The documentation is not available anymore as the PR was closed or merged. |
Thank you @kashif. |
Agreed @Forbu, and I think we will refactor the data collator so that we only have a mask on the positive and negative parts of the sequence...
Hello @kashif, thanks for the DPO integration. I refactored the data collator to compute the mean log-probs only on the positive and negative parts of the sequence. Can you share the branch with me, please?
@gaetanlop I have added you to my fork.
Thanks @kashif, I've just pushed the changes needed so that the trainer does not compute the mean log-probs on masked input_ids. I used an approach similar to HF's DataCollatorForTokenClassification (https://github.com/huggingface/transformers/blob/main/src/transformers/data/data_collator.py).
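The masking approach borrowed from DataCollatorForTokenClassification can be sketched as follows. The helper names here are hypothetical (not the actual TRL code); the sketch assumes labels are padded with -100 so that only the chosen/rejected completion tokens contribute to the mean log-prob:

```python
import torch

IGNORE_INDEX = -100  # same sentinel DataCollatorForTokenClassification uses

def pad_labels(label_seqs, pad_to, ignore_index=IGNORE_INDEX):
    """Right-pad label sequences with the ignore index so padded (and prompt)
    positions are excluded from the loss."""
    return torch.tensor(
        [seq + [ignore_index] * (pad_to - len(seq)) for seq in label_seqs]
    )

def masked_mean_logps(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean log-probability of the label tokens, skipping ignored positions,
    so only the positive/negative part of the sequence is scored."""
    mask = labels != ignore_index
    safe_labels = labels.masked_fill(~mask, 0)  # any valid index; masked below
    token_logps = torch.log_softmax(logits, dim=-1).gather(
        2, safe_labels.unsqueeze(-1)
    ).squeeze(-1)
    return (token_logps * mask).sum(-1) / mask.sum(-1)
```

Padding the labels (rather than truncating the logits) keeps the collator shape-compatible with whatever the tokenizer produces for input_ids.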
Flagging that at some point we'll want the ScriptArguments to be consistent with the TrainingArguments (log_with -> report_to, batch_size -> per_device_train_batch_size, model_name -> model_name_or_path). The second and third renames are especially important, since the semantics of those arguments are actually different.
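The rename can be sketched as a simple mapping; the `to_training_args` helper below is illustrative only (not an actual TRL function), with the names taken from the comment above:

```python
# Hypothetical rename map from the legacy ScriptArguments names to the
# standard transformers.TrainingArguments names.
RENAMES = {
    "log_with": "report_to",
    # Semantics differ: batch_size was global, this one is per device.
    "batch_size": "per_device_train_batch_size",
    "model_name": "model_name_or_path",
}

def to_training_args(script_args: dict) -> dict:
    """Translate legacy argument names, passing unknown keys through unchanged."""
    return {RENAMES.get(k, k): v for k, v in script_args.items()}
```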
@kashif @gaetanlop a fix to enable distributed training required a change to |
@kashif just wanted to check in on this. Curious if you've had the chance to re-run the replication experiment :)
@eric-mitchell I did, but not with an SFT'ed Pythia... it also worked with PEFT... We had a validation loss around 0.6 or so, as per your wandb, so that was nice... if you can share your SFT'ed Pythia model with me, I can also run it now with that?
Here are the weights for our pre-trained Pythia. You can load with
It's nice that PEFT worked! What type of PEFT did you use?
@eric-mitchell thanks! Yes, we tried QLoRA and LoRA as well... let me confirm. Thanks for the weights!
lvwerra left a comment:
Two small nits, then we can merge :)
* initial DPO Trainer
* typo
* initial dpo from reward trainer
* calc. log_probs from logits
* remove dpo config for now
* fix inits
* add intial DPODataCollatorWithPadding
* use the RewardDataCollatorWithPadding
* initial test
* means of loss
* add assert
* just call the train instead of step
* functional debug example before refactor
* check the params have changed
* initial DPODataCollatorWithPadding
* Data collator with masking
* going through trainer.accelerate to wrap ref_model
* style / imports
* style / imports
* `broadcast_buffers=False` fix to distributed training
* better fix for DDP issues
* arguments and style clean-up
* better doc, some light refactoring
* better imports
* initial dpo doc
* fix test
* fix formatting
* fix
* called models once
* fix tests
* add example
* fix doc string
* intitial example with anthropic hh dataset
* refactored dpo trainer
* revert
* return metrics
* fixed tests
* updated docs
* update test
* fixed typo
* note about the beta
* added dpo authors
* fix docstrings
* add prediction_step
* remove compute_metrics and log metrics manually
* fix typo
* add DPOTrainer doc
* add dpo to toc
* ValueError
* add to index and example
* fix docs
* fix assert

---------

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
Co-authored-by: Gaetan LOPEZ <gaetanloplat@gmail.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
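The core objective these commits implement ("calc. log_probs from logits", "means of loss", "note about the beta") is the DPO loss from Rafailov et al. Below is a minimal sketch of that loss, not the exact trainer code; the function name and the reward outputs are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO objective: -log sigmoid(beta * (policy log-ratio - reference
    log-ratio)), where beta controls deviation from the reference model."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    losses = -F.logsigmoid(beta * (pi_logratios - ref_logratios))
    # Implicit rewards, handy to log as training metrics.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return losses.mean(), chosen_rewards, rejected_rewards
```

At initialization, when the policy still equals the reference, the loss is log 2 ≈ 0.693 regardless of beta, which is roughly consistent with the ~0.6 validation loss discussed above.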

Initial DPOTrainer class for #405 by copying the ~~PPOTrainer~~ RewardTrainer and starting to implement changes in it.

Fixes #405