[DPO] Refactor eval logging of dpo trainer #954
Conversation
Thanks for working on this! Just ping me whenever you would like to run the CI.
@lvwerra, I completed this part of the refactor, can you run the CI? @kashif as part of this, I removed logging all the
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@mnoukhov are you also logging sample outputs as well here?
ok makes sense!
so this PR LGTM! feel free to merge
@mnoukhov so folks really want the additional metrics to be logged during training, which isn't possible at the moment, so let's revert this for now and continue the refactoring to figure that out.
I don't think this change prevents additional metrics from being logged. I can create a follow-up PR that addresses those comments.
Ok, so let's try to add back the previous behavior if possible. I couldn't figure it out at first glance; the trainer's training loop doesn't seem to have a way to do it. Can you kindly confirm?
Basically what #1046 wants is to log metrics during training, which is not how ... AFAIK. The problem with trying to put ... If you want metrics for the training set, we can either add ... or keep this ...
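For context, the `store_metrics` pattern under discussion can be sketched roughly as follows. This is a hypothetical minimal version (class and method names are illustrative, not TRL's actual implementation): per-step metrics are accumulated during training and averaged out when the trainer logs, which is the behavior the refactor removes in favor of letting `Trainer` handle logging.

```python
from collections import defaultdict


class MetricStore:
    """Illustrative sketch of a store-then-flush metric accumulator:
    each training step stores a dict of scalars, and the trainer's
    log() call would flush the running averages and reset the store."""

    def __init__(self):
        self._stored = defaultdict(list)

    def store(self, metrics: dict) -> None:
        # Accumulate one step's worth of scalar metrics.
        for key, value in metrics.items():
            self._stored[key].append(value)

    def flush(self) -> dict:
        # Average everything accumulated since the last flush, then reset.
        logs = {key: sum(vals) / len(vals) for key, vals in self._stored.items()}
        self._stored.clear()
        return logs
```

The drawback raised in the thread is that this duplicates bookkeeping the `Trainer` already does for its own logs, which is why the PR moves evaluation metrics to the standard hook instead.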
* first attempts at refactor of dpo trainer
* removed extra stuff in prediction step
* import fixes
* label names
* all working

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Address part 1 of #952 by removing `store_metrics` and switching to the more standard `compute_metrics`, letting `Trainer` handle logging.
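The `compute_metrics` hook works against `Trainer`'s standard evaluation loop: it receives the collected predictions and labels and returns a dict of scalars, which the `Trainer` then logs itself. A minimal sketch of that contract follows; the `EvalPrediction` stand-in class and the metric names are illustrative (modeled on DPO-style reward margins), not the PR's exact code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EvalPrediction:
    """Stand-in for the object Trainer passes to compute_metrics:
    it carries the eval-loop predictions and the label ids."""
    predictions: Sequence[float]  # e.g. reward margins (chosen - rejected)
    label_ids: Sequence[int]


def compute_metrics(eval_pred: EvalPrediction) -> dict:
    # Return a dict of scalar metrics; the Trainer logs these itself,
    # so no custom store_metrics bookkeeping is needed for evaluation.
    margins = list(eval_pred.predictions)
    # Fraction of pairs where the chosen response out-scores the rejected one.
    accuracy = sum(m > 0 for m in margins) / len(margins)
    return {
        "rewards/accuracies": accuracy,
        "rewards/margins": sum(margins) / len(margins),
    }
```

As the thread notes, this hook only runs during evaluation; it does not by itself cover the per-step training metrics that #1046 asks for.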