Add margin to RM training #719
Conversation
@younesbelkada could you give me a quick hint whether the interface is OK that way? In particular, I only use the margin if it is provided, and implicitly assume it is zero otherwise. Is that OK? Also, the tests for
younesbelkada
left a comment
@jvhoffbauer, the interface looks great to me, as it seems to preserve the previous behaviour.
Can you give an example of how to use this interface to enable the margin? It would also be nice to add a few lines about it in the documentation! What do you think?
Regarding the failing tests, don't worry: the timeout issues happen sometimes and are not related to your PR.
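For context, the margin-based objective under discussion can be sketched roughly as follows. This is a minimal sketch in PyTorch; the function name and the optional `margin` argument are illustrative, not the exact TRL implementation:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(rewards_chosen, rewards_rejected, margin=None):
    # Bradley-Terry style pairwise loss: push the chosen reward above
    # the rejected one. If a per-sample margin is given, the chosen
    # reward must beat the rejected one by at least that margin;
    # with no margin, this reduces to the previous behaviour.
    diff = rewards_chosen - rewards_rejected
    if margin is not None:
        diff = diff - margin
    return -F.logsigmoid(diff).mean()

chosen = torch.tensor([1.5, 0.2])
rejected = torch.tensor([0.5, -0.1])
loss_plain = pairwise_reward_loss(chosen, rejected)
loss_margin = pairwise_reward_loss(chosen, rejected, margin=torch.tensor([1.0, 1.0]))
# A positive margin makes the objective stricter, so the loss grows.
```

Leaving `margin=None` as the default is what preserves backward compatibility for datasets without a margin column.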
Awesome! I will look into this tomorrow/EOW. Should I also add a test?
Great! If possible, yes. A simple test that checks that computing the loss with margin works as expected would be really great! 🙏
Done. Let me know what you think! One thing: I noticed that the rewards are actually tensors with shape [1, 2] even if I process just one sample. Is that correct?
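Regarding the `[1, 2]` shape question: a plausible explanation (an assumption, not confirmed in this thread) is that it comes from the classification head's `num_labels`. A head with one label yields `[batch_size, 1]` logits, while a two-label head yields `[batch_size, 2]`. The layer names below are purely illustrative:

```python
import torch

batch_size, hidden_size = 1, 8

# Stand-ins for a sequence-classification head configured with
# num_labels=1 vs. num_labels=2.
one_label_head = torch.nn.Linear(hidden_size, 1)
two_label_head = torch.nn.Linear(hidden_size, 2)

features = torch.randn(batch_size, hidden_size)
print(one_label_head(features).shape)  # torch.Size([1, 1])
print(two_label_head(features).shape)  # torch.Size([1, 2])
```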
younesbelkada
left a comment
Thanks a lot, this looks great! I left a single comment about a merge conflict you might have forgotten to fix; apart from that, LGTM!
```
<<<<<<< HEAD
After standardizing your dataset, you can use the `RewardTrainer` as a classic Hugging Face Trainer.
You should pass an `AutoModelForSequenceClassification` model to the `RewardTrainer`.
=======
After preparing your dataset, you can use the [`RewardTrainer`] in the same way as the `Trainer` class from 🤗 Transformers.
You should pass an `AutoModelForSequenceClassification` model to the [`RewardTrainer`], along with a [`RewardConfig`] which configures the hyperparameters of the training.
>>>>>>> origin/main
```
Hmm, there is a merge conflict here that has not been properly resolved. Can you please have a look 🙏
Thanks! I will fix this tomorrow.
Should be ready
lewtun
left a comment
Thanks a lot for adding this @jvhoffbauer - it's going to be really interesting to see if we can use this on datasets like SHP :)
younesbelkada
left a comment
Looking great, thank you very much for your great contribution!
lewtun
left a comment
Thanks a lot for iterating @jvhoffbauer - this LGTM 🔥 !
* Start adding margin to RM training
* Fix typo and cleanup
* Fix incompatibilities when not using margin
* Format using 'make precommit'
* Add documentation and test for reward trainer
* Run 'make precommit'
* Update docs/source/reward_trainer.mdx
* Fix missed merge conflict in reward trainer docs

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Fix #718