Add margin to RM training #719
Conversation
@younesbelkada could you give me a quick hint whether the interface is OK that way? In particular, I only use the margin if it is provided, and implicitly assume it is zero otherwise. Is that OK? Also, the tests for
younesbelkada
left a comment
@jvhoffbauer, the interface looks great to me, as it seems to preserve the previous behaviour.
Can you give an example of how to use this interface to enable the margin? It would also be nice to add a few lines about it in the documentation! What do you think?
Regarding the failing tests, don't worry: the timeout issues happen sometimes and are not related to your PR.
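For context, the margin-based objective under discussion can be sketched roughly as follows. This is a minimal sketch in PyTorch; the function name and the optional `margin` argument are illustrative, not the exact TRL implementation:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(rewards_chosen, rewards_rejected, margin=None):
    # Bradley-Terry style pairwise loss: push the chosen reward above
    # the rejected one. If a per-sample margin is given, the chosen
    # reward must beat the rejected one by at least that margin;
    # with no margin, this reduces to the previous behaviour.
    diff = rewards_chosen - rewards_rejected
    if margin is not None:
        diff = diff - margin
    return -F.logsigmoid(diff).mean()

chosen = torch.tensor([1.5, 0.2])
rejected = torch.tensor([0.5, -0.1])
loss_plain = pairwise_reward_loss(chosen, rejected)
loss_margin = pairwise_reward_loss(chosen, rejected, margin=torch.tensor([1.0, 1.0]))
# A positive margin makes the objective stricter, so the loss grows.
```

Leaving `margin=None` as the default is what preserves backward compatibility for datasets without a margin column.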
Awesome! I will look into this tomorrow/EOW. Should I also add a test?
Great! If possible, yes. A simple test that checks that computing the loss with margin works as expected would be really great! 🙏
Done. Let me know what you think! One thing: I noticed that the rewards are actually tensors with shape [1, 2] even if I process just one sample. Is that correct?
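Regarding the `[1, 2]` shape question: a plausible explanation (an assumption, not confirmed in this thread) is that it comes from the classification head's `num_labels`. A head with one label yields `[batch_size, 1]` logits, while a two-label head yields `[batch_size, 2]`. The layer names below are purely illustrative:

```python
import torch

batch_size, hidden_size = 1, 8

# Stand-ins for a sequence-classification head configured with
# num_labels=1 vs. num_labels=2.
one_label_head = torch.nn.Linear(hidden_size, 1)
two_label_head = torch.nn.Linear(hidden_size, 2)

features = torch.randn(batch_size, hidden_size)
print(one_label_head(features).shape)  # torch.Size([1, 1])
print(two_label_head(features).shape)  # torch.Size([1, 2])
```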
younesbelkada
left a comment
Thanks a lot, this looks great! I left a single comment about a merge conflict you might have forgotten to fix; apart from that, LGTM!
```
<<<<<<< HEAD
After standardizing your dataset, you can use the `RewardTrainer` as a classic Hugging Face Trainer.
You should pass an `AutoModelForSequenceClassification` model to the `RewardTrainer`.
=======
After preparing your dataset, you can use the [`RewardTrainer`] in the same way as the `Trainer` class from 🤗 Transformers.
You should pass an `AutoModelForSequenceClassification` model to the [`RewardTrainer`], along with a [`RewardConfig`] which configures the hyperparameters of the training.
>>>>>>> origin/main
```
Hmm, there is a merge conflict here that has not been properly resolved. Can you please have a look 🙏
Thanks! I will fix this tomorrow.
Should be ready
lewtun
left a comment
Thanks a lot for adding this @jvhoffbauer - it's going to be really interesting to see if we can use this on datasets like SHP :)
younesbelkada
left a comment
Looking great, thank you very much for your great contribution!
lewtun
left a comment
Thanks a lot for iterating @jvhoffbauer - this LGTM 🔥 !
* Start adding margin to RM training
* Fix typo and cleanup
* Fix incompatibilities when not using margin
* Format using 'make precommit'
* Add documentation and test for reward trainer
* Run 'make precommit'
* Update docs/source/reward_trainer.mdx
* Fix missed merge conflict in reward trainer docs

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Fix #718