[t5] Fix negative kl issue #262
Conversation
```python
if not self.is_encoder_decoder:
    output = generation[(1 - mask).sum() :]  # remove padding
else:
    output = generation
```
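To make the padding-removal logic above concrete, here is a minimal sketch using plain Python lists in place of tensors (the function name is mine, not trl's). `mask` is an attention mask over a left-padded prompt, where 1 marks a real token and 0 marks padding, so `(1 - mask).sum()` counts the leading pad tokens to drop.

```python
def strip_left_padding(generation, mask):
    """Drop the leading pad tokens from a left-padded sequence.

    Mirrors `generation[(1 - mask).sum():]` from the diff above,
    but with plain lists instead of tensors.
    """
    num_pad = sum(1 - m for m in mask)  # number of leading padding positions
    return generation[num_pad:]

mask = [0, 0, 1, 1, 1]        # two pad positions on the left
generation = [0, 0, 7, 8, 9]  # 0 is the pad token id in this toy example
print(strip_left_padding(generation, mask))  # [7, 8, 9]
```

Note this only strips *left* padding; it is a no-op when the mask has no leading zeros, which is why the encoder-decoder branch can skip it.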
maybe we could remove the special token here?
I am not sure if this is correct as the token is removed here right after: https://github.com/lvwerra/trl/blob/ed87942a47f26d15e823ca7674737be02e48cc0a/trl/trainer/ppo_trainer.py#L832
I also made a run with `generation[1:]` and the KL becomes negative: https://wandb.ai/younesbelkada/trl/runs/vjbydeqv - so I think we shouldn't remove the special token here
Isn't it suspicious that such a slight change breaks the training? If not, why is that expected? I'm asking because I'm myself having trouble with negative KL divergence on my T5 model, even on the v0.4.1 release.
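For context on the negative-KL reports in this thread, here is a minimal sketch (not trl's exact code) of the per-token KL estimate typically used in PPO-style training: the difference of log-probabilities that the policy and the reference model assign to the sampled tokens. Individual terms can legitimately be negative, but a persistently negative *mean* usually signals that the two models are scoring mismatched inputs (e.g. different padding), rather than a genuinely improved policy.

```python
def kl_estimate(logprobs_model, logprobs_ref):
    """Mean of per-token log p_model(x_t) - log p_ref(x_t).

    This estimator of KL(model || ref) is only non-negative in
    expectation; single samples (and buggy inputs) can go negative.
    """
    per_token = [a - b for a, b in zip(logprobs_model, logprobs_ref)]
    return sum(per_token) / len(per_token)

# Per-token terms of mixed sign can still average to a sane value:
print(kl_estimate([-1.0, -2.0], [-1.5, -1.5]))  # 0.0
```

If this quantity stays strongly negative across a whole run, as in the reports above, the sampled tokens are systematically more likely under the reference than under the model that supposedly generated them, which points at a generation/scoring mismatch.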
With other encoder-decoder models such as MarianMT (BART architecture) I'm still experiencing negative KL; in fact it becomes increasingly negative: -5, -19, -24.
I'm experiencing the negative KL warning with Alpaca-7B on the sentiment script.
Can you try with non-batched generation as suggested in #256 (comment) and let us know if this works?
In gpt2_sentiment.py, I tried modifying the `generation_kwargs` a bit, e.g. setting `top_p` to 0.9, and got the warning that my KL is negative and that the generation kwargs may not be correctly set. Is there any specific restriction on the generation kwargs during PPO? Gentle ping: @younesbelkada
* fix negative kl issue
* fix
* make style
Fixes #256
This PR fixes issues related to negative KL and the T5 sentiment example. The first fix addresses the sentiment script, which was incorrectly ported.
Before this PR, the padding side of tokenizers was always hardcoded to `left` in `_batched_generate`. In encoder-decoder models, it should be set to the tokenizer's native `padding_side` (i.e. `right` for T5), as the padding is performed on the encoder tokens and these models have been trained with that specific padding side. I think the culprit is the way the positional attention bias is computed, which does not take the starting position into account if `padding_side=left`: https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L436-L451

Thus forcing `padding_side=left` for encoder-decoder models should probably be avoided, as most encoder-decoder models pad the encoder input to the right (to verify).
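The fix described above can be sketched as a small helper (names are mine, not trl's): decoder-only models need left padding so generation continues from the last real token, while encoder-decoder models should keep the padding side they were pretrained with.

```python
def choose_padding_side(is_encoder_decoder, native_padding_side="right"):
    """Pick the tokenizer padding side instead of hardcoding "left".

    Encoder-decoder models (e.g. T5) keep their native side, since
    padding is applied to encoder tokens and the positional bias was
    trained with that layout; decoder-only models must be left-padded
    so that generation starts right after the last prompt token.
    """
    return native_padding_side if is_encoder_decoder else "left"

print(choose_padding_side(True))   # right  (e.g. T5)
print(choose_padding_side(False))  # left   (e.g. GPT-2)
```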
To illustrate the fix:
- `main` with the modification on the example script only

You can see that using `padding_side=left` led to unstable KL, whereas the proposed fix seems to lead to a smoother KL.

cc @lvwerra