[bug] support for large forward_batch_size in seq2seq models #100
What does this PR do?
First of all, please welcome the PR number 100!
Before this PR, it was not possible to run the T5 example with `forward_batch_size > 1`. We suspected that something was wrong with the computation of the loss function when padding tokens are present. This PR fixes a bug we had for seq2seq models (and I believe for causal LM models too): it seems we forgot to mask out the logits/hidden states that correspond to the pad tokens when computing the loss function.
See a similar implementation here: https://github.com/CarperAI/trlx/blob/main/trlx/trainer/nn/ppo_models.py#L166-L191, where the loss computation ignores the terms corresponding to pad tokens.
I can also add tests!
GPT2 run: https://wandb.ai/distill-bloom/trl/runs/4cs5z6j3?workspace=user-younesbelkada
T5 run: https://wandb.ai/distill-bloom/trl/runs/lxgi5ae9?workspace=user-younesbelkada
It seems that the `reward_std` is now higher, leading to less smooth `reward_mean` curves, so I am putting this PR up as a draft.

cc @lvwerra