Enable flag to not pass PAD tokens in ffwd #775

Merged · @bcui19 merged 21 commits into main from mask_pad_token on Dec 11, 2023

Conversation

@bcui19 (Contributor) commented Dec 4, 2023

This PR does two things:

  1. Modifies the attn_bias function to always return the attention_mask.
  2. Enables us to remove PAD tokens before calling .forward on the ffwd network, then re-add the PAD tokens afterwards (see the sketch below).
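
For illustration, here is a minimal sketch of the unpad/repad idea (not the PR's actual implementation; the helper name ffwd_skip_pad and the shapes are illustrative):

```python
import torch
import torch.nn as nn

def ffwd_skip_pad(ffn: nn.Module, hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Run `ffn` only on non-PAD positions and scatter the results back."""
    batch, seq_len, d_model = hidden.shape
    flat = hidden.reshape(-1, d_model)           # (batch * seq_len, d_model)
    keep = attention_mask.reshape(-1).bool()     # True for real tokens, False for PAD
    out = torch.zeros_like(flat)
    out[keep] = ffn(flat[keep])                  # feed-forward pass over real tokens only
    return out.reshape(batch, seq_len, d_model)

# Example usage
ffn = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 8))
hidden = torch.randn(2, 5, 8)
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 1]])
out = ffwd_skip_pad(ffn, hidden, attention_mask)  # (2, 5, 8); PAD rows are zeros
```

PAD rows come back as zeros rather than as ffwd outputs, which is fine because those positions are masked out of attention and the loss anyway.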

Loss curves on a fully randomly initialized network:
[image: loss curves]

We also get slightly higher throughput from this when there are PAD tokens in our dataset (and no degradation compared to main with attn_impl: triton):
[image: throughput comparison]

wandb: https://wandb.ai/mosaic-ml/padding_check?workspace=user-bcui

@bcui19 changed the title from "[DRAFT] Changing how attention_mask is being passed around" to "Enable flag to not pass PAD tokens in ffwd" on Dec 4, 2023
@bcui19 marked this pull request as ready for review on December 4, 2023 22:03
@bcui19 requested a review from a team as a code owner on December 4, 2023 22:03
@vchiley (Contributor) commented Dec 4, 2023

> We also get slightly higher throughput from this when there are PAD tokens in our dataset:

Can you run main vs. your branch with the use_pad_tok_in_ffwd flag vs. your branch without the use_pad_tok_in_ffwd flag?

@dakinggg (Collaborator) left a comment

Can you add a test that checks numerical equivalence of the computation with and without the flag? It might be off by a bit because of numerics, but let's see.
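
As a rough illustration of such a test, reusing the ffwd_skip_pad sketch from the PR description above (the real test would go through the model class and the use_pad_tok_in_ffwd config flag rather than a standalone FFN):

```python
import torch
import torch.nn as nn

def test_ffwd_pad_equivalence():
    torch.manual_seed(0)
    ffn = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 8))

    hidden = torch.randn(2, 6, 8)
    attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0],
                                   [1, 1, 1, 1, 1, 1]])

    out_with_pad = ffn(hidden)                                 # PAD positions included
    out_skip_pad = ffwd_skip_pad(ffn, hidden, attention_mask)  # PAD positions removed, then re-added

    keep = attention_mask.bool()
    # Only non-PAD positions need to match; PAD outputs are masked downstream.
    assert torch.allclose(out_with_pad[keep], out_skip_pad[keep], atol=1e-6, rtol=1e-5)
```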

@mvpatel2000 (Collaborator) left a comment

Minor nits

Review threads on llmfoundry/models/layers/blocks.py (three threads; outdated, resolved)
bcui19 and others added 2 commits December 7, 2023 16:24
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
@bcui19 requested a review from vchiley on December 11, 2023 21:03
Review thread on tests/models/test_model.py (outdated, resolved)
@vchiley self-requested a review on December 11, 2023 21:41
@dakinggg (Collaborator) left a comment

Thanks!

Review threads on llmfoundry/models/mpt/configuration_mpt.py and tests/models/test_model.py (outdated, resolved)
bcui19 and others added 2 commits December 11, 2023 17:03
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
@bcui19 merged commit 410d5c7 into main on Dec 11, 2023
8 checks passed
@dakinggg deleted the mask_pad_token branch on February 3, 2024 01:28