Allow for dynamic batch padding #2352

Merged: 20 commits from pippy-duplicates into pippy-integration-v2 on Jan 25, 2024
Conversation

muellerzr
Collaborator

What does this PR do?

This PR allows for dynamic batch padding by finding and duplicating the last item in the batch before concatenating everything.

Current issue:

  • PiPPy has issues with batch inference on GPT-2; I will talk to the PiPPy folks about this

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SunMarc

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc (Member) left a comment

Thanks for adding this! Left a few comments. From Ke's comments, it looks like the batch issue is linked to the tracing.

Three resolved review threads (outdated) on src/accelerate/inference.py.
process_index=0,
num_processes=state.num_processes,
)
extra = concatenate([extra] * ((found_batch_size % state.num_processes) + 1))
@SunMarc (Member) commented:

See the related comment in Slack.

Resolved review thread (outdated) on src/accelerate/inference.py.
@SunMarc (Member) left a comment

We are practically there. Left a few comments.

Resolved review threads (outdated) on src/accelerate/inference.py (two) and tests/test_utils.py.
old_size = tensor.shape
new_size = list(old_size)
new_size[0] = batch_size + to_pad
new_tensor = tensor.new_zeros(tuple(new_size))
@muellerzr (Collaborator, author) commented:

It's okay to just pad with 0; we can drop those rows afterwards, so ideally the user won't even know that padded inputs were sent.

@muellerzr muellerzr merged commit dac1daa into pippy-integration-v2 Jan 25, 2024
23 checks passed
@muellerzr muellerzr deleted the pippy-duplicates branch January 25, 2024 13:56
muellerzr added a commit that referenced this pull request Feb 6, 2024
* Broken version

* Timing I would expect

* Working version!

* Use MethodType

* working test

* Tests

* Use no split module classes explicitly

* Put split_points in pipeline

* Store split points in hf_split_points

* fix case num_process=1

* Allow for dynamic batch padding (#2352)

* Allow for dynamic batch padding

* Fix test

* Update src/accelerate/inference.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Break early after the first valid bs is found

* Less slicy-dicy

* Test cv model

* Start, need to test

* Use dataloader-like logic

* Refactor to utils

* With tests

* Update the source

* Clean

* bs=1 case

* Add test

* add some failing test

* Almost working version

* Much cleaner implementation

* Use pad_input_tensor

* All tests passing!

* Do it at tracing too

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>

* Rm literal

* Allow users to pass in max_memory

* Note about recursion

* Document, document, document

* Right import check

* Fix bug, add tests to multigpu runners

* Change default to None

* Start of docs

* Try again?

* Try again x2

* Trailing comma

* Move import

* Clean

* typehint

* typo

* From code review

* Use num_chunks

* Update tests/test_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Bad copy/paste

* hf_split_points

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>