This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

@kwen2501
Contributor

Decoding tensor-form token IDs into strings involves a CPU sync.

This PR adds a flag to disable that in-flight conversion:

torchrun --nproc-per-node 4 dist_run.py llama2-7b-chat --pp 2 --disable-in-flight-decode
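
For illustration, here is a minimal sketch of what such a flag can gate, assuming a greedy decoding loop and a tokenizer with a decode(ids) method; the function and variable names are illustrative, not the actual code in dist_run.py (passing --disable-in-flight-decode would correspond to in_flight_decode=False):

```python
import torch

# Sketch only: names are illustrative, not taken from dist_run.py.
def generate(model, tokenizer, tokens, num_new_tokens, in_flight_decode=True):
    for _ in range(num_new_tokens):
        logits = model(tokens)  # (batch, seq, vocab)
        next_token = torch.argmax(logits[:, -1], dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)
        if in_flight_decode:
            # .tolist() copies the ids to the CPU, forcing a GPU sync every step.
            print(tokenizer.decode(next_token[0].tolist()), end="", flush=True)
    if not in_flight_decode:
        # A single sync at the end instead of one per generated token.
        print(tokenizer.decode(tokens[0].tolist()))
    return tokens
```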

@pytorch-bot

pytorch-bot bot commented Sep 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1180

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit faec017 with merge base 3aba730:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kwen2501 requested a review from lessw2020 September 23, 2024 20:16
@facebook-github-bot added the CLA Signed label Sep 23, 2024
@kwen2501 changed the base branch from main to cache_lanes September 23, 2024 20:17
res = [[] for _ in range(total_prompts)]
num_tokens = 40
# need a row dimension for each prompt in the batch
new_token = torch.zeros(batch_size, 1, device=device, dtype=torch.int64)
Contributor

General comment: do we really need int64 for the dtype, given that int32 holds positive values up to ~2B? (Meant to ask this earlier.)
It seems like a minor saving, and I don't see any vocabs getting to 2B anytime soon.
We can leave it as is for consistency, and maybe do a single PR in the future to sweep this up if we agree int32 is fine.
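
For context, a quick sketch of the dtype comparison (not part of this PR): int32 tops out at 2,147,483,647, far above any LLM vocabulary size (Llama 2 uses a 32,000-token vocabulary), and each token id would occupy 4 bytes instead of 8.

```python
import torch

batch_size = 4
# Current allocation: 8 bytes per token id.
new_token_i64 = torch.zeros(batch_size, 1, dtype=torch.int64)
# Alternative: 4 bytes per token id, still far larger than any vocabulary.
new_token_i32 = torch.zeros(batch_size, 1, dtype=torch.int32)

print(new_token_i64.element_size(), new_token_i32.element_size())  # 8 4
print(torch.iinfo(torch.int32).max)                                # 2147483647
```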

Contributor

@lessw2020 left a comment

Looks good!
Longer term, I already started a similar PR with the idea of doing chunked decoding, where a variable controls how many tokens to generate between decodes.
The idea is that we don't want to wait until the entire generation is done before showing anything to the user, but as this PR points out, we incur a sync every time we decode. So maybe generate 20 tokens, then decode and display, then another round of generate/display, and so on, balancing the tension between decoding speed and keeping the user updated on the response (see the sketch below).
Anyway, we can discuss and build on the option in this PR to make it more tunable.
Thanks for adding this!
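
To make the chunked idea concrete, here is a rough sketch assuming a hypothetical decode_every setting and a tokenizer with a decode(ids) method; none of these names come from the existing code:

```python
import torch

def generate_chunked(model, tokenizer, tokens, num_new_tokens, decode_every=20):
    # Decode and print only every `decode_every` steps, so the GPU -> CPU
    # sync happens once per chunk instead of once per generated token.
    pending = []  # tokens generated since the last decode
    for step in range(num_new_tokens):
        logits = model(tokens)
        next_token = torch.argmax(logits[:, -1], dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)
        pending.append(next_token)
        if decode_every > 0 and (step + 1) % decode_every == 0:
            chunk = torch.cat(pending, dim=1)[0].tolist()  # one sync per chunk
            print(tokenizer.decode(chunk), end="", flush=True)
            pending = []
    if pending:
        # Flush whatever is left; with decode_every <= 0 this is the only
        # decode, i.e. in-flight decoding is effectively disabled.
        print(tokenizer.decode(torch.cat(pending, dim=1)[0].tolist()), flush=True)
    return tokens
```

Setting decode_every to a non-positive value makes the final flush the only decode, which lines up with treating the existing flag as "chunk size = infinity".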

@kwen2501
Contributor Author

Ah, sorry, I didn't mean to "overwrite" your PR. I was just doing code cleanup and ended up touching the decode path.

The chunked-decode idea definitely sounds good! My "disable-in-flight-decode" flag could become chunk size = -1 (infinity) in your scheme.

@kwen2501 changed the base branch from cache_lanes to main September 24, 2024 19:55
@kwen2501 merged commit 6fd90bc into main Sep 25, 2024
51 checks passed