
Conversation

@kwen2501 (Contributor) commented Sep 21, 2024

The KV cache is extended to have multiple lanes, each letting a separate micro-batch pass through, which enables pipeline parallelism during decoding.

# The number of cache lanes is the same as the maximum number of
# micro-batches that can be "in flight" in parallel -- imagine each
# micro-batch occupying one "pipeline lane"; each needs its own KV cache
# space. When decoding is done for certain micro-batches, their KV cache
# lanes can be reused.

Major changes

  1. setup_caches now takes a kwarg cache_lanes (default 1):
     def setup_caches(self, max_batch_size, max_seq_length, cache_lanes: int = 1)
  2. attention.kv_cache is now an nn.ModuleList containing one KVCache per lane (see the sketch after this list).
  3. We now pass kwargs = {"input_pos": input_pos, "cache_lane": lane} to the step() function, removing the temporary helper function model.setup_input_pos.
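
For illustration only, here is a minimal sketch of how a lane-indexed cache can be wired up. This is not the actual torchchat implementation; the KVCache buffer layout, the Attention constructor, and the forward signature below are simplified assumptions.

import torch
import torch.nn as nn

class KVCache(nn.Module):
    # Pre-allocated K/V buffers for a single cache lane (simplified sketch;
    # use the model's dtype in practice instead of the float32 default).
    def __init__(self, max_batch_size, n_heads, max_seq_length, head_dim,
                 dtype: torch.dtype = torch.float32):
        super().__init__()
        shape = (max_batch_size, n_heads, max_seq_length, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype))
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype))

    def update(self, input_pos, k_val, v_val):
        # Write the new keys/values at `input_pos`, then return the full cache.
        self.k_cache[:, :, input_pos] = k_val
        self.v_cache[:, :, input_pos] = v_val
        return self.k_cache, self.v_cache

class Attention(nn.Module):
    def __init__(self, n_heads: int, head_dim: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, head_dim
        self.kv_cache = None

    def setup_caches(self, max_batch_size, max_seq_length, cache_lanes: int = 1):
        # One KVCache per lane; micro-batches that are in flight at the same
        # time are assigned distinct lanes, so their caches never collide.
        self.kv_cache = nn.ModuleList(
            KVCache(max_batch_size, self.n_heads, max_seq_length, self.head_dim)
            for _ in range(cache_lanes)
        )

    def forward(self, q, k, v, input_pos, cache_lane: int = 0):
        # Select the cache lane assigned to the current micro-batch.
        k, v = self.kv_cache[cache_lane].update(input_pos, k, v)
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)

With this layout, adding lanes only multiplies the pre-allocated cache memory; the attention math is unchanged, since each micro-batch only ever reads and writes its own lane.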

Requires pytorch/pytorch#136416 to support passing kwargs to step().
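
On the caller side, the call pattern is roughly the following. This is a hedged sketch with illustrative names (decode_step, decoder, pp_rank, etc. are not from the PR); forwarding kwargs through the schedule's step() is what the PyTorch PR above enables.

from torch.distributed.pipelining import ScheduleGPipe

def decode_step(decoder: ScheduleGPipe, new_token, input_pos, lane,
                pp_rank, first_pp_rank, last_pp_rank):
    # One decode step for one micro-batch, using its assigned cache lane.
    kwargs = {"input_pos": input_pos, "cache_lane": lane}
    if pp_rank == first_pp_rank:
        return decoder.step(new_token, **kwargs)  # first stage feeds the token in
    elif pp_rank == last_pp_rank:
        return decoder.step(**kwargs)             # last stage returns the output
    else:
        decoder.step(**kwargs)                    # middle stages just relay
        return None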

@pytorch-bot (bot) commented Sep 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1174

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9514b54 with merge base 8d01d9b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Sep 21, 2024
@kwen2501 changed the title from "[WIP][Distributed] Add lanes to KV cache" to "[Distributed] Add lanes to KV cache" on Sep 23, 2024
 )
 # create schedule
-decode_schedule = ScheduleGPipe(decode_stage, mbs)
+decorder = ScheduleGPipe(decode_stage, 1)
Contributor
Typo: this should be 'decoder', not 'decorder'.

 # Run data through pipeline
 if pp_rank == first_pp_rank:
-    output = decode_schedule.step(new_token)
+    output = decorder.step(new_token, **kwargs)
Contributor

Same typo: this should be 'decoder', not 'decorder'.

+    output = decorder.step(new_token, **kwargs)
 elif pp_rank == last_pp_rank:
-    output = decode_schedule.step()
+    output = decorder.step(**kwargs)
@lessw2020 (Contributor) commented Sep 23, 2024

Same typo: this should be 'decoder', not 'decorder'.

+    output = decorder.step(**kwargs)
 else: # middle pp ranks
-    decode_schedule.step()
+    decorder.step(**kwargs)
Contributor

Last one, same typo: this should be 'decoder', not 'decorder'.

@lessw2020 (Contributor) left a comment

Nice addition!
Minor note: 'decorder' should be 'decoder' in the code for readability.

@kwen2501 merged commit 2cf4016 into main on Sep 23, 2024
51 checks passed
kwen2501 added a commit that referenced this pull request Sep 23, 2024