Add pipeline parallel #1060
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1060
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 09946d6 with merge base 19a47e7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
# Use ModuleDict so that each layer can be assigned its layer ID in the original model
self.layers = nn.ModuleDict()
for layer_id in range(self.layers_per_stage * stage_idx, self.layers_per_stage * (stage_idx + 1)):
    self.layers[str(layer_id)] = TransformerBlock(config)
this is pretty clever!
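A minimal sketch of the pattern, with a stand-in block in place of the real `TransformerBlock`: keying the stage-local `ModuleDict` by the global layer index keeps `state_dict` keys aligned with the un-pipelined model's layer numbering, which should make mapping weights from the original checkpoint onto each stage straightforward.

```python
import torch.nn as nn

# Stand-in for TransformerBlock, just to keep the sketch self-contained.
class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ffn = nn.Linear(dim, dim)

    def forward(self, x):
        return x + self.ffn(x)

n_layers, n_stages, dim = 8, 4, 16
layers_per_stage = n_layers // n_stages

def build_stage_layers(stage_idx):
    # Key each block by its global layer id (as a string, since ModuleDict
    # keys must be strings), mirroring the loop in the PR.
    layers = nn.ModuleDict()
    start = layers_per_stage * stage_idx
    for layer_id in range(start, start + layers_per_stage):
        layers[str(layer_id)] = Block(dim)
    return layers

stage2 = build_stage_layers(2)
print(list(stage2.keys()))               # ['4', '5']
print(list(stage2.state_dict().keys()))  # ['4.ffn.weight', '4.ffn.bias', '5.ffn.weight', '5.ffn.bias']
```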
@@ -67,7 +75,7 @@ def setup_caches(self, max_batch_size, max_seq_length):
         max_seq_length = find_multiple(max_seq_length, 8)
         self.max_seq_length = max_seq_length
         self.max_batch_size = max_batch_size
-        for b in self.layers:
+        for b in self.layers.values():
block instead of b for clarity?
from original model.
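One detail behind the `.values()` change above: iterating an `nn.ModuleDict` directly yields its string keys (like a plain dict), not the modules, so the cache-setup loop needs `.values()`. A tiny illustration:

```python
import torch.nn as nn

layers = nn.ModuleDict({"0": nn.Linear(4, 4), "1": nn.Linear(4, 4)})

print([b for b in layers])                          # ['0', '1']  -- keys, not modules
print([type(b).__name__ for b in layers.values()])  # ['Linear', 'Linear']
```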
dist_run.py (Outdated)
# Model config
config = TransformerArgs.from_name("Transformer-2-7b-chat-hf")
print(config)

# Construct a device mesh with available devices (multi-host or single host)
-device_mesh = dist.init_device_mesh("cuda", (2,), mesh_dim_names=("tp",))
+device_mesh = dist.init_device_mesh("cuda", (2, 2), mesh_dim_names=("pp", "tp"))
if this file expands in the future, maybe better to functionalize these different stages, a la
create_device_mesh(mesh_shape=(2,2))
setup_model
create_pipeline_stage
etc.
It's clear atm what's happening, but if this file becomes larger/more dynamic then it would make sense in the future.
Sure, makes sense.
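A rough sketch of that factoring (the helper names follow the suggestion above and are hypothetical, not the actual dist_run.py API; run under torchrun with 4 ranks):

```python
# Hypothetical refactor of dist_run.py's setup steps into helpers.
# Run with: torchrun --nproc-per-node 4 <file>.py
import torch.distributed as dist


def create_device_mesh(mesh_shape=(2, 2)):
    # Outer dim = pipeline parallel, inner dim = tensor parallel.
    return dist.init_device_mesh("cuda", mesh_shape, mesh_dim_names=("pp", "tp"))


def setup_model(config, pp_rank, pp_degree, device):
    # Hypothetical: build only this pipeline stage's slice of the model
    # (the ModuleDict trick above), then apply TP over the "tp" submesh.
    ...


def create_pipeline_stage(stage_module, pp_rank, pp_degree, device, example_inputs):
    # Hypothetical: wrap the stage module in a PipelineStage (see the sketch further down).
    ...


def main():
    mesh = create_device_mesh((2, 2))
    pp_mesh, tp_mesh = mesh["pp"], mesh["tp"]  # 1-D submeshes, selected by dim name
    print(f"pp rank {pp_mesh.get_local_rank()}, tp rank {tp_mesh.get_local_rank()}")


if __name__ == "__main__":
    main()
```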
looks great - the fancy work with module dict for assigning layers to stage is very clever.
82a1c44 to 09946d6
mb_ids = torch.randint(0, config.vocab_size, (mb_size, seqlen), device=device)
activation = torch.rand(mb_size, seqlen, dim, device=device)
does PipelineStage require that these are materialized on CUDA for them to be input_args?
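For reference, a sketch of how those example tensors feed the stage constructor (placeholder sizes and a stand-in stage module; the `input_args` parameter is assumed from the `torch.distributed.pipelining.PipelineStage` variant this PR targets). Only the shapes/dtypes of the examples should matter for setting up the stage's communication buffers, which is exactly why it is worth checking whether meta tensors would do instead of CUDA-materialized ones:

```python
# Sketch only: assumes torchrun has started the job and the default process
# group is initialized; names below are placeholders for the PR's real values.
import torch
import torch.nn as nn
from torch.distributed.pipelining import PipelineStage

mb_size, seqlen, dim, vocab_size = 1, 128, 4096, 32000  # placeholder sizes
pp_rank, pp_degree = 0, 2                                # placeholder stage layout
device = torch.device("cuda", 0)
stage_module = nn.Identity().to(device)                  # stand-in for the real stage slice

# Example microbatch inputs, materialized on the stage's CUDA device as in the PR.
mb_ids = torch.randint(0, vocab_size, (mb_size, seqlen), device=device)
activation = torch.rand(mb_size, seqlen, dim, device=device)

# First stage consumes token ids; later stages consume the previous stage's activation.
example_args = (mb_ids,) if pp_rank == 0 else (activation,)

stage = PipelineStage(
    stage_module,
    pp_rank,             # this stage's index
    pp_degree,           # total number of stages
    device,
    input_args=example_args,
)
```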
Stack from ghstack (oldest at bottom):
PP + TP now working.
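A condensed sketch of how the two parallelisms compose on the 2-D mesh (the tiny stage module and TP plan here are illustrative, not the actual torchchat model or plan):

```python
# Run with: torchrun --nproc-per-node 4 <file>.py   (2 pp x 2 tp ranks)
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

mesh = dist.init_device_mesh("cuda", (2, 2), mesh_dim_names=("pp", "tp"))
pp_mesh, tp_mesh = mesh["pp"], mesh["tp"]

device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())

# Illustrative stage: two linears standing in for one pipeline stage's layers.
stage_module = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256)).to(device)

# Tensor parallel *within* the stage, over the "tp" submesh.
parallelize_module(
    stage_module,
    tp_mesh,
    {"0": ColwiseParallel(), "2": RowwiseParallel()},
)

# Pipeline parallel *across* stages runs over the "pp" dim: each pp rank holds
# a different stage_module and wraps it in a PipelineStage driven by a schedule
# (see the PipelineStage sketch above).
```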