
Conversation

@kwen2501 (Contributor) commented on Aug 24, 2024:

Stack from ghstack (oldest at bottom):

PP + TP now working.

pytorch-bot commented on Aug 24, 2024:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1060

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 09946d6 with merge base 19a47e7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kwen2501 added a commit that referenced this pull request Aug 24, 2024
ghstack-source-id: 8c01e92
Pull Request resolved: #1060
@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Aug 24, 2024
kwen2501 added a commit that referenced this pull request Aug 25, 2024
ghstack-source-id: f06490c
Pull Request resolved: #1060
kwen2501 added a commit that referenced this pull request Aug 25, 2024
ghstack-source-id: 137dac5
Pull Request resolved: #1060
# Use ModuleDict so that each layer can be assigned its layer ID in the original model
self.layers = nn.ModuleDict()
for layer_id in range(self.layers_per_stage * stage_idx, self.layers_per_stage * (stage_idx + 1)):
self.layers[str(layer_id)] = TransformerBlock(config)
A reviewer (Contributor) commented:

this is pretty clever!
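For readers without the full diff in front of them, here is a minimal, self-contained sketch of the trick. The StageSlice class, TransformerBlock stub, and sizes below are illustrative stand-ins, not the actual torchchat classes: the point is that keying the nn.ModuleDict by the layer's index in the original model keeps that index visible in each stage's parameter names (e.g. layers.12....), even though a stage only owns a slice of the layers.

```python
import torch.nn as nn

# Illustrative stand-in for the real torchchat TransformerBlock.
class TransformerBlock(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.ffn = nn.Linear(dim, dim)

    def forward(self, x):
        return x + self.ffn(x)

class StageSlice(nn.Module):
    """Holds only the layers belonging to one pipeline stage, keyed by their
    layer ID in the original (unsplit) model."""
    def __init__(self, n_layers: int, n_stages: int, stage_idx: int, dim: int = 64):
        super().__init__()
        layers_per_stage = n_layers // n_stages  # assumes an even split
        self.layers = nn.ModuleDict()
        for layer_id in range(layers_per_stage * stage_idx,
                              layers_per_stage * (stage_idx + 1)):
            # String keys preserve the original layer numbering, so parameter
            # names like "layers.12.ffn.weight" match the unsplit model.
            self.layers[str(layer_id)] = TransformerBlock(dim)

    def forward(self, x):
        for block in self.layers.values():
            x = block(x)
        return x

# Stage 1 of 4 in a 16-layer model owns layers 4..7.
stage = StageSlice(n_layers=16, n_stages=4, stage_idx=1)
print(list(stage.layers.keys()))  # ['4', '5', '6', '7']
```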

@@ -67,7 +75,7 @@ def setup_caches(self, max_batch_size, max_seq_length):
max_seq_length = find_multiple(max_seq_length, 8)
self.max_seq_length = max_seq_length
self.max_batch_size = max_batch_size
- for b in self.layers:
+ for b in self.layers.values():
A reviewer (Contributor) commented:

block instead of b for clarity?

The author (kwen2501) replied:

from the original model.
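For context on the .values() change in the hunk above: iterating an nn.ModuleDict directly yields its string keys (just like a plain dict), not the modules, so once self.layers becomes a ModuleDict the cache-setup loop has to go through .values(). A small standalone illustration, independent of the torchchat code:

```python
import torch.nn as nn

layers = nn.ModuleDict({"4": nn.Linear(8, 8), "5": nn.Linear(8, 8)})

print(list(layers))           # ['4', '5'] -- iteration yields the string keys
print(list(layers.values()))  # the Linear modules themselves

# So loops that need the blocks (e.g. setup_caches) must use .values():
for block in layers.values():
    assert isinstance(block, nn.Module)
```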

dist_run.py (outdated diff)

# Model config
config = TransformerArgs.from_name("Transformer-2-7b-chat-hf")
print(config)

# Construct a device mesh with available devices (multi-host or single host)
- device_mesh = dist.init_device_mesh("cuda", (2,), mesh_dim_names=("tp",))
+ device_mesh = dist.init_device_mesh("cuda", (2, 2), mesh_dim_names=("pp", "tp"))
A reviewer (Contributor) commented:
if this file expands in future, maybe better to functionalize these different stages ala
create_device_mesh(mesh_shape=(2,2))
setup_model
create_pipeline_stage
etc.
It's clear atm what's happening but if this file becomes larger/more dynamic then maybe it would make sense in the future.

The author (kwen2501) replied:

Sure, makes sense.
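A rough sketch of the factoring suggested above, assuming a torchrun launch across 4 GPUs with a (pp=2, tp=2) layout; the helper names (create_device_mesh, setup_model, create_pipeline_stage) follow the reviewer's suggestion and are not functions from this PR:

```python
import torch.distributed as dist

def create_device_mesh(pp: int = 2, tp: int = 2):
    """Build a 2D device mesh with a pipeline-parallel and a tensor-parallel axis."""
    return dist.init_device_mesh("cuda", (pp, tp), mesh_dim_names=("pp", "tp"))

def main():
    mesh = create_device_mesh(pp=2, tp=2)
    pp_mesh = mesh["pp"]   # 1D sub-mesh along the pipeline dimension
    tp_mesh = mesh["tp"]   # 1D sub-mesh along the tensor-parallel dimension
    # model = setup_model(config, tp_mesh)            # hypothetical helpers,
    # stage = create_pipeline_stage(model, pp_mesh)   # per the review suggestion
    ...

if __name__ == "__main__":
    main()
```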

@lessw2020 (Contributor) left a review comment:

looks great - the fancy work with module dict for assigning layers to stage is very clever.

kwen2501 added a commit that referenced this pull request Aug 26, 2024
ghstack-source-id: 02acf73
Pull Request resolved: #1060
@kwen2501 changed the base branch from gh/kwen2501/2/base to main on August 26, 2024
@kwen2501 force-pushed the gh/kwen2501/2/head branch from 82a1c44 to 09946d6 on August 26, 2024
@kwen2501 merged commit 9dc9eff into main on Aug 26, 2024 (51 checks passed)
Comment on lines +48 to +49
mb_ids = torch.randint(0, config.vocab_size, (mb_size, seqlen), device=device)
activation = torch.rand(mb_size, seqlen, dim, device=device)
@awgu commented on Aug 27, 2024:

does PipelineStage require that these are materialized on CUDA for them to be input_args?
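For background on the question: these tensors are per-microbatch example inputs (token IDs for the first stage, hidden-state activations for later stages). The sketch below shows that general pattern; the sizes are illustrative, and the commented-out PipelineStage call reflects the torch.distributed.pipelining API as understood around PyTorch 2.4 and should be read as an assumption, not as code from this PR.

```python
import torch

# Illustrative sizes; the real values come from the model config in dist_run.py.
mb_size, seqlen, dim, vocab_size = 2, 128, 4096, 32000
stage_idx, num_stages = 0, 2            # this rank's pipeline stage (illustrative)
device = torch.device("cuda")           # the PR materializes these on the target CUDA device

# First stage consumes token IDs; later stages consume hidden activations.
mb_ids = torch.randint(0, vocab_size, (mb_size, seqlen), device=device)
activation = torch.rand(mb_size, seqlen, dim, device=device)

example_input = mb_ids if stage_idx == 0 else activation

# Assumed wiring (PyTorch 2.4-era torch.distributed.pipelining API; the PR may differ):
# from torch.distributed.pipelining import PipelineStage
# stage = PipelineStage(stage_module, stage_idx, num_stages, device,
#                       input_args=(example_input,))
```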
