Fix bug in PP output layer shape
This is a mostly harmless bug: since the output shape of the last layer is not used for send/recv purposes, the runtime value overrides it no matter what value you configured it with.

However, since in/out shape validation was added to the pipeline lib in torch, this now raises an error and has to be fixed.

ghstack-source-id: 950e41529b7b506085ab280d8a492e345eaefd24
Pull Request resolved: #354
wconstab committed May 22, 2024
1 parent 6807909 commit 638ec48
Showing 1 changed file with 5 additions and 1 deletion.
torchtitan/parallelisms/parallelize_llama.py (5 additions, 1 deletion)
@@ -209,7 +209,11 @@ def pipeline_llama_manual(
     batch_size = job_config.training.batch_size
     local_seq_len = int(job_config.training.seq_len // parallel_dims.tp)
     layers_io_shape = (batch_size, local_seq_len, model_config.dim)
-    output_layer_shape = (batch_size, local_seq_len, model_config.vocab_size)
+    output_layer_shape = (
+        batch_size,
+        job_config.training.seq_len,
+        model_config.vocab_size,
+    )
     if pp_rank == 0:
         # first layer
         input = torch.randint(
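
Below is a minimal, self-contained sketch (not taken from the torchtitan sources) of how the two shape tuples in the diff relate. The concrete numbers stand in for `job_config`, `parallel_dims`, and `model_config` values, and the meta-device example tensors are an assumption used purely for illustration.

```python
# Hypothetical sketch of the shape logic in the diff above; the numeric
# values are assumptions, not values from the repository.
import torch

batch_size = 8          # job_config.training.batch_size
seq_len = 2048          # job_config.training.seq_len
tp = 4                  # parallel_dims.tp
dim = 4096              # model_config.dim
vocab_size = 32000      # model_config.vocab_size

# Activations passed between transformer layers are sequence-sharded across
# tensor-parallel ranks, so the per-rank (local) sequence length is used.
local_seq_len = seq_len // tp
layers_io_shape = (batch_size, local_seq_len, dim)

# The final output layer produces logits over the full sequence length, which
# is why the fix replaces local_seq_len with the unsharded seq_len here.
output_layer_shape = (batch_size, seq_len, vocab_size)

# Meta-device tensors carry only shape/dtype metadata; example tensors like
# these are the kind of thing a manual pipeline stage can use to pre-declare
# its I/O, and what shape validation would compare against runtime outputs.
layer_io_example = torch.empty(layers_io_shape, device="meta")
last_stage_output_example = torch.empty(output_layer_shape, device="meta")

print(layer_io_example.shape)            # torch.Size([8, 512, 4096])
print(last_stage_output_example.shape)   # torch.Size([8, 2048, 32000])
```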
