[dtensor][op] Fixed stack op strategy (#129018)
**Summary**
The previous stack op strategy resharded the inputs before the input_specs were created, which triggered a "list index out of range" error. This change delays the resharding until after the input_specs are created, so the new dimension introduced by stack can be inserted first, preventing the error. I also ran all the other test cases to confirm the change does not introduce any new bugs.
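The ordering issue can be illustrated with a simplified sketch: stack inserts a new dimension at `dim`, so any shard placement on a dimension at or after `dim` must shift by one in the output spec, while the input specs must still describe the inputs with their original placements. The helper below mimics the role of `normalize_shard_for_stack` but is a hypothetical stand-in, not the real DTensor API.

```python
# Simplified stand-in for normalize_shard_for_stack: shift shard dims
# that sit at or after the newly inserted stack dimension. Placements
# are modeled as ("shard", dim) tuples or the string "replicate".
def normalize_shard_for_stack(placements, insert_dim):
    out = []
    for p in placements:
        if isinstance(p, tuple) and p[0] == "shard" and p[1] >= insert_dim:
            out.append(("shard", p[1] + 1))  # new dim pushes this shard dim up
        else:
            out.append(p)
    return out

# Two inputs sharded on dim 0, stacked at dim 0 (a new leading dim).
follow_placements = [("shard", 0)]

# Input specs must be built from the ORIGINAL placements first...
input_specs = [tuple(follow_placements) for _ in range(2)]

# ...and only afterwards are the placements normalized for the OUTPUT spec.
output_placements = normalize_shard_for_stack(follow_placements, 0)

print(input_specs)        # input specs keep shard dim 0
print(output_placements)  # output shard dim shifted to 1
```

Normalizing before building the input specs (the old order) would describe the inputs with the shifted placement, which is what led to the out-of-range indexing.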

**Test Plan**
pytest test/distributed/_tensor/test_tensor_ops.py -s -k test_stack

Pull Request resolved: #129018
Approved by: https://github.com/XilunWu
sinhaanshul authored and pytorchmergebot committed Jun 21, 2024
1 parent 6b5fbc5 commit aee512c
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion torch/distributed/_tensor/ops/tensor_ops.py
@@ -513,7 +513,6 @@ def stack_strategy(mesh: DeviceMesh, op_schema: OpSchema) -> StrategyType:
     follow_placements = _derive_follow_placements_from_tuple_strategy(
         input_tuple_strategy
     )
-    follow_placements = normalize_shard_for_stack(follow_placements, dim)
 
     # create op strategy base on the follow placements
     op_strategy = OpStrategy([])
@@ -522,6 +521,9 @@ def stack_strategy(mesh: DeviceMesh, op_schema: OpSchema) -> StrategyType:
         DTensorSpec(mesh, tuple(follow_placements))
         for _ in range(len(input_tuple_strategy.childs))
     )
+
+    follow_placements = normalize_shard_for_stack(follow_placements, dim)
+
     op_strategy.strategies.append(
         PlacementStrategy(
             output_specs=DTensorSpec(mesh, tuple(follow_placements)),
