
Conversation

@anshul-si (Contributor) commented Sep 10, 2025

Summary: To ensure that replicate acts as intended (a specialized version of HSDP), we need to make sure it can pass the same training tests that fully_shard does. This test is important because it verifies that we can cast a replicated module to a different type after initialization, an important feature for enabling mixed precision; a minimal sketch follows the test case below.

Test Cases

  1. pytest test/distributed/_composable/test_replicate_training.py -k test_to_float64_after_init
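
A minimal sketch of what this test exercises, assuming the composable `replicate` API from `torch.distributed._composable` (the import path and single-process setup are illustrative, not the actual test code):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable import replicate  # import path assumed

# Single-process gloo setup so the sketch runs standalone (illustrative).
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
replicate(model)         # replication is applied after standard float32 init
model.to(torch.float64)  # cast the already-replicated module

# Every parameter should now carry the new dtype.
assert all(p.dtype == torch.float64 for p in model.parameters())

dist.destroy_process_group()
```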

Stack from ghstack (oldest at bottom):

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci


pytorch-bot bot commented Sep 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162636

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 54938f1 with merge base 0819de4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mori360 (Contributor) left a comment:

Thanks for the test

@anshul-si (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
…162650)

**Summary:** The parity tests train two identical models on the same inputs, one with a reference approach and one with the test approach (replicate), then check that both models produce identical losses. This ensures the distributed training method does not change the mathematical results compared to standard training; a sketch of the pattern follows this entry.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_train_parity_single_group
2. pytest test/distributed/_composable/test_replicate_training.py -k test_train_parity_multi_group
3. pytest test/distributed/_composable/test_replicate_training.py -k test_train_parity_multi_group_cpu_offload_eager

Pull Request resolved: #162650
Approved by: https://github.com/mori360
ghstack dependencies: #162631, #162636
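
A hedged sketch of the parity pattern described in the entry above; the single-process setup, model, and hyperparameters are illustrative assumptions, not the actual test code:

```python
import copy
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable import replicate  # import path assumed

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

torch.manual_seed(42)
ref_model = nn.Linear(16, 16)
test_model = copy.deepcopy(ref_model)  # identical starting weights
replicate(test_model)

ref_optim = torch.optim.Adam(ref_model.parameters(), lr=1e-2)
test_optim = torch.optim.Adam(test_model.parameters(), lr=1e-2)

for _ in range(5):
    inp = torch.randn(4, 16)
    losses = []
    for model, optim in ((ref_model, ref_optim), (test_model, test_optim)):
        optim.zero_grad()
        loss = model(inp).sum()
        loss.backward()
        optim.step()
        losses.append(loss.detach())
    # Replication should not change the numerics versus plain training.
    assert torch.equal(losses[0], losses[1])

dist.destroy_process_group()
```

At world size 1 the gradient all-reduce is effectively a no-op, so any divergence here would point at the wrapper itself rather than at collective numerics.
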
pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
…non-root module (#162654)

**Summary:** Verifies that Replicate correctly handles the scenario where forward and backward passes are run through both the root module and a non-root module.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_non_root_forward_backward

Pull Request resolved: #162654
Approved by: https://github.com/mori360
ghstack dependencies: #162631, #162636, #162650
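
An illustrative sketch of the non-root scenario above, under the same assumptions as the earlier sketches (composable `replicate` import path, single-process setup):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable import replicate  # import path assumed

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = nn.Linear(8, 8)
        self.outer = nn.Linear(8, 8)

    def forward(self, x):
        return self.outer(self.inner(x))

model = Model()
replicate(model.inner)  # non-root replicated module
replicate(model)        # root replicated module

x = torch.randn(2, 8)
model(x).sum().backward()        # forward/backward through the root
model.inner(x).sum().backward()  # forward/backward through the non-root child

dist.destroy_process_group()
```
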
pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
…iple times in a forward pass (#162656)

**Summary:** Verifies that Replicate works correctly when a module is used multiple times in a single forward pass.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_multi_forward_module

Pull Request resolved: #162656
Approved by: https://github.com/mori360
ghstack dependencies: #162631, #162636, #162650, #162654
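
A small sketch of the reuse pattern above, again with an assumed import path and single-process setup:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable import replicate  # import path assumed

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class MultiForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)

    def forward(self, x):
        # The same submodule participates twice in one forward pass, so its
        # communication hooks must tolerate being triggered more than once.
        return self.lin(self.lin(x))

model = MultiForward()
replicate(model)
model(torch.randn(2, 8)).sum().backward()

dist.destroy_process_group()
```
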
pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
**Summary:** Prefetching tests validate that distributed training systems can correctly overlap communication and computation by pre-loading parameters or data before they're needed. This test ensures the prefetching mechanism doesn't break training correctness while potentially improving performance by reducing idle time where computation waits for communication to complete.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_explicit_prefetching

Pull Request resolved: #162658
Approved by: https://github.com/mori360
ghstack dependencies: #162631, #162636, #162650, #162654, #162656
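
A hedged sketch of explicit prefetching. fully_shard (FSDP2) exposes `set_modules_to_forward_prefetch` for this; whether replicate mirrors that exact method is an assumption here, so treat the call below as hypothetical:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable import replicate  # import path assumed

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])
for layer in model:
    replicate(layer)  # per-layer replication, mirroring per-layer fully_shard
replicate(model)

# Hypothetical: ask each layer to prefetch the next layer's parameters
# during forward, mirroring fully_shard's set_modules_to_forward_prefetch.
layers = list(model)
for cur, nxt in zip(layers[:-1], layers[1:]):
    cur.set_modules_to_forward_prefetch([nxt])

model(torch.randn(2, 8)).sum().backward()

dist.destroy_process_group()
```
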
pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
…tes (#162785)

**Summary:** To ensure that replicate acts as intended (a specialized version of HSDP), we need to make sure it can pass the same training tests that fully_shard does. This test verifies that replicate correctly handles post-optimizer events; a sketch follows this entry.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_post_optim_event

Pull Request resolved: #162785
Approved by: https://github.com/mori360
ghstack dependencies: #162631, #162636, #162650, #162654, #162656, #162658
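
A hedged sketch of the post-optimizer-event pattern. fully_shard (FSDP2) exposes `set_post_optim_event` on the root module; its presence on replicate is assumed, not confirmed by this thread:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable import replicate  # import path assumed

# Post-optim events are a CUDA-stream concept, so this assumes one GPU.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)

model = nn.Linear(8, 8).cuda()
replicate(model)
optim = torch.optim.AdamW(model.parameters(), lr=1e-2)

model(torch.randn(2, 8, device="cuda")).sum().backward()
optim.step()

# Record an event after the optimizer step and hand it to the replicated
# root so the next iteration's collectives wait on it. The method name
# mirrors fully_shard's set_post_optim_event; treating it as available on
# replicate is an assumption of this sketch.
event = torch.cuda.current_stream().record_event()
model.set_post_optim_event(event)

dist.destroy_process_group()
```
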
Labels: ciflow/trunk, Merged, oncall: distributed, topic: not user facing