
Conversation

anshul-si
Contributor

@anshul-si anshul-si commented Sep 10, 2025

Summary: Verifies that Replicate correctly handles the scenario where forward and backward passes are run through both the root module and a non-root module.

Test Cases

  1. pytest test/distributed/_composable/test_replicate_training.py -k test_non_root_forward_backward

Stack from ghstack (oldest at bottom):

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci
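The scenario in the summary can be illustrated with a toy, framework-free sketch. It is not the test added by this PR: `ToyModel`, `local_grads`, and `all_reduce_mean` are hypothetical names, and the gradients are written out by hand rather than computed by autograd. The loss runs through both the root module and a non-root submodule, each simulated replica computes gradients on its own shard of the batch, and the averaged result is compared against a single-process reference.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical two-level model: root(x) = w_root * inner(x), inner(x) = w_inner * x."""
    w_root: float
    w_inner: float

    def inner(self, x):
        # The non-root module, callable on its own.
        return self.w_inner * x

    def root(self, x):
        # The root module, which calls the non-root module internally.
        return self.w_root * self.inner(x)


def local_grads(model, xs):
    """Hand-derived gradients of mean(root(x) + inner(x)) w.r.t. (w_root, w_inner)."""
    n = len(xs)
    g_root = sum(model.w_inner * x for x in xs) / n
    # w_inner receives gradient from the root path and from the direct non-root call.
    g_inner = sum(model.w_root * x + x for x in xs) / n
    return g_root, g_inner


def all_reduce_mean(per_rank):
    """Stand-in for a gradient all-reduce: average each gradient across ranks."""
    world = len(per_rank)
    return tuple(sum(g[i] for g in per_rank) / world for i in range(len(per_rank[0])))


model = ToyModel(w_root=2.0, w_inner=3.0)
batch = [1.0, 2.0, 3.0, 4.0]
shards = [batch[:2], batch[2:]]  # two simulated ranks with equal-sized shards

replicated = all_reduce_mean([local_grads(model, s) for s in shards])
reference = local_grads(model, batch)
print(replicated, reference)
```

With equal-sized shards, averaging the per-rank gradients reproduces the full-batch gradient, which is the kind of parity such a test would be expected to check.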


pytorch-bot bot commented Sep 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162654

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2568bdd with merge base 0819de4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

anshul-si added a commit that referenced this pull request Sep 10, 2025
…non-root module

ghstack-source-id: 063e421
Pull Request resolved: #162654
@anshul-si anshul-si added the ciflow/trunk label (Trigger trunk jobs on your pull request) Sep 10, 2025
…r root and non-root module"

**Summary:** Verifies that Replicate correctly handles the scenario where forward and backward passes are run through both the root module and a non-root module. 

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_non_root_forward_backward





cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta ezyang msaroufim dcci

[ghstack-poisoned]
@anshul-si
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
…iple times in a forward pass (#162656)

**Summary:** Verifies that Replicate works correctly when a module is used multiple times in a single forward pass.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_multi_forward_module

Pull Request resolved: #162656
Approved by: https://github.com/mori360
ghstack dependencies: #162631, #162636, #162650, #162654
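Why reusing a module within one forward pass needs care can be seen in a small sketch. It assumes a hand-written reverse-mode tape rather than PyTorch autograd, and `SharedScale`, `forward_backward`, and `run` are hypothetical names. The point it shows: each use of the shared weight contributes to the gradient, so contributions must be accumulated before any cross-replica averaging.

```python
class SharedScale:
    """Toy module y = w * x that saves its inputs so it can be called more than once."""

    def __init__(self, w):
        self.w = w
        self.grad = 0.0
        self._saved_inputs = []

    def forward(self, x):
        self._saved_inputs.append(x)
        return self.w * x

    def backward(self, grad_out):
        # Inputs are popped in reverse order of use, like a reverse-mode tape.
        x = self._saved_inputs.pop()
        self.grad += grad_out * x  # accumulate across uses, never overwrite
        return grad_out * self.w   # gradient with respect to the input


def forward_backward(module, x):
    h = module.forward(x)
    y = module.forward(h)        # second use of the same module
    g = module.backward(1.0)     # backward through the second use
    module.backward(g)           # backward through the first use
    return y


def run(w, x):
    m = SharedScale(w)
    y = forward_backward(m, x)
    return y, m.grad


y, grad = run(3.0, 2.0)  # y = w * w * x, so dy/dw = 2 * w * x
per_rank = [run(3.0, x)[1] for x in (2.0, 4.0)]  # two simulated replicas
averaged = sum(per_rank) / len(per_rank)
print(y, grad, averaged)
```

Overwriting instead of accumulating in `backward` would silently halve the gradient here, which is the class of bug a multi-forward test is meant to catch.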
pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
**Summary:** Prefetching tests validate that distributed training systems can correctly overlap communication and computation by pre-loading parameters or data before they're needed. This test ensures the prefetching mechanism doesn't break training correctness while potentially improving performance by reducing idle time where computation waits for communication to complete.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_explicit_prefetching

Pull Request resolved: #162658
Approved by: https://github.com/mori360
ghstack dependencies: #162631, #162636, #162650, #162654, #162656
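The overlap described in the summary can be sketched without any distributed runtime. This is an illustration under stated assumptions, not the mechanism used by Replicate: `LAYER_WEIGHTS` and `fetch_params` are hypothetical stand-ins for per-layer parameters and the communication that makes them available, and a thread pool plays the role of a separate communication stream. The correctness property is that prefetching changes when parameters arrive, not the result.

```python
from concurrent.futures import ThreadPoolExecutor

LAYER_WEIGHTS = [2.0, 3.0, 5.0]  # hypothetical per-layer parameters


def fetch_params(i):
    """Stand-in for a communication step that makes layer i's parameters available."""
    return LAYER_WEIGHTS[i]


def forward_no_prefetch(x):
    # Baseline: fetch each layer's parameters right before using them.
    for i in range(len(LAYER_WEIGHTS)):
        x = fetch_params(i) * x
    return x


def forward_with_prefetch(x):
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_params, 0)
        for i in range(len(LAYER_WEIGHTS)):
            w = pending.result()  # wait only for the current layer
            if i + 1 < len(LAYER_WEIGHTS):
                # Start the next fetch before computing, so the two can overlap.
                pending = pool.submit(fetch_params, i + 1)
            x = w * x
    return x


baseline = forward_no_prefetch(1.0)
prefetched = forward_with_prefetch(1.0)
print(baseline, prefetched)
```

Both paths produce the same output; only the timing of the fetches differs.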
pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
…tes (#162785)

**Summary:** To ensure that replicate behaves as intended (as a specialized version of HSDP), it must pass the same training tests that fully_shard passes. This test verifies that replicate correctly handles post-optimizer events.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_post_optim_event

Pull Request resolved: #162785
Approved by: https://github.com/mori360
ghstack dependencies: #162631, #162636, #162650, #162654, #162656, #162658
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…non-root module (pytorch#162654)

**Summary:** Verifies that Replicate correctly handles the scenario where forward and backward passes are run through both the root module and a non-root module.

**Test Cases**
1. pytest test/distributed/_composable/test_replicate_training.py -k test_non_root_forward_backward

Pull Request resolved: pytorch#162654
Approved by: https://github.com/mori360
ghstack dependencies: pytorch#162631, pytorch#162636, pytorch#162650
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…iple times in a forward pass (pytorch#162656)

mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…tes (pytorch#162785)

cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…non-root module (pytorch#162654)

cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…iple times in a forward pass (pytorch#162656)

cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…tes (pytorch#162785)

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…non-root module (pytorch#162654)

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…iple times in a forward pass (pytorch#162656)

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…tes (pytorch#162785)

Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
topic: not user facing (topic category)

3 participants