-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[SymmMem] Isolate set_device tests to avoid hang #161668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161668
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 822bec9 with merge base 763053d ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
test_empty_strided_p2p_persistent allocates persistent symm memory tensors. However, it uses the same alloc_id for different tests, which could cause troubles if these tests are ran under the same process. This PR fixes the issue by using a different alloc_id for different test. #161668 should also fix the issue but we can land this PR for a safer test. ghstack-source-id: 72f67de Pull-Request-resolved: #161677
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
test_empty_strided_p2p_persistent allocates persistent symm memory tensors. However, it uses the same alloc_id for different tests, which could cause troubles if these tests are ran under the same process. This PR fixes the issue by using a different alloc_id for different test. #161668 should also fix the issue but we can land this PR for a safer test. ghstack-source-id: 0e89aff Pull-Request-resolved: #161677
test_empty_strided_p2p_persistent allocates persistent symm memory tensors. However, it uses the same alloc_id for different tests, which could cause troubles if these tests are ran under the same process. This PR fixes the issue by using a different alloc_id for different test. #161668 should also fix the issue but we can land this PR for a safer test. ghstack-source-id: fa91e94 Pull-Request-resolved: #161677
test_empty_strided_p2p_persistent allocates persistent symm memory tensors. However, it uses the same alloc_id for different tests, which could cause troubles if these tests are ran under the same process. This PR fixes the issue by using a different alloc_id for different test. #161668 should also fix the issue but we can land this PR for a safer test. Pull Request resolved: #161677 Approved by: https://github.com/kwen2501 ghstack dependencies: #161676
`test_symmetric_memory.py` hangs like this: ``` SymmetricMemoryTest::test_empty_strided_p2p_persistent_set_device_False PASSED [5.6364s] SymmetricMemoryTest::test_empty_strided_p2p_persistent_set_device_True ... ``` This set of tests parameterizes whether user sets the device before calling `symm_mem.emtpy`. However, such parametrization does not work well with `MultiProcContinuousTest` because the set device will "contaminate" the next test function. Solution is to move the "set device" tests to a separate test suite using the traditional `MultiProcessTestCase`, which would respawn processes every time. Hang is gone now. Pull Request resolved: pytorch#161668 Approved by: https://github.com/fegin
test_empty_strided_p2p_persistent allocates persistent symm memory tensors. However, it uses the same alloc_id for different tests, which could cause troubles if these tests are ran under the same process. This PR fixes the issue by using a different alloc_id for different test. pytorch#161668 should also fix the issue but we can land this PR for a safer test. Pull Request resolved: pytorch#161677 Approved by: https://github.com/kwen2501 ghstack dependencies: pytorch#161676
Stack from ghstack (oldest at bottom):
test_symmetric_memory.py
hangs like this:This set of tests parameterizes whether user sets the device before calling
symm_mem.emtpy
.However, such parametrization does not work well with
MultiProcContinuousTest
because the set device will "contaminate" the next test function.Solution is to move the "set device" tests to a separate test suite using the traditional
MultiProcessTestCase
, which would respawn processes every time.Hang is gone now.
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta