Restore _graph_executor_optimize flag after JIT test_profiler #96135
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96135
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures (as of commit 73df80c)
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes pytorch/pytorch#91483

This uses a separate test class so that there is no need to run setUp and tearDown for every test in TestJit. The root cause is that test_profiler can be flaky and fail partway through without getting the chance to restore `torch._C._set_graph_executor_optimize` to its original value (pytorch/pytorch#81626). The leaked flag then breaks all tests that run afterwards, as shown in pytorch/pytorch#91483. I suspect the same root cause is behind several other flaky tests in this file: https://github.com/search?q=repo%3Apytorch%2Fpytorch+DISABLED+test_jit.TestScript&type=issues. Once this fix is merged, I will let the retry bot do its job and close those issues after 2 weeks.
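To make the approach concrete, here is a minimal sketch of the described fix: a dedicated test class whose setUp saves the graph-executor optimize flag and whose tearDown restores it, so a failure in the middle of the test body can no longer leak the flag into later tests. The class name and the use of plain `unittest.TestCase` are illustrative rather than the PR's actual code, and the sketch assumes `torch._C._get_graph_executor_optimize()` is available to read the current value.

```python
import unittest

import torch


class TestJitProfilerIsolated(unittest.TestCase):
    # Illustrative test class: snapshot the global flag before each test and
    # always put it back afterwards, even if the test body raises midway.

    def setUp(self):
        super().setUp()
        # Assumed getter for the current graph-executor optimize setting.
        self.graph_executor_optimize = torch._C._get_graph_executor_optimize()

    def tearDown(self):
        # tearDown runs whether the test passed or failed, so the flag cannot
        # leak into tests that run later in the same process.
        torch._C._set_graph_executor_optimize(self.graph_executor_optimize)
        super().tearDown()

    def test_profiler(self):
        # The flaky test body may toggle the flag freely; tearDown cleans up.
        torch._C._set_graph_executor_optimize(False)
        # ... profiling assertions would go here ...


if __name__ == "__main__":
    unittest.main()
```

In the PyTorch test suite the class would more naturally derive from the suite's JIT test base class rather than plain `unittest.TestCase`; plain unittest is used here only to keep the sketch self-contained.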
### Testing

The issue pytorch/pytorch#91483 can now be reproduced by adding `torch._C._set_graph_executor_optimize(False)` locally to see if the test fails:

```
diff --git a/test/test_jit.py b/test/test_jit.py
index 2d1161d..17745d39182 100644
--- a/test/test_jit.py
+++ b/test/test_jit.py
@@ -5413,6 +5413,8 @@ a")
         FileCheck().check("int =").check("ListConstruct").check("aten::cat").run(str(g))

     def test_stack(self):
+        torch._C._set_graph_executor_optimize(False)
+
         with enable_profiling_mode_for_profiling_tests():
             @torch.jit.script
             def func(x):
```

It indeed fails:

```
======================================================================
FAIL [0.006s]: test_stack (test_jit.TestScript)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/test_jit.py", line 5437, in test_stack
    self.assertAutodiffNode(func2.graph_for(x, y), True, ['aten::stack'], [])
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/testing/_internal/common_jit.py", line 282, in assertAutodiffNode
    self.assertEqual(should_autodiff_node,
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 2975, in assertEqual
    raise error_metas[0].to_error(
AssertionError: Booleans mismatch: True is not False

Failure in testing nodes' autodifferentiation. One or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. Specifically:
  ['aten::stack'] were not in one of the DifferentiableGraphs when they were expected to be. Did you intend for these nodes to be autodiffed? If not, remove them from the list of nonfusible nodes.

----------------------------------------------------------------------
Ran 2677 tests in 84.596s

FAILED (failures=1, skipped=136, expected failures=13)
```
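If you try this reproduction locally and do not want the flipped flag to leak into the rest of the run, the same save-and-restore idea can be applied inline with try/finally. This is a hedged sketch rather than part of the PR, and it again assumes `torch._C._get_graph_executor_optimize()` as the getter.

```python
import torch

# Remember the current setting, flip it to trigger the failure mode, and
# always restore it so later tests in the same process see the original value.
old_value = torch._C._get_graph_executor_optimize()
torch._C._set_graph_executor_optimize(False)
try:
    pass  # ... run the code under investigation, e.g. the body of test_stack ...
finally:
    torch._C._set_graph_executor_optimize(old_value)
```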
Pull Request resolved: pytorch/pytorch#96135
Approved by: https://github.com/clee2000