[inductor] Separate Buffer and Operation into two concepts #130831
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130831
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit 53c49b0 with merge base eee76c8.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Chillee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Resubmit of pytorch#128893 Currently a buffer represents both a tensor with physical storage and a computation that produces the tensor as a result. This PR attempts to split these into two different concepts in the scheduler. This should allow us to have multiple outputs from a single operation. ghstack-source-id: 89a1a67 Pull Request resolved: pytorch#130831
…0832) Resubmit of #129325 Previously each mutation was represented by a `MutationOutput` operation, a new scheduler node that had to be scheduled immediately afterwards. Now we have a single scheduler node, which produces multiple `MutationOutput` buffers as its output. Pull Request resolved: #130832 Approved by: https://github.com/lezcano ghstack dependencies: #130831
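For illustration, here is a minimal sketch of the representation described in that commit message. The class and attribute names below are hypothetical stand-ins, not the actual `torch._inductor` API; the point is just that one scheduler node can now own several mutation buffers instead of needing one extra node per mutation.

```python
# Hypothetical sketch only; names are illustrative, not the real inductor classes.

class MutationOutput:
    """A buffer recording that an existing input was mutated in place."""
    def __init__(self, name: str, mutated_input: str):
        self.name = name
        self.mutated_input = mutated_input

class SchedulerNode:
    """One scheduler node per operation; a mutating op lists all its MutationOutput buffers."""
    def __init__(self, op_name: str, outputs: list[MutationOutput]):
        self.op_name = op_name
        self.outputs = outputs

    def get_buffer_names(self) -> set[str]:
        return {out.name for out in self.outputs}

# Before: each mutation forced an extra node scheduled immediately after the op.
# After: a single node for a fused in-place op can produce several buffers.
node = SchedulerNode("fused_inplace_op", [
    MutationOutput("buf3", "arg0"),
    MutationOutput("buf4", "arg1"),
])
assert node.get_buffer_names() == {"buf3", "buf4"}
```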
Resubmit of #129344 This fixes the DCE issue for attention output Pull Request resolved: #130833 Approved by: https://github.com/lezcano ghstack dependencies: #130831, #130832
…30831) Resubmit of pytorch#128893 Currently a buffer represents both a tensor with physical storage and a computation that produces the tensor as a result. This PR attempts to split these into two different concepts in the scheduler. This should allow us to have multiple outputs from a single operation. Differential Revision: [D59876059](https://our.internmc.facebook.com/intern/diff/D59876059) Pull Request resolved: pytorch#130831 Approved by: https://github.com/lezcano
…orch#130832) Resubmit of pytorch#129325 Previously each mutation was represented by a `MutationOutput` operation, a new scheduler node that had to be scheduled immediately afterwards. Now we have a single scheduler node, which produces multiple `MutationOutput` buffers as its output. Pull Request resolved: pytorch#130832 Approved by: https://github.com/lezcano ghstack dependencies: pytorch#130831
Resubmit of pytorch#129344 This fixes the DCE issue for attention output Pull Request resolved: pytorch#130833 Approved by: https://github.com/lezcano ghstack dependencies: pytorch#130831, pytorch#130832
Resubmit of pytorch#129346 Pull Request resolved: pytorch#130834 Approved by: https://github.com/lezcano ghstack dependencies: pytorch#130831, pytorch#130832, pytorch#130833
```diff
 for i in range(1, len(comm_nodes)):
     # Enforce ordering by making previous comm a `WeakDep` dependency of the next comm
-    comm_nodes[i].add_fake_dep(WeakDep(comm_nodes[i - 1].get_name()))
+    comm_nodes[i].add_fake_dep(WeakDep(item(comm_nodes[i - 1].get_buffer_names())))
```
@peterbell10 I believe this might be causing an error in multi-gpu runs: https://productionresultssa6.blob.core.windows.net/actions-results/fd465c4e-dbd7-42de-bbe3-a1aadc133a4b/workflow-job-run-958b70a4-e428-5161-9e96-1ac739b382a7/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-07-22T20%3A35%3A12Z&sig=AMvNE0N7lR5HFJsAqcB0GFtgXnlavcD1Ny6WPf7w9Kc%3D&ske=2024-07-23T05%3A54%3A19Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-07-22T17%3A54%3A19Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2023-11-03&sp=r&spr=https&sr=b&st=2024-07-22T20%3A25%3A07Z&sv=2023-11-03 cc. @atalman @yifuwang
Your link requires some kind of authorization; can you share the error?
I believe this is the error:
```
2024-07-22T14:29:20.7393861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1638, in codegen
2024-07-22T14:29:20.7394041Z self.scheduler = Scheduler(self.operations)
2024-07-22T14:29:20.7394662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 246, in time_wrapper
2024-07-22T14:29:20.7394786Z r = func(*args, **kwargs)
2024-07-22T14:29:20.7395433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 1551, in __init__
2024-07-22T14:29:20.7395638Z comms.decide_global_ordering_of_comms(self.nodes)
2024-07-22T14:29:20.7396393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/comms.py", line 238, in decide_global_ordering_of_comms
2024-07-22T14:29:20.7396802Z comm_nodes[i].add_fake_dep(WeakDep(item(comm_nodes[i - 1].get_buffer_names())))
2024-07-22T14:29:20.7397376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/comms.py", line 233, in item
2024-07-22T14:29:20.7397534Z assert len(x) == 1
2024-07-22T14:29:20.7397895Z torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-07-22T14:29:20.7398013Z AssertionError:
2024-07-22T14:29:20.7398319Z Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
2024-07-22T14:29:20.7398627Z You can suppress this exception and fall back to eager by setting:
2024-07-22T14:29:20.7398753Z import torch._dynamo
2024-07-22T14:29:20.7398940Z torch._dynamo.config.suppress_errors = True
2024-07-22T14:29:20.7399219Z To execute this test, run the following from the base repo dir:
2024-07-22T14:29:20.7399972Z python test/distributed/test_compute_comm_reordering.py -k TestComputeCommReorderingMultiProc.test_reorder_compute_for_overlap
```
This PR should make sure the unit test runs on regular PR CI: #131415. It would be great to stack this PR on top of that one for easier testing. Thanks!
I've added ciflow/periodic to the reverted PRs.
Stack from ghstack (oldest at bottom):
Resubmit of #128893
Currently a buffer represents both a tensor with physical storage and a
computation that produces the tensor as a result.
This PR attempts to split these into two different concepts in the scheduler.
This should allow us to have multiple outputs from a single operation.
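As a rough illustration of the split, here is a minimal sketch. The names below are assumptions made for the example, not the PR's actual class hierarchy; it only captures the idea that a buffer is the stored tensor while an operation is the computation, and that one operation may write several buffers.

```python
# Minimal sketch of the buffer/operation split; hypothetical names, not inductor's real IR.
from dataclasses import dataclass, field

@dataclass
class Buffer:
    """A tensor with physical storage; it no longer doubles as the computation."""
    name: str

@dataclass
class Operation:
    """The computation the scheduler reasons about; it may write several buffers."""
    name: str
    outputs: list[Buffer] = field(default_factory=list)

    def get_buffer_names(self) -> list[str]:
        return [buf.name for buf in self.outputs]

# With the two concepts separated, one operation (e.g. a fused attention kernel)
# can produce both its primary output and auxiliary buffers as a single node.
attention = Operation("attention", [Buffer("buf0"), Buffer("buf1")])
assert attention.get_buffer_names() == ["buf0", "buf1"]
```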
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang
Differential Revision: D59876059