[inductor] Separate Buffer and Operation into two concepts #130831
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130831
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit 53c49b0 with merge base eee76c8.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Chillee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Resubmit of pytorch#128893 Currently a buffer represents both a tensor with physical storage and a computation that produces the tensor as a result. This PR attempts to split these into two different concepts in the scheduler. This should allow us to have multiple outputs from a single operation. ghstack-source-id: 89a1a67 Pull Request resolved: pytorch#130831
…0832) Resubmit of #129325 Previously each mutation was represented by a `MutationOutput` operation, a new scheduler node that had to be scheduled immediately afterwards. Now we have a single scheduler node, which produces multiple `MutationOutput` buffers as its output. Pull Request resolved: #130832 Approved by: https://github.com/lezcano ghstack dependencies: #130831
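For illustration, here is a minimal sketch of the representation described in that commit message. The class and attribute names below are hypothetical stand-ins, not the actual `torch._inductor` API; the point is just that one scheduler node can now own several mutation buffers instead of needing one extra node per mutation.

```python
# Hypothetical sketch only; names are illustrative, not the real inductor classes.

class MutationOutput:
    """A buffer recording that an existing input was mutated in place."""
    def __init__(self, name: str, mutated_input: str):
        self.name = name
        self.mutated_input = mutated_input

class SchedulerNode:
    """One scheduler node per operation; a mutating op lists all its MutationOutput buffers."""
    def __init__(self, op_name: str, outputs: list[MutationOutput]):
        self.op_name = op_name
        self.outputs = outputs

    def get_buffer_names(self) -> set[str]:
        return {out.name for out in self.outputs}

# Before: each mutation forced an extra node scheduled immediately after the op.
# After: a single node for a fused in-place op can produce several buffers.
node = SchedulerNode("fused_inplace_op", [
    MutationOutput("buf3", "arg0"),
    MutationOutput("buf4", "arg1"),
])
assert node.get_buffer_names() == {"buf3", "buf4"}
```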
Resubmit of #129344 This fixes the DCE issue for attention output Pull Request resolved: #130833 Approved by: https://github.com/lezcano ghstack dependencies: #130831, #130832
…30831) Resubmit of pytorch#128893 Currently a buffer represents both a tensor with physical storage and a computation that produces the tensor as a result. This PR attempts to split these into two different concepts in the scheduler. This should allow us to have multiple outputs from a single operation. Differential Revision: [D59876059](https://our.internmc.facebook.com/intern/diff/D59876059) Pull Request resolved: pytorch#130831 Approved by: https://github.com/lezcano
…orch#130832) Resubmit of pytorch#129325 Previously each mutation was represented by a `MutationOutput` operation, a new scheduler node that had to be scheduled immediately afterwards. Now we have a single scheduler node, which produces multiple `MutationOutput` buffers as its output. Pull Request resolved: pytorch#130832 Approved by: https://github.com/lezcano ghstack dependencies: pytorch#130831
Resubmit of pytorch#129344 This fixes the DCE issue for attention output Pull Request resolved: pytorch#130833 Approved by: https://github.com/lezcano ghstack dependencies: pytorch#130831, pytorch#130832
Resubmit of pytorch#129346 Pull Request resolved: pytorch#130834 Approved by: https://github.com/lezcano ghstack dependencies: pytorch#130831, pytorch#130832, pytorch#130833
```diff
 for i in range(1, len(comm_nodes)):
     # Enforce ordering by making previous comm a `WeakDep` dependency of the next comm
-    comm_nodes[i].add_fake_dep(WeakDep(comm_nodes[i - 1].get_name()))
+    comm_nodes[i].add_fake_dep(WeakDep(item(comm_nodes[i - 1].get_buffer_names())))
```
@peterbell10 I believe this might be causing an error in multi-gpu runs: https://productionresultssa6.blob.core.windows.net/actions-results/fd465c4e-dbd7-42de-bbe3-a1aadc133a4b/workflow-job-run-958b70a4-e428-5161-9e96-1ac739b382a7/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-07-22T20%3A35%3A12Z&sig=AMvNE0N7lR5HFJsAqcB0GFtgXnlavcD1Ny6WPf7w9Kc%3D&ske=2024-07-23T05%3A54%3A19Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-07-22T17%3A54%3A19Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2023-11-03&sp=r&spr=https&sr=b&st=2024-07-22T20%3A25%3A07Z&sv=2023-11-03 cc. @atalman @yifuwang
Your link requires some kind of authorization; can you share the error?
I believe this is the error:
```
2024-07-22T14:29:20.7393861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1638, in codegen
2024-07-22T14:29:20.7394041Z self.scheduler = Scheduler(self.operations)
2024-07-22T14:29:20.7394662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 246, in time_wrapper
2024-07-22T14:29:20.7394786Z r = func(*args, **kwargs)
2024-07-22T14:29:20.7395433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 1551, in __init__
2024-07-22T14:29:20.7395638Z comms.decide_global_ordering_of_comms(self.nodes)
2024-07-22T14:29:20.7396393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/comms.py", line 238, in decide_global_ordering_of_comms
2024-07-22T14:29:20.7396802Z comm_nodes[i].add_fake_dep(WeakDep(item(comm_nodes[i - 1].get_buffer_names())))
2024-07-22T14:29:20.7397376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/comms.py", line 233, in item
2024-07-22T14:29:20.7397534Z assert len(x) == 1
2024-07-22T14:29:20.7397895Z torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-07-22T14:29:20.7398013Z AssertionError:
2024-07-22T14:29:20.7398319Z Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
2024-07-22T14:29:20.7398627Z You can suppress this exception and fall back to eager by setting:
2024-07-22T14:29:20.7398753Z import torch._dynamo
2024-07-22T14:29:20.7398940Z torch._dynamo.config.suppress_errors = True
2024-07-22T14:29:20.7399219Z To execute this test, run the following from the base repo dir:
2024-07-22T14:29:20.7399972Z python test/distributed/test_compute_comm_reordering.py -k TestComputeCommReorderingMultiProc.test_reorder_compute_for_overlap
```
This PR should make sure the unit test runs on regular PR CI: #131415. It would be great to stack this PR on top of that one for easier testing. Thanks!
I've added ciflow/periodic to the reverted PRs.
Stack from ghstack (oldest at bottom):
Resubmit of #128893
Currently a buffer represents both a tensor with physical storage and a
computation that produces the tensor as a result.
This PR attempts to split these into two different concepts in the scheduler.
This should allow us to have multiple outputs from a single operation.
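As a rough illustration of the split, here is a minimal sketch. The names below are assumptions made for the example, not the PR's actual class hierarchy; it only captures the idea that a buffer is the stored tensor while an operation is the computation, and that one operation may write several buffers.

```python
# Minimal sketch of the buffer/operation split; hypothetical names, not inductor's real IR.
from dataclasses import dataclass, field

@dataclass
class Buffer:
    """A tensor with physical storage; it no longer doubles as the computation."""
    name: str

@dataclass
class Operation:
    """The computation the scheduler reasons about; it may write several buffers."""
    name: str
    outputs: list[Buffer] = field(default_factory=list)

    def get_buffer_names(self) -> list[str]:
        return [buf.name for buf in self.outputs]

# With the two concepts separated, one operation (e.g. a fused attention kernel)
# can produce both its primary output and auxiliary buffers as a single node.
attention = Operation("attention", [Buffer("buf0"), Buffer("buf1")])
assert attention.get_buffer_names() == ["buf0", "buf1"]
```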
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang
Differential Revision: D59876059