Conversation

peterbell10 (Collaborator) commented Jul 16, 2024

Stack from ghstack (oldest at bottom):

Resubmit of #128893

Currently a buffer represents both a tensor with physical storage and a
computation that produces the tensor as a result.

This PR attempts to split these into two different concepts in the scheduler.
This should allow us to have multiple outputs from a single operation.
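As a rough sketch of the distinction being drawn here (illustrative class names only, not the actual Inductor scheduler types): once "operation" and "buffer" are separate concepts, one operation can own several output buffers instead of being identified with a single one.

```python
from dataclasses import dataclass, field

@dataclass
class Buffer:
    """A tensor with physical storage (illustrative stand-in)."""
    name: str

@dataclass
class Operation:
    """A computation; after the split it can own several output buffers."""
    name: str
    outputs: list = field(default_factory=list)

    def get_buffer_names(self):
        return [buf.name for buf in self.outputs]

# One operation producing multiple output buffers (e.g. a kernel with two results):
op = Operation("fused_kernel", outputs=[Buffer("buf0"), Buffer("buf1")])
print(op.get_buffer_names())  # ['buf0', 'buf1']
```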

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Differential Revision: D59876059

[ghstack-poisoned]
pytorch-bot (bot) commented Jul 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130831

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 53c49b0 with merge base eee76c8:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Chillee (Collaborator) commented Jul 17, 2024

@Chillee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Jul 19, 2024
Resubmit of pytorch#128893

Currently a buffer represents both a tensor with physical storage and a
computation that produces the tensor as a result.

This PR attempts to split these into two different concepts in the scheduler.
This should allow us to have multiple outputs from a single operation.

ghstack-source-id: 89a1a67
Pull Request resolved: pytorch#130831
[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Jul 20, 2024
…0832)

Resubmit of #129325

Previously each mutation was represented by a `MutationOutput` operation which
was a new scheduler node that must be scheduled immediately afterwards.

Now we have a single scheduler node, which produces multiple `MutationOutput`
buffers as its output.

Pull Request resolved: #130832
Approved by: https://github.com/lezcano
ghstack dependencies: #130831
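The representation change described above can be sketched as follows (hypothetical data structures, not the real Inductor scheduler classes): the set of produced buffers is unchanged, but the extra pinned nodes disappear.

```python
# Before: each mutation got its own `MutationOutput` scheduler node that had
# to be scheduled immediately after the mutating node.
before = [
    {"node": "mutating_kernel", "outputs": ["buf0"]},
    {"node": "MutationOutput", "outputs": ["buf1"]},  # separate node, pinned after
    {"node": "MutationOutput", "outputs": ["buf2"]},
]

# After: one scheduler node whose output list carries the MutationOutput
# buffers directly, so no extra nodes need pinning.
after = [
    {"node": "mutating_kernel", "outputs": ["buf0", "buf1", "buf2"]},
]

# Same buffers produced either way; only the node structure differs.
def produced(nodes):
    return {b for n in nodes for b in n["outputs"]}

assert produced(before) == produced(after)
print(len(before), "->", len(after), "scheduler nodes")  # 3 -> 1
```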
pytorchmergebot pushed a commit that referenced this pull request Jul 20, 2024
Resubmit of #129344

This fixes the DCE issue for attention output

Pull Request resolved: #130833
Approved by: https://github.com/lezcano
ghstack dependencies: #130831, #130832
pytorchmergebot pushed a commit that referenced this pull request Jul 20, 2024
Resubmit of #129346

Pull Request resolved: #130834
Approved by: https://github.com/lezcano
ghstack dependencies: #130831, #130832, #130833
DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024
…30831)

Resubmit of pytorch#128893

Currently a buffer represents both a tensor with physical storage and a
computation that produces the tensor as a result.

This PR attempts to split these into two different concepts in the scheduler.
This should allow us to have multiple outputs from a single operation.

Differential Revision: [D59876059](https://our.internmc.facebook.com/intern/diff/D59876059)
Pull Request resolved: pytorch#130831
Approved by: https://github.com/lezcano
DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024
…orch#130832)

Resubmit of pytorch#129325

Previously each mutation was represented by a `MutationOutput` operation which
was a new scheduler node that must be scheduled immediately afterwards.

Now we have a single scheduler node, which produces multiple `MutationOutput`
buffers as its output.

Pull Request resolved: pytorch#130832
Approved by: https://github.com/lezcano
ghstack dependencies: pytorch#130831
DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024
Resubmit of pytorch#129344

This fixes the DCE issue for attention output

Pull Request resolved: pytorch#130833
Approved by: https://github.com/lezcano
ghstack dependencies: pytorch#130831, pytorch#130832
DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024
peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Jul 22, 2024
Resubmit of pytorch#128893

Currently a buffer represents both a tensor with physical storage and a
computation that produces the tensor as a result.

This PR attempts to split these into two different concepts in the scheduler.
This should allow us to have multiple outputs from a single operation.

ghstack-source-id: 14fdad7
Pull Request resolved: pytorch#130831
```diff
 for i in range(1, len(comm_nodes)):
     # Enforce ordering by making previous comm a `WeakDep` dependency of the next comm
-    comm_nodes[i].add_fake_dep(WeakDep(comm_nodes[i - 1].get_name()))
+    comm_nodes[i].add_fake_dep(WeakDep(item(comm_nodes[i - 1].get_buffer_names())))
```
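For context, a minimal sketch of the `item` helper in `torch/_inductor/comms.py` (its body beyond the `assert len(x) == 1` shown in the traceback later in this thread is an assumption here) and why it fails once a node can report more than one buffer name:

```python
def item(x):
    # Assumes the node produces exactly one buffer; the return statement is
    # a guess at the helper's behavior, but the assert matches the traceback.
    assert len(x) == 1
    return next(iter(x))

print(item(["buf3"]))  # buf3

# With multi-output nodes, a comm node can now report several buffer names,
# which trips the assertion:
try:
    item(["buf3", "buf4"])
except AssertionError:
    print("AssertionError: expected exactly one buffer name")
```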
Contributor suggested this change.

Collaborator Author replied:

Your link requires some kind of authorization; can you share the error?

Contributor replied:

I believe this is the error:

```
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1638, in codegen
    self.scheduler = Scheduler(self.operations)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 246, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 1551, in __init__
    comms.decide_global_ordering_of_comms(self.nodes)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/comms.py", line 238, in decide_global_ordering_of_comms
    comm_nodes[i].add_fake_dep(WeakDep(item(comm_nodes[i - 1].get_buffer_names())))
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/comms.py", line 233, in item
    assert len(x) == 1
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
AssertionError:

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

To execute this test, run the following from the base repo dir:
    python test/distributed/test_compute_comm_reordering.py -k TestComputeCommReorderingMultiProc.test_reorder_compute_for_overlap
```

PR #131415 should make sure the unit test runs on regular PR CI; it would be great to stack this PR on top of that one for easier testing. Thanks!

Collaborator Author replied:

I've added ciflow/periodic to the reverted PRs.

xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
…30831)

Resubmit of pytorch#128893

Currently a buffer represents both a tensor with physical storage and a
computation that produces the tensor as a result.

This PR attempts to split these into two different concepts in the scheduler.
This should allow us to have multiple outputs from a single operation.

Differential Revision: [D59876059](https://our.internmc.facebook.com/intern/diff/D59876059)
Pull Request resolved: pytorch#130831
Approved by: https://github.com/lezcano
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
…orch#130832)

Resubmit of pytorch#129325

Previously each mutation was represented by a `MutationOutput` operation which
was a new scheduler node that must be scheduled immediately afterwards.

Now we have a single scheduler node, which produces multiple `MutationOutput`
buffers as its output.

Pull Request resolved: pytorch#130832
Approved by: https://github.com/lezcano
ghstack dependencies: pytorch#130831
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Resubmit of pytorch#129344

This fixes the DCE issue for attention output

Pull Request resolved: pytorch#130833
Approved by: https://github.com/lezcano
ghstack dependencies: pytorch#130831, pytorch#130832
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
github-actions bot deleted the gh/peterbell10/758/head branch on August 23, 2024.