

@jbschlosser (Contributor) commented Jan 19, 2024

Stack from ghstack (oldest at bottom):

Fixes #117794

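The fix tripped the assert here: https://github.com/pytorch/pytorch/blob/86dedebeafdd7b08d21432cebd7538437d3b7509/torch/utils/_python_dispatch.py#L216

`assert sub.stride() == outer_stride, \`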

From investigation, I found that functionalization of an in-place op (`mul_` in this test case) causes the strides of `TwoTensor`'s `a` / `b` components to be mutated to be contiguous. This change is not reflected in the outer tensor's strides, so the assert trips.
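
To make the failure mode concrete, here is a toy model in plain Python; it does not use PyTorch. `FakeInner`, `FakeTwoTensor`, `functional_mul` and `unflatten_checked` are hypothetical stand-ins: the wrapper copies its stride from component `a` when it is built, the functionalized `mul_` is assumed to return contiguous components (as observed above), and the final check mirrors the intent of the `sub.stride() == outer_stride` assert.

```python
from dataclasses import dataclass
from typing import Tuple


def contiguous_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    # Row-major strides for a dense tensor of the given shape.
    strides = []
    acc = 1
    for size in reversed(shape):
        strides.append(acc)
        acc *= max(size, 1)
    return tuple(reversed(strides))


@dataclass
class FakeInner:
    # Hypothetical stand-in for an inner tensor: metadata only, no data.
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]


class FakeTwoTensor:
    # Hypothetical stand-in for TwoTensor: outer metadata is copied from `a`
    # once, at construction time.
    def __init__(self, a: FakeInner, b: FakeInner):
        self.a, self.b = a, b
        self.shape = a.shape
        self.stride = a.stride


def functional_mul(t: FakeInner) -> FakeInner:
    # Assumption (from the observation in this PR): the functionalized
    # replacement for mul_ yields a contiguous result.
    return FakeInner(t.shape, contiguous_strides(t.shape))


def unflatten_checked(a, b, outer_shape, outer_stride) -> FakeTwoTensor:
    # Rebuild the wrapper from new inner tensors and check it against the
    # outer metadata the caller expects (mirrors the assert's intent).
    sub = FakeTwoTensor(a, b)
    if sub.shape != outer_shape:
        raise AssertionError(f"{sub.shape} != {outer_shape}")
    if sub.stride != outer_stride:
        raise AssertionError(f"{sub.stride} != {outer_stride}")
    return sub


# A transposed (non-contiguous) 4x3 view with strides (1, 4).
x = FakeTwoTensor(FakeInner((4, 3), (1, 4)), FakeInner((4, 3), (1, 4)))
new_a, new_b = functional_mul(x.a), functional_mul(x.b)
try:
    unflatten_checked(new_a, new_b, x.shape, x.stride)
    tripped = False
except AssertionError:
    tripped = True
```

With the transposed input, the rebuilt wrapper reports strides `(3, 1)` while the caller still expects `(1, 4)`, so `tripped` ends up `True`.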

After discussion with Brian, I address this in this PR by disallowing input mutations on non-contiguous tensor subclass inputs for now.
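
A rough sketch of the kind of guard described above, again in plain Python with hypothetical names (`InputMeta`, `is_contiguous`, `check_input_mutations`); it is not the code from this PR, and the choice of `RuntimeError` is an assumption. Contiguity is decided the usual way: strides are compared against row-major strides, size-1 dimensions may have any stride, and empty tensors count as contiguous.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def is_contiguous(shape: Sequence[int], stride: Sequence[int]) -> bool:
    # Row-major contiguity rule: empty tensors are contiguous and
    # size-1 dimensions are skipped.
    if any(size == 0 for size in shape):
        return True
    expected = 1
    for size, st in zip(reversed(shape), reversed(stride)):
        if size != 1:
            if st != expected:
                return False
            expected *= size
    return True


@dataclass
class InputMeta:
    # Hypothetical per-input metadata a tracing frontend might collect.
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]
    is_subclass: bool
    mutated: bool


def check_input_mutations(inputs: List[InputMeta]) -> None:
    # Reject in-place mutation of non-contiguous tensor subclass inputs;
    # plain tensors and contiguous subclass inputs pass through.
    for i, meta in enumerate(inputs):
        if meta.is_subclass and meta.mutated and not is_contiguous(meta.shape, meta.stride):
            raise RuntimeError(
                f"input {i}: mutations on non-contiguous tensor subclass inputs are not supported"
            )


# Usage: a contiguous subclass input may be mutated; a transposed one may not.
check_input_mutations([InputMeta((3, 4), (4, 1), is_subclass=True, mutated=True)])
try:
    check_input_mutations([InputMeta((4, 3), (1, 4), is_subclass=True, mutated=True)])
    rejected = False
except RuntimeError:
    rejected = True
```

Restricting the check to subclass inputs keeps ordinary tensors unaffected, matching the narrow "for now" scope of the change.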

Differential Revision: D54214617

pytorch-bot[bot] commented Jan 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117860

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 2ff2303 with merge base 26fbbc3:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jbschlosser added a commit that referenced this pull request Jan 19, 2024
ghstack-source-id: 42ac01f
Pull Request resolved: #117860
@jbschlosser jbschlosser marked this pull request as draft January 19, 2024 16:17
@jbschlosser jbschlosser requested a review from bdhirsh January 19, 2024 16:17
Fixes #117794

TODO: investigate why the fix trips the assert mentioned in the issue

[ghstack-poisoned]
jbschlosser added a commit that referenced this pull request Jan 22, 2024
ghstack-source-id: 6537276
Pull Request resolved: #117860
jbschlosser added a commit that referenced this pull request Feb 8, 2024
ghstack-source-id: e512baf
Pull Request resolved: #117860
jbschlosser added a commit that referenced this pull request Feb 13, 2024
ghstack-source-id: 8982014
Pull Request resolved: #117860
jbschlosser added a commit that referenced this pull request Feb 14, 2024
ghstack-source-id: 987a79b
Pull Request resolved: #117860
Fixes #117794

The fix tripped the assert here: https://github.com/pytorch/pytorch/blob/86dedebeafdd7b08d21432cebd7538437d3b7509/torch/utils/_python_dispatch.py#L216

From investigation: I found that functionalization of an in-place op (`mul_` in this test case) results in the strides of `TwoTensor`'s `a` / `b` components being mutated to be contiguous. This is not reflected in the outer tensor, causing the assert to be tripped.

To address this, I set the `dispatch_sizes_strides_policy` of `TwoTensor` so that sizes / strides / storage offset are always queried from the underlying components. Now the stride mutation of the inner tensors due to functionalization is properly reflected in the outer tensor.

This seems like a broader issue affecting subclasses whose sizes / strides are dependent on those of the inner tensors; for these to work properly with functionalization / PT2 as a whole, they probably want to follow this `dispatch_sizes_strides_policy="sizes"` pattern.
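
As a plain-Python illustration of this revision's idea (later dropped in favor of disallowing the mutation), the sketch below contrasts caching outer strides with querying them from the inner component on every call. `CachedWrapper`, `DelegatingWrapper` and `replace_inner_with_contiguous` are hypothetical. In PyTorch the policy is passed to `torch.Tensor._make_wrapper_subclass`, and size / stride queries are then routed through the subclass's `__torch_dispatch__`; this sketch does not model that machinery.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakeInner:
    # Hypothetical stand-in for an inner tensor: metadata only.
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]


class CachedWrapper:
    # Outer stride captured once at construction (roughly the default,
    # where the outer tensor stores its own metadata).
    def __init__(self, a: FakeInner):
        self.a = a
        self._stride = a.stride

    def stride(self) -> Tuple[int, ...]:
        return self._stride


class DelegatingWrapper:
    # Outer stride looked up from the inner component on every query,
    # analogous in spirit to dispatch_sizes_strides_policy="sizes".
    def __init__(self, a: FakeInner):
        self.a = a

    def stride(self) -> Tuple[int, ...]:
        return self.a.stride


def replace_inner_with_contiguous(w) -> None:
    # Stand-in for functionalization making the (2-D) inner tensor contiguous.
    rows, cols = w.a.shape
    w.a = FakeInner((rows, cols), (cols, 1))


cached = CachedWrapper(FakeInner((4, 3), (1, 4)))
delegating = DelegatingWrapper(FakeInner((4, 3), (1, 4)))
for w in (cached, delegating):
    replace_inner_with_contiguous(w)
```

After the inner swap, `cached.stride()` still reports the stale `(1, 4)`, while `delegating.stride()` reports `(3, 1)` and stays consistent with its component.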

[ghstack-poisoned]
@jbschlosser jbschlosser requested a review from Chillee as a code owner February 16, 2024 23:20
jbschlosser added a commit that referenced this pull request Feb 16, 2024
ghstack-source-id: f37c442
Pull Request resolved: #117860
@jbschlosser jbschlosser requested a review from bdhirsh February 16, 2024 23:21
Fixes #117794

The fix tripped the assert here: https://github.com/pytorch/pytorch/blob/86dedebeafdd7b08d21432cebd7538437d3b7509/torch/utils/_python_dispatch.py#L216

From investigation: I found that functionalization of an in-place op (`mul_` in this test case) results in the strides of `TwoTensor`'s `a` / `b` components being mutated to be contiguous. This is not reflected in the outer tensor, causing the assert to be tripped.

After discussion with Brian, I address this by disallowing input mutations on non-contiguous tensor subclass inputs for now.

[ghstack-poisoned]
@jbschlosser jbschlosser requested a review from bdhirsh February 21, 2024 16:59
@bdhirsh (Contributor) left a comment

lgtm!

@jbschlosser (Contributor Author)

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 21, 2024
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud


Failing merge rule: Core Maintainers

@jbschlosser (Contributor Author)

@pytorchbot merge -f "ignore pre-existing failure"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


@davidberard98 (Contributor)

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@github-actions github-actions bot deleted the gh/jbschlosser/117/head branch March 28, 2024 01:52

Labels: ciflow/inductor, ciflow/trunk, Merged, topic: not user facing


4 participants