Skip to content

Conversation

eellison
Copy link
Contributor

@eellison eellison commented Sep 28, 2023

Stack from ghstack (oldest at bottom):

Partially fixes test_memory_format_factory_like_functions_preserve with PYTORCH_TEST_WITH_INDUCTOR. Inductor preserves memory layouts for user-visible outputs as annotated on the fx graph that it is passed in. That graph is generated from running aot_autograd with decompositions. If the decompositions give incorrect strides, so will inductor.

This preserves the layout of _like operators when it corresponds to a torch.memory_format. It doesnt fix a) arbitrary permutations, b) striding of non-dense outputs. Both of these are lower-pri compared to preserving channels last. We would need either #92920 or a to variant that takes in a physical layout arbitrary permutations. I converted the output of rand to the correct layout instead of passing the layout in so that this would compose with the replace_random pass, and because the two pointwise ops will get fused anyway.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 28, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110242

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 3 Pending, 1 Unrelated Failure

As of commit f1ec498 with merge base 06464a3 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
@eellison eellison requested a review from int3 September 29, 2023 15:02
@eellison eellison added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 29, 2023
@eellison eellison requested a review from ezyang September 29, 2023 20:02
Copy link
Contributor

@int3 int3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some questions for my own understanding


Partially fixes `test_memory_format_factory_like_functions_preserve` with PYTORCH_TEST_WITH_INDUCTOR. Inductor preserves memory layouts for user-visible outputs as annotated on the fx graph that it is passed in. That graph is generated from running aot_autograd with decompositions. If the decompositions give incorrect strides, so will inductor. 

This preserves the layout of `_like` operators when it corresponds to a `torch.memory_format`. It doesnt fix a) arbitrary permutations, b) striding of non-dense outputs. Both of these are lower-pri compared to preserving channels last. We would need either #92920 or a `to` variant that takes in a physical layout arbitrary permutations. I converted the output of rand to the correct layout instead of passing the layout in so that this would compose with the `replace_random` pass, and because the two pointwise ops will get fused anyway.




cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]

Partially fixes `test_memory_format_factory_like_functions_preserve` with PYTORCH_TEST_WITH_INDUCTOR. Inductor preserves memory layouts for user-visible outputs as annotated on the fx graph that it is passed in. That graph is generated from running aot_autograd with decompositions. If the decompositions give incorrect strides, so will inductor. 

This preserves the layout of `_like` operators when it corresponds to a `torch.memory_format`. It doesnt fix a) arbitrary permutations, b) striding of non-dense outputs. Both of these are lower-pri compared to preserving channels last. We would need either #92920 or a `to` variant that takes in a physical layout arbitrary permutations. I converted the output of rand to the correct layout instead of passing the layout in so that this would compose with the `replace_random` pass, and because the two pointwise ops will get fused anyway.




cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]

Partially fixes `test_memory_format_factory_like_functions_preserve` with PYTORCH_TEST_WITH_INDUCTOR. Inductor preserves memory layouts for user-visible outputs as annotated on the fx graph that it is passed in. That graph is generated from running aot_autograd with decompositions. If the decompositions give incorrect strides, so will inductor. 

This preserves the layout of `_like` operators when it corresponds to a `torch.memory_format`. It doesnt fix a) arbitrary permutations, b) striding of non-dense outputs. Both of these are lower-pri compared to preserving channels last. We would need either #92920 or a `to` variant that takes in a physical layout arbitrary permutations. I converted the output of rand to the correct layout instead of passing the layout in so that this would compose with the `replace_random` pass, and because the two pointwise ops will get fused anyway.




cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]

Partially fixes `test_memory_format_factory_like_functions_preserve` with PYTORCH_TEST_WITH_INDUCTOR. Inductor preserves memory layouts for user-visible outputs as annotated on the fx graph that it is passed in. That graph is generated from running aot_autograd with decompositions. If the decompositions give incorrect strides, so will inductor. 

This preserves the layout of `_like` operators when it corresponds to a `torch.memory_format`. It doesnt fix a) arbitrary permutations, b) striding of non-dense outputs. Both of these are lower-pri compared to preserving channels last. We would need either #92920 or a `to` variant that takes in a physical layout arbitrary permutations. I converted the output of rand to the correct layout instead of passing the layout in so that this would compose with the `replace_random` pass, and because the two pointwise ops will get fused anyway.




cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]

Partially fixes `test_memory_format_factory_like_functions_preserve` with PYTORCH_TEST_WITH_INDUCTOR. Inductor preserves memory layouts for user-visible outputs as annotated on the fx graph that it is passed in. That graph is generated from running aot_autograd with decompositions. If the decompositions give incorrect strides, so will inductor. 

This preserves the layout of `_like` operators when it corresponds to a `torch.memory_format`. It doesnt fix a) arbitrary permutations, b) striding of non-dense outputs. Both of these are lower-pri compared to preserving channels last. We would need either #92920 or a `to` variant that takes in a physical layout arbitrary permutations. I converted the output of rand to the correct layout instead of passing the layout in so that this would compose with the `replace_random` pass, and because the two pointwise ops will get fused anyway.




cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
@eellison eellison added the topic: not user facing topic category label Oct 2, 2023
@eellison
Copy link
Contributor Author

eellison commented Oct 2, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@ezyang
Copy link
Contributor

ezyang commented Oct 2, 2023

Is it that difficult to also do the permutations? (This doesn't block land). Is it because we don't have copy_permuted? Maybe I will add this 🤔

@eellison
Copy link
Contributor Author

eellison commented Oct 2, 2023

Right, it's because we don't have copy_permuted

@ezyang
Copy link
Contributor

ezyang commented Oct 2, 2023

It's not too bad if you allow mutations (empty_permuted and then copy_) but I don't remember if these decomps have to be functional lol

@eellison
Copy link
Contributor Author

eellison commented Oct 2, 2023

They do have to be functional, yea.. I double checked with @bdhirsh for that reason. We could allow them and add another round of functionalization at the expense of compile time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants