New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Decomposition] Trunc #109319
[Decomposition] Trunc #109319
Conversation
This pull request was exported from Phabricator. Differential Revision: D49042033 |
This PR needs a
|
Summary: bypass-github-pytorch-ci-checks Add Decomp for Trunc and add it to core_aten_decompositions Test Plan: Phabricator + OSS Tests Differential Revision: D49042033
5391170
to
bc8772d
Compare
This pull request was exported from Phabricator. Differential Revision: D49042033 |
Summary: bypass-github-pytorch-ci-checks Add Decomp for Trunc and add it to core_aten_decompositions Test Plan: Phabricator + OSS Tests Differential Revision: D49042033
bc8772d
to
9a54627
Compare
This pull request was exported from Phabricator. Differential Revision: D49042033 |
Summary: bypass-github-export-checks Add Decomp for Trunc and add it to core_aten_decompositions Test Plan: Phabricator + OSS Tests Differential Revision: D49042033
9a54627
to
169a7ce
Compare
This pull request was exported from Phabricator. Differential Revision: D49042033 |
Summary: bypass-github-pytorch-ci-checks Add Decomp for Trunc and add it to core_aten_decompositions Test Plan: Phabricator + OSS Tests Differential Revision: D49042033
169a7ce
to
01c2185
Compare
This pull request was exported from Phabricator. Differential Revision: D49042033 |
Summary: bypass-github-pytorch-ci-checks Add Decomp for Trunc and add it to core_aten_decompositions Test Plan: Phabricator + OSS Tests Differential Revision: D49042033
01c2185
to
f3c7343
Compare
This pull request was exported from Phabricator. Differential Revision: D49042033 |
Summary: bypass-github-pytorch-ci-checks Add Decomp for Trunc and add it to core_aten_decompositions Test Plan: Phabricator + OSS Tests Differential Revision: D49042033
f3c7343
to
0b70643
Compare
This pull request was exported from Phabricator. Differential Revision: D49042033 |
Summary: bypass-github-pytorch-ci-checks Add Decomp for Trunc and add it to core_aten_decompositions Test Plan: Phabricator + OSS Tests Reviewed By: SS-JIA, kirklandsign Differential Revision: D49042033
Summary: bypass-github-pytorch-ci-checks Add Decomp for Trunc and add it to core_aten_decompositions Test Plan: Phabricator + OSS Tests Reviewed By: SS-JIA, kirklandsign Differential Revision: D49042033
0b70643
to
fb5c01a
Compare
This pull request was exported from Phabricator. Differential Revision: D49042033 |
fb5c01a
to
5f743af
Compare
This pull request was exported from Phabricator. Differential Revision: D49042033 |
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
@@ -4020,6 +4020,11 @@ def scaled_dot_product_flash_attention( | |||
) | |||
|
|||
|
|||
@register_decomposition([aten.trunc]) | |||
def trunc(self: Tensor, **kwargs) -> Tensor: | |||
return torch.where(self > 0, torch.floor(self), torch.ceil(self)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we decomposing what is likely a hardware intrinsic operation into multiple operations? It seems to me that just because you can doesn't mean you should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peterbell10 makes sense. We will remove this decomp and promote this operator to core, and provide similar treatment to other operators that map cleanly to hardware intrinsics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peterbell10 to be sure I understand. If we were to do this decomp here it results in 1) perf loss and 2) potential numerics mismatch. Perf loss should be recoverable by pattern matching + fusion. 2 is a tough one and for that I agree with you that we should probably make this core aten op. But would you agree with me on 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also be asking how many consumers do you expect would want to "re-compose" this operator. I suspect since this is a hardware intrinsic, most backends would want it to be specialized but now they have to pattern match it back into trunc first. So in the most common case I expect this to create more work, not save work.
On the other hand if you could show that most backends had a hard time implementing trunc and this would save a lot of maintenance then I could see the performance trade-off being worth it.
Following up from [this comment](#109319 (comment)). Remove the decomposition for `trunc`, and add it as a core operator. Going forward, provide similar treatment for operators that map cleanly to hardware instructions. [ghstack-poisoned]
Following up from [this comment](#109319 (comment)). Remove the decomposition for `trunc`, and add it as a core operator. Going forward, provide similar treatment for operators that map cleanly to hardware instructions. [ghstack-poisoned]
…c to core" Following up from [this comment](#109319 (comment)). Remove the decomposition for `trunc`, and add it as a core operator. Going forward, provide similar treatment for operators that map cleanly to hardware instructions. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
Following up from [this comment](#109319 (comment)). Remove the decomposition for `trunc`, and add it as a core operator. Going forward, provide similar treatment for operators that map cleanly to hardware instructions. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
Following up from [this comment](#109319 (comment)). Remove the decomposition for `trunc`, and add it as a core operator. Going forward, provide similar treatment for operators that map cleanly to hardware instructions. Pull Request resolved: #109902 Approved by: https://github.com/peterbell10
Following up from [this comment](pytorch#109319 (comment)). Remove the decomposition for `trunc`, and add it as a core operator. Going forward, provide similar treatment for operators that map cleanly to hardware instructions. Pull Request resolved: pytorch#109902 Approved by: https://github.com/peterbell10
Summary:
Add Decomp for Trunc and add it to core_aten_decompositions
Differential Revision: D49042033
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov