remove redundant to_dtype in Fused Schedular Nodes #118365
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118365
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit d40433e with merge base cc7ef43:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 5e545931882744f7956dde1a110e4b588eb90ffb Pull Request resolved: #118365
[ghstack-poisoned]
ghstack-source-id: 3a5c59be4e055b89138160d11d76082a230d17f0 Pull Request resolved: #118365
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased
ghstack-source-id: 8eb24f77ea895cdbe1c8c26c39698fb846c59b88 Pull Request resolved: #118365
ghstack-source-id: c4d1795ac6794efa08f17d009dd764e375ad412c Pull Request resolved: #118365
Fix #115260. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
ghstack-source-id: 082a0003f49294b2068b9af7855ff8cea64ce044 Pull Request resolved: #118365
ghstack-source-id: c47d03d0ac05b02f76a2d369eb3a9dbf309d5601 Pull Request resolved: #118365
Fix #115260. This issue is triggered by `FusedSchedulerNode` cases. We always store the `lowp buffer` to `store_cache`, then load the `lowp buffer` from `store_cache` and convert it to float before the compute ops. Now we also store the `float buffer`, so we can load the `float buffer` from `store_cache` and compute on it directly. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
ghstack-source-id: 0dc3c4ad581f16f853ac202b64ed2b6b99edfac7 Pull Request resolved: #118365
ghstack-source-id: 931c10e82940e96e0c797b01e028d76f31bac3f7 Pull Request resolved: #118365
Fix #115260. This issue is triggered by `FusedSchedulerNode` cases. We always store the `lowp buffer` to `store_cache`, then load the `lowp buffer` from `store_cache` and convert it to float before the compute ops. Now we add an entry `{key: to(float32)_expr, value: the float32 cse var before to_dtype and store}` to `cse.cache`. The `to_dtype(float32)` after the `load` then hits this cache and does not generate a new variable with cast code. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
ghstack-source-id: 48eecd6052ebe5b1b3b05d80689e8bb3f4cac6f7 Pull Request resolved: #118365
@jansel Yes, please review.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: Command
Details for Dev Infra team
Raised by workflow job
ghstack-source-id: 7f256eb8aa4f6aaed628ab2566defd0179ee320f Pull Request resolved: #118365
Fix #115260. This issue is triggered by `FusedSchedulerNode` cases. We always store the `lowp buffer` to `store_cache`, then load the `lowp buffer` from `store_cache` and convert it to float before the compute ops. Now we add an entry `{key: to(float32)_expr, value: the float32 cse var before to_dtype and store}` to `cse.cache`. The `to_dtype(float32)` after the `load` then hits this cache and does not generate a new variable with cast code. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames [ghstack-poisoned]
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
#120243) This reverts commit cc7ef43. Manual revert because of the conflict in: test/inductor/test_cpu_repro.py , conflict with this PR: #118365 Pull Request resolved: #120243 Approved by: https://github.com/malfet, https://github.com/huydhn
Fix #115260.
This issue is triggered by `FusedSchedulerNode` cases. We always store the `lowp buffer` to `store_cache`, then load the `lowp buffer` from `store_cache` and convert it to float before the compute ops.
Now we add an entry `{key: to(float32)_expr, value: the float32 cse var before to_dtype and store}` to `cse.cache`. The `to_dtype(float32)` after the `load` then hits this cache and does not generate a new variable with cast code.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames
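The cache-seeding idea in the description can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Inductor implementation: `ToyCSE`, `to_dtype_expr`, `run`, the `cast<...>` expression strings, and the buffer and pointer names are all hypothetical stand-ins for the real `cse.cache` and `store_cache` machinery.

```python
# Toy model of common-subexpression elimination (CSE) in a code generator.
# Every name here is hypothetical; this is not the Inductor implementation.

class ToyCSE:
    def __init__(self):
        self.cache = {}   # expression string -> variable name
        self.lines = []   # emitted "code" lines
        self._count = 0

    def generate(self, expr):
        """Return the variable holding `expr`, emitting code only on a cache miss."""
        if expr in self.cache:
            return self.cache[expr]
        var = f"tmp{self._count}"
        self._count += 1
        self.lines.append(f"auto {var} = {expr};")
        self.cache[expr] = var
        return var


def to_dtype_expr(var, dtype):
    # Hypothetical textual form of a dtype conversion.
    return f"cast<{dtype}>({var})"


def run(seed_cache):
    cse = ToyCSE()
    store_cache = {}  # buffer name -> variable most recently stored there

    # First node: compute in float32, down-cast, store to a low-precision buffer.
    x = cse.generate("load(in_ptr0)")
    y_f32 = cse.generate(f"{x} + 1.0f")
    y_lowp = cse.generate(to_dtype_expr(y_f32, "bfloat16"))
    store_cache["buf0"] = y_lowp
    if seed_cache:
        # Idea from the description: map the "cast back to float32" expression
        # to the float32 variable that existed before to_dtype and store.
        cse.cache[to_dtype_expr(y_lowp, "float32")] = y_f32

    # Second node: load buf0 (served from store_cache), up-cast, then compute.
    loaded = store_cache["buf0"]
    z_f32 = cse.generate(to_dtype_expr(loaded, "float32"))
    cse.generate(f"{z_f32} * 2.0f")
    return cse.lines


if __name__ == "__main__":
    print("\n".join(run(seed_cache=False)))
    print("---")
    print("\n".join(run(seed_cache=True)))
```

In this toy, the unseeded run emits an extra `cast<float32>` line for the reloaded value, while the seeded run reuses the earlier float32 variable and emits no cast.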