
remove redundant to_dtype in Fused Schedular Nodes #118365

Closed
wants to merge 15 commits

Conversation

@zhuhaozhe (Contributor) commented Jan 26, 2024

Fix #115260.
This issue is triggered by `FusedSchedulerNode` cases.
Previously we always stored the low-precision (`lowp`) buffer to `store_cache`, then loaded the `lowp` buffer back from `store_cache` and converted it to float before the compute ops.
Now we generate an entry `{key: the to(float32) expression, value: the float32 CSE variable that existed before the to_dtype and store}` in `cse.cache`.
The `to_dtype(float32)` emitted after the `load` then hits this cache, so no new variable with cast code is generated.
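To make the idea concrete, here is a minimal, self-contained sketch of the caching mechanism (hypothetical names and expression strings, not the actual torch/_inductor/codegen/cpp.py implementation): when the float32 value is downcast and stored, the expression that a later load-plus-upcast would produce is pre-registered in the CSE cache and mapped to the existing float32 variable, so the later `to_dtype(float32)` becomes a cache hit instead of a fresh cast.

```python
# Simplified sketch of CSE-cache reuse (hypothetical names, not the real
# inductor implementation).

class CSE:
    def __init__(self):
        self.cache = {}        # expression string -> emitted variable name
        self._counter = 0

    def generate(self, expr: str) -> str:
        # Reuse an existing variable when the same expression was seen before.
        if expr in self.cache:
            return self.cache[expr]
        var = f"tmp{self._counter}"
        self._counter += 1
        self.cache[expr] = var
        print(f"auto {var} = {expr};")   # stand-in for the emitted C++ line
        return var


cse = CSE()

# Producer node: compute in float32, downcast, and store the bf16 buffer.
acc_fp32 = cse.generate("tmp_a + tmp_b")            # float32 result
lowp = cse.generate(f"to_bfloat16({acc_fp32})")     # cast emitted before the store

# Before the store, pre-populate the cache: the expression that a later
# load + to_dtype(float32) would generate maps back to the float32 variable.
cse.cache["to_float32(load(buf0))"] = acc_fp32

# Fused consumer node: loads buf0 and upcasts to float32 for its compute.
reloaded = cse.generate("to_float32(load(buf0))")
assert reloaded == acc_fp32   # cache hit: no redundant cast is emitted
```

The PR does the equivalent at store time in the CPP backend, so the fused consumer's `to_dtype(float32)` reuses the existing float32 variable rather than generating new cast code.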

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames

@pytorch-bot (bot) commented Jan 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118365

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit d40433e with merge base cc7ef43:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zhuhaozhe added a commit that referenced this pull request Jan 26, 2024
ghstack-source-id: 5e545931882744f7956dde1a110e4b588eb90ffb
Pull Request resolved: #118365
zhuhaozhe added a commit that referenced this pull request Jan 26, 2024
ghstack-source-id: 3a5c59be4e055b89138160d11d76082a230d17f0
Pull Request resolved: #118365
@zhuhaozhe added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Jan 26, 2024
@zhuhaozhe (Contributor Author)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator)

Successfully rebased gh/zhuhaozhe/9/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/118365)

pytorchmergebot pushed a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 8eb24f77ea895cdbe1c8c26c39698fb846c59b88
Pull Request resolved: #118365
@zhuhaozhe marked this pull request as draft on January 31, 2024 04:59
zhuhaozhe added a commit that referenced this pull request Jan 31, 2024
ghstack-source-id: c4d1795ac6794efa08f17d009dd764e375ad412c
Pull Request resolved: #118365
zhuhaozhe added a commit that referenced this pull request Jan 31, 2024
ghstack-source-id: 082a0003f49294b2068b9af7855ff8cea64ce044
Pull Request resolved: #118365
zhuhaozhe added a commit that referenced this pull request Jan 31, 2024
ghstack-source-id: c47d03d0ac05b02f76a2d369eb3a9dbf309d5601
Pull Request resolved: #118365
Fix #115260.
This issue is triggered by `FusedSchedulerNode` cases.
We always store the `lowp buffer` to `store_cache`, then load the `lowp buffer` back from `store_cache` and convert it to float before the compute ops.
Now we also store a `float buffer`, so we can load the `float buffer` from `store_cache` and compute on it directly (see the sketch below).
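As a rough illustration of this earlier approach (hypothetical names, not the actual `store_cache` API in inductor): keep a float32 copy alongside the low-precision value at store time, so a fused consumer can read the float32 value directly and skip the upcast.

```python
# Hypothetical sketch: cache both the low-precision and the float32 value
# at store time, keyed by buffer name.
store_cache = {}

def to_float32(x):
    # Stand-in for the upcast that would otherwise be emitted on every load.
    return float(x)

def store(buf_name, lowp_val, fp32_val):
    store_cache[buf_name] = {"lowp": lowp_val, "fp32": fp32_val}

def load_for_compute(buf_name):
    entry = store_cache[buf_name]
    # Prefer the float32 copy so the fused consumer skips the cast entirely.
    return entry["fp32"] if "fp32" in entry else to_float32(entry["lowp"])

store("buf0", lowp_val=1.0, fp32_val=1.0)
assert load_for_compute("buf0") == 1.0
```

Later revisions of this PR replace this buffer-level caching with the `cse.cache` entry described in the summary at the top.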





@zhuhaozhe requested a review from jgong5 on January 31, 2024 07:53
zhuhaozhe added a commit that referenced this pull request Feb 2, 2024
ghstack-source-id: 0dc3c4ad581f16f853ac202b64ed2b6b99edfac7
Pull Request resolved: #118365
zhuhaozhe added a commit that referenced this pull request Feb 2, 2024
ghstack-source-id: 931c10e82940e96e0c797b01e028d76f31bac3f7
Pull Request resolved: #118365
Resolved review threads on test/inductor/test_cpu_repro.py and torch/_inductor/codegen/cpp.py
zhuhaozhe added a commit that referenced this pull request Feb 4, 2024
ghstack-source-id: 48eecd6052ebe5b1b3b05d80689e8bb3f4cac6f7
Pull Request resolved: #118365
@jgong5 marked this pull request as ready for review on February 19, 2024 06:01
@jgong5 changed the title from "remove rebundant to_dtype in Fused Schedular Nodes" to "remove redundant to_dtype in Fused Schedular Nodes" on Feb 19, 2024
@jgong5 (Collaborator) commented Feb 19, 2024

Is this ready for review? Still marked as draft.

@jansel Yes, please review.

@zhuhaozhe added the topic: not user facing label (topic category) on Feb 20, 2024
@zhuhaozhe (Contributor Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch rebase origin/main returned non-zero exit code 1

Rebasing (1/1)
Auto-merging test/inductor/test_cpu_repro.py
CONFLICT (content): Merge conflict in test/inductor/test_cpu_repro.py
Auto-merging torch/_inductor/codegen/cpp.py
error: could not apply 443f30899f6... remove redundant to_dtype in Fused Schedular Nodes (#118365)
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 443f30899f6... remove redundant to_dtype in Fused Schedular Nodes (#118365)
Details for Dev Infra team (raised by workflow job)

zhuhaozhe added a commit that referenced this pull request Feb 20, 2024
ghstack-source-id: 7f256eb8aa4f6aaed628ab2566defd0179ee320f
Pull Request resolved: #118365
@zhuhaozhe (Contributor Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@zhuhaozhe (Contributor Author)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator)

Tried to rebase and push PR #118365, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@zhuhaozhe (Contributor Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Feb 20, 2024 (#120243)

This reverts commit cc7ef43.

Manual revert because of a conflict in test/inductor/test_cpu_repro.py with this PR: #118365

Pull Request resolved: #120243
Approved by: https://github.com/malfet, https://github.com/huydhn
@github-actions (bot) deleted the gh/zhuhaozhe/9/head branch on March 22, 2024 01:53