Populate the eviction_policy field for load/store properly #91316
Conversation
This helps with kernels that make use of caching, like mid-range softmax, which reads the data three times. Selecting `eviction_policy=evict_first` in the last loop of the softmax operation seems to give a 7-10% speed-up vs. selecting `evict_last`, which was the previous option. I'll put up some benchmarks soon™.
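As a rough illustration of the idea behind the PR (not Inductor's actual codegen), the policy choice can be sketched as a last-use analysis: every read of a buffer gets `evict_last` while the data will be re-read, and the final read gets `evict_first` so the cached lines can be reclaimed early. The helper name below is hypothetical.

```python
def assign_eviction_policies(reads):
    """Given a sequence of buffer reads (by name), return one eviction
    policy per read: 'evict_last' for reads whose buffer will be read
    again later, 'evict_first' for the final read of each buffer."""
    # Record the index of the last read of each buffer.
    last_read = {}
    for i, buf in enumerate(reads):
        last_read[buf] = i
    return [
        "evict_first" if last_read[buf] == i else "evict_last"
        for i, buf in enumerate(reads)
    ]

# Mid-range softmax reads the input three times (max pass, sum pass,
# normalisation pass); only the third read should be evict_first.
policies = assign_eviction_policies(["x", "x", "x"])
```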
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/91316. Note: links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 2d263b3. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: not merging any PRs at the moment because there is a merge-blocking https://github.com/pytorch/pytorch/labels/ci:%20sev issue open at: Details for Dev Infra team. Raised by workflow job.
Run torchbench? I tried a few ops with this and got mixed results. What was the benchmarking script? Try
Sure, let me try that.
It seems to get a consistent 1% speed-up running the command in #91316 (comment). I have to say the benchmarks are not as stable as I would like them to be, so I ran them 3 times each.
Edit: well, I needed to run it on float16 because some op would complain with a hard error, but that shouldn't change much.
What was the error you encountered?
There's this check that was triggered when I ran that command: pytorch/aten/src/ATen/native/cuda/SoftMax.cu, lines 698 to 701 in f62a3ca.
If |
Yes, I meant |
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Should we attempt to revert this or do a forward fix?
Given |
I am going to be on PTO for the next 3 weeks, so I'd say it's best to revert, and I'll look into this when I'm back. Sorry about that; the benchmarks in #91316 (comment) looked alright!
Note to self: Perhaps we could get some extra speed-up by using |
FWIW, I believe that the performance drop of |
Stack from ghstack (oldest at bottom):

This helps with kernels that make use of caching, like mid-range softmax, which reads the data three times. Selecting `eviction_policy=evict_first` in the last loop of the softmax operation seems to give a 7-10% speed-up vs. selecting `evict_last`, which was the previous option. I'll put up some benchmarks soon™.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @mlazos @soumith @yanboliang @anijain2305 @chunyuan-w @desertfire
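For context on why mid-range softmax reads the input three times, here is a minimal pure-Python sketch of a three-pass softmax (the CPU analogue of the kernel structure discussed above; the GPU kernel operates on blocks, but the pass structure is the same). The third pass is the final read of the input, which is why hinting `evict_first` on its loads can help: the cached lines are no longer needed afterwards.

```python
import math

def three_pass_softmax(x):
    # Pass 1: read x once to find the max (for numerical stability).
    m = max(x)
    # Pass 2: read x again to accumulate the sum of exponentials.
    s = sum(math.exp(v - m) for v in x)
    # Pass 3: read x a third and final time to normalise. Since this is
    # the last use of x, a GPU kernel would mark these loads with
    # eviction_policy='evict_first' so the cache can drop them early.
    return [math.exp(v - m) / s for v in x]
```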