[inductor] correctly generate grid info for benchmark_kernel #118202
Previously, we generated the grid argument for a benchmark TritonKernel from tree.numel. This was incorrect because it did not match the launch config used for profiling and running. This PR fixes the issue by emitting the grid value computed by the kernel's grid_fn, which is also what the profiler and the kernel's runner use.
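To illustrate the mismatch described above, here is a minimal sketch in plain Python. It is not Inductor code: `ceildiv`, `make_grid_fn`, and the `XBLOCK` value are hypothetical stand-ins, assuming a grid function that divides each element count by its block size and rounds up.

```python
def ceildiv(numer: int, denom: int) -> int:
    """Integer ceiling division."""
    return -(-numer // denom)


def make_grid_fn(*numels: int):
    """Hypothetical stand-in for a kernel's grid function.

    Returns a callable that maps a launch config (block sizes) to a
    3-tuple of program counts.
    """
    def grid_fn(meta: dict) -> tuple:
        blocks = ("XBLOCK", "YBLOCK", "ZBLOCK")
        dims = [ceildiv(n, meta[b]) for n, b in zip(numels, blocks)]
        # Pad unused dimensions with 1.
        return tuple(dims + [1] * (3 - len(dims)))
    return grid_fn


numel = 1_000_000
config = {"XBLOCK": 1024}  # assumed launch config, for illustration only

naive_grid = (numel, 1, 1)                 # element count used directly as the grid
launch_grid = make_grid_fn(numel)(config)  # grid derived from the launch config

print(naive_grid, launch_grid)
```

Under these assumptions the two tuples differ, so a benchmark harness that emits the element count would launch a different number of programs than the profiled run.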
Dr. CI: ✅ No failures as of commit 71b31d2 with merge base eebe7e1. Artifacts and rendered test results are at hud.pytorch.org/pr/118202.
Thanks for fixing the grid size computation for the benchmark harness of triton templates.
        
          
torch/_inductor/codegen/triton.py (Outdated)
    if config.benchmark_kernel:
        src_code = f"{kernel.imports_for_benchmark_kernel()}\n{src_code}\n{kernel.codegen_kernel_benchmark().getvalue()}"
        grid_args = [V.graph.sizevars.size_hint(s) for s in kernel.call_sizes]
We can call sizevars.size_hints directly
Fixed. Thanks.
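For readers unfamiliar with the suggestion in this thread, the sketch below shows the general shape of the change: replacing a per-element comprehension with one batch call. `ToySizeVars` is a hypothetical toy class written for this example, not Inductor's size-variable allocator, and the symbol names and values are made up.

```python
class ToySizeVars:
    """Hypothetical stand-in for a size-variable allocator (not Inductor's class)."""

    def __init__(self, hints: dict):
        self._hints = hints

    def size_hint(self, expr) -> int:
        # Concrete ints pass through; symbol names are looked up.
        return expr if isinstance(expr, int) else self._hints[expr]

    def size_hints(self, exprs) -> tuple:
        # Batch form: one call instead of a comprehension at each call site.
        return tuple(self.size_hint(e) for e in exprs)


sizevars = ToySizeVars({"s0": 128, "s1": 64})
call_sizes = ["s0", "s1", 8]

per_item = [sizevars.size_hint(s) for s in call_sizes]  # original style
batched = sizevars.size_hints(call_sizes)               # suggested style

print(per_item, batched)
```

Both forms yield the same values in this toy; the batch call only removes the repeated comprehension.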
    with open(compiled_module.__file__) as f:
        source_code = f.read()
    FileCheck().check_count(
        "grid=(608, 1, 1)",
I suspect this test could be quite flaky, since the grid size depends on the config we pick for the Triton matmul template, which is decided by the autotuning result.
We could either read the block size back from the generated wrapper code and use it to compute the grid, or mock the config list down to a single config so we know for sure what the grid size should be.
Yeah, agreed. Fixed. Thanks.
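The first option raised in this thread (reading the block sizes back and computing the expected grid instead of hard-coding it) could look roughly like the sketch below. Everything here is an assumption made for illustration: the `BLOCK_M = <int>` / `BLOCK_N = <int>` text format, the helper names, the problem sizes, and the one-program-per-output-tile grid formula. It does not show what the final test in this PR does.

```python
import re


def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return (a + b - 1) // b


def expected_mm_grid(source: str, m: int, n: int) -> tuple:
    """Derive the expected grid from block sizes found in `source`.

    Assumes (hypothetically) that the generated code contains literal
    `BLOCK_M = <int>` and `BLOCK_N = <int>` assignments and that the
    template launches one program per output tile.
    """
    block_m = int(re.search(r"BLOCK_M\s*=\s*(\d+)", source).group(1))
    block_n = int(re.search(r"BLOCK_N\s*=\s*(\d+)", source).group(1))
    return (cdiv(m, block_m) * cdiv(n, block_n), 1, 1)


# Made-up snippet standing in for the generated module's source.
fake_source = "BLOCK_M = 64\nBLOCK_N = 32\n... grid=(40, 1, 1) ..."

grid = expected_mm_grid(fake_source, m=300, n=250)
expected_line = f"grid={grid}"

# The check now follows whichever config was chosen instead of a fixed literal.
assert expected_line in fake_source
```

The second option, pinning the config list to a single entry, avoids parsing entirely but requires a hook for overriding the autotuning choices.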
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @kadeng @muchulee8 @aakhundov @ColinPeppler