Skip to content

Conversation

@chenyang78
Copy link
Contributor

@chenyang78 chenyang78 commented Jan 24, 2024

Stack from ghstack (oldest at bottom):

Previously, we generated the grid argument with tree.numel for
a benchmark TritonKernel. This was not correct, because it
didn't match the launch config used for profiling and running.

This PR fixed the issue by emitting the grid value computed
by the kernel's grid_fn, which is used by the profiler and
the kernel's runner.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @kadeng @muchulee8 @aakhundov @ColinPeppler

Previously, we generated the grid argument with tree.numel for
a benchmark TritonKernel. This was not correct, because it
didn't match the launch config used for profiling and running.

This PR fixed the issue by emitting the grid value computed
by the kernel's grid_fn, which is used by the profiler and
the kernel's runner.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118202

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 71b31d2 with merge base eebe7e1 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

chenyang78 added a commit that referenced this pull request Jan 24, 2024
Previously, we generated the grid argument with tree.numel for
a benchmark TritonKernel. This was not correct, because it
didn't match the launch config used for profiling and running.

This PR fixed the issue by emitting the grid value computed
by the kernel's grid_fn, which is used by the profiler and
the kernel's runner.

ghstack-source-id: 5473036
Pull Request resolved: #118202
@chenyang78 chenyang78 added the topic: not user facing topic category label Jan 24, 2024
Copy link
Contributor

@shunting314 shunting314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the grid size computation for the benchmark harness of triton templates.


if config.benchmark_kernel:
src_code = f"{kernel.imports_for_benchmark_kernel()}\n{src_code}\n{kernel.codegen_kernel_benchmark().getvalue()}"
grid_args = [V.graph.sizevars.size_hint(s) for s in kernel.call_sizes]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call sizevars.size_hints directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks.

with open(compiled_module.__file__) as f:
source_code = f.read()
FileCheck().check_count(
"grid=(608, 1, 1)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet this test is potentially quite flaky since the grid size depends on the config we pick for triton matmul template which is decided by the autotuning result.

We probably can either read back the block size from the generated wrapper code and use that to compute the grid, or mock the config list to a single config so we know for sure what grid size should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. Fixed. Thanks.

Previously, we generated the grid argument with tree.numel for
a benchmark TritonKernel. This was not correct, because it
didn't match the launch config used for profiling and running.

This PR fixed the issue by emitting the grid value computed
by the kernel's grid_fn, which is used by the profiler and
the kernel's runner.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 kadeng muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
chenyang78 added a commit that referenced this pull request Jan 25, 2024
Previously, we generated the grid argument with tree.numel for
a benchmark TritonKernel. This was not correct, because it
didn't match the launch config used for profiling and running.

This PR fixed the issue by emitting the grid value computed
by the kernel's grid_fn, which is used by the profiler and
the kernel's runner.

ghstack-source-id: eb0a5e2
Pull Request resolved: #118202
@chenyang78
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 25, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants