
Conversation

yf225
Contributor

@yf225 yf225 commented Aug 18, 2025

Common benchmark suites like TritonBench use `triton.testing.do_bench` for kernel timing measurement, which is not always fair to all backends. For example, it includes torch.compile's Dynamo invocation overhead and hence doesn't reflect real-world model use cases, where Dynamo overhead is usually hidden.

I also opened a PR to use this timing measurement function on the TritonBench side: meta-pytorch/tritonbench#333. But regardless of whether that PR can land, I think we should enhance Inductor's benchmark_gpu to match do_bench's features, to make it easier for people to migrate.
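To make the distinction concrete, here is a minimal, hedged sketch of a do_bench-style timing loop (warmup, then median over repetitions). It uses the CPU wall clock from the standard library rather than the actual `torch.cuda.Event`-based benchmarker this PR touches, so that it runs without a GPU; the function name and parameters are illustrative, not the Inductor API.

```python
import statistics
import time

def benchmark_ms(fn, warmup_ms=25.0, rep_ms=100.0):
    """do_bench-style timer sketch: warm up first so one-time overhead
    (analogous to Dynamo invocation cost) is excluded, then report the
    median per-call time in milliseconds."""
    # One untimed call to estimate per-iteration cost and trigger lazy init.
    start = time.perf_counter()
    fn()
    estimate_ms = max((time.perf_counter() - start) * 1e3, 1e-6)

    # Derive iteration counts from the time budgets, capped to stay cheap.
    n_warmup = max(1, min(1000, int(warmup_ms / estimate_ms)))
    n_repeat = max(1, min(1000, int(rep_ms / estimate_ms)))

    for _ in range(n_warmup):
        fn()

    times_ms = []
    for _ in range(n_repeat):
        t0 = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times_ms)
```

The real implementations use CUDA events and stream synchronization instead of `time.perf_counter`, but the warmup/repeat/median structure is the feature set being matched here.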

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

@yf225 yf225 requested review from eellison and BoyuanFeng August 18, 2025 22:13

pytorch-bot bot commented Aug 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160921

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 75bc584 with merge base 82c7a1e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@yf225 yf225 force-pushed the benchmarker_compat_with_do_bench branch 2 times, most recently from de03957 to 93e10f3 Compare August 18, 2025 22:41
@@ -183,7 +183,7 @@ def L2_cache_size(self: Self) -> int:

     def get_event_pairs(
         self: Self, iters: int
-    ) -> list[tuple[torch.cuda.Event, torch.cuda.Event]]:
+    ) -> List[tuple[torch.cuda.Event, torch.cuda.Event]]:
Contributor
please use list instead of List
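The reviewer's point: since Python 3.9 (PEP 585), builtin containers like `list` and `tuple` are subscriptable in annotations, so `typing.List` is redundant. A tiny illustration, using `int` stand-ins for `torch.cuda.Event` so no GPU is assumed (the function body here is hypothetical, not the PR's code):

```python
def get_event_pairs(iters: int) -> list[tuple[int, int]]:
    # Builtin generic `list[...]` works directly; no `from typing import List`.
    # Real code pairs torch.cuda.Event objects; ints are placeholders here.
    return [(i, i + 1) for i in range(iters)]
```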

@yf225 yf225 added the topic: not user facing topic category label Aug 18, 2025
@yf225 yf225 force-pushed the benchmarker_compat_with_do_bench branch from 93e10f3 to d3d38c9 Compare August 18, 2025 22:48
@yf225 yf225 force-pushed the benchmarker_compat_with_do_bench branch from d3d38c9 to 75bc584 Compare August 18, 2025 23:20
@yf225
Contributor Author

yf225 commented Aug 18, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 18, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request Aug 22, 2025
…ch#160921)

Pull Request resolved: pytorch#160921
Approved by: https://github.com/BoyuanFeng