[Profiler] Provide a method to profile Triton XPU Kernel's accuracy execution time. #1066

Open
chengjunlu opened this issue May 8, 2024 · 2 comments · Fixed by #1136
Assignees
Labels
enhancement New feature or request performance

Comments

@chengjunlu
Contributor

There is currently no standalone profiling tool for Triton XPU.

So far we have relied on:

  1. The legacy Torch profiler with the IPEX extension. (IPEX is going to remove this.)
  2. The new Torch profiler with Kineto extended by IPEX. (This depends on Kineto and Torch.)
  3. Host-side synchronization waits to measure performance. (This is inaccurate because the measurement includes host overhead.)
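For illustration, the third approach can be sketched roughly as below. This is a minimal sketch, not the code used in the project: `time_with_host_sync`, `fake_launch`, and `fake_synchronize` are hypothetical names, and the stand-ins use `time.sleep` so the snippet runs without a device. In a real setup `launch` would enqueue the Triton kernel and `synchronize` would block until the device is idle.

```python
import statistics
import time


def time_with_host_sync(launch, synchronize, warmup=3, reps=10):
    """Return per-call wall-clock times in milliseconds, measured on the host.

    Each sample covers launch overhead, kernel execution, and the
    synchronization wait, so it overstates the pure kernel time.
    """
    for _ in range(warmup):
        launch()
    synchronize()  # drain warm-up work before timing starts

    samples = []
    for _ in range(reps):
        start = time.perf_counter()
        launch()
        synchronize()  # host blocks until the (simulated) device is idle
        samples.append((time.perf_counter() - start) * 1e3)
    return samples


# Hypothetical stand-ins so the sketch is runnable without an XPU.
def fake_launch():
    time.sleep(0.002)  # pretend: host launch overhead plus kernel run time


def fake_synchronize():
    pass  # a real implementation would wait for the device queue here


samples = time_with_host_sync(fake_launch, fake_synchronize)
median_ms = statistics.median(samples)
print(f"median host-side time: {median_ms:.3f} ms over {len(samples)} runs")
```

Because the timer starts and stops on the host, there is no way to separate the kernel's own execution time from the surrounding overhead in these samples, which is the inaccuracy noted in item 3.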

Triton has a new component, Proton, for profiling the performance of Triton kernels. It is worth supporting on Triton XPU.

@vlad-penkin vlad-penkin added enhancement New feature or request performance labels May 8, 2024
@tdeng5 tdeng5 changed the title [Profiler] Support Triton XPU kernel profiling thru the Proton [Profiler] Provide an accuracy method to profile Triton XPU Kernel's execution time. May 16, 2024
@tdeng5

tdeng5 commented May 16, 2024

Collecting accurate Triton performance data is the highest priority for the upcoming Triton demo on Jun 25.

@tdeng5 tdeng5 changed the title [Profiler] Provide an accuracy method to profile Triton XPU Kernel's execution time. [Profiler] Provide a method to profile Triton XPU Kernel's accuracy execution time. May 16, 2024
@etiotto etiotto reopened this May 17, 2024
@etiotto
Contributor

etiotto commented May 17, 2024

I have added post review comments to the PR that closed this issue, see #1136 (comment).

I am concerned that the benchmarks compute timing differently from the way Triton's do_bench does.
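The thread does not spell out how either side computes its timing, so the following is only a generic illustration of why the method matters: even with identical raw samples, two harnesses that summarize them differently report different numbers. The sample values and the `summarize` helper are made up for this sketch.

```python
import statistics


def summarize(samples_ms):
    """Summarize the same timing samples with three common statistics."""
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "min": min(samples_ms),
    }


# Made-up per-run times in milliseconds, with one slow outlier run.
samples_ms = [1.0, 1.1, 1.0, 1.2, 5.0]
summary = summarize(samples_ms)

for name, value in summary.items():
    print(f"{name}: {value:.2f} ms")
```

Here the single outlier pulls the mean well above the median and the minimum, so results from harnesses that aggregate differently are not directly comparable.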
