vulkan: perf_logger improvements #17672
jeffbolznv
commented
Dec 2, 2025
- Move perf_logger from device to ctx.
- Add an env var to control how often the stats are dumped. If it is set to a very large value, the stats are dumped only when the ctx is destroyed.
- Add a fusion info string to the tracking, and log only one item per fused op.
- Fix the MUL_MAT_ID flops calculation.
Please resolve the conflict.
9e24ded to faf2b4b
Rebased.
faf2b4b to e3f771b
There's another conflict, but more importantly I'm getting a segfault with Qwen3-Next-80B-A3B-Instruct-Q4_0:
I'm not able to reproduce the crash locally, and I don't see the merge conflict in the UI. The crash sounds a lot like what I fixed in e3f771b, but I can't correlate the line number in your call stack to anything. Are you sure you were on the latest commit?
You are right. I didn't notice that updating the PR branch had failed, so I tested only the first commit. My bad, sorry.
* vulkan: perf_logger improvements
  - Move perf_logger from device to ctx.
  - Add an env var to control the frequency we dump the stats. If you set a very large value, it just dumps when the ctx is destroyed.
  - Add a fusion info string to the tracking, only log one item per fused op.
  - Fix MUL_MAT_ID flops calculation.
* fix vector sizes