Conversation

@jeffbolznv (Collaborator) commented:

  • Move perf_logger from device to ctx.
  • Add an env var to control how often the stats are dumped; setting a very large value effectively means the stats are only dumped when the ctx is destroyed (see the first sketch below).
  • Add a fusion info string to the tracking, and only log one item per fused op.
  • Fix the MUL_MAT_ID flops calculation (see the second sketch below).
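A minimal sketch of how such an env-var gate could work, assuming a hypothetical variable name and default (the actual name and plumbing are in ggml/src/ggml-vulkan/ggml-vulkan.cpp):

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch, not the actual ggml code: read a dump frequency from
// the environment. With a very large value the periodic dump never triggers,
// so stats are only flushed when the ctx is destroyed.
static uint64_t perf_logger_frequency() {
    const char * s = std::getenv("GGML_VK_PERF_LOGGER_FREQUENCY"); // assumed name
    return s ? std::strtoull(s, nullptr, 10) : 100;                // assumed default
}

// Called once per graph compute; dumps accumulated stats every `freq` graphs.
static void maybe_dump_stats(uint64_t & graph_count) {
    static const uint64_t freq = perf_logger_frequency();
    if (++graph_count % freq == 0) {
        // dump the accumulated per-op timing stats here
    }
}
```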

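And a hedged illustration of the MUL_MAT_ID flops point (names are made up for the sketch): the count should scale with the number of (token, expert) pairs the ids tensor actually routes, not with the total number of experts stored in the weight tensor.

```cpp
#include <cstdint>

// Illustrative only: FLOPs for a mixture-of-experts matmul (MUL_MAT_ID).
// Each routed (token, expert) pair costs one [m x k] * [k] matvec, i.e.
// 2*m*k FLOPs; counting every expert in the weight tensor would overestimate.
int64_t mul_mat_id_flops(int64_t m, int64_t k, int64_t n_routed_pairs) {
    return 2 * m * k * n_routed_pairs;
}
```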
@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner December 2, 2025 01:50
@github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Dec 2, 2025
@0cc4m (Collaborator) commented Dec 2, 2025:

Please resolve the conflict.

@jeffbolznv (Collaborator, Author) commented:

Rebased.

@0cc4m (Collaborator) commented Dec 6, 2025:

There's another conflict, but more importantly I'm getting a segfault with Qwen3-Next-80B-A3B-Instruct-Q4_0:

Core was generated by `build_vk_debug/bin/llama-bench -m models/Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf -fa 1 --mmap 0'.
Program terminated with signal SIGABRT, Aborted.

#0  0x00007f8f3c49890c in ?? () from /usr/lib/libc.so.6
#1  0x00007f8f3c43e3a0 in raise () from /usr/lib/libc.so.6
#2  0x00007f8f3c42557a in abort () from /usr/lib/libc.so.6
#3  0x00007f8f3c89a41f in std::__glibcxx_assert_fail (file=<optimized out>, line=<optimized out>, function=<optimized out>, condition=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/assert_fail.cc:41
#4  0x00007f8f4075dc96 in std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >::operator[] (this=0x55fdfca3d2d0, __n=3) at /usr/include/c++/15.2.1/bits/stl_vector.h:1263
#5  0x00007f8f3ce523cc in ggml_backend_vk_graph_compute (backend=0x55fdfca30890, cgraph=0x55fdfca378a8) at ggml/src/ggml-vulkan/ggml-vulkan.cpp:13173
#6  0x00007f8f4036bc25 in ggml_backend_graph_compute_async (backend=0x55fdfca30890, cgraph=0x55fdfca378a8) at ggml/src/ggml-backend.cpp:359
#7  0x00007f8f40370aa4 in ggml_backend_sched_compute_splits (sched=0x55fdfc98b840) at ggml/src/ggml-backend.cpp:1575
#8  0x00007f8f403718c3 in ggml_backend_sched_graph_compute_async (sched=0x55fdfc98b840, graph=0x7f8f28d70030) at ggml/src/ggml-backend.cpp:1784
#9  0x00007f8f4079ba93 in llama_context::graph_compute (this=0x55fdff05c900, gf=0x7f8f28d70030, batched=true) at src/llama-context.cpp:1488
#10 0x00007f8f407986e9 in llama_context::process_ubatch (this=0x55fdff05c900, ubatch=..., gtype=LLM_GRAPH_TYPE_DECODER, mctx=0x55fdfca2d350, ret=@0x7ffc0c26da58: GGML_STATUS_SUCCESS) at src/llama-context.cpp:809
#11 0x00007f8f40799bef in llama_context::decode (this=0x55fdff05c900, batch_inp=...) at src/llama-context.cpp:1113
#12 0x00007f8f407a091d in llama_decode (ctx=0x55fdff05c900, batch=...) at src/llama-context.cpp:2780
#13 0x000055fde84babcb in test_prompt (ctx=0x55fdff05c900, n_prompt=512, n_batch=2048, n_threads=16) at tools/llama-bench/llama-bench.cpp:1945
#14 0x000055fde84bb882 in main (argc=7, argv=0x7ffc0c26ea38) at tools/llama-bench/llama-bench.cpp:2125

@jeffbolznv (Collaborator, Author) commented:

I'm not able to reproduce the crash locally, and I don't see the merge conflict in the UI. The crash sounds a lot like what I fixed in e3f771b, but I can't correlate the line number in your call stack to anything. Are you sure you were on the latest commit?
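As background on the abort in the backtrace: with libstdc++ assertions enabled (as in this debug build), std::vector::operator[] aborts on an out-of-range index instead of silently reading past the buffer. A toy reproduction of that failure mode and the kind of fix the later "fix vector sizes" commit applies, not the actual ggml code:

```cpp
#include <vector>

int main() {
    std::vector<int> v(3);
    // Without this resize, v[3] is out of range: in a build with
    // -D_GLIBCXX_ASSERTIONS, operator[] calls __glibcxx_assert_fail and
    // aborts with SIGABRT, as in the backtrace above.
    v.resize(4);
    v[3] = 42;
    return 0;
}
```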

@0cc4m (Collaborator) commented Dec 6, 2025:

You are right, I didn't notice that updating the PR branch had failed, so I was testing only the first commit. My bad, sorry.

@0cc4m merged commit db97837 into ggml-org:master on Dec 6, 2025 (77 of 78 checks passed)
JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request Dec 7, 2025
* vulkan: perf_logger improvements

- Move perf_logger from device to ctx.
- Add an env var to control the frequency we dump the stats. If you set a very
large value, it just dumps when the ctx is destroyed.
- Add a fusion info string to the tracking, only log one item per fused op.
- Fix MUL_MAT_ID flops calculation.

* fix vector sizes