Conversation

@jeffbolznv (Collaborator) commented:

  • Move perf_logger from device to ctx.
  • Add an env var to control how often the stats are dumped; setting a very large value effectively means the stats are only dumped when the ctx is destroyed (see the first sketch below).
  • Add a fusion info string to the tracking, and only log one item per fused op.
  • Fix the MUL_MAT_ID flops calculation (see the second sketch below).
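A minimal sketch of how such an env-var gate could work, assuming a hypothetical variable name and default (the actual name and plumbing are in ggml/src/ggml-vulkan/ggml-vulkan.cpp):

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch, not the actual ggml code: read a dump frequency from
// the environment. With a very large value the periodic dump never triggers,
// so stats are only flushed when the ctx is destroyed.
static uint64_t perf_logger_frequency() {
    const char * s = std::getenv("GGML_VK_PERF_LOGGER_FREQUENCY"); // assumed name
    return s ? std::strtoull(s, nullptr, 10) : 100;                // assumed default
}

// Called once per graph compute; dumps accumulated stats every `freq` graphs.
static void maybe_dump_stats(uint64_t & graph_count) {
    static const uint64_t freq = perf_logger_frequency();
    if (++graph_count % freq == 0) {
        // dump the accumulated per-op timing stats here
    }
}
```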

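And a hedged illustration of the MUL_MAT_ID flops point (names are made up for the sketch): the count should scale with the number of (token, expert) pairs the ids tensor actually routes, not with the total number of experts stored in the weight tensor.

```cpp
#include <cstdint>

// Illustrative only: FLOPs for a mixture-of-experts matmul (MUL_MAT_ID).
// Each routed (token, expert) pair costs one [m x k] * [k] matvec, i.e.
// 2*m*k FLOPs; counting every expert in the weight tensor would overestimate.
int64_t mul_mat_id_flops(int64_t m, int64_t k, int64_t n_routed_pairs) {
    return 2 * m * k * n_routed_pairs;
}
```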
@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner December 2, 2025 01:50
@github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Dec 2, 2025
@0cc4m (Collaborator) commented Dec 2, 2025:

Please resolve the conflict.

@jeffbolznv (Collaborator, Author) commented:

Rebased.

@0cc4m (Collaborator) commented Dec 6, 2025:

There's another conflict, but more importantly I'm getting a segfault with Qwen3-Next-80B-A3B-Instruct-Q4_0:

Core was generated by `build_vk_debug/bin/llama-bench -m models/Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf -fa 1 --mmap 0'.
Program terminated with signal SIGABRT, Aborted.

#0  0x00007f8f3c49890c in ?? () from /usr/lib/libc.so.6
#1  0x00007f8f3c43e3a0 in raise () from /usr/lib/libc.so.6
#2  0x00007f8f3c42557a in abort () from /usr/lib/libc.so.6
#3  0x00007f8f3c89a41f in std::__glibcxx_assert_fail (file=<optimized out>, line=<optimized out>, function=<optimized out>, condition=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/assert_fail.cc:41
#4  0x00007f8f4075dc96 in std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >::operator[] (this=0x55fdfca3d2d0, __n=3) at /usr/include/c++/15.2.1/bits/stl_vector.h:1263
#5  0x00007f8f3ce523cc in ggml_backend_vk_graph_compute (backend=0x55fdfca30890, cgraph=0x55fdfca378a8) at ggml/src/ggml-vulkan/ggml-vulkan.cpp:13173
#6  0x00007f8f4036bc25 in ggml_backend_graph_compute_async (backend=0x55fdfca30890, cgraph=0x55fdfca378a8) at ggml/src/ggml-backend.cpp:359
#7  0x00007f8f40370aa4 in ggml_backend_sched_compute_splits (sched=0x55fdfc98b840) at ggml/src/ggml-backend.cpp:1575
#8  0x00007f8f403718c3 in ggml_backend_sched_graph_compute_async (sched=0x55fdfc98b840, graph=0x7f8f28d70030) at ggml/src/ggml-backend.cpp:1784
#9  0x00007f8f4079ba93 in llama_context::graph_compute (this=0x55fdff05c900, gf=0x7f8f28d70030, batched=true) at src/llama-context.cpp:1488
#10 0x00007f8f407986e9 in llama_context::process_ubatch (this=0x55fdff05c900, ubatch=..., gtype=LLM_GRAPH_TYPE_DECODER, mctx=0x55fdfca2d350, ret=@0x7ffc0c26da58: GGML_STATUS_SUCCESS) at src/llama-context.cpp:809
#11 0x00007f8f40799bef in llama_context::decode (this=0x55fdff05c900, batch_inp=...) at src/llama-context.cpp:1113
#12 0x00007f8f407a091d in llama_decode (ctx=0x55fdff05c900, batch=...) at src/llama-context.cpp:2780
#13 0x000055fde84babcb in test_prompt (ctx=0x55fdff05c900, n_prompt=512, n_batch=2048, n_threads=16) at tools/llama-bench/llama-bench.cpp:1945
#14 0x000055fde84bb882 in main (argc=7, argv=0x7ffc0c26ea38) at tools/llama-bench/llama-bench.cpp:2125

@jeffbolznv (Collaborator, Author) commented:

I'm not able to reproduce the crash locally, and I don't see the merge conflict in the UI. The crash sounds a lot like what I fixed in e3f771b, but I can't correlate the line number in your call stack to anything. Are you sure you were on the latest commit?
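As background on the abort in the backtrace: with libstdc++ assertions enabled (as in this debug build), std::vector::operator[] aborts on an out-of-range index instead of silently reading past the buffer. A toy reproduction of that failure mode and the kind of fix the later "fix vector sizes" commit applies, not the actual ggml code:

```cpp
#include <vector>

int main() {
    std::vector<int> v(3);
    // Without this resize, v[3] is out of range: in a build with
    // -D_GLIBCXX_ASSERTIONS, operator[] calls __glibcxx_assert_fail and
    // aborts with SIGABRT, as in the backtrace above.
    v.resize(4);
    v[3] = 42;
    return 0;
}
```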

@0cc4m (Collaborator) commented Dec 6, 2025:

You are right, I didn't notice that updating the PR branch had failed, so I was testing only the first commit. My bad, sorry.

@0cc4m merged commit db97837 into ggml-org:master on Dec 6, 2025 (77 of 78 checks passed)
JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request Dec 7, 2025
* vulkan: perf_logger improvements

- Move perf_logger from device to ctx.
- Add an env var to control the frequency we dump the stats. If you set a very
large value, it just dumps when the ctx is destroyed.
- Add a fusion info string to the tracking, only log one item per fused op.
- Fix MUL_MAT_ID flops calculation.

* fix vector sizes