Conversation

@gabe-l-hart
Collaborator

Description

This PR is extracted from #16982 since it's an isolated change that's not strictly related to implementing the SSD algorithm.

This PR adds extra output to the llama-gguf tool, showing each tensor's type and element count.

Example Output

...
gguf_ex_read_1: tensor[0]: name = token_embd.weight, size = 513802240, offset = 0, type = bf16, n_elts = 256901120
gguf_ex_read_1: tensor[1]: name = blk.0.attn_norm.weight, size = 10240, offset = 513802240, type = f32, n_elts = 2560
gguf_ex_read_1: tensor[2]: name = blk.0.ffn_norm.weight, size = 10240, offset = 513812480, type = f32, n_elts = 2560
gguf_ex_read_1: tensor[3]: name = blk.0.attn_k.weight, size = 2621440, offset = 513822720, type = bf16, n_elts = 1310720
gguf_ex_read_1: tensor[4]: name = blk.0.attn_output.weight, size = 13107200, offset = 516444160, type = bf16, n_elts = 6553600
gguf_ex_read_1: tensor[5]: name = blk.0.attn_q.weight, size = 13107200, offset = 529551360, type = bf16, n_elts = 6553600
gguf_ex_read_1: tensor[6]: name = blk.0.attn_v.weight, size = 2621440, offset = 542658560, type = bf16, n_elts = 1310720
gguf_ex_read_1: tensor[7]: name = blk.0.ffn_gate.weight, size = 41943040, offset = 545280000, type = bf16, n_elts = 20971520
...
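For context, here is a minimal sketch of a standalone read loop in the spirit of the tool's gguf_ex_read_1, not the PR's actual diff. The file name, error handling, and printf format are illustrative, and the header layout varies slightly across ggml versions (older versions expose the gguf API from ggml.h).

#include "ggml.h"
#include "gguf.h"

#include <cstdio>

int main() {
    struct gguf_init_params params = {
        /*.no_alloc =*/ true, // read metadata only, do not allocate tensor data
        /*.ctx      =*/ NULL,
    };

    struct gguf_context * ctx = gguf_init_from_file("model.gguf", params);
    if (!ctx) {
        fprintf(stderr, "failed to read GGUF file\n");
        return 1;
    }

    const int64_t n_tensors = gguf_get_n_tensors(ctx);
    for (int64_t i = 0; i < n_tensors; ++i) {
        const char * name   = gguf_get_tensor_name  (ctx, i);
        const size_t size   = gguf_get_tensor_size  (ctx, i);
        const size_t offset = gguf_get_tensor_offset(ctx, i);
        const auto   type   = gguf_get_tensor_type  (ctx, i);

        // ggml_type_size() is the size of one element (one block for
        // quantized types), so for simple types like f32/bf16 the
        // division below yields the tensor's element count
        const size_t n_elts = size / ggml_type_size(type);

        printf("tensor[%lld]: name = %s, size = %zu, offset = %zu, type = %s, n_elts = %zu\n",
               (long long) i, name, size, offset, ggml_type_name(type), n_elts);
    }

    gguf_free(ctx);
    return 0;
}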

Branch: Mamba2Perf

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Comment on lines 187 to 190
const auto type = gguf_get_tensor_type(ctx, i);
const char * type_name = ggml_type_name(type);
const size_t type_size = ggml_type_size(type);
const size_t n_elements = size / type_size;
Member

valign (vertical alignment):

Suggested change

-const auto type = gguf_get_tensor_type(ctx, i);
-const char * type_name = ggml_type_name(type);
-const size_t type_size = ggml_type_size(type);
-const size_t n_elements = size / type_size;
+const auto   type       = gguf_get_tensor_type  (ctx, i);
+const char * type_name  = ggml_type_name(type);
+const size_t type_size  = ggml_type_size(type);
+const size_t n_elements = size / type_size;
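For reference, this follows the vertical-alignment convention used throughout ggml/llama.cpp: consecutive declarations are padded so that the variable names, = signs, and call parentheses line up, which makes repeated statements easier to scan and batch-edit.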

Branch: GGUFToolOutputs

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@ggerganov merged commit 5886f4f into ggml-org:master on Nov 5, 2025
8 checks passed
@gabe-l-hart deleted the GGUFToolOutputs branch on November 5, 2025 at 17:58
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 5, 2025
* origin/master: (21 commits)
vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (ggml-org#16919)
examples(gguf): GGUF example outputs (ggml-org#17025)
mtmd: allow QwenVL to process larger image by default (ggml-org#17020)
server : do not default to multiple slots with speculative decoding (ggml-org#17017)
mtmd: improve struct initialization (ggml-org#16981)
docs: Clarify the endpoint that webui uses (ggml-org#17001)
model : add openPangu-Embedded (ggml-org#16941)
ggml webgpu: minor set rows optimization (ggml-org#16810)
sync : ggml
ggml : fix conv2d_dw SVE path (ggml/1380)
CUDA: update ops.md (ggml-org#17005)
opencl: update doc (ggml-org#17011)
refactor: replace sprintf with snprintf for safer string handling in dump functions (ggml-org#16913)
vulkan: remove the need for the dryrun (ggml-org#16826)
server : do context shift only while generating (ggml-org#17000)
readme : update hot topics (ggml-org#17002)
ggml-cpu : bicubic interpolation (ggml-org#16891)
ci : apply model label to models (ggml-org#16994)
chore : fix models indent after refactor (ggml-org#16992)
Fix garbled output with REPACK at high thread counts (ggml-org#16956)
...