Name and Version
❯ llama.cpp/build/bin/llama-cli --version
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_device_init: GPU name: Apple M3 Ultra
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 498216.21 MB
version: 6955 (6105e855)
built with Apple clang version 17.0.0 (clang-1700.3.19.1) for arm64-apple-darwin25.0.0
Operating systems
Mac
GGML backends
Metal
Hardware
Mac Studio M3 Ultra 512GB memory
Models
Dream 7B Base
Problem description & steps to reproduce
I am comparing the logits of the Dream 7B model between the HF implementation and llama.cpp, both in BF16, and there appear to be logit mismatches. I pass the exact same random tokens to both the HF and llama.cpp models. Examples of the input tokens I pass in, together with the resulting mismatch errors, are in the "Relevant log output" section below.
For comparison, I checked Qwen3Moe-30B-A3B in BF16 the same way, and its logits are nearly identical between the two implementations.
Things I have checked
- The GGUF conversion appears to be correct
- Divergence across layers appears to grow roughly linearly (I will quantify this further)
- RMS Norm eps values are the same
I am doing this because I am trying to integrate another diffusion model, and I am seeing logit mismatches there as well.
I am not sure whether this is due to a backend issue with build_attn_inp_no_cache().
I am happy to share my logit-comparison code as well; a rough sketch of the kind of comparison I am doing is below.
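For reference, here is a minimal illustrative sketch of how the metrics in the log could be computed from two logit matrices (this is not my exact script; compare_logits is a hypothetical helper, and the particular definition of "Percentage Error" is one possible choice):

import torch
import torch.nn.functional as F

def compare_logits(hf_logits: torch.Tensor, llama_logits: torch.Tensor) -> None:
    # Both tensors are expected to have shape (n_tokens, n_vocab).
    hf = hf_logits.float()
    lc = llama_logits.float()
    abs_err = (hf - lc).abs()
    print(f"Mean Error: {abs_err.mean().item()}")
    print(f"Max Error: {abs_err.max().item()}")
    # Cosine similarity over the flattened logit matrices.
    cos = F.cosine_similarity(hf.flatten(), lc.flatten(), dim=0)
    print(f"Cosine Similarity: {cos.item()}")
    # Mean relative error in percent (assumed definition of "Percentage
    # Error"; near-zero reference logits can inflate this number).
    rel = (abs_err / (hf.abs() + 1e-8)).mean() * 100
    print(f"Percentage Error: {rel.item()}%")

Here hf_logits would come from the HF model's forward pass on the random tokens, and llama_logits from however you extract per-token logits from llama.cpp, with both cast to float32 before comparing.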
First Bad Commit
No response
Relevant log output
--------------------------------
Tokens: 4
Random tokens: tensor([124518, 75548, 87446, 101107])
Mean Error: 0.12463831156492233
Max Error: 0.5572605133056641
Cosine Similarity: 0.9974273443222046
Percentage Error: 85.55838012695312%
--------------------------------
Tokens: 5
Random tokens: tensor([ 45758, 20906, 147022, 15555, 109180])
Mean Error: 0.27511870861053467
Max Error: 1.4357795715332031
Cosine Similarity: 0.9923816919326782
Percentage Error: 172.04852294921875%
--------------------------------
Tokens: 6
Random tokens: tensor([ 75036, 32437, 87977, 94129, 116228, 30788])
Mean Error: 0.15428200364112854
Max Error: 0.6780109405517578
Cosine Similarity: 0.9978612661361694
Percentage Error: 6.23129415512085%
--------------------------------
Tokens: 7
Random tokens: tensor([ 21585, 129721, 115606, 43139, 122473, 22167, 57813])
Mean Error: 0.15736085176467896
Max Error: 0.8004646301269531
Cosine Similarity: 0.9974124431610107
Percentage Error: 9.917763710021973%
--------------------------------