
Eval bug: Diffusion Model (Dream 7B) Logits Not Matching the HF Implementation #17291

@wp4032

Description

Name and Version

❯ llama.cpp/build/bin/llama-cli --version
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_device_init: GPU name: Apple M3 Ultra
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 498216.21 MB
version: 6955 (6105e855)
built with Apple clang version 17.0.0 (clang-1700.3.19.1) for arm64-apple-darwin25.0.0

Operating systems

Mac

GGML backends

Metal

Hardware

Mac Studio M3 Ultra 512GB memory

Models

Dream 7B Base

Problem description & steps to reproduce

I am comparing the logits of the Dream 7B model between the HF implementation and llama.cpp in BF16, and the logits do not match. I pass the exact same random tokens to both the HF and the llama.cpp models. Examples of the input tokens and the resulting mismatch statistics are in the log output below.

As a sanity check, I ran the same comparison on Qwen3Moe-30B-A3B in BF16, and its logits are nearly identical between the two implementations.
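For a rough sense of how much of the mismatch BF16 precision alone could explain, one can measure the error that a single float32-to-bfloat16 rounding introduces on a logit vector. This is a minimal sketch assuming NumPy and round-to-nearest-even; `round_to_bf16` and `bf16_noise_floor` are hypothetical helpers, and NaN/Inf are not handled. A full forward pass accumulates many such roundings, so this only gives the scale of one rounding step, not a prediction of the end-to-end error.

```python
import numpy as np


def round_to_bf16(x):
    """Round float32 values to the nearest bfloat16 value (ties to even).

    The result is returned as float32 so normal NumPy arithmetic still works.
    NaN/Inf are not handled in this sketch.
    """
    x = np.atleast_1d(np.asarray(x, dtype=np.float32)).copy()  # contiguous copy
    bits = x.view(np.uint32)
    # bfloat16 keeps the top 16 bits of a float32. Adding 0x7FFF (plus 1 when the
    # lowest kept bit is odd) before truncating gives round-to-nearest-even.
    bias = ((bits >> 16) & np.uint32(1)) + np.uint32(0x7FFF)
    rounded = (bits + bias) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)


def bf16_noise_floor(logits_fp32):
    """Mean and max absolute error caused by one bf16 rounding of the input."""
    ref = np.asarray(logits_fp32, dtype=np.float32)
    err = np.abs(round_to_bf16(ref).reshape(ref.shape) - ref)
    return float(err.mean()), float(err.max())


# Usage on synthetic values (not real model output):
rng = np.random.default_rng(0)
fake_logits = rng.normal(0.0, 5.0, size=(4, 1000)).astype(np.float32)
print(bf16_noise_floor(fake_logits))
```

Applied to reference logits from a float32 HF run, this gives a per-rounding scale to hold the Mean/Max Error figures up against.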

Things I have checked

  • GGUF files seem to be right
  • Divergence across layers seems to grow roughly linearly (I will look at this more quantitatively)
  • RMS Norm eps values are the same
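To quantify whether the per-layer divergence really grows linearly, one option is to compute a relative error per layer and fit a straight line to it. This is a minimal sketch assuming every layer's hidden state has already been dumped from both implementations as an array of shape (n_tokens, n_embd); how to dump them is left open, and `layer_errors` / `linear_fit` are hypothetical helpers.

```python
import numpy as np


def layer_errors(hf_hidden, cpp_hidden):
    """Relative L2 error per layer, using the HF hidden states as reference.

    hf_hidden, cpp_hidden: sequences with one (n_tokens, n_embd) array per layer.
    """
    if len(hf_hidden) != len(cpp_hidden):
        raise ValueError("both runs must provide the same number of layers")
    errs = []
    for a, b in zip(hf_hidden, cpp_hidden):
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        errs.append(np.linalg.norm(b - a) / np.linalg.norm(a))
    return np.array(errs)


def linear_fit(errors):
    """Least-squares fit error ~ slope * layer + intercept; returns (slope, intercept, r2)."""
    errors = np.asarray(errors, dtype=np.float64)
    layers = np.arange(len(errors), dtype=np.float64)
    slope, intercept = np.polyfit(layers, errors, deg=1)  # highest degree first
    pred = slope * layers + intercept
    ss_res = np.sum((errors - pred) ** 2)
    ss_tot = np.sum((errors - errors.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return float(slope), float(intercept), float(r2)
```

An R² close to 1 would support steady accumulation across layers, while a jump at one specific layer would instead suggest a single mismatched op.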

I am doing this because I am trying to integrate another diffusion model, but I am getting logit mismatches there as well.

I am not sure whether this is due to a backend issue with build_attn_inp_no_cache().
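One way to probe the attention-input hypothesis is to check whether the attention is actually non-causal: perturb only the last token and see whether the output at position 0 moves. The sketch below assumes Dream is meant to run with a bidirectional (non-causal) mask, as diffusion LMs usually are; it does not inspect what build_attn_inp_no_cache() builds. It is a single-head NumPy reference with no RoPE or GQA, and `reference_attention` / `first_position_moves` are hypothetical helpers. The same probe can be run on the full models by changing only the last input token and comparing the first position's logits.

```python
import numpy as np


def reference_attention(q, k, v, causal):
    """Single-head scaled dot-product attention.

    q, k, v: arrays of shape (n_tokens, head_dim).
    causal=True masks out future positions; causal=False is bidirectional.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        n = scores.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def first_position_moves(causal, n_tokens=4, head_dim=8, seed=0):
    """Perturb only the last token's key/value; report whether position 0 changes."""
    rng = np.random.default_rng(seed)
    q, k, v = (rng.normal(size=(n_tokens, head_dim)) for _ in range(3))
    k2, v2 = k.copy(), v.copy()
    k2[-1] += 1.0
    v2[-1] += 1.0
    before = reference_attention(q, k, v, causal)
    after = reference_attention(q, k2, v2, causal)
    return not np.allclose(before[0], after[0])
```

Under a causal mask position 0 cannot see the last token, so only the bidirectional run should report a change.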

Happy to share my logit-comparison code too.

@am17an

First Bad Commit

No response

Relevant log output

--------------------------------
Tokens: 4
	Random tokens: tensor([124518,  75548,  87446, 101107])
	Mean Error: 0.12463831156492233
	Max Error: 0.5572605133056641
	Cosine Similarity: 0.9974273443222046
	Percentage Error: 85.55838012695312%
--------------------------------
Tokens: 5
	Random tokens: tensor([ 45758,  20906, 147022,  15555, 109180])
	Mean Error: 0.27511870861053467
	Max Error: 1.4357795715332031
	Cosine Similarity: 0.9923816919326782
	Percentage Error: 172.04852294921875%
--------------------------------
Tokens: 6
	Random tokens: tensor([ 75036,  32437,  87977,  94129, 116228,  30788])
	Mean Error: 0.15428200364112854
	Max Error: 0.6780109405517578
	Cosine Similarity: 0.9978612661361694
	Percentage Error: 6.23129415512085%
--------------------------------
Tokens: 7
	Random tokens: tensor([ 21585, 129721, 115606,  43139, 122473,  22167,  57813])
	Mean Error: 0.15736085176467896
	Max Error: 0.8004646301269531
	Cosine Similarity: 0.9974124431610107
	Percentage Error: 9.917763710021973%
--------------------------------
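The four statistics in the log can be reproduced roughly as follows. The report does not state the exact definitions, so this sketch assumes: absolute differences for Mean/Max Error, cosine similarity over the flattened logits, and Percentage Error as the mean element-wise relative error against the HF logits. `compare_logits` is a hypothetical helper, and the argmax agreement is an extra check that is not part of the original report.

```python
import numpy as np


def compare_logits(hf_logits, cpp_logits, eps=1e-12):
    """Error statistics between HF (reference) and llama.cpp logits of equal shape.

    Metric definitions are assumptions, not taken from the original script.
    """
    ref = np.asarray(hf_logits, dtype=np.float64)
    out = np.asarray(cpp_logits, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    abs_err = np.abs(out - ref)
    flat_ref, flat_out = ref.ravel(), out.ravel()
    cosine = flat_ref @ flat_out / (np.linalg.norm(flat_ref) * np.linalg.norm(flat_out) + eps)
    rel_err = abs_err / (np.abs(ref) + eps)  # blows up where reference logits are near 0
    stats = {
        "mean_error": float(abs_err.mean()),
        "max_error": float(abs_err.max()),
        "cosine_similarity": float(cosine),
        "percentage_error": float(rel_err.mean() * 100.0),
    }
    # Extra check for (n_tokens, n_vocab) input: does each position pick the same top token?
    if ref.ndim == 2:
        stats["argmax_agreement"] = float(np.mean(ref.argmax(axis=-1) == out.argmax(axis=-1)))
    return stats
```

If Percentage Error is defined this way, it is dominated by reference logits close to zero, which could explain why it swings between about 6% and 172% while cosine similarity stays above 0.99 in every run above.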
