Name and Version
❯ llama.cpp/build/bin/llama-cli --version
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_device_init: GPU name: Apple M3 Ultra
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 498216.21 MB
version: 6955 (6105e855)
built with Apple clang version 17.0.0 (clang-1700.3.19.1) for arm64-apple-darwin25.0.0
Operating systems
Mac
GGML backends
Metal
Hardware
Mac Studio M3 Ultra 512GB memory
Models
Dream 7B Base
Problem description & steps to reproduce
I am comparing the logits of the Dream 7B model between the HF implementation and llama.cpp, both in BF16, and there appear to be logit mismatches. I pass the exact same random tokens to both the HF and llama.cpp models. Examples of the input tokens I pass in, together with the resulting mismatch errors, are in the "Relevant log output" section below.
For comparison, I checked Qwen3Moe-30B-A3B in BF16 the same way, and its logits are nearly identical between the two implementations.
Things I have checked
- The GGUF conversion appears to be correct
- Divergence across layers appears to grow roughly linearly (I will quantify this further)
- RMS Norm eps values are the same
I am doing this because I am trying to integrate another diffusion model, and I am seeing logit mismatches there as well.
I am not sure whether this is due to a backend issue with build_attn_inp_no_cache().
I am happy to share my logit-comparison code as well; a rough sketch of the kind of comparison I am doing is below.
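For reference, here is a minimal illustrative sketch of how the metrics in the log could be computed from two logit matrices (this is not my exact script; compare_logits is a hypothetical helper, and the particular definition of "Percentage Error" is one possible choice):

import torch
import torch.nn.functional as F

def compare_logits(hf_logits: torch.Tensor, llama_logits: torch.Tensor) -> None:
    # Both tensors are expected to have shape (n_tokens, n_vocab).
    hf = hf_logits.float()
    lc = llama_logits.float()
    abs_err = (hf - lc).abs()
    print(f"Mean Error: {abs_err.mean().item()}")
    print(f"Max Error: {abs_err.max().item()}")
    # Cosine similarity over the flattened logit matrices.
    cos = F.cosine_similarity(hf.flatten(), lc.flatten(), dim=0)
    print(f"Cosine Similarity: {cos.item()}")
    # Mean relative error in percent (assumed definition of "Percentage
    # Error"; near-zero reference logits can inflate this number).
    rel = (abs_err / (hf.abs() + 1e-8)).mean() * 100
    print(f"Percentage Error: {rel.item()}%")

Here hf_logits would come from the HF model's forward pass on the random tokens, and llama_logits from however you extract per-token logits from llama.cpp, with both cast to float32 before comparing.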
First Bad Commit
No response
Relevant log output
--------------------------------
Tokens: 4
Random tokens: tensor([124518, 75548, 87446, 101107])
Mean Error: 0.12463831156492233
Max Error: 0.5572605133056641
Cosine Similarity: 0.9974273443222046
Percentage Error: 85.55838012695312%
--------------------------------
Tokens: 5
Random tokens: tensor([ 45758, 20906, 147022, 15555, 109180])
Mean Error: 0.27511870861053467
Max Error: 1.4357795715332031
Cosine Similarity: 0.9923816919326782
Percentage Error: 172.04852294921875%
--------------------------------
Tokens: 6
Random tokens: tensor([ 75036, 32437, 87977, 94129, 116228, 30788])
Mean Error: 0.15428200364112854
Max Error: 0.6780109405517578
Cosine Similarity: 0.9978612661361694
Percentage Error: 6.23129415512085%
--------------------------------
Tokens: 7
Random tokens: tensor([ 21585, 129721, 115606, 43139, 122473, 22167, 57813])
Mean Error: 0.15736085176467896
Max Error: 0.8004646301269531
Cosine Similarity: 0.9974124431610107
Percentage Error: 9.917763710021973%
--------------------------------