Eval bug: Qwen3.5-35B-a3b full prompt re-processing with Claude Code #20003

@anubhavgupta

Description

Name and Version

version: 8179 (ecbcb7e)

Operating systems

Windows

GGML backends

CUDA

Hardware

RTX 5090 Mobile (24 GB VRAM), CUDA 13.1.

Models

Unsloth's Qwen3.5-35B-A3B-UD-Q4_K_M.gguf

Problem description & steps to reproduce

Arguments used to launch llama.cpp:
-m C:\Users\anubh.lmstudio\models\lmstudio-community\Qwen3.5-35b-a3b-GGUF\Qwen3.5-35B-A3B-UD-Q4_K_M.gguf -ngl 99 -t 23 --temp 0.6 --top-k 20 --top-p 0.95 --parallel 1 --mlock --swa-full -c 200000 -ctk q8_0 -ctv q8_0 -fa on --jinja --reasoning-budget 0 --host 0.0.0.0 -fit off

When using the model with Claude Code, full prompt re-processing happens on every request (as can be seen in the logs below). This behaviour is not observed with OpenCode.
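The `sim_best = 0.810` in the slot-selection log line matches `n_past / task.n_tokens` = 15126 / 18673 from the same logs, i.e. 81% of the new prompt is a prefix already held in the slot's cache. A minimal sketch of how such a longest-common-prefix (LCP) similarity can be computed (this is an illustration using the log's numbers, not llama.cpp's actual implementation):

```python
def lcp_similarity(cached_tokens, new_tokens):
    """Fraction of the new prompt that matches the cached token prefix."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n / max(len(new_tokens), 1)

# Synthetic token streams shaped like the log:
# 15126 shared prefix tokens, then the prompts diverge.
shared = list(range(15126))
cached = shared + [10_000_000 + i for i in range(18662 - 15126)]
new    = shared + [20_000_000 + i for i in range(18673 - 15126)]

sim = lcp_similarity(cached, new)
print(round(sim, 3))  # 0.810, matching sim_best in the log
```

Despite this high similarity passing the 0.100 threshold, the cached prefix is then discarded, which is the bug being reported.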

Repro steps:
1. Load the model.
2. Configure Claude Code to use the local model.
3. Go into any codebase and delete any existing CLAUDE.md file.
4. Run /init inside Claude Code.

Expected:

  • Full prompt re-processing should not happen, i.e. the "forcing full prompt re-processing due to lack of cache data" log line should not appear.

Actual:

  • Full prompt re-processing happens on every request.

First Bad Commit

No response

Relevant log output

Logs
STDERR: srv  params_from_: Chat format: peg-constructed

STDERR: slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.810 (> 0.100 thold), f_keep = 0.811

STDERR: slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  0 | task 745 | processing task, is_child = 0
slot update_slots: id  0 | task 745 | new prompt, n_ctx_slot = 200192, n_keep = 0, task.n_tokens = 18673
slot update_slots: id  0 | task 745 | n_past = 15126, slot.prompt.tokens.size() = 18662, seq_id = 0, pos_min = 18661, n_swa = 1
slot update_slots: id  0 | task 745 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 745 | erased invalidated context checkpoint (pos_min = 18134, pos_max = 18134, n_tokens = 18135, n_swa = 1, size = 62.813 MiB)

STDERR: slot update_slots: id  0 | task 745 | n_tokens = 0, memory_seq_rm [0, end)
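A back-of-the-envelope sketch of the cost this bug implies, using only the numbers from the log above (this is an assumed interpretation of `n_past` and `task.n_tokens`, not code from llama.cpp):

```python
# Numbers taken directly from the "new prompt" log line for task 745.
new_prompt_tokens = 18673  # task.n_tokens
reusable_prefix   = 15126  # n_past: tokens that match the cached prefix

# If the cache were reused, only the non-matching suffix would be processed.
with_cache_reuse = new_prompt_tokens - reusable_prefix

# With forced full re-processing (as in this bug), everything is reprocessed.
without_cache_reuse = new_prompt_tokens

print(with_cache_reuse)     # 3547 tokens
print(without_cache_reuse)  # 18673 tokens, ~5x more prompt processing
```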
