Description
Name and Version
version: 8179 (ecbcb7e)
Operating systems
Windows
GGML backends
CUDA
Hardware
5090 Mobile (24 GB VRAM), CUDA 13.1
Models
unsloth's Qwen3.5-35B-A3B-UD-Q4_K_M.gguf
Problem description & steps to reproduce
Arguments used to launch llama.cpp:
-m C:\Users\anubh.lmstudio\models\lmstudio-community\Qwen3.5-35b-a3b-GGUF\Qwen3.5-35B-A3B-UD-Q4_K_M.gguf -ngl 99 -t 23 --temp 0.6 --top-k 20 --top-p 0.95 --parallel 1 --mlock --swa-full -c 200000 -ctk q8_0 -ctv q8_0 -fa on --jinja --reasoning-budget 0 --host 0.0.0.0 -fit off
When using the model with Claude Code, full prompt re-processing happens on every request (as can be seen in the logs), although this behaviour is not observed with OpenCode.
Repro steps:
1. Load the model.
2. Configure Claude Code to run locally against it.
3. Go inside any codebase and delete any existing CLAUDE.md file.
4. Run /init inside Claude Code.
Expected:
- Full prompt re-processing should not happen, i.e. no "forcing full prompt re-processing due to lack of cache data" log.
Actual:
- Full prompt re-processing happens.
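For context, the log message linked in the output below attributes the re-processing to SWA (sliding-window attention): only the KV entries for roughly the last n_swa positions of a sequence are retained, so a prompt that diverges from the cached one deep inside the prefix cannot be resumed mid-stream, because the KV data at the resume point has already slid out of the window. A toy illustration of that constraint (not llama.cpp code; the window model here is a simplification):

```python
def swa_kept_positions(n_tokens: int, n_swa: int) -> set[int]:
    """Positions whose KV entries survive under a sliding window of size n_swa."""
    return set(range(max(0, n_tokens - n_swa), n_tokens))

def can_resume_from_cache(cached: list[int], prompt: list[int], n_swa: int) -> bool:
    """A shared prefix is only reusable if the KV data needed at the resume
    point is still inside the sliding window; otherwise the server must
    re-process the full prompt."""
    # Longest common prefix between the cached tokens and the new prompt.
    lcp = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        lcp += 1
    kept = swa_kept_positions(len(cached), n_swa)
    # Continuing from position `lcp` needs the KV of the n_swa positions
    # just before it; if the divergence is deep in the prompt, they are gone.
    needed = range(max(0, lcp - n_swa), lcp)
    return all(p in kept for p in needed)

cached = list(range(100))            # 100 tokens already in the cache
edited = cached[:80] + [999] * 20    # prompt diverges at position 80
print(can_resume_from_cache(cached, edited, n_swa=4))        # False: window slid past pos 80
print(can_resume_from_cache(cached, cached + [5], n_swa=4))  # True: extends at the live edge
```

This matches the observed behaviour: Claude Code edits earlier parts of the conversation between requests, so the common prefix ends before the retained window, while OpenCode apparently appends at the tail.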
First Bad Commit
No response
Relevant log output
Logs
STDERR: srv params_from_: Chat format: peg-constructed
STDERR: slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.810 (> 0.100 thold), f_keep = 0.811
STDERR: slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 745 | processing task, is_child = 0
slot update_slots: id 0 | task 745 | new prompt, n_ctx_slot = 200192, n_keep = 0, task.n_tokens = 18673
slot update_slots: id 0 | task 745 | n_past = 15126, slot.prompt.tokens.size() = 18662, seq_id = 0, pos_min = 18661, n_swa = 1
slot update_slots: id 0 | task 745 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 745 | erased invalidated context checkpoint (pos_min = 18134, pos_max = 18134, n_tokens = 18135, n_swa = 1, size = 62.813 MiB)
STDERR: slot update_slots: id 0 | task 745 | n_tokens = 0, memory_seq_rm [0, end)
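The sim_best = 0.810 value in the log comes from the server's slot-selection heuristic, which picks the slot whose cached tokens best match the incoming prompt by longest common prefix. A rough sketch of that kind of similarity score (illustrative only; the exact normalization in llama-server may differ):

```python
def lcp_similarity(cached: list[int], prompt: list[int]) -> float:
    """Fraction of the incoming prompt covered by the slot's cached prefix.
    Hypothetical formula for illustration, not the real llama.cpp one."""
    lcp = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        lcp += 1
    return lcp / max(len(prompt), 1)

# A slot whose score exceeds the 0.100 threshold seen in the log would be
# selected for reuse; whether the prefix is actually resumable is then a
# separate check, which is where SWA forces the full re-process above.
print(lcp_similarity([1, 2, 3, 4], [1, 2, 3, 9, 9]))  # 0.6
```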