Name and Version
version: 7124 (f1ffbba)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CPU
Hardware
CPU only / 36 GB RAM, Intel Core i5-8250U
Models
Qwen_Qwen3-30B-A3B-Thinking-2507-Q6_K
Problem description & steps to reproduce
llama-server -m GPT4ALL/Qwen_Qwen3-30B-A3B-Thinking-2507-Q6_K.gguf -c 32768 --reasoning-format none --host 192.168.2.218 --port 16060
When I interrupt the model while it is thinking and then enter a new user prompt, the tokens generated during the previous thinking are discarded: the model does not remember what it thought before. This happens no matter what I set for --reasoning-format; neither none nor deepseek nor legacy-deepseek works. It can also be seen from the context size in the generation statistics: when the new thought process starts, the context size drops below the level it had reached after the previous thinking.
It worked perfectly this summer with:
build: 5985 (3f4fc97) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
and this command line:
llama-server -m GPT4ALL/Qwen_Qwen3-30B-A3B-Thinking-2507-Q6_K.gguf -c 32768 --host 192.168.2.218 --port 6061
and with the "Exclude thought process when sending requests to API" setting turned OFF in the webui.
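For reference, the behavior can also be checked outside the webui by calling the OpenAI-compatible endpoint directly. The request below is only a minimal sketch (not the exact request the webui sends), and it assumes that with --reasoning-format none the client resends the interrupted thinking inline inside <think> tags in the assistant message. If that thinking were taken into account, the prompt size reported in the server log ("prompt processing progress, n_tokens = ...") for the follow-up turn would have to grow past the size reached at the end of the first turn.

curl http://192.168.2.218:16060/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "what is 3 + 3?"},
      {"role": "assistant", "content": "<think>Okay, the user asked what 3 + 3 is. It is basic arithmetic, 3 plus 3 equals 6 ...</think>"},
      {"role": "user", "content": "No I do not want to test you. Just provide the answer."}
    ],
    "stream": false
  }'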
First Bad Commit
No response
Relevant log output
main: model loaded
main: server is listening on http://192.168.2.218:16060
main: starting the main loop...
srv update_slots: all slots are idle
srv log_server_r: request: GET /props 192.168.2.218 200
srv log_server_r: request: GET /props 192.168.2.218 200
srv log_server_r: request: GET /props 192.168.2.218 200
srv log_server_r: request: GET /props 192.168.2.218 200
srv params_from_: Chat format: Content-only
slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 16
slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 16, batch.n_tokens = 16, progress = 1.000000
slot update_slots: id 3 | task 0 | prompt done, n_tokens = 16, batch.n_tokens = 16
srv log_server_r: request: GET /props 192.168.2.218 200
srv log_server_r: request: GET /slots 192.168.2.218 200
srv log_server_r: request: GET /slots 192.168.2.218 200
srv log_server_r: request: POST /v1/chat/completions 192.168.2.218 200
srv stop: cancel task, id_task = 0
slot release: id 3 | task 0 | stop processing: n_tokens = 103, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: Content-only
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.400 (> 0.100 thold), f_keep = 0.155
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 103, total state size = 9.659 MiB
srv load: - looking for better prompt, base f_keep = 0.155, sim = 0.400
srv update: - cache state: 1 prompts, 9.659 MiB (limits: 8192.000 MiB, 32768 tokens, 87360 est)
srv update: - prompt 0x557356dcf790: 103 tokens, checkpoints: 0, 9.659 MiB
srv get_availabl: prompt cache update took 8.09 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 92 | processing task
slot update_slots: id 3 | task 92 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 40
slot update_slots: id 3 | task 92 | n_tokens = 16, memory_seq_rm [16, end)
slot update_slots: id 3 | task 92 | prompt processing progress, n_tokens = 40, batch.n_tokens = 24, progress = 1.000000
slot update_slots: id 3 | task 92 | prompt done, n_tokens = 40, batch.n_tokens = 24
srv log_server_r: request: GET /props 192.168.2.218 200
srv log_server_r: request: GET /slots 192.168.2.218 200
srv log_server_r: request: GET /slots 192.168.2.218 200
srv log_server_r: request: POST /v1/chat/completions 192.168.2.218 200
srv stop: cancel task, id_task = 92
slot release: id 3 | task 92 | stop processing: n_tokens = 91, truncated = 0
srv update_slots: all slots are idle
{
"conv": {
"id": "05c34f99-b6f1-47b4-bff3-a9f0009fd6f9",
"name": "what is 3 + 3?",
"lastModified": 1763734870803,
"currNode": "71c251c7-e00e-4e3d-8a13-46833e6e946a"
},
"messages": [
{
"convId": "05c34f99-b6f1-47b4-bff3-a9f0009fd6f9",
"role": "user",
"content": "what is 3 + 3?",
"type": "text",
"timestamp": 1763734870909,
"thinking": "",
"toolCalls": "",
"children": [],
"extra": [],
"id": "515e601e-561f-4fce-a86e-e2366ad19ede",
"parent": "7b514752-3ecc-4bf2-b9fb-f9a56b47ec19"
},
{
"convId": "05c34f99-b6f1-47b4-bff3-a9f0009fd6f9",
"type": "text",
"role": "assistant",
"content": "",
"timestamp": 1763734870942,
"thinking": "Okay, the user asked, \"what is 3 + 3?\" That seems really straightforward. Let me think... 3 plus 3 equals 6. It's basic arithmetic. \n\nHmm, why would someone ask this? Maybe they're testing if I can handle simple math. Or perhaps they're a young child just learning addition. Could also be someone checking if I'm functioning properly. \n\nI should keep it",
"toolCalls": "",
"children": [],
"model": "Qwen_Qwen3-30B-A3B-Thinking-2507-Q6_K.gguf",
"id": "26a674f0-27e1-44b3-9b90-9ebdb88fe552",
"parent": "515e601e-561f-4fce-a86e-e2366ad19ede"
},
{
"convId": "05c34f99-b6f1-47b4-bff3-a9f0009fd6f9",
"role": "user",
"content": "No I do not want to test you. Just provide the answer.",
"type": "text",
"timestamp": 1763734904570,
"thinking": "",
"toolCalls": "",
"children": [],
"extra": [],
"id": "71c251c7-e00e-4e3d-8a13-46833e6e946a",
"parent": "26a674f0-27e1-44b3-9b90-9ebdb88fe552"
},
{
"convId": "05c34f99-b6f1-47b4-bff3-a9f0009fd6f9",
"type": "text",
"role": "assistant",
"content": "",
"timestamp": 1763734904662,
"thinking": "Okay, the user is asking for a straightforward answer to \"what is 3 + 3?\" but with a note that they don't want to test me. Hmm, they seem a bit impatient or maybe frustrated, given the \"No",
"toolCalls": "",
"children": [],
"model": "Qwen_Qwen3-30B-A3B-Thinking-2507-Q6_K.gguf",
"id": "82bc0b9e-6121-4ee0-ac1d-1d71c3852299",
"parent": "71c251c7-e00e-4e3d-8a13-46833e6e946a"
}
]
}
After the first turn the slot holds 103 tokens (a 16-token prompt plus the generated thinking); the second turn reuses only the first 16 tokens, processes a 40-token prompt, and stops at 91 tokens. If the interrupted thinking were being resent, the second-turn prompt would have to be well above 103 tokens, so the previous thinking is not taken into account.
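The prompt sizes can also be watched live while reproducing this, for example by polling the slots endpoint that the webui already queries in the log above (just a sketch; the exact fields returned may differ between builds):

curl -s http://192.168.2.218:16060/slots

This is optional; the same token counts are already visible in the server log lines above.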