
server: thinking type rejected as invalid but used by Ministral 3 #17700

@isaac-mcfadyen

Description

EDIT: Apologies for marking this as a bug; I probably should have marked it as an enhancement - feel free to re-tag.

Name and Version

❯ ./llama.cpp/build/bin/llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 7236 (a2b0fe8)
built with cc (Gentoo 13.4.1_p20250807 p8) 13.4.1 20250807 for x86_64-pc-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server --port 8000 -fa on -c 8192 -ngl 99 -cram -1 -m /models/Ministral-3-14B-Reasoning-2512-UD-Q6_K_XL.gguf

Problem description & steps to reproduce

The new Ministral 3 reasoning models use an (I believe non-standard) content[].type of thinking, which is used to pass previous reasoning traces back to the model:
https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512/blob/main/chat_template.jinja#L94

However, llama-server currently rejects this with an error:

srv    operator(): got exception: {"error":{"code":500,"message":"unsupported content[].type","type":"server_error"}}
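
For reference, a request shaped like the following reproduces the error against the server started with the command line above. This is a sketch: the "thinking" key carrying the reasoning text is my reading of the linked template and may not be the exact field name.

# Repro sketch against the llama-server instance from the command line above.
# The reasoning block's "thinking" key follows my reading of the chat template
# linked above; treat it as an assumption, not a verified field name.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "How many letters are in strawberry?"},
      {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "s-t-r-a-w-b-e-r-r-y is 10 letters."},
        {"type": "text", "text": "Ten."}
      ]},
      {"role": "user", "content": "Are you sure?"}
    ]
  }'
# -> {"error":{"code":500,"message":"unsupported content[].type","type":"server_error"}}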

Is this something that should/could be fixed on the llama-server side? Or is there a more standardized way of passing reasoning traces back that we should instead encourage people converting the Ministral 3 GGUFs to bake into the chat template? See the sketch below for one candidate.
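
For comparison, here is a sketch of one arguably more standard shape: a DeepSeek-style top-level reasoning_content string on the assistant message. If I understand the OpenAI-compat parsing correctly (an assumption on my part), llama-server already accepts this field on input, so a converted chat template could render it instead of the thinking content block:

# Hedged sketch: assumes llama-server's OpenAI-compat layer accepts a
# DeepSeek-style "reasoning_content" field on assistant messages, and that
# the Ministral 3 chat template could be adapted to consume it.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "How many letters are in strawberry?"},
      {"role": "assistant",
       "reasoning_content": "s-t-r-a-w-b-e-r-r-y is 10 letters.",
       "content": "Ten."},
      {"role": "user", "content": "Are you sure?"}
    ]
  }'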

First Bad Commit

N/A, not a regression

Relevant log output


Labels: enhancement (New feature or request)
