Name and Version
llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
version: 6531 (da30ab5)
built with cc (GCC) 15.2.1 20250808 (Red Hat 15.2.1-1) for x86_64-redhat-linux
Operating systems
Fedora Linux 42
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server \
--alias "magistral-small-1.2-24b-2509:q5_k_xl" \
--model ~/AI/models/magistral-small-1.2-24b-2509/q5_k_xl.gguf \
--jinja \
--special \
--reasoning-format deepseek \
--verbose-prompt \
--flash-attn on \
--temp 0.7 \
--min-p 0.01 \
--top-p 0.95 \
--top-k -1 \
--context-shift \
--gpu-layers 99 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--ctx-size 73728
Problem description & steps to reproduce
When using Magistral Small 2509 from Unsloth, the web UI should render text between [THINK] and [/THINK] as reasoning, but it currently does not.
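For reference, the behavior can also be checked outside the web UI by querying the server's OpenAI-compatible endpoint and inspecting where the [THINK] span ends up. This is only a minimal sketch, assuming the server started with the command line above is listening on the default http://localhost:8080 and that jq is installed:

# Ask a question and inspect the returned message object.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "magistral-small-1.2-24b-2509:q5_k_xl",
        "messages": [{"role": "user", "content": "What is 2+2?"}]
      }' \
  | jq '.choices[0].message'

# If the [THINK]...[/THINK] block were parsed as reasoning, it would be
# returned in "reasoning_content"; if it is not parsed, it stays inline
# in "content", which matches what the web UI shows.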
First Bad Commit
No response
Relevant log output