Description
Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Radeon PRO W7900, gfx1100 (0x1100), VMM: no, Wave Size: 32
Device 1: AMD Radeon PRO W7900, gfx1100 (0x1100), VMM: no, Wave Size: 32
version: 6250 (e92734d)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
HIP
Hardware
2x Radeon Pro W7900
Models
ggml-org/gpt-oss-120b-GGUF
Problem description & steps to reproduce
On longer prompts (10k+ tokens), GPT-OSS 120B repeatedly prints "Dissolution" or "oooooooooooooo..." instead of a coherent response. Example outputs from several long test chats are in the gist linked under "Relevant log output" below.
Launch command is:
./llama-server -m /home/ultimis/LLM/Models/ggml-org/gpt-oss-120b-GGUF/gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 131072 -ngl 999 -b 2048 -ub 2048 -fa --reasoning-format none --jinja --chat-template-kwargs '{"reasoning_effort":"high"}' --host 0.0.0.0 --port 8081
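(Note: the original command had -host; llama-server expects --host.)

To reproduce against the running server, a request along these lines should trigger the behavior once the prompt exceeds roughly 10k tokens. This is a minimal sketch using the OpenAI-compatible /v1/chat/completions endpoint that llama-server exposes; the prompt placeholder and max_tokens value are illustrative, not taken from the original report:

curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "<paste a 10k+ token prompt here>"}],
        "max_tokens": 2048
      }'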
First Bad Commit
No response
Relevant log output
Raw outputs:
https://gist.github.com/AbdullahMPrograms/a7a3b96dc1713387fc93911704b2d483