
Eval bug: Partial Gibberish on Long Prompts GPT-OSS 120B #15516

@AbdullahMPrograms

Description


Name and Version

./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Radeon PRO W7900, gfx1100 (0x1100), VMM: no, Wave Size: 32
Device 1: AMD Radeon PRO W7900, gfx1100 (0x1100), VMM: no, Wave Size: 32
version: 6250 (e92734d)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

HIP

Hardware

2x Radeon Pro W7900

Models

ggml-org/gpt-oss-120b-GGUF

Problem description & steps to reproduce

On longer prompts (10k+ tokens), GPT-OSS 120B repeatedly prints "Dissolution" or "oooooooooooooo...". Example outputs from several test chats at longer context lengths are linked in the raw outputs below.

Launch command is:
./llama-server -m /home/ultimis/LLM/Models/ggml-org/gpt-oss-120b-GGUF/gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 131072 -ngl 999 -b 2048 -ub 2048 -fa --reasoning-format none --jinja --chat-template-kwargs '{"reasoning_effort":"high"}' --host 0.0.0.0 --port 8081
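For anyone triaging the linked raw outputs in bulk, a minimal sketch of a heuristic that flags the two reported failure modes (long single-character runs like "oooo..." and the same word repeated many times, like "Dissolution"). The function name and thresholds are hypothetical, not part of llama.cpp:

```python
import re

def looks_degenerate(text: str, char_run: int = 20, word_run: int = 8) -> bool:
    """Heuristic check for the gibberish patterns reported in this issue.

    char_run: flag if any single character repeats this many times in a row.
    word_run: flag if the same whitespace-delimited word repeats this many
              times consecutively. Both thresholds are arbitrary guesses.
    """
    # Long run of one character, e.g. "oooooooooooooo..."
    if re.search(r"(.)\1{%d,}" % (char_run - 1), text):
        return True
    # Same word repeated back-to-back, e.g. "Dissolution Dissolution ..."
    words = text.split()
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run >= word_run:
            return True
    return False
```

This only detects the symptom; it says nothing about the cause (quantization, flash attention, or the HIP backend).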

First Bad Commit

No response

Relevant log output

Raw outputs:
https://gist.github.com/AbdullahMPrograms/a7a3b96dc1713387fc93911704b2d483
