Skip to content

Eval bug: Granite-4.0-h-1B fails long context (>16k); same model in Transformers works, Granite-4.0-h-MicroGGUF works)` #17610

@mramendi

Description

@mramendi

Name and Version

$ llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 7122 (21d31e081)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

$ llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 7122 (21d31e081)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
root@C.28352397:~/models$ 

Operating systems

Linux

GGML backends

CUDA
(does NOT happen on CPU)

Hardware

RTX 5090

Models

IBM Granite 4.0-h-1b, several quants (including BF16 GGUF)

Problem description & steps to reproduce

When I infer at a 32k context (simple script generating a NIAH text), the model returns "???????" or "G".

The same script at 16k context works wine with the model.

The same script with the same llama.cpp and with Granite-4-h-micro (3.5B) works fine at 32k context. The same script with a CPU build of llama.cpp and Granite-4-h-1b works fine at 32k context too.

Similar content, inferring on Transformers with Granite-4-h-1b (original safetensors version) works.

First Bad Commit

No response

Relevant log output

(not sure what to paste - no error message is displayed)

Metadata

Metadata

Assignees

No one assigned

    Labels

    CUDARelated to the CUDA backendbugSomething isn't workingmedium severityUsed to report medium severity bugs in llama.cpp (e.g. Malfunctioning Features but still useable)

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions