Labels
CUDA (Related to the CUDA backend)
bug (Something isn't working)
medium severity (Used to report medium severity bugs in llama.cpp, e.g. Malfunctioning Features but still useable)
Description
Name and Version
$ llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 7122 (21d31e081)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
$ llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 7122 (21d31e081)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
(does NOT happen on CPU)
Hardware
RTX 5090
Models
IBM Granite 4.0-h-1b, several quants (including BF16 GGUF)
Problem description & steps to reproduce
When I run inference at a 32k context (via a simple script that generates a needle-in-a-haystack (NIAH) text), the model returns "???????" or "G".
The same script at a 16k context works fine with the model.
The same script with the same llama.cpp build and Granite-4-h-micro (3.5B) works fine at a 32k context. The same script with a CPU build of llama.cpp and Granite-4-h-1b also works fine at a 32k context.
Inferring on similar content with Transformers and Granite-4-h-1b (the original safetensors version) works as well. A minimal sketch of the repro script follows below.
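For reference, here is a hedged sketch of the kind of NIAH repro script used; it is not the exact script from the report. It assumes llama-server is already running locally on port 8080 with a 32k context, and the needle text, filler text, and tokens-per-word heuristic are all illustrative placeholders.

```python
# Hypothetical NIAH repro sketch. Assumes llama-server was started
# with something like:
#   llama-server -m granite-4.0-h-1b.gguf -c 32768 -ngl 99 --port 8080
# The needle, filler, and sizing heuristic below are illustrative.
import requests

NEEDLE = "The secret passphrase is BLUE-HARBOR-42."
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_haystack(approx_tokens: int) -> str:
    # Rough heuristic (assumption): ~1.3 tokens per filler word.
    n_words = int(approx_tokens / 1.3)
    words = (FILLER * (n_words // 9 + 1)).split()[:n_words]
    # Bury the needle roughly in the middle of the haystack.
    mid = len(words) // 2
    return " ".join(words[:mid]) + "\n" + NEEDLE + "\n" + " ".join(words[mid:])

prompt = (
    build_haystack(30000)  # leave headroom below the 32k context limit
    + "\n\nWhat is the secret passphrase mentioned above?"
)

# llama-server exposes an OpenAI-compatible chat completions endpoint.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
        "temperature": 0.0,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
# On the CUDA build at 32k the output is "???????" or "G";
# the CPU build answers with the passphrase.
```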
First Bad Commit
No response
Relevant log output
(not sure what to paste - no error message is displayed)