Name and Version
./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from /app/libggml-cuda.so
load_backend: loaded CPU backend from /app/libggml-cpu-alderlake.so
version: 6692 (ca71fb9)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA GeForce RTX 4090
Models
ggml-org/granite-docling-258M-GGUF
Problem description & steps to reproduce
First of all, it's fantastic to have docling accessible with llama.cpp! Thank you!
Unfortunately, it's not really working for me. I know this release is just a few hours old, but I thought I should let you know about my experience.
The F16 model generates pure garbage ([[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[....]) and Q8 doesn't do much better either (e.g. "Body refers to nagging as the fall of the left trunk, the right trunk, and the middle trunk being clear." is what it comes up with when processing https://huggingface.co/ibm-granite/granite-docling-258M/blob/main/assets/new_arxiv.png).
Are there any specific settings I need to get it to work properly? Here are mine:
docker run --gpus all -v /home/user/models_test/gguf:/models -p 5050:5050 local/llama.cpp:server-cuda -m /models/granite-docling-258M-f16.gguf --mmproj /models/mmproj-granite-docling-258M-f16.gguf --n-gpu-layers 999 -c 8192 --host 0.0.0.0 --port 5050 --temp 0.6 --top-p 0.9 --top-k 1000 --min-p 0.01
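For reference, the request goes to llama-server's OpenAI-compatible /v1/chat/completions endpoint. A minimal reproduction sketch along these lines (the prompt text and file name are illustrative; llama-server accepts images as base64 data URIs in image_url content parts when an --mmproj is loaded):

# send the test page plus a text prompt to the running container
curl http://localhost:5050/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Convert this page to docling."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,'"$(base64 -w0 new_arxiv.png)"'"}}
      ]
    }]
  }'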
First Bad Commit
No response
Relevant log output
common_sampler_types_from_names: unable to match sampler by name 'edkypmxt'
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 0 | task 2 | processing task
slot update_slots: id 0 | task 2 | new prompt, n_ctx_slot = 8192, n_keep = 0, n_prompt_tokens = 1126
slot update_slots: id 0 | task 2 | n_past = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 259, n_tokens = 259, progress = 0.230018
slot update_slots: id 0 | task 2 | n_past = 259, memory_seq_rm [259, end)
srv process_chun: processing image...
srv process_chun: image processed in 252 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 325, n_tokens = 2, progress = 0.288632
slot update_slots: id 0 | task 2 | n_past = 325, memory_seq_rm [325, end)
srv process_chun: processing image...
srv process_chun: image processed in 59 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 391, n_tokens = 2, progress = 0.347247
slot update_slots: id 0 | task 2 | n_past = 391, memory_seq_rm [391, end)
srv process_chun: processing image...
srv process_chun: image processed in 13 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 458, n_tokens = 3, progress = 0.406750
slot update_slots: id 0 | task 2 | n_past = 458, memory_seq_rm [458, end)
srv process_chun: processing image...
srv process_chun: image processed in 14 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 524, n_tokens = 2, progress = 0.465364
slot update_slots: id 0 | task 2 | n_past = 524, memory_seq_rm [524, end)
srv process_chun: processing image...
srv process_chun: image processed in 13 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 590, n_tokens = 2, progress = 0.523979
slot update_slots: id 0 | task 2 | n_past = 590, memory_seq_rm [590, end)
srv process_chun: processing image...
srv process_chun: image processed in 14 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 657, n_tokens = 3, progress = 0.583481
slot update_slots: id 0 | task 2 | n_past = 657, memory_seq_rm [657, end)
srv process_chun: processing image...
srv process_chun: image processed in 15 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 723, n_tokens = 2, progress = 0.642096
slot update_slots: id 0 | task 2 | n_past = 723, memory_seq_rm [723, end)
srv process_chun: processing image...
srv process_chun: image processed in 13 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 789, n_tokens = 2, progress = 0.700710
slot update_slots: id 0 | task 2 | n_past = 789, memory_seq_rm [789, end)
srv process_chun: processing image...
srv process_chun: image processed in 12 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 856, n_tokens = 3, progress = 0.760213
slot update_slots: id 0 | task 2 | n_past = 856, memory_seq_rm [856, end)
srv process_chun: processing image...
srv process_chun: image processed in 13 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 922, n_tokens = 2, progress = 0.818828
slot update_slots: id 0 | task 2 | n_past = 922, memory_seq_rm [922, end)
srv process_chun: processing image...
srv process_chun: image processed in 13 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 988, n_tokens = 2, progress = 0.877442
slot update_slots: id 0 | task 2 | n_past = 988, memory_seq_rm [988, end)
srv process_chun: processing image...
srv process_chun: image processed in 13 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 1055, n_tokens = 3, progress = 0.936945
slot update_slots: id 0 | task 2 | n_past = 1055, memory_seq_rm [1055, end)
srv process_chun: processing image...
srv process_chun: image processed in 12 ms
slot update_slots: id 0 | task 2 | prompt processing progress, n_past = 1126, n_tokens = 7, progress = 1.000000
slot update_slots: id 0 | task 2 | prompt done, n_past = 1126, n_tokens = 7
srv log_server_r: request: GET /slots 172.17.0.1 200
slot process_toke: id 0 | task 2 | n_predict (-1) is set for infinite generation. Limiting generated tokens to n_ctx_train (8192) to avoid EOS-less generation infinite loop
slot release: id 0 | task 2 | stop processing: n_past = 8191, truncated = 1
slot print_timing: id 0 | task 2 |
prompt eval time = 716.27 ms / 1126 tokens ( 0.64 ms per token, 1572.04 tokens per second)
eval time = 16400.78 ms / 7066 tokens ( 2.32 ms per token, 430.83 tokens per second)
total time = 17117.05 ms / 8192 tokens