Skip to content

Eval bug: Docling issues #16435

@mpetruc

Description

@mpetruc

Name and Version

./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from /app/libggml-cuda.so
load_backend: loaded CPU backend from /app/libggml-cpu-alderlake.so
version: 6692 (ca71fb9)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA GeForce RTX 4090

Models

ggml-org/granite-docling-258M-GGUF

Problem description & steps to reproduce

First of all, it's fantastic to have docling accessible with llama.cpp! Thank you!

Unfortunately it's not really working for me. I know this release is just a few hours old, but i thought i should let you know about my experience.
The f16 generates pure garbage ([[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[....] and q8 doesn't do much better either (e.g. "Body refers to nagging as the fall of the left trunk, the right trunk, and the middle trunk being clear." is what's able to come up with when processing https://huggingface.co/ibm-granite/granite-docling-258M/blob/main/assets/new_arxiv.png).

Are there any specific settings i need to get it to work properly? Here are mine:

docker run  --gpus all -v /home/user/models_test/gguf:/models -p 5050:5050 local/llama.cpp:server-cuda -m /models/granite-docling-258M-f16.gguf --mmproj /models/mmproj-granite-docling-258M-f16.gguf --n-gpu-layers 999 -c 8192 --host 0.0.0.0 --port 5050 --temp 0.6 --top-p 0.9 --top-k 1000 --min-p 0.01

First Bad Commit

No response

Relevant log output

common_sampler_types_from_names: unable to match sampler by name 'edkypmxt'
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  0 | task 2 | processing task
slot update_slots: id  0 | task 2 | new prompt, n_ctx_slot = 8192, n_keep = 0, n_prompt_tokens = 1126
slot update_slots: id  0 | task 2 | n_past = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 259, n_tokens = 259, progress = 0.230018
slot update_slots: id  0 | task 2 | n_past = 259, memory_seq_rm [259, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 252 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 325, n_tokens = 2, progress = 0.288632
slot update_slots: id  0 | task 2 | n_past = 325, memory_seq_rm [325, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 59 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 391, n_tokens = 2, progress = 0.347247
slot update_slots: id  0 | task 2 | n_past = 391, memory_seq_rm [391, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 13 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 458, n_tokens = 3, progress = 0.406750
slot update_slots: id  0 | task 2 | n_past = 458, memory_seq_rm [458, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 14 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 524, n_tokens = 2, progress = 0.465364
slot update_slots: id  0 | task 2 | n_past = 524, memory_seq_rm [524, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 13 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 590, n_tokens = 2, progress = 0.523979
slot update_slots: id  0 | task 2 | n_past = 590, memory_seq_rm [590, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 14 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 657, n_tokens = 3, progress = 0.583481
slot update_slots: id  0 | task 2 | n_past = 657, memory_seq_rm [657, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 15 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 723, n_tokens = 2, progress = 0.642096
slot update_slots: id  0 | task 2 | n_past = 723, memory_seq_rm [723, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 13 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 789, n_tokens = 2, progress = 0.700710
slot update_slots: id  0 | task 2 | n_past = 789, memory_seq_rm [789, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 12 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 856, n_tokens = 3, progress = 0.760213
slot update_slots: id  0 | task 2 | n_past = 856, memory_seq_rm [856, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 13 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 922, n_tokens = 2, progress = 0.818828
slot update_slots: id  0 | task 2 | n_past = 922, memory_seq_rm [922, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 13 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 988, n_tokens = 2, progress = 0.877442
slot update_slots: id  0 | task 2 | n_past = 988, memory_seq_rm [988, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 13 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 1055, n_tokens = 3, progress = 0.936945
slot update_slots: id  0 | task 2 | n_past = 1055, memory_seq_rm [1055, end)
srv  process_chun: processing image...
srv  process_chun: image processed in 12 ms
slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 1126, n_tokens = 7, progress = 1.000000
slot update_slots: id  0 | task 2 | prompt done, n_past = 1126, n_tokens = 7
srv  log_server_r: request: GET /slots 172.17.0.1 200
slot process_toke: id  0 | task 2 | n_predict (-1) is set for infinite generation. Limiting generated tokens to n_ctx_train (8192) to avoid EOS-less generation infinite loop
slot      release: id  0 | task 2 | stop processing: n_past = 8191, truncated = 1
slot print_timing: id  0 | task 2 |
prompt eval time =     716.27 ms /  1126 tokens (    0.64 ms per token,  1572.04 tokens per second)
       eval time =   16400.78 ms /  7066 tokens (    2.32 ms per token,   430.83 tokens per second)
      total time =   17117.05 ms /  8192 tokens

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions