Misc. bug: parallel in server text generate stuck when a image is decoding/encoding #16046

@sableangle

Description

Name and Version

version: 6496
model: gemma-3-4b-it-q4_0.gguf
mmproj: mmproj-model-f16-4B.gguf

command

{path-to-binary}/llama-server -m {path-to-model}/gemma-3-4b-it-q4_0.gguf --mmproj {path-to-model}/mmproj-model-f16-4B.gguf --jinja -c 16384 -np 4

With 4 parallel text generations, everything works fine.

However, when one of the conversations sends an image for analysis, the other text generations stall until the image encoding/decoding completes.

Even in-flight text generations pause immediately and only resume once the image processing finishes.
Is there a parameter that controls this behavior?

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

{path-to-binary}/llama-server -m {path-to-model}/gemma-3-4b-it-q4_0.gguf --mmproj {path-to-model}/mmproj-model-f16-4B.gguf --jinja -c 16384 -np 4

Problem description & steps to reproduce

Launch the server with `-np 4`, the Gemma 3 model, and the mmproj file.
Then open 4 browser instances and chat with the LLM in parallel (one of the chats sends an image).
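The browser-based repro can also be scripted. Below is a minimal sketch that fires concurrent requests at the server's OpenAI-compatible `/v1/chat/completions` endpoint (default port 8080) and prints per-request latency, so the stall of the text-only slots while the image request is being processed shows up as inflated timings. The port, prompts, and image file are assumptions; adjust them to your setup.

```python
import base64
import json
import threading
import time
import urllib.request

# Assumed server address; llama-server listens on 8080 by default.
SERVER = "http://localhost:8080/v1/chat/completions"


def text_payload(prompt: str) -> dict:
    """Plain text chat request."""
    return {"messages": [{"role": "user", "content": prompt}],
            "max_tokens": 64}


def image_payload(prompt: str, image_bytes: bytes) -> dict:
    """Multimodal request using an OpenAI-style image_url content part."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        "max_tokens": 64,
    }


def send(payload: dict, label: str) -> None:
    """POST one request and report how long it took end to end."""
    req = urllib.request.Request(
        SERVER,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    t0 = time.time()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    print(f"{label}: {time.time() - t0:.1f}s")


if __name__ == "__main__":
    # Three text-only slots plus one image slot, launched together.
    image_bytes = open("test.png", "rb").read()  # any image on disk
    jobs = [threading.Thread(target=send,
                             args=(text_payload(f"Tell me a fact #{i}"),
                                   f"text-{i}"))
            for i in range(3)]
    jobs.append(threading.Thread(target=send,
                                 args=(image_payload("Describe this image.",
                                                     image_bytes),
                                       "image")))
    for t in jobs:
        t.start()
    for t in jobs:
        t.join()
```

If the bug is present, the `text-*` latencies jump as soon as the `image` request starts being encoded, matching what is seen in the browser.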

First Bad Commit

No response

Relevant log output
