Name and Version
version: 6496
model: gemma-3-4b-it-q4_0.gguf
mmproj: mmproj-model-f16-4B.gguf
command
{path-to-binary}/llama-server -m {path-to-model}/gemma-3-4b-it-q4_0.gguf --mmproj {path-to-model}/mmproj-model-f16-4B.gguf --jinja -c 16384 -np 4
With 4 parallel text generations, everything works fine.
However, when one of the conversations sends an image for analysis, the other text generations stall until the image decoding is complete.
Even already-running text generations pause immediately and only resume once the image decoding finishes.
Is there a parameter that can control this behavior?
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
{path-to-binary}/llama-server -m {path-to-model}/gemma-3-4b-it-q4_0.gguf --mmproj {path-to-model}/mmproj-model-f16-4B.gguf --jinja -c 16384 -np 4
Problem description & steps to reproduce
launch the server with -np 4, gemma3, and the mmproj file
then open 4 browser instances and chat with the LLM (one of the chats sends an image)
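The browser-based repro above can also be scripted. The sketch below fires several chat requests at llama-server's OpenAI-compatible `/v1/chat/completions` endpoint concurrently and reports per-request latency, so the stall shows up as the text-only requests suddenly taking as long as the image request. This is an illustrative sketch, not part of the original report: the server URL, `max_tokens` value, and the base64 image placeholder are assumptions; adjust them for your setup.

```python
import concurrent.futures
import json
import time
import urllib.request

# Assumed llama-server default address; change if you use --port/--host.
SERVER = "http://127.0.0.1:8080"

def chat(messages, timeout=120):
    """Send one non-streaming chat completion; return elapsed seconds."""
    body = json.dumps({"messages": messages, "max_tokens": 32}).encode()
    req = urllib.request.Request(
        SERVER + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    t0 = time.monotonic()
    urllib.request.urlopen(req, timeout=timeout).read()
    return time.monotonic() - t0

def run_parallel(tasks, worker=chat):
    """Run one worker call per task concurrently; return elapsed times."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(tasks)) as ex:
        return list(ex.map(worker, tasks))

# Example payloads: three plain-text chats plus one multimodal chat.
TEXT_CHAT = [{"role": "user", "content": "Count to 20."}]
IMAGE_CHAT = [{"role": "user", "content": [
    {"type": "text", "text": "Describe this image."},
    # Placeholder: substitute a real base64-encoded image here.
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
]}]

# To reproduce against a running server, uncomment:
# print(run_parallel([TEXT_CHAT, TEXT_CHAT, TEXT_CHAT, IMAGE_CHAT]))
```

If the bug reproduces, the text-chat latencies printed alongside the image chat will be much higher than the latencies of four text-only chats, because text decoding pauses while the mmproj encodes the image.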
First Bad Commit
No response