Hi, Qwen3-VL bounding boxes and coordinates appear to be incorrect in both the 4B (no coordinates at all) and 8B (poor localisation) models. This occurs even with the FP16 versions of these models, so it is not quantisation related.
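For reference, this is roughly how I'm checking the outputs. A minimal sketch only: the `parse_bboxes` helper and the regex are mine, and it assumes the model emits boxes as JSON-style `[x1, y1, x2, y2]` integer lists; adjust to your actual prompt/output format.

```python
import re

def parse_bboxes(text: str) -> list[list[int]]:
    """Extract [x1, y1, x2, y2] integer lists from model output text.

    Hypothetical helper: assumes boxes appear as 4-number JSON-style
    lists. Returns [] when no coordinates are produced (the 4B failure
    mode) so the caller can flag the response.
    """
    boxes = []
    pattern = r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]"
    for m in re.finditer(pattern, text):
        x1, y1, x2, y2 = (int(g) for g in m.groups())
        if x2 > x1 and y2 > y1:  # drop degenerate boxes
            boxes.append([x1, y1, x2, y2])
    return boxes

# 8B-style output: coordinates present, but localisation may be poor
print(parse_bboxes('{"bbox_2d": [12, 34, 560, 780], "label": "dog"}'))
# 4B-style output: no coordinates at all -> empty list
print(parse_bboxes("The dog is in the lower left of the image."))
```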
I can see that when convert_hf_to_gguf.py is run, the non-vision layers of the vision tower are removed; I'm not sure if this is the cause of the problem.
This does not occur in Hugging Face Transformers, even with the same base model quantised to 4 bits.
The problem is not isolated to llama-cpp-python; it also occurs in llama-mtmd-cli.exe.
See also JamePeng/llama-cpp-python#20.