
Qwen3-VL co-ordinate and bounding box errors (grounding errors) #17131

@sujitvasanth


Hi, Qwen3-VL bounding boxes and co-ordinates appear to be incorrect in both the 4B model (no co-ordinates at all) and the 8B model (poor localisation). This occurs even in the FP16 versions of these models, so it is not quantisation related.
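When checking whether returned boxes are really wrong, it can help to rule out a coordinate-scale mismatch first. The sketch below assumes the model emits boxes as (x1, y1, x2, y2) on a normalised 0 to 1000 grid rather than in pixels; that scale is an assumption and should be checked against the model card. `scale_box` and its `grid` parameter are hypothetical names used only for illustration.

```python
def scale_box(box, width, height, grid=1000):
    """Map an (x1, y1, x2, y2) box from an assumed 0..grid scale to pixels.

    ASSUMPTION: the model reports coordinates on a normalised grid
    (default 1000). If the model already returns pixel coordinates,
    this rescaling must not be applied.
    """
    x1, y1, x2, y2 = box
    return (
        round(x1 * width / grid),
        round(y1 * height / grid),
        round(x2 * width / grid),
        round(y2 * height / grid),
    )


# Example: a box on the assumed 0..1000 grid, drawn on a 1920x1080 image.
pixel_box = scale_box((100, 200, 500, 1000), width=1920, height=1080)
```

If the boxes still miss the target after rescaling (or without it), the scale is probably not the cause.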

Image

I can see that when convert_hf_to_gguf.py is run, the non-vision layers of the vision tower are removed. I'm not sure if this is the cause of the problem.

This does not occur in Hugging Face transformers, even for the same base model quantised to 4 bits.
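To put a number on "poor localisation", one option is to compare the box from each backend against the same reference box using intersection over union (IoU). This is only a sketch of the standard IoU formula; the `iou` helper is hypothetical and not part of either library, and it assumes both boxes are (x1, y1, x2, y2) in the same coordinate system.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes.

    Both boxes must use the same coordinate system (for example pixels).
    Returns 0.0 when the boxes do not overlap or are degenerate.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Width and height of the overlapping region, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

Computing `iou(reference_box, box_from_transformers)` and `iou(reference_box, box_from_gguf)` for the same image and prompt would give a comparable score for each backend.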

Image

The problem is not isolated to the Python API; it also occurs in llama-mtmd-cli.exe.

Image

See also: JamePeng/llama-cpp-python#20
