Hi, Qwen3-VL bounding boxes and coordinates appear to be incorrect in both the 4B (no coordinates at all) and 8B (poor localisation) models. This occurs even with the FP16 versions of these models, so it is not quantisation related.
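For reference, this is roughly how I'm checking the outputs. A minimal sketch only: the `parse_bboxes` helper and the regex are mine, and it assumes the model emits boxes as JSON-style `[x1, y1, x2, y2]` integer lists; adjust to your actual prompt/output format.

```python
import re

def parse_bboxes(text: str) -> list[list[int]]:
    """Extract [x1, y1, x2, y2] integer lists from model output text.

    Hypothetical helper: assumes boxes appear as 4-number JSON-style
    lists. Returns [] when no coordinates are produced (the 4B failure
    mode) so the caller can flag the response.
    """
    boxes = []
    pattern = r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]"
    for m in re.finditer(pattern, text):
        x1, y1, x2, y2 = (int(g) for g in m.groups())
        if x2 > x1 and y2 > y1:  # drop degenerate boxes
            boxes.append([x1, y1, x2, y2])
    return boxes

# 8B-style output: coordinates present, but localisation may be poor
print(parse_bboxes('{"bbox_2d": [12, 34, 560, 780], "label": "dog"}'))
# 4B-style output: no coordinates at all -> empty list
print(parse_bboxes("The dog is in the lower left of the image."))
```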
I can see that when convert_hf_to_gguf.py is run, the non-vision layers of the vision tower are removed; I'm not sure if this is the cause of the problem.
This does not occur in Hugging Face Transformers, even with the same base model quantised to 4 bits.
The problem is not isolated to llama-cpp-python; it also occurs in llama-mtmd-cli.exe.
See also JamePeng/llama-cpp-python#20.