
Conversation

wbruna (Contributor) commented Oct 5, 2025

For #851. Allow the model loading logic to tolerate missing layers, which is enough to run the 12B Pruning variant:

https://huggingface.co/OPPOer/Qwen-Image-Pruning

Tested with the Q4_K_M quant from https://huggingface.co/wsbagnsv1/Qwen-Image-Pruning-GGUF:

[attached test image: teste_1759693079]
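
In essence, the change is to treat an absent layer tensor as "skip this block" rather than a fatal load error. A rough sketch of the idea, assuming a probe-by-name approach (the function and tensor names below are illustrative, not the actual stable-diffusion.cpp loader):

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: tolerate missing transformer blocks while loading.
// "tensors_in_file" stands in for the tensor names found in the checkpoint;
// the real loader in stable-diffusion.cpp is structured differently.
std::vector<int> collect_present_blocks(const std::map<std::string, size_t>& tensors_in_file,
                                         int num_blocks) {
    std::vector<int> present;
    for (int i = 0; i < num_blocks; ++i) {
        std::string probe = "transformer_blocks." + std::to_string(i) + ".attn.to_q.weight";
        if (tensors_in_file.count(probe)) {
            present.push_back(i);  // block exists in this (pruned) checkpoint
        } else {
            std::printf("block %d missing, skipping\n", i);  // previously a hard error
        }
    }
    return present;
}
```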

wbruna (Contributor, Author) commented Oct 5, 2025

Quality seems a little worse than with the Lightning model, with ~30% lower peak VRAM usage and similar speed gains.

wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 6, 2025
wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 9, 2025
wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 9, 2025
wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 10, 2025
LostRuins pushed a commit to LostRuins/koboldcpp that referenced this pull request Oct 10, 2025
wbruna (Contributor, Author) commented Oct 10, 2025

@leejet, it looks like a123e25 from the qwen_image_edit branch is enough to support the '13b' pruned model. Thanks!

The '12b' variant still doesn't work, though, maybe because it has non-contiguous layers: I guess they keep the non-pruned layers under the same indices they have in the original model.
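
For illustration only, one way to handle that would be deriving the block indices from the tensor names actually present in the checkpoint, instead of assuming a contiguous 0..N-1 range (the names and layout below are assumptions, not the actual fix):

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: collect the transformer block indices present in the
// checkpoint, so non-contiguous numbering (e.g. 0, 2, 5, ...) can still be
// mapped onto the original model's layer positions.
std::set<int> block_indices_from_names(const std::vector<std::string>& tensor_names) {
    std::set<int> indices;
    std::regex re(R"(transformer_blocks\.(\d+)\.)");
    std::smatch m;
    for (const auto& name : tensor_names) {
        if (std::regex_search(name, m, re)) {
            indices.insert(std::stoi(m[1].str()));
        }
    }
    return indices;  // e.g. {0, 2, 5, ...} for a pruned checkpoint
}
```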

wbruna marked this pull request as draft on October 10, 2025, 15:43