Update GGUF CUDA kernel code path with MMQ support by Isotr0py · Pull Request #12509 · huggingface/diffusers

Isotr0py · 2025-10-19T09:59:51Z

What does this PR do?

Enable the MMQ code path using the new GGUF CUDA kernel with MMA support (standard quants).
Remove the MMVQ code path, because it is unlikely to be used in diffusion models, which have no decoding stage.
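
For readers unfamiliar with the terms, here is a rough sketch of the difference between the "dequantize, then matmul" path and a quantized matmul (MMQ-style) path, reduced to a single 32-element block dot product in pure Python. It is only an illustration under simplifying assumptions: the function names are hypothetical, the 4-bit scheme is a simplified stand-in for Q4_0 (no fp16 scale, no nibble packing, simpler rounding), and the real CUDA kernel works on tiles rather than a scalar loop.

```python
import random

QK = 32  # assumed block size: 32 weights share one scale


def quantize_q4(block):
    """Simplified 4-bit quantization: one float scale plus codes in 0..15."""
    amax = max(abs(v) for v in block)
    d = amax / 7.0 if amax else 1.0
    codes = [round(v / d) + 8 for v in block]  # offset by 8 to store unsigned
    return d, codes


def dequantize_q4(d, codes):
    """Recover approximate float weights: w = (code - 8) * scale."""
    return [(q - 8) * d for q in codes]


def quantize_q8(block):
    """Simplified 8-bit quantization of activations (signed, symmetric)."""
    amax = max(abs(v) for v in block)
    d = amax / 127.0 if amax else 1.0
    return d, [round(v / d) for v in block]


def dot_dequantize(wq, x):
    """Dequantize path: expand weights to float, then a float dot product."""
    d, codes = wq
    w = dequantize_q4(d, codes)
    return sum(wi * xi for wi, xi in zip(w, x))


def dot_mmq(wq, x):
    """Quantized path: integer dot product, scaled once per block."""
    d_w, codes = wq
    d_x, qx = quantize_q8(x)
    acc = sum((q - 8) * qi for q, qi in zip(codes, qx))  # integer arithmetic
    return d_w * d_x * acc


random.seed(0)
w = [random.uniform(-1, 1) for _ in range(QK)]
x = [random.uniform(-1, 1) for _ in range(QK)]
wq = quantize_q4(w)
print(dot_dequantize(wq, x), dot_mmq(wq, x))
```

The two results differ only by the extra 8-bit rounding of the activations. The quantized path never materializes float weights, which is the property the MMQ kernel exploits.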

Benchmark Results
GPU: RTX 3090, Model: Flux-1.0-dev + Q4_0 transformer

Native dequantize mulmat

28/28 [00:52<00:00,  1.86s/it]

Kernel dequantize mulmat

28/28 [00:47<00:00,  1.70s/it]

Kernel quantized mulmat (MMQ)

28/28 [00:43<00:00,  1.56s/it]
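
To make the comparison easier to read, the per-step times above can be turned into relative speedups. A small sketch (the `speedup` helper is only for illustration; the numbers are the s/it values reported above):

```python
# Per-step times (s/it) copied from the benchmark output above.
timings = {
    "Native dequantize mulmat": 1.86,
    "Kernel dequantize mulmat": 1.70,
    "Kernel quantized mulmat (MMQ)": 1.56,
}


def speedup(baseline, t):
    """How many times faster a run is than the baseline."""
    return baseline / t


baseline = timings["Native dequantize mulmat"]
for name, t in timings.items():
    saved = 100 * (1 - t / baseline)
    print(f"{name}: {t:.2f} s/it, {speedup(baseline, t):.2f}x vs native, {saved:.1f}% less time per step")
```

On these numbers, the dequantize kernel is about 1.09x faster than native (8.6% less time per step), and MMQ is about 1.19x faster (16.1% less time per step).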

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

HuggingFaceDocBuilderDev · 2025-10-19T10:08:10Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


dxqb · 2025-10-29T09:33:59Z

Native dequantize mulmat
28/28 [00:52<00:00,  1.86s/it]

It might be worth benchmarking any custom kernel not only against native, but also against native compiled with torch.compile, because torch.compile has a major impact on the dequantization code.
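
A minimal sketch of a timing harness that could be used for such a comparison. It uses only the standard library; the `benchmark` helper and the stand-in workloads are hypothetical and not part of this PR.

```python
import statistics
import time


def benchmark(fn, *args, warmup=3, iters=10, sync=None):
    """Return the median wall-clock time of fn(*args) in seconds.

    warmup: untimed calls first, so one-time costs (such as compilation)
            are not counted.
    sync:   optional callable invoked before the clock is read, for
            workloads that run asynchronously.
    """
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workloads, only to show the calling convention.
def variant_a(n):
    return sum(i * i for i in range(n))


def variant_b(n):
    return sum([i * i for i in range(n)])


for name, fn in [("variant_a", variant_a), ("variant_b", variant_b)]:
    print(f"{name}: {benchmark(fn, 10_000) * 1e3:.3f} ms")
```

For the case discussed here, one would presumably pass the native dequantize + matmul function, the same function wrapped in `torch.compile(...)`, and the kernel-based path, with `sync=torch.cuda.synchronize` so that queued GPU work is included in the measurement. The warmup calls matter for the compiled variant, since compilation happens on the first calls.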

github-actions · 2026-01-09T15:08:06Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Isotr0py added 3 commits October 19, 2025 17:37

update mma branch

5b6c988

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

remove mmvq

942252e

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

remove k-quant tempoarily

665aacd

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

ooops

6897c60

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

github-actions Bot added the stale Issues that haven't received updates label Jan 9, 2026

Isotr0py wants to merge 4 commits into huggingface:main from Isotr0py:gguf-kernel-update

3 participants
