
Update GGUF CUDA kernel code path with MMQ support #12509

Draft

Isotr0py wants to merge 4 commits into huggingface:main from Isotr0py:gguf-kernel-update

Isotr0py (Contributor) commented Oct 19, 2025

What does this PR do?

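  • Enable the MMQ (quantized matrix multiplication) code path using the new GGUF CUDA kernels, with MMA support for the standard quant types.
  • Remove the MMVQ (quantized matrix-vector multiplication) code path, since diffusion models have no decoding stage and are therefore unlikely to use it.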
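For context, the "native dequantize mulmat" baseline benchmarked below expands the quantized weights back to full precision and then runs an ordinary matmul, whereas MMQ multiplies on the quantized data directly. The following is a minimal pure-Python sketch of that reference path for Q4_0, assuming the ggml Q4_0 block layout (32 values per block: a little-endian fp16 scale `d` followed by 16 bytes of packed 4-bit quants, where value = (nibble - 8) * d). The function names are hypothetical and are not the diffusers or kernel API.

```python
import struct

QK4_0 = 32                     # values per Q4_0 block (assumed ggml layout)
BLOCK_BYTES = 2 + QK4_0 // 2   # fp16 scale + 16 bytes of packed 4-bit quants


def dequantize_q4_0(raw: bytes) -> list[float]:
    """Hypothetical helper: expand a buffer of Q4_0 blocks into floats.

    Per block, the low nibble of byte j holds element j and the high nibble
    holds element j + 16; every element is (nibble - 8) * d.
    """
    if len(raw) % BLOCK_BYTES:
        raise ValueError("buffer is not a whole number of Q4_0 blocks")
    out: list[float] = []
    for off in range(0, len(raw), BLOCK_BYTES):
        (d,) = struct.unpack_from("<e", raw, off)  # fp16 block scale
        qs = raw[off + 2 : off + BLOCK_BYTES]
        block = [0.0] * QK4_0
        for j, byte in enumerate(qs):
            block[j] = ((byte & 0x0F) - 8) * d
            block[j + QK4_0 // 2] = ((byte >> 4) - 8) * d
        out.extend(block)
    return out


def dequant_matmul(x: list[list[float]], w_rows: list[bytes]) -> list[list[float]]:
    """Reference 'dequantize then mulmat': y = x @ W^T, W stored as Q4_0 rows."""
    w = [dequantize_q4_0(r) for r in w_rows]  # full-precision copy of the weights
    return [[sum(a * b for a, b in zip(xi, wr)) for wr in w] for xi in x]
```

The full-precision copy of `W` built by `dequant_matmul` is the extra memory traffic that a fused or quantized kernel path is meant to avoid.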

Benchmark Results
GPU: RTX 3090, Model: Flux-1.0-dev + Q4_0 transformer

  • Native dequantize mulmat: 28/28 [00:52<00:00, 1.86s/it]
  • Kernel dequantize mulmat: 28/28 [00:47<00:00, 1.70s/it]
  • Kernel quantized mulmat (MMQ): 28/28 [00:43<00:00, 1.56s/it]
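The per-step times above translate into relative speedups over the native path. A small helper doing that arithmetic with the reported tqdm figures (which are themselves rounded to two decimals):

```python
# Per-step times (s/it) as reported by tqdm in the benchmark results.
RESULTS = {
    "Native dequantize mulmat": 1.86,
    "Kernel dequantize mulmat": 1.70,
    "Kernel quantized mulmat (MMQ)": 1.56,
}


def speedups(results: dict[str, float], baseline: str = "Native dequantize mulmat") -> dict[str, float]:
    """Speedup of each code path relative to the baseline (higher is faster)."""
    base = results[baseline]
    return {name: base / t for name, t in results.items()}


for name, s in speedups(RESULTS).items():
    print(f"{name}: {s:.2f}x")
```

This prints 1.00x, 1.09x and 1.19x, i.e. the MMQ path cuts per-step time by roughly 16% relative to native dequantization on this setup.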


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

dxqb (Contributor) commented Oct 29, 2025

> Native dequantize mulmat: 28/28 [00:52<00:00, 1.86s/it]

It might be worth benchmarking any custom kernel not only against the native path but also against the native path compiled with torch.compile, because torch.compile has a major impact on the performance of the dequantization code.
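A fair comparison along these lines needs warm-up calls (the first torch.compile call triggers compilation) and, on GPU, a synchronization before reading the clock, since CUDA work is asynchronous. A minimal, framework-agnostic timing sketch, assuming the caller supplies the function under test; `time_per_call` and its defaults are hypothetical, not part of diffusers:

```python
import statistics
import time
from typing import Callable, Optional


def time_per_call(
    fn: Callable[[], object],
    warmup: int = 3,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Median wall-clock seconds per call of fn.

    Warm-up calls are run first and discarded, so one-time costs such as
    torch.compile compilation are excluded. If `sync` is given (for example
    torch.cuda.synchronize), it is called before each clock read so that
    queued GPU work is included in the measurement.
    """
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

To follow the suggestion, one would time the same native dequantize-and-matmul function twice, once eagerly and once after wrapping it with `torch.compile`, passing `sync=torch.cuda.synchronize` in both cases, and then compare the custom kernel against the faster of the two.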

github-actions (bot) commented Jan 9, 2026

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions (bot) added the stale label (Issues that haven't received updates) on Jan 9, 2026