
ggml-webgpu: add Q1_0 support #22374

Merged

reeselevine merged 2 commits into ggml-org:master from SharmaRithik:webgpu-q1_0-support on Apr 27, 2026

Conversation

@SharmaRithik
Contributor

@SharmaRithik SharmaRithik commented Apr 25, 2026

Overview

Adds WebGPU support for the Q1_0 quantization type:

- a fast mat-vec kernel (MUL_ACC_Q1_0 in mul_mat_vec.wgsl)
- a fast mat-mat shared-memory block (INIT_SRC0_SHMEM_Q1_0 in mul_mat_decls.tmpl) that enables both the register-tile and subgroup-matrix paths
- a GET_ROWS dequantization (Q1_0 block in get_rows.wgsl)
- dispatcher and supports_op updates for MUL_MAT and MUL_MAT_ID

Additional information

Q1_0 was previously not supported on the WebGPU backend, so both mat-vec and mat-mat dispatched to the CPU fallback. With this PR the kernels run on WebGPU.

Numbers below are from llama-bench -m Bonsai-1.7B-Q1_0.gguf -p 512 -n 128 -r 3 -ngl 99 on Intel Arc B580 (Mesa 25.2.8, Dawn 4654ba883e), using the model from prism-ml/Bonsai-1.7B-gguf.

| test            | master (tok/s) | this branch (tok/s) |
| --------------- | -------------- | ------------------- |
| pp512 (prefill) | 137.44 ± 0.25  | 2775.24 ± 11.56     |
| tg128 (decode)  | 12.59 ± 0.14   | 59.96 ± 0.50        |


@SharmaRithik SharmaRithik requested a review from a team as a code owner April 25, 2026 23:02
@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and WebGPU labels on Apr 25, 2026
Contributor

@reeselevine reeselevine left a comment


only minor change is we shouldn't need to initialize shared memory, otherwise looks good!


    if (global_m >= params.m) {
        for (var bit = 0u; bit < NQ; bit++) {
            shmem[i + bit] = f16(0.0);
Contributor


We actually don't need to initialize shared memory to 0, because WebGPU guarantees workgroup memory is zero-initialized.

Contributor Author


Thanks, added a fix.

    for (var bit = 0u; bit < NQ; bit++) {
        shmem[i + bit] = f16(0.0);
    }
    continue;
Contributor


break instead of continue?

Contributor Author


Thanks, added a fix. The continue treated each iteration as if a later one might come back in-bounds, but global_m only increases, so once it passes params.m every remaining iteration is out of bounds too. break works better here and exits the loop early.

@SharmaRithik
Contributor Author

Thanks Reese for the feedback! I have made the required changes.

@reeselevine reeselevine merged commit 434b2a1 into ggml-org:master Apr 27, 2026
45 of 46 checks passed

Labels

ggml (changes relating to the ggml tensor library for machine learning), WebGPU


4 participants