
ggml-wegpu: handle the buffer aliasing for rms fuse #22266

Merged
reeselevine merged 3 commits into ggml-org:master from noumena-labs:fix/rms-fuse
Apr 23, 2026

Conversation

@Constannnnnt
Contributor

Constannnnnt commented Apr 22, 2026

Overview

This PR addresses an edge case of #21983. While loading and running a model in the browser, I hit this error:

ggml_webgpu: Device error! Reason: 2, Message: Writable storage buffer binding aliasing found between [BindGroup "RMS_NORM_MUL"] set at bind group index 0, binding index 0, and [BindGroup "RMS_NORM_MUL"] set at bind group index 0, binding index 2, with overlapping ranges (offset: 5242880, size: 4096) and (offset: 5242880, size: 4096) in [Buffer "tensor_buf3"].
While encoding [ComputePassEncoder (unlabeled)].DispatchWorkgroups(1, 1, 1).
While finishing [CommandEncoder (unlabeled)].
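For context, the rule the validator enforces here is a plain half-open byte-range overlap check between two writable storage bindings in the same buffer. A minimal sketch of that check (illustrative names, not Dawn's actual implementation), using the offsets and sizes from the error message above:

```cpp
#include <cassert>
#include <cstdint>

// Two bindings into the same buffer alias when their half-open byte
// ranges [off, off + size) intersect. This mirrors the condition the
// WebGPU validator reports above; names here are illustrative only.
static bool bindings_alias(uint64_t off_a, uint64_t size_a,
                           uint64_t off_b, uint64_t size_b) {
    return off_a < off_b + size_b && off_b < off_a + size_a;
}
```

With the values from the log, `bindings_alias(5242880, 4096, 5242880, 4096)` is true: the two bindings cover the identical range of `tensor_buf3`, so the dispatch is rejected.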

As the error shows, the problem is buffer aliasing. I used a coding agent to analyze the logs:

  • the previous inplace flag only checked whether mul_src and dst overlapped
  • it missed the case where rn_src overlaps with dst (i.e. rn_src == dst), which is what triggers the tensor_buf3 error above

I reused the convention from the 'binary' shader:

  • inplace means src0 == dst, i.e. rn_src == dst
  • overlap means src1 == dst, i.e. mul_src == dst
  • src_overlap means src0 == src1, i.e. rn_src == mul_src
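The three flags above reduce to plain equality comparisons on the tensors involved in the fused RMS_NORM + MUL. A minimal sketch, where `rn_src`, `mul_src`, and `dst` stand in for the fused ops' tensor data, and the struct and function names are illustrative rather than the actual ggml-webgpu identifiers:

```cpp
#include <cassert>

// Hypothetical sketch of the three aliasing flags for the fused
// RMS_NORM + MUL path, following the 'binary' shader convention.
// Names are illustrative, not the actual ggml-webgpu code.
struct FuseFlags {
    bool inplace;     // rn_src  == dst
    bool overlap;     // mul_src == dst
    bool src_overlap; // rn_src  == mul_src
};

static FuseFlags classify_fuse_aliasing(const void *rn_src,
                                        const void *mul_src,
                                        const void *dst) {
    return {
        rn_src  == dst,
        mul_src == dst,
        rn_src  == mul_src,
    };
}
```

The bug described above corresponds to the `inplace` case (rn_src == dst): the old code only set a flag for `overlap` (mul_src == dst), so when the RMS_NORM source aliased the destination, both were still bound as separate writable storage bindings over the same range.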

Additional information

I didn't run any benchmark tests and only tested the model behaviour in the browser.


@Constannnnnt Constannnnnt requested a review from a team as a code owner April 22, 2026 22:46
@Constannnnnt Constannnnnt changed the title fix(shader): handle the buffer aliasing for rms fuse ggml-wegpu: handle the buffer aliasing for rms fuse Apr 22, 2026
@github-actions github-actions Bot added ggml changes relating to the ggml tensor library for machine learning WebGPU labels Apr 22, 2026
@reeselevine
Contributor

Thanks for the fix. I tested it out, and it looks like nothing in test-backend-ops would have caught this right now, even with skip_validation turned off. Other tests potentially would. We actually should turn skip_validation off; I originally added it for native performance, but I think the impact is minimal. FlashAttention needs to handle some aliasing first, though, which would be a good target for #22199.

@yomaytk fyi, if you're working on other fusion chains it might be good to check to see what other kind of buffer aliasing can occur and add a test for it in test-backend-ops if possible.

@yomaytk
Contributor

yomaytk commented Apr 23, 2026

Thanks for the fix, this looks good to me. How about adding a test case in test-backend-ops and confirming that it passes without skip_validation in this PR? If not, I'm happy to follow up with a separate PR.

> @yomaytk fyi, if you're working on other fusion chains it might be good to check to see what other kind of buffer aliasing can occur and add a test for it in test-backend-ops if possible.

Got it, thanks.

@reeselevine reeselevine requested a review from CISC April 23, 2026 03:15
@reeselevine reeselevine merged commit e5f070a into ggml-org:master Apr 23, 2026
44 of 46 checks passed
