Conversation

@reeselevine
Collaborator

  • Better parallelization of SET_ROWS by having multiple threads work on each row, plus a vectorized path
  • Adds more useful labels to buffers for debugging
  • Adds Dawn-specific toggles that disable some safety protections when running natively, for better performance

Better matrix multiplication coming soon!
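As a rough illustration of the first bullet, the idea of multiple threads per row can be sketched in standalone C++ (a hypothetical host-side analogue, not the actual WGSL kernel; `set_rows_parallel`, `threads_per_row`, and the flat row-major layout are all assumptions for this example):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Copy src rows into dst at the positions given by row_ids, with
// threads_per_row workers splitting the columns of each row so that
// wide rows are filled in parallel.
static void set_rows_parallel(const std::vector<float> & src,
                              std::vector<float> & dst,
                              const std::vector<int64_t> & row_ids,
                              size_t n_cols, size_t threads_per_row) {
    std::vector<std::thread> workers;
    for (size_t r = 0; r < row_ids.size(); ++r) {
        for (size_t t = 0; t < threads_per_row; ++t) {
            workers.emplace_back([&src, &dst, &row_ids, n_cols, threads_per_row, r, t] {
                const size_t chunk = (n_cols + threads_per_row - 1) / threads_per_row;
                const size_t begin = t * chunk;
                const size_t end   = std::min(n_cols, begin + chunk);
                const float * s = src.data() + r * n_cols;
                float *       d = dst.data() + (size_t) row_ids[r] * n_cols;
                for (size_t c = begin; c < end; ++c) {
                    d[c] = s[c];
                }
            });
        }
    }
    for (auto & w : workers) {
        w.join();
    }
}
```

In the GPU kernel the same split happens across invocations of a workgroup rather than OS threads, but the indexing scheme is analogous.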

reeselevine and others added 5 commits October 15, 2025 19:04
* updated optimization, fixed errors

* non vectorized version now dispatches one thread per element

* Simplify

* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 28, 2025
@reeselevine reeselevine requested a review from slaren as a code owner November 1, 2025 20:55
@github-actions github-actions bot added the testing Everything test related label Nov 1, 2025
std::vector<ggml_tensor *> expert_views(n_expert_used);
for (int64_t i = 0; i < n_expert_used; ++i) {
-    expert_views[i] = ggml_view_2d(ctx, weighted, n_embd, n_tokens, weighted->nb[2], i * weighted->nb[1]);
+    expert_views[i] = ggml_view_2d(ctx, weighted, n_embd, n_tokens, weighted->nb[1], i * weighted->nb[1]);
Member

I don't think this change is correct.

Collaborator Author

This update was testing some changes to the addition kernels in response to the discussion in #16857. But it looks like the CUDA CI is failing with this change too, so if it's confirmed that nb[2] is correct here, I'll need to do a little more debugging to understand why the WebGPU add op is failing as currently written. I'll mark this PR as a draft for now to avoid it accidentally being merged.
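For context on the stride question, here is a hedged standalone sketch of the bookkeeping involved (the ne/nb names mirror ggml's convention, but these helpers are illustrative and not ggml code): nb[1] is the byte stride between rows of one matrix, while nb[2] is the stride between whole matrices, so a 2D view whose rows step through the third dimension needs nb[2] as its row stride:

```cpp
#include <cstddef>

// ggml-style stride bookkeeping for a contiguous float tensor
// (illustrative sketch, not ggml code): nb0 = element size,
// nb1 = bytes per row, nb2 = bytes per matrix.
constexpr size_t row_stride(size_t ne0, size_t elem_size) {
    return ne0 * elem_size;                        // nb1
}
constexpr size_t matrix_stride(size_t ne0, size_t ne1, size_t elem_size) {
    return ne1 * row_stride(ne0, elem_size);       // nb2
}

// A 2D view selecting "row i of every matrix" starts at i * nb1 and
// steps by nb2 per view row; stepping by nb1 instead would walk the
// rows of a single matrix.
constexpr size_t view_offset(size_t i, size_t ne0, size_t elem_size) {
    return i * row_stride(ne0, elem_size);
}
```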

Collaborator Author

Quick update: I realized this is due to non-contiguity in the view tensors, which isn't supported yet in the kernels. I disabled support for non-contiguous tensors here and added a note so it can be added in the future.
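The check involved looks roughly like the following (a hedged sketch; ggml's actual helper is `ggml_is_contiguous`, and this standalone version only mirrors the idea): a tensor is contiguous when each dimension's byte stride equals the product of the lower dimensions' extents and the element size, which is exactly what a permuted or strided view breaks.

```cpp
#include <cstddef>

// A tensor is contiguous when nb[0] is the element size and each
// higher stride equals the previous stride times the previous extent
// (standalone sketch; ggml's real check is ggml_is_contiguous).
static bool is_contiguous(const size_t ne[4], const size_t nb[4], size_t elem_size) {
    size_t expected = elem_size;
    for (int i = 0; i < 4; ++i) {
        if (nb[i] != expected) {
            return false;
        }
        expected *= ne[i];
    }
    return true;
}
```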

@reeselevine reeselevine marked this pull request as draft November 2, 2025 04:24
@reeselevine reeselevine marked this pull request as ready for review November 2, 2025 04:40
@reeselevine
Copy link
Collaborator Author

Just a quick ping here @slaren and/or @CISC, hopefully getting this merged will fix the WebGPU CI errors for now and allow some more PRs to be opened.

@CISC CISC merged commit 03ea041 into ggml-org:master Nov 5, 2025
71 of 72 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 5, 2025
* origin/master: (21 commits)
vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (ggml-org#16919)
examples(gguf): GGUF example outputs (ggml-org#17025)
mtmd: allow QwenVL to process larger image by default (ggml-org#17020)
server : do not default to multiple slots with speculative decoding (ggml-org#17017)
mtmd: improve struct initialization (ggml-org#16981)
docs: Clarify the endpoint that webui uses (ggml-org#17001)
model : add openPangu-Embedded (ggml-org#16941)
ggml webgpu: minor set rows optimization (ggml-org#16810)
sync : ggml
ggml : fix conv2d_dw SVE path (ggml/1380)
CUDA: update ops.md (ggml-org#17005)
opencl: update doc (ggml-org#17011)
refactor: replace sprintf with snprintf for safer string handling in dump functions (ggml-org#16913)
vulkan: remove the need for the dryrun (ggml-org#16826)
server : do context shift only while generating (ggml-org#17000)
readme : update hot topics (ggml-org#17002)
ggml-cpu : bicubic interpolation (ggml-org#16891)
ci : apply model label to models (ggml-org#16994)
chore : fix models indent after refactor (ggml-org#16992)
Fix garbled output with REPACK at high thread counts (ggml-org#16956)
...