ggml webgpu: minor set rows optimization #16810
Conversation
* updated optimization, fixed errors
* non vectorized version now dispatches one thread per element
* Simplify
* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
tests/test-backend-ops.cpp
Outdated
```diff
     std::vector<ggml_tensor *> expert_views(n_expert_used);
     for (int64_t i = 0; i < n_expert_used; ++i) {
-        expert_views[i] = ggml_view_2d(ctx, weighted, n_embd, n_tokens, weighted->nb[2], i * weighted->nb[1]);
+        expert_views[i] = ggml_view_2d(ctx, weighted, n_embd, n_tokens, weighted->nb[1], i * weighted->nb[1]);
```
I don't think this change is correct.
This update was testing some changes to the addition kernels in response to the discussion in #16857. But it looks like the CUDA CI is failing with this change too, so if it's confirmed that nb[2] is correct here, I'll need to do a little more debugging to understand why the WebGPU add op is failing as currently written. I'll mark this PR as a draft for now to avoid it accidentally being merged.
Quick update: realized this is due to the non-contiguity of the view tensors, which isn't supported yet in the kernels. I disabled support for non-contiguous tensors here and added a note so it can be added in the future.
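To make the contiguity point concrete: in ggml, a 2D view is contiguous only when its row stride equals the row size in bytes. The sketch below is an illustrative helper, not the actual ggml code (`view2d_is_contiguous` is a hypothetical name), showing why passing the parent's `nb[2]` as the view's row stride generally yields a non-contiguous view.

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Hypothetical helper (not the real ggml API). A 2D view with ne0 elements
// per row, element stride nb0, and row stride nb1 is contiguous exactly
// when nb1 == ne0 * nb0, i.e. rows are packed back to back with no gap.
static bool view2d_is_contiguous(int64_t ne0, size_t nb0, size_t nb1) {
    return nb1 == (size_t) ne0 * nb0;
}
```

For an f32 tensor of shape `n_embd x n_tokens x n_expert_used`, the strides are roughly `nb0 = 4`, `nb1 = n_embd * 4`, `nb2 = n_tokens * nb1`. A view built with row stride `nb1` passes this check, while one built with `nb2` fails it whenever `n_tokens > 1`, which is why kernels that assume contiguous input break on such views.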
This reverts commit ed710b3.
…or future support
* origin/master: (21 commits)
  - vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (ggml-org#16919)
  - examples(gguf): GGUF example outputs (ggml-org#17025)
  - mtmd: allow QwenVL to process larger image by default (ggml-org#17020)
  - server : do not default to multiple slots with speculative decoding (ggml-org#17017)
  - mtmd: improve struct initialization (ggml-org#16981)
  - docs: Clarify the endpoint that webui uses (ggml-org#17001)
  - model : add openPangu-Embedded (ggml-org#16941)
  - ggml webgpu: minor set rows optimization (ggml-org#16810)
  - sync : ggml
  - ggml : fix conv2d_dw SVE path (ggml/1380)
  - CUDA: update ops.md (ggml-org#17005)
  - opencl: update doc (ggml-org#17011)
  - refactor: replace sprintf with snprintf for safer string handling in dump functions (ggml-org#16913)
  - vulkan: remove the need for the dryrun (ggml-org#16826)
  - server : do context shift only while generating (ggml-org#17000)
  - readme : update hot topics (ggml-org#17002)
  - ggml-cpu : bicubic interpolation (ggml-org#16891)
  - ci : apply model label to models (ggml-org#16994)
  - chore : fix models indent after refactor (ggml-org#16992)
  - Fix garbled output with REPACK at high thread counts (ggml-org#16956)
  - ...
Optimizes SET_ROWS by having multiple threads work on each row, as well as vectorization. Better matrix multiplication coming soon!
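The commit log mentions that the non-vectorized path now dispatches one thread per element rather than one per row. As a rough illustration of that dispatch scheme, here is a CPU analog (not the actual WGSL shader; the function name and layout are assumptions) where each loop iteration stands in for one GPU thread: thread `t` copies element `t % n_cols` of source row `t / n_cols` into the destination row named by the index tensor.

```cpp
#include <cstdint>
#include <vector>

// CPU sketch of a set_rows kernel with one thread per element.
// src: n_rows x n_cols (row-major), idx: destination row index for each
// source row, dst: n_dst_rows x n_cols (row-major).
static void set_rows_one_thread_per_element(
        const std::vector<float> &   src,
        const std::vector<int64_t> & idx,
        std::vector<float> &         dst,
        int64_t n_rows, int64_t n_cols) {
    const int64_t n_threads = n_rows * n_cols; // one dispatch per element
    for (int64_t t = 0; t < n_threads; ++t) {  // stand-in for the GPU grid
        const int64_t r = t / n_cols;          // source row
        const int64_t c = t % n_cols;          // column within the row
        dst[idx[r] * n_cols + c] = src[r * n_cols + c];
    }
}
```

Compared with one thread per row, this keeps all lanes busy for wide rows, at the cost of recomputing the row index per element; the vectorized variant described in the PR would instead have each thread move several elements at once.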