vulkan: Update topk_moe fusion to handle gpt's late softmax #16656
Conversation
CC @am17an I've included the ggml_check_edges change in this PR.
I understand what this change is doing, but how do I test it? The topk_moe tests pass before and after this change. Which model architectures correspond to the three modes?
Usually I put a debug statement printing the number of nodes fused. We'll need to come up with a better way to assert that the nodes were actually fused.
I've added some logging in the latest commit that I use to verify fusion and the effects of graph_optimize. You can see the whole sequence of ops without a sync in between, which implies the fusion is working. Early softmax w/norm: qwen3
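For illustration, a minimal standalone sketch of the kind of "number of nodes fused" debug statement described above. The node/graph types and the fused flag here are hypothetical stand-ins, not ggml's actual structures or the logging added in this PR:

```cpp
// Hypothetical sketch: count and print how many graph nodes were consumed
// by fused kernels. Types and field names are illustrative only.
#include <cstdio>
#include <string>
#include <vector>

struct node {
    std::string op;     // op name, e.g. "SOFT_MAX", "ARGSORT"
    bool        fused;  // set by the backend when the node is folded into a fused kernel
};

struct graph {
    std::vector<node> nodes;
};

// Walk the graph after scheduling and report how many nodes were fused,
// so a log line or test can assert the expected count.
static int count_fused_nodes(const graph & g) {
    int n_fused = 0;
    for (const node & n : g.nodes) {
        if (n.fused) {
            n_fused++;
        }
    }
    std::fprintf(stderr, "fused %d of %zu nodes\n", n_fused, g.nodes.size());
    return n_fused;
}

int main() {
    graph g;
    g.nodes = {
        {"SOFT_MAX", true},
        {"ARGSORT",  true},
        {"VIEW",     true},
        {"GET_ROWS", false},
    };
    count_fused_nodes(g); // prints: fused 3 of 4 nodes
    return 0;
}
```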
I've rebased this and updated it to handle the clamp added in #16655.
LGTM
Are the non-Vulkan changes fine, @slaren?
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Based on #16649.