vulkan: allow graph_optimize for prompt processing workloads #17475
I've found one case where this change reduces performance: this model on an RTX 3090 with coopmat2. Other models show small improvements. Do you see this on your hardware?
Yes, I see a slowdown of a couple of percent on this model on a 5090 (less on a 4070). I ran with ENABLE_SYNC_LOGGING before and after the change, and the number of syncs decreased by 11%. The topk_moe fusion is still working. It looks like the multi-add might be getting split up; maybe that's responsible for the slowdown?
I tried grouping the adds, but it didn't help. I don't see anything obviously bad about the order it's generating. Maybe this model is just unlucky?
Maybe. Not great, but the other models show good improvements, so it's okay.
See #17276 (comment).
When I first implemented this, I saw slowdowns in prompt processing when reordering the graph. I've since retested and now see a slight improvement. I'm not sure what has changed; maybe the additional fusions help (though not all of them are enabled when nrows > 1).