vulkan: change graph_compute to be async and enable get_tensor_async #17158
This allows some additional CPU/GPU overlap for large prompt-processing (pp) workloads. It also seems to help a bit for token generation, perhaps by removing a small bubble between graph_compute and get_tensor.

Async set and copy functions seem to be very rarely used, so I didn't enable them because I didn't have a good way to test them.

The async commands need to be ordered against each other, so they are all placed on the compute queue. The non-async commands still use the transfer queue.

The fence for graph_compute/get_tensor_async is submitted and waited on in ggml_vk_synchronize.
0cc4m left a comment:
I can't see any performance differences, but no issues either.
Commit 38eaf32 makes my Intel DG1 produce gibberish. Will you be able to have another look? Thanks.
Are you sure it's not just #17106?
Please file an issue for this and share more details. What were you running? Does test-backend-ops pass?
Thanks. I filed an issue here: #17302
See #17033 (comment).