CUDA: use fastdiv + ggml_cuda_mad for mmvf #16557
I can confirm a speedup, though a smaller one. Presumably it will depend on the model.
I think `mul_mat_vec_f` should always pass `float2`, `half2`, or `nv_bfloat162` to `ggml_cuda_mad` and then let that function decide how to do the calculation. For example, I think on Hopper and Blackwell there are mixed-precision instructions that can be used (possibly in a future PR), and there definitely are such instructions on AMD GPUs (which are already supported).
23f2ccc to 9d74b8f
7560a47 to ec9a51c
ec9a51c to e1afe75
Sorry, I am not able to fix the HIP builds.
For now keep the problematic code in
6dce339 to d6c71e9
I'll take a look.
Would be appreciated, otherwise I would have tried to fix this myself. My preferred approach would be to merge this PR as-is and to fix the HIP issues in a follow-up PR. Is that fine with both of you?
Sure, yes.
I see speedups on my 3090, but not so much on a 4090. I suspect this is due to better integer division hardware on newer cards, but I did not find any documentation to confirm it.
on 3090:
on 4090:
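For readers unfamiliar with the fastdiv technique named in the PR title: division by a value that is fixed for the whole kernel can be replaced by a multiply-high, an add, and a shift, using constants precomputed once per divisor. The following is a minimal host-side C++ sketch of that idea (division by an invariant integer), not the code from this PR. The names `fastdiv_consts`, `init_fastdiv`, `fastdiv`, and `fastdiv_matches` are assumptions made for illustration, and the sum is widened to 64 bits so the result is exact for every 32-bit dividend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the constants precomputed for one divisor.
struct fastdiv_consts {
    uint32_t mp; // magic multiplier
    uint32_t L;  // shift amount, ceil(log2(d))
};

// Precompute once per divisor, e.g. on the host before launching a kernel.
inline fastdiv_consts init_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1, which always fits in 32 bits.
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return { static_cast<uint32_t>(mp), L };
}

// Compute n / d without a division instruction.
inline uint32_t fastdiv(uint32_t n, fastdiv_consts c) {
    // High 32 bits of the 64-bit product; on the GPU this would be __umulhi.
    const uint64_t hi = (uint64_t{n} * c.mp) >> 32;
    // 64-bit sum so that hi + n cannot wrap around for large n.
    return static_cast<uint32_t>((hi + n) >> c.L);
}

// Helper for checking the result against ordinary integer division.
inline bool fastdiv_matches(uint32_t n, uint32_t d) {
    return fastdiv(n, init_fastdiv(d)) == n / d;
}
```

How much this helps depends on how expensive the hardware's integer division is compared with a multiply-high and a shift, which would be consistent with the speedup varying between GPU generations.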