-
Notifications
You must be signed in to change notification settings - Fork 13.9k
ggml : use svcntb() for SVE vector length detection #17474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ggml : use svcntb() for SVE vector length detection #17474
Conversation
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
|
I dont know why this code exists in its current form, I guess it is mostly historical/legacy. |
|
I did some git archaeology this morning to understand why we use Anyway, I reopen the PR so we can at least discuss it and move things forward |
|
@ggerganov do you have any insights on why the code is currently in this state? I'm having trouble understanding how we got there, and right now llama.cpp just doesn’t build on many OS. |
| int ggml_cpu_get_sve_cnt(void) { | ||
| #if defined(__ARM_ARCH) && defined(__ARM_FEATURE_SVE) | ||
| return ggml_arm_arch_features.sve_cnt; | ||
| return svcntb(); | ||
| #else | ||
| return 0; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a system call that we want to avoid in hot loops. See the discussion starting here: #8709 (comment)
@angt Does this answer the question? I am not very familiar with the Arm feature set logic. Probably it can be improved/extended. I believe the main difficulty is that Arm hardware is not very ubiquitous among regular users (apart from Macs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I saw the PR (I reopened this PR after reading it). svcntb() is not a syscall it's a SVE instruction (cntb) and I would be very surprised if a global var read were faster.
But more importantly, it’s already used in several hot paths without any issues:
ggml_vec_dot_f16:
llama.cpp/ggml/src/ggml-cpu/vec.cpp
Line 219 in 4574f29
const int sve_register_length = svcntb() * 8; //get vector length ggml_vec_dot_q2_K_q8_K:
llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c
Line 1431 in 682e665
const int vector_length = svcntb()*8; ggml_vec_dot_q3_K_q8_K:
llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c
Line 1773 in 4574f29
const int vector_length = svcntb()*8;
And no one wanted to replace it so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggerganov If I remove the last commit to keep the global var, would this be mergeable for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Even if it not a system call, the person there measured a significant difference, so unless someone else measures another result, I would go with that information. These svcntb() calls should be replaced - probably were overlooked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, let's go the safe way, just replacing prctl() for svcntb() to set the global var.
502cc78 to
115f0b1
Compare
This allows SVE on FreeBSD and others