
@angt
Collaborator

@angt angt commented Nov 24, 2025

This allows SVE on FreeBSD and other operating systems by using svcntb() instead of prctl() to detect the SVE vector length.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
@angt
Collaborator Author

angt commented Nov 24, 2025

I don't know why this code exists in its current form; I guess it is mostly historical/legacy.
However, AFAIK runtime detection does not work on aarch64, and the current implementation fails to compile on FreeBSD.
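To make the portability problem concrete, here is a minimal sketch (not the actual ggml code) of the two ways to query the SVE vector length. The helper names `sve_len_via_prctl` and `sve_len_via_cntb` are hypothetical. The prctl() route depends on the Linux-only `<sys/prctl.h>` and `PR_SVE_GET_VL`, which is why it cannot build on FreeBSD; svcntb() comes from the ACLE header `<arm_sve.h>` and only needs a compiler targeting SVE. It is assumed that an SVE build also runs on SVE hardware; on other targets both helpers simply return 0.

```c
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>          /* ACLE intrinsics, including svcntb() */
#endif

#if defined(__linux__) && defined(__aarch64__)
#include <sys/prctl.h>        /* Linux-only header: absent on FreeBSD */
#endif

/* Hypothetical helper: ask the Linux kernel for the SVE vector length (bytes). */
int sve_len_via_prctl(void) {
#if defined(__linux__) && defined(__aarch64__) && defined(PR_SVE_GET_VL)
    int ret = prctl(PR_SVE_GET_VL);
    /* Low bits hold the length in bytes; high bits carry flags. Fails with -1 without SVE. */
    return ret < 0 ? 0 : (ret & PR_SVE_VL_LEN_MASK);
#else
    return 0;  /* not Linux/aarch64: no prctl() route at all */
#endif
}

/* Hypothetical helper: read the vector length in bytes with CNTB, no kernel involved. */
int sve_len_via_cntb(void) {
#if defined(__ARM_FEATURE_SVE)
    return (int) svcntb();
#else
    return 0;  /* built without SVE */
#endif
}
```

Both helpers report bytes, so on SVE hardware the result is a multiple of 16 (the architecture allows 128 to 2048 bits).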

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Nov 24, 2025
@angt angt closed this Nov 24, 2025
@angt
Collaborator Author

angt commented Nov 25, 2025

I did some git archaeology this morning to understand why we use prctl, and to be honest, I’m even more confused than yesterday 😵‍💫 (starting from #8709 as a reference).

Anyway, I'm reopening the PR so we can at least discuss it and move things forward.

@angt
Collaborator Author

angt commented Dec 2, 2025

@ggerganov do you have any insights into why the code is in its current state? I'm having trouble understanding how we got here, and right now llama.cpp just doesn't build on many OSes.

Comment on lines 3526 to 3545
```diff
 int ggml_cpu_get_sve_cnt(void) {
 #if defined(__ARM_ARCH) && defined(__ARM_FEATURE_SVE)
-    return ggml_arm_arch_features.sve_cnt;
+    return svcntb();
 #else
     return 0;
 #endif
```
Member

This is a system call that we want to avoid in hot loops. See the discussion starting here: #8709 (comment)

@angt Does this answer the question? I am not very familiar with the Arm feature-set logic; it can probably be improved/extended. I believe the main difficulty is that Arm hardware is not very common among regular users (apart from Macs).

Collaborator Author

Yes, I saw the PR (I reopened this PR after reading it). svcntb() is not a syscall; it's an SVE instruction (cntb), and I would be very surprised if reading a global variable were faster.

But more importantly, it’s already used in several hot paths without any issues:

And so far no one has wanted to replace it.
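To illustrate the claim that svcntb() maps to a single CNTB instruction rather than a kernel call, here is a hedged sketch written with GCC/Clang inline assembly. The function name `cntb_bytes` is hypothetical, and the SVE path assumes an SVE-enabled target (for example `-march=armv8-a+sve`); on any other build it returns 0.

```c
#include <stdint.h>

/* Hypothetical helper: roughly what svcntb() compiles to, written out by hand. */
uint64_t cntb_bytes(void) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
    uint64_t n;
    __asm__("cntb %0" : "=r"(n));   /* one user-space instruction: vector length in bytes */
    return n;
#else
    return 0;                        /* not an SVE build */
#endif
}
```

No trap into the kernel is involved, which is the basis for expecting it to be cheap; whether it beats a cached global read in a given hot loop is still a question for measurement.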

Collaborator Author

@ggerganov If I remove the last commit to keep the global var, would this be mergeable for you?

Member

Sounds good. Even if it is not a system call, the person there measured a significant difference, so unless someone else measures a different result, I would go with that information. These svcntb() calls should be replaced; they were probably overlooked.

Collaborator Author

Done. Let's go the safe way and just replace prctl() with svcntb() to set the global var.
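A minimal sketch of the shape this settles on, assuming a hypothetical `arch_features` struct and hypothetical `arch_features_init()`, `get_sve_cnt()` and `sum_f32()` helpers in place of ggml's real `ggml_arm_arch_features` plumbing: the vector length is queried with svcntb() once at init (where prctl() used to be), and hot loops read the cached value once before looping instead of issuing CNTB on every call. The SVE branch only compiles with `__ARM_FEATURE_SVE`; otherwise a scalar loop runs.

```c
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical stand-in for ggml's ggml_arm_arch_features. */
static struct {
    int sve_cnt;   /* SVE vector length in bytes; 0 = no SVE */
} arch_features;

/* Run once at startup: svcntb() replaces the Linux-only prctl(PR_SVE_GET_VL). */
void arch_features_init(void) {
#if defined(__ARM_FEATURE_SVE)
    arch_features.sve_cnt = (int) svcntb();
#else
    arch_features.sve_cnt = 0;
#endif
}

/* Hypothetical getter in the spirit of ggml_cpu_get_sve_cnt(): just a global read. */
int get_sve_cnt(void) {
    return arch_features.sve_cnt;
}

/* Example hot loop: the cached length is read once, before the loop. */
float sum_f32(const float *x, int n) {
#if defined(__ARM_FEATURE_SVE)
    const int epr = get_sve_cnt() / (int) sizeof(float);  /* floats per vector */
    if (epr > 0) {
        svfloat32_t acc = svdup_n_f32(0.0f);
        for (int i = 0; i < n; i += epr) {
            svbool_t pg = svwhilelt_b32(i, n);            /* predicate masks the tail */
            acc = svadd_f32_m(pg, acc, svld1_f32(pg, x + i));
        }
        return svaddv_f32(svptrue_b32(), acc);
    }
#endif
    float acc = 0.0f;                                      /* scalar path */
    for (int i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}
```

Call `arch_features_init()` once before any kernel runs; if it is skipped, `sve_cnt` stays 0 and `sum_f32()` falls back to the scalar loop rather than spinning with a zero stride.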

@angt angt force-pushed the ggml-use-svcntb-for-sve-vector-length-detection branch from 502cc78 to 115f0b1 December 2, 2025 10:04
@ggerganov ggerganov merged commit e148380 into ggml-org:master Dec 2, 2025
70 of 140 checks passed

Labels

ggml changes relating to the ggml tensor library for machine learning

2 participants