Add functionality to test if the host CPU supports native SIMD instructions #107406
This commit adds the ability to check if the host CPU supports the instruction set we use in our native vector code.
This is to avoid having the process terminated (badly) because we tried to execute an unsupported instruction; it is better to know this early and avoid taking the native code path entirely.
The `int vec_caps()` function, implemented in the native (C) code of the library, returns `0` if the processor does not support (i.e. cannot run) the instructions used to implement the other functions.

On ARM, we use the NEON vector instruction set (#106133):
- For M-series CPUs, `vec_caps()` is trivial and returns `1`: we can safely assume NEON is always present.
- Otherwise, we call `getauxval` with `AT_HWCAP` and test for the `HWCAP_NEON` bit (4096).

I've tested `getauxval` on various ARM systems and emulators (including Graviton on AWS). The NEON instruction set seems to be widely available: I was not able to find a processor that supports `armv8` but not NEON. Nevertheless, I think that adding this functionality is good for 3 reasons:

The last point: since `vec_caps()` returns an integer, we may want to return different values (e.g. 2) to indicate some "advanced support", and dynamically bind to a different flavor of the function (e.g. `dot8s_2`) that is implemented with a more performant but less widely available instruction set (e.g. AVX512, armv9, etc.).
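To make the check and the "different flavor" idea concrete, here is a minimal sketch, not the code in this PR. It assumes a `dot8s` signature taking two `int8_t` arrays and a length, uses portable scalar bodies as stand-ins for the vectorized functions, and introduces hypothetical helpers (`has_hwcap_bit`, `select_dot8s`, `SKETCH_HWCAP_NEON`) purely for illustration. Returning `0` on hosts that are neither Apple ARM nor Linux ARM is also an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
#include <sys/auxv.h> /* getauxval, AT_HWCAP */
#endif

/* Bit value quoted in the description for HWCAP_NEON. Assumption: real code
 * should take the flag from the target's kernel headers rather than hardcode it. */
#define SKETCH_HWCAP_NEON 4096UL

/* Hypothetical pure helper, so the bit test can be exercised on any host. */
int has_hwcap_bit(unsigned long hwcap, unsigned long bit) {
    return (hwcap & bit) != 0;
}

/* Returns 0 when the native vector path must not be used, 1 when it can. */
int vec_caps(void) {
#if defined(__APPLE__) && defined(__aarch64__)
    return 1; /* M-series CPUs: NEON is assumed to be always present */
#elif defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
    return has_hwcap_bit(getauxval(AT_HWCAP), SKETCH_HWCAP_NEON);
#else
    return 0; /* assumption: every other host takes the non-native path */
#endif
}

/* Assumed signature for the dot product functions. */
typedef int32_t (*dot8s_fn)(const int8_t *a, const int8_t *b, size_t n);

/* Scalar stand-in for the baseline (e.g. NEON) implementation. */
int32_t dot8s(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Stand-in for a flavor built on a more advanced instruction set. */
int32_t dot8s_2(const int8_t *a, const int8_t *b, size_t n) {
    return dot8s(a, b, n);
}

/* Map a capability level to an implementation; NULL means "no native path". */
dot8s_fn select_dot8s(int caps) {
    if (caps >= 2) return dot8s_2;
    if (caps == 1) return dot8s;
    return NULL;
}
```

A caller would resolve the function once, e.g. `dot8s_fn f = select_dot8s(vec_caps());`, and use its non-native implementation whenever `f` is `NULL`, so an unsupported instruction is never reached.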