#include <algorithm>
extern char arr[32];
auto test(char c) {
return std::find_if(arr, arr + 26, [&](char v) { return v == c; });
}
currently generates a huge unrolled loop instead of the few vector instructions it could use.
extern char arr[32];
template <class T, int N>
using vec [[clang::ext_vector_type(N)]] = T;
auto test(char c) {
auto match = vec<bool, 26>(__builtin_masked_load(vec<bool, 26>{1}, arr) == c);
return __builtin_clzg(match, 26);
}
is equivalent, apart from returning an index rather than a pointer, but generates much better code.
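For readers unfamiliar with the vector builtins, the underlying idea (compare every element, collect the results into a bitmask, then scan the mask once) can be sketched in scalar code. This is only an illustration, not the code from the report: `first_match_index` and `first_match_index_reference` are hypothetical names, the mapping of bit i to element i is an assumption of this sketch, and `__builtin_ctz` is the GCC/Clang scalar builtin rather than the vector-aware `__builtin_clzg` used above.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not from the report): index of the first element equal
// to c among the first 26 bytes of data, or 26 if there is none.
inline int first_match_index(const char* data, char c) {
    std::uint32_t mask = 0;
    // Compare every element without an early exit; bit i records data[i] == c.
    for (int i = 0; i < 26; ++i)
        mask |= static_cast<std::uint32_t>(data[i] == c) << i;
    // Assumption of this sketch: bit i maps to element i, so the first match
    // is the lowest set bit. An empty mask maps to 26, mirroring find_if
    // returning the end pointer. __builtin_ctz is undefined for 0, hence the check.
    return mask ? __builtin_ctz(mask) : 26;
}

// The same answer expressed through std::find_if, for comparison.
inline int first_match_index_reference(const char* data, char c) {
    return static_cast<int>(
        std::find_if(data, data + 26, [&](char v) { return v == c; }) - data);
}
```

Because the loop has a fixed trip count and no early exit, it is the shape that maps naturally onto a single vector compare followed by one bit scan, which is what the report is asking the optimizer to produce for the `std::find_if` form.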