Description
I am using the Linux x86_64 release of LLVM 21.1.4.
Reduced test case (also in godbolt):
test.cpp
#include <cmath>
#include <iostream>

using T = double;
using U = float;

void __attribute__((noinline)) computePow(T *dst, T *base, U *exponent, int n)
{
    for (int i = 0; i < n; ++i) {
        dst[i] = static_cast<T>(std::pow(base[i], exponent[i]));
    }
}

int main()
{
    constexpr int N = 4;
    T x[N] = {2, 4, 6, 8};
    U y[N] = {7, 5, 3, 1};
    T z[N];
    computePow(z, x, y, N);
    for (int i = 0; i < N; ++i) {
        std::cout << "pow(" << x[i] << ", " << y[i] << ") = " << z[i]
                  << std::endl;
    }
}

$ clang++ test.cpp -o test -O3 -fveclib=libmvec -fno-math-errno
$ ./test
pow(2, 7) = 64
pow(4, 5) = 65536
pow(6, 3) = 0
pow(8, 1) = 0
However, the result should be
pow(2, 7) = 128
pow(4, 5) = 1024
pow(6, 3) = 216
pow(8, 1) = 8
The same wrong results occur for other type combinations with T != U, for example T = double and U = int. The results are correct for T = U = double and T = U = float.
As far as I can tell, the compiled program calls the libmvec function _ZGVdN4vv_pow to compute 4 powers at once, but the arguments are not placed in the correct registers.
In the generated assembly (see godbolt), registers xmm0 and xmm1 hold the base and registers xmm2 and xmm3 hold the exponent. Since _ZGVdN4vv_pow takes two 256-bit vectors of 4 doubles each, I think ymm0 should hold the base and ymm1 the exponent instead.