Description
| Bugzilla Link | 52126 |
| Version | trunk |
| OS | Linux |
| CC | @alexey-bataev,@fhahn,@RKSimon |
Extended Description
433.milc
typedef struct {
  double real;
  double imag;
} complex;

typedef struct {
  complex e[3][3];
} su3_matrix;

#define CSUM(a, b)                                          \
  {                                                         \
    (a).real += (b).real;                                   \
    (a).imag += (b).imag;                                   \
  }

#define CMUL(a, b, c)                                       \
  {                                                         \
    (c).real = (a).real * (b).real - (a).imag * (b).imag;   \
    (c).imag = (a).real * (b).imag + (a).imag * (b).real;   \
  }

void mult_su3_nn2(su3_matrix *a, su3_matrix *b, su3_matrix *c) {
  int i, j, k;
  complex x, y;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 3; j++) {
      x.real = x.imag = 0.0;
      for (k = 0; k < 3; k++) {
        CMUL(a->e[i][k], b->e[k][j], y);
        CSUM(x, y);
      }
      c->e[i][j] = x;
    }
}
Flags: -Ofast -mavx
https://godbolt.org/z/b6nq5sKaW
example.cpp:10:5: remark: the cost-model indicates that vectorization is not beneficial [-Rpass-missed=loop-vectorize]
for(i=0;i<3;i++)
^
example.cpp:10:5: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-missed=loop-vectorize]
GCC and ICC vectorize mult_su3_nn_unrolled. LLVM does not vectorize it with AVX, only with AVX-512, and even then it does not use vaddsubpd as GCC and ICC do (GCC recently added addsub pattern detection to its SLP vectorizer).
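For reference, here is a minimal scalar sketch of why the CMUL pattern maps onto addsub. It is not taken from the GCC or LLVM sources. The `vec2d` type and the `addsub` and `cmul_addsub` helpers are hypothetical names used only for illustration; `addsub` models the lane behaviour of addsubpd (subtract in the low lane, add in the high lane).

```c
/* Hypothetical two-lane vector standing in for one 128-bit register
   holding a complex number as { real, imag }. */
typedef struct {
  double lane[2];
} vec2d;

/* Scalar model of addsubpd: lane 0 is x - y, lane 1 is x + y. */
static vec2d addsub(vec2d x, vec2d y) {
  vec2d r = {{x.lane[0] - y.lane[0], x.lane[1] + y.lane[1]}};
  return r;
}

/* Complex multiply written as two lane-wise products plus one addsub:
   t1 = broadcast(a.real) * b        = { ar*br, ar*bi }
   t2 = broadcast(a.imag) * swap(b)  = { ai*bi, ai*br }
   c  = addsub(t1, t2)               = { ar*br - ai*bi, ar*bi + ai*br } */
static vec2d cmul_addsub(vec2d a, vec2d b) {
  vec2d t1 = {{a.lane[0] * b.lane[0], a.lane[0] * b.lane[1]}};
  vec2d t2 = {{a.lane[1] * b.lane[1], a.lane[1] * b.lane[0]}};
  return addsub(t1, t2);
}
```

With a = 1 + 2i and b = 3 + 4i this yields -5 + 10i, matching what CMUL computes. The point of the sketch is that the mixed subtract/add in CMUL is a single addsub once the operands are arranged as broadcast and swapped vectors, which is the pattern an SLP vectorizer has to recognize.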