Description
There is a missed opportunity, on the latest Clang with optimizations restricted to `O2`, to remove the following dead code, which GCC seems to catch via CDDCE (Godbolt here):
```c
#define SIZE 100
void square(float x[SIZE], const float A[SIZE][SIZE], const int b[SIZE]) {
  float y[SIZE];  // y never escapes and is never read back, so the loop nest is dead
  for (int i = 0; i < SIZE; ++i)
    for (int j = 0; j < SIZE; ++j)
      y[j] += A[j][i] * b[i];
}
```
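For reference, since `y` is a non-escaping local that is never read after the loops, the whole nest is removable; a sketch of the expected result (roughly what GCC achieves, i.e. an immediate return — not actual Clang output):

```c
// Sketch: the fully dead-code-eliminated form of square().
// Nothing in the original body is observable, so it reduces to a no-op.
void square(float x[SIZE], const float A[SIZE][SIZE], const int b[SIZE]) {
}
```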
Dumping the LLVM IR at the different pipeline stages suggests that the cause may stem from the current trade-off in the loop-unrolling threshold: with `O2` and `--unroll-threshold=13000`, the optimization is accomplished by BDCE. Indeed, when `SIZE` is relatively small (< 40, with the exact bound differing between `O2` and `O3`), the optimization succeeds.
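To sketch why the unroll threshold matters (my reading of the IR; `square_unrolled` and the tiny `SIZE` below are hypothetical, chosen for brevity): once unrolling flattens the loops, every statement is a straight-line store into the dead local `y`, and (B)DCE can delete them without any control-dependence reasoning.

```c
#define SIZE 2  // tiny size so the unrolled form fits here

// Hypothetical illustration of square() after full unrolling: each
// statement only updates the dead local y, so plain dead-store/dead-code
// elimination suffices once the loop structure is gone.
void square_unrolled(float x[SIZE], const float A[SIZE][SIZE], const int b[SIZE]) {
  float y[SIZE];
  y[0] += A[0][0] * b[0];  // dead: y is never read afterwards
  y[1] += A[1][0] * b[0];  // dead
  y[0] += A[0][1] * b[1];  // dead
  y[1] += A[1][1] * b[1];  // dead
}
```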
Yet, while adjusting the `--unroll-threshold-aggressive` option might provide a more balanced trade-off in the `O3` scenario, I believe that would be an orthogonal fix, as the core issue seems to lie within ADCE (which performs some control-dependence analysis).
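For contrast, a hypothetical live variant (illustrative only, not part of the reproducer): ADCE's control-dependence analysis has to separate cases like the following, where a value computed in the loop is observed, from the original, where none is.

```c
#define SIZE 100

// Hypothetical live variant: returning y[0] makes the stores observable,
// so no correct DCE pass may remove the loop nest here.
float square_live(const float A[SIZE][SIZE], const int b[SIZE]) {
  float y[SIZE] = {0};
  for (int i = 0; i < SIZE; ++i)
    for (int j = 0; j < SIZE; ++j)
      y[j] += A[j][i] * b[i];
  return y[0];  // y escapes through the return value
}
```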
Very minor, but I'm tagging x86-64-backend as well since, for some reason, only on x86_64 does the above get lowered into a dead loop that increments `rax` until it reaches zero for small sizes (though this would likely be solved if the loop were DCE'd).