
[ADCE] Dead-code elimination fails when increasing loop count beyond threshold #63473

@antoniofrighetto


The latest Clang misses an opportunity to remove the following dead code when optimizations are limited to O2, whereas GCC seems to catch it via CDDCE (Godbolt here).

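```c
#define SIZE 100

void square(float x[SIZE], const float A[SIZE][SIZE], const int b[SIZE]) {
    // y is a local array that is never read after the loops, and x is never
    // written, so the whole loop nest has no observable effect.
    float y[SIZE];

    for (int i = 0; i < SIZE; ++i)
        for (int j = 0; j < SIZE; ++j)
            y[j] += A[j][i] * b[i];
}
```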

Dumping the LLVM IR at different stages of the pipeline suggests that the cause may stem from the current loop-unrolling threshold tradeoff. With O2 and --unroll-threshold=13000, the dead code is removed by BDCE. Indeed, when SIZE is relatively small (SIZE < 40, with the exact cutoff differing between O2 and O3), the optimization succeeds.

That said, while adjusting the --unroll-threshold-aggressive option might strike a better balance in the O3 case, I believe this would be an orthogonal fix, since the core issue seems to lie within ADCE (which performs control-dependence analysis).

Very minor, but I'm also tagging x86-64-backend because, for some reason, only on x86_64 does the above seem to be lowered into a dead loop that increments rax until it reaches zero for small sizes (though this would likely go away if the code were DCE'd).
