[InstCombine] Remove some of the complexity-based canonicalization #91185

Draft: wants to merge 1 commit into main

Conversation

nikic (Contributor) commented May 6, 2024

The idea behind the canonicalization is that it allows us to handle fewer patterns, because we know that some will be canonicalized away. This is indeed very useful, e.g. to know that constants are always on the right.

However, the fact that arguments are also canonicalized to the right seems like it may be doing more harm than good: it means that writing tests to cover both commuted forms requires special care ("thwart complexity-based canonicalization").

I think we should consider dropping this canonicalization to make testing simpler.

(Draft because there are some obvious regressions in tests that I need to look at, and because I haven't updated all tests yet.)
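
An editorial sketch of the behavior in question (not part of the PR description; the function and value names below are made up): InstCombine orders operands by "complexity", so an instruction result is placed before a plain function argument, for example:

define i1 @commuted_form(i32 %arg, i32 %x) {
  %inst = mul i32 %x, 42
  %cmp = icmp eq i32 %arg, %inst   ; currently swapped to icmp eq i32 %inst, %arg
  ret i1 %cmp
}

Because of this, a test that wants to exercise the "argument on the left" form has to thwart the canonicalization deliberately, e.g. by making both operands instructions.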

nikic (Contributor Author) commented May 6, 2024

cc @dtcxzyw @goldsteinn to get some early feedback.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 6, 2024
dtcxzyw (Member) commented May 6, 2024

It is surprising that this patch slightly improves the performance on MultiSource/Benchmarks/McCat/05-eks/eks...

; CHECK: exit:
; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i64 [ [[SUM_NEXT_PEEL]], [[AT_WITH_INT_CONVERSION_EXIT11_PEEL:%.*]] ], [ [[SUM_NEXT]], [[AT_WITH_INT_CONVERSION_EXIT11]] ]
; CHECK-NEXT: ret i64 [[SUM_NEXT_LCSSA]]
; CHECK-NEXT: ret i64 [[SUM_NEXT]]
Contributor:

Looks like it breaks vectorization here?

Contributor Author:

This seems to be a difference in LICM behavior depending on how you write the comparison :/ https://llvm.godbolt.org/z/b6TWfbMGc

Contributor Author:

The LICM difference is due to the code in static bool CanProveNotTakenFirstIteration(const BasicBlock *ExitBlock, ...).

I have a fix for it, but need to figure out what's up with this PhaseOrdering test before landing it...

Contributor Author:

I think the core problem here is that SCEV cannot compute an exit count for something like this:

define void @test(i64 %M, i64 %N) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  %cmp1 = icmp ule i64 %iv, %M
  br i1 %cmp1, label %latch, label %error

error:
  call void @error()
  unreachable

latch:
  %iv.next = add nuw i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv, %N
  br i1 %exitcond.not, label %exit, label %loop

exit:
  ret void
}

declare void @error()

We know that this is not an infinite loop (thanks to the add nuw even if we only look at the first exit). However, the %loop exit count would be %M + 1, which may overflow. We know that if this exit is taken, then that overflow can't happen. But if another exit is taken, it can. Computing the BECount of the whole loop as umin(%M + 1, %N) would not be correct. I guess we could return the exit count as zext(%M) + 1 though, which is what we sometimes do when converting from BECount to trip count.

@fhahn It seems like the whole "peel to make load dereferenceable" approach from cd0ba9d basically just fixed things by accident. Peeling was never necessary in the first place to make the load dereferenceable, we just failed to see it due to a simple implementation bug. What the peeling actually did is make the exit count of the loop computable...
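
To make the overflow concern concrete (an editorial aside, not part of the comment above; the helper name is made up): with %M equal to -1, i.e. 2^64 - 1, the exit count expression %M + 1 wraps to 0 in i64, while the widened form zext(%M) + 1 stays representable:

define i65 @widened_exit_count(i64 %M) {
  %M.wide = zext i64 %M to i65
  %count = add nuw i65 %M.wide, 1   ; zext(%M) + 1 cannot wrap in i65
  ret i65 %count
}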

Contributor Author:

I've put up #92206 for the SCEV issue.

; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[TMP0]], [[CMP3]]
; CHECK-NEXT: [[TMP2:%.*]] = and i1 [[TMP1]], [[CMP_I]]
; CHECK-NEXT: br i1 [[TMP2]], label [[BB1:%.*]], label [[ENTRY_SPLIT_NONCHR:%.*]], !prof [[PROF15]]
; CHECK-NEXT: [[TMP1:%.*]] = freeze i1 [[CMP3]]
Contributor:

This looks a bit suspect.

; CHECK-NEXT: [[AND2:%.*]] = and i32 [[ARGC]], [[ARGC3:%.*]]
; CHECK-NEXT: [[TOBOOL3:%.*]] = icmp ne i32 [[ARGC3]], [[AND2]]
; CHECK-NEXT: [[AND_COND_NOT:%.*]] = or i1 [[TOBOOL]], [[TOBOOL3]]
; CHECK-NEXT: [[STOREMERGE:%.*]] = zext i1 [[AND_COND_NOT]] to i32
Contributor:

regression + below

; CHECK-NEXT: [[F:%.*]] = fmul fast float [[TMP1]], [[A:%.*]]
; CHECK-NEXT: [[C:%.*]] = fmul fast float [[Z:%.*]], -4.000000e+01
; CHECK-NEXT: [[TMP1:%.*]] = fneg fast float [[A:%.*]]
; CHECK-NEXT: [[F:%.*]] = fmul fast float [[C]], [[TMP1]]
Contributor:

regression + below

Contributor Author:

I think this is related to this weird transform:

Instruction *InstCombinerImpl::hoistFNegAboveFMulFDiv(Value *FNegOp, ...)

It pushes the fneg into one of the operands, so it's fundamentally asymmetric. I believe for integers we perform the transform the other way around, such that it is symmetric.
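
Roughly what the asymmetry looks like (an editorial sketch, not taken from the comment; names are made up): the negation is folded into whichever operand the transform picks, so the result depends on operand order:

define float @fneg_of_fmul(float %a, float %b) {
  %mul = fmul fast float %a, %b
  %neg = fneg fast float %mul   ; may be rewritten as fmul fast (fneg %a), %b
  ret float %neg
}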

; CHECK-NEXT: [[SQRT:%.*]] = fmul fast double [[FABS]], [[SQRT1]]
; CHECK-NEXT: [[MUL:%.*]] = fmul fast double [[X:%.*]], [[X]]
; CHECK-NEXT: [[MUL2:%.*]] = fmul fast double [[Y:%.*]], [[MUL]]
; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[MUL2]])
Contributor:

regression

@@ -31,7 +31,7 @@ define i32 @test1(i32 %a0) {
; CHECK-LABEL: @test1(
; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[A0:%.*]], 1
; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 1431655765
; CHECK-NEXT: [[TMP3:%.*]] = sub nsw i32 [[A0]], [[TMP2]]
; CHECK-NEXT: [[TMP3:%.*]] = sub i32 [[A0]], [[TMP2]]
Contributor:

regression + below

goldsteinn (Contributor) commented May 6, 2024

The freeze reassociation and vectorization changes deserve a closer look, I think. Otherwise it just seems like some misc regressions in InstCombine, none of which seems like a huge deal (maybe the B == (B & A) & D == (D & A) case is important).

@dtcxzyw am I reading it right that you had no diff in your benchmarks? That seems surprising...

edit: Also might be nice to regen all the affected tests as a pre-commit to reduce noise in this gigantic diff.

dtcxzyw (Member) commented May 6, 2024

> am I reading it right that you had no diff in your benchmarks? That seems surprising...

No. I just canceled the run as I thought it would produce tons of diff :(

goldsteinn (Contributor):

> No. I just canceled the run as I thought it would produce tons of diff :(

I see (another case for: dtcxzyw/llvm-opt-benchmark#355). Can you run it? I'll try and get -stats to work locally and post them here.

dtcxzyw (Member) commented May 6, 2024

> Can you run it? I'll try and get -stats to work locally and post them here.

I cannot run it, as LLVM doesn't support the cost estimation of struct types :(

goldsteinn (Contributor):

> I cannot run it, as LLVM doesn't support the cost estimation of struct types :(

Err, misunderstanding: I mean can you just generate your normal diffs? I'll do the cost estimation stuff locally and just post the results here.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 6, 2024
dtcxzyw (Member) commented May 7, 2024

> Err, misunderstanding: I mean can you just generate your normal diffs? I'll do the cost estimation stuff locally and just post the results here.

Done. The IR diff basically looks fine to me.

The main problem is that some icmp pred A, B and icmp swap(pred) B, A pairs are not CSE'd now.
See dtcxzyw/llvm-opt-benchmark#583 (comment).

BTW, I find that SimplifyDemandedBits breaks is_pow2 idioms.
See dtcxzyw/llvm-opt-benchmark#583 (comment).
I will post a patch later.

There are some missed optimizations which should be handled explicitly.
See dtcxzyw/llvm-opt-benchmark#583 (comment)
and dtcxzyw/llvm-opt-benchmark#583 (comment).
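
An editorial sketch of the CSE point above (not from the comment; names are made up): with the complexity-based canonicalization, %c1 below would be rewritten into the same form as %c2 and the pair becomes trivially identical; without it, the swapped compares can survive as two distinct instructions unless a pass handles commuted compares explicitly.

define i1 @swapped_compares(i32 %a, i32 %x) {
  %b = add i32 %x, 1
  %c1 = icmp slt i32 %a, %b   ; previously canonicalized to icmp sgt i32 %b, %a
  %c2 = icmp sgt i32 %b, %a   ; same compare as %c1 with operands swapped
  %both = and i1 %c1, %c2
  ret i1 %both
}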

goldsteinn (Contributor):

> The main problem is that some icmp pred A, B and icmp swap(pred) B, A pairs are not CSE'd now. See dtcxzyw/llvm-opt-benchmark#583 (comment).

Ah, truthfully that and commutative binops imo make a case for keeping this as is.

nikic (Contributor Author) commented May 8, 2024

> Ah, truthfully that and commutative binops imo make a case for keeping this as is.

Our main CSE passes (EarlyCSE and GVN) have support for handling commutative operations. I think in this case the failure is probably from a pass like SimplifyCFG, where we may fail to hoist/sink "non-identical" instructions.

The problem with relying on InstCombine's complexity-based canonicalization for this purpose is that it's very weak. It will handle the case where the icmp has one instruction operand and one argument operand, but if both operands are instructions, they will not be canonicalized and we're back to the same problem. So I think to handle this properly we still need explicit commutative operation support.
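
For illustration (an editorial sketch, not from the comment; names are made up): when both compare operands are instructions they tie on complexity, so neither order was canonical even before this patch, and recognizing the pair below as equal already requires explicit commutative handling:

define i1 @both_operands_are_instructions(i32 %x, i32 %y) {
  %a = add i32 %x, 1
  %b = add i32 %y, 2
  %c1 = icmp ult i32 %a, %b
  %c2 = icmp ugt i32 %b, %a   ; swapped form of %c1; neither order is canonical
  %r = xor i1 %c1, %c2
  ret i1 %r
}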

goldsteinn (Contributor):

> So I think to handle this properly we still need explicit commutative operation support.

What about making the matchers for commutative ops default to the commutative version (things like cmp could be included)?
We do that in the new sd_match API.
Or have the builder do the canonicalization... although that seems tricky.
