[InstCombine] Remove some of the complexity-based canonicalization #91185
base: main
Conversation
cc @dtcxzyw @goldsteinn to get some early feedback.
It is surprising that this patch slightly improves the performance on MultiSource/Benchmarks/McCat/05-eks/eks...
```
; CHECK: exit:
; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i64 [ [[SUM_NEXT_PEEL]], [[AT_WITH_INT_CONVERSION_EXIT11_PEEL:%.*]] ], [ [[SUM_NEXT]], [[AT_WITH_INT_CONVERSION_EXIT11]] ]
; CHECK-NEXT: ret i64 [[SUM_NEXT_LCSSA]]
; CHECK-NEXT: ret i64 [[SUM_NEXT]]
```
Looks like it breaks vectorization here?
This seems to be a difference in LICM behavior depending on how you write the comparison :/ https://llvm.godbolt.org/z/b6TWfbMGc
The LICM difference is due to the code in:

```cpp
static bool CanProveNotTakenFirstIteration(const BasicBlock *ExitBlock,
```
I think the core problem here is that SCEV cannot compute an exit count for something like this:
```llvm
define void @test(i64 %M, i64 %N) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  %cmp1 = icmp ule i64 %iv, %M
  br i1 %cmp1, label %latch, label %error

error:
  call void @error()
  unreachable

latch:
  %iv.next = add nuw i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv, %N
  br i1 %exitcond.not, label %exit, label %loop

exit:
  ret void
}

declare void @error()
```
We know that this is not an infinite loop (thanks to the `add nuw`), even if we only look at the first exit. However, the `%loop` exit count would be `%M + 1`, which may overflow. We know that if this exit is taken, then that overflow can't happen. But if another exit is taken, it can. Computing the BECount of the whole loop as `umin(%M + 1, %N)` would not be correct. I guess we could return the exit count as `zext(%M) + 1` though, which is what we sometimes do when converting from BECount to trip count.
@fhahn It seems like the whole "peel to make load dereferenceable" approach from cd0ba9d basically just fixed things by accident. Peeling was never necessary in the first place to make the load dereferenceable, we just failed to see it due to a simple implementation bug. What the peeling actually did is make the exit count of the loop computable...
I've put up #92206 for the SCEV issue.
```
; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[TMP0]], [[CMP3]]
; CHECK-NEXT: [[TMP2:%.*]] = and i1 [[TMP1]], [[CMP_I]]
; CHECK-NEXT: br i1 [[TMP2]], label [[BB1:%.*]], label [[ENTRY_SPLIT_NONCHR:%.*]], !prof [[PROF15]]
; CHECK-NEXT: [[TMP1:%.*]] = freeze i1 [[CMP3]]
```
This looks a bit suspect.
```
; CHECK-NEXT: [[AND2:%.*]] = and i32 [[ARGC]], [[ARGC3:%.*]]
; CHECK-NEXT: [[TOBOOL3:%.*]] = icmp ne i32 [[ARGC3]], [[AND2]]
; CHECK-NEXT: [[AND_COND_NOT:%.*]] = or i1 [[TOBOOL]], [[TOBOOL3]]
; CHECK-NEXT: [[STOREMERGE:%.*]] = zext i1 [[AND_COND_NOT]] to i32
```
regression + below
```
; CHECK-NEXT: [[F:%.*]] = fmul fast float [[TMP1]], [[A:%.*]]
; CHECK-NEXT: [[C:%.*]] = fmul fast float [[Z:%.*]], -4.000000e+01
; CHECK-NEXT: [[TMP1:%.*]] = fneg fast float [[A:%.*]]
; CHECK-NEXT: [[F:%.*]] = fmul fast float [[C]], [[TMP1]]
```
regression + below
I think this is related to this weird transform:

```cpp
Instruction *InstCombinerImpl::hoistFNegAboveFMulFDiv(Value *FNegOp,
```
```
; CHECK-NEXT: [[SQRT:%.*]] = fmul fast double [[FABS]], [[SQRT1]]
; CHECK-NEXT: [[MUL:%.*]] = fmul fast double [[X:%.*]], [[X]]
; CHECK-NEXT: [[MUL2:%.*]] = fmul fast double [[Y:%.*]], [[MUL]]
; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[MUL2]])
```
regression
```diff
@@ -31,7 +31,7 @@ define i32 @test1(i32 %a0) {
 ; CHECK-LABEL: @test1(
 ; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[A0:%.*]], 1
 ; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 1431655765
-; CHECK-NEXT: [[TMP3:%.*]] = sub nsw i32 [[A0]], [[TMP2]]
+; CHECK-NEXT: [[TMP3:%.*]] = sub i32 [[A0]], [[TMP2]]
```
regression + below
@dtcxzyw am I reading it right that you had no diff in your benchmarks? That seems surprising... Edit: it might also be nice to regen all the affected tests as a pre-commit to reduce noise in this gigantic diff.
No. I just canceled the run as I thought it would produce tons of diffs :(
I see (another case for: dtcxzyw/llvm-opt-benchmark#355). Can you run it? I'll try and get
I cannot run it as LLVM doesn't support the cost estimation of struct types :(
Err, misunderstanding: I mean, can you just generate your normal diffs? I'll do the cost estimation stuff locally and just post the results here.
Done. The IR diff basically looks fine to me. The main problem is that some [...] BTW, I find that [...] There are some missed optimizations which should be handled explicitly.
Ah, truthfully that and commutative binops imo make a case for keeping this as is.
Our main CSE passes (EarlyCSE and GVN) have support for handling commutative operations. I think in this case the failure is probably from a pass like SimplifyCFG, where we may fail to hoist/sink "non-identical" instructions.

The problem with relying on InstCombine's complexity-based canonicalization for this purpose is that it's very weak. It will handle the case where the icmp has an instruction and an argument operand, but if both are instructions, then they will not be canonicalized and we're back to the same problem. So I think to handle this properly we still need explicit commutative operation support.
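The kind of explicit commutative support described above can be sketched roughly like this: rather than relying on the IR already being in a canonical operand order, a CSE-style pass can sort the operands of commutative operations when building its hashing key, so both commuted forms collide. This is an illustrative Python model with hypothetical names, not the actual EarlyCSE/GVN code:

```python
# Opcodes we treat as commutative for key-building purposes.
COMMUTATIVE = {"add", "mul", "and", "or", "xor", "icmp eq", "icmp ne"}

def cse_key(op, lhs, rhs):
    """Hypothetical CSE hashing key: sort the operands of commutative ops
    so `add %a, %b` and `add %b, %a` map to the same key, regardless of
    whether InstCombine happened to canonicalize the operand order."""
    if op in COMMUTATIVE and rhs < lhs:
        lhs, rhs = rhs, lhs
    return (op, lhs, rhs)

# Both commuted forms produce identical keys:
assert cse_key("add", "%x", "%y") == cse_key("add", "%y", "%x")
# Non-commutative ops keep their operand order:
assert cse_key("sub", "%x", "%y") != cse_key("sub", "%y", "%x")
```

The point of the sketch: with keys like this, CSE works even for the instruction-vs-instruction operand case that complexity-based canonicalization never orders.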
What about making the matchers for commutative ops default to the commutative version (things like cmp could be included)?
The idea behind it is that the canonicalization allows us to handle fewer patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right.
However, the fact that arguments are also canonicalized to the right seems like it may be doing more damage than good: This means that writing tests to cover both commuted forms requires special care ("thwart complexity-based canonicalization").
I think we should consider dropping this canonicalization to make testing simpler.
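As a rough model of the canonicalization being discussed: InstCombine ranks operands by "complexity" (constants lowest, then function arguments, then instructions) and swaps the operands of commutative operations so the higher-complexity value sits on the left. A simplified Python sketch (the real ranking lives in `InstCombiner::getComplexity`; the representation here is made up for illustration):

```python
def complexity(v):
    """Simplified model of InstCombine's operand complexity ranking:
    constants rank lowest, function arguments next, instructions highest."""
    kind = v[0]  # v is a ("const"|"arg"|"inst", payload) pair
    return {"const": 0, "arg": 1, "inst": 2}[kind]

def canonicalize_commutative(lhs, rhs):
    """Put the higher-complexity operand on the left, so lower-complexity
    operands (constants, and currently also arguments) end up on the right."""
    if complexity(lhs) < complexity(rhs):
        lhs, rhs = rhs, lhs
    return lhs, rhs

# A constant always moves to the right-hand side (the useful part):
assert canonicalize_commutative(("const", 42), ("inst", "%x")) == \
    (("inst", "%x"), ("const", 42))
# An argument is also moved right of an instruction (the part this patch
# proposes to drop); two instructions are never reordered at all:
assert canonicalize_commutative(("arg", "%a"), ("inst", "%x")) == \
    (("inst", "%x"), ("arg", "%a"))
```

In this model, dropping the argument-vs-instruction swap while keeping the constant rule corresponds to the change proposed in the description.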
(Draft because there are some obvious regressions in tests that I need to look at, and because I haven't updated all tests yet.)