Description
Alive2 repro: https://godbolt.org/z/P7fb4P9ab
The bug is in guard widening. However, to demonstrate why it is a bug, I will also run indvars. Consider the following case:
declare i32 @llvm.experimental.deoptimize.i32(...)
define i32 @test(i32 %start) {
entry:
%wc1 = call i1 @llvm.experimental.widenable.condition()
br label %loop
loop:
%iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]
br i1 %wc1, label %guard_block, label %exit_by_wc
exit_by_wc:
%rval1 = call i32(...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv) ]
ret i32 %rval1
guard_block:
%start_plus_1 = add i32 %start, 1
%cond = icmp ne i32 %start_plus_1, %iv
%wc2 = call i1 @llvm.experimental.widenable.condition()
%guard = and i1 %cond, %wc2
br i1 %guard, label %backedge, label %failure
backedge:
call void @side_effect()
%iv.next = add i32 %iv, 1
br label %loop
exit:
ret i32 -1
failure:
%rval2 = call i32(...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv) ]
ret i32 %rval2
}
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(inaccessiblemem: readwrite)
declare i1 @llvm.experimental.widenable.condition()
declare void @side_effect()
Let start = 0. The possible scenarios here are the following:
- wc1 = false. In this case, @side_effect is never called, and we deopt with iv = 0.
- wc1 = true, wc2 = false (on 1st iteration). In this case, @side_effect is never called, and we deopt with iv = 0.
- wc1 = true, wc2 = true (on 1st iteration). In this case, @side_effect will be called once. Deopt after the loop block will not happen (because wc1 = true and is loop-invariant); however, cond will be false on the 2nd iteration (iv will be equal to start + 1). So regardless of wc2, we will then deopt with iv = 1.
As you can see, in all cases the number of calls of @side_effect matches the value we deoptimize with. We can expect this to stay true as long as we do the right things.
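The scenarios above can be checked mechanically. The following is a minimal Python sketch, not LLVM itself: run_original is a hypothetical helper that hand-translates the IR above, wc1 is a single boolean (the call sits outside the loop), and wc2_values supplies one value per loop iteration. It assumes i32 wraparound for the adds.

```python
from itertools import product

MASK = 0xFFFFFFFF  # assumption: model i32 arithmetic with 32-bit wraparound


def run_original(start, wc1, wc2_values):
    """Hand-written model of the original @test (hypothetical helper).

    Returns (number of @side_effect calls, iv value passed to deopt).
    """
    wc2 = iter(wc2_values)
    start &= MASK
    iv, calls = start, 0
    while True:
        if not wc1:                          # loop -> exit_by_wc
            return calls, iv
        cond = ((start + 1) & MASK) != iv    # guard_block
        w2 = next(wc2)                       # fresh widenable condition
        if not (cond and w2):                # -> failure
            return calls, iv
        calls += 1                           # backedge: call @side_effect
        iv = (iv + 1) & MASK


# With start = 0, the deopt value equals the number of calls in every scenario.
for wc1 in (False, True):
    for wc2s in product((False, True), repeat=2):
        calls, deopt_iv = run_original(0, wc1, wc2s)
        assert calls == deopt_iv
```

The loop runs at most two iterations for any start, so two wc2 values are enough to cover every path.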
Now indvars comes in. Indvars notices that the branch on wc1 is loop-invariant. Therefore, the deopt there can only happen on the 1st iteration or never. This means it is safe to replace iv in the deopt value with start. Result: https://godbolt.org/z/Wq7EMT58M
declare i32 @llvm.experimental.deoptimize.i32(...)
define i32 @test(i32 %start) {
entry:
%wc1 = call i1 @llvm.experimental.widenable.condition()
br label %loop
loop: ; preds = %backedge, %entry
%iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]
br i1 %wc1, label %guard_block, label %exit_by_wc
exit_by_wc: ; preds = %loop
%iv.lcssa = phi i32 [ %start, %loop ]
%rval1 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv.lcssa) ]
ret i32 %rval1
guard_block: ; preds = %loop
%start_plus_1 = add i32 %start, 1
%cond = icmp ne i32 %start_plus_1, %iv
%wc2 = call i1 @llvm.experimental.widenable.condition()
%guard = and i1 %cond, %wc2
br i1 %guard, label %backedge, label %failure
backedge: ; preds = %guard_block
call void @side_effect()
%iv.next = add i32 %iv, 1
br label %loop
exit: ; No predecessors!
ret i32 -1
failure: ; preds = %guard_block
%iv.lcssa1 = phi i32 [ %iv, %guard_block ]
%rval2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv.lcssa1) ]
ret i32 %rval2
}
declare i1 @llvm.experimental.widenable.condition() #0
declare void @side_effect()
attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(inaccessiblemem: readwrite) }
Note that this transform is completely legal and is consistent with the LangRef description of the widenable condition, which says:
While this may appear similar in semantics to undef, it is very different in that an invocation produces a particular, singular value. It is also intended to be lowered late, and remain available for specific optimizations and transforms that can benefit from its special properties
(When this was written, there was no such thing as freeze, but you can think of a widenable condition outside the loop as behaving exactly like freeze(undef) outside the loop, so it is loop-invariant.)
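To see that the indvars result keeps the "calls match the deopt value" property, here is a small Python sketch of the transformed function. It is a hand translation, not LLVM output: run_after_indvars is a hypothetical helper, wc1 is one boolean chosen once, wc2_values gives one value per iteration, and i32 wraparound is assumed.

```python
from itertools import product

MASK = 0xFFFFFFFF  # assumption: model i32 arithmetic with 32-bit wraparound


def run_after_indvars(start, wc1, wc2_values):
    """Hand-written model of @test after indvars (hypothetical helper).

    Returns (number of @side_effect calls, value passed to deopt).
    """
    wc2 = iter(wc2_values)
    start &= MASK
    iv, calls = start, 0
    while True:
        if not wc1:                          # loop -> exit_by_wc
            return calls, start              # %iv.lcssa is now %start
        cond = ((start + 1) & MASK) != iv    # guard_block
        w2 = next(wc2)
        if not (cond and w2):                # -> failure, still deopts with %iv
            return calls, iv
        calls += 1                           # backedge: call @side_effect
        iv = (iv + 1) & MASK


# The deopt value always equals start plus the number of calls made so far.
for start in (0, 7, MASK):
    for wc1 in (False, True):
        for wc2s in product((False, True), repeat=2):
            calls, deopt_value = run_after_indvars(start, wc1, wc2s)
            assert (start + calls) & MASK == deopt_value
```

Because wc1 is fixed before the loop, exit_by_wc can only be taken before any call to @side_effect, which is why substituting start there does not change the result.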
Now, here is how the bug happens. Let's run guard-widening on top of it: https://godbolt.org/z/P7fb4P9ab
declare i32 @llvm.experimental.deoptimize.i32(...)
define i32 @test(i32 %start) {
entry:
%wc1 = call i1 @llvm.experimental.widenable.condition()
br label %loop
loop: ; preds = %backedge, %entry
%iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]
%start_plus_1 = add i32 %start, 1
%cond = icmp ne i32 %start_plus_1, %iv
%wide.chk = and i1 true, %cond
%0 = and i1 %wide.chk, %wc1
br i1 %0, label %guard_block, label %exit_by_wc
exit_by_wc: ; preds = %loop
%iv.lcssa = phi i32 [ %start, %loop ]
%rval1 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv.lcssa) ]
ret i32 %rval1
guard_block: ; preds = %loop
%wc2 = call i1 @llvm.experimental.widenable.condition()
%guard = and i1 %cond, %wc2
br i1 true, label %backedge, label %failure
backedge: ; preds = %guard_block
call void @side_effect()
%iv.next = add i32 %iv, 1
br label %loop
exit: ; No predecessors!
ret i32 -1
failure: ; preds = %guard_block
%iv.lcssa1 = phi i32 [ %iv, %guard_block ]
%rval2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv.lcssa1) ]
ret i32 %rval2
}
declare i1 @llvm.experimental.widenable.condition() #0
declare void @side_effect()
attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(inaccessiblemem: readwrite) }
Now imagine that wc1 = true and wc2 = true. On the first iteration, wide.chk is true (because iv = 0 and start + 1 = 1). So we reach guard_block and then backedge, calling @side_effect once. Then, on the 2nd iteration, cond = false and therefore wide.chk = false. This means that we must deoptimize in block exit_by_wc with iv.lcssa = 0.
So now we've called the side effect once, but after deopt, we think that iv = 0. If this value is used to re-execute the loop in the interpreter, it will make one extra iteration.
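The mismatch can be reproduced with a small Python sketch of the function after guard widening. This is a hand translation under stated assumptions, not LLVM output: run_after_widening is a hypothetical helper, wc1 is one boolean chosen once, wc2_values gives one value per visit to guard_block, and i32 wraparound is assumed.

```python
MASK = 0xFFFFFFFF  # assumption: model i32 arithmetic with 32-bit wraparound


def run_after_widening(start, wc1, wc2_values):
    """Hand-written model of @test after indvars + guard-widening.

    Returns (number of @side_effect calls, value passed to deopt).
    """
    wc2 = iter(wc2_values)
    start &= MASK
    iv, calls = start, 0
    while True:
        cond = ((start + 1) & MASK) != iv    # hoisted into block loop
        wide_chk = cond                      # and i1 true, %cond
        if not (wide_chk and wc1):           # loop -> exit_by_wc
            return calls, start              # %iv.lcssa is %start
        next(wc2)                            # guard_block: %wc2 is now unused
        calls += 1                           # br i1 true -> backedge
        iv = (iv + 1) & MASK


calls, deopt_value = run_after_widening(0, True, (True, True))
# One call to @side_effect has happened, but the deopt state says iv = 0.
assert (calls, deopt_value) == (1, 0)
```

In this model the exit through exit_by_wc is no longer limited to the first iteration, so the start value substituted by indvars is stale by the time the deopt happens.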
My working theory is that widening into loop-invariant WCs is just wrong.