
[GuardWidening] Miscompile due to widening into loop-invariant WC #60234

Closed
@xortator

Alive2 repro: https://godbolt.org/z/P7fb4P9ab

The bug is in guard widening. However, to demonstrate why it is a bug, I will also run indvars. Consider the following case:

declare i32 @llvm.experimental.deoptimize.i32(...)

define i32 @test(i32 %start) {
entry:
  %wc1 = call i1 @llvm.experimental.widenable.condition()
  br label %loop

loop:
  %iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]
  br i1 %wc1, label %guard_block, label %exit_by_wc

exit_by_wc:
  %rval1 = call i32(...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv) ]
  ret i32 %rval1

guard_block:
  %start_plus_1 = add i32 %start, 1
  %cond = icmp ne i32 %start_plus_1, %iv
  %wc2 = call i1 @llvm.experimental.widenable.condition()
  %guard = and i1 %cond, %wc2
  br i1 %guard, label %backedge, label %failure

backedge:
  call void @side_effect()
  %iv.next = add i32 %iv, 1
  br label %loop

exit:
  ret i32 -1

failure:
  %rval2 = call i32(...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv) ]
  ret i32 %rval2
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(inaccessiblemem: readwrite)
declare i1 @llvm.experimental.widenable.condition()

declare void @side_effect()

Let start = 0. The possible scenarios here are the following:

  1. wc1 = false. In this case, @side_effect is never called, we deopt with iv = 0.
  2. wc1 = true, wc2 = false (on 1st iteration). In this case, @side_effect is never called, we deopt with iv = 0.
  3. wc1 = true, wc2 = true (on 1st iteration). In this case, @side_effect will be called once. Deopt in exit_by_wc will not happen (because wc1 = true and is loop-invariant); however, cond will be false on the 2nd iteration (iv will be equal to start + 1). So regardless of wc2, we will then deopt with iv = 1.

As you can see, in all cases the number of calls of @side_effect matches the value we deoptimize with. We can expect that this fact stays true, as long as we do the right things.
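The three scenarios can be checked mechanically with a small Python model of the loop. This is only a sketch of the control flow, not LLVM code: the function name `run_original` and the `wc2_values` list (one entry per call to the widenable condition in guard_block) are assumptions made for illustration, and i32 wraparound is ignored.

```python
from itertools import product

def run_original(start, wc1, wc2_values):
    """Model of @test before any transform.

    Returns (deopt block, iv seen by the deopt, number of @side_effect calls).
    wc1 is sampled once in the entry block; wc2 is sampled on every visit
    to guard_block, so it is passed as a sequence.
    """
    iv = start
    calls = 0
    wc2_iter = iter(wc2_values)
    while True:
        if not wc1:                    # loop: br i1 %wc1
            return ("exit_by_wc", iv, calls)
        cond = (start + 1) != iv       # guard_block: %cond
        wc2 = next(wc2_iter)           # guard_block: %wc2
        if not (cond and wc2):         # br i1 %guard
            return ("failure", iv, calls)
        calls += 1                     # backedge: call @side_effect
        iv += 1                        # backedge: %iv.next

start = 0
# Two iterations are enough: cond is always false on the 2nd visit.
for wc1, wc2_first, wc2_second in product([False, True], repeat=3):
    block, iv, calls = run_original(start, wc1, [wc2_first, wc2_second])
    # The deopt value always accounts for every @side_effect call so far.
    assert calls == iv - start
```

For every combination of widenable-condition results, the value of iv handed to the deopt equals start plus the number of @side_effect calls already made.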

Now indvars comes in. Indvars notices that the branch on wc1 is loop-invariant. Therefore, the deopt there can only happen on the 1st iteration or never. This means it is safe to replace iv in the deopt value with start. Result: https://godbolt.org/z/Wq7EMT58M

declare i32 @llvm.experimental.deoptimize.i32(...)

define i32 @test(i32 %start) {
entry:
  %wc1 = call i1 @llvm.experimental.widenable.condition()
  br label %loop

loop:                                             ; preds = %backedge, %entry
  %iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]
  br i1 %wc1, label %guard_block, label %exit_by_wc

exit_by_wc:                                       ; preds = %loop
  %iv.lcssa = phi i32 [ %start, %loop ]
  %rval1 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv.lcssa) ]
  ret i32 %rval1

guard_block:                                      ; preds = %loop
  %start_plus_1 = add i32 %start, 1
  %cond = icmp ne i32 %start_plus_1, %iv
  %wc2 = call i1 @llvm.experimental.widenable.condition()
  %guard = and i1 %cond, %wc2
  br i1 %guard, label %backedge, label %failure

backedge:                                         ; preds = %guard_block
  call void @side_effect()
  %iv.next = add i32 %iv, 1
  br label %loop

exit:                                             ; No predecessors!
  ret i32 -1

failure:                                          ; preds = %guard_block
  %iv.lcssa1 = phi i32 [ %iv, %guard_block ]
  %rval2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv.lcssa1) ]
  ret i32 %rval2
}

declare i1 @llvm.experimental.widenable.condition() #0

declare void @side_effect()

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(inaccessiblemem: readwrite) }

Note that this transform is completely legal and is consistent with the LangRef description of widenable condition, which says:

While this may appear similar in semantics to undef, it is very different in that an invocation produces a particular, singular value. It is also intended to be lowered late, and remain available for specific optimizations and transforms that can benefit from its special properties

(when it was written, there was no such thing as freeze, but you can think of a widenable condition outside the loop as behaving exactly like freeze(undef) outside the loop, so it is loop-invariant).
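To see why loop-invariance is what makes the indvars rewrite sound, here is a small Python sketch of the same loop in which the condition guarding exit_by_wc is supplied per iteration. The helper name `iv_at_exit_by_wc` and the per-iteration lists are hypothetical modeling devices, not anything from LLVM. When the list repeats one value (the loop-invariant case), exit_by_wc only ever sees iv = start; when the value is allowed to change between iterations, that no longer holds.

```python
from itertools import product

def iv_at_exit_by_wc(start, wc1_values, wc2_values):
    """Return iv observed at exit_by_wc, or None if the loop exits via failure.

    wc1_values[i] / wc2_values[i] are the condition results on iteration i.
    A loop-invariant wc1 is modeled by repeating the same value.
    """
    iv = start
    for wc1, wc2 in zip(wc1_values, wc2_values):
        if not wc1:                          # loop: branch to exit_by_wc
            return iv
        if not ((start + 1) != iv and wc2):  # guard_block: branch to failure
            return None
        iv += 1                              # backedge
    return None  # not reached when two iterations are modeled

start = 0
# Loop-invariant wc1: exit_by_wc can only fire while iv == start.
for wc1, wc2_first, wc2_second in product([False, True], repeat=3):
    seen = iv_at_exit_by_wc(start, [wc1, wc1], [wc2_first, wc2_second])
    assert seen in (None, start)

# If wc1 could change between iterations, the replacement would be wrong.
assert iv_at_exit_by_wc(start, [True, False], [True, True]) == start + 1
```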

Now, here is how the bug happens. Let's run guard-widening on top of it: https://godbolt.org/z/P7fb4P9ab

declare i32 @llvm.experimental.deoptimize.i32(...)

define i32 @test(i32 %start) {
entry:
  %wc1 = call i1 @llvm.experimental.widenable.condition()
  br label %loop

loop:                                             ; preds = %backedge, %entry
  %iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]
  %start_plus_1 = add i32 %start, 1
  %cond = icmp ne i32 %start_plus_1, %iv
  %wide.chk = and i1 true, %cond
  %0 = and i1 %wide.chk, %wc1
  br i1 %0, label %guard_block, label %exit_by_wc

exit_by_wc:                                       ; preds = %loop
  %iv.lcssa = phi i32 [ %start, %loop ]
  %rval1 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv.lcssa) ]
  ret i32 %rval1

guard_block:                                      ; preds = %loop
  %wc2 = call i1 @llvm.experimental.widenable.condition()
  %guard = and i1 %cond, %wc2
  br i1 true, label %backedge, label %failure

backedge:                                         ; preds = %guard_block
  call void @side_effect()
  %iv.next = add i32 %iv, 1
  br label %loop

exit:                                             ; No predecessors!
  ret i32 -1

failure:                                          ; preds = %guard_block
  %iv.lcssa1 = phi i32 [ %iv, %guard_block ]
  %rval2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %iv.lcssa1) ]
  ret i32 %rval2
}

declare i1 @llvm.experimental.widenable.condition() #0

declare void @side_effect()

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(inaccessiblemem: readwrite) }

Now imagine that wc1 = true and wc2 = true. On the first iteration, wide.chk is true (because iv = 0 and start + 1 = 1). So we will reach guard_block and then backedge, calling @side_effect once. Then, on the 2nd iteration, cond = false and therefore wide.chk = false. It means that we must deoptimize in block exit_by_wc with iv.lcssa = 0.

So now we've called @side_effect once, but after deopt, we think that iv = 0. If this value is used to re-execute the loop in the interpreter, it will make one extra iteration.
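The mismatch can be reproduced with a minimal Python sketch of the loop as it looks after indvars and guard widening. As before, this only models the control flow: `run_widened` is a hypothetical name, i32 wraparound is ignored, and wc2 is omitted because the branch in guard_block has become unconditional.

```python
def run_widened(start, wc1):
    """Model of @test after indvars + guard widening.

    Returns (deopt block, iv reported to the deopt, @side_effect calls).
    """
    iv = start
    calls = 0
    while True:
        cond = (start + 1) != iv       # %cond, now computed in the loop header
        if not (cond and wc1):         # %0 = and i1 %wide.chk, %wc1
            # After indvars, the deopt state here is %start, not %iv.
            return ("exit_by_wc", start, calls)
        # guard_block: br i1 true, so we always reach backedge
        calls += 1                     # call @side_effect
        iv += 1                        # %iv.next

block, reported_iv, calls = run_widened(0, True)
print(block, reported_iv, calls)       # exit_by_wc 0 1
# One @side_effect call has happened, but the deopt state claims iv == start.
assert calls != reported_iv - 0
```

With wc1 = false the model still deoptimizes on the first iteration with no calls, so only the wc1 = true path exposes the problem.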

My working theory is that widening into loop-invariant WCs is just wrong.
