[LoopRotate] Incorrect branch weights updating #66675

aleks-tmb · 2023-09-18T17:37:20Z

In the revision https://reviews.llvm.org/D157462 branch weights updating was added to the LoopRotate.
But let's run the pass bin/opt -passes=loop-rotate test.ll on with the test case with 1:0 weights ratio:

declare void @llvm.experimental.deoptimize.isVoid(...)

define void @test(ptr addrspace(1) %0, i32 %limit) {
  br label %loop

loop:                                             ; preds = %guarded, %1
  %iv = phi i32 [ %iv.next, %guarded ], [ 0, %1 ]
  %gc = icmp slt i32 %iv, %limit
  br i1 %gc, label %guarded, label %exit, !prof !0

guarded:                                          ; preds = %loop
  %iv.next = add i32 %iv, 1
  %load = load atomic ptr addrspace(1), ptr addrspace(1) %0 unordered, align 8
  %ec = icmp eq ptr addrspace(1) %load, null
  br i1 %ec, label %deopt, label %loop

deopt:                                            ; preds = %guarded
  call void (...) @llvm.experimental.deoptimize.isVoid(i32 11) [ "deopt"(i32 0) ]
  ret void

exit:                                             ; preds = %loop
  ret void
}

!0 = !{!"branch_weights", i32 1, i32 0}

Resulting branch weigths looks weird:

define void @test(ptr addrspace(1) %0, i32 %limit) {
  %gc1 = icmp slt i32 0, %limit
  br i1 %gc1, label %guarded.lr.ph, label %exit, !prof !0

guarded.lr.ph:                                    ; preds = %1
  br label %guarded

loop:                                             ; preds = %guarded
  %iv = phi i32 [ %iv.next, %guarded ]
  %gc = icmp slt i32 %iv, %limit
  br i1 %gc, label %guarded, label %loop.exit_crit_edge, !prof !1

guarded:                                          ; preds = %guarded.lr.ph, %loop
  %iv2 = phi i32 [ 0, %guarded.lr.ph ], [ %iv, %loop ]
  %iv.next = add i32 %iv2, 1
  %load = load atomic ptr addrspace(1), ptr addrspace(1) %0 unordered, align 8
  %ec = icmp eq ptr addrspace(1) %load, null
  br i1 %ec, label %deopt, label %loop

deopt:                                            ; preds = %guarded
  call void (...) @llvm.experimental.deoptimize.isVoid(i32 11) [ "deopt"(i32 0) ]
  ret void

loop.exit_crit_edge:                              ; preds = %loop
  br label %exit

exit:                                             ; preds = %loop.exit_crit_edge, %1
  ret void
}

!0 = !{!"branch_weights", i32 -1, i32 1}
!1 = !{!"branch_weights", i32 -2147483647, i32 -1}

It seems the reason is that we don't scale OrigLoopExitWeight when it's 0:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp#L314

      // Scale up counts if necessary so we can match `ZeroTripCountWeights` for
      // the `ExitWeight0`:`ExitWeight1` (aka `x0`:`x1` ratio`) ratio.
      while (OrigLoopExitWeight < ZeroTripCountWeights[1] + ExitWeight0) {
        // ... but don't overflow.
        uint32_t const HighBit = uint32_t{1} << (sizeof(uint32_t) * 8 - 1);
        if ((OrigLoopBackedgeWeight & HighBit) != 0 ||
            (OrigLoopExitWeight & HighBit) != 0)
          break;
        OrigLoopBackedgeWeight <<= 1;
        OrigLoopExitWeight <<= 1;
      }

and the subsequent computation overflowing:
uint32_t ExitWeight1 = OrigLoopExitWeight - ExitWeight0;

The text was updated successfully, but these errors were encountered:

MatzeB · 2023-09-18T18:29:12Z

Thanks for the report!

I did not consider zero-branch weights and they did cause an underflow here as my formula assumed that loop-entry weights should be equivalent to loop-exit weights which isn't really true for these weights that make things look like an endless loop. I pushed a PR that adds a special case to the code for this.

Out of curiosity: I assume you are not using the IPGO? As the instrumentation there will always increase the counters a bit to avoid zero-weights as far as I can tell... (of course it makes sense to properly handle zero-counts in LoopRotation anyway)

The formula I added to LoopRotationUtils does not produce reasonable results if some of the branch weights are zero. Add special case handling for this. This fixes llvm#66675

aleks-tmb · 2023-09-19T13:52:49Z

Out of curiosity: I assume you are not using the IPGO?

We aren't. We have a java based JIT compiler and IR is being built using custom PGO instrumentation, where zero-weights are possible :-)

The formula I added to LoopRotationUtils does not produce reasonable results if some of the branch weights are zero. Add special case handling for this. This fixes #66675

github-actions bot added the new issue label Sep 18, 2023

aleks-tmb assigned MatzeB Sep 18, 2023

MatzeB mentioned this issue Sep 18, 2023

LoopRotationUtils: Special case zero-branch weight cases #66681

Merged

EugeneZelenko added loopoptim and removed new issue labels Sep 19, 2023

MatzeB added the PGO Profile Guided Optimizations label Sep 19, 2023

MatzeB closed this as completed in #66681 Sep 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LoopRotate] Incorrect branch weights updating #66675

[LoopRotate] Incorrect branch weights updating #66675

aleks-tmb commented Sep 18, 2023

MatzeB commented Sep 18, 2023

aleks-tmb commented Sep 19, 2023 •

edited

[LoopRotate] Incorrect branch weights updating #66675

[LoopRotate] Incorrect branch weights updating #66675

Comments

aleks-tmb commented Sep 18, 2023

MatzeB commented Sep 18, 2023

aleks-tmb commented Sep 19, 2023 • edited

aleks-tmb commented Sep 19, 2023 •

edited