-
Notifications
You must be signed in to change notification settings - Fork 15.2k
Description
In the following testcase, ideally, the jump to the trap can de elided, but the current optimization pipeline misses this.
Running CorrelatedValuePropagation (-passes=correlated-propagation
) misses this due to merging the ranges for the incoming predecessors into a block. Currently LVI does "union(predecessors)", followed by "computation based on instruction opcode". If the ranges were stored per block predecessor (another cache...so added cost...), LVI could theoretically do first "computation based on instruction opcode for each predecessor", followed by "union(predecessors)").
For the example below, the range for%lshr
is currently the result of lshr
[7, 277632] [3, 8], which is [0, 34704]. The intervals used for the lshr
are the results of previous union operations from the phi nodes.
But if the intervals for the operands were computed per predecessor (for the add and phis, store the result of getEdgeValue
before doing the union in mergeIn
), then the range for%lshr
could become the union of ((lshr
[7, 1032] , 3), (lshr
[15487, 262145], 7))=union of ([0, 129], [129, 2169])=[0, 2169]. With this, %icmp6
could be replaced with true
in the br
.
Is it worth extending LVI to support this for constant ranges only, when all predecessor values are known (i.e. clear the predecessor cache on early exit for overdefined), with compile times evaluated and the extended cache populated/used under a flag?
Are there other passes or ways this range analysis is done and could currently catch this?
@global = external local_unnamed_addr global [4338 x i32], align 16
define dso_local noundef zeroext i1 @foo(i64 noundef %arg, ptr noundef writeonly captures(none) %arg1) local_unnamed_addr {
bb:
%icmp = icmp ult i64 %arg, 1025
br i1 %icmp, label %bb4, label %bb2
bb2: ; preds = %bb
%icmp3 = icmp ult i64 %arg, 262145
br i1 %icmp3, label %bb4, label %bb9
bb4: ; preds = %bb2, %bb
%phi = phi i64 [ 7, %bb ], [ 15487, %bb2 ]
%phi5 = phi i64 [ 3, %bb ], [ 7, %bb2 ]
%add = add nuw nsw i64 %phi, %arg
%lshr = lshr i64 %add, %phi5
%icmp6 = icmp samesign ult i64 %lshr, 4338
br i1 %icmp6, label %bb8, label %bb7
bb7: ; preds = %bb4
tail call void @llvm.ubsantrap(i8 18)
unreachable
bb8: ; preds = %bb4
%getelementptr = getelementptr inbounds nuw [4338 x i32], ptr @global, i64 0, i64 %lshr
%load = load i32, ptr %getelementptr, align 4
%sext = sext i32 %load to i64
store i64 %sext, ptr %arg1, align 8
br label %bb9
bb9: ; preds = %bb8, %bb2
%phi10 = phi i1 [ true, %bb8 ], [ false, %bb2 ]
ret i1 %phi10
}
; Function Attrs: cold noreturn nounwind
declare void @llvm.ubsantrap(i8 immarg) #0
attributes #0 = { cold noreturn nounwind }