LoopVectorize currently gives up on a common pattern in which the loop body loads a value from a loop-invariant address, increments it, and stores it back to the same address:
```llvm
while.body:                                       ; preds = %while.body.preheader, %while.body
  %theFirst.addr.0112 = phi ptr [ %incdec.ptr9, %while.body ], [ %theFirst, %while.body.preheader ]
  %thePointer.0111 = phi ptr [ %incdec.ptr, %while.body ], [ %add.ptr.i, %while.body.preheader ]
  %17 = load i16, ptr %theFirst.addr.0112, align 2
  store i16 %17, ptr %thePointer.0111, align 2
  %incdec.ptr = getelementptr inbounds nuw i8, ptr %thePointer.0111, i64 2
  ; loop-invariant load/increment/store starts here
  %18 = load i64, ptr %m_size.i, align 8
  %inc = add i64 %18, 1
  store i64 %inc, ptr %m_size.i, align 8
  ; ends here
  %incdec.ptr9 = getelementptr inbounds nuw i8, ptr %theFirst.addr.0112, i64 2
  %cmp7.not = icmp eq ptr %incdec.ptr9, %theLast
  br i1 %cmp7.not, label %cleanup.loopexit, label %while.body, !llvm.loop !15
```
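For readers less used to the IR, the loop has roughly the shape of the following C++. This is a hypothetical reconstruction: the `Vector` type and the `append` function are made up for illustration and are not the actual Xalan source; only `m_size` mirrors the name in the IR. Whether earlier passes could promote `m_size` in this particular snippet depends on the aliasing information available (for example type-based aliasing), so treat it as an illustration of the shape, not as a reproducer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container; only m_size corresponds to %m_size.i in the IR.
struct Vector {
    std::uint16_t* data;
    std::uint64_t m_size;
};

// Copies [first, last) to the end of v, bumping v.m_size once per element.
// Assumes v.data has room for the new elements.
void append(Vector& v, const std::uint16_t* first, const std::uint16_t* last) {
    assert(first <= last);                   // same precondition the != loop relies on
    std::uint16_t* out = v.data + v.m_size;  // plays the role of %thePointer.0111
    while (first != last) {
        *out++ = *first++;  // load i16 / store i16
        ++v.m_size;         // load i64 / add / store i64 to a loop-invariant address
    }
}
```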
Because %m_size.i is loop-invariant, LV treats the store as a loop-carried dependence to a uniform address and bails out.
Earlier passes cannot promote %m_size.i to a register either: the i16 accesses in the loop may alias it, and those passes cannot insert runtime alias checks to rule that out.
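At the source level, the transformation being asked for amounts to loop versioning plus scalar promotion: check at runtime that no access in the loop can touch the invariant location, and if so keep the counter in a register and write it back once after the loop. A minimal sketch, assuming the same hypothetical `Vector` shape as the loop above; `ranges_overlap` and `append_promoted` are illustrative names, and this is hand-written code, not what any LLVM pass emits. Both the destination stores and the source loads are checked, because with the store delayed until after the loop, a load that reads `m_size` would otherwise see a stale value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical container; m_size plays the role of %m_size.i.
struct Vector {
    std::uint16_t* data;
    std::uint64_t m_size;
};

// True if byte ranges [a, a + a_len) and [b, b + b_len) overlap.
// std::less gives a total order even for pointers into unrelated objects.
bool ranges_overlap(const void* a, std::size_t a_len, const void* b, std::size_t b_len) {
    auto pa = static_cast<const unsigned char*>(a);
    auto pb = static_cast<const unsigned char*>(b);
    std::less<const unsigned char*> lt;
    return lt(pa, pb + b_len) && lt(pb, pa + a_len);
}

void append_promoted(Vector& v, const std::uint16_t* first, const std::uint16_t* last) {
    assert(first <= last);
    const std::size_t n = static_cast<std::size_t>(last - first);
    const std::size_t bytes = n * sizeof(std::uint16_t);
    std::uint16_t* out = v.data + v.m_size;

    // Runtime alias check: neither the loads nor the stores may touch m_size.
    const bool safe = !ranges_overlap(out, bytes, &v.m_size, sizeof v.m_size) &&
                      !ranges_overlap(first, bytes, &v.m_size, sizeof v.m_size);
    if (safe) {
        std::uint64_t size = v.m_size;  // hoisted load
        while (first != last) {
            *out++ = *first++;
            ++size;                     // counter lives in a register
        }
        v.m_size = size;                // sunk store
    } else {
        while (first != last) {         // original loop, dependence through memory
            *out++ = *first++;
            ++v.m_size;
        }
    }
}
```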
Hoisting this load showed a ~25% speedup on SPEC CPU Xalan for RISC-V.
I'm working on a hoisting legality check in LoopAccessAnalysis based on MemorySSA and SCEV, plus a new VPlan recipe in LoopVectorize for promoting the scalar.
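To make the intended end result concrete, here is a hand-written C++ model of what a vectorized loop with the promoted scalar could compute once the runtime checks have passed (assumed here: the source, the destination, and `m_size` are pairwise disjoint). `kVF` stands in for the vectorization factor, the inner lane loop models one wide load/store, and `append_vectorized_shape` is a hypothetical name; this is not the proposed recipe or actual LoopVectorize output. The point it shows is that, after promotion, the `m_size` update behaves like an induction that advances by `kVF` per vector iteration, with a single load before the loop and a single store after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVF = 4;  // stand-in vectorization factor

// Hypothetical container; m_size plays the role of %m_size.i.
struct Vector {
    std::uint16_t* data;
    std::uint64_t m_size;
};

// Assumes source, destination and v.m_size do not overlap (checks already passed).
void append_vectorized_shape(Vector& v, const std::uint16_t* first, const std::uint16_t* last) {
    assert(first <= last);
    const std::size_t n = static_cast<std::size_t>(last - first);
    std::uint16_t* out = v.data + v.m_size;
    std::uint64_t size = v.m_size;  // promoted: loaded once before the loop

    std::size_t i = 0;
    // Vector body: kVF elements per iteration; the promoted scalar advances by kVF.
    for (; i + kVF <= n; i += kVF) {
        for (std::size_t lane = 0; lane < kVF; ++lane)  // models one wide load/store
            out[i + lane] = first[i + lane];
        size += kVF;
    }
    // Scalar epilogue for the remaining n % kVF elements.
    for (; i < n; ++i) {
        out[i] = first[i];
        ++size;
    }
    v.m_size = size;  // promoted: stored once after the loop
}
```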