
[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants #73056

Merged: 1 commit into llvm:main on Feb 13, 2024

Conversation

jrbyrnes (Contributor)

In this case, a trivial GEP chain has the form:

%ptr = getelementptr sameType, %base, constant
%val = getelementptr sameType, %ptr, %variable

That is, a one-index GEP consumes another one-index GEP with the same source and result element types, where the inner GEP uses a constant index and the outer GEP uses a variable index. Chains of this form are trivial to reorder (by simply swapping the two indices), and doing so yields better AddrMode matching for users of the pointer ultimately produced by the GEP chain.

Future patches can extend this to support non-trivial GEP chains (e.g. those with different basis types and/or multiple indices).
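
For illustration, here is a minimal before/after sketch of the swap described above (the i32 element type and the value names are hypothetical, not taken from the patch or its tests):

%ptr = getelementptr i32, ptr %base, i64 16          ; inner GEP: constant index
%val = getelementptr i32, ptr %ptr, i64 %idx         ; outer GEP: variable index

becomes

%ptr.new = getelementptr i32, ptr %base, i64 %idx    ; variable index moves to the inner GEP
%val.new = getelementptr i32, ptr %ptr.new, i64 16   ; constant index is now outermost, where
                                                     ; AddrMode matching can fold it as an offset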

@llvmbot (Collaborator) commented Nov 22, 2023

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-transforms

Author: Jeffrey Byrnes (jrbyrnes)

Changes

In this case, a trivial GEP chain has the form:

%ptr = getelementptr sameType, %base, constant
%val = getelementptr sameType, %ptr, %variable

That is, a one-index GEP consumes another one-index GEP with the same source and result element types, where the inner GEP uses a constant index and the outer GEP uses a variable index. Chains of this form are trivial to reorder (by simply swapping the two indices), and doing so yields better AddrMode matching for users of the pointer ultimately produced by the GEP chain.

Future patches can extend this to support non-trivial GEP chains (e.g. those with different basis types and/or multiple indices).


Patch is 79.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73056.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp (+42-3)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll (+18-20)
  • (modified) llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll (+116-135)
  • (modified) llvm/test/CodeGen/PowerPC/licm-remat.ll (+1-1)
  • (added) llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/reorder-gep.ll (+175)
  • (added) llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/lower-gep-reorder.ll (+65)
  • (added) llvm/test/Transforms/SeparateConstOffsetFromGEP/reorder-gep.ll (+188)
diff --git a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
index b8c9d9d100f117a..72791ac55839a3d 100644
--- a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
+++ b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
@@ -987,11 +987,50 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
   bool NeedsExtraction;
   int64_t AccumulativeByteOffset = accumulateByteOffset(GEP, NeedsExtraction);
 
-  if (!NeedsExtraction)
-    return Changed;
-
   TargetTransformInfo &TTI = GetTTI(*GEP->getFunction());
 
+  if (!NeedsExtraction) {
+    // Attempt to reassociate GEPs for better AddrMode construction.
+    if (auto PtrGEP = dyn_cast<GetElementPtrInst>(GEP->getPointerOperand())) {
+      // TODO: support reordering for non-trivial GEP chains
+      if (PtrGEP->getNumIndices() != 1 || GEP->getNumIndices() != 1)
+        return Changed;
+
+      Type *GEPType = GEP->getResultElementType();
+      Type *PtrGEPType = PtrGEP->getResultElementType();
+      // TODO: support reordering for non-trivial GEP chains
+      if ((PtrGEPType != GEPType) ||
+          (PtrGEP->getSourceElementType() != GEP->getSourceElementType()))
+        return Changed;
+
+      bool NestedNeedsExtraction;
+      int64_t NestedByteOffset =
+          accumulateByteOffset(PtrGEP, NestedNeedsExtraction);
+      if (!NestedNeedsExtraction)
+        return Changed;
+
+      unsigned AddrSpace = PtrGEP->getPointerAddressSpace();
+      if (!TTI.isLegalAddressingMode(GEP->getResultElementType(),
+                                     /*BaseGV=*/nullptr, NestedByteOffset,
+                                     /*HasBaseReg=*/true, /*Scale=*/0,
+                                     AddrSpace)) {
+        return Changed;
+      }
+
+      IRBuilder<> Builder(GEP);
+      Builder.SetCurrentDebugLocation(GEP->getDebugLoc());
+      // For trivial GEP chains, we can swap the indicies.
+      auto NewSrc = Builder.CreateGEP(PtrGEPType, PtrGEP->getPointerOperand(),
+                                      SmallVector<Value *, 4>(GEP->indices()));
+      auto NewGEP = Builder.CreateGEP(
+          GEPType, NewSrc, SmallVector<Value *, 4>(PtrGEP->indices()));
+      GEP->replaceAllUsesWith(NewGEP);
+      RecursivelyDeleteTriviallyDeadInstructions(GEP);
+      Changed = true;
+    }
+    return Changed;
+  }
+
   // If LowerGEP is disabled, before really splitting the GEP, check whether the
   // backend supports the addressing mode we are about to produce. If no, this
   // splitting probably won't be beneficial.
diff --git a/llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll b/llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll
index 75f3b5463c3944b..91f635893aed305 100644
--- a/llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll
+++ b/llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll
@@ -264,11 +264,11 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; CHECK-NEXT:    ds_write_b32 v0, v58
 ; CHECK-NEXT:    s_branch .LBB0_7
-; CHECK-NEXT:  .LBB0_16: ; %Flow43
+; CHECK-NEXT:  .LBB0_16: ; %Flow45
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
 ; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s57
 ; CHECK-NEXT:    v_mov_b32_e32 v57, v0
-; CHECK-NEXT:  .LBB0_17: ; %Flow44
+; CHECK-NEXT:  .LBB0_17: ; %Flow46
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
 ; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s56
 ; CHECK-NEXT:    s_mov_b32 s55, exec_lo
@@ -311,11 +311,11 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; CHECK-NEXT:    ds_write_b32 v0, v57
 ; CHECK-NEXT:    s_branch .LBB0_19
-; CHECK-NEXT:  .LBB0_22: ; %Flow41
+; CHECK-NEXT:  .LBB0_22: ; %Flow43
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
 ; CHECK-NEXT:    s_inst_prefetch 0x2
 ; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s56
-; CHECK-NEXT:  .LBB0_23: ; %Flow42
+; CHECK-NEXT:  .LBB0_23: ; %Flow44
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
 ; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s55
 ; CHECK-NEXT:  ; %bb.24: ; in Loop: Header=BB0_5 Depth=1
@@ -328,7 +328,7 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_or_b32 s49, s4, s49
 ; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s49
 ; CHECK-NEXT:    s_cbranch_execnz .LBB0_5
-; CHECK-NEXT:  .LBB0_25: ; %Flow49
+; CHECK-NEXT:  .LBB0_25: ; %Flow51
 ; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s48
 ; CHECK-NEXT:    v_mov_b32_e32 v31, v41
 ; CHECK-NEXT:    v_mov_b32_e32 v0, 1
@@ -347,18 +347,16 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_cmpx_gt_u32_e64 v47, v40
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_33
 ; CHECK-NEXT:  ; %bb.26:
-; CHECK-NEXT:    s_add_u32 s52, s44, 8
-; CHECK-NEXT:    s_addc_u32 s53, s45, 0
 ; CHECK-NEXT:    s_getpc_b64 s[42:43]
 ; CHECK-NEXT:    s_add_u32 s42, s42, _Z10atomic_addPU3AS1Vjj@rel32@lo+4
 ; CHECK-NEXT:    s_addc_u32 s43, s43, _Z10atomic_addPU3AS1Vjj@rel32@hi+12
 ; CHECK-NEXT:    s_mov_b32 s54, 0
-; CHECK-NEXT:    s_getpc_b64 s[44:45]
-; CHECK-NEXT:    s_add_u32 s44, s44, _Z10atomic_subPU3AS1Vjj@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s45, s45, _Z10atomic_subPU3AS1Vjj@rel32@hi+12
 ; CHECK-NEXT:    s_getpc_b64 s[48:49]
-; CHECK-NEXT:    s_add_u32 s48, s48, _Z14get_local_sizej@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s49, s49, _Z14get_local_sizej@rel32@hi+12
+; CHECK-NEXT:    s_add_u32 s48, s48, _Z10atomic_subPU3AS1Vjj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s49, s49, _Z10atomic_subPU3AS1Vjj@rel32@hi+12
+; CHECK-NEXT:    s_getpc_b64 s[52:53]
+; CHECK-NEXT:    s_add_u32 s52, s52, _Z14get_local_sizej@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s53, s53, _Z14get_local_sizej@rel32@hi+12
 ; CHECK-NEXT:    s_branch .LBB0_28
 ; CHECK-NEXT:  .LBB0_27: ; in Loop: Header=BB0_28 Depth=1
 ; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s55
@@ -371,7 +369,7 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s12, s41
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[48:49]
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[52:53]
 ; CHECK-NEXT:    v_add_co_u32 v40, vcc_lo, v0, v40
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc_lo, v47, v40
 ; CHECK-NEXT:    s_or_b32 s54, vcc_lo, s54
@@ -388,15 +386,15 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_mul_u32_u24_e32 v1, 0x180, v63
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 5, v62
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v4, 5, v72
-; CHECK-NEXT:    v_add_co_u32 v2, s4, s52, v1
-; CHECK-NEXT:    v_add_co_ci_u32_e64 v3, null, s53, 0, s4
+; CHECK-NEXT:    v_add_co_u32 v2, s4, s44, v1
+; CHECK-NEXT:    v_add_co_ci_u32_e64 v3, null, s45, 0, s4
 ; CHECK-NEXT:    v_add_co_u32 v0, vcc_lo, v2, v0
 ; CHECK-NEXT:    v_add_co_ci_u32_e32 v1, vcc_lo, 0, v3, vcc_lo
 ; CHECK-NEXT:    v_add_co_u32 v2, vcc_lo, v2, v4
 ; CHECK-NEXT:    v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
 ; CHECK-NEXT:    s_clause 0x1
-; CHECK-NEXT:    global_load_dwordx4 v[4:7], v[0:1], off
-; CHECK-NEXT:    global_load_dwordx4 v[8:11], v[2:3], off
+; CHECK-NEXT:    global_load_dwordx4 v[4:7], v[0:1], off offset:8
+; CHECK-NEXT:    global_load_dwordx4 v[8:11], v[2:3], off offset:8
 ; CHECK-NEXT:    s_waitcnt vmcnt(0)
 ; CHECK-NEXT:    v_xor_b32_e32 v46, v9, v5
 ; CHECK-NEXT:    v_xor_b32_e32 v45, v8, v4
@@ -408,8 +406,8 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_27
 ; CHECK-NEXT:  ; %bb.29: ; in Loop: Header=BB0_28 Depth=1
 ; CHECK-NEXT:    s_clause 0x1
-; CHECK-NEXT:    global_load_dwordx2 v[58:59], v[2:3], off offset:16
-; CHECK-NEXT:    global_load_dwordx2 v[60:61], v[0:1], off offset:16
+; CHECK-NEXT:    global_load_dwordx2 v[58:59], v[2:3], off offset:24
+; CHECK-NEXT:    global_load_dwordx2 v[60:61], v[0:1], off offset:24
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 4, v45
 ; CHECK-NEXT:    v_alignbit_b32 v1, v46, v45, 12
 ; CHECK-NEXT:    v_and_b32_e32 v2, 0xf0000, v45
@@ -484,7 +482,7 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s12, s41
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[44:45]
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[48:49]
 ; CHECK-NEXT:    s_branch .LBB0_27
 ; CHECK-NEXT:  .LBB0_33:
 ; CHECK-NEXT:    s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll b/llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll
index fbe0b156cd9baa0..3186eedbbc0f523 100644
--- a/llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll
+++ b/llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll
@@ -31,205 +31,188 @@ define amdgpu_gs void @_amdgpu_gs_main(i32 inreg %primShaderTableAddrLow, <31 x
   ; CHECK-NEXT:   [[COPY13:%[0-9]+]]:sgpr_32 = COPY $sgpr10
   ; CHECK-NEXT:   [[COPY14:%[0-9]+]]:sgpr_32 = COPY $sgpr8
   ; CHECK-NEXT:   undef [[S_LOAD_DWORDX2_IMM:%[0-9]+]].sub0_sub1:sgpr_128 = S_LOAD_DWORDX2_IMM [[COPY]], 232, 0 :: (invariant load (s64) from %ir.39, addrspace 4)
+  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM undef %125:sgpr_128, 0, 0 :: (dereferenceable invariant load (s32))
+  ; CHECK-NEXT:   KILL undef %125:sgpr_128
   ; CHECK-NEXT:   [[S_LSHL_B32_:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY5]], 4, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_LSHL_B32_1:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY4]], 4, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_LSHL_B32_2:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY3]], 4, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ASHR_I32_:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_]], 31, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ASHR_I32_1:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_1]], 31, implicit-def dead $scc
-  ; CHECK-NEXT:   [[S_ASHR_I32_2:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_2]], 31, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM:%[0-9]+]].sub1:sgpr_128 = S_AND_B32 [[S_LOAD_DWORDX2_IMM]].sub1, 65535, implicit-def dead $scc
+  ; CHECK-NEXT:   [[S_ASHR_I32_2:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_2]], 31, implicit-def dead $scc
+  ; CHECK-NEXT:   [[S_SUB_I32_:%[0-9]+]]:sreg_32 = S_SUB_I32 [[S_BUFFER_LOAD_DWORD_IMM]], 29, implicit-def dead $scc
+  ; CHECK-NEXT:   [[S_SUB_I32_1:%[0-9]+]]:sreg_32 = S_SUB_I32 [[S_BUFFER_LOAD_DWORD_IMM]], 30, implicit-def dead $scc
   ; CHECK-NEXT:   undef [[S_ADD_U32_:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], [[S_LSHL_B32_2]], implicit-def $scc
   ; CHECK-NEXT:   [[S_ADD_U32_:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
   ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_]], 16, 0 :: (invariant load (s128) from %ir.81, addrspace 4)
   ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM1:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM undef %74:sreg_64, 0, 0 :: (invariant load (s128) from `ptr addrspace(4) undef`, addrspace 4)
-  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM undef %132:sgpr_128, 0, 0 :: (dereferenceable invariant load (s32))
   ; CHECK-NEXT:   KILL undef %74:sreg_64
-  ; CHECK-NEXT:   KILL undef %132:sgpr_128
   ; CHECK-NEXT:   KILL [[S_ADD_U32_]].sub0, [[S_ADD_U32_]].sub1
   ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM1:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM [[S_LOAD_DWORDX4_IMM]], 0, 0 :: (dereferenceable invariant load (s32))
   ; CHECK-NEXT:   [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
   ; CHECK-NEXT:   undef [[S_MOV_B32_:%[0-9]+]].sub1:sgpr_128 = S_MOV_B32 0
+  ; CHECK-NEXT:   [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET undef %118:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], undef %89:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN1:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   KILL undef %89:sgpr_128
-  ; CHECK-NEXT:   [[S_SUB_I32_:%[0-9]+]]:sreg_32 = S_SUB_I32 [[S_BUFFER_LOAD_DWORD_IMM]], 29, implicit-def dead $scc
-  ; CHECK-NEXT:   [[S_SUB_I32_1:%[0-9]+]]:sreg_32 = S_SUB_I32 [[S_BUFFER_LOAD_DWORD_IMM]], 30, implicit-def dead $scc
+  ; CHECK-NEXT:   KILL undef %118:sgpr_128
   ; CHECK-NEXT:   [[S_SUB_I32_2:%[0-9]+]]:sreg_32 = S_SUB_I32 [[S_BUFFER_LOAD_DWORD_IMM1]], 31, implicit-def dead $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_1:%[0-9]+]]:sreg_32 = S_ADD_U32 [[COPY6]], 64, implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADDC_U32_:%[0-9]+]]:sreg_32 = S_ADDC_U32 undef %54:sreg_32, 0, implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_2:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_1]], [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_2:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_]], [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_3:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_1]], [[S_LSHL_B32_1]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM2:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_2]], 0, 0 :: (invariant load (s128) from %ir.87, addrspace 4)
-  ; CHECK-NEXT:   [[S_ADD_U32_3:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_]], [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_4:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_1]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_4:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_]], [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_ASHR_I32_3:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 undef %171:sreg_32, 31, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_5:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_1]], undef %171:sreg_32, implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_5:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_]], [[S_ASHR_I32_3]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_6:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_6:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_7:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_1]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_7:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_8:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, undef %171:sreg_32, implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_8:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_3]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_9:%[0-9]+]]:sreg_32 = S_ADD_U32 [[COPY7]].sub0, 224, implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADDC_U32_1:%[0-9]+]]:sreg_32 = S_ADDC_U32 undef %51:sreg_32, 0, implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_10:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_9]], [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_10:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_1]], [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_11:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_9]], [[S_LSHL_B32_1]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_11:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_1]], [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_12:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_9]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_12:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_1]], [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_13:%[0-9]+]]:sreg_32 = S_ADD_U32 [[COPY7]].sub0, 576, implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADDC_U32_2:%[0-9]+]]:sreg_32 = S_ADDC_U32 undef %51:sreg_32, 0, implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_14:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_13]], [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_14:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_2]], [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_15:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_13]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_15:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_2]], [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_16:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[S_ADD_U32_13]], undef %171:sreg_32, implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_16:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 [[S_ADDC_U32_2]], [[S_ASHR_I32_3]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_17:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY8]], [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_17:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %48:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_18:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY9]], [[S_LSHL_B32_1]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_18:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %45:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_19:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY9]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_19:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %45:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_1:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], [[S_LSHL_B32_]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_1:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_2:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], [[S_LSHL_B32_1]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_2:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_3:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], [[S_LSHL_B32_2]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM2:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_1]], 64, 0 :: (invariant load (s128) from %ir.87, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM3:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_2]], 64, 0 :: (invariant load (s128) from %ir.93, addrspace 4)
+  ; CHECK-NEXT:   KILL [[S_ADD_U32_1]].sub0, [[S_ADD_U32_1]].sub1
+  ; CHECK-NEXT:   KILL [[S_ADD_U32_2]].sub0, [[S_ADD_U32_2]].sub1
+  ; CHECK-NEXT:   [[S_ADD_U32_3:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   [[S_ASHR_I32_3:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 undef %169:sreg_32, 31, implicit-def dead $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_4:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], undef %169:sreg_32, implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_4:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_3]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_5:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_5:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_6:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_1]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_6:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:  ...
[truncated]

@arsenm (Contributor) left a comment

I'm somewhat confused by the description, since this is what I thought the point of the pass was and what it already did. What exactly is special about this particular case?

Three review comments on llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp (outdated, resolved).
@jrbyrnes (Contributor, Author) commented Dec 7, 2023

> What exactly is special about this particular case?

When analyzing the GEP operands for constants, this pass only supports certain Value classes: {ConstantInt, BinaryOperator, TruncInst, SExtInst, ZExtInst}. In this case, the operand is a GetElementPtrInst, so the pass does not attempt to reorganize the GEP in such a way that the constant is the outermost index.

I think it is feasible to extend the set of supported Value classes to include GEPs, but that requires a more significant rework. This feature can bridge the gap until then.

@jrbyrnes force-pushed the ReorderGEP branch 2 times, most recently from 330df8e to 3d0f306 on December 12, 2023 at 22:28
@krzysz00 (Contributor) left a comment

Looks good to me!

@arsenm (Contributor) commented Jan 3, 2024

Can you add an alive2 link that shows the inbounds preservation is OK?

@arsenm (Contributor) commented Feb 6, 2024

reverse ping

@jrbyrnes (Contributor, Author) commented Feb 7, 2024

So it appears that the inbounds preservation is not correct.

Preserved:
https://alive2.llvm.org/ce/z/fdH9f2

Stripped:
https://alive2.llvm.org/ce/z/dJwv-W

Instead of stripping the inbounds, I've disabled the optimization for inbounds GEPs.
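
To see why the swap is unsound for inbounds GEPs in general, here is a small sketch (hypothetical values, not the Alive2 example itself): if %base points at a single i32 and %i is -1, the original chain never leaves the object, but the reordered chain's intermediate pointer does.

%p = getelementptr inbounds i32, ptr %base, i64 1   ; one past the end: still in bounds
%q = getelementptr inbounds i32, ptr %p, i64 %i     ; %i = -1 steps back to %base

; after the swap, the intermediate pointer for %i = -1 lies below %base,
; so the inbounds GEP produces poison where the original was well defined
%p.r = getelementptr inbounds i32, ptr %base, i64 %i
%q.r = getelementptr inbounds i32, ptr %p.r, i64 1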

github-actions bot commented Feb 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@krzysz00 (Contributor) commented Feb 7, 2024

... hold on, from looking at that Alive2, I think you can be a bit more aggressive than just not applying this swap to inbounds GEPs (or stripping the inbounds - do we know which of these is more likely to lead to good codegen?).

Specifically, I claim that, if you can prove that the constant and variable indices are non-negative, you should be able to reassociate the inbounds GEPs soundly.

See the somewhat ham-fisted https://alive2.llvm.org/ce/z/CRkgEN
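
A rough sketch of that claim (hypothetical names, not the Alive2 example): if the original inbounds chain is well defined and both indices are non-negative, every intermediate pointer of the swapped chain stays between %base and the final pointer, both of which lie inside the object, so inbounds can be kept.

%p = getelementptr inbounds i32, ptr %base, i64 4    ; original: constant index first...
%q = getelementptr inbounds i32, ptr %p, i64 %i      ; ...then variable index %i >= 0

%p.r = getelementptr inbounds i32, ptr %base, i64 %i ; swapped: %base <= %p.r <= %q,
%q.r = getelementptr inbounds i32, ptr %p.r, i64 4   ; so both GEPs remain in bounds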

@jrbyrnes (Contributor, Author) commented Feb 8, 2024

> Specifically, I claim that, if you can prove that the constant and variable indices are non-negative, you should be able to reassociate the inbounds GEPs soundly.

Thanks @krzysz00 for pointing that out -- I agree with that claim and have added it to the most recent version.

> do we know which of these is more likely to lead to good codegen?

I'm not quite sure; I think I'd have to analyze empirically.

My initial thought was to disable the reordering if we lose inbounds, to avoid introducing regressions; however, most of the optimizations that depend on inbounds typically occur earlier in the pipeline.

@krzysz00 (Contributor)

Do you need someone to land this or?

@jrbyrnes (Contributor, Author)

> Do you need someone to land this or?

I was just waiting to see if there were any objections to the newest version of the changes -- based on this, I'll assume not. Thanks for the review!

@jrbyrnes merged commit 1b65742 into llvm:main on Feb 13, 2024 (3 of 4 checks passed).
@jrbyrnes (Contributor, Author)

I guess some unaccounted-for change had come along that affected the behavior of NVPTX/lower-gep-reorder.ll.

I've resolved the check-llvm failures via ec0aa16.

@nico (Contributor) commented Feb 13, 2024

Looks like this still breaks tests: http://45.33.8.238/linux/130717/step_12.txt

This is after ec0aa16.

Should we revert for now?

@krzysz00 (Contributor)

Might be good to revert and make sure there's a local test run with all targets enabled at least?

preames added a commit that referenced this pull request Feb 13, 2024
…parate constants (#73056)" and follow ups

"ninja check-llvm" is failing on tip of tree.

This reverts commit ec0aa16.
This reverts commit 1b65742.
@AaronHLiu (Contributor) commented Feb 15, 2024

Looks like there are some LIT failures that need to be addressed:

FAIL: LLVM :: Transforms/SeparateConstOffsetFromGEP/NVPTX/lower-gep-reorder.ll (68679 of 78188)
******************** TEST 'LLVM :: Transforms/SeparateConstOffsetFromGEP/NVPTX/lower-gep-reorder.ll' FAILED ********************
Exit Code: 1
...

@krzysz00 (Contributor)

... that got handled in the re-land?

@jrbyrnes (Contributor, Author) commented Feb 15, 2024

@AaronHLiu this PR was known to cause that exact LIT failure -- this was why it was reverted upstream -- does your build have the revert?

The reland should be okay; in your deleted comment, the flagged CHECK line existed in that bad version of the LIT test but not in the reland version (reland: #81671).

@AaronHLiu (Contributor)

> @AaronHLiu this PR was known to cause that exact LIT failure -- this was why it was reverted upstream -- does your build have the revert?
>
> The reland should be okay; in your deleted comment, the flagged CHECK line existed in that bad version of the LIT test but not in the reland version (reland: #81671).

The build did not contain the new version yet. Will check. Thanks!

jrbyrnes added a commit to jrbyrnes/llvm-project that referenced this pull request Apr 26, 2024
jrbyrnes added a commit to jrbyrnes/llvm-project that referenced this pull request May 7, 2024