[ValueTracking] Allow tracking values through Integral AddrSpaceCasts for improved alignments #70483

Open
jrbyrnes wants to merge 3 commits into main from CKBAddrSpaceRebase0

Conversation

jrbyrnes
Contributor

@jrbyrnes jrbyrnes commented Oct 27, 2023

Provide capability to compute known bits through AddrSpaceCasts to support improved reasoning about alignments.
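For illustration, a minimal sketch (not code from this patch) of how known bits feed alignment reasoning: the alignment a pass such as InferAlignment can prove for a pointer is 2^(number of known trailing zero bits), so propagating known bits through an addrspacecast directly improves the inferable alignment.

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/Alignment.h"
#include "llvm/Support/KnownBits.h"
#include <algorithm>
using namespace llvm;

// Hypothetical helper (not part of the patch): derive a provable alignment
// from the known bits of a pointer value. If computeKnownBits can look
// through an addrspacecast, the trailing-zero count of the source pointer,
// and hence its alignment, survives the cast.
static Align provableAlign(const Value *Ptr, const DataLayout &DL) {
  KnownBits Known = computeKnownBits(Ptr, DL);
  unsigned TZ = std::min(Known.countMinTrailingZeros(), 63u);
  return Align(uint64_t(1) << TZ);
}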

@jrbyrnes jrbyrnes requested a review from arsenm October 27, 2023 17:28
@jrbyrnes
Contributor Author

@arsenm Is this what you had in mind instead of #69776?

@nikic
Contributor

nikic commented Oct 27, 2023

(Hard reject on TTI dependency in ValueTracking. You can solve this either via changes to LangRef, or by encoding necessary information in DataLayout.)
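For reference, a sketch of the DataLayout route (illustrative only; the datalayout string here is made up): non-integral address spaces are already expressible via the "ni:" component, and an analysis can query that instead of a target hook.

#include "llvm/IR/DataLayout.h"
using namespace llvm;

// Hypothetical datalayout: 64-bit pointers, with addrspace(2) marked
// non-integral via "ni:2". ValueTracking can bail out on casts that
// touch such address spaces without consulting TTI.
static bool castIsTrackable(unsigned SrcAS, unsigned DstAS) {
  DataLayout DL("e-p:64:64-ni:2");
  return !DL.isNonIntegralAddressSpace(SrcAS) &&
         !DL.isNonIntegralAddressSpace(DstAS);
}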

Inline review threads (outdated, resolved): llvm/lib/Analysis/ValueTracking.cpp, llvm/include/llvm/Analysis/TargetTransformInfo.h, llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@jrbyrnes jrbyrnes force-pushed the CKBAddrSpaceRebase0 branch 2 times, most recently from 4633db9 to 0fd4d0e Compare October 31, 2023 23:39
@jrbyrnes jrbyrnes changed the title [ValueTracking] NFC: Allow tracking values through AddrSpaceCasts [ValueTracking] Allow tracking values through Integral AddrSpaceCasts Oct 31, 2023
@jrbyrnes
Contributor Author

@arsenm Does this look more like what you expected? If so, I can work on moving it out of draft (test and code cleanup).

@llvmbot
Collaborator

llvmbot commented Nov 1, 2023

@llvm/pr-subscribers-backend-spir-v
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-backend-directx
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Jeffrey Byrnes (jrbyrnes)

Changes

Provide capability to compute known bits through AddrSpaceCasts via target queries. This is mostly useful when trying to determine alignments.


Patch is 69.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70483.diff

7 Files Affected:

  • (modified) llvm/include/llvm/Support/KnownBits.h (+8)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+28)
  • (modified) llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll (+80-82)
  • (added) llvm/test/Transforms/InferAlignment/addrspacecast_amdgpu.ll (+523)
  • (added) llvm/test/Transforms/InferAlignment/addrspacecast_x86.ll (+394)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-from-global.ll (+2-2)
  • (added) llvm/test/Transforms/PhaseOrdering/AMDGPU/align-before-vectorize.ll (+46)
diff --git a/llvm/include/llvm/Support/KnownBits.h b/llvm/include/llvm/Support/KnownBits.h
index fb034e0b9e3baff..545a694f612fa36 100644
--- a/llvm/include/llvm/Support/KnownBits.h
+++ b/llvm/include/llvm/Support/KnownBits.h
@@ -148,6 +148,14 @@ struct KnownBits {
     return Max;
   }
 
+  KnownBits clearLowBits(unsigned BitWidth) {
+    APInt NewZero = Zero;
+    APInt NewOne = One;
+    NewZero.clearLowBits(BitWidth);
+    NewOne.clearLowBits(BitWidth);
+    return KnownBits(NewZero, NewOne);
+  }
+
   /// Return known bits for a truncation of the value we're tracking.
   KnownBits trunc(unsigned BitWidth) const {
     return KnownBits(Zero.trunc(BitWidth), One.trunc(BitWidth));
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index c303d261107eb19..db0f2e7203841e9 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1775,6 +1775,34 @@ static void computeKnownBitsFromOperator(const Operator *I,
                                   Depth + 1))
       computeKnownBits(I->getOperand(0), Known, Depth + 1, Q);
     break;
+  case Instruction::AddrSpaceCast: {
+    auto ASC = cast<AddrSpaceCastOperator>(I);
+    unsigned SrcAS = ASC->getSrcAddressSpace();
+    unsigned DestAS = ASC->getDestAddressSpace();
+
+    auto DL = Q.DL;
+    if (DL.isNonIntegralAddressSpace(SrcAS) ||
+        DL.isNonIntegralAddressSpace(DestAS))
+      break;
+
+    auto SrcSize = DL.getPointerSizeInBits(SrcAS);
+    auto DstSize = DL.getPointerSizeInBits(DestAS);
+
+    if (DstSize > SrcSize) {
+      Known2 = Known;
+      Known2 = Known2.clearLowBits(SrcSize);
+      Known = Known.trunc(SrcSize);
+      computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
+      Known = Known.anyext(DstSize);
+      Known = Known.unionWith(Known2);
+    }
+
+    else { // DstSize <= SrcSize
+      Known = Known.anyext(SrcSize);
+      computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
+      Known = Known.trunc(DstSize);
+    }
+  }
   }
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll b/llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll
index 0e6c1aecb6774a3..73dbb4b72f03f5b 100644
--- a/llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll
+++ b/llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll
@@ -9,70 +9,68 @@ define void @issue63986(i64 %0, i64 %idxprom) {
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:    v_lshlrev_b64 v[4:5], 6, v[2:3]
 ; CHECK-NEXT:    s_mov_b64 s[4:5], 0
-; CHECK-NEXT:  ; %bb.1: ; %loop-memcpy-expansion.preheader
-; CHECK-NEXT:    v_lshlrev_b64 v[6:7], 6, v[2:3]
 ; CHECK-NEXT:    s_mov_b64 s[6:7], 0
-; CHECK-NEXT:  .LBB0_2: ; %loop-memcpy-expansion
+; CHECK-NEXT:  .LBB0_1: ; %loop-memcpy-expansion
 ; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    v_mov_b32_e32 v9, s7
-; CHECK-NEXT:    v_mov_b32_e32 v8, s6
-; CHECK-NEXT:    flat_load_ubyte v10, v[8:9] offset:5
-; CHECK-NEXT:    flat_load_ubyte v11, v[8:9] offset:6
-; CHECK-NEXT:    flat_load_ubyte v12, v[8:9] offset:7
-; CHECK-NEXT:    flat_load_ubyte v13, v[8:9] offset:3
-; CHECK-NEXT:    flat_load_ubyte v14, v[8:9] offset:2
-; CHECK-NEXT:    flat_load_ubyte v15, v[8:9] offset:1
-; CHECK-NEXT:    flat_load_ubyte v16, v[8:9]
-; CHECK-NEXT:    flat_load_ubyte v17, v[8:9] offset:4
-; CHECK-NEXT:    flat_load_ubyte v18, v[8:9] offset:13
-; CHECK-NEXT:    flat_load_ubyte v19, v[8:9] offset:14
-; CHECK-NEXT:    flat_load_ubyte v20, v[8:9] offset:15
-; CHECK-NEXT:    flat_load_ubyte v21, v[8:9] offset:11
-; CHECK-NEXT:    flat_load_ubyte v22, v[8:9] offset:10
-; CHECK-NEXT:    flat_load_ubyte v23, v[8:9] offset:9
-; CHECK-NEXT:    flat_load_ubyte v24, v[8:9] offset:8
-; CHECK-NEXT:    flat_load_ubyte v25, v[8:9] offset:12
+; CHECK-NEXT:    v_mov_b32_e32 v6, s6
+; CHECK-NEXT:    v_mov_b32_e32 v7, s7
+; CHECK-NEXT:    flat_load_ubyte v8, v[6:7] offset:5
+; CHECK-NEXT:    flat_load_ubyte v9, v[6:7] offset:6
+; CHECK-NEXT:    flat_load_ubyte v10, v[6:7] offset:7
+; CHECK-NEXT:    flat_load_ubyte v11, v[6:7] offset:3
+; CHECK-NEXT:    flat_load_ubyte v12, v[6:7] offset:2
+; CHECK-NEXT:    flat_load_ubyte v13, v[6:7] offset:1
+; CHECK-NEXT:    flat_load_ubyte v14, v[6:7]
+; CHECK-NEXT:    flat_load_ubyte v15, v[6:7] offset:4
+; CHECK-NEXT:    flat_load_ubyte v16, v[6:7] offset:13
+; CHECK-NEXT:    flat_load_ubyte v17, v[6:7] offset:14
+; CHECK-NEXT:    flat_load_ubyte v18, v[6:7] offset:15
+; CHECK-NEXT:    flat_load_ubyte v19, v[6:7] offset:11
+; CHECK-NEXT:    flat_load_ubyte v20, v[6:7] offset:10
+; CHECK-NEXT:    flat_load_ubyte v21, v[6:7] offset:9
+; CHECK-NEXT:    flat_load_ubyte v22, v[6:7] offset:8
+; CHECK-NEXT:    flat_load_ubyte v23, v[6:7] offset:12
 ; CHECK-NEXT:    s_add_u32 s4, s4, 1
 ; CHECK-NEXT:    s_addc_u32 s5, s5, 0
-; CHECK-NEXT:    v_add_co_u32_e32 v8, vcc, s6, v6
+; CHECK-NEXT:    v_add_co_u32_e32 v6, vcc, s6, v4
 ; CHECK-NEXT:    v_cmp_ge_u64_e64 s[8:9], s[4:5], 2
-; CHECK-NEXT:    v_addc_co_u32_e32 v9, vcc, v9, v7, vcc
+; CHECK-NEXT:    v_addc_co_u32_e32 v7, vcc, v7, v5, vcc
 ; CHECK-NEXT:    s_add_u32 s6, s6, 16
 ; CHECK-NEXT:    s_addc_u32 s7, s7, 0
 ; CHECK-NEXT:    s_and_b64 vcc, exec, s[8:9]
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    flat_store_byte v[8:9], v13 offset:3
-; CHECK-NEXT:    flat_store_byte v[8:9], v14 offset:2
-; CHECK-NEXT:    flat_store_byte v[8:9], v15 offset:1
-; CHECK-NEXT:    flat_store_byte v[8:9], v16
-; CHECK-NEXT:    flat_store_byte v[8:9], v12 offset:7
-; CHECK-NEXT:    flat_store_byte v[8:9], v11 offset:6
-; CHECK-NEXT:    flat_store_byte v[8:9], v10 offset:5
-; CHECK-NEXT:    flat_store_byte v[8:9], v17 offset:4
-; CHECK-NEXT:    flat_store_byte v[8:9], v21 offset:11
-; CHECK-NEXT:    flat_store_byte v[8:9], v22 offset:10
-; CHECK-NEXT:    flat_store_byte v[8:9], v23 offset:9
-; CHECK-NEXT:    flat_store_byte v[8:9], v24 offset:8
-; CHECK-NEXT:    flat_store_byte v[8:9], v20 offset:15
-; CHECK-NEXT:    flat_store_byte v[8:9], v19 offset:14
-; CHECK-NEXT:    flat_store_byte v[8:9], v18 offset:13
-; CHECK-NEXT:    flat_store_byte v[8:9], v25 offset:12
-; CHECK-NEXT:    s_cbranch_vccz .LBB0_2
-; CHECK-NEXT:  ; %bb.3: ; %loop-memcpy-residual-header
+; CHECK-NEXT:    flat_store_byte v[6:7], v11 offset:3
+; CHECK-NEXT:    flat_store_byte v[6:7], v12 offset:2
+; CHECK-NEXT:    flat_store_byte v[6:7], v13 offset:1
+; CHECK-NEXT:    flat_store_byte v[6:7], v14
+; CHECK-NEXT:    flat_store_byte v[6:7], v10 offset:7
+; CHECK-NEXT:    flat_store_byte v[6:7], v9 offset:6
+; CHECK-NEXT:    flat_store_byte v[6:7], v8 offset:5
+; CHECK-NEXT:    flat_store_byte v[6:7], v15 offset:4
+; CHECK-NEXT:    flat_store_byte v[6:7], v19 offset:11
+; CHECK-NEXT:    flat_store_byte v[6:7], v20 offset:10
+; CHECK-NEXT:    flat_store_byte v[6:7], v21 offset:9
+; CHECK-NEXT:    flat_store_byte v[6:7], v22 offset:8
+; CHECK-NEXT:    flat_store_byte v[6:7], v18 offset:15
+; CHECK-NEXT:    flat_store_byte v[6:7], v17 offset:14
+; CHECK-NEXT:    flat_store_byte v[6:7], v16 offset:13
+; CHECK-NEXT:    flat_store_byte v[6:7], v23 offset:12
+; CHECK-NEXT:    s_cbranch_vccz .LBB0_1
+; CHECK-NEXT:  ; %bb.2: ; %loop-memcpy-residual-header
 ; CHECK-NEXT:    s_and_b32 s4, 32, 15
 ; CHECK-NEXT:    s_mov_b32 s5, 0
-; CHECK-NEXT:    s_cbranch_scc0 .LBB0_5
-; CHECK-NEXT:  ; %bb.4:
+; CHECK-NEXT:    s_cbranch_scc0 .LBB0_4
+; CHECK-NEXT:  ; %bb.3:
 ; CHECK-NEXT:    ; implicit-def: $vgpr2_vgpr3
-; CHECK-NEXT:    s_branch .LBB0_6
-; CHECK-NEXT:  .LBB0_5: ; %loop-memcpy-residual-header.post-loop-memcpy-expansion_crit_edge
+; CHECK-NEXT:    s_branch .LBB0_5
+; CHECK-NEXT:  .LBB0_4: ; %loop-memcpy-residual-header.post-loop-memcpy-expansion_crit_edge
 ; CHECK-NEXT:    v_lshlrev_b64 v[2:3], 6, v[2:3]
-; CHECK-NEXT:    s_cbranch_execnz .LBB0_9
-; CHECK-NEXT:  .LBB0_6: ; %loop-memcpy-residual.preheader
+; CHECK-NEXT:    s_cbranch_execnz .LBB0_8
+; CHECK-NEXT:  .LBB0_5: ; %loop-memcpy-residual.preheader
 ; CHECK-NEXT:    v_or_b32_e32 v2, 32, v4
 ; CHECK-NEXT:    v_mov_b32_e32 v3, v5
 ; CHECK-NEXT:    s_mov_b64 s[6:7], 0
-; CHECK-NEXT:  .LBB0_7: ; %loop-memcpy-residual
+; CHECK-NEXT:  .LBB0_6: ; %loop-memcpy-residual
 ; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    s_add_u32 s8, 32, s6
 ; CHECK-NEXT:    s_addc_u32 s9, 0, s7
@@ -85,15 +83,15 @@ define void @issue63986(i64 %0, i64 %idxprom) {
 ; CHECK-NEXT:    s_add_u32 s6, s6, 1
 ; CHECK-NEXT:    v_mov_b32_e32 v6, s4
 ; CHECK-NEXT:    v_addc_co_u32_e32 v9, vcc, v3, v9, vcc
-; CHECK-NEXT:    s_addc_u32 s7, s7, 0
+; CHECK-NEXT:    s_addc_u32 s7, 0, s7
 ; CHECK-NEXT:    v_cmp_lt_u64_e32 vcc, s[6:7], v[6:7]
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:    flat_store_byte v[8:9], v10
-; CHECK-NEXT:    s_cbranch_vccnz .LBB0_7
-; CHECK-NEXT:  ; %bb.8:
+; CHECK-NEXT:    s_cbranch_vccnz .LBB0_6
+; CHECK-NEXT:  ; %bb.7:
 ; CHECK-NEXT:    v_mov_b32_e32 v2, v4
 ; CHECK-NEXT:    v_mov_b32_e32 v3, v5
-; CHECK-NEXT:  .LBB0_9: ; %post-loop-memcpy-expansion
+; CHECK-NEXT:  .LBB0_8: ; %post-loop-memcpy-expansion
 ; CHECK-NEXT:    v_lshrrev_b64 v[4:5], 4, v[0:1]
 ; CHECK-NEXT:    v_and_b32_e32 v6, 15, v0
 ; CHECK-NEXT:    v_mov_b32_e32 v7, 0
@@ -102,28 +100,28 @@ define void @issue63986(i64 %0, i64 %idxprom) {
 ; CHECK-NEXT:    v_cmp_ne_u64_e64 s[6:7], 0, v[6:7]
 ; CHECK-NEXT:    v_add_co_u32_e32 v8, vcc, v2, v0
 ; CHECK-NEXT:    v_addc_co_u32_e32 v9, vcc, v3, v1, vcc
-; CHECK-NEXT:    s_branch .LBB0_12
-; CHECK-NEXT:  .LBB0_10: ; %Flow19
-; CHECK-NEXT:    ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT:    s_branch .LBB0_11
+; CHECK-NEXT:  .LBB0_9: ; %Flow19
+; CHECK-NEXT:    ; in Loop: Header=BB0_11 Depth=1
 ; CHECK-NEXT:    s_or_b64 exec, exec, s[10:11]
 ; CHECK-NEXT:    s_mov_b64 s[8:9], 0
-; CHECK-NEXT:  .LBB0_11: ; %Flow21
-; CHECK-NEXT:    ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT:  .LBB0_10: ; %Flow21
+; CHECK-NEXT:    ; in Loop: Header=BB0_11 Depth=1
 ; CHECK-NEXT:    s_andn2_b64 vcc, exec, s[8:9]
-; CHECK-NEXT:    s_cbranch_vccz .LBB0_20
-; CHECK-NEXT:  .LBB0_12: ; %while.cond
+; CHECK-NEXT:    s_cbranch_vccz .LBB0_19
+; CHECK-NEXT:  .LBB0_11: ; %while.cond
 ; CHECK-NEXT:    ; =>This Loop Header: Depth=1
-; CHECK-NEXT:    ; Child Loop BB0_14 Depth 2
-; CHECK-NEXT:    ; Child Loop BB0_18 Depth 2
+; CHECK-NEXT:    ; Child Loop BB0_13 Depth 2
+; CHECK-NEXT:    ; Child Loop BB0_17 Depth 2
 ; CHECK-NEXT:    s_and_saveexec_b64 s[8:9], s[4:5]
-; CHECK-NEXT:    s_cbranch_execz .LBB0_15
-; CHECK-NEXT:  ; %bb.13: ; %loop-memcpy-expansion2.preheader
-; CHECK-NEXT:    ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT:    s_cbranch_execz .LBB0_14
+; CHECK-NEXT:  ; %bb.12: ; %loop-memcpy-expansion2.preheader
+; CHECK-NEXT:    ; in Loop: Header=BB0_11 Depth=1
 ; CHECK-NEXT:    s_mov_b64 s[10:11], 0
 ; CHECK-NEXT:    s_mov_b64 s[12:13], 0
 ; CHECK-NEXT:    s_mov_b64 s[14:15], 0
-; CHECK-NEXT:  .LBB0_14: ; %loop-memcpy-expansion2
-; CHECK-NEXT:    ; Parent Loop BB0_12 Depth=1
+; CHECK-NEXT:  .LBB0_13: ; %loop-memcpy-expansion2
+; CHECK-NEXT:    ; Parent Loop BB0_11 Depth=1
 ; CHECK-NEXT:    ; => This Inner Loop Header: Depth=2
 ; CHECK-NEXT:    v_mov_b32_e32 v10, s10
 ; CHECK-NEXT:    v_mov_b32_e32 v11, s11
@@ -169,23 +167,23 @@ define void @issue63986(i64 %0, i64 %idxprom) {
 ; CHECK-NEXT:    flat_store_byte v[10:11], v20 offset:13
 ; CHECK-NEXT:    flat_store_byte v[10:11], v27 offset:12
 ; CHECK-NEXT:    s_andn2_b64 exec, exec, s[12:13]
-; CHECK-NEXT:    s_cbranch_execnz .LBB0_14
-; CHECK-NEXT:  .LBB0_15: ; %Flow20
-; CHECK-NEXT:    ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT:    s_cbranch_execnz .LBB0_13
+; CHECK-NEXT:  .LBB0_14: ; %Flow20
+; CHECK-NEXT:    ; in Loop: Header=BB0_11 Depth=1
 ; CHECK-NEXT:    s_or_b64 exec, exec, s[8:9]
 ; CHECK-NEXT:    s_mov_b64 s[8:9], -1
-; CHECK-NEXT:    s_cbranch_execz .LBB0_11
-; CHECK-NEXT:  ; %bb.16: ; %loop-memcpy-residual-header5
-; CHECK-NEXT:    ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT:    s_cbranch_execz .LBB0_10
+; CHECK-NEXT:  ; %bb.15: ; %loop-memcpy-residual-header5
+; CHECK-NEXT:    ; in Loop: Header=BB0_11 Depth=1
 ; CHECK-NEXT:    s_and_saveexec_b64 s[8:9], s[6:7]
 ; CHECK-NEXT:    s_xor_b64 s[10:11], exec, s[8:9]
-; CHECK-NEXT:    s_cbranch_execz .LBB0_10
-; CHECK-NEXT:  ; %bb.17: ; %loop-memcpy-residual4.preheader
-; CHECK-NEXT:    ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT:    s_cbranch_execz .LBB0_9
+; CHECK-NEXT:  ; %bb.16: ; %loop-memcpy-residual4.preheader
+; CHECK-NEXT:    ; in Loop: Header=BB0_11 Depth=1
 ; CHECK-NEXT:    s_mov_b64 s[12:13], 0
 ; CHECK-NEXT:    s_mov_b64 s[14:15], 0
-; CHECK-NEXT:  .LBB0_18: ; %loop-memcpy-residual4
-; CHECK-NEXT:    ; Parent Loop BB0_12 Depth=1
+; CHECK-NEXT:  .LBB0_17: ; %loop-memcpy-residual4
+; CHECK-NEXT:    ; Parent Loop BB0_11 Depth=1
 ; CHECK-NEXT:    ; => This Inner Loop Header: Depth=2
 ; CHECK-NEXT:    v_mov_b32_e32 v12, s15
 ; CHECK-NEXT:    v_add_co_u32_e32 v10, vcc, s14, v0
@@ -200,12 +198,12 @@ define void @issue63986(i64 %0, i64 %idxprom) {
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:    flat_store_byte v[10:11], v13
 ; CHECK-NEXT:    s_andn2_b64 exec, exec, s[12:13]
-; CHECK-NEXT:    s_cbranch_execnz .LBB0_18
-; CHECK-NEXT:  ; %bb.19: ; %Flow
-; CHECK-NEXT:    ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT:    s_cbranch_execnz .LBB0_17
+; CHECK-NEXT:  ; %bb.18: ; %Flow
+; CHECK-NEXT:    ; in Loop: Header=BB0_11 Depth=1
 ; CHECK-NEXT:    s_or_b64 exec, exec, s[12:13]
-; CHECK-NEXT:    s_branch .LBB0_10
-; CHECK-NEXT:  .LBB0_20: ; %DummyReturnBlock
+; CHECK-NEXT:    s_branch .LBB0_9
+; CHECK-NEXT:  .LBB0_19: ; %DummyReturnBlock
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:    s_setpc_b64 s[30:31]
 entry:
diff --git a/llvm/test/Transforms/InferAlignment/addrspacecast_amdgpu.ll b/llvm/test/Transforms/InferAlignment/addrspacecast_amdgpu.ll
new file mode 100644
index 000000000000000..dff71098ae23616
--- /dev/null
+++ b/llvm/test/Transforms/InferAlignment/addrspacecast_amdgpu.ll
@@ -0,0 +1,523 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -passes='infer-alignment' -mtriple=amdgcn -mcpu=gfx90a -S < %s  | FileCheck -check-prefix=AMDGPU %s
+; RUN: opt -passes='infer-alignment' -mcpu=corei7 -mtriple=x86_64-linux -S < %s  | FileCheck -check-prefix=X86 %s
+
+$globalArrayAS0 = comdat any
+$globalArrayAS1 = comdat any
+$globalArrayAS2 = comdat any
+$globalArrayAS3 = comdat any
+$globalArrayAS4 = comdat any
+$globalArrayAS5 = comdat any
+$globalArrayAS6 = comdat any
+$globalArrayAS7 = comdat any
+$globalArrayAS8 = comdat any
+@globalArrayAS0 = linkonce_odr hidden addrspace(0) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS1 = linkonce_odr hidden addrspace(1) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS2 = linkonce_odr hidden addrspace(2) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS3 = linkonce_odr hidden addrspace(3) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS4 = linkonce_odr hidden addrspace(4) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS5 = linkonce_odr hidden addrspace(5) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS6 = linkonce_odr hidden addrspace(6) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS7 = linkonce_odr hidden addrspace(7) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS8 = linkonce_odr hidden addrspace(8) global [4096 x i8] undef, comdat, align 16
+
+; Function Attrs: alwaysinline convergent mustprogress nounwind
+define amdgpu_kernel void @infer_AS10(i32 %idx) unnamed_addr align 2 {
+; AMDGPU-LABEL: define amdgpu_kernel void @infer_AS10(
+; AMDGPU-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0:[0-9]+]] align 2 {
+; AMDGPU-NEXT:  entry:
+; AMDGPU-NEXT:    [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; AMDGPU-NEXT:    [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; AMDGPU-NEXT:    [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; AMDGPU-NEXT:    [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(1) @globalArrayAS1 to ptr), i64 [[IDXPROM37_I21]]
+; AMDGPU-NEXT:    [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; AMDGPU-NEXT:    ret void
+;
+; X86-LABEL: define amdgpu_kernel void @infer_AS10(
+; X86-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0:[0-9]+]] align 2 {
+; X86-NEXT:  entry:
+; X86-NEXT:    [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; X86-NEXT:    [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; X86-NEXT:    [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; X86-NEXT:    [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(1) @globalArrayAS1 to ptr), i64 [[IDXPROM37_I21]]
+; X86-NEXT:    [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; X86-NEXT:    ret void
+;
+entry:
+  %mul32.i = shl nuw nsw i32 %idx, 8
+  %add36.i = add nuw nsw i32 %mul32.i, 1024
+  %idxprom37.i21 = zext i32 %add36.i to i64
+  %arrayidx38.i22 = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(1) @globalArrayAS1 to ptr), i64 %idxprom37.i21
+  %l1 = load float, ptr  %arrayidx38.i22, align 4
+  ret void
+}
+
+; Function Attrs: alwaysinline convergent mustprogress nounwind
+define amdgpu_kernel void @infer_AS20(i32 %idx) unnamed_addr align 2 {
+; AMDGPU-LABEL: define amdgpu_kernel void @infer_AS20(
+; AMDGPU-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+; AMDGPU-NEXT:  entry:
+; AMDGPU-NEXT:    [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; AMDGPU-NEXT:    [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; AMDGPU-NEXT:    [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; AMDGPU-NEXT:    [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(2) @globalArrayAS2 to ptr), i64 [[IDXPROM37_I21]]
+; AMDGPU-NEXT:    [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; AMDGPU-NEXT:    ret void
+;
+; X86-LABEL: define amdgpu_kernel void @infer_AS20(
+; X86-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+; X86-NEXT:  entry:
+; X86-NEXT:    [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; X86-NEXT:    [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; X86-NEXT:    [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; X86-NEXT:    [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(2) @globalArrayAS2 to ptr), i64 [[IDXPROM37_I21]]
+; X86-NEXT:    [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; X86-NEXT:    ret void
+;
+entry:
+  %mul32.i = shl nuw nsw i32 %idx, 8
+  %add36.i = add nuw nsw i32 %mul32.i, 1024
+  %idxprom37.i21 = zext i32 %add36.i to i64
+  %arrayidx38.i22 = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(2) @globalArrayAS2 to ptr), i64 %idxprom37.i21
+  %l1 = load float, ptr  %arrayidx38.i22, align 4
+  ret void
+}
+
+; Function Attrs: alwaysinline convergent mustprogress nounwind
+define amdgpu_kernel void @infer_AS30(i32 %idx) unnamed_addr align 2 {
+; AMDGPU-LABEL: define amdgpu_kernel void @infer_AS30(
+; AMDGPU-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+; AMDGPU-NEXT:  entry:
+; AMDGPU-NEXT:    [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; AMDGPU-NEXT:    [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; AMDGPU-NEXT:    [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; AMDGPU-NEXT:    [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(3) @globalArrayAS3 to ptr), i64 [[IDXPROM37_I21]]
+; AMDGPU-NEXT:    [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; AMDGPU-NEXT:    ret void
+;
+; X86-LABEL: define amdgpu_kernel void @infer_AS30(
+; X86-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+; X86-NEXT:  entry:
+; X86-NEXT:    [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; X86-NEXT:    [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; X86-NEXT:    [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; X86-NEXT:    [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(3) @globalArrayAS3 to ptr), i64 [[IDXPROM37_I21]]
+; X86-NEXT:    [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; X86-NEXT:    ret void
+;
+entry:
+  %mul32.i = shl nuw nsw i32 %idx, 8
+  %add36.i = add nuw nsw i32 %mul32.i, 1024
+  %idxprom37.i21 = zext i32 %add36.i to i64
+  %arrayidx38.i22 = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(3) @globalArrayAS3 to ptr), i64 %idxprom37.i21
+  %l1 = load float, ptr  %ar...
[truncated]

Inline review threads (outdated, resolved): llvm/lib/Analysis/ValueTracking.cpp, llvm/include/llvm/Support/KnownBits.h
@nikic
Contributor

nikic commented Nov 7, 2023

This also needs a LangRef patch to make the transform legal.

@jrbyrnes
Contributor Author

jrbyrnes commented Nov 8, 2023

This also needs a LangRef patch to make the transform legal.

Hi -- thanks for your comments. Do you mean a patch to update the addrspacecast documentation (especially w.r.t. integral pointers)?

@nikic
Contributor

nikic commented Nov 8, 2023

This also needs a LangRef patch to make the transform legal.

Hi -- thanks for your comments. Do you mean a patch to update the addrspacecast documentation (esp w.r.t integral pointers)?

Right.

@arsenm
Contributor

arsenm commented Jan 9, 2024

Reverse ping

@arsenm
Contributor

arsenm commented Jan 31, 2024

Ping?

@jrbyrnes jrbyrnes marked this pull request as draft February 7, 2024 18:26
@arsenm
Contributor

arsenm commented Feb 29, 2024

ping?

@jrbyrnes
Contributor Author

jrbyrnes commented Mar 8, 2024

Seems like there are instances where address spaces should be marked non-integral but are not. For example, in NVPTX, see:

if (DstAddrSpace == ADDRESS_SPACE_GENERIC) {

multiclass NG_TO_G<string Str, Intrinsic Intrin> {

In order for this to be legal, I'll need to find all such instances and correctly mark them as non-integral.

@arsenm
Contributor

arsenm commented Mar 11, 2024

Seems like there are instances where address spaces should be marked non-integral but are not (for example, NVPTX).

I don't see the connection. The NVPTX generic address space is just 0, which certainly is integral. I don't believe any of the NVPTX address spaces are non-integral.

@jrbyrnes
Contributor Author

jrbyrnes commented Mar 11, 2024

I don't see the connection. The NVPTX generic address space is just 0, which certainly is integral. I don't believe any of the NVPTX address spaces are non-integral.

Sorry, I meant to flag the block with that condition as the entry point.

Casting from the target-specific address spaces to the generic address space requires a target-specific instruction. For example, NVPTX::cvta_shared (which lowers to the cvta.shared instruction) is used to go from shared to generic. I'm not sure of the semantics of this instruction, but I would assume it is not semantically equivalent to a truncating/extending bitcast. Based on this assumption, this analysis is not correct for this address space. But it seems to me that this shared address space (as well as the others handled in that block) has a target-dependent integral representation, and is thus non-integral.

@arsenm
Contributor

arsenm commented Mar 12, 2024

I'm not sure of the semantics of this instruction, but I would assume it is not semantically equivalent to a truncating/extending bitcast. Based on this assumption, this analysis is not correct for this address space.

It is correct, because it doesn't involve any change in the low bits. It's exactly the same as for AMDGPU.
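A minimal KnownBits sketch of that point (illustrative, not code from the patch): widening a 32-bit LDS pointer to a 64-bit flat pointer by attaching an aperture base in the high half leaves the low bits, and hence the provable alignment, untouched.

#include "llvm/Support/KnownBits.h"
#include <cassert>
using namespace llvm;

int main() {
  // A 32-bit shared (LDS) pointer known to be 16-byte aligned.
  KnownBits Shared(32);
  Shared.Zero.setLowBits(4);

  // Model shared -> flat as an any-extend: the high 32 bits (the aperture
  // base) become unknown, but the low bits are preserved.
  KnownBits Flat = Shared.anyext(64);
  assert(Flat.countMinTrailingZeros() == 4); // the alignment survives the cast
  return 0;
}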

Change-Id: Ic44b01c9f6a810e90b8777b7a0d7a5a4b9c4f30a
@jrbyrnes
Contributor Author

The only targets with data-layout-defined address spaces are:

  1. AVR (function address space)
  2. WebAssembly
  3. AMDGPU
  4. X86
  5. NVPTX

There are a few other targets with addrspace handling / test coverage.

  1. SPIRV
  2. XCore
  3. AArch64
  4. Mips
  5. PowerPC
  6. ARM
  7. DirectX
  8. RISCV

Of these, only DirectX, XCore, and SPIRV have non-noop addrspacecasts. In particular, I'm not sure about DirectX and XCore; I can only assume that integral addrspacecasts on these targets match the ValueTracking logic.

I have added tests for relevant targets.

Change-Id: Iae07ee73f24144157614882152815adc46f81d0e
Inline review threads (outdated, resolved): llvm/lib/Analysis/ValueTracking.cpp, llvm/docs/LangRef.rst
@arsenm
Contributor

arsenm commented Mar 14, 2024

Description should mention alignment / known bits

Contributor

@nikic nikic left a comment


This change needs an RFC on discourse.

Inline review threads (outdated, resolved): llvm/lib/Analysis/ValueTracking.cpp (4 threads)
Change-Id: If64879f88b1440b17ce3f167df16586dd1dd5613
@jrbyrnes jrbyrnes changed the title [ValueTracking] Allow tracking values through Integral AddrSpaceCasts [ValueTracking] Allow tracking values through Integral AddrSpaceCasts for improved alignments Mar 21, 2024
@jrbyrnes
Contributor Author
