[ValueTracking] Allow tracking values through Integral AddrSpaceCasts for improved alignments #70483
Conversation
(Hard reject on TTI dependency in ValueTracking. You can solve this either via changes to LangRef, or by encoding necessary information in DataLayout.) |
(force-pushed 4633db9 to 0fd4d0e)
@arsenm Does this look more like you expected? If so, I can work on converting out of draft (test & code cleanup). |
(force-pushed 0fd4d0e to d0d7543)
@llvm/pr-subscribers-backend-spir-v @llvm/pr-subscribers-llvm-analysis

Author: Jeffrey Byrnes (jrbyrnes)

Changes: Provide capability to compute known bits through AddrSpaceCasts via target queries. This is mostly useful when trying to determine alignments.

Patch is 69.12 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/70483.diff

7 Files Affected:
diff --git a/llvm/include/llvm/Support/KnownBits.h b/llvm/include/llvm/Support/KnownBits.h
index fb034e0b9e3baff..545a694f612fa36 100644
--- a/llvm/include/llvm/Support/KnownBits.h
+++ b/llvm/include/llvm/Support/KnownBits.h
@@ -148,6 +148,14 @@ struct KnownBits {
return Max;
}
+ KnownBits clearLowBits(unsigned BitWidth) {
+ APInt NewZero = Zero;
+ APInt NewOne = One;
+ NewZero.clearLowBits(BitWidth);
+ NewOne.clearLowBits(BitWidth);
+ return KnownBits(NewZero, NewOne);
+ }
+
/// Return known bits for a truncation of the value we're tracking.
KnownBits trunc(unsigned BitWidth) const {
return KnownBits(Zero.trunc(BitWidth), One.trunc(BitWidth));
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index c303d261107eb19..db0f2e7203841e9 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1775,6 +1775,34 @@ static void computeKnownBitsFromOperator(const Operator *I,
Depth + 1))
computeKnownBits(I->getOperand(0), Known, Depth + 1, Q);
break;
+ case Instruction::AddrSpaceCast: {
+ auto ASC = cast<AddrSpaceCastOperator>(I);
+ unsigned SrcAS = ASC->getSrcAddressSpace();
+ unsigned DestAS = ASC->getDestAddressSpace();
+
+ auto DL = Q.DL;
+ if (DL.isNonIntegralAddressSpace(SrcAS) ||
+ DL.isNonIntegralAddressSpace(DestAS))
+ break;
+
+ auto SrcSize = DL.getPointerSizeInBits(SrcAS);
+ auto DstSize = DL.getPointerSizeInBits(DestAS);
+
+ if (DstSize > SrcSize) {
+ Known2 = Known;
+ Known2 = Known2.clearLowBits(SrcSize);
+ Known = Known.trunc(SrcSize);
+ computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
+ Known = Known.anyext(DstSize);
+ Known = Known.unionWith(Known2);
+ }
+
+ else { // DstSize <= SrcSize
+ Known = Known.anyext(SrcSize);
+ computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
+ Known = Known.trunc(DstSize);
+ }
+ }
}
}
diff --git a/llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll b/llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll
index 0e6c1aecb6774a3..73dbb4b72f03f5b 100644
--- a/llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll
+++ b/llvm/test/CodeGen/AMDGPU/memcpy-crash-issue63986.ll
@@ -9,70 +9,68 @@ define void @issue63986(i64 %0, i64 %idxprom) {
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: v_lshlrev_b64 v[4:5], 6, v[2:3]
; CHECK-NEXT: s_mov_b64 s[4:5], 0
-; CHECK-NEXT: ; %bb.1: ; %loop-memcpy-expansion.preheader
-; CHECK-NEXT: v_lshlrev_b64 v[6:7], 6, v[2:3]
; CHECK-NEXT: s_mov_b64 s[6:7], 0
-; CHECK-NEXT: .LBB0_2: ; %loop-memcpy-expansion
+; CHECK-NEXT: .LBB0_1: ; %loop-memcpy-expansion
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: v_mov_b32_e32 v9, s7
-; CHECK-NEXT: v_mov_b32_e32 v8, s6
-; CHECK-NEXT: flat_load_ubyte v10, v[8:9] offset:5
-; CHECK-NEXT: flat_load_ubyte v11, v[8:9] offset:6
-; CHECK-NEXT: flat_load_ubyte v12, v[8:9] offset:7
-; CHECK-NEXT: flat_load_ubyte v13, v[8:9] offset:3
-; CHECK-NEXT: flat_load_ubyte v14, v[8:9] offset:2
-; CHECK-NEXT: flat_load_ubyte v15, v[8:9] offset:1
-; CHECK-NEXT: flat_load_ubyte v16, v[8:9]
-; CHECK-NEXT: flat_load_ubyte v17, v[8:9] offset:4
-; CHECK-NEXT: flat_load_ubyte v18, v[8:9] offset:13
-; CHECK-NEXT: flat_load_ubyte v19, v[8:9] offset:14
-; CHECK-NEXT: flat_load_ubyte v20, v[8:9] offset:15
-; CHECK-NEXT: flat_load_ubyte v21, v[8:9] offset:11
-; CHECK-NEXT: flat_load_ubyte v22, v[8:9] offset:10
-; CHECK-NEXT: flat_load_ubyte v23, v[8:9] offset:9
-; CHECK-NEXT: flat_load_ubyte v24, v[8:9] offset:8
-; CHECK-NEXT: flat_load_ubyte v25, v[8:9] offset:12
+; CHECK-NEXT: v_mov_b32_e32 v6, s6
+; CHECK-NEXT: v_mov_b32_e32 v7, s7
+; CHECK-NEXT: flat_load_ubyte v8, v[6:7] offset:5
+; CHECK-NEXT: flat_load_ubyte v9, v[6:7] offset:6
+; CHECK-NEXT: flat_load_ubyte v10, v[6:7] offset:7
+; CHECK-NEXT: flat_load_ubyte v11, v[6:7] offset:3
+; CHECK-NEXT: flat_load_ubyte v12, v[6:7] offset:2
+; CHECK-NEXT: flat_load_ubyte v13, v[6:7] offset:1
+; CHECK-NEXT: flat_load_ubyte v14, v[6:7]
+; CHECK-NEXT: flat_load_ubyte v15, v[6:7] offset:4
+; CHECK-NEXT: flat_load_ubyte v16, v[6:7] offset:13
+; CHECK-NEXT: flat_load_ubyte v17, v[6:7] offset:14
+; CHECK-NEXT: flat_load_ubyte v18, v[6:7] offset:15
+; CHECK-NEXT: flat_load_ubyte v19, v[6:7] offset:11
+; CHECK-NEXT: flat_load_ubyte v20, v[6:7] offset:10
+; CHECK-NEXT: flat_load_ubyte v21, v[6:7] offset:9
+; CHECK-NEXT: flat_load_ubyte v22, v[6:7] offset:8
+; CHECK-NEXT: flat_load_ubyte v23, v[6:7] offset:12
; CHECK-NEXT: s_add_u32 s4, s4, 1
; CHECK-NEXT: s_addc_u32 s5, s5, 0
-; CHECK-NEXT: v_add_co_u32_e32 v8, vcc, s6, v6
+; CHECK-NEXT: v_add_co_u32_e32 v6, vcc, s6, v4
; CHECK-NEXT: v_cmp_ge_u64_e64 s[8:9], s[4:5], 2
-; CHECK-NEXT: v_addc_co_u32_e32 v9, vcc, v9, v7, vcc
+; CHECK-NEXT: v_addc_co_u32_e32 v7, vcc, v7, v5, vcc
; CHECK-NEXT: s_add_u32 s6, s6, 16
; CHECK-NEXT: s_addc_u32 s7, s7, 0
; CHECK-NEXT: s_and_b64 vcc, exec, s[8:9]
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: flat_store_byte v[8:9], v13 offset:3
-; CHECK-NEXT: flat_store_byte v[8:9], v14 offset:2
-; CHECK-NEXT: flat_store_byte v[8:9], v15 offset:1
-; CHECK-NEXT: flat_store_byte v[8:9], v16
-; CHECK-NEXT: flat_store_byte v[8:9], v12 offset:7
-; CHECK-NEXT: flat_store_byte v[8:9], v11 offset:6
-; CHECK-NEXT: flat_store_byte v[8:9], v10 offset:5
-; CHECK-NEXT: flat_store_byte v[8:9], v17 offset:4
-; CHECK-NEXT: flat_store_byte v[8:9], v21 offset:11
-; CHECK-NEXT: flat_store_byte v[8:9], v22 offset:10
-; CHECK-NEXT: flat_store_byte v[8:9], v23 offset:9
-; CHECK-NEXT: flat_store_byte v[8:9], v24 offset:8
-; CHECK-NEXT: flat_store_byte v[8:9], v20 offset:15
-; CHECK-NEXT: flat_store_byte v[8:9], v19 offset:14
-; CHECK-NEXT: flat_store_byte v[8:9], v18 offset:13
-; CHECK-NEXT: flat_store_byte v[8:9], v25 offset:12
-; CHECK-NEXT: s_cbranch_vccz .LBB0_2
-; CHECK-NEXT: ; %bb.3: ; %loop-memcpy-residual-header
+; CHECK-NEXT: flat_store_byte v[6:7], v11 offset:3
+; CHECK-NEXT: flat_store_byte v[6:7], v12 offset:2
+; CHECK-NEXT: flat_store_byte v[6:7], v13 offset:1
+; CHECK-NEXT: flat_store_byte v[6:7], v14
+; CHECK-NEXT: flat_store_byte v[6:7], v10 offset:7
+; CHECK-NEXT: flat_store_byte v[6:7], v9 offset:6
+; CHECK-NEXT: flat_store_byte v[6:7], v8 offset:5
+; CHECK-NEXT: flat_store_byte v[6:7], v15 offset:4
+; CHECK-NEXT: flat_store_byte v[6:7], v19 offset:11
+; CHECK-NEXT: flat_store_byte v[6:7], v20 offset:10
+; CHECK-NEXT: flat_store_byte v[6:7], v21 offset:9
+; CHECK-NEXT: flat_store_byte v[6:7], v22 offset:8
+; CHECK-NEXT: flat_store_byte v[6:7], v18 offset:15
+; CHECK-NEXT: flat_store_byte v[6:7], v17 offset:14
+; CHECK-NEXT: flat_store_byte v[6:7], v16 offset:13
+; CHECK-NEXT: flat_store_byte v[6:7], v23 offset:12
+; CHECK-NEXT: s_cbranch_vccz .LBB0_1
+; CHECK-NEXT: ; %bb.2: ; %loop-memcpy-residual-header
; CHECK-NEXT: s_and_b32 s4, 32, 15
; CHECK-NEXT: s_mov_b32 s5, 0
-; CHECK-NEXT: s_cbranch_scc0 .LBB0_5
-; CHECK-NEXT: ; %bb.4:
+; CHECK-NEXT: s_cbranch_scc0 .LBB0_4
+; CHECK-NEXT: ; %bb.3:
; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
-; CHECK-NEXT: s_branch .LBB0_6
-; CHECK-NEXT: .LBB0_5: ; %loop-memcpy-residual-header.post-loop-memcpy-expansion_crit_edge
+; CHECK-NEXT: s_branch .LBB0_5
+; CHECK-NEXT: .LBB0_4: ; %loop-memcpy-residual-header.post-loop-memcpy-expansion_crit_edge
; CHECK-NEXT: v_lshlrev_b64 v[2:3], 6, v[2:3]
-; CHECK-NEXT: s_cbranch_execnz .LBB0_9
-; CHECK-NEXT: .LBB0_6: ; %loop-memcpy-residual.preheader
+; CHECK-NEXT: s_cbranch_execnz .LBB0_8
+; CHECK-NEXT: .LBB0_5: ; %loop-memcpy-residual.preheader
; CHECK-NEXT: v_or_b32_e32 v2, 32, v4
; CHECK-NEXT: v_mov_b32_e32 v3, v5
; CHECK-NEXT: s_mov_b64 s[6:7], 0
-; CHECK-NEXT: .LBB0_7: ; %loop-memcpy-residual
+; CHECK-NEXT: .LBB0_6: ; %loop-memcpy-residual
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: s_add_u32 s8, 32, s6
; CHECK-NEXT: s_addc_u32 s9, 0, s7
@@ -85,15 +83,15 @@ define void @issue63986(i64 %0, i64 %idxprom) {
; CHECK-NEXT: s_add_u32 s6, s6, 1
; CHECK-NEXT: v_mov_b32_e32 v6, s4
; CHECK-NEXT: v_addc_co_u32_e32 v9, vcc, v3, v9, vcc
-; CHECK-NEXT: s_addc_u32 s7, s7, 0
+; CHECK-NEXT: s_addc_u32 s7, 0, s7
; CHECK-NEXT: v_cmp_lt_u64_e32 vcc, s[6:7], v[6:7]
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CHECK-NEXT: flat_store_byte v[8:9], v10
-; CHECK-NEXT: s_cbranch_vccnz .LBB0_7
-; CHECK-NEXT: ; %bb.8:
+; CHECK-NEXT: s_cbranch_vccnz .LBB0_6
+; CHECK-NEXT: ; %bb.7:
; CHECK-NEXT: v_mov_b32_e32 v2, v4
; CHECK-NEXT: v_mov_b32_e32 v3, v5
-; CHECK-NEXT: .LBB0_9: ; %post-loop-memcpy-expansion
+; CHECK-NEXT: .LBB0_8: ; %post-loop-memcpy-expansion
; CHECK-NEXT: v_lshrrev_b64 v[4:5], 4, v[0:1]
; CHECK-NEXT: v_and_b32_e32 v6, 15, v0
; CHECK-NEXT: v_mov_b32_e32 v7, 0
@@ -102,28 +100,28 @@ define void @issue63986(i64 %0, i64 %idxprom) {
; CHECK-NEXT: v_cmp_ne_u64_e64 s[6:7], 0, v[6:7]
; CHECK-NEXT: v_add_co_u32_e32 v8, vcc, v2, v0
; CHECK-NEXT: v_addc_co_u32_e32 v9, vcc, v3, v1, vcc
-; CHECK-NEXT: s_branch .LBB0_12
-; CHECK-NEXT: .LBB0_10: ; %Flow19
-; CHECK-NEXT: ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT: s_branch .LBB0_11
+; CHECK-NEXT: .LBB0_9: ; %Flow19
+; CHECK-NEXT: ; in Loop: Header=BB0_11 Depth=1
; CHECK-NEXT: s_or_b64 exec, exec, s[10:11]
; CHECK-NEXT: s_mov_b64 s[8:9], 0
-; CHECK-NEXT: .LBB0_11: ; %Flow21
-; CHECK-NEXT: ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT: .LBB0_10: ; %Flow21
+; CHECK-NEXT: ; in Loop: Header=BB0_11 Depth=1
; CHECK-NEXT: s_andn2_b64 vcc, exec, s[8:9]
-; CHECK-NEXT: s_cbranch_vccz .LBB0_20
-; CHECK-NEXT: .LBB0_12: ; %while.cond
+; CHECK-NEXT: s_cbranch_vccz .LBB0_19
+; CHECK-NEXT: .LBB0_11: ; %while.cond
; CHECK-NEXT: ; =>This Loop Header: Depth=1
-; CHECK-NEXT: ; Child Loop BB0_14 Depth 2
-; CHECK-NEXT: ; Child Loop BB0_18 Depth 2
+; CHECK-NEXT: ; Child Loop BB0_13 Depth 2
+; CHECK-NEXT: ; Child Loop BB0_17 Depth 2
; CHECK-NEXT: s_and_saveexec_b64 s[8:9], s[4:5]
-; CHECK-NEXT: s_cbranch_execz .LBB0_15
-; CHECK-NEXT: ; %bb.13: ; %loop-memcpy-expansion2.preheader
-; CHECK-NEXT: ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT: s_cbranch_execz .LBB0_14
+; CHECK-NEXT: ; %bb.12: ; %loop-memcpy-expansion2.preheader
+; CHECK-NEXT: ; in Loop: Header=BB0_11 Depth=1
; CHECK-NEXT: s_mov_b64 s[10:11], 0
; CHECK-NEXT: s_mov_b64 s[12:13], 0
; CHECK-NEXT: s_mov_b64 s[14:15], 0
-; CHECK-NEXT: .LBB0_14: ; %loop-memcpy-expansion2
-; CHECK-NEXT: ; Parent Loop BB0_12 Depth=1
+; CHECK-NEXT: .LBB0_13: ; %loop-memcpy-expansion2
+; CHECK-NEXT: ; Parent Loop BB0_11 Depth=1
; CHECK-NEXT: ; => This Inner Loop Header: Depth=2
; CHECK-NEXT: v_mov_b32_e32 v10, s10
; CHECK-NEXT: v_mov_b32_e32 v11, s11
@@ -169,23 +167,23 @@ define void @issue63986(i64 %0, i64 %idxprom) {
; CHECK-NEXT: flat_store_byte v[10:11], v20 offset:13
; CHECK-NEXT: flat_store_byte v[10:11], v27 offset:12
; CHECK-NEXT: s_andn2_b64 exec, exec, s[12:13]
-; CHECK-NEXT: s_cbranch_execnz .LBB0_14
-; CHECK-NEXT: .LBB0_15: ; %Flow20
-; CHECK-NEXT: ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT: s_cbranch_execnz .LBB0_13
+; CHECK-NEXT: .LBB0_14: ; %Flow20
+; CHECK-NEXT: ; in Loop: Header=BB0_11 Depth=1
; CHECK-NEXT: s_or_b64 exec, exec, s[8:9]
; CHECK-NEXT: s_mov_b64 s[8:9], -1
-; CHECK-NEXT: s_cbranch_execz .LBB0_11
-; CHECK-NEXT: ; %bb.16: ; %loop-memcpy-residual-header5
-; CHECK-NEXT: ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT: s_cbranch_execz .LBB0_10
+; CHECK-NEXT: ; %bb.15: ; %loop-memcpy-residual-header5
+; CHECK-NEXT: ; in Loop: Header=BB0_11 Depth=1
; CHECK-NEXT: s_and_saveexec_b64 s[8:9], s[6:7]
; CHECK-NEXT: s_xor_b64 s[10:11], exec, s[8:9]
-; CHECK-NEXT: s_cbranch_execz .LBB0_10
-; CHECK-NEXT: ; %bb.17: ; %loop-memcpy-residual4.preheader
-; CHECK-NEXT: ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT: s_cbranch_execz .LBB0_9
+; CHECK-NEXT: ; %bb.16: ; %loop-memcpy-residual4.preheader
+; CHECK-NEXT: ; in Loop: Header=BB0_11 Depth=1
; CHECK-NEXT: s_mov_b64 s[12:13], 0
; CHECK-NEXT: s_mov_b64 s[14:15], 0
-; CHECK-NEXT: .LBB0_18: ; %loop-memcpy-residual4
-; CHECK-NEXT: ; Parent Loop BB0_12 Depth=1
+; CHECK-NEXT: .LBB0_17: ; %loop-memcpy-residual4
+; CHECK-NEXT: ; Parent Loop BB0_11 Depth=1
; CHECK-NEXT: ; => This Inner Loop Header: Depth=2
; CHECK-NEXT: v_mov_b32_e32 v12, s15
; CHECK-NEXT: v_add_co_u32_e32 v10, vcc, s14, v0
@@ -200,12 +198,12 @@ define void @issue63986(i64 %0, i64 %idxprom) {
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CHECK-NEXT: flat_store_byte v[10:11], v13
; CHECK-NEXT: s_andn2_b64 exec, exec, s[12:13]
-; CHECK-NEXT: s_cbranch_execnz .LBB0_18
-; CHECK-NEXT: ; %bb.19: ; %Flow
-; CHECK-NEXT: ; in Loop: Header=BB0_12 Depth=1
+; CHECK-NEXT: s_cbranch_execnz .LBB0_17
+; CHECK-NEXT: ; %bb.18: ; %Flow
+; CHECK-NEXT: ; in Loop: Header=BB0_11 Depth=1
; CHECK-NEXT: s_or_b64 exec, exec, s[12:13]
-; CHECK-NEXT: s_branch .LBB0_10
-; CHECK-NEXT: .LBB0_20: ; %DummyReturnBlock
+; CHECK-NEXT: s_branch .LBB0_9
+; CHECK-NEXT: .LBB0_19: ; %DummyReturnBlock
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CHECK-NEXT: s_setpc_b64 s[30:31]
entry:
diff --git a/llvm/test/Transforms/InferAlignment/addrspacecast_amdgpu.ll b/llvm/test/Transforms/InferAlignment/addrspacecast_amdgpu.ll
new file mode 100644
index 000000000000000..dff71098ae23616
--- /dev/null
+++ b/llvm/test/Transforms/InferAlignment/addrspacecast_amdgpu.ll
@@ -0,0 +1,523 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -passes='infer-alignment' -mtriple=amdgcn -mcpu=gfx90a -S < %s | FileCheck -check-prefix=AMDGPU %s
+; RUN: opt -passes='infer-alignment' -mcpu=corei7 -mtriple=x86_64-linux -S < %s | FileCheck -check-prefix=X86 %s
+
+$globalArrayAS0 = comdat any
+$globalArrayAS1 = comdat any
+$globalArrayAS2 = comdat any
+$globalArrayAS3 = comdat any
+$globalArrayAS4 = comdat any
+$globalArrayAS5 = comdat any
+$globalArrayAS6 = comdat any
+$globalArrayAS7 = comdat any
+$globalArrayAS8 = comdat any
+@globalArrayAS0 = linkonce_odr hidden addrspace(0) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS1 = linkonce_odr hidden addrspace(1) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS2 = linkonce_odr hidden addrspace(2) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS3 = linkonce_odr hidden addrspace(3) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS4 = linkonce_odr hidden addrspace(4) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS5 = linkonce_odr hidden addrspace(5) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS6 = linkonce_odr hidden addrspace(6) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS7 = linkonce_odr hidden addrspace(7) global [4096 x i8] undef, comdat, align 16
+@globalArrayAS8 = linkonce_odr hidden addrspace(8) global [4096 x i8] undef, comdat, align 16
+
+; Function Attrs: alwaysinline convergent mustprogress nounwind
+define amdgpu_kernel void @infer_AS10(i32 %idx) unnamed_addr align 2 {
+; AMDGPU-LABEL: define amdgpu_kernel void @infer_AS10(
+; AMDGPU-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0:[0-9]+]] align 2 {
+; AMDGPU-NEXT: entry:
+; AMDGPU-NEXT: [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; AMDGPU-NEXT: [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; AMDGPU-NEXT: [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; AMDGPU-NEXT: [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(1) @globalArrayAS1 to ptr), i64 [[IDXPROM37_I21]]
+; AMDGPU-NEXT: [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; AMDGPU-NEXT: ret void
+;
+; X86-LABEL: define amdgpu_kernel void @infer_AS10(
+; X86-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0:[0-9]+]] align 2 {
+; X86-NEXT: entry:
+; X86-NEXT: [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; X86-NEXT: [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; X86-NEXT: [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; X86-NEXT: [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(1) @globalArrayAS1 to ptr), i64 [[IDXPROM37_I21]]
+; X86-NEXT: [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; X86-NEXT: ret void
+;
+entry:
+ %mul32.i = shl nuw nsw i32 %idx, 8
+ %add36.i = add nuw nsw i32 %mul32.i, 1024
+ %idxprom37.i21 = zext i32 %add36.i to i64
+ %arrayidx38.i22 = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(1) @globalArrayAS1 to ptr), i64 %idxprom37.i21
+ %l1 = load float, ptr %arrayidx38.i22, align 4
+ ret void
+}
+
+; Function Attrs: alwaysinline convergent mustprogress nounwind
+define amdgpu_kernel void @infer_AS20(i32 %idx) unnamed_addr align 2 {
+; AMDGPU-LABEL: define amdgpu_kernel void @infer_AS20(
+; AMDGPU-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+; AMDGPU-NEXT: entry:
+; AMDGPU-NEXT: [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; AMDGPU-NEXT: [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; AMDGPU-NEXT: [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; AMDGPU-NEXT: [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(2) @globalArrayAS2 to ptr), i64 [[IDXPROM37_I21]]
+; AMDGPU-NEXT: [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; AMDGPU-NEXT: ret void
+;
+; X86-LABEL: define amdgpu_kernel void @infer_AS20(
+; X86-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+; X86-NEXT: entry:
+; X86-NEXT: [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; X86-NEXT: [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; X86-NEXT: [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; X86-NEXT: [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(2) @globalArrayAS2 to ptr), i64 [[IDXPROM37_I21]]
+; X86-NEXT: [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; X86-NEXT: ret void
+;
+entry:
+ %mul32.i = shl nuw nsw i32 %idx, 8
+ %add36.i = add nuw nsw i32 %mul32.i, 1024
+ %idxprom37.i21 = zext i32 %add36.i to i64
+ %arrayidx38.i22 = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(2) @globalArrayAS2 to ptr), i64 %idxprom37.i21
+ %l1 = load float, ptr %arrayidx38.i22, align 4
+ ret void
+}
+
+; Function Attrs: alwaysinline convergent mustprogress nounwind
+define amdgpu_kernel void @infer_AS30(i32 %idx) unnamed_addr align 2 {
+; AMDGPU-LABEL: define amdgpu_kernel void @infer_AS30(
+; AMDGPU-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+; AMDGPU-NEXT: entry:
+; AMDGPU-NEXT: [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; AMDGPU-NEXT: [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; AMDGPU-NEXT: [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; AMDGPU-NEXT: [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(3) @globalArrayAS3 to ptr), i64 [[IDXPROM37_I21]]
+; AMDGPU-NEXT: [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; AMDGPU-NEXT: ret void
+;
+; X86-LABEL: define amdgpu_kernel void @infer_AS30(
+; X86-SAME: i32 [[IDX:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+; X86-NEXT: entry:
+; X86-NEXT: [[MUL32_I:%.*]] = shl nuw nsw i32 [[IDX]], 8
+; X86-NEXT: [[ADD36_I:%.*]] = add nuw nsw i32 [[MUL32_I]], 1024
+; X86-NEXT: [[IDXPROM37_I21:%.*]] = zext i32 [[ADD36_I]] to i64
+; X86-NEXT: [[ARRAYIDX38_I22:%.*]] = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(3) @globalArrayAS3 to ptr), i64 [[IDXPROM37_I21]]
+; X86-NEXT: [[L1:%.*]] = load float, ptr [[ARRAYIDX38_I22]], align 16
+; X86-NEXT: ret void
+;
+entry:
+ %mul32.i = shl nuw nsw i32 %idx, 8
+ %add36.i = add nuw nsw i32 %mul32.i, 1024
+ %idxprom37.i21 = zext i32 %add36.i to i64
+ %arrayidx38.i22 = getelementptr inbounds float, ptr addrspacecast (ptr addrspace(3) @globalArrayAS3 to ptr), i64 %idxprom37.i21
+ %l1 = load float, ptr %ar...
[truncated]
|
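The widening path in the ValueTracking hunk above (DstSize > SrcSize: clearLowBits, trunc, recurse, anyext, unionWith) can be modeled with a standalone sketch. This is a hedged illustration on flat 64-bit masks with hypothetical names (`KB`, `widenThroughIntegralCast`), not the LLVM `KnownBits` API: the low SrcSize bits of the result come from the source pointer's known bits, while prior knowledge of the high destination bits is re-merged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::KnownBits using flat 64-bit masks:
// Zero marks bits known to be 0, One marks bits known to be 1.
struct KB {
  uint64_t Zero = 0;
  uint64_t One = 0;
};

// Sketch of the patch's DstSize > SrcSize case: trunc + anyext leaves
// the low SrcBits intact and sourced from the operand, while the
// unionWith step re-merges anything already known about the high
// destination bits.
KB widenThroughIntegralCast(KB Src, unsigned SrcBits, KB PriorDst) {
  uint64_t Low = (SrcBits >= 64) ? ~0ull : ((1ull << SrcBits) - 1);
  KB Out;
  Out.Zero = (Src.Zero & Low) | (PriorDst.Zero & ~Low);
  Out.One  = (Src.One  & Low) | (PriorDst.One  & ~Low);
  return Out;
}
```

Under this model, a source pointer with its low four bits known zero (16-byte aligned) keeps that property after a widening integral cast, which is the behavior the InferAlignment tests below rely on.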
This also needs a LangRef patch to make the transform legal. |
Hi -- thanks for your comments. Do you mean a patch to update the addrspacecast documentation (especially w.r.t. integral pointers)? |
Right. |
Reverse ping |
Ping? |
(force-pushed d0d7543 to 5e161e6)
ping? |
Seems like there are instances where address spaces should be marked non-integral where they are not (for example, NVPTX; see
|
I don't see the connection. NVPTX generic address space is just 0, which certainly is integral. I don't believe any of the nvptx address spaces are nonintegral |
Sorry, I meant to flag the block with that condition as the entry point. Casting from the target-specific address spaces to the generic address space requires a target-specific instruction. For example, NVPTX::cvta_shared (which lowers to the cvta.shared instruction) is used to go from shared to generic. I'm not sure of the semantics of this instruction, but I would assume it is not semantically equivalent to a truncated / extended bitcast. Based on that assumption, this analysis is not correct for this address space. But it seems to me that this shared address space (as well as the others handled in the block) has a target-dependent integral representation, and is thus non-integral. |
It is correct, because it doesn't involve any change in the low bits. It's exactly the same as for AMDGPU |
Change-Id: Ic44b01c9f6a810e90b8777b7a0d7a5a4b9c4f30a
(force-pushed 5e161e6 to 1cfa944)
The only targets with data-layout defined address spaces are:
There are a few other targets with addrspace handling / test coverage.
Of these, only DirectX, XCore and SPIRV have non-noop addrspacecasts. In particular, I'm not sure about DirectX and XCore; I can only assume that integral addrspacecasts on these targets match the ValueTracking logic. I have added tests for the relevant targets. |
Change-Id: Iae07ee73f24144157614882152815adc46f81d0e
Description should mention alignment / known bits |
This change needs an RFC on discourse.
Change-Id: If64879f88b1440b17ce3f167df16586dd1dd5613
|
Provide capability to compute known bits through AddrSpaceCasts to support improved reasoning about alignments.
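Why known bits help alignments can be sketched as follows. This is a hedged illustration with a hypothetical helper (not the LLVM API): if the low N bits of a pointer are known zero, the pointer is provably 2^N-byte aligned, and an integral addrspacecast that never disturbs the low bits preserves that guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the LLVM API): derive a provable alignment
// in bytes from the known-zero mask of a pointer value. Counting
// trailing known-zero bits gives the largest power-of-two alignment
// the pointer is guaranteed to have.
uint64_t alignmentFromKnownZeros(uint64_t KnownZero) {
  unsigned TZ = 0;
  while (TZ < 63 && ((KnownZero >> TZ) & 1))
    ++TZ;
  return 1ull << TZ;
}
```

For instance, a load whose address has known-zero mask 0xF can be upgraded from `align 4` to `align 16`, matching the check-line changes in the InferAlignment tests.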