[IR] Replace alignment argument with attribute on masked intrinsics #163802
Conversation
You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions h,cpp,c -- clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1sb.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1sh.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1sw.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1ub.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1uh.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1uw.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_st1.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_st1b.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_st1h.c clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_st1w.c clang/test/CodeGen/X86/avx10_2bf16-builtins.c clang/test/CodeGen/X86/avx512bw-builtins.c clang/test/CodeGen/X86/avx512f-builtins.c clang/test/CodeGen/X86/avx512fp16-builtins.c clang/test/CodeGen/X86/avx512vl-builtins.c clang/test/CodeGen/X86/avx512vlbw-builtins.c clang/test/CodeGen/arm-mve-intrinsics/cplusplus.cpp clang/test/CodeGen/arm-mve-intrinsics/load-store.c clang/test/CodeGen/builtin-masked.c llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/IR/PatternMatch.h llvm/lib/Analysis/ConstantFolding.cpp llvm/lib/Analysis/InstructionSimplify.cpp llvm/lib/Analysis/MemoryLocation.cpp llvm/lib/CodeGen/InterleavedAccessPass.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/IR/AutoUpgrade.cpp llvm/lib/IR/IRBuilder.cpp llvm/lib/IR/Intrinsics.cpp llvm/lib/IR/Verifier.cpp llvm/lib/Target/AMDGPU/AMDGPUAsanInstrumentation.cpp llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp llvm/lib/Target/Hexagon/HexagonVectorCombine.cpp llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp llvm/lib/Target/RISCV/RISCVInterleavedAccess.cpp llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp llvm/lib/Transforms/Instrumentation/MemProfInstrumentation.cpp llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp llvm/lib/Transforms/Scalar/EarlyCSE.cpp llvm/lib/Transforms/Scalar/GVN.cpp llvm/lib/Transforms/Scalar/InferAlignment.cpp llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp llvm/unittests/IR/IRBuilderTest.cpp --diff_from_common_commit
View the diff from clang-format here.

diff --git a/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp b/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp
index 146e7d104..b5787fba6 100644
--- a/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp
+++ b/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp
@@ -138,9 +138,9 @@ static unsigned adjustForEndian(const DataLayout &DL, unsigned VectorWidth,
// br label %else2
//
// else2: ; preds = %else, %cond.load1
-// %res.phi.else3 = phi <16 x i32> [ %9, %cond.load1 ], [ %res.phi.else, %else ]
-// %10 = extractelement <16 x i1> %mask, i32 2
-// br i1 %10, label %cond.load4, label %else5
+// %res.phi.else3 = phi <16 x i32> [ %9, %cond.load1 ], [ %res.phi.else, %else
+// ] %10 = extractelement <16 x i1> %mask, i32 2 br i1 %10, label %cond.load4,
+// label %else5
//
static void scalarizeMaskedLoad(const DataLayout &DL, bool HasBranchDivergence,
CallInst *CI, DomTreeUpdater *DTU,
Force-pushed from 883cc4e to 72b3458.
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-backend-arm

Author: Nikita Popov (nikic)

Changes

The masked.load, masked.store, masked.gather and masked.scatter intrinsics currently accept a separate alignment immarg. Replace this with an align attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).

Based on top of #163790.

Patch is 2.79 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163802.diff

302 Files Affected:
diff --git a/clang/include/clang/Basic/arm_mve.td b/clang/include/clang/Basic/arm_mve.td
index 412ef9abac1bc..2e5e1d93be096 100644
--- a/clang/include/clang/Basic/arm_mve.td
+++ b/clang/include/clang/Basic/arm_mve.td
@@ -831,9 +831,8 @@ multiclass contiguous_load<string mnemonic, PrimitiveType memtype,
NameOverride<mnemonic>;
def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr,
Predicate:$pred),
- (IRIntBase<"masked_load", [Vector, CPtr<Vector>]>
- (CPtr<Vector> $addr), !srl(memtype.size,3),
- $pred, (zeroinit Vector))>,
+ (masked_load Vector, (CPtr<Vector> $addr),
+ !srl(memtype.size,3), $pred, (zeroinit Vector))>,
NameOverride<mnemonic # "_z">;
}
@@ -846,9 +845,8 @@ multiclass contiguous_load<string mnemonic, PrimitiveType memtype,
NameOverride<"vld1q">;
def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr,
Predicate:$pred),
- (IRIntBase<"masked_load", [Vector, CPtr<Vector>]>
- (CPtr<Vector> $addr), !srl(memtype.size,3),
- $pred, (zeroinit Vector))>,
+ (masked_load Vector, (CPtr<Vector> $addr),
+ !srl(memtype.size,3), $pred, (zeroinit Vector))>,
NameOverride<"vld1q_z">;
}
@@ -863,9 +861,7 @@ multiclass contiguous_load<string mnemonic, PrimitiveType memtype,
NameOverride<mnemonic>;
def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr,
Predicate:$pred),
- (extend (IRIntBase<"masked_load",
- [NarrowedVecOf<memtype,Vector>,
- CPtr<NarrowedVecOf<memtype,Vector>>]>
+ (extend (masked_load NarrowedVecOf<memtype,Vector>,
(CPtr<NarrowedVecOf<memtype,Vector>> $addr),
!srl(memtype.size,3), $pred,
(zeroinit NarrowedVecOf<memtype,Vector>)),
@@ -890,8 +886,7 @@ multiclass contiguous_store<string mnemonic, PrimitiveType memtype,
NameOverride<mnemonic>;
def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
Vector:$value, Predicate:$pred),
- (IRIntBase<"masked_store", [Vector, Ptr<Vector>]>
- $value, (Ptr<Vector> $addr),
+ (masked_store $value, (Ptr<Vector> $addr),
!srl(memtype.size,3), $pred)>,
NameOverride<mnemonic # "_p">;
}
@@ -907,8 +902,7 @@ multiclass contiguous_store<string mnemonic, PrimitiveType memtype,
NameOverride<"vst1q">;
def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
Vector:$value, Predicate:$pred),
- (IRIntBase<"masked_store", [Vector, Ptr<Vector>]>
- $value, (Ptr<Vector> $addr),
+ (masked_store $value, (Ptr<Vector> $addr),
!srl(memtype.size,3), $pred)>,
NameOverride<"vst1q_p">;
}
@@ -925,9 +919,7 @@ multiclass contiguous_store<string mnemonic, PrimitiveType memtype,
NameOverride<mnemonic>;
def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
Vector:$value, Predicate:$pred),
- (IRIntBase<"masked_store",
- [NarrowedVecOf<memtype,Vector>,
- Ptr<NarrowedVecOf<memtype,Vector>>]>
+ (masked_store
(trunc $value, NarrowedVecOf<memtype,Vector>),
(Ptr<NarrowedVecOf<memtype,Vector>> $addr),
!srl(memtype.size,3), $pred)>,
diff --git a/clang/include/clang/Basic/arm_mve_defs.td b/clang/include/clang/Basic/arm_mve_defs.td
index 083d03a396ba3..c1562a0c1f04c 100644
--- a/clang/include/clang/Basic/arm_mve_defs.td
+++ b/clang/include/clang/Basic/arm_mve_defs.td
@@ -134,6 +134,13 @@ def unzip: CGHelperFn<"VectorUnzip"> {
}
def zip: CGHelperFn<"VectorZip">;
+def masked_load: IRBuilder<"CreateMaskedLoad"> {
+ let special_params = [IRBuilderIntParam<2, "Align">];
+}
+def masked_store: IRBuilder<"CreateMaskedStore"> {
+ let special_params = [IRBuilderIntParam<2, "Align">];
+}
+
// Trivial 'codegen' function that just returns its argument. Useful
// for wrapping up a variable name like $foo into a thing you can pass
// around as type 'dag'.
diff --git a/clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1.c b/clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1.c
index 19e5243c8a625..1ed59c6c80bdc 100644
--- a/clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1.c
+++ b/clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_ld1.c
@@ -24,12 +24,12 @@
// CHECK-LABEL: @test_svld1_s8(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr [[BASE:%.*]], i32 1, <vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i8> zeroinitializer)
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr align 1 [[BASE:%.*]], <vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i8> zeroinitializer)
// CHECK-NEXT: ret <vscale x 16 x i8> [[TMP0]]
//
// CPP-CHECK-LABEL: @_Z13test_svld1_s8u10__SVBool_tPKa(
// CPP-CHECK-NEXT: entry:
-// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr [[BASE:%.*]], i32 1, <vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i8> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr align 1 [[BASE:%.*]], <vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i8> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 16 x i8> [[TMP0]]
//
svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) MODE_ATTR
@@ -40,13 +40,13 @@ svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr [[BASE:%.*]], i32 1, <vscale x 8 x i1> [[TMP0]], <vscale x 8 x i16> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr align 1 [[BASE:%.*]], <vscale x 8 x i1> [[TMP0]], <vscale x 8 x i16> zeroinitializer)
// CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_s16u10__SVBool_tPKs(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr [[BASE:%.*]], i32 1, <vscale x 8 x i1> [[TMP0]], <vscale x 8 x i16> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr align 1 [[BASE:%.*]], <vscale x 8 x i1> [[TMP0]], <vscale x 8 x i16> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
//
svint16_t test_svld1_s16(svbool_t pg, const int16_t *base) MODE_ATTR
@@ -57,13 +57,13 @@ svint16_t test_svld1_s16(svbool_t pg, const int16_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr [[BASE:%.*]], i32 1, <vscale x 4 x i1> [[TMP0]], <vscale x 4 x i32> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr align 1 [[BASE:%.*]], <vscale x 4 x i1> [[TMP0]], <vscale x 4 x i32> zeroinitializer)
// CHECK-NEXT: ret <vscale x 4 x i32> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_s32u10__SVBool_tPKi(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr [[BASE:%.*]], i32 1, <vscale x 4 x i1> [[TMP0]], <vscale x 4 x i32> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr align 1 [[BASE:%.*]], <vscale x 4 x i1> [[TMP0]], <vscale x 4 x i32> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 4 x i32> [[TMP1]]
//
svint32_t test_svld1_s32(svbool_t pg, const int32_t *base) MODE_ATTR
@@ -74,13 +74,13 @@ svint32_t test_svld1_s32(svbool_t pg, const int32_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i64> @llvm.masked.load.nxv2i64.p0(ptr [[BASE:%.*]], i32 1, <vscale x 2 x i1> [[TMP0]], <vscale x 2 x i64> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i64> @llvm.masked.load.nxv2i64.p0(ptr align 1 [[BASE:%.*]], <vscale x 2 x i1> [[TMP0]], <vscale x 2 x i64> zeroinitializer)
// CHECK-NEXT: ret <vscale x 2 x i64> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_s64u10__SVBool_tPKl(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i64> @llvm.masked.load.nxv2i64.p0(ptr [[BASE:%.*]], i32 1, <vscale x 2 x i1> [[TMP0]], <vscale x 2 x i64> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i64> @llvm.masked.load.nxv2i64.p0(ptr align 1 [[BASE:%.*]], <vscale x 2 x i1> [[TMP0]], <vscale x 2 x i64> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 2 x i64> [[TMP1]]
//
svint64_t test_svld1_s64(svbool_t pg, const int64_t *base) MODE_ATTR
@@ -90,12 +90,12 @@ svint64_t test_svld1_s64(svbool_t pg, const int64_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_u8(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr [[BASE:%.*]], i32 1, <vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i8> zeroinitializer)
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr align 1 [[BASE:%.*]], <vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i8> zeroinitializer)
// CHECK-NEXT: ret <vscale x 16 x i8> [[TMP0]]
//
// CPP-CHECK-LABEL: @_Z13test_svld1_u8u10__SVBool_tPKh(
// CPP-CHECK-NEXT: entry:
-// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr [[BASE:%.*]], i32 1, <vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i8> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr align 1 [[BASE:%.*]], <vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i8> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 16 x i8> [[TMP0]]
//
svuint8_t test_svld1_u8(svbool_t pg, const uint8_t *base) MODE_ATTR
@@ -106,13 +106,13 @@ svuint8_t test_svld1_u8(svbool_t pg, const uint8_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr [[BASE:%.*]], i32 1, <vscale x 8 x i1> [[TMP0]], <vscale x 8 x i16> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr align 1 [[BASE:%.*]], <vscale x 8 x i1> [[TMP0]], <vscale x 8 x i16> zeroinitializer)
// CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_u16u10__SVBool_tPKt(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr [[BASE:%.*]], i32 1, <vscale x 8 x i1> [[TMP0]], <vscale x 8 x i16> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr align 1 [[BASE:%.*]], <vscale x 8 x i1> [[TMP0]], <vscale x 8 x i16> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
//
svuint16_t test_svld1_u16(svbool_t pg, const uint16_t *base) MODE_ATTR
@@ -123,13 +123,13 @@ svuint16_t test_svld1_u16(svbool_t pg, const uint16_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr [[BASE:%.*]], i32 1, <vscale x 4 x i1> [[TMP0]], <vscale x 4 x i32> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr align 1 [[BASE:%.*]], <vscale x 4 x i1> [[TMP0]], <vscale x 4 x i32> zeroinitializer)
// CHECK-NEXT: ret <vscale x 4 x i32> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_u32u10__SVBool_tPKj(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr [[BASE:%.*]], i32 1, <vscale x 4 x i1> [[TMP0]], <vscale x 4 x i32> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr align 1 [[BASE:%.*]], <vscale x 4 x i1> [[TMP0]], <vscale x 4 x i32> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 4 x i32> [[TMP1]]
//
svuint32_t test_svld1_u32(svbool_t pg, const uint32_t *base) MODE_ATTR
@@ -140,13 +140,13 @@ svuint32_t test_svld1_u32(svbool_t pg, const uint32_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i64> @llvm.masked.load.nxv2i64.p0(ptr [[BASE:%.*]], i32 1, <vscale x 2 x i1> [[TMP0]], <vscale x 2 x i64> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i64> @llvm.masked.load.nxv2i64.p0(ptr align 1 [[BASE:%.*]], <vscale x 2 x i1> [[TMP0]], <vscale x 2 x i64> zeroinitializer)
// CHECK-NEXT: ret <vscale x 2 x i64> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_u64u10__SVBool_tPKm(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i64> @llvm.masked.load.nxv2i64.p0(ptr [[BASE:%.*]], i32 1, <vscale x 2 x i1> [[TMP0]], <vscale x 2 x i64> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i64> @llvm.masked.load.nxv2i64.p0(ptr align 1 [[BASE:%.*]], <vscale x 2 x i1> [[TMP0]], <vscale x 2 x i64> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 2 x i64> [[TMP1]]
//
svuint64_t test_svld1_u64(svbool_t pg, const uint64_t *base) MODE_ATTR
@@ -157,13 +157,13 @@ svuint64_t test_svld1_u64(svbool_t pg, const uint64_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x half> @llvm.masked.load.nxv8f16.p0(ptr [[BASE:%.*]], i32 1, <vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x half> @llvm.masked.load.nxv8f16.p0(ptr align 1 [[BASE:%.*]], <vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> zeroinitializer)
// CHECK-NEXT: ret <vscale x 8 x half> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_f16u10__SVBool_tPKDh(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x half> @llvm.masked.load.nxv8f16.p0(ptr [[BASE:%.*]], i32 1, <vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x half> @llvm.masked.load.nxv8f16.p0(ptr align 1 [[BASE:%.*]], <vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 8 x half> [[TMP1]]
//
svfloat16_t test_svld1_f16(svbool_t pg, const float16_t *base) MODE_ATTR
@@ -174,13 +174,13 @@ svfloat16_t test_svld1_f16(svbool_t pg, const float16_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr [[BASE:%.*]], i32 1, <vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr align 1 [[BASE:%.*]], <vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> zeroinitializer)
// CHECK-NEXT: ret <vscale x 4 x float> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_f32u10__SVBool_tPKf(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr [[BASE:%.*]], i32 1, <vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr align 1 [[BASE:%.*]], <vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 4 x float> [[TMP1]]
//
svfloat32_t test_svld1_f32(svbool_t pg, const float32_t *base) MODE_ATTR
@@ -191,13 +191,13 @@ svfloat32_t test_svld1_f32(svbool_t pg, const float32_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_f64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x double> @llvm.masked.load.nxv2f64.p0(ptr [[BASE:%.*]], i32 1, <vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> zeroinitializer)
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x double> @llvm.masked.load.nxv2f64.p0(ptr align 1 [[BASE:%.*]], <vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> zeroinitializer)
// CHECK-NEXT: ret <vscale x 2 x double> [[TMP1]]
//
// CPP-CHECK-LABEL: @_Z14test_svld1_f64u10__SVBool_tPKd(
// CPP-CHECK-NEXT: entry:
// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x double> @llvm.masked.load.nxv2f64.p0(ptr [[BASE:%.*]], i32 1, <vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> zeroinitializer)
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x double> @llvm.masked.load.nxv2f64.p0(ptr align 1 [[BASE:%.*]], <vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> zeroinitializer)
// CPP-CHECK-NEXT: ret <vscale x 2 x double> [[TMP1]]
//
svfloat64_t test_svld1_f64(svbool_t pg, const float64_t *base) MODE_ATTR
@@ -207,12 +207,12 @@ svfloat64_t test_svld1_f64(svbool_t pg, const float64_t *base) MODE_ATTR
// CHECK-LABEL: @test_svld1_mf8(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0...
[truncated]
Value *PtrOp = II->getIntrinsicID() == Intrinsic::masked_load
                   ? II->getArgOperand(0)
                   : II->getArgOperand(1);
unsigned PtrOpIdx = II->getIntrinsicID() == Intrinsic::masked_load ? 0 : 1;
Do we still need this handling if it's a property of the parameter? Figured there would be a common pass for increasing alignment on parameters.
In the default pipeline, InferAlignment is that common pass. Though it would probably make sense to extend this code to handle all memory intrinsics instead of only masked.load/store now.
In the non-default pipeline, I expect that this will enable alignment inference in Attributor.
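To illustrate the point, here is a minimal sketch (the vector type, alignment values, and function name are made up for the example) of the kind of rewrite InferAlignment can perform now that the alignment lives in an attribute:

declare <4 x i32> @llvm.masked.load.v4i32.p0(ptr, <4 x i1>, <4 x i32>)

define <4 x i32> @infer_example(<4 x i1> %mask) {
  ; The alloca guarantees a 16-byte aligned pointer.
  %p = alloca <4 x i32>, align 16
  ; Conservatively emitted with align 1; InferAlignment can raise the
  ; attribute to align 16 based on the underlying object.
  %v = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr align 1 %p, <4 x i1> %mask, <4 x i32> zeroinitializer)
  ret <4 x i32> %v
}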
Are the CI failures unrelated? Seems pretty straightforward otherwise.
The CI failure is unrelated (see #163937).
// declaration has not been auto-upgraded yet. In that case, rename the
// invalid declaration and insert a new one with the correct signature. The
// invalid declaration will get upgraded later.
F->setName(F->getName() + ".invalid");
Which test corresponds to this change? We have renamed the old intrinsic to name + ".old" in upgradeIntrinsicFunction1. Isn't it enough? Why didn't we need this in the last migration of memcpy/memset/memmove?
This happens e.g. in some X86 intrinsic upgrade tests. The flow is that in places like this call in llvm-project/llvm/lib/IR/AutoUpgrade.cpp (line 2114 in 64c511f):

return Builder.CreateMaskedStore(Data, Ptr, Alignment, Mask);

the .old rename doesn't help because it would only happen later, when we get to that declaration.

(I have a prototype lying around somewhere that does not use named functions for intrinsics, and represents them via ID and signature only. That would avoid this kind of problem by design.)
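For readers following along, a rough sketch of the scenario being described (the concrete vector type and alignment are only illustrative):

; A module that still uses the old masked.store signature carries a
; declaration with the separate i32 alignment argument:
declare void @llvm.masked.store.v16i8.p0(<16 x i8>, ptr, i32, <16 x i1>)

; While upgrading an unrelated intrinsic (e.g. an x86 masked-store builtin),
; CreateMaskedStore emits a call with the new signature:
;   call void @llvm.masked.store.v16i8.p0(<16 x i8> %v, ptr align 1 %p, <16 x i1> %m)
; At that point the stale declaration above has not been auto-upgraded yet, so
; it is renamed to @llvm.masked.store.v16i8.p0.invalid and a fresh declaration
; with the correct signature is inserted; the renamed declaration is upgraded
; later as usual.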
LG
LGTM, thanks! Great to see this simplification/unification.
The masked.load, masked.store, masked.gather and masked.scatter intrinsics currently accept a separate alignment immarg. Replace this with an align attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).
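For reference, a minimal before/after sketch of the textual IR (mirroring the acle_sve_ld1.c checks above; the masked.gather line is an illustrative example of spelling out what used to be the zero-alignment special case):

; Before: alignment passed as a separate i32 immarg
%v = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr %base, i32 1, <vscale x 16 x i1> %pg, <vscale x 16 x i8> zeroinitializer)

; After: alignment expressed as an align attribute on the pointer argument
%v = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr align 1 %base, <vscale x 16 x i1> %pg, <vscale x 16 x i8> zeroinitializer)

; masked.gather/scatter: "i32 0" used to mean ABI alignment of the element
; type; that now has to be written explicitly on the vector-of-pointers
; argument (here align 4 for i32 elements), otherwise align 1 is implied.
%g = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> align 4 %ptrs, <4 x i1> %m, <4 x i32> %passthru)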