[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. AliveToolkit/alive2#860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.
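
As a rough source-level analogy for the point above, the sketch below copies a pointer-holding struct once byte-wise and once through a same-sized integer. `Holder`, `copyBytewise`, and `copyViaInteger` are hypothetical names, not LLVM code. It only shows that the bit pattern survives the round trip; the provenance concern raised for the IR-level transform cannot be observed from C++ like this.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical struct whose bytes are "pointer bytes".
struct Holder {
  int *p;
};

// Copy the struct the way a byte-wise memcpy would.
inline Holder copyBytewise(const Holder &src) {
  Holder dst;
  std::memcpy(&dst, &src, sizeof(Holder));
  return dst;
}

// Copy the struct through an integer of the same size, loosely mirroring
// an integer-typed load followed by an integer-typed store.
inline Holder copyViaInteger(const Holder &src) {
  static_assert(sizeof(Holder) == sizeof(std::uintptr_t),
                "sketch assumes a pointer-sized struct");
  std::uintptr_t bits;
  std::memcpy(&bits, &src, sizeof(bits)); // "load" the bytes as an integer
  Holder dst;
  std::memcpy(&dst, &bits, sizeof(bits)); // "store" the integer back
  return dst;
}
```

Both helpers return a struct whose pointer compares equal to the original, which is the sense in which the integer-typed expansion already behaves like a byte copy in practice.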

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.
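
The widening concern can be illustrated with a toy model, assuming an element is poison as soon as any of its bytes is poison. `PoisonMask`, `poisonedElements`, and `poisonedBytesAfterLoad` are hypothetical helpers written for this note, not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not LLVM code: one flag per byte in memory,
// true if that byte is poison.
using PoisonMask = std::vector<bool>;

// Count the elements that come out poisoned when the bytes are loaded as a
// vector whose elements are `elemBytes` wide. An element is treated as
// poison if any of its bytes is poison.
inline std::size_t poisonedElements(const PoisonMask &bytes,
                                    std::size_t elemBytes) {
  assert(elemBytes != 0 && bytes.size() % elemBytes == 0);
  std::size_t count = 0;
  for (std::size_t i = 0; i < bytes.size(); i += elemBytes) {
    bool anyPoison = false;
    for (std::size_t j = 0; j < elemBytes; ++j)
      anyPoison = anyPoison || bytes[i + j];
    if (anyPoison)
      ++count;
  }
  return count;
}

// Number of bytes covered by poisoned elements after such a load.
inline std::size_t poisonedBytesAfterLoad(const PoisonMask &bytes,
                                          std::size_t elemBytes) {
  return poisonedElements(bytes, elemBytes) * elemBytes;
}
```

In this model, a 16-byte region with a single poison byte loses 1 byte when loaded with 1-byte elements but 8 bytes when loaded with 8-byte elements, which is why widening elements is not a refinement.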

Fixes #59116.
LebedevRI committed Nov 22, 2022
1 parent 11c2c16 commit cf624b2
Showing 17 changed files with 216 additions and 268 deletions.
18 changes: 11 additions & 7 deletions clang/test/CodeGenOpenCL/amdgpu-nullptr.cl
@@ -515,13 +515,17 @@ typedef struct {
private char *p;
} StructTy3;

// CHECK-LABEL: test_memset_private
// CHECK: call void @llvm.memset.p5i8.i64(i8 addrspace(5)* noundef align 8 {{.*}}, i8 0, i64 32, i1 false)
// CHECK: [[GEP:%.*]] = getelementptr inbounds %struct.StructTy3, %struct.StructTy3 addrspace(5)* %ptr, i32 0, i32 4
// CHECK: store i8 addrspace(5)* addrspacecast (i8* null to i8 addrspace(5)*), i8 addrspace(5)* addrspace(5)* [[GEP]]
// CHECK: [[GEP1:%.*]] = getelementptr inbounds i8, i8 addrspace(5)* {{.*}}, i32 36
// CHECK: [[GEP1_CAST:%.*]] = bitcast i8 addrspace(5)* [[GEP1]] to i32 addrspace(5)*
// CHECK: store i32 0, i32 addrspace(5)* [[GEP1_CAST]], align 4
// CHECK-LABEL: @test_memset_private(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = bitcast [[STRUCT_STRUCTTY3:%.*]] addrspace(5)* [[PTR:%.*]] to i8 addrspace(5)*
// CHECK-NEXT: [[S3_SROA_0_SROA_0_0_S3_SROA_0_0__SROA_CAST2_SROA_CAST:%.*]] = bitcast [[STRUCT_STRUCTTY3]] addrspace(5)* [[PTR]] to <32 x i8> addrspace(5)*
// CHECK-NEXT: store <32 x i8> zeroinitializer, <32 x i8> addrspace(5)* [[S3_SROA_0_SROA_0_0_S3_SROA_0_0__SROA_CAST2_SROA_CAST]], align 8, !tbaa.struct !9
// CHECK-NEXT: [[S3_SROA_4_0__SROA_IDX6:%.*]] = getelementptr inbounds [[STRUCT_STRUCTTY3]], [[STRUCT_STRUCTTY3]] addrspace(5)* [[PTR]], i32 0, i32 4
// CHECK-NEXT: store i8 addrspace(5)* addrspacecast (i8* null to i8 addrspace(5)*), i8 addrspace(5)* addrspace(5)* [[S3_SROA_4_0__SROA_IDX6]], align 8, !tbaa.struct !12
// CHECK-NEXT: [[S3_SROA_5_0__SROA_IDX:%.*]] = getelementptr inbounds i8, i8 addrspace(5)* [[TMP0]], i32 36
// CHECK-NEXT: [[S3_SROA_5_0__SROA_CAST8:%.*]] = bitcast i8 addrspace(5)* [[S3_SROA_5_0__SROA_IDX]] to i32 addrspace(5)*
// CHECK-NEXT: store i32 0, i32 addrspace(5)* [[S3_SROA_5_0__SROA_CAST8]], align 4, !tbaa.struct !13
// CHECK-NEXT: ret void
void test_memset_private(private StructTy3 *ptr) {
StructTy3 S3 = {0, 0, 0, 0, 0};
*ptr = S3;
15 changes: 11 additions & 4 deletions llvm/lib/Transforms/Scalar/SROA.cpp
@@ -1806,8 +1806,10 @@ static bool isVectorPromotionViableForSlice(Partition &P, const Slice &S,
           ? Ty->getElementType()
           : FixedVectorType::get(Ty->getElementType(), NumElements);

-  Type *SplitIntTy =
-      Type::getIntNTy(Ty->getContext(), NumElements * ElementSize * 8);
+  Type *SplitIntTy = nullptr;
+  if (uint64_t Bitwidth = NumElements * ElementSize * 8;
+      Bitwidth <= IntegerType::MAX_INT_BITS)
+    SplitIntTy = Type::getIntNTy(Ty->getContext(), Bitwidth);

   Use *U = S.getUse();

@@ -1826,7 +1828,8 @@ static bool isVectorPromotionViableForSlice(Partition &P, const Slice &S,
     // Disable vector promotion when there are loads or stores of an FCA.
     if (LTy->isStructTy())
       return false;
-    if (P.beginOffset() > S.beginOffset() || P.endOffset() < S.endOffset()) {
+    if (SplitIntTy &&
+        (P.beginOffset() > S.beginOffset() || P.endOffset() < S.endOffset())) {
       assert(LTy->isIntegerTy());
       LTy = SplitIntTy;
     }
@@ -1839,7 +1842,8 @@ static bool isVectorPromotionViableForSlice(Partition &P, const Slice &S,
     // Disable vector promotion when there are loads or stores of an FCA.
     if (STy->isStructTy())
       return false;
-    if (P.beginOffset() > S.beginOffset() || P.endOffset() < S.endOffset()) {
+    if (SplitIntTy &&
+        (P.beginOffset() > S.beginOffset() || P.endOffset() < S.endOffset())) {
       assert(STy->isIntegerTy());
       STy = SplitIntTy;
     }
@@ -1934,6 +1938,9 @@ static VectorType *isVectorPromotionViable(Partition &P, const DataLayout &DL) {
       CheckCandidateType(LI->getType());
     else if (auto *SI = dyn_cast<StoreInst>(S.getUse()->getUser()))
       CheckCandidateType(SI->getValueOperand()->getType());
+    else if (auto *MTI = dyn_cast<MemIntrinsic>(S.getUse()->getUser()))
+      CheckCandidateType(FixedVectorType::get(
+          IntegerType::getInt8Ty(MTI->getContext()), P.size()));
   }

   // If we didn't find a vector type, nothing to do here.
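
The two changes to SROA.cpp can be modeled outside LLVM roughly as follows. This is a sketch under stated assumptions: `kMaxIntBits` is a stand-in for `IntegerType::MAX_INT_BITS` (whose real value depends on the LLVM version), and `splitIntBitWidth`, `VectorShape`, and `byteVectorCandidate` are hypothetical names. The first helper mirrors the new guard that only forms the split integer type when its width is representable; the second mirrors the new candidate type for memory intrinsics, a vector of `i8` with one element per byte of the partition.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed stand-in for IntegerType::MAX_INT_BITS; not taken from LLVM headers.
constexpr std::uint64_t kMaxIntBits = std::uint64_t{1} << 23;

// Mirrors the guard: return the width of the split integer type in bits,
// or std::nullopt when that width would exceed the maximum.
// Overflow of the multiplication is ignored in this sketch.
inline std::optional<std::uint64_t>
splitIntBitWidth(std::uint64_t numElements, std::uint64_t elementSizeBytes) {
  std::uint64_t bits = numElements * elementSizeBytes * 8;
  if (bits <= kMaxIntBits)
    return bits;
  return std::nullopt;
}

// Minimal description of a fixed vector type: element width and count.
struct VectorShape {
  unsigned elementBits;
  std::uint64_t numElements;
};

// Mirrors the candidate registered for a memory intrinsic:
// <partitionSizeBytes x i8>.
inline VectorShape byteVectorCandidate(std::uint64_t partitionSizeBytes) {
  return VectorShape{8, partitionSizeBytes};
}
```

With these definitions a 16-byte partition yields the shape `<16 x i8>` and a 128-bit split integer, while a sufficiently large partition yields no split integer at all, which corresponds to `SplitIntTy` staying `nullptr` in the patch.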
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/AMDGPU/v1024.ll
@@ -4,7 +4,7 @@

 ; GCN-LABEL: {{^}}test_v1024:
 ; GCN-NOT: v_accvgpr
-; GCN-COUNT-32: v_mov_b32_e32
+; GCN-COUNT-10: v_mov_b32_e32
 ; GCN-NOT: v_accvgpr
 define amdgpu_kernel void @test_v1024() {
 entry:
6 changes: 2 additions & 4 deletions llvm/test/DebugInfo/X86/sroasplit-1.ll
@@ -20,10 +20,8 @@
;

; Verify that SROA creates a variable piece when splitting i1.
; CHECK: %[[I1:.*]] = alloca [12 x i8], align 4
; CHECK: call void @llvm.dbg.declare(metadata [12 x i8]* %[[I1]], metadata ![[VAR:[0-9]+]], metadata !DIExpression(DW_OP_LLVM_fragment, 32, 96))
; CHECK: call void @llvm.dbg.value(metadata i32 %[[A:.*]], metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32))
; CHECK: ret i32 %[[A]]
; CHECK: %[[I1:.*]] = load <12 x i8>,
; CHECK: call void @llvm.dbg.value(metadata <12 x i8> %[[I1]], metadata ![[VAR:.*]], metadata !DIExpression(DW_OP_LLVM_fragment, 32, 96))
; Read Var and Piece:
; CHECK: ![[VAR]] = !DILocalVariable(name: "i1",{{.*}} line: 11,

20 changes: 10 additions & 10 deletions llvm/test/DebugInfo/X86/sroasplit-4.ll
@@ -1,28 +1,28 @@
; RUN: opt -sroa < %s -S -o - | FileCheck %s
;
; Test that recursively splitting an alloca updates the debug info correctly.
; CHECK: %[[T:.*]] = load i64, i64* @t, align 8
; CHECK: call void @llvm.dbg.value(metadata i64 %[[T]], metadata ![[Y:.*]], metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64))
; CHECK: %[[T1:.*]] = load i64, i64* @t, align 8
; CHECK: call void @llvm.dbg.value(metadata i64 %[[T1]], metadata ![[Y]], metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64))
; CHECK: call void @llvm.dbg.value(metadata i64 %[[T]], metadata ![[R:.*]], metadata !DIExpression(DW_OP_LLVM_fragment, 192, 64))
; CHECK: call void @llvm.dbg.value(metadata i64 %[[T1]], metadata ![[R]], metadata !DIExpression(DW_OP_LLVM_fragment, 256, 64))
;
; CHECK: call void @llvm.dbg.value(metadata <16 x i8> %[[Y_VEC:.*]], metadata ![[Y:.*]], metadata !DIExpression())
; CHECK: call void @llvm.dbg.value(metadata <16 x i8> %[[Y_VEC1:.*]], metadata ![[Y]], metadata !DIExpression())
; CHECK: call void @llvm.dbg.value(metadata i32 0, metadata ![[R:.*]], metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32))
; CHECK: call void @llvm.dbg.value(metadata i64 0, metadata ![[R]], metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64))
; CHECK: call void @llvm.dbg.value(metadata i64 0, metadata ![[R]], metadata !DIExpression(DW_OP_LLVM_fragment, 128, 64))
; CHECK: call void @llvm.dbg.value(metadata <16 x i8> %[[Y_VEC1]], metadata ![[R]], metadata !DIExpression(DW_OP_LLVM_fragment, 192, 128))
;
; struct p {
; __SIZE_TYPE__ s;
; __SIZE_TYPE__ t;
; };
;
;
; struct r {
; int i;
; struct p x;
; struct p y;
; };
;
;
; extern int call_me(struct r);
; extern int maybe();
; extern __SIZE_TYPE__ t;
;
;
; int test() {
; if (maybe())
; return 0;
20 changes: 11 additions & 9 deletions llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll
@@ -68,12 +68,13 @@ define dso_local i32* @_Z3foo1S(%0* byval(%0) align 8 %arg) {
; CHECK-LABEL: @_Z3foo1S(
; CHECK-NEXT: bb:
; CHECK-NEXT: [[I2:%.*]] = alloca [[TMP0:%.*]], align 8
; CHECK-NEXT: [[I1_SROA_0_0_I5_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0]], %0* [[ARG:%.*]], i64 0, i32 0
; CHECK-NEXT: [[I1_SROA_0_0_COPYLOAD:%.*]] = load i32*, i32** [[I1_SROA_0_0_I5_SROA_IDX]], align 8
; CHECK-NEXT: [[TMP0]] = bitcast %0* [[ARG:%.*]] to i64*
; CHECK-NEXT: [[I11_SROA_0_0_VEC_EXTRACT_EXTRACT:%.*]] = load i64, i64* [[TMP0]], align 8
; CHECK-NEXT: [[TMP1:%.*]] = inttoptr i64 [[I11_SROA_0_0_VEC_EXTRACT_EXTRACT]] to i32*
; CHECK-NEXT: [[I_SROA_0_0_I6_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0]], %0* [[I2]], i64 0, i32 0
; CHECK-NEXT: store i32* [[I1_SROA_0_0_COPYLOAD]], i32** [[I_SROA_0_0_I6_SROA_IDX]], align 8
; CHECK-NEXT: store i32* [[TMP1]], i32** [[I_SROA_0_0_I6_SROA_IDX]], align 8
; CHECK-NEXT: tail call void @_Z7escape01S(%0* nonnull byval([[TMP0]]) align 8 [[I2]])
; CHECK-NEXT: ret i32* [[I1_SROA_0_0_COPYLOAD]]
; CHECK-NEXT: ret i32* [[TMP1]]
;
bb:
%i = alloca %0, align 8
@@ -107,21 +108,22 @@ declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture)
define dso_local i32* @_Z3bar1S(%0* byval(%0) align 8 %arg) {
; CHECK-LABEL: @_Z3bar1S(
; CHECK-NEXT: bb:
; CHECK-NEXT: [[I1_SROA_0_0_I4_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* [[ARG:%.*]], i64 0, i32 0
; CHECK-NEXT: [[I1_SROA_0_0_COPYLOAD:%.*]] = load i32*, i32** [[I1_SROA_0_0_I4_SROA_IDX]], align 8
; CHECK-NEXT: [[TMP0:%.*]] = bitcast %0* [[ARG:%.*]] to i64*
; CHECK-NEXT: [[I13_SROA_0_0_VEC_EXTRACT_EXTRACT:%.*]] = load i64, i64* [[TMP0]], align 8
; CHECK-NEXT: [[TMP1:%.*]] = inttoptr i64 [[I13_SROA_0_0_VEC_EXTRACT_EXTRACT]] to i32*
; CHECK-NEXT: [[I5:%.*]] = tail call i32 @_Z4condv()
; CHECK-NEXT: [[I6_NOT:%.*]] = icmp eq i32 [[I5]], 0
; CHECK-NEXT: br i1 [[I6_NOT]], label [[BB10:%.*]], label [[BB7:%.*]]
; CHECK: bb7:
; CHECK-NEXT: tail call void @_Z5sync0v()
; CHECK-NEXT: tail call void @_Z7escape0Pi(i32* [[I1_SROA_0_0_COPYLOAD]])
; CHECK-NEXT: tail call void @_Z7escape0Pi(i32* [[TMP1]])
; CHECK-NEXT: br label [[BB13:%.*]]
; CHECK: bb10:
; CHECK-NEXT: tail call void @_Z5sync1v()
; CHECK-NEXT: tail call void @_Z7escape1Pi(i32* [[I1_SROA_0_0_COPYLOAD]])
; CHECK-NEXT: tail call void @_Z7escape1Pi(i32* [[TMP1]])
; CHECK-NEXT: br label [[BB13]]
; CHECK: bb13:
; CHECK-NEXT: ret i32* [[I1_SROA_0_0_COPYLOAD]]
; CHECK-NEXT: ret i32* [[TMP1]]
;
bb:
%i = alloca %0, align 8
12 changes: 6 additions & 6 deletions llvm/test/Transforms/SROA/address-spaces.ll
@@ -11,8 +11,8 @@ declare void @llvm.memcpy.p1.p1.i32(ptr addrspace(1) nocapture, ptr addrspace(1)
; Make sure an illegal bitcast isn't introduced
define void @test_address_space_1_1(ptr addrspace(1) %a, ptr addrspace(1) %b) {
; CHECK-LABEL: @test_address_space_1_1(
; CHECK-NEXT: [[AA_0_COPYLOAD:%.*]] = load <2 x i64>, ptr addrspace(1) [[A:%.*]], align 2
; CHECK-NEXT: store <2 x i64> [[AA_0_COPYLOAD]], ptr addrspace(1) [[B:%.*]], align 2
; CHECK-NEXT: [[AA_SROA_0_0_COPYLOAD:%.*]] = load <16 x i8>, ptr addrspace(1) [[A:%.*]], align 2
; CHECK-NEXT: store <16 x i8> [[AA_SROA_0_0_COPYLOAD]], ptr addrspace(1) [[B:%.*]], align 2
; CHECK-NEXT: ret void
;
%aa = alloca <2 x i64>, align 16
@@ -23,8 +23,8 @@ define void @test_address_space_1_1(ptr addrspace(1) %a, ptr addrspace(1) %b) {

define void @test_address_space_1_0(ptr addrspace(1) %a, ptr %b) {
; CHECK-LABEL: @test_address_space_1_0(
; CHECK-NEXT: [[AA_0_COPYLOAD:%.*]] = load <2 x i64>, ptr addrspace(1) [[A:%.*]], align 2
; CHECK-NEXT: store <2 x i64> [[AA_0_COPYLOAD]], ptr [[B:%.*]], align 2
; CHECK-NEXT: [[AA_SROA_0_0_COPYLOAD:%.*]] = load <16 x i8>, ptr addrspace(1) [[A:%.*]], align 2
; CHECK-NEXT: store <16 x i8> [[AA_SROA_0_0_COPYLOAD]], ptr [[B:%.*]], align 2
; CHECK-NEXT: ret void
;
%aa = alloca <2 x i64>, align 16
@@ -35,8 +35,8 @@ define void @test_address_space_1_0(ptr addrspace(1) %a, ptr %b) {

define void @test_address_space_0_1(ptr %a, ptr addrspace(1) %b) {
; CHECK-LABEL: @test_address_space_0_1(
; CHECK-NEXT: [[AA_0_COPYLOAD:%.*]] = load <2 x i64>, ptr [[A:%.*]], align 2
; CHECK-NEXT: store <2 x i64> [[AA_0_COPYLOAD]], ptr addrspace(1) [[B:%.*]], align 2
; CHECK-NEXT: [[AA_SROA_0_0_COPYLOAD:%.*]] = load <16 x i8>, ptr [[A:%.*]], align 2
; CHECK-NEXT: store <16 x i8> [[AA_SROA_0_0_COPYLOAD]], ptr addrspace(1) [[B:%.*]], align 2
; CHECK-NEXT: ret void
;
%aa = alloca <2 x i64>, align 16
24 changes: 10 additions & 14 deletions llvm/test/Transforms/SROA/alignment.ll
@@ -92,15 +92,15 @@ define void @PR13920(ptr %a, ptr %b) {
; Test that alignments on memcpy intrinsics get propagated to loads and stores.
; CHECK-LABEL: @PR13920(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[AA_0_COPYLOAD:%.*]] = load <2 x i64>, ptr [[A:%.*]], align 2
; CHECK-NEXT: store <2 x i64> [[AA_0_COPYLOAD]], ptr [[B:%.*]], align 2
; CHECK-NEXT: [[AA_SROA_0_0_COPYLOAD:%.*]] = load <16 x i8>, ptr [[A:%.*]], align 2
; CHECK-NEXT: store <16 x i8> [[AA_SROA_0_0_COPYLOAD]], ptr [[B:%.*]], align 2
; CHECK-NEXT: ret void
;
; DEBUGLOC-LABEL: @PR13920(
; DEBUGLOC-NEXT: entry:
; DEBUGLOC-NEXT: call void @llvm.dbg.value(metadata ptr undef, metadata [[META37:![0-9]+]], metadata !DIExpression()), !dbg [[DBG38:![0-9]+]]
; DEBUGLOC-NEXT: [[AA_0_COPYLOAD:%.*]] = load <2 x i64>, ptr [[A:%.*]], align 2, !dbg [[DBG39:![0-9]+]]
; DEBUGLOC-NEXT: store <2 x i64> [[AA_0_COPYLOAD]], ptr [[B:%.*]], align 2, !dbg [[DBG40:![0-9]+]]
; DEBUGLOC-NEXT: [[AA_SROA_0_0_COPYLOAD:%.*]] = load <16 x i8>, ptr [[A:%.*]], align 2, !dbg [[DBG39:![0-9]+]]
; DEBUGLOC-NEXT: store <16 x i8> [[AA_SROA_0_0_COPYLOAD]], ptr [[B:%.*]], align 2, !dbg [[DBG40:![0-9]+]]
; DEBUGLOC-NEXT: ret void, !dbg [[DBG41:![0-9]+]]
;

@@ -118,21 +118,17 @@ define void @test3(ptr %x) {
; reduce the alignment.
; CHECK-LABEL: @test3(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[A_SROA_0:%.*]] = alloca [22 x i8], align 8
; CHECK-NEXT: [[B_SROA_0:%.*]] = alloca [18 x i8], align 2
; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 8 [[A_SROA_0]], ptr align 8 [[X:%.*]], i32 22, i1 false)
; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 2 [[B_SROA_0]], ptr align 2 [[X]], i32 18, i1 false)
; CHECK-NEXT: [[A_SROA_0_0_COPYLOAD:%.*]] = load <22 x i8>, ptr [[X:%.*]], align 8
; CHECK-NEXT: [[B_SROA_0_6_COPYLOAD:%.*]] = load <18 x i8>, ptr [[X]], align 2
; CHECK-NEXT: ret void
;
; DEBUGLOC-LABEL: @test3(
; DEBUGLOC-NEXT: entry:
; DEBUGLOC-NEXT: [[A_SROA_0:%.*]] = alloca [22 x i8], align 8, !dbg [[DBG47:![0-9]+]]
; DEBUGLOC-NEXT: call void @llvm.dbg.value(metadata ptr undef, metadata [[META44:![0-9]+]], metadata !DIExpression()), !dbg [[DBG47]]
; DEBUGLOC-NEXT: [[B_SROA_0:%.*]] = alloca [18 x i8], align 2, !dbg [[DBG48:![0-9]+]]
; DEBUGLOC-NEXT: call void @llvm.dbg.value(metadata ptr undef, metadata [[META45:![0-9]+]], metadata !DIExpression()), !dbg [[DBG48]]
; DEBUGLOC-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 8 [[A_SROA_0]], ptr align 8 [[X:%.*]], i32 22, i1 false), !dbg [[DBG49:![0-9]+]]
; DEBUGLOC-NEXT: call void @llvm.dbg.value(metadata ptr undef, metadata [[META44:![0-9]+]], metadata !DIExpression()), !dbg [[DBG47:![0-9]+]]
; DEBUGLOC-NEXT: call void @llvm.dbg.value(metadata ptr undef, metadata [[META45:![0-9]+]], metadata !DIExpression()), !dbg [[DBG48:![0-9]+]]
; DEBUGLOC-NEXT: [[A_SROA_0_0_COPYLOAD:%.*]] = load <22 x i8>, ptr [[X:%.*]], align 8, !dbg [[DBG49:![0-9]+]]
; DEBUGLOC-NEXT: call void @llvm.dbg.value(metadata ptr undef, metadata [[META46:![0-9]+]], metadata !DIExpression()), !dbg [[DBG50:![0-9]+]]
; DEBUGLOC-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 2 [[B_SROA_0]], ptr align 2 [[X]], i32 18, i1 false), !dbg [[DBG51:![0-9]+]]
; DEBUGLOC-NEXT: [[B_SROA_0_6_COPYLOAD:%.*]] = load <18 x i8>, ptr [[X]], align 2, !dbg [[DBG51:![0-9]+]]
; DEBUGLOC-NEXT: ret void, !dbg [[DBG52:![0-9]+]]
;

12 changes: 6 additions & 6 deletions llvm/test/Transforms/SROA/alloca-address-space.ll
@@ -10,8 +10,8 @@ declare void @llvm.memcpy.p1.p1.i32(ptr addrspace(1) nocapture, ptr addrspace(1)

define void @test_address_space_1_1(ptr addrspace(1) %a, ptr addrspace(1) %b) {
; CHECK-LABEL: @test_address_space_1_1(
; CHECK-NEXT: [[AA_0_COPYLOAD:%.*]] = load <2 x i64>, ptr addrspace(1) [[A:%.*]], align 2
; CHECK-NEXT: store <2 x i64> [[AA_0_COPYLOAD]], ptr addrspace(1) [[B:%.*]], align 2
; CHECK-NEXT: [[AA_SROA_0_0_COPYLOAD:%.*]] = load <16 x i8>, ptr addrspace(1) [[A:%.*]], align 2
; CHECK-NEXT: store <16 x i8> [[AA_SROA_0_0_COPYLOAD]], ptr addrspace(1) [[B:%.*]], align 2
; CHECK-NEXT: ret void
;
%aa = alloca <2 x i64>, align 16, addrspace(2)
@@ -22,8 +22,8 @@ define void @test_address_space_1_1(ptr addrspace(1) %a, ptr addrspace(1) %b) {

define void @test_address_space_1_0(ptr addrspace(1) %a, ptr addrspace(2) %b) {
; CHECK-LABEL: @test_address_space_1_0(
; CHECK-NEXT: [[AA_0_COPYLOAD:%.*]] = load <2 x i64>, ptr addrspace(1) [[A:%.*]], align 2
; CHECK-NEXT: store <2 x i64> [[AA_0_COPYLOAD]], ptr addrspace(2) [[B:%.*]], align 2
; CHECK-NEXT: [[AA_SROA_0_0_COPYLOAD:%.*]] = load <16 x i8>, ptr addrspace(1) [[A:%.*]], align 2
; CHECK-NEXT: store <16 x i8> [[AA_SROA_0_0_COPYLOAD]], ptr addrspace(2) [[B:%.*]], align 2
; CHECK-NEXT: ret void
;
%aa = alloca <2 x i64>, align 16, addrspace(2)
@@ -34,8 +34,8 @@ define void @test_address_space_1_0(ptr addrspace(1) %a, ptr addrspace(2) %b) {

define void @test_address_space_0_1(ptr addrspace(2) %a, ptr addrspace(1) %b) {
; CHECK-LABEL: @test_address_space_0_1(
; CHECK-NEXT: [[AA_0_COPYLOAD:%.*]] = load <2 x i64>, ptr addrspace(2) [[A:%.*]], align 2
; CHECK-NEXT: store <2 x i64> [[AA_0_COPYLOAD]], ptr addrspace(1) [[B:%.*]], align 2
; CHECK-NEXT: [[AA_SROA_0_0_COPYLOAD:%.*]] = load <16 x i8>, ptr addrspace(2) [[A:%.*]], align 2
; CHECK-NEXT: store <16 x i8> [[AA_SROA_0_0_COPYLOAD]], ptr addrspace(1) [[B:%.*]], align 2
; CHECK-NEXT: ret void
;
%aa = alloca <2 x i64>, align 16, addrspace(2)

7 comments on commit cf624b2

@mstorsjo
Member

This commit caused two regressions for me. One case where compilation triggers failed asserts, one case where compilation time ballooned from <2 seconds to >100 seconds.

The failed assert is reproducible with a reduced testcase in https://martin.st/temp/sroa-assert.c like this:

$ clang -target x86_64-linux-gnu -w -c sroa-assert.c -O2
clang: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:11987: void llvm::SelectionDAG::createOperands(llvm::SDNode*, llvm::ArrayRef<llvm::SDValue>): Assertion `SDNode::getMaxNumOperands() >= Vals.size() && "too many operands to fit into SDNode"' failed.

The full original unreduced testcase is available in https://martin.st/temp/placebo-preproc.c - reproducible in the same way, but this case requires building with -target aarch64-w64-mingw32.

The compilation time regression can be reproduced with https://martin.st/temp/h264_slice-preproc.c, with clang -target aarch64-linux-gnu -w -c h264_slice-preproc.c -O2.

@d0k
Member

Commented on cf624b2 Nov 23, 2022

Reverted this for now in 5cfc22c. I'm still seeing crashes after 655d857.

assert.h assertion failed at llvm/include/llvm/Support/Casting.h:578 in decltype(auto) llvm::cast(From *) [To = llvm::BuildVectorSDNode, From = llvm::SDNode]: isa<To>(Val) && "cast<Ty>() argument of incompatible type!"
*** Check failure stack trace: ***
    @     0x55d43177f9a4  __assert_fail
    @     0x55d42eb087a2  llvm::AArch64TargetLowering::LowerBUILD_VECTOR()
    @     0x55d42eb00341  llvm::AArch64TargetLowering::LowerOperation()
    @     0x55d430192825  (anonymous namespace)::SelectionDAGLegalize::LegalizeOp()

Let me know if you need another test case.

@LebedevRI
Member Author

Thanks for the revert.
It sounds like @d0k's assertion is different, so it may be good to have a repro for it.

@jyknight
Member

The commit also broke PPC64 2-stage builders (confirmed as fixed by the revert), with an apparent miscompile of llvm-tblgen causing them to produce invalid source-code.
https://lab.llvm.org/buildbot/#/builders/121/builds/25465
https://lab.llvm.org/buildbot/#/builders/36/builds/27549

@DavidSpickett
Collaborator

We also had issues on one of our 32 bit bots https://lab.llvm.org/buildbot/#/builders/182/builds/4502. Instead of asserts we just got timeouts, maybe if I'd left it it would have eventually asserted. Was fixed by the revert.

For whatever reason it got stuck on lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIInsertWaitcnts.cpp. If this turns out to be a different issue I can get you a reproducer for it.

@d0k
Member

Commented on cf624b2 Nov 23, 2022

> Let me know if you need another test case.

It took creduce 5h to produce it, but here's another reproducer: https://gist.github.com/d0k/2bf4e6a4e5642d053c5d84fc66a33c5b

@LebedevRI
Member Author

Ok, PPC miscompile is still there, re-reverted. Help wanted.
