Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN #110714

Merged
merged 3 commits into from
Oct 2, 2024

Conversation

krzysz00
Copy link
Contributor

@krzysz00 krzysz00 commented Oct 1, 2024

Certain pointer address spaces were not being correctly handled by the GlobalISel lowering for buffer_load and buffer_store.

  1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns defined for them, while p0 did, since those pointer types weren't in the list of types that was iterated to form the patterns.
  2. Vectors of pointers need to be bitcast to vectors of the corresponding scalars, since there doesn't seem to be a good way to define the rewrite patterns for buffer_load/store of those types

The need to bitcast vectors of pointers was also revealed to affect ordinary G_LOAD and G_STORE in some cases, so shouldBitcastLoadStore() has been fixed to handle it properly.

Certain pointer address spaces were not being correctly handled by the
GlobalISel lowering for buffer_load and buffer_store.

1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns
defined for them, while p0 did, since those pointer types weren't in
the list of types that was iterated to form the patterns.
2. Vectors of pointers need to be bitcast to vectors of the
corresponding scalars, since there doesn't seem to be a good way to
define the rewrite patterns for buffer_load/store of those types
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 1, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Krzysztof Drewniak (krzysz00)

Changes

Certain pointer address spaces were not being correctly handled by the GlobalISel lowering for buffer_load and buffer_store.

  1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns defined for them, while p0 did, since those pointer types weren't in the list of types that was iterated to form the patterns.
  2. Vectors of pointers need to be bitcast to vectors of the corresponding scalars, since there doesn't seem to be a good way to define the rewrite patterns for buffer_load/store of those types

Patch is 28.59 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110714.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (+40-16)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h (+5-7)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+1-1)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/buffer-load-store-pointers.ll (+301)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 271c8d45fd4a21..1da029444027e0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -5794,8 +5794,9 @@ Register AMDGPULegalizerInfo::handleD16VData(MachineIRBuilder &B,
   return Reg;
 }
 
-Register AMDGPULegalizerInfo::fixStoreSourceType(
-  MachineIRBuilder &B, Register VData, bool IsFormat) const {
+Register AMDGPULegalizerInfo::fixStoreSourceType(MachineIRBuilder &B,
+                                                 Register VData, LLT MemTy,
+                                                 bool IsFormat) const {
   MachineRegisterInfo *MRI = B.getMRI();
   LLT Ty = MRI->getType(VData);
 
@@ -5805,6 +5806,10 @@ Register AMDGPULegalizerInfo::fixStoreSourceType(
   if (hasBufferRsrcWorkaround(Ty))
     return castBufferRsrcToV4I32(VData, B);
 
+  if (shouldBitcastLoadStoreType(ST, Ty, MemTy) || Ty.isPointerVector()) {
+    Ty = getBitcastRegisterType(Ty);
+    VData = B.buildBitcast(Ty, VData).getReg(0);
+  }
   // Fixup illegal register types for i8 stores.
   if (Ty == LLT::scalar(8) || Ty == S16) {
     Register AnyExt = B.buildAnyExt(LLT::scalar(32), VData).getReg(0);
@@ -5822,22 +5827,26 @@ Register AMDGPULegalizerInfo::fixStoreSourceType(
 }
 
 bool AMDGPULegalizerInfo::legalizeBufferStore(MachineInstr &MI,
-                                              MachineRegisterInfo &MRI,
-                                              MachineIRBuilder &B,
+                                              LegalizerHelper &Helper,
                                               bool IsTyped,
                                               bool IsFormat) const {
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
   Register VData = MI.getOperand(1).getReg();
   LLT Ty = MRI.getType(VData);
   LLT EltTy = Ty.getScalarType();
   const bool IsD16 = IsFormat && (EltTy.getSizeInBits() == 16);
   const LLT S32 = LLT::scalar(32);
 
-  VData = fixStoreSourceType(B, VData, IsFormat);
-  castBufferRsrcArgToV4I32(MI, B, 2);
-  Register RSrc = MI.getOperand(2).getReg();
-
   MachineMemOperand *MMO = *MI.memoperands_begin();
   const int MemSize = MMO->getSize().getValue();
+  LLT MemTy = MMO->getMemoryType();
+
+  VData = fixStoreSourceType(B, VData, MemTy, IsFormat);
+
+  castBufferRsrcArgToV4I32(MI, B, 2);
+  Register RSrc = MI.getOperand(2).getReg();
 
   unsigned ImmOffset;
 
@@ -5930,10 +5939,13 @@ static void buildBufferLoad(unsigned Opc, Register LoadDstReg, Register RSrc,
 }
 
 bool AMDGPULegalizerInfo::legalizeBufferLoad(MachineInstr &MI,
-                                             MachineRegisterInfo &MRI,
-                                             MachineIRBuilder &B,
+                                             LegalizerHelper &Helper,
                                              bool IsFormat,
                                              bool IsTyped) const {
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+  GISelChangeObserver &Observer = Helper.Observer;
+
   // FIXME: Verifier should enforce 1 MMO for these intrinsics.
   MachineMemOperand *MMO = *MI.memoperands_begin();
   const LLT MemTy = MMO->getMemoryType();
@@ -5982,9 +5994,21 @@ bool AMDGPULegalizerInfo::legalizeBufferLoad(MachineInstr &MI,
   // Make addrspace 8 pointers loads into 4xs32 loads here, so the rest of the
   // logic doesn't have to handle that case.
   if (hasBufferRsrcWorkaround(Ty)) {
+    Observer.changingInstr(MI);
     Ty = castBufferRsrcFromV4I32(MI, B, MRI, 0);
+    Observer.changedInstr(MI);
     Dst = MI.getOperand(0).getReg();
+    B.setInsertPt(B.getMBB(), MI);
   }
+  if (shouldBitcastLoadStoreType(ST, Ty, MemTy) || Ty.isPointerVector()) {
+    Ty = getBitcastRegisterType(Ty);
+    Observer.changingInstr(MI);
+    Helper.bitcastDst(MI, Ty, 0);
+    Observer.changedInstr(MI);
+    Dst = MI.getOperand(0).getReg();
+    B.setInsertPt(B.getMBB(), MI);
+  }
+
   LLT EltTy = Ty.getScalarType();
   const bool IsD16 = IsFormat && (EltTy.getSizeInBits() == 16);
   const bool Unpacked = ST.hasUnpackedD16VMem();
@@ -7364,17 +7388,17 @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_raw_ptr_buffer_store:
   case Intrinsic::amdgcn_struct_buffer_store:
   case Intrinsic::amdgcn_struct_ptr_buffer_store:
-    return legalizeBufferStore(MI, MRI, B, false, false);
+    return legalizeBufferStore(MI, Helper, false, false);
   case Intrinsic::amdgcn_raw_buffer_store_format:
   case Intrinsic::amdgcn_raw_ptr_buffer_store_format:
   case Intrinsic::amdgcn_struct_buffer_store_format:
   case Intrinsic::amdgcn_struct_ptr_buffer_store_format:
-    return legalizeBufferStore(MI, MRI, B, false, true);
+    return legalizeBufferStore(MI, Helper, false, true);
   case Intrinsic::amdgcn_raw_tbuffer_store:
   case Intrinsic::amdgcn_raw_ptr_tbuffer_store:
   case Intrinsic::amdgcn_struct_tbuffer_store:
   case Intrinsic::amdgcn_struct_ptr_tbuffer_store:
-    return legalizeBufferStore(MI, MRI, B, true, true);
+    return legalizeBufferStore(MI, Helper, true, true);
   case Intrinsic::amdgcn_raw_buffer_load:
   case Intrinsic::amdgcn_raw_ptr_buffer_load:
   case Intrinsic::amdgcn_raw_atomic_buffer_load:
@@ -7383,17 +7407,17 @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_struct_ptr_buffer_load:
   case Intrinsic::amdgcn_struct_atomic_buffer_load:
   case Intrinsic::amdgcn_struct_ptr_atomic_buffer_load:
-    return legalizeBufferLoad(MI, MRI, B, false, false);
+    return legalizeBufferLoad(MI, Helper, false, false);
   case Intrinsic::amdgcn_raw_buffer_load_format:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_format:
   case Intrinsic::amdgcn_struct_buffer_load_format:
   case Intrinsic::amdgcn_struct_ptr_buffer_load_format:
-    return legalizeBufferLoad(MI, MRI, B, true, false);
+    return legalizeBufferLoad(MI, Helper, true, false);
   case Intrinsic::amdgcn_raw_tbuffer_load:
   case Intrinsic::amdgcn_raw_ptr_tbuffer_load:
   case Intrinsic::amdgcn_struct_tbuffer_load:
   case Intrinsic::amdgcn_struct_ptr_tbuffer_load:
-    return legalizeBufferLoad(MI, MRI, B, true, true);
+    return legalizeBufferLoad(MI, Helper, true, true);
   case Intrinsic::amdgcn_raw_buffer_atomic_swap:
   case Intrinsic::amdgcn_raw_ptr_buffer_atomic_swap:
   case Intrinsic::amdgcn_struct_buffer_atomic_swap:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
index 84470dc75b60ef..86c15197805d23 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
@@ -195,15 +195,13 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
 
   Register handleD16VData(MachineIRBuilder &B, MachineRegisterInfo &MRI,
                           Register Reg, bool ImageStore = false) const;
-  Register fixStoreSourceType(MachineIRBuilder &B, Register VData,
+  Register fixStoreSourceType(MachineIRBuilder &B, Register VData, LLT MemTy,
                               bool IsFormat) const;
 
-  bool legalizeBufferStore(MachineInstr &MI, MachineRegisterInfo &MRI,
-                           MachineIRBuilder &B, bool IsTyped,
-                           bool IsFormat) const;
-  bool legalizeBufferLoad(MachineInstr &MI, MachineRegisterInfo &MRI,
-                          MachineIRBuilder &B, bool IsFormat,
-                          bool IsTyped) const;
+  bool legalizeBufferStore(MachineInstr &MI, LegalizerHelper &Helper,
+                           bool IsTyped, bool IsFormat) const;
+  bool legalizeBufferLoad(MachineInstr &MI, LegalizerHelper &Helper,
+                          bool IsFormat, bool IsTyped) const;
   bool legalizeBufferAtomic(MachineInstr &MI, MachineIRBuilder &B,
                             Intrinsic::ID IID) const;
 
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index ef9adde13348fe..902feacede83f4 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -590,7 +590,7 @@ class RegisterTypes<list<ValueType> reg_types> {
 
 def Reg16Types : RegisterTypes<[i16, f16, bf16]>;
 def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>;
-def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>;
+def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, p1, p4, v4i16, v4f16, v4bf16]>;
 def Reg96Types : RegisterTypes<[v3i32, v3f32]>;
 def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>;
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/buffer-load-store-pointers.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/buffer-load-store-pointers.ll
new file mode 100644
index 00000000000000..091c9f143ce7ee
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/buffer-load-store-pointers.ll
@@ -0,0 +1,301 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -global-isel -mtriple=amdgcn -mcpu=gfx90a -verify-machineinstrs -stop-after=instruction-select < %s | FileCheck --check-prefix=GFX9 %s
+
+define ptr @buffer_load_p0(ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_load_p0
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3
+  ; GFX9-NEXT:   [[BUFFER_LOAD_DWORDX2_OFFSET:%[0-9]+]]:vreg_64_align2 = BUFFER_LOAD_DWORDX2_OFFSET [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable load (s64) from %ir.buf, align 1, addrspace 8)
+  ; GFX9-NEXT:   [[COPY4:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX2_OFFSET]].sub0
+  ; GFX9-NEXT:   [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX2_OFFSET]].sub1
+  ; GFX9-NEXT:   $vgpr0 = COPY [[COPY4]]
+  ; GFX9-NEXT:   $vgpr1 = COPY [[COPY5]]
+  ; GFX9-NEXT:   SI_RETURN implicit $vgpr0, implicit $vgpr1
+  %ret = call ptr @llvm.amdgcn.raw.ptr.buffer.load.p0(ptr addrspace(8) inreg %buf, i32 0, i32 0, i32 0)
+  ret ptr %ret
+}
+
+define void @buffer_store_p0(ptr %data, ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_store_p0
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17, $vgpr0, $vgpr1
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:vreg_64_align2 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE1:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY2]], %subreg.sub0, [[COPY3]], %subreg.sub1, [[COPY4]], %subreg.sub2, [[COPY5]], %subreg.sub3
+  ; GFX9-NEXT:   BUFFER_STORE_DWORDX2_OFFSET_exact [[REG_SEQUENCE]], [[REG_SEQUENCE1]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable store (s64) into %ir.buf, align 1, addrspace 8)
+  ; GFX9-NEXT:   SI_RETURN
+  call void @llvm.amdgcn.raw.ptr.buffer.store.p0(ptr %data, ptr addrspace(8) inreg %buf, i32 0, i32 0, i32 0)
+  ret void
+}
+
+define ptr addrspace(1) @buffer_load_p1(ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_load_p1
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3
+  ; GFX9-NEXT:   [[BUFFER_LOAD_DWORDX2_OFFSET:%[0-9]+]]:vreg_64_align2 = BUFFER_LOAD_DWORDX2_OFFSET [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable load (s64) from %ir.buf, align 1, addrspace 8)
+  ; GFX9-NEXT:   [[COPY4:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX2_OFFSET]].sub0
+  ; GFX9-NEXT:   [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX2_OFFSET]].sub1
+  ; GFX9-NEXT:   $vgpr0 = COPY [[COPY4]]
+  ; GFX9-NEXT:   $vgpr1 = COPY [[COPY5]]
+  ; GFX9-NEXT:   SI_RETURN implicit $vgpr0, implicit $vgpr1
+  %ret = call ptr addrspace(1) @llvm.amdgcn.raw.ptr.buffer.load.p1(ptr addrspace(8) inreg %buf, i32 0, i32 0, i32 0)
+  ret ptr addrspace(1) %ret
+}
+
+define void @buffer_store_p1(ptr addrspace(1) %data, ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_store_p1
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17, $vgpr0, $vgpr1
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:vreg_64_align2 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE1:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY2]], %subreg.sub0, [[COPY3]], %subreg.sub1, [[COPY4]], %subreg.sub2, [[COPY5]], %subreg.sub3
+  ; GFX9-NEXT:   BUFFER_STORE_DWORDX2_OFFSET_exact [[REG_SEQUENCE]], [[REG_SEQUENCE1]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable store (s64) into %ir.buf, align 1, addrspace 8)
+  ; GFX9-NEXT:   SI_RETURN
+  call void @llvm.amdgcn.raw.ptr.buffer.store.p1(ptr addrspace(1) %data, ptr addrspace(8) inreg %buf, i32 0, i32 0, i32 0)
+  ret void
+}
+
+define ptr addrspace(4) @buffer_load_p4(ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_load_p4
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3
+  ; GFX9-NEXT:   [[BUFFER_LOAD_DWORDX2_OFFSET:%[0-9]+]]:vreg_64_align2 = BUFFER_LOAD_DWORDX2_OFFSET [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable load (s64) from %ir.buf, align 1, addrspace 8)
+  ; GFX9-NEXT:   [[COPY4:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX2_OFFSET]].sub0
+  ; GFX9-NEXT:   [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX2_OFFSET]].sub1
+  ; GFX9-NEXT:   $vgpr0 = COPY [[COPY4]]
+  ; GFX9-NEXT:   $vgpr1 = COPY [[COPY5]]
+  ; GFX9-NEXT:   SI_RETURN implicit $vgpr0, implicit $vgpr1
+  %ret = call ptr addrspace(4) @llvm.amdgcn.raw.ptr.buffer.load.p4(ptr addrspace(8) inreg %buf, i32 0, i32 0, i32 0)
+  ret ptr addrspace(4) %ret
+}
+
+define void @buffer_store_p4(ptr addrspace(4) %data, ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_store_p4
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17, $vgpr0, $vgpr1
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:vreg_64_align2 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE1:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY2]], %subreg.sub0, [[COPY3]], %subreg.sub1, [[COPY4]], %subreg.sub2, [[COPY5]], %subreg.sub3
+  ; GFX9-NEXT:   BUFFER_STORE_DWORDX2_OFFSET_exact [[REG_SEQUENCE]], [[REG_SEQUENCE1]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable store (s64) into %ir.buf, align 1, addrspace 8)
+  ; GFX9-NEXT:   SI_RETURN
+  call void @llvm.amdgcn.raw.ptr.buffer.store.p4(ptr addrspace(4) %data, ptr addrspace(8) inreg %buf, i32 0, i32 0, i32 0)
+  ret void
+}
+
+define ptr addrspace(5) @buffer_load_p5(ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_load_p5
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3
+  ; GFX9-NEXT:   [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable load (s32) from %ir.buf, align 1, addrspace 8)
+  ; GFX9-NEXT:   $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
+  ; GFX9-NEXT:   SI_RETURN implicit $vgpr0
+  %ret = call ptr addrspace(5) @llvm.amdgcn.raw.ptr.buffer.load.p5(ptr addrspace(8) inreg %buf, i32 0, i32 0, i32 0)
+  ret ptr addrspace(5) %ret
+}
+
+define void @buffer_store_p5(ptr addrspace(5) %data, ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_store_p5
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17, $vgpr0
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3
+  ; GFX9-NEXT:   BUFFER_STORE_DWORD_OFFSET_exact [[COPY]], [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable store (s32) into %ir.buf, align 1, addrspace 8)
+  ; GFX9-NEXT:   SI_RETURN
+  call void @llvm.amdgcn.raw.ptr.buffer.store.p5(ptr addrspace(5) %data, ptr addrspace(8) inreg %buf, i32 0, i32 0, i32 0)
+  ret void
+}
+
+define <2 x ptr addrspace(1)> @buffer_load_v2p1(ptr addrspace(8) inreg %buf) {
+  ; GFX9-LABEL: name: buffer_load_v2p1
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9-NEXT:   liveins: $sgpr6, $sgpr7, $sgpr16, $sgpr17
+  ; GFX9-NEXT: {{  $}}
+  ; GFX9-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX9-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr7
+  ; GFX9-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr16
+  ; GFX9-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr17
+  ; GFX9-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+  ; GFX9-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3
+  ; GFX9-NEXT:   [[BUFFER_LOAD_DWORDX4_OFFSET:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFSET [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec :: (dereferenceable load (<2 x...
[truncated]

@arsenm
Copy link
Contributor

arsenm commented Oct 1, 2024

  1. Vectors of pointers need to be bitcast to vectors of the corresponding scalars, since there doesn't seem to be a good way to define the rewrite patterns for buffer_load/store of those types

I'd hope you can declare them with VTVec and PtrValueType, but it probably requires new GlobalISelEmitter support. VTVec wants the enum value from MVT for some reason

@@ -5805,6 +5806,10 @@ Register AMDGPULegalizerInfo::fixStoreSourceType(
if (hasBufferRsrcWorkaround(Ty))
return castBufferRsrcToV4I32(VData, B);

if (shouldBitcastLoadStoreType(ST, Ty, MemTy) || Ty.isPointerVector()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Ty.isPointerVector() is redundant, these should be handled under the loadStoreBitcastWorkaround case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't. All the pointers vectors fall to the Size <= 64 case and so don't get bitcast

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like just a bug, and this should be folded into shouldBitcastLoadStoreType. It already tries to handle this case, just too late

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that a regular G_LOAD or G_STORE can handle vectors of pointers just fine, and moving the condition changes a bunch of tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think they do work fine. The buffer cases should behave identically to G_LOAD and G_STORE, and both currently rely on the bitcast legalization.

Based on the 64-bit check, the 128-bit <2 x ptr> cases work. The 64-bit cases, e.g. <2 x ptr addrspace(3)> fail:
https://godbolt.org/z/4vWacff1d

So yes, tests should change and start working

}
if (shouldBitcastLoadStoreType(ST, Ty, MemTy) || Ty.isPointerVector()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ty.isPointerVector should be redundant here too

@krzysz00
Copy link
Contributor Author

krzysz00 commented Oct 1, 2024

And yep, I tried to do the VTVec thing (since SPIR-V has a v2p64) but then I started getting tablegen errors when trying to generate patterns for v2p3 (there was some complaining about the type set being empty)

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does show we have missing end to end test coverage for vectors of pointers

@krzysz00 krzysz00 changed the title [AMDGPU][GlobalISel] Allow selecting buffer_load/store of pointers [AMDGPU][GlobalISel] Fix selecting load/store of pointer vectors, buffer load/store of pointers Oct 2, 2024
@krzysz00 krzysz00 changed the title [AMDGPU][GlobalISel] Fix selecting load/store of pointer vectors, buffer load/store of pointers [AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN Oct 2, 2024
@krzysz00 krzysz00 merged commit ae5bd2a into llvm:main Oct 2, 2024
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 2, 2024

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/4831

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: lang/cpp/fixits/TestCppFixIts.py (776 of 2814)
PASS: lldb-api :: lang/cpp/fpnan/TestFPNaN.py (777 of 2814)
PASS: lldb-api :: lang/cpp/frame-var-anon-unions/TestFrameVariableAnonymousUnions.py (778 of 2814)
PASS: lldb-api :: lang/cpp/frame-var-depth-and-elem-count/TestFrameVarDepthAndElemCount.py (779 of 2814)
PASS: lldb-api :: lang/cpp/function-local-class/TestCppFunctionLocalClass.py (780 of 2814)
UNSUPPORTED: lldb-api :: lang/cpp/function-template-parameter-pack/TestFunctionTemplateParameterPack.py (781 of 2814)
PASS: lldb-api :: lang/cpp/function-qualifiers/TestCppFunctionQualifiers.py (782 of 2814)
PASS: lldb-api :: lang/cpp/function-ref-qualifiers/TestCppFunctionRefQualifiers.py (783 of 2814)
PASS: lldb-api :: lang/cpp/function_refs/TestFunctionRefs.py (784 of 2814)
UNSUPPORTED: lldb-api :: lang/cpp/gmodules/alignment/TestPchAlignment.py (785 of 2814)
FAIL: lldb-api :: lang/c/shared_lib_stripped_symbols/TestSharedLibStrippedSymbols.py (786 of 2814)
******************** TEST 'lldb-api :: lang/c/shared_lib_stripped_symbols/TestSharedLibStrippedSymbols.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env ARCHIVER=/usr/local/bin/llvm-ar --env OBJCOPY=/usr/bin/llvm-objcopy --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/lang/c/shared_lib_stripped_symbols -p TestSharedLibStrippedSymbols.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://github.com/llvm/llvm-project.git revision ae5bd2a9f292037c605b2ec0ee31200581bd8701)
  clang revision ae5bd2a9f292037c605b2ec0ee31200581bd8701
  llvm revision ae5bd2a9f292037c605b2ec0ee31200581bd8701
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_expr_dsym (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase) (test case does not fall in any category of interest for this run) 
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_expr_dwarf (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_expr_dwo (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_frame_variable_dsym (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase) (test case does not fall in any category of interest for this run) 
XFAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_frame_variable_dwarf (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
XFAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_frame_variable_dwo (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
======================================================================
FAIL: test_expr_dwarf (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
   Test that types work when defined in a shared library and forwa/d-declared in the main executable
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1769, in test_method
    return attrvalue(self)
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/lang/c/shared_lib_stripped_symbols/TestSharedLibStrippedSymbols.py", line 24, in test_expr
    self.expect(
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2370, in expect
    self.runCmd(
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1000, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Variable(s) displayed correctly
Error output:

Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Oct 3, 2024
…lvm#110714)

Certain pointer address spaces were not being correctly handled by the
GlobalISel lowering for buffer_load and buffer_store.

1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns
defined for them, while p0 did, since those pointer types weren't in the
list of types that was iterated to form the patterns.
2. Vectors of pointers need to be bitcast to vectors of the
corresponding scalars, since there doesn't seem to be a good way to
define the rewrite patterns for buffer_load/store of those types

The need to bitcast vectors of pointers was also revealed to affect
ordinary `G_LOAD` and `G_STORE` in some cases, so
`shouldBitcastLoadStore()` has been fixed to handle it properly.
chapuni added a commit that referenced this pull request Oct 3, 2024
…r.*.pN (#110714)"

Some builders has been failing tests.
```
Failed Tests (2):
  LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-global-old-legalization.mir
  LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir
```

This reverts commit ae5bd2a.
(llvmorg-20-init-7805-gae5bd2a9f292)
@chapuni
Copy link
Contributor

chapuni commented Oct 3, 2024

Excuse me, I've reverted this. For example, https://lab.llvm.org/buildbot/#/builders/63/builds/1865

krzysz00 added a commit to krzysz00/llvm-project that referenced this pull request Oct 3, 2024
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
…lvm#110714)

Certain pointer address spaces were not being correctly handled by the
GlobalISel lowering for buffer_load and buffer_store.

1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns
defined for them, while p0 did, since those pointer types weren't in the
list of types that was iterated to form the patterns.
2. Vectors of pointers need to be bitcast to vectors of the
corresponding scalars, since there doesn't seem to be a good way to
define the rewrite patterns for buffer_load/store of those types

The need to bitcast vectors of pointers was also revealed to affect
ordinary `G_LOAD` and `G_STORE` in some cases, so
`shouldBitcastLoadStore()` has been fixed to handle it properly.
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
…r.*.pN (llvm#110714)"

Some builders has been failing tests.
```
Failed Tests (2):
  LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-global-old-legalization.mir
  LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir
```

This reverts commit ae5bd2a.
(llvmorg-20-init-7805-gae5bd2a9f292)
krzysz00 added a commit that referenced this pull request Oct 4, 2024
…er.*.pN (#110714)" (#111059)

This reverts commit 650c41a.

The test failures appear to be from conflicts with other PRs that landed around this time.
@ppenzin
Copy link

ppenzin commented Oct 5, 2024

Local failures also with the patch reapplied, as well as that builder is still failing. It is quite odd, not sure what is causing that.

chapuni added a commit that referenced this pull request Oct 6, 2024
…rs, buffer.*.pN (#110714)" (#111059)"

This reverts commit 98a15c7.
(llvmorg-20-init-8051-g98a15c7b0c6e)
@chapuni
Copy link
Contributor

chapuni commented Oct 6, 2024

Reverted again.

I've investigated further. Tests will fail with -DNDEBUG.

reportGISelFailure() is called only in +Asserts.
2b94972#diff-764f8c5ac9f21f864c5f677d103b945de05af957c7f993eebeb0bce8fbd25125R88
It marks a property.
7c6ca18#diff-7b1e350b8a96379b42928d6d373d502796b98500bbfe87d3b7abbcc08e1b8bc5R228

A quick fix will be mark tests with REQUIRES: asserts.
I didn't notice why the result was affected by assertions.

Kyvangka1610 added a commit to Kyvangka1610/llvm-project that referenced this pull request Oct 6, 2024
* commit 'FETCH_HEAD':
  [X86] combineAndLoadToBZHI - don't do an return early return if we fail to match a load
  [X86] replace-load-and-with-bzhi.ll - add commuted test cases to show failure to fold
  [X86] replace-load-and-with-bzhi.ll - cleanup check-prefixes to use X86/X64 for 32/64-bit targets
  [ExecutionEngine] Avoid repeated hash lookups (NFC) (llvm#111275)
  [ByteCode] Avoid repeated hash lookups (NFC) (llvm#111273)
  [StaticAnalyzer] Avoid repeated hash lookups (NFC) (llvm#111272)
  [CodeGen] Avoid repeated hash lookups (NFC) (llvm#111274)
  [RISCV] Simplify fixed-vector-fp.ll run lines. NFC
  [libc++][format][1/3] Adds more benchmarks. (llvm#101803)
  [X86] combineOrXorWithSETCC - avoid duplicate SDLoc/operands code. NFC.
  [X86] convertIntLogicToFPLogic - avoid duplicate SDLoc/operands code. NFC.
  [libc] Clean up some include in `libc`. (llvm#110980)
  [X86] combineBitOpWithPACK - avoid duplicate SDLoc/operands code. NFC.
  [X86] combineBitOpWithMOVMSK - avoid duplicate SDLoc/operands code. NFC.
  [X86] combineBitOpWithShift - avoid duplicate SDLoc/operands code. NFC.
  [x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat.
  [X86] combineSubABS - avoid duplicate SDLoc. NFC.
  [ValueTypes][RISCV] Add v1bf16 type (llvm#111112)
  [VPlan] Add additional FOR hoisting test.
  [clang-tidy] Create bugprone-bitwise-pointer-cast check (llvm#108083)
  [InstCombine] Canonicalize more geps with constant gep bases and constant offsets. (llvm#110033)
  [LV] Honor uniform-after-vectorization in setVectorizedCallDecision.
  [ELF] Pass Ctx & to Arch/
  [ELF] Pass Ctx & to Arch/
  [libc++] Fix a typo (llvm#111239)
  [X86] For minsize memset/memcpy, use byte or double-word accesses (llvm#87003)
  [RISCV] Unify RVBShift_ri and RVBShiftW_ri with Shift_ri and ShiftW_ri. NFC (llvm#111263)
  Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (llvm#110714)" (llvm#111059)"
  [libc] Add missing include to __support/StringUtil/tables/stdc_errors.h. (llvm#111271)
  [libc] remove errno.h includes (llvm#110934)
  [NFC][rtsan] Update docs to include [[clang::blocking]] (llvm#111249)
  [RISCV] Give ZEXT_H_RV32 and ZEXT_H_RV64 R-type format to match PACK. NFC
  [mlir][SPIRV] Fix build (2) (llvm#111265)
  [mlir][SPIRV] Fix build error (llvm#111264)
  [mlir][NFC] Mark type converter in `populate...` functions as `const` (llvm#111250)
  [Basic] Avoid repeated hash lookups (NFC) (llvm#111228)
  [RISCV] Use THShift_ri class instead of RVBShift_ri for TH_TST instruction. NFC
  [VPlan] Only generate first lane for VPPredInstPHI if no others used.
  [ELF] Don't call getPPC64TargetInfo outside Driver. NFC
  [GISel] Don't preserve NSW flag when converting G_MUL of INT_MIN to G_SHL. (llvm#111230)
  [APInt] Slightly simplify APInt::ashrSlowCase. NFC (llvm#111220)
  [Sema] Avoid repeated hash lookups (NFC) (llvm#111227)
  [Affine] Avoid repeated hash lookups (NFC) (llvm#111226)
  [Driver] Avoid repeated hash lookups (NFC) (llvm#111225)
  [clang][test] Remove a broken bytecode test
  [ELF] Pass Ctx &
  [ELF] Pass Ctx & to Relocations

Signed-off-by: kyvangka1610 <kyvangka2002@gmail.com>
krzysz00 added a commit to krzysz00/llvm-project that referenced this pull request Oct 9, 2024
…er.*.pN (llvm#110714)" v2

The first attempted reapply was llvm#111059

This reverts commit e075dcf.
krzysz00 added a commit that referenced this pull request Oct 9, 2024
…er.*.pN (#110714)" v2 (#111708)

This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test
lines to prevent behavior mismatches between debug and release builds

The first attempted reapply was #111059

This reverts commit e075dcf.
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
…er.*.pN (llvm#110714)" v2 (llvm#111708)

This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test
lines to prevent behavior mismatches between debug and release builds

The first attempted reapply was llvm#111059

This reverts commit e075dcf.
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
krzysz00 added a commit to krzysz00/llvm-project that referenced this pull request Oct 31, 2024
…er.*.pN (llvm#110714)" v3

This reverts commit 8a849a2.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
krzysz00 added a commit to krzysz00/llvm-project that referenced this pull request Oct 31, 2024
…er.*.pN (llvm#110714)" v3

This reverts commit 8a849a2.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
krzysz00 added a commit to krzysz00/llvm-project that referenced this pull request Oct 31, 2024
…er.*.pN (llvm#110714)" v3

This reverts commit 8a849a2.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
krzysz00 added a commit that referenced this pull request Nov 1, 2024
…er.*.pN (#110714)" v3 (#114443)

This reverts commit 8a849a2.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…er.*.pN (llvm#110714)" v3 (llvm#114443)

This reverts commit 8a849a2.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…er.*.pN (llvm#110714)" v3 (llvm#114443)

This reverts commit 8a849a2.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…er.*.pN (llvm#110714)" v3 (llvm#114443)

This reverts commit 8a849a2.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants