Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] New intrinsic llvm.amdgcn.pops.exiting.wave.id #89612

Merged
merged 2 commits into from
May 22, 2024

Conversation

jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Apr 22, 2024

This provides access to the special scalar source value
SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 22, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-ir

Author: Jay Foad (jayfoad)

Changes

This provides access to the special scalar source value
SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.


Full diff: https://github.com/llvm/llvm-project/pull/89612.diff

4 Files Affected:

  • (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+3)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+5-1)
  • (modified) llvm/lib/Target/AMDGPU/SOPInstructions.td (+6)
  • (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.pops.exiting.wave.id.ll (+34)
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index ee9a5d7a343980..1ec48650eba7a9 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2485,6 +2485,9 @@ class AMDGPUGlobalLoadLDS : Intrinsic <
   "", [SDNPMemOperand]>;
 def int_amdgcn_global_load_lds : AMDGPUGlobalLoadLDS;
 
+def int_amdgcn_pops_exiting_wave_id :
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+
 //===----------------------------------------------------------------------===//
 // GFX10 Intrinsics
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index aa4ec785bf02a3..6f4906a696d3af 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4825,11 +4825,15 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
       OpdsMapping[2] = AMDGPU::getValueMapping(regBankID, OpSize);
       break;
     }
-    case Intrinsic::amdgcn_s_bitreplicate:
+    case Intrinsic::amdgcn_s_bitreplicate: {
       Register MaskReg = MI.getOperand(2).getReg();
       unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
       OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
       OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, 32);
+      break;
+    }
+    case Intrinsic::amdgcn_pops_exiting_wave_id:
+      return getDefaultMappingSOP(MI);
     }
     break;
   }
diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 0b7d45ee8c027d..4525c2345d9e45 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1865,6 +1865,12 @@ let SubtargetPredicate = isNotGFX9Plus in {
 def : GetFPModePat<fpmode_mask_gfx6plus>;
 }
 
+let SubtargetPredicate = isGFX9GFX10 in
+def : GCNPat<
+  (int_amdgcn_pops_exiting_wave_id),
+  (S_MOV_B32 (i32 SRC_POPS_EXITING_WAVE_ID))
+>;
+
 //===----------------------------------------------------------------------===//
 // SOP2 Patterns
 //===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.pops.exiting.wave.id.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.pops.exiting.wave.id.ll
new file mode 100644
index 00000000000000..4927c2ffcdf30d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.pops.exiting.wave.id.ll
@@ -0,0 +1,34 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s -check-prefix=SDAG
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s -check-prefix=GFX9-GISEL
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck %s -check-prefix=SDAG
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck %s -check-prefix=GFX10-GISEL
+
+define amdgpu_ps void @test(ptr addrspace(1) inreg %ptr) {
+; SDAG-LABEL: test:
+; SDAG:       ; %bb.0:
+; SDAG-NEXT:    s_mov_b32 s2, src_pops_exiting_wave_id
+; SDAG-NEXT:    v_mov_b32_e32 v0, 0
+; SDAG-NEXT:    v_mov_b32_e32 v1, s2
+; SDAG-NEXT:    global_store_dword v0, v1, s[0:1]
+; SDAG-NEXT:    s_endpgm
+;
+; GFX9-GISEL-LABEL: test:
+; GFX9-GISEL:       ; %bb.0:
+; GFX9-GISEL-NEXT:    s_mov_b32 s2, src_pops_exiting_wave_id
+; GFX9-GISEL-NEXT:    v_mov_b32_e32 v0, s2
+; GFX9-GISEL-NEXT:    v_mov_b32_e32 v1, 0
+; GFX9-GISEL-NEXT:    global_store_dword v1, v0, s[0:1]
+; GFX9-GISEL-NEXT:    s_endpgm
+;
+; GFX10-GISEL-LABEL: test:
+; GFX10-GISEL:       ; %bb.0:
+; GFX10-GISEL-NEXT:    s_mov_b32 s2, src_pops_exiting_wave_id
+; GFX10-GISEL-NEXT:    v_mov_b32_e32 v1, 0
+; GFX10-GISEL-NEXT:    v_mov_b32_e32 v0, s2
+; GFX10-GISEL-NEXT:    global_store_dword v1, v0, s[0:1]
+; GFX10-GISEL-NEXT:    s_endpgm
+  %id = call i32 @llvm.amdgcn.pops.exiting.wave.id()
+  store i32 %id, ptr addrspace(1) %ptr
+  ret void
+}

define amdgpu_ps void @test(ptr addrspace(1) inreg %ptr) {
; SDAG-LABEL: test:
; SDAG: ; %bb.0:
; SDAG-NEXT: s_mov_b32 s2, src_pops_exiting_wave_id
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO in a future patch: fold src_pops_exiting_wave_id into uses of s2. This might be tricky because the value of src_pops_exiting_wave_id is volatile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. As long as there's only a single use it may be fine.

Copy link

@Triang3l Triang3l Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible, but disgustingly dirty approach would be to expose it not as a load, but rather directly as an integer addition to an operand 🙃 But that'd definitely add some excessive responsibilities to it, especially since there are other ways the wave ID wraparound can be handled mathematically such as comparisons, and I don't know how unfriendly to the rest of the architecture that would be.

@@ -2485,6 +2485,9 @@ class AMDGPUGlobalLoadLDS : Intrinsic <
"", [SDNPMemOperand]>;
def int_amdgcn_global_load_lds : AMDGPUGlobalLoadLDS;

def int_amdgcn_pops_exiting_wave_id :
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this reads a volatile value it might need some more attributes - maybe IntrHasSideEffects?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember any of the POPS stuff, but perhaps just don't make it IntrNoMem and IntrSpeculatable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep the IntrNoMem because I'd like to get to a point where all intrinsics that touch memory also have memoperands in MIR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntrInaccessibleMemOnly? We probably should have a PseudoSourceValue for all of those

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with SideEffects since it seems like the most direct and obious ways to prevent CSE-like optimizations. I don't see any added benefit in creating a PseudoSourceValue for this.

let SubtargetPredicate = isGFX9GFX10 in
def : GCNPat<
(int_amdgcn_pops_exiting_wave_id),
(S_MOV_B32_sideeffects (i32 SRC_POPS_EXITING_WAVE_ID))
Copy link
Contributor

@arsenm arsenm Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really need side effects? If it's directly reading a reserved register you can't do much of anything with it anyway. Reserved registers are approximately volatile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs side effects to prevent tablegen from complaining about selecting a side-effecting intrinsic to a non-side-effecting instruction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InaccessibleMemOnly should avoid that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you're suggesting I should use InaccessibleMemOnly and give it a memoperand with a new PseudoSourceValue? I don't understand why that is preferable overall.

Copy link
Contributor

@arsenm arsenm Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It doesn't need a memory operand if it's a register dependency. You only need one for mayLoad=1 or mayStore=1 instructions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't CSE'd with a possibly writing call in between. Is this supposed to act more like readcyclecounter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to act more like readcyclecounter?

Yes. It's a volatile value that can change at any time, even without an explicit write in the IR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you just have to copy the attributes from readyclecounter, which is just DefaultAttrsIntrinsic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's kind of what I've done already! readcyclecounter reads and writes mem. The new instrinsic also reads and writes mem. The only difference is that I have restricted it to inaccessible mem only, which I still think is reasonable (and could perhaps be applied to readcyclecounter too).

The current patch works for me. Are you asking for / suggesting any changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then what would stop the MachineInstr getting CSEd, hoisted out of loops, rematerialized etc? We definitely need to prevent it from being moved past control flow so that you can write loops that poll the value.

The use of reserved physical register

That does not seem to work. See:

; SDAG-NEXT: s_mov_b32 s0, src_pops_exiting_wave_id
where %1:sreg_32 = S_MOV_B32 $src_pops_exiting_wave_id is hoisted by early-machinelicm.

// Variant of S_MOV_B32 used for reading volatile source values like
// SRC_POPS_EXITING_WAVE_ID.
let mayLoad = 1, mayStore = 1, maybeAtomic = 0 in
def S_MOV_B32_loadstore : SOP1_32 <"s_mov_b32">;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not yet managed to prevent TableGen from complaining about selecting an IntrInaccessibleMemOnly intrinsic to an instruction without mayLoad/mayStore. The flag inference stuff in CodeGenDAGPatterns::InferFromPattern is just too complex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just delete it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoid the pseudo. I'd probably prefer to just manually select over having it. But really that flag inference stuff in TableGen is garbage. We at minimum need to fix the properties to exactly match the attributes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I updated the patch to select it in C++ (and rebased).

This provides access to the special scalar source value
SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.
@jayfoad jayfoad merged commit 990bed6 into llvm:main May 22, 2024
3 of 4 checks passed
@jayfoad jayfoad deleted the pops-exiting-wave-id branch May 22, 2024 18:48
@RKSimon
Copy link
Collaborator

RKSimon commented May 23, 2024

@jayfoad This is failing on EXPENSIVE_CHECKS builds: https://lab.llvm.org/buildbot/#/builders/16/builds/65957

@jayfoad
Copy link
Contributor Author

jayfoad commented May 23, 2024

@jayfoad This is failing on EXPENSIVE_CHECKS builds: https://lab.llvm.org/buildbot/#/builders/16/builds/65957

Thanks. It's a silly mistake. I will fix in a minute...

jayfoad added a commit that referenced this pull request May 23, 2024
@jayfoad
Copy link
Contributor Author

jayfoad commented May 23, 2024

This is failing on EXPENSIVE_CHECKS builds: https://lab.llvm.org/buildbot/#/builders/16/builds/65957

Thanks. It's a silly mistake. I will fix in a minute...

Fixed by 4e0f8a4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants