Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU][AsmParser] Allow v_writelane_b32 to use SGPR and M0 as source operands at the same time #78827

Merged
merged 1 commit into from
Jan 30, 2024

Conversation

shiltian
Copy link
Contributor

Currently the asm parser takes v_writelane_b32 v1, s13, m0 as illegal
instruction for pre-gfx11 because it uses two constant buses while the hardware
can only allow one. However, based on the comment of AMDGPUInstructionSelector::selectWritelane,
it is allowed to have M0 as lane selector and a SGPR used as SRC0 because the
lane selector doesn't count as a use of constant bus. In fact, codegen can already
generate this form, but this inconsistency is not exposed because the validation
of constant bus limitation only happens when paring an assembly but we don't have
a test case when both SGPR and M0 used as source operands for the instruction.

@llvmbot llvmbot added backend:AMDGPU mc Machine (object) code labels Jan 20, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 20, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-mc

Author: Shilei Tian (shiltian)

Changes

Currently the asm parser takes v_writelane_b32 v1, s13, m0 as illegal
instruction for pre-gfx11 because it uses two constant buses while the hardware
can only allow one. However, based on the comment of AMDGPUInstructionSelector::selectWritelane,
it is allowed to have M0 as lane selector and a SGPR used as SRC0 because the
lane selector doesn't count as a use of constant bus. In fact, codegen can already
generate this form, but this inconsistency is not exposed because the validation
of constant bus limitation only happens when paring an assembly but we don't have
a test case when both SGPR and M0 used as source operands for the instruction.


Full diff: https://github.com/llvm/llvm-project/pull/78827.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+28)
  • (added) llvm/test/MC/AMDGPU/writelane_m0.s (+10)
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index bd68054589b112..491ff9cb655cf9 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -3524,6 +3524,34 @@ bool AMDGPUAsmParser::validateConstantBusLimitations(
       !isVOPD(Opcode))
     return true;
 
+  // Based on the comment for `AMDGPUInstructionSelector::selectWritelane`:
+  // Writelane is special in that it can use SGPR and M0 (which would normally
+  // count as using the constant bus twice - but in this case it is allowed
+  // since the lane selector doesn't count as a use of the constant bus).
+  // However, it is still required to abide by the 1 SGPR rule.
+  switch (Opcode) {
+  default:
+    break;
+  case V_WRITELANE_B32_e64_gfx11:
+  case V_WRITELANE_B32_e64_gfx12:
+  case V_WRITELANE_B32_gfx10:
+  case V_WRITELANE_B32_gfx6_gfx7:
+  case V_WRITELANE_B32_vi: {
+    const MCOperand &LaneSelOp = Inst.getOperand(2);
+    if (LaneSelOp.isReg()) {
+      auto LaneSelReg = mc2PseudoReg(LaneSelOp.getReg());
+      switch (LaneSelReg) {
+      default:
+        break;
+      case M0:
+      case M0_gfxpre11:
+      case M0_gfx11plus:
+        return true;
+      }
+    }
+  }
+  }
+
   // Check special imm operands (used by madmk, etc)
   if (AMDGPU::hasNamedOperand(Opcode, AMDGPU::OpName::imm)) {
     ++NumLiterals;
diff --git a/llvm/test/MC/AMDGPU/writelane_m0.s b/llvm/test/MC/AMDGPU/writelane_m0.s
new file mode 100644
index 00000000000000..4b5c265719f8f8
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/writelane_m0.s
@@ -0,0 +1,10 @@
+// RUN: llvm-mc --triple=amdgcn --mcpu=gfx904 %s | FileCheck %s
+// RUN: llvm-mc --triple=amdgcn --mcpu=gfx940 %s | FileCheck %s
+// RUN: llvm-mc --triple=amdgcn --mcpu=gfx1010 %s | FileCheck %s
+// RUN: llvm-mc --triple=amdgcn --mcpu=gfx1030 %s | FileCheck %s
+// RUN: llvm-mc --triple=amdgcn --mcpu=gfx1100 %s | FileCheck %s
+
+.text
+  v_writelane_b32 v1, s13, m0
+
+// CHECK: v_writelane_b32 v1, s13, m0

Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMDGPUInstructionSelector::selectWritelane suggests that on targets where we can support 2 SGPRs per instruction (gfx10+) this is not a special case. That means' this is is only a behavior change for V_WRITELANE_B32_gfx6_gfx7 and V_WRITELANE_B32_vi. Can you please confirm that, and remove the newer versions of the instruction from your special case?

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp Outdated Show resolved Hide resolved
llvm/test/MC/AMDGPU/writelane_m0.s Outdated Show resolved Hide resolved
@shiltian
Copy link
Contributor Author

ping

@shiltian
Copy link
Contributor Author

ping +1

…rce operands at the same time

Currently the asm parser takes `v_writelane_b32 v1, s13, m0` as illegal
instruction for pre-gfx11 because it uses two constant buses while the hardware
can only allow one. However, based on the comment of `AMDGPUInstructionSelector::selectWritelane`,
it is allowed to have M0 as lane selector and a SGPR used as SRC0 because the
lane selector doesn't count as a use of constant bus. In fact, codegen can already
generate this form, but this inconsistency is not exposed because the validation
of constant bus limitation only happens when paring an assembly but we don't have
a test case when both SGPR and M0 used as source operands for the instruction.
Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shiltian shiltian merged commit 6a21e00 into llvm:main Jan 30, 2024
3 of 4 checks passed
@shiltian shiltian deleted the writelane branch January 30, 2024 20:39
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 11, 2024
…rce operands at the same time (llvm#78827)

Currently the asm parser takes `v_writelane_b32 v1, s13, m0` as illegal
instruction for pre-gfx11 because it uses two constant buses while the
hardware
can only allow one. However, based on the comment of
`AMDGPUInstructionSelector::selectWritelane`,
it is allowed to have M0 as lane selector and a SGPR used as SRC0
because the
lane selector doesn't count as a use of constant bus. In fact, codegen
can already
generate this form, but this inconsistency is not exposed because the
validation
of constant bus limitation only happens when paring an assembly but we
don't have
a test case when both SGPR and M0 used as source operands for the
instruction.

Change-Id: I288a92b8ae6305c4c9a7ba51f4ddba4623fc899f
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request May 8, 2024
…rce operands at the same time (llvm#78827)

Currently the asm parser takes `v_writelane_b32 v1, s13, m0` as illegal
instruction for pre-gfx11 because it uses two constant buses while the
hardware
can only allow one. However, based on the comment of
`AMDGPUInstructionSelector::selectWritelane`,
it is allowed to have M0 as lane selector and a SGPR used as SRC0
because the
lane selector doesn't count as a use of constant bus. In fact, codegen
can already
generate this form, but this inconsistency is not exposed because the
validation
of constant bus limitation only happens when paring an assembly but we
don't have
a test case when both SGPR and M0 used as source operands for the
instruction.

Change-Id: I22b1548775f3c34472657134541f4ea40bfdc9e3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants