Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm][AArch64] intrinsic to generate a ubfx instruction #80103

Closed
wants to merge 3 commits into from

Conversation

RamaMalladiAWS
Copy link

@RamaMalladiAWS RamaMalladiAWS commented Jan 31, 2024

Unsigned Bitfield Extract copies a bitfield of bits starting from bit position in the source register to the least significant bits of the destination register, and sets destination bits above the bitfield to zero.

This PR is to generate the UBFX instruction using an intrinsic function that can be invoked from the LLVM-IR.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 31, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-ir

Author: Rama Malladi (RamaMalladiAWS)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/80103.diff

4 Files Affected:

  • (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+3)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+13)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+6)
  • (added) llvm/test/CodeGen/AArch64/ubfx-64-intrinsic.ll (+36)
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index 921e5b95ae03e..868265dcd1192 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -855,6 +855,9 @@ def int_aarch64_crc32x  : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llv
     [IntrNoMem]>;
 def int_aarch64_crc32cx : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i64_ty],
     [IntrNoMem]>;
+def int_aarch64_ubfx : DefaultAttrsIntrinsic<
+    [llvm_anyint_ty], [llvm_anyint_ty, llvm_anyint_ty, llvm_anyint_ty],
+    [IntrNoMem, ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>]>;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 163ed520a8a67..acdc9f70cb14c 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -5230,6 +5230,19 @@ void AArch64DAGToDAGISel::Select(SDNode *Node) {
     switch (IntNo) {
     default:
       break;
+    case Intrinsic::aarch64_ubfx: {
+      SDLoc DL(Node);
+      auto lsb = cast<ConstantSDNode>(Node->getOperand(2))->getZExtValue();
+      auto width = cast<ConstantSDNode>(Node->getOperand(3))->getZExtValue();
+      auto ImmR = (VT.getSizeInBits() - lsb) % VT.getSizeInBits();
+      auto ImmS = width - 1;
+      SDValue Ops[] = {Node->getOperand(1),
+                       CurDAG->getTargetConstant(ImmR, DL, VT),
+                       CurDAG->getTargetConstant(ImmS, DL, VT)};
+      unsigned Opc = (VT == MVT::i32) ? AArch64::UBFMWri : AArch64::UBFMXri;
+      CurDAG->SelectNodeTo(Node, Opc, VT, Ops);
+      return;
+    }
     case Intrinsic::aarch64_tagp:
       SelectTagP(Node);
       return;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 03baa7497615e..cc2d094cb0755 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -2558,6 +2558,12 @@ def : Pat<(rotr GPR32:$Rn, (i64 imm0_31:$imm)),
 def : Pat<(rotr GPR64:$Rn, (i64 imm0_63:$imm)),
           (EXTRXrri GPR64:$Rn, GPR64:$Rn, imm0_63:$imm)>;
 
+def SDT_AArch64UBFX_32bit : SDTypeProfile<1, 1, [SDTCisVT<0, i32>, SDTCisVT<1, i32>]>;
+def SDT_AArch64UBFX_64bit : SDTypeProfile<1, 1, [SDTCisVT<0, i64>, SDTCisVT<1, i64>]>;
+
+def aarch64_ubfxw  : SDNode<"AArch64::UBFMWri",  SDT_AArch64UBFX_32bit>;
+def aarch64_ubfxx  : SDNode<"AArch64::UBFMXri",  SDT_AArch64UBFX_64bit>;
+
 //===----------------------------------------------------------------------===//
 // Other bitfield immediate instructions.
 //===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/AArch64/ubfx-64-intrinsic.ll b/llvm/test/CodeGen/AArch64/ubfx-64-intrinsic.ll
new file mode 100644
index 0000000000000..a98cd99f3a16c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/ubfx-64-intrinsic.ll
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-none-linux-gnu %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
+
+define i32 @f32(i32 %A) nounwind {
+; CHECK-LABEL: f32:
+; CHECK-GI:      // %bb.0:
+; CHECK-GI-NEXT: ubfiz w0, w0, #3, #3
+; CHECK-GI-NEXT: ret
+entry:
+  %tmp = call i32 @llvm.aarch64.ubfx.i32(i32 %A, i32 3, i32 3)
+  ret i32 %tmp
+}
+
+define i64 @f64(i64 %A) nounwind {
+; CHECK-LABEL: f64:
+; CHECK-GI:      // %bb.0:
+; CHECK-GI-NEXT: ubfiz x0, x0, #38, #4
+; CHECK-GI-NEXT: ret
+entry:
+  %tmp = call i64 @llvm.aarch64.ubfx.i64(i64 %A, i64 38, i64 4)
+  ret i64 %tmp
+}
+
+define i64 @f64_1(i64 %A) nounwind {
+; CHECK-LABEL: f64_1:
+; CHECK-GI:      // %bb.0:
+; CHECK-GI-NEXT: ubfx x0, x0, #50, #3
+; CHECK-GI-NEXT: ret
+entry:
+  %tmp = call i64 @llvm.aarch64.ubfx.i64(i64 %A, i64 14, i64 53)
+  ret i64 %tmp
+}
+
+declare i32 @llvm.aarch64.ubfx.i32(i32, i32, i32)
+declare i64 @llvm.aarch64.ubfx.i64(i64, i64, i64)
+

@DavidSpickett DavidSpickett changed the title intrinsic to generate a ubfx instruction [llvm][AArch64] intrinsic to generate a ubfx instruction Jan 31, 2024
@@ -855,6 +855,9 @@ def int_aarch64_crc32x : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llv
[IntrNoMem]>;
def int_aarch64_crc32cx : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i64_ty],
[IntrNoMem]>;
def int_aarch64_ubfx : DefaultAttrsIntrinsic<
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should perhaps be not under the "CRC32" section.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed it in affc65a

@dwpan
Copy link

dwpan commented Jan 31, 2024

The feature will be very useful for hardware language like Verilog/SystemVerilog which insert/extract bits are parts of language reference manual.

@davemgreen
Copy link
Collaborator

For the record I have the same comment as #79672 (comment).

(Additionally, if we were to add the intrinsics, it would probably be better to have a ubfm intrinsic and push the figuring out of the width and whatnot to the frontend. UBFM is the more basic instruction, and it might be better to have a SBFM too to keep them symmetrical. There are also the same issues with the tablegen vs c++ lowering as in #79672.

But it still seems better to me to use normal instructions and optimize the backend where we need to.)

@RamaMalladiAWS
Copy link
Author

We decided to close this PR as we found LLC code-gen was good for many of our test-cases. Thank you for the reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants