[AArch64] add intrinsic to generate a bfi instruction #79672

RamaMalladiAWS · 2024-01-27T02:43:20Z

BFI: Bit Field Insert copies any number of low order bits from a
register into the same number of adjacent bits at any position
in the destination register.

This PR generates the BFI instruction by implementing an intrinsic
function that can be invoked from the LLVM-IR.
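
As a rough illustration of the behaviour described above, the following is a minimal C++ reference model of a 64-bit BFI. The function name `bfi64` and its parameter names are hypothetical and chosen only for this sketch; they are not part of the PR or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of `bfi Xd, Xn, #lsb, #width`:
// copy the low `width` bits of `src` into `dst` starting at bit `lsb`,
// leaving every other bit of `dst` unchanged.
uint64_t bfi64(uint64_t dst, uint64_t src, unsigned lsb, unsigned width) {
  // The field must be non-empty and fit inside the 64-bit register.
  assert(width >= 1 && lsb < 64 && lsb + width <= 64);
  // Build `width` low one-bits without shifting by 64 (undefined in C++).
  const uint64_t field =
      (width == 64) ? ~uint64_t{0} : ((uint64_t{1} << width) - 1);
  const uint64_t mask = field << lsb;
  return (dst & ~mask) | ((src << lsb) & mask);
}
```

For instance, `bfi64(0, 0xFF, 23, 8)` places eight one-bits at bits 23 to 30, which mirrors the `#23, #8` operands used in the PR's 64-bit test.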

github-actions · 2024-01-27T02:43:38Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-01-27T02:44:07Z

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-ir

Author: Rama Malladi (RamaMalladiAWS)

Changes

BFI: Bit Field Insert copies any number of low order bits from a
register into the same number of adjacent bits at any position
in the destination register.

This PR generates the BFI instruction by implementing an intrinsic
function that can be invoked from the LLVM-IR.

Full diff: https://github.com/llvm/llvm-project/pull/79672.diff

4 Files Affected:

(modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+3)
(modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+13)
(modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+8)
(added) llvm/test/CodeGen/AArch64/bfi-64-intrinsic.ll (+25)

diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index 921e5b95ae03e8..9eb5154c95138f 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -855,6 +855,9 @@ def int_aarch64_crc32x  : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llv
     [IntrNoMem]>;
 def int_aarch64_crc32cx : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i64_ty],
     [IntrNoMem]>;
+def int_aarch64_bfi : DefaultAttrsIntrinsic<
+    [llvm_anyint_ty], [llvm_anyint_ty, llvm_anyint_ty, llvm_anyint_ty, llvm_anyint_ty],
+    [IntrNoMem, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 163ed520a8a677..1fe3f95d54d131 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -5230,6 +5230,19 @@ void AArch64DAGToDAGISel::Select(SDNode *Node) {
     switch (IntNo) {
     default:
       break;
+    case Intrinsic::aarch64_bfi: {
+      SDLoc DL(Node);
+      auto lsb = cast<ConstantSDNode>(Node->getOperand(3))->getZExtValue();
+      auto width = cast<ConstantSDNode>(Node->getOperand(4))->getZExtValue();
+      auto ImmR = (VT.getSizeInBits() - lsb) % VT.getSizeInBits();
+      auto ImmS = width - 1;
+      SDValue Ops[] = {Node->getOperand(1), Node->getOperand(2),
+                       CurDAG->getConstant(ImmR, DL, VT),
+                       CurDAG->getConstant(ImmS, DL, VT)};
+      unsigned Opc = (VT == MVT::i32) ? AArch64::BFMWri : AArch64::BFMXri;
+      CurDAG->SelectNodeTo(Node, Opc, VT, Ops);
+      return;
+    }
     case Intrinsic::aarch64_tagp:
       SelectTagP(Node);
       return;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 03baa7497615e3..afa911abad7982 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -2558,6 +2558,14 @@ def : Pat<(rotr GPR32:$Rn, (i64 imm0_31:$imm)),
 def : Pat<(rotr GPR64:$Rn, (i64 imm0_63:$imm)),
           (EXTRXrri GPR64:$Rn, GPR64:$Rn, imm0_63:$imm)>;
 
+def SDT_AArch64BFI_32bit : SDTypeProfile<1, 2, [SDTCisVT<0, i32>, SDTCisVT<1, i32>,
+                                                    SDTCisVT<2, i32>]>;
+def SDT_AArch64BFI_64bit : SDTypeProfile<1, 2, [SDTCisVT<0, i64>, SDTCisVT<1, i64>,
+                                                    SDTCisVT<2, i64>]>;
+
+def aarch64_bfiw  : SDNode<"AArch64::BFMWri",  SDT_AArch64BFI_32bit>;
+def aarch64_bfix  : SDNode<"AArch64::BFMXri",  SDT_AArch64BFI_64bit>;
+
 //===----------------------------------------------------------------------===//
 // Other bitfield immediate instructions.
 //===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/AArch64/bfi-64-intrinsic.ll b/llvm/test/CodeGen/AArch64/bfi-64-intrinsic.ll
new file mode 100644
index 00000000000000..11ecde6b6fab20
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/bfi-64-intrinsic.ll
@@ -0,0 +1,25 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-none-linux-gnu %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
+
+define i32 @f32(i32 %A, i32 %B) nounwind {
+; CHECK-LABEL: f32:
+; CHECK-GI:      // %bb.0:
+; CHECK-GI-NEXT: bfi w0, w1, #4, #2
+; CHECK-GI-NEXT: ret
+entry:
+  %tmp32 = call i32 @llvm.aarch64.bfi.i32(i32 %A, i32 %B, i32 4, i32 2)
+  ret i32 %tmp32
+}
+
+define i64 @f64(i64 %A, i64 %B) nounwind {
+; CHECK-LABEL: f64:
+; CHECK-GI:      // %bb.0:
+; CHECK-GI-NEXT: bfi x0, x1, #23, #8
+; CHECK-GI-NEXT: ret
+entry:
+  %tmp64 = call i64 @llvm.aarch64.bfi.i64(i64 %A, i64 %B, i64 23, i64 8)
+  ret i64 %tmp64
+}
+
+declare i32 @llvm.aarch64.bfi.i32(i32, i32, i32, i32)
+declare i64 @llvm.aarch64.bfi.i64(i64, i64, i64, i64)
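
The `AArch64ISelDAGToDAG.cpp` hunk above selects `BFMWri`/`BFMXri` and derives the two BFM immediates from the intrinsic's `lsb` and `width` operands. The sketch below restates that arithmetic as standalone C++ so it can be checked by hand; the struct `BfmImmediates` and the function `bfiToBfm` are hypothetical names used only here, and the range checks are an assumption about valid inputs that the hunk itself does not perform.

```cpp
#include <cassert>

// Hypothetical container for the two BFM immediates.
struct BfmImmediates {
  unsigned immr;  // rotate amount
  unsigned imms;  // index of the most significant bit of the source field
};

// Mirrors the computation in the diff:
//   ImmR = (size - lsb) % size;  ImmS = width - 1;
BfmImmediates bfiToBfm(unsigned regBits, unsigned lsb, unsigned width) {
  assert(regBits == 32 || regBits == 64);
  // Assumed validity checks: non-empty field that fits in the register.
  assert(width >= 1 && lsb < regBits && lsb + width <= regBits);
  return {(regBits - lsb) % regBits, width - 1};
}
```

With the operands from the two test functions, `bfi w0, w1, #4, #2` corresponds to `immr = 28, imms = 1`, and `bfi x0, x1, #23, #8` corresponds to `immr = 41, imms = 7`.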

inclyc

I also wonder whether this intrinsic is necessary for optimization or codegen. Why not use inline assembly instead?

inclyc · 2024-01-28T08:11:24Z

llvm/lib/Target/AArch64/AArch64InstrInfo.td

+def SDT_AArch64BFI_64bit : SDTypeProfile<1, 2, [SDTCisVT<0, i64>, SDTCisVT<1, i64>,
+                                                    SDTCisVT<2, i64>]>;
+
+def aarch64_bfiw  : SDNode<"AArch64::BFMWri",  SDT_AArch64BFI_32bit>;


(question) Could you explain why a new SDNode is created here? Is it necessary?

davemgreen · 2024-01-28T10:52:55Z

Hello. Can you explain why this is needed, as opposed to using the equivalent shift/and/ors?

RamaMalladiAWS · 2024-01-28T17:07:15Z

> Hello. Can you explain why this is needed, as opposed to using the equivalent shift/and/ors?

Hi @davemgreen, one of our customers requested that such an intrinsic be made available so that they could use it directly in their IR. The reasoning was to emit a single instruction, such as bfi, instead of a combination of shift, and, and or operations.

davemgreen · 2024-01-29T08:46:40Z

OK. We would not usually add intrinsics like this without a strong motivating case, that could not be optimized in some other way. It is better to use target independent options when available, and inline assembly is available as a fallback if it is really needed. But I would recommend that they use normal and/or/shift operations and let us know about places the compiler isn't optimizing them as well as it could be.

RamaMalladiAWS · 2024-01-29T11:00:56Z

> OK. We would not usually add intrinsics like this without a strong motivating case, that could not be optimized in some other way. It is better to use target independent options when available, and inline assembly is available as a fallback if it is really needed. But I would recommend that they use normal and/or/shift operations and let us know about places the compiler isn't optimizing them as well as it could be.

I completely agree with the approach, @davemgreen. In this case, the IR sequence wasn't optimized to a bfi. I can try to get a test case and create an issue for generating the bfi instruction. Thanks again.

RKSimon · 2024-01-29T11:13:31Z

@RamaMalladiAWS Do you have examples of the IR that fails to lower to BFI? These things often turn out to be either a missing middle-end canonicalization or maybe a case that could be added to existing pattern matching in the back-end.

RamaMalladiAWS · 2024-01-29T11:25:52Z

> @RamaMalladiAWS Do you have examples of the IR that fails to lower to BFI? These things often turn out to be either a missing middle-end canonicalization or maybe a case that could be added to existing pattern matching in the back-end.

Yes, @RKSimon, I will try to get some test cases in the next couple of days so that we can evaluate any issues. Thank you.

dwpan · 2024-01-30T19:04:43Z

> Hello. Can you explain why this is needed, as opposed to using the equivalent shift/and/ors?

In Verilog/SystemVerilog, the basic type is a bit or a bit vector of arbitrary length, and inserting and extracting bits are common language features. Introducing corresponding intrinsics could help lower these operations gradually and bring more optimization opportunities in llc. Otherwise, they have to be translated into many shift/and/or operations, and we then depend on pattern matching to recognize and optimize them.
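
For the extract side mentioned in this comment, a minimal sketch of what a frontend would emit with ordinary operations might look like the following. The function name `extractBits` is hypothetical, and the mapping to a Verilog indexed part-select such as `value[lsb +: width]` is only an illustrative assumption about how such a frontend could lower it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read `width` bits of `value` starting at bit `lsb`,
// zero-extended, using only shift and and.
uint64_t extractBits(uint64_t value, unsigned lsb, unsigned width) {
  // The field must be non-empty and lie inside the 64-bit value.
  assert(width >= 1 && lsb < 64 && lsb + width <= 64);
  // Build `width` low one-bits without shifting by 64 (undefined in C++).
  const uint64_t field =
      (width == 64) ? ~uint64_t{0} : ((uint64_t{1} << width) - 1);
  return (value >> lsb) & field;
}
```

Vectors wider than 64 bits would need several such operations per access, which is the multiplication of shift/and/or sequences this comment is concerned about.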

davemgreen · 2024-01-31T17:00:13Z

I see. The issue is that the opposite is often true as well - if we add a target specific intrinsic for this then, whilst we get a single instruction being emitted, we don't see all the other optimizations that the compiler can and should be performing.

Things like constant folding, combining into other instructions, known-bits analysis or any form of vectorization will all be blocked by the intrinsic. It can take quite some work to add all those features in (if they are possible), and without them the intrinsic can potentially lead to worse results. It is also more to maintain.
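
As an analogy for the constant-folding and known-bits points, the sketch below writes the insert from the PR's 32-bit test (`#4, #2`) with ordinary operations. C++ `constexpr` evaluation stands in for the optimizer here; it is not LLVM's own machinery, and the names `insert2At4` and `lowNibbleAfterInsert` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Insert the low 2 bits of `b` into bits [5:4] of `a` using plain and/or/shift.
constexpr uint32_t insert2At4(uint32_t a, uint32_t b) {
  return (a & ~(0x3u << 4)) | ((b & 0x3u) << 4);
}

// With constant inputs the whole expression can be evaluated at compile time.
static_assert(insert2At4(0u, 0x3u) == 0x30u, "folds to a constant");
static_assert(insert2At4(0xFFFFFFFFu, 0u) == 0xFFFFFFCFu, "folds to a constant");

// The masks are visible, so it is evident that bits [3:0] of the result
// always come from `a`, whatever `b` is.
uint32_t lowNibbleAfterInsert(uint32_t a, uint32_t b) {
  const uint32_t result = insert2At4(a, b) & 0xFu;
  assert(result == (a & 0xFu));
  return result;
}
```

An opaque target intrinsic call would expose neither fact unless dedicated folding and known-bits support were written for it.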

BFI isn't a trivial instruction to match, as it involves certain masks and shifts. There might certainly be advantages to having an intrinsic. I would like to try and see what the problems would be with generated code using normal operations first though, if we can. If there are optimizations we can make based on the existing code, then that would help in all cases (C, MLIR, Rust, etc.), not just frontends that are producing the intrinsics.

RamaMalladiAWS · 2024-03-01T23:02:59Z

We decided to close this PR as we found LLC code-gen was good for many of our test-cases. Thank you for the reviews.

llvmbot added backend:AArch64 llvm:ir labels Jan 27, 2024

Merge branch 'main' into arm-bfi

7fb33f5

inclyc reviewed Jan 28, 2024

inclyc changed the title ~~intrinsic to generate a bfi instruction~~ [AArch64] add intrinsic to generate a bfi instruction Jan 28, 2024

davemgreen mentioned this pull request Feb 11, 2024

[llvm][AArch64] intrinsic to generate a ubfx instruction #80103

Closed

RamaMalladiAWS closed this Mar 1, 2024
