[TableGen][DecoderEmitter] Add extractBits() overloads #159405

s-barannikov · 2025-09-17T17:14:20Z

These differ from fieldFromInstruction() in that StartBit and NumBits are template parameters.

Using them in the generated code significantly speeds up compilation in release builds (up to 4 times faster, depending on the compiler used).
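
A minimal, self-contained sketch of the distinction, assuming plain unsigned integers and `std::bitset` (no LLVM headers). `fieldFromInstructionSketch` takes the bit range as function arguments, like `fieldFromInstruction()`. `extractBitsSketch` takes it as template arguments, like the proposed `extractBits()`. All of these names, including the `lowBitsMask` helper standing in for `maskTrailingOnes`, are hypothetical and are not the LLVM implementation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for llvm::maskTrailingOnes<T>(N): a T with the low N
// bits set. Precondition: N <= digits of T. N == 0 is handled separately so
// the code never shifts by the full width of the type.
template <typename T> constexpr T lowBitsMask(unsigned N) {
  static_assert(std::is_unsigned_v<T>, "T must be an unsigned integer type");
  constexpr unsigned Digits = std::numeric_limits<T>::digits;
  return N == 0 ? T(0) : T(T(~T(0)) >> (Digits - N));
}

// Runtime-parameter form, shaped like fieldFromInstruction(Insn, StartBit,
// NumBits): the shift amount and mask depend on ordinary function arguments.
template <typename T>
T fieldFromInstructionSketch(T Insn, unsigned StartBit, unsigned NumBits) {
  static_assert(std::is_unsigned_v<T>, "T must be an unsigned integer type");
  constexpr unsigned Digits = std::numeric_limits<T>::digits;
  assert(StartBit < Digits && StartBit + NumBits <= Digits);
  return T((Insn >> StartBit) & lowBitsMask<T>(NumBits));
}

// Template-parameter form, shaped like the proposed extractBits<StartBit,
// NumBits>(): the range is checked at compile time and the mask is a
// constant expression in every instantiation.
template <unsigned StartBit, unsigned NumBits, typename T>
std::enable_if_t<std::is_unsigned_v<T>, T> extractBitsSketch(T Val) {
  constexpr unsigned Digits = std::numeric_limits<T>::digits;
  static_assert(StartBit < Digits && StartBit + NumBits <= Digits,
                "bit field out of range for this type");
  constexpr T Mask = lowBitsMask<T>(NumBits);
  return T((Val >> StartBit) & Mask);
}

// std::bitset overload for encodings wider than 64 bits; the field must fit
// in 64 bits because the result is returned via to_ullong().
template <unsigned StartBit, unsigned NumBits, std::size_t N>
std::uint64_t extractBitsSketch(const std::bitset<N> &Val) {
  static_assert(NumBits <= 64 && StartBit + NumBits <= N,
                "bit field out of range for this bitset");
  const std::bitset<N> Mask(lowBitsMask<std::uint64_t>(NumBits));
  return ((Val >> StartBit) & Mask).to_ullong();
}
```

In the template form, every `(StartBit, NumBits, T)` combination used by the generated decoder becomes its own instantiation with a compile-time-constant mask. The runtime form is a single function per `T`. For example, `extractBitsSketch<16, 8>(std::uint32_t{0x12AB34CD})` yields `0xAB`.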

llvmbot · 2025-09-17T17:15:00Z

@llvm/pr-subscribers-llvm-mc

Author: Sergei Barannikov (s-barannikov)

Changes

These differ from fieldFromInstruction() in that StartBit and NumBits are template parameters.

Using them in the generated code significantly speeds up compilation in release builds (up to 4 times faster, depending on the compiler used).

Full diff: https://github.com/llvm/llvm-project/pull/159405.diff

2 Files Affected:

(modified) llvm/include/llvm/MC/MCDecoder.h (+18)
(modified) llvm/utils/TableGen/DecoderEmitter.cpp (+3-4)

diff --git a/llvm/include/llvm/MC/MCDecoder.h b/llvm/include/llvm/MC/MCDecoder.h
index 175f6a9591558..770941da5f31c 100644
--- a/llvm/include/llvm/MC/MCDecoder.h
+++ b/llvm/include/llvm/MC/MCDecoder.h
@@ -58,6 +58,24 @@ uint64_t fieldFromInstruction(const std::bitset<N> &Insn, unsigned StartBit,
   return ((Insn >> StartBit) & Mask).to_ullong();
 }
 
+template <unsigned StartBit, unsigned NumBits, typename T>
+inline std::enable_if_t<std::is_unsigned_v<T>, T> extractBits(T Val) {
+  static_assert(StartBit + NumBits <= std::numeric_limits<T>::digits);
+  return (Val >> StartBit) & maskTrailingOnes<T>(NumBits);
+}
+
+template <unsigned StartBit, unsigned NumBits, size_t N>
+uint64_t extractBits(const std::bitset<N> &Val) {
+  static_assert(StartBit + NumBits <= N);
+  std::bitset<N> Mask = maskTrailingOnes<uint64_t>(NumBits);
+  return ((Val >> StartBit) & Mask).to_ullong();
+}
+
+template <unsigned StartBit, unsigned NumBits>
+uint64_t extractBits(const APInt &Val) {
+  return Val.extractBitsAsZExtValue(NumBits, StartBit);
+}
+
 } // namespace llvm::MCD
 
 #endif // LLVM_MC_MCDECODER_H
diff --git a/llvm/utils/TableGen/DecoderEmitter.cpp b/llvm/utils/TableGen/DecoderEmitter.cpp
index 3a464e01042dc..5a087eefaaf8d 100644
--- a/llvm/utils/TableGen/DecoderEmitter.cpp
+++ b/llvm/utils/TableGen/DecoderEmitter.cpp
@@ -1031,8 +1031,7 @@ static void emitBinaryParser(raw_ostream &OS, indent Indent,
     // One variable part and no/zero constant part. Initialize `tmp` with the
     // variable part.
     auto [Base, Width, Offset] = OpInfo.fields().front();
-    OS << Indent << "tmp = fieldFromInstruction(insn, " << Base << ", " << Width
-       << ')';
+    OS << Indent << "tmp = extractBits<" << Base << ", " << Width << ">(insn)";
     if (Offset)
       OS << " << " << Offset;
     OS << ";\n";
@@ -1042,8 +1041,8 @@ static void emitBinaryParser(raw_ostream &OS, indent Indent,
     OS << Indent << "tmp = " << format_hex(OpInfo.InitValue.value_or(0), 0)
        << ";\n";
     for (auto [Base, Width, Offset] : OpInfo.fields()) {
-      OS << Indent << "tmp |= fieldFromInstruction(insn, " << Base << ", "
-         << Width << ')';
+      OS << Indent << "tmp |= extractBits<" << Base << ", " << Width
+         << ">(insn)";
       if (Offset)
         OS << " << " << Offset;
       OS << ";\n";
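
The following simplified, hypothetical model shows what the changed `emitBinaryParser` lines produce. It prints the generated statements for an operand built from `(Base, Width, Offset)` fields. `FieldSketch` and `emitOperandSketch` are assumptions made to keep the example self-contained, as is the use of `std::ostringstream` in place of LLVM's `raw_ostream`, `indent`, and `format_hex`. Only the shape of the emitted text follows the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one (Base, Width, Offset) entry of OpInfo.fields().
struct FieldSketch {
  unsigned Base;   // lowest instruction bit of the field
  unsigned Width;  // number of bits in the field
  unsigned Offset; // left shift applied when placing the field in the operand
};

// Simplified model of the two emitBinaryParser branches touched by the diff.
// Returns the C++ statements that would be written into the generated decoder.
std::string emitOperandSketch(const std::vector<FieldSketch> &Fields,
                              std::optional<std::uint64_t> InitValue,
                              const std::string &Indent = "  ") {
  std::ostringstream OS;
  if (Fields.size() == 1 && InitValue.value_or(0) == 0) {
    // One variable part and no/zero constant part: initialize `tmp` directly.
    const FieldSketch &F = Fields.front();
    OS << Indent << "tmp = extractBits<" << F.Base << ", " << F.Width
       << ">(insn)";
    if (F.Offset)
      OS << " << " << F.Offset;
    OS << ";\n";
    return OS.str();
  }
  // Otherwise start from the constant part (printed in hex, like format_hex)
  // and OR in every variable part.
  OS << Indent << "tmp = 0x" << std::hex << InitValue.value_or(0) << std::dec
     << ";\n";
  for (const FieldSketch &F : Fields) {
    OS << Indent << "tmp |= extractBits<" << F.Base << ", " << F.Width
       << ">(insn)";
    if (F.Offset)
      OS << " << " << F.Offset;
    OS << ";\n";
  }
  return OS.str();
}
```

For example, fields `(0, 4, 0)` and `(8, 4, 4)` with no constant part produce `tmp = 0x0;`, `tmp |= extractBits<0, 4>(insn);` and `tmp |= extractBits<8, 4>(insn) << 4;`, each on its own line.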

llvmbot · 2025-09-17T17:15:00Z

@llvm/pr-subscribers-tablegen

llvm/include/llvm/MC/MCDecoder.h

s-barannikov · 2025-09-17T17:21:48Z

I'll have to drop the static_asserts because they trigger on code that is never executed (remember that decodeToMCInst has switch cases for all instruction widths). It would be nice if we enabled --specialize-decoders-per-bitwidth unconditionally; then the asserts could be helpful.
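
A minimal sketch of the problem, under simplified assumptions. `decodeFieldSketch` is a hypothetical stand-in for the generated `decodeToMCInst` switch. It is instantiated for both `uint32_t` and `uint64_t` and has one case that only makes sense for 64-bit encodings. The hypothetical `extractBitsLenient` shows one possible workaround, which is not what this PR does (the plan above is simply to drop the asserts). It uses `if constexpr` so that an out-of-range field still compiles and only trips a runtime `assert` if it is actually reached.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical variant of extractBits without a static_assert: a field that
// does not fit in T still compiles, but asserts (and returns 0 under NDEBUG)
// if that code path is ever executed.
template <unsigned StartBit, unsigned NumBits, typename T>
std::enable_if_t<std::is_unsigned_v<T>, T> extractBitsLenient(T Val) {
  constexpr unsigned Digits = std::numeric_limits<T>::digits;
  if constexpr (StartBit >= Digits || StartBit + NumBits > Digits) {
    // The out-of-range shift in the last branch is never instantiated here.
    (void)Val;
    assert(false && "bit field out of range for this instruction width");
    return 0;
  } else if constexpr (NumBits == Digits) {
    return Val; // Whole value (StartBit is 0); avoids shifting by the width.
  } else {
    constexpr T Mask = T((T(1) << NumBits) - 1);
    return T((Val >> StartBit) & Mask);
  }
}

// Hypothetical stand-in for a generated decoder switch that is instantiated
// for every instruction type. Case 1 reads bits [47:32] and is only meaningful
// for 64-bit encodings, yet it is still instantiated for uint32_t.
template <typename InsnType>
std::uint64_t decodeFieldSketch(unsigned Idx, InsnType insn) {
  switch (Idx) {
  case 0:
    return extractBitsLenient<0, 8>(insn);
  case 1:
    return extractBitsLenient<32, 16>(insn);
  default:
    return 0;
  }
}
```

Suppose a hard `static_assert(StartBit + NumBits <= digits)` replaced the `if constexpr` check. Then `decodeFieldSketch<std::uint32_t>` would be rejected at compile time because of case 1, even though that case is never reached for 32-bit instructions. If the decoders were always specialized per bit width, each specialization would only contain its own cases and the static_assert could stay.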

llvm/include/llvm/MC/MCDecoder.h

jurahul · 2025-09-17T17:23:19Z

Any idea why compilation is faster? I'd expect templating code will slow down compilation

s-barannikov · 2025-09-17T17:27:41Z

> Any idea why compilation is faster? I'd expect templating code will slow down compilation

I've seen InstCombine, SimplifyCFG, RegisterCoalescer and TwoAddressInstructionPass in the time report. I'll try to collect more data.
Can you try this on your downstream target without use-fn-table-in-decode-to-mcinst and see if it is still helpful?

jurahul · 2025-09-17T17:33:04Z

> > Any idea why compilation is faster? I'd expect templating code will slow down compilation
>
> I've seen InstCombine, SimplifyCFG, RegisterCoalescer and TwoAddressInstructionPass in the time report. I'll try to collect more data. Can you try this on your downstream target without use-fn-table-in-decode-to-mcinst and see if it is still helpful?

Nah, I was just curious. Trying this on our downstream target is not that easy, as we are currently quite a bit behind upstream.

lenary

LGTM

s-barannikov · 2025-09-17T18:26:59Z

Sorry, I can no longer reproduce the announced speedup. There is a difference, but it is small.
Either I had some unstaged changes that made the difference, or I just compared different things. 🤦‍♂️ The signal was stable though :)

s-barannikov · 2025-09-17T19:10:16Z

It looks like I compared with and without the --specialize-decoders-per-bitwidth option; at least, the numbers look similar to what I was getting.
Closing.

s-barannikov requested review from jayfoad, topperc and jurahul September 17, 2025 17:14

llvmbot added tablegen llvm:mc labels Sep 17, 2025

s-barannikov force-pushed the tablegen/decoder/extract-bits branch from 7ddc9c1 to 07c9725 September 17, 2025 17:15

lenary reviewed Sep 17, 2025

llvm/include/llvm/MC/MCDecoder.h

jurahul reviewed Sep 17, 2025

llvm/include/llvm/MC/MCDecoder.h

lenary approved these changes Sep 17, 2025

s-barannikov closed this Sep 17, 2025

s-barannikov deleted the tablegen/decoder/extract-bits branch September 24, 2025 08:46
