s-barannikov
Contributor

The new extractBits() overloads differ from fieldFromInstruction() in that StartBit and NumBits are template parameters rather than function arguments.

Using them in the generated code significantly speeds up compilation in release builds (up to 4 times faster, depending on the compiler used).
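
To make the difference concrete, here is a minimal standalone sketch of the two call shapes for a plain unsigned integer. The names `lowBitsMask`, `fieldRuntime`, `fieldStatic` and `checkBothForms` are hypothetical stand-ins invented for this illustration (they are not LLVM APIs); the real helpers live in `llvm::MCD` and use `maskTrailingOnes`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Simplified stand-in for llvm::maskTrailingOnes<T>(N): the low N bits set.
// Assumes N does not exceed the number of bits in T.
template <typename T> constexpr T lowBitsMask(unsigned N) {
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  return N == 0 ? T(0)
                : T(T(~T(0)) >> (std::numeric_limits<T>::digits - N));
}

// Runtime-argument form, in the spirit of fieldFromInstruction().
template <typename T>
constexpr T fieldRuntime(T Insn, unsigned StartBit, unsigned NumBits) {
  return T((Insn >> StartBit) & lowBitsMask<T>(NumBits));
}

// Template-parameter form, in the spirit of the proposed extractBits().
template <unsigned StartBit, unsigned NumBits, typename T>
constexpr T fieldStatic(T Insn) {
  return T((Insn >> StartBit) & lowBitsMask<T>(NumBits));
}

// Both forms select bits [8, 12) of the same word.
static_assert(fieldRuntime<uint32_t>(0xABCD1234u, 8, 4) == 0x2u,
              "runtime-argument form");
static_assert(fieldStatic<8, 4>(uint32_t{0xABCD1234u}) == 0x2u,
              "template-parameter form");

// Runtime check that the two forms agree for an arbitrary word.
inline void checkBothForms(uint32_t Insn) {
  assert(fieldRuntime(Insn, 8, 4) == (fieldStatic<8, 4>(Insn)));
}
```

The only difference at the call site is where the bit range goes: `fieldRuntime(insn, 8, 4)` versus `fieldStatic<8, 4>(insn)`.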

@llvmbot llvmbot added tablegen llvm:mc Machine (object) code labels Sep 17, 2025
@llvmbot
Member

llvmbot commented Sep 17, 2025

@llvm/pr-subscribers-llvm-mc

Author: Sergei Barannikov (s-barannikov)

Full diff: https://github.com/llvm/llvm-project/pull/159405.diff

2 Files Affected:

  • (modified) llvm/include/llvm/MC/MCDecoder.h (+18)
  • (modified) llvm/utils/TableGen/DecoderEmitter.cpp (+3-4)
diff --git a/llvm/include/llvm/MC/MCDecoder.h b/llvm/include/llvm/MC/MCDecoder.h
index 175f6a9591558..770941da5f31c 100644
--- a/llvm/include/llvm/MC/MCDecoder.h
+++ b/llvm/include/llvm/MC/MCDecoder.h
@@ -58,6 +58,24 @@ uint64_t fieldFromInstruction(const std::bitset<N> &Insn, unsigned StartBit,
   return ((Insn >> StartBit) & Mask).to_ullong();
 }
 
+template <unsigned StartBit, unsigned NumBits, typename T>
+inline std::enable_if_t<std::is_unsigned_v<T>, T> extractBits(T Val) {
+  static_assert(StartBit + NumBits <= std::numeric_limits<T>::digits);
+  return (Val >> StartBit) & maskTrailingOnes<T>(NumBits);
+}
+
+template <unsigned StartBit, unsigned NumBits, size_t N>
+uint64_t extractBits(const std::bitset<N> &Val) {
+  static_assert(StartBit + NumBits <= N);
+  std::bitset<N> Mask = maskTrailingOnes<uint64_t>(NumBits);
+  return ((Val >> StartBit) & Mask).to_ullong();
+}
+
+template <unsigned StartBit, unsigned NumBits>
+uint64_t extractBits(const APInt &Val) {
+  return Val.extractBitsAsZExtValue(NumBits, StartBit);
+}
+
 } // namespace llvm::MCD
 
 #endif // LLVM_MC_MCDECODER_H
diff --git a/llvm/utils/TableGen/DecoderEmitter.cpp b/llvm/utils/TableGen/DecoderEmitter.cpp
index 3a464e01042dc..5a087eefaaf8d 100644
--- a/llvm/utils/TableGen/DecoderEmitter.cpp
+++ b/llvm/utils/TableGen/DecoderEmitter.cpp
@@ -1031,8 +1031,7 @@ static void emitBinaryParser(raw_ostream &OS, indent Indent,
     // One variable part and no/zero constant part. Initialize `tmp` with the
     // variable part.
     auto [Base, Width, Offset] = OpInfo.fields().front();
-    OS << Indent << "tmp = fieldFromInstruction(insn, " << Base << ", " << Width
-       << ')';
+    OS << Indent << "tmp = extractBits<" << Base << ", " << Width << ">(insn)";
     if (Offset)
       OS << " << " << Offset;
     OS << ";\n";
@@ -1042,8 +1041,8 @@ static void emitBinaryParser(raw_ostream &OS, indent Indent,
     OS << Indent << "tmp = " << format_hex(OpInfo.InitValue.value_or(0), 0)
        << ";\n";
     for (auto [Base, Width, Offset] : OpInfo.fields()) {
-      OS << Indent << "tmp |= fieldFromInstruction(insn, " << Base << ", "
-         << Width << ')';
+      OS << Indent << "tmp |= extractBits<" << Base << ", " << Width
+         << ">(insn)";
       if (Offset)
         OS << " << " << Offset;
       OS << ";\n";
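
As a rough illustration of what the patched emitter prints, the sketch below mimics the multi-field loop with a `std::ostringstream` in place of `raw_ostream`. The `Field` struct and `emitFieldExtraction` are hypothetical stand-ins for this example only; indentation and the initial-value line are omitted.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of OpInfo.fields(): Width bits starting
// at instruction bit Base are placed at bit Offset of the operand.
struct Field {
  unsigned Base;
  unsigned Width;
  unsigned Offset;
};

// Mirrors the shape of the multi-field loop in the patched emitBinaryParser().
std::string emitFieldExtraction(const std::vector<Field> &Fields) {
  std::ostringstream OS;
  for (const Field &F : Fields) {
    OS << "tmp |= extractBits<" << F.Base << ", " << F.Width << ">(insn)";
    if (F.Offset) // A zero offset needs no shift.
      OS << " << " << F.Offset;
    OS << ";\n";
  }
  return OS.str();
}

// Example: an operand assembled from instruction bits [0, 4) and [8, 12).
inline void checkEmittedText() {
  assert(emitFieldExtraction({{0, 4, 0}, {8, 4, 4}}) ==
         "tmp |= extractBits<0, 4>(insn);\n"
         "tmp |= extractBits<8, 4>(insn) << 4;\n");
}
```

Before the patch, the same two fields would have been emitted as `fieldFromInstruction(insn, 0, 4)` and `fieldFromInstruction(insn, 8, 4) << 4`.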

@llvmbot
Member

llvmbot commented Sep 17, 2025

@llvm/pr-subscribers-tablegen

The new `extractBits()` overloads differ from `fieldFromInstruction()` in
that `StartBit` and `NumBits` are template parameters.

Using them in the generated code significantly speeds up compilation in
release builds (up to 4 times faster, depending on the compiler used).
@s-barannikov s-barannikov force-pushed the tablegen/decoder/extract-bits branch from 7ddc9c1 to 07c9725 Compare September 17, 2025 17:15
@s-barannikov
Contributor Author

I'll have to drop the static_asserts because they trigger on code that will never be executed (remember that decodeToMCInst has switch cases for all instruction widths). It would be nice if we enabled --specialize-decoders-per-bitwidth unconditionally; then the asserts could be helpful.
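
A simplified sketch of the problem, under the assumption that one templated decoder covers encodings of several widths: `decodeOperand` below is a hypothetical stand-in for `decodeToMCInst`, and `extractBits` is a reduced version of the integer overload with the assertion commented out.

```cpp
#include <cassert>
#include <cstdint>

// Reduced stand-in for the integer overload of extractBits().
// Assumes StartBit < 64 and NumBits < 64.
template <unsigned StartBit, unsigned NumBits, typename T>
uint64_t extractBits(T Val) {
  // With the patch's check enabled, instantiating this for a too-narrow T is
  // a hard compile error, even if the call can never run:
  // static_assert(StartBit + NumBits <= std::numeric_limits<T>::digits);
  return (uint64_t(Val) >> StartBit) & ((uint64_t(1) << NumBits) - 1);
}

// One template serves every instruction width, so every case is instantiated
// for every InsnType, including cases that only apply to wider encodings.
template <typename InsnType>
uint64_t decodeOperand(unsigned DecoderIdx, InsnType Insn) {
  switch (DecoderIdx) {
  case 0: // Operand of a 16-bit encoding.
    return extractBits<4, 4>(Insn);
  case 1: // Operand of a 32-bit encoding; never reached for uint16_t, but
          // extractBits<20, 8, uint16_t> is still instantiated.
    return extractBits<20, 8>(Insn);
  default:
    return 0;
  }
}

inline void checkDecodeOperand() {
  assert(decodeOperand<uint16_t>(0, uint16_t{0x1234}) == 0x3);
  assert(decodeOperand<uint32_t>(1, 0xAB312345u) == 0xB3);
}
```

Instantiating `decodeOperand<uint16_t>` compiles case 1 as well, so a `static_assert` on the bit range would reject the whole function. If each bit width got its own decoder, as the comment suggests, the dead case would not exist and the check could stay.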

@jurahul
Contributor

jurahul commented Sep 17, 2025

Any idea why compilation is faster? I'd expect templating code will slow down compilation

@s-barannikov
Contributor Author

> Any idea why compilation is faster? I'd expect templating code will slow down compilation

I've seen InstCombine, SimplifyCFG, RegisterCoalescer and TwoAddressInstructionPass in the time report. I'll try to collect more data.
Can you try this on your downstream target without use-fn-table-in-decode-to-mcinst and see if it is still helpful?

@jurahul
Contributor

jurahul commented Sep 17, 2025

> > Any idea why compilation is faster? I'd expect templating code will slow down compilation
>
> I've seen InstCombine, SimplifyCFG, RegisterCoalescer and TwoAddressInstructionPass in the time report. I'll try to collect more data. Can you try this on your downstream target without use-fn-table-in-decode-to-mcinst and see if it is still helpful?

Nah, I was just curious. Trying this on our downstream target is not that easy, as we are currently quite a bit behind upstream.

Member

@lenary lenary left a comment

LGTM

@s-barannikov
Contributor Author

Sorry, I can no longer reproduce the announced speedup. There is a difference, but it is small.
Either I had some unstaged changes that made the difference, or I just compared different things. 🤦‍♂️ The signal was stable though :)

@s-barannikov
Contributor Author

It looks like I compared builds with and without the --specialize-decoders-per-bitwidth option; at least, the numbers look similar to what I was getting.
Closing.

@s-barannikov s-barannikov deleted the tablegen/decoder/extract-bits branch September 24, 2025 08:46