Skip to content

Conversation

@Jinjie-Huang
Copy link
Contributor

@Jinjie-Huang Jinjie-Huang commented Oct 28, 2025

In some edge cases, a binary may contain direct branch or call instructions whose target do not point to a valid executable instruction. This can occur due to compiler bugs, hand-written assembly, binary corruption, or when control flow targets a data by mistake.

We also encountered the problems as described in this issue, where "data in code" within OpenSSL's hand-written assembly was misidentified as instructions(island identification seems fail due to the absence of a corresponding data symbol). The problem occurred because a data sequence was incorrectly disassembled as a "jb" instruction.
The point here is that the data should not be pointed to by any edge, so this patch tries to address this by validating the destination address for direct branches and calls. If the target instruction is invalid(implies a corrupted control flow), this function will be set ignored.

Although this approach appears helpful for addressing the 'data in code' problem, its validation might be compromised if the data can be disassembled as normal instruction.

@llvmbot
Copy link
Member

llvmbot commented Oct 28, 2025

@llvm/pr-subscribers-bolt

Author: Jinjie Huang (Jinjie-Huang)

Changes

In some edge cases, a binary may contain direct branch or call instructions whose targets do not point to the beginning of a valid, executable instruction. This can occur due to compiler bugs, hand-written assembly, binary corruption, or when control flow targets a data section by mistake.

We also encountered the problems where "data in code" within OpenSSL's hand-written assembly was misidentified as instructions, the same as described in this issue.

The problem occurred because a data sequence was incorrectly disassembled as a "jb" instruction. So, this patch tries to address this by validating the instruction at the target address for direct branches and calls. If the target instruction is invalid(implies a corrupted control flow), and the function will be set ignored.


Full diff: https://github.com/llvm/llvm-project/pull/165406.diff

2 Files Affected:

  • (modified) bolt/include/bolt/Core/BinaryFunction.h (+7)
  • (modified) bolt/lib/Core/BinaryFunction.cpp (+40)
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index b215a1558cbb4..6fdc8336bf1b9 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -2236,6 +2236,13 @@ class BinaryFunction {
   /// it is probably another function.
   bool isSymbolValidInScope(const SymbolRef &Symbol, uint64_t SymbolSize) const;
 
+  /// Validates if the target of a direct branch/call is a valid
+  /// executable instruction.
+  /// Return true if the target is valid, false otherwise.
+  bool validateBranchTarget(uint64_t TargetAddress,
+                            uint64_t AbsoluteInstrAddr,
+                            const ArrayRef<uint8_t> &CurrentFunctionData);
+
   /// Disassemble function from raw data.
   /// If successful, this function will populate the list of instructions
   /// for this function together with offsets from the function start
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 84023efe1084e..821a302d14ba1 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1283,6 +1283,41 @@ BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
   return std::nullopt;
 }
 
+bool BinaryFunction::validateBranchTarget(
+    uint64_t TargetAddress, uint64_t AbsoluteInstrAddr,
+    const ArrayRef<uint8_t> &CurrentFunctionData) {
+  if (auto *TargetFunc =
+          BC.getBinaryFunctionContainingAddress(TargetAddress)) {
+    const uint64_t TargetOffset = TargetAddress - TargetFunc->getAddress();
+    ArrayRef<uint8_t> TargetFunctionData;
+    // Check if the target address is within the current function.
+    if (TargetFunc == this) {
+      TargetFunctionData = CurrentFunctionData;
+    } else {
+      // external call/branch, fetch the binary data for target
+      ErrorOr<ArrayRef<uint8_t>> TargetDataOrErr = TargetFunc->getData();
+      assert(TargetDataOrErr && "function data is not available");
+      TargetFunctionData = *TargetDataOrErr;
+    }
+
+    MCInst TargetInst;
+    uint64_t TargetInstSize;
+    if (!BC.SymbolicDisAsm->getInstruction(
+            TargetInst, TargetInstSize, TargetFunctionData.slice(TargetOffset),
+            TargetAddress, nulls())) {
+      // If the target address cannot be disassembled well,
+      // it implies a corrupted control flow.
+      BC.errs() << "BOLT-WARNING: direct branch/call at 0x"
+                << Twine::utohexstr(AbsoluteInstrAddr) << " in function "
+                << *this << " targets an invalid instruction at 0x"
+                << Twine::utohexstr(TargetAddress) << "\n";
+      return false;
+    }
+  }
+
+  return true;
+}
+
 Error BinaryFunction::disassemble() {
   NamedRegionTimer T("disassemble", "Disassemble function", "buildfuncs",
                      "Build Binary Functions", opts::TimeBuild);
@@ -1396,6 +1431,11 @@ Error BinaryFunction::disassemble() {
       uint64_t TargetAddress = 0;
       if (MIB->evaluateBranch(Instruction, AbsoluteInstrAddr, Size,
                               TargetAddress)) {
+        if (!validateBranchTarget(TargetAddress, AbsoluteInstrAddr,
+                                  FunctionData)) {
+          setIgnored();
+          break;
+        }
         // Check if the target is within the same function. Otherwise it's
         // a call, possibly a tail call.
         //

@github-actions
Copy link

github-actions bot commented Oct 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Jinjie-Huang Jinjie-Huang marked this pull request as draft October 29, 2025 03:05
@Jinjie-Huang Jinjie-Huang changed the title [BOLT] Validate that direct branch/call targets are valid instructions [BOLT] Validate direct call/branch targets, ignore those invalid functions Oct 29, 2025
@Jinjie-Huang Jinjie-Huang changed the title [BOLT] Validate direct call/branch targets, ignore those invalid functions [BOLT] Add validation for direct call/branch targets, bypassing invalid functions. Oct 29, 2025
@Jinjie-Huang Jinjie-Huang changed the title [BOLT] Add validation for direct call/branch targets, bypassing invalid functions. [BOLT] Add validation for direct call/branch targets, bypassing invalid functions Oct 29, 2025
@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_validata_branch_target branch 2 times, most recently from d981ff9 to df40e1a Compare October 29, 2025 15:04
@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_validata_branch_target branch from 8978366 to 25b21b9 Compare October 31, 2025 03:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants