-
Notifications
You must be signed in to change notification settings - Fork 15k
[BOLT] Add validation for direct call/branch targets, bypassing invalid functions #165406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[BOLT] Add validation for direct call/branch targets, bypassing invalid functions #165406
Conversation
|
@llvm/pr-subscribers-bolt Author: Jinjie Huang (Jinjie-Huang) ChangesIn some edge cases, a binary may contain direct We also encountered the problems where "data in code" within OpenSSL's hand-written assembly was misidentified as instructions, the same as described in this issue. The problem occurred because a data sequence was incorrectly disassembled as a "jb" instruction. So, this patch tries to address this by validating the instruction at the target address for direct branches and calls. If the target instruction is invalid(implies a corrupted control flow), and the function will be set ignored. Full diff: https://github.com/llvm/llvm-project/pull/165406.diff 2 Files Affected:
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index b215a1558cbb4..6fdc8336bf1b9 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -2236,6 +2236,13 @@ class BinaryFunction {
/// it is probably another function.
bool isSymbolValidInScope(const SymbolRef &Symbol, uint64_t SymbolSize) const;
+ /// Validates if the target of a direct branch/call is a valid
+ /// executable instruction.
+ /// Return true if the target is valid, false otherwise.
+ bool validateBranchTarget(uint64_t TargetAddress,
+ uint64_t AbsoluteInstrAddr,
+ const ArrayRef<uint8_t> &CurrentFunctionData);
+
/// Disassemble function from raw data.
/// If successful, this function will populate the list of instructions
/// for this function together with offsets from the function start
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 84023efe1084e..821a302d14ba1 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1283,6 +1283,41 @@ BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
return std::nullopt;
}
+bool BinaryFunction::validateBranchTarget(
+ uint64_t TargetAddress, uint64_t AbsoluteInstrAddr,
+ const ArrayRef<uint8_t> &CurrentFunctionData) {
+ if (auto *TargetFunc =
+ BC.getBinaryFunctionContainingAddress(TargetAddress)) {
+ const uint64_t TargetOffset = TargetAddress - TargetFunc->getAddress();
+ ArrayRef<uint8_t> TargetFunctionData;
+ // Check if the target address is within the current function.
+ if (TargetFunc == this) {
+ TargetFunctionData = CurrentFunctionData;
+ } else {
+ // external call/branch, fetch the binary data for target
+ ErrorOr<ArrayRef<uint8_t>> TargetDataOrErr = TargetFunc->getData();
+ assert(TargetDataOrErr && "function data is not available");
+ TargetFunctionData = *TargetDataOrErr;
+ }
+
+ MCInst TargetInst;
+ uint64_t TargetInstSize;
+ if (!BC.SymbolicDisAsm->getInstruction(
+ TargetInst, TargetInstSize, TargetFunctionData.slice(TargetOffset),
+ TargetAddress, nulls())) {
+ // If the target address cannot be disassembled well,
+ // it implies a corrupted control flow.
+ BC.errs() << "BOLT-WARNING: direct branch/call at 0x"
+ << Twine::utohexstr(AbsoluteInstrAddr) << " in function "
+ << *this << " targets an invalid instruction at 0x"
+ << Twine::utohexstr(TargetAddress) << "\n";
+ return false;
+ }
+ }
+
+ return true;
+}
+
Error BinaryFunction::disassemble() {
NamedRegionTimer T("disassemble", "Disassemble function", "buildfuncs",
"Build Binary Functions", opts::TimeBuild);
@@ -1396,6 +1431,11 @@ Error BinaryFunction::disassemble() {
uint64_t TargetAddress = 0;
if (MIB->evaluateBranch(Instruction, AbsoluteInstrAddr, Size,
TargetAddress)) {
+ if (!validateBranchTarget(TargetAddress, AbsoluteInstrAddr,
+ FunctionData)) {
+ setIgnored();
+ break;
+ }
// Check if the target is within the same function. Otherwise it's
// a call, possibly a tail call.
//
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
d981ff9 to
df40e1a
Compare
8978366 to
25b21b9
Compare
In some edge cases, a binary may contain direct
branchorcallinstructions whose target do not point to a valid executable instruction. This can occur due to compiler bugs, hand-written assembly, binary corruption, or when control flow targets a data by mistake.We also encountered the problems as described in this issue, where "data in code" within OpenSSL's hand-written assembly was misidentified as instructions(island identification seems fail due to the absence of a corresponding data symbol). The problem occurred because a data sequence was incorrectly disassembled as a "jb" instruction.
The point here is that the data should not be pointed to by any edge, so this patch tries to address this by validating the destination address for direct branches and calls. If the target instruction is invalid(implies a corrupted control flow), this function will be set ignored.
Although this approach appears helpful for addressing the 'data in code' problem, its validation might be compromised if the data can be disassembled as normal instruction.