-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit #169630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit #169630
Conversation
Created using spr 1.3.7
|
@llvm/pr-subscribers-lldb Author: Felipe de Azevedo Piovezan (felipepiovezan) ChangesCurrently, UnwindAssemblyInstEmulation visits instructions in the order
() Only push if this instruction hasn't been enqueued before. Note that:
This means that consecutive instructions are visited one after the The main reason this patch is NFCI and not NFC is that, now, the The motivation for this patch is to change step 2.2 so that it only commit-id:dce6b515 Full diff: https://github.com/llvm/llvm-project/pull/169630.diff 1 Files Affected:
diff --git a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
index b9d665902dc45..074746c0fab3d 100644
--- a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
+++ b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
@@ -25,6 +25,8 @@
#include "lldb/Utility/Log.h"
#include "lldb/Utility/Status.h"
#include "lldb/Utility/StreamString.h"
+#include "llvm/ADT/SmallSet.h"
+#include <deque>
using namespace lldb;
using namespace lldb_private;
@@ -150,29 +152,38 @@ bool UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly(
EmulateInstruction::InstructionCondition last_condition =
EmulateInstruction::UnconditionalCondition;
- for (const InstructionSP &inst : inst_list.Instructions()) {
- if (!inst)
- continue;
- DumpInstToLog(log, *inst, inst_list);
+ std::deque<std::size_t> to_visit = {0};
+ llvm::SmallSet<std::size_t, 0> enqueued = {0};
+
+ // Instructions reachable through jumps are inserted on the front.
+ // The next instruction in inserted on the back.
+ // Pop from the back to ensure non-branching instructions are visited
+ // sequentially.
+ while (!to_visit.empty()) {
+ std::size_t current_index = to_visit.back();
+ Instruction &inst = *inst_list.GetInstructionAtIndex(current_index);
+ to_visit.pop_back();
+ DumpInstToLog(log, inst, inst_list);
m_curr_row_modified = false;
m_forward_branch_offset = 0;
lldb::addr_t current_offset =
- inst->GetAddress().GetFileAddress() - base_addr;
+ inst.GetAddress().GetFileAddress() - base_addr;
auto it = saved_unwind_states.upper_bound(current_offset);
assert(it != saved_unwind_states.begin() &&
"Unwind row for the function entry missing");
--it; // Move it to the row corresponding to the current offset
- // If the offset of m_state.row doesn't match with the offset we see in
- // saved_unwind_states then we have to update current unwind state to
- // the saved values. It is happening after we processed an epilogue and a
- // return to caller instruction.
+ // When state is forwarded through a branch, the offset of m_state.row is
+ // different from the offset available in saved_unwind_states. Use the
+ // forwarded state in this case, as the previous instruction may have been
+ // an unconditional jump.
+ // FIXME: this assignment can always be done unconditionally.
if (it->second.row.GetOffset() != m_state.row.GetOffset())
m_state = it->second;
- m_inst_emulator_up->SetInstruction(inst->GetOpcode(), inst->GetAddress(),
+ m_inst_emulator_up->SetInstruction(inst.GetOpcode(), inst.GetAddress(),
nullptr);
const EmulateInstruction::InstructionCondition new_condition =
m_inst_emulator_up->GetInstructionCondition();
@@ -199,13 +210,21 @@ bool UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly(
// If the current instruction is a branch forward then save the current
// CFI information for the offset where we are branching.
+ Address branch_address = inst.GetAddress();
+ branch_address.Slide(m_forward_branch_offset);
if (m_forward_branch_offset != 0 &&
- range.ContainsFileAddress(inst->GetAddress().GetFileAddress() +
- m_forward_branch_offset)) {
+ range.ContainsFileAddress(branch_address.GetFileAddress())) {
if (auto [it, inserted] = saved_unwind_states.emplace(
current_offset + m_forward_branch_offset, m_state);
- inserted)
+ inserted) {
it->second.row.SetOffset(current_offset + m_forward_branch_offset);
+ if (std::size_t dest_instr_index =
+ inst_list.GetIndexOfInstructionAtAddress(branch_address);
+ dest_instr_index < inst_list.GetSize()) {
+ to_visit.push_front(dest_instr_index);
+ enqueued.insert(dest_instr_index);
+ }
+ }
}
// Were there any changes to the CFI while evaluating this instruction?
@@ -213,12 +232,17 @@ bool UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly(
// Save the modified row if we don't already have a CFI row in the
// current address
const lldb::addr_t next_inst_offset =
- current_offset + inst->GetOpcode().GetByteSize();
+ current_offset + inst.GetOpcode().GetByteSize();
if (saved_unwind_states.count(next_inst_offset) == 0) {
m_state.row.SetOffset(next_inst_offset);
saved_unwind_states.emplace(next_inst_offset, m_state);
}
}
+
+ const size_t next_idx = current_index + 1;
+ const bool never_enqueued = enqueued.insert(next_idx).second;
+ if (never_enqueued && next_idx < inst_list.GetSize())
+ to_visit.push_back(next_idx);
}
for (auto &[_, state] : saved_unwind_states)
|
|
Argh, I wish github stopped formatting commit messages when displaying them as the PR message. |
lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
Outdated
Show resolved
Hide resolved
|
Some of the |
DavidSpickett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see anything worrying, but Jason's the expert in this area.
| // Pop from the back to ensure non-branching instructions are visited | ||
| // sequentially. | ||
| while (!to_visit.empty()) { | ||
| std::size_t current_index = to_visit.back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use pop_back here. So that there's one less line where the deque is in the in-between state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pop_back does not return the value popped. You might be thinking of pop_back_val, which is an LLVM ADT function available in things like SmallVector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I just forgot that C++ be weird sometimes.
I'd put the pop_back() immediately after the back() to make it clear there isn't a state in this code where you rely on having a copy of the index but that index still being the back of the queue.
But it's only one line away so do whatever you prefer.
Surprising to me there isn't a pop_back that returns a value but if there were, you'd be "paying" for the copy and hoping the compiler was smart enough to discard it. And you know how C++ is with paying for things you don't use.
| // Pop from the back to ensure non-branching instructions are visited | ||
| // sequentially. | ||
| while (!to_visit.empty()) { | ||
| std::size_t current_index = to_visit.back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current index remains unmodified, right?
| std::size_t current_index = to_visit.back(); | |
| const std::size_t current_index = to_visit.back(); |
| std::deque<std::size_t> to_visit = {0}; | ||
| llvm::SmallSet<std::size_t, 0> enqueued = {0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of initializing them like this? Why not
std::deque<std::size_t> to_visit;
llvm::SmallSet<std::size_t, 0> enqueued;
This seems needlessly confusing. Now I'm forced to double check that this is initializing both to zero elements and not adding a zero to the list (i.e. = {{0}}).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding zero to the list! Otherwise the loop below would never be entered 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, okay case in point that it's confusing, but at least it's confusing for the right reasons. My mistake was that for it to call the ctor it would have to omit the =.
Created using spr 1.3.7
|
It is the thanksgiving holiday in the US, so I will ping Jason next week |
Currently, UnwindAssemblyInstEmulation visits instructions in the order
in which they appear in a function. This commit makes an NFCI change to
UnwindAssemblyInstEmulation so that it follows the function's CFG:
2.1 Visit the instruction in the back queue to compute the new unwind
state.
2.2 Push(+) the next instruction to the back of the queue.
2.3 If the instruction is a forward branch with a known branch target,
push(+) the destination instruction to the front of the queue.
(+) Only push if this instruction hasn't been enqueued before.
(+) When pushing an instruction, the current unwind state is attached to
it.
Note that:
This means that consecutive instructions are visited one after the
other; this is important to support "conditional blocks" 1 of
instructions (see the line with "if last_condition != new_condition").
This is arguably a very Thumb specific thing, so maybe it shouldn't be
in the generic algorithm; that said, it is already in the code, so we
have to support it.
The main reason this patch is NFCI and not NFC is that, now, the
destination of a forward branch is visited in a slightly different
moment than before. This should not cause any changes in output, as if a
branch destination is reachable through two different paths, any well
behaved compiler will generate the same unwind state in both paths.
The motivation for this patch is to change step 2.2 so that it only
pushes the next instruction if the current instruction is not an
unconditional branch / return, and to change step 2.3 so that backwards
branches are also allowed, fixing the bug described by 2.
Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit #169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation #169631
[lldb] Add DisassemblerLLVMC::IsBarrier API #169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633
commit-id:dce6b515