[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit #169630

felipepiovezan · 2025-11-26T10:55:16Z

Currently, UnwindAssemblyInstEmulation visits instructions in the order
in which they appear in a function. This commit makes an NFCI change to
UnwindAssemblyInstEmulation so that it follows the function's CFG:

The first instruction is enqueued.
While the queue is not empty:
2.1 Visit the instruction in the back queue to compute the new unwind
state.
2.2 Push(+) the next instruction to the back of the queue.
2.3 If the instruction is a forward branch with a known branch target,
push(+) the destination instruction to the front of the queue.

(+) Only push if this instruction hasn't been enqueued before.
(+) When pushing an instruction, the current unwind state is attached to
it.

Note that:

the "next instruction" is pushed to the back of the queue,
a branch target is pushed to the front of the queue, and
we always dequeue from the back of the queue.

This means that consecutive instructions are visited one after the
other; this is important to support "conditional blocks" 1 of
instructions (see the line with "if last_condition != new_condition").
This is arguably a very Thumb specific thing, so maybe it shouldn't be
in the generic algorithm; that said, it is already in the code, so we
have to support it.

The main reason this patch is NFCI and not NFC is that, now, the
destination of a forward branch is visited in a slightly different
moment than before. This should not cause any changes in output, as if a
branch destination is reachable through two different paths, any well
behaved compiler will generate the same unwind state in both paths.

The motivation for this patch is to change step 2.2 so that it only
pushes the next instruction if the current instruction is not an
unconditional branch / return, and to change step 2.3 so that backwards
branches are also allowed, fixing the bug described by 2.

Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit #169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation #169631
[lldb] Add DisassemblerLLVMC::IsBarrier API #169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633

commit-id:dce6b515

Created using spr 1.3.7

llvmbot · 2025-11-26T10:55:55Z

@llvm/pr-subscribers-lldb

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

Currently, UnwindAssemblyInstEmulation visits instructions in the order
in which they appear in a function. This commit makes an NFCI change to
UnwindAssemblyInstEmulation so that it follows the function's CFG:

The first instruction is enqueued.
While the queue is not empty:
2.1 Visit the instruction in the back queue to compute the new unwind
state.
2.2 Push() the next instruction to the back of the queue.
2.3 If the instruction is a forward branch with a known branch target,
push() the destination instruction to the front of the queue.

() Only push if this instruction hasn't been enqueued before.
() When pushing an instruction, the current unwind state is attached to
it.

Note that:

the "next instruction" is pushed to the back of the queue,
a branch target is pushed to the front of the queue, and
we always dequeue from the back of the queue.

This means that consecutive instructions are visited one after the
other; this is important to support "conditional blocks" 1 of
instructions (see the line with "if last_condition != new_condition").
This is arguably a very Thumb specific thing, so maybe it shouldn't be
in the generic algorithm; that said, it is already in the code, so we
have to support it.

The main reason this patch is NFCI and not NFC is that, now, the
destination of a forward branch is visited in a slightly different
moment than before. This should not cause any changes in output, as if a
branch destination is reachable through two different paths, any well
behaved compiler will generate the same unwind state in both paths.

The motivation for this patch is to change step 2.2 so that it only
pushes the next instruction if the current instruction is not an
unconditional branch / return, and to change step 2.3 so that backwards
branches are also allowed, fixing the bug described by 2.

commit-id:dce6b515

Full diff: https://github.com/llvm/llvm-project/pull/169630.diff

1 Files Affected:

(modified) lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp (+38-14)

diff --git a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
index b9d665902dc45..074746c0fab3d 100644
--- a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
+++ b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
@@ -25,6 +25,8 @@
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/Status.h"
 #include "lldb/Utility/StreamString.h"
+#include "llvm/ADT/SmallSet.h"
+#include <deque>
 
 using namespace lldb;
 using namespace lldb_private;
@@ -150,29 +152,38 @@ bool UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly(
   EmulateInstruction::InstructionCondition last_condition =
       EmulateInstruction::UnconditionalCondition;
 
-  for (const InstructionSP &inst : inst_list.Instructions()) {
-    if (!inst)
-      continue;
-    DumpInstToLog(log, *inst, inst_list);
+  std::deque<std::size_t> to_visit = {0};
+  llvm::SmallSet<std::size_t, 0> enqueued = {0};
+
+  // Instructions reachable through jumps are inserted on the front.
+  // The next instruction in inserted on the back.
+  // Pop from the back to ensure non-branching instructions are visited
+  // sequentially.
+  while (!to_visit.empty()) {
+    std::size_t current_index = to_visit.back();
+    Instruction &inst = *inst_list.GetInstructionAtIndex(current_index);
+    to_visit.pop_back();
+    DumpInstToLog(log, inst, inst_list);
 
     m_curr_row_modified = false;
     m_forward_branch_offset = 0;
 
     lldb::addr_t current_offset =
-        inst->GetAddress().GetFileAddress() - base_addr;
+        inst.GetAddress().GetFileAddress() - base_addr;
     auto it = saved_unwind_states.upper_bound(current_offset);
     assert(it != saved_unwind_states.begin() &&
            "Unwind row for the function entry missing");
     --it; // Move it to the row corresponding to the current offset
 
-    // If the offset of m_state.row doesn't match with the offset we see in
-    // saved_unwind_states then we have to update current unwind state to
-    // the saved values. It is happening after we processed an epilogue and a
-    // return to caller instruction.
+    // When state is forwarded through a branch, the offset of m_state.row is
+    // different from the offset available in saved_unwind_states. Use the
+    // forwarded state in this case, as the previous instruction may have been
+    // an unconditional jump.
+    // FIXME: this assignment can always be done unconditionally.
     if (it->second.row.GetOffset() != m_state.row.GetOffset())
       m_state = it->second;
 
-    m_inst_emulator_up->SetInstruction(inst->GetOpcode(), inst->GetAddress(),
+    m_inst_emulator_up->SetInstruction(inst.GetOpcode(), inst.GetAddress(),
                                        nullptr);
     const EmulateInstruction::InstructionCondition new_condition =
         m_inst_emulator_up->GetInstructionCondition();
@@ -199,13 +210,21 @@ bool UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly(
 
     // If the current instruction is a branch forward then save the current
     // CFI information for the offset where we are branching.
+    Address branch_address = inst.GetAddress();
+    branch_address.Slide(m_forward_branch_offset);
     if (m_forward_branch_offset != 0 &&
-        range.ContainsFileAddress(inst->GetAddress().GetFileAddress() +
-                                  m_forward_branch_offset)) {
+        range.ContainsFileAddress(branch_address.GetFileAddress())) {
       if (auto [it, inserted] = saved_unwind_states.emplace(
               current_offset + m_forward_branch_offset, m_state);
-          inserted)
+          inserted) {
         it->second.row.SetOffset(current_offset + m_forward_branch_offset);
+        if (std::size_t dest_instr_index =
+                inst_list.GetIndexOfInstructionAtAddress(branch_address);
+            dest_instr_index < inst_list.GetSize()) {
+          to_visit.push_front(dest_instr_index);
+          enqueued.insert(dest_instr_index);
+        }
+      }
     }
 
     // Were there any changes to the CFI while evaluating this instruction?
@@ -213,12 +232,17 @@ bool UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly(
       // Save the modified row if we don't already have a CFI row in the
       // current address
       const lldb::addr_t next_inst_offset =
-          current_offset + inst->GetOpcode().GetByteSize();
+          current_offset + inst.GetOpcode().GetByteSize();
       if (saved_unwind_states.count(next_inst_offset) == 0) {
         m_state.row.SetOffset(next_inst_offset);
         saved_unwind_states.emplace(next_inst_offset, m_state);
       }
     }
+
+    const size_t next_idx = current_index + 1;
+    const bool never_enqueued = enqueued.insert(next_idx).second;
+    if (never_enqueued && next_idx < inst_list.GetSize())
+      to_visit.push_back(next_idx);
   }
 
   for (auto &[_, state] : saved_unwind_states)

felipepiovezan · 2025-11-26T11:02:45Z

Argh, I wish github stopped formatting commit messages when displaying them as the PR message.

lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp

DavidSpickett · 2025-11-26T13:41:06Z

Some of the * in the PR description got eaten by the Markdown renderer. You could use +, something that isn't meaningful for Markdown.

DavidSpickett

Didn't see anything worrying, but Jason's the expert in this area.

DavidSpickett · 2025-11-26T13:43:31Z

lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp

+  // Pop from the back to ensure non-branching instructions are visited
+  // sequentially.
+  while (!to_visit.empty()) {
+    std::size_t current_index = to_visit.back();


Use pop_back here. So that there's one less line where the deque is in the in-between state.

pop_back does not return the value popped. You might be thinking of pop_back_val, which is an LLVM ADT function available in things like SmallVector

No I just forgot that C++ be weird sometimes.

I'd put the pop_back() immediately after the back() to make it clear there isn't a state in this code where you rely on having a copy of the index but that index still being the back of the queue.

But it's only one line away so do whatever you prefer.

Surprising to me there isn't a pop_back that returns a value but if there were, you'd be "paying" for the copy and hoping the compiler was smart enough to discard it. And you know how C++ is with paying for things you don't use.

JDevlieghere · 2025-11-26T19:42:42Z

lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp

+  // Pop from the back to ensure non-branching instructions are visited
+  // sequentially.
+  while (!to_visit.empty()) {
+    std::size_t current_index = to_visit.back();


The current index remains unmodified, right?

Suggested change

std::size_t current_index = to_visit.back();

const std::size_t current_index = to_visit.back();

JDevlieghere · 2025-11-26T19:46:27Z

lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp

+  std::deque<std::size_t> to_visit = {0};
+  llvm::SmallSet<std::size_t, 0> enqueued = {0};


What's the point of initializing them like this? Why not

std::deque<std::size_t> to_visit; llvm::SmallSet<std::size_t, 0> enqueued;

This seems needlessly confusing. Now I'm forced to double check that this is initializing both to zero elements and not adding a zero to the list (i.e. = {{0}}).

I am adding zero to the list! Otherwise the loop below would never be entered 😅

Hah, okay case in point that it's confusing, but at least it's confusing for the right reasons. My mistake was that for it to call the ctor it would have to omit the =.

Created using spr 1.3.7

felipepiovezan · 2025-11-27T09:55:53Z

It is the thanksgiving holiday in the US, so I will ping Jason next week

[𝘀𝗽𝗿] initial version

9295ef8

Created using spr 1.3.7

felipepiovezan requested a review from JDevlieghere as a code owner November 26, 2025 10:55

llvmbot added the lldb label Nov 26, 2025

This was referenced Nov 26, 2025

[lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation #169631

Open

[lldb] Add DisassemblerLLVMC::IsBarrier API #169632

Open

[lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633

Open

felipepiovezan requested review from DavidSpickett and jasonmolenda November 26, 2025 11:21

felipepiovezan commented Nov 26, 2025

View reviewed changes

lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp Outdated Show resolved Hide resolved

DavidSpickett reviewed Nov 26, 2025

View reviewed changes

JDevlieghere reviewed Nov 26, 2025

View reviewed changes

Address review comments

f00f52c

Created using spr 1.3.7

	std::size_t current_index = to_visit.back();
	const std::size_t current_index = to_visit.back();

		std::deque<std::size_t> to_visit = {0};
		llvm::SmallSet<std::size_t, 0> enqueued = {0};

[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit #169630

Are you sure you want to change the base?

[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit #169630

Uh oh!

Conversation

felipepiovezan commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 26, 2025

Uh oh!

felipepiovezan commented Nov 26, 2025

Uh oh!

Uh oh!

DavidSpickett commented Nov 26, 2025

Uh oh!

DavidSpickett left a comment

Choose a reason for hiding this comment

Uh oh!

DavidSpickett Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

felipepiovezan Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

DavidSpickett Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

JDevlieghere Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

JDevlieghere Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

felipepiovezan Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

JDevlieghere Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

felipepiovezan commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

felipepiovezan commented Nov 26, 2025 •

edited

Loading