Skip to content

[lldb] Add DWARFExpressionEntry and GetExpressionEntryAtAddress() to … #144238

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 12, 2025

Conversation

UltimateForce21
Copy link
Contributor

This patch introduces a new struct and helper API in DWARFExpressionList to expose
variable location metadata (base address, end address, and DWARFExpression pointer) for
a given PC address. It will be used in later patches to annotate disassembly instructions
with source-level variable locations.

New struct

/// Represents one entry in a DWARFExpressionList, with its range and expr.
struct DWARFExpressionEntry {
  lldb::addr_t base;           // file‐address start of this location range
  lldb::addr_t end;            // file‐address end of this range (exclusive)
  const DWARFExpression *expr; // the DWARF expression for this range
};

New API

/// Retrieve the DWARFExpressionEntry covering a particular instruction.
///
/// \param func_load_addr
///     The load address of the start of the function containing this location list;
///     used to translate between file offsets and load addresses. If this is
///     LLDB_INVALID_ADDRESS, the stored CU base (m_func_file_addr) is used.
///
/// \param load_addr
///     The load address of the *current* PC (i.e., the instruction for which
///     we want its variable‐location entry). We first convert this back into
///     the function’s file‐address space to find the correct DWARF range.
///
/// \returns
///     On success, an entry whose `[base,end)` covers this PC; else an Error.
llvm::Expected<DWARFExpressionEntry>
GetExpressionEntryAtAddress(lldb::addr_t func_load_addr,
                            lldb::addr_t load_addr) const;

Rationale

LLDB already provides:

const DWARFExpression *
GetExpressionAtAddress(lldb::addr_t func_load_addr,
                       lldb::addr_t load_addr) const;

However, this only returns the DWARF expression itself, without the file‐address start (base) and end (end) of the location range. Those bounds are crucial for:

  1. Detecting range beginnings: render a var = annotation exactly when a variable’s live‐range starts.

  2. Detecting range continuation: optionally display a “|” on subsequent instructions in the same range.

  3. Detecting state changes: know when a variable moves (e.g. from one register to another), becomes a constant, or goes out of scope.

These primitives form the foundation for the Rich Disassembler feature proposed for GSoC 25.

…DWARFExpressionList

This introduces a new API for retrieving DWARF expression metadata associated with variable location entries at a given PC address. It provides the base, end, and expression pointer for downstream consumers such as disassembler annotations.

Intended for use in richer instruction annotations in Instruction::Dump().
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label Jun 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 14, 2025

@llvm/pr-subscribers-lldb

Author: Abdullah Mohammad Amin (UltimateForce21)

Changes

This patch introduces a new struct and helper API in DWARFExpressionList to expose
variable location metadata (base address, end address, and DWARFExpression pointer) for
a given PC address. It will be used in later patches to annotate disassembly instructions
with source-level variable locations.

New struct

/// Represents one entry in a DWARFExpressionList, with its range and expr.
struct DWARFExpressionEntry {
  lldb::addr_t base;           // file‐address start of this location range
  lldb::addr_t end;            // file‐address end of this range (exclusive)
  const DWARFExpression *expr; // the DWARF expression for this range
};

New API

/// Retrieve the DWARFExpressionEntry covering a particular instruction.
///
/// \param func_load_addr
///     The load address of the start of the function containing this location list;
///     used to translate between file offsets and load addresses. If this is
///     LLDB_INVALID_ADDRESS, the stored CU base (m_func_file_addr) is used.
///
/// \param load_addr
///     The load address of the *current* PC (i.e., the instruction for which
///     we want its variable‐location entry). We first convert this back into
///     the function’s file‐address space to find the correct DWARF range.
///
/// \returns
///     On success, an entry whose `[base,end)` covers this PC; else an Error.
llvm::Expected&lt;DWARFExpressionEntry&gt;
GetExpressionEntryAtAddress(lldb::addr_t func_load_addr,
                            lldb::addr_t load_addr) const;

Rationale

LLDB already provides:

const DWARFExpression *
GetExpressionAtAddress(lldb::addr_t func_load_addr,
                       lldb::addr_t load_addr) const;

However, this only returns the DWARF expression itself, without the file‐address start (base) and end (end) of the location range. Those bounds are crucial for:

  1. Detecting range beginnings: render a var = <location> annotation exactly when a variable’s live‐range starts.

  2. Detecting range continuation: optionally display a “|” on subsequent instructions in the same range.

  3. Detecting state changes: know when a variable moves (e.g. from one register to another), becomes a constant, or goes out of scope.

These primitives form the foundation for the Rich Disassembler feature proposed for GSoC 25.


Full diff: https://github.com/llvm/llvm-project/pull/144238.diff

2 Files Affected:

  • (modified) lldb/include/lldb/Expression/DWARFExpressionList.h (+12)
  • (modified) lldb/source/Expression/DWARFExpressionList.cpp (+21)
diff --git a/lldb/include/lldb/Expression/DWARFExpressionList.h b/lldb/include/lldb/Expression/DWARFExpressionList.h
index d8f8ec247ed56..a329b37393018 100644
--- a/lldb/include/lldb/Expression/DWARFExpressionList.h
+++ b/lldb/include/lldb/Expression/DWARFExpressionList.h
@@ -59,6 +59,18 @@ class DWARFExpressionList {
 
   lldb::addr_t GetFuncFileAddress() { return m_func_file_addr; }
 
+  /// Represents an entry in the DWARFExpressionList with all needed metadata
+  struct DWARFExpressionEntry {
+    lldb::addr_t base;
+    lldb::addr_t end;
+    const DWARFExpression *expr;
+  };
+
+  /// Returns the entry (base, end, data) for a given PC address
+  llvm::Expected<DWARFExpressionEntry>
+  GetExpressionEntryAtAddress(lldb::addr_t func_load_addr,
+                              lldb::addr_t load_addr) const;
+
   const DWARFExpression *GetExpressionAtAddress(lldb::addr_t func_load_addr,
                                                 lldb::addr_t load_addr) const;
 
diff --git a/lldb/source/Expression/DWARFExpressionList.cpp b/lldb/source/Expression/DWARFExpressionList.cpp
index 04592a1eb7ff4..b55bc7120c4af 100644
--- a/lldb/source/Expression/DWARFExpressionList.cpp
+++ b/lldb/source/Expression/DWARFExpressionList.cpp
@@ -53,6 +53,27 @@ bool DWARFExpressionList::ContainsAddress(lldb::addr_t func_load_addr,
   return GetExpressionAtAddress(func_load_addr, addr) != nullptr;
 }
 
+llvm::Expected<DWARFExpressionList::DWARFExpressionEntry>
+DWARFExpressionList::GetExpressionEntryAtAddress(lldb::addr_t func_load_addr,
+                                                 lldb::addr_t load_addr) const {
+  if (const DWARFExpression *expr = GetAlwaysValidExpr()) {
+    return DWARFExpressionEntry{0, LLDB_INVALID_ADDRESS, expr};
+  }
+
+  if (func_load_addr == LLDB_INVALID_ADDRESS)
+    func_load_addr = m_func_file_addr;
+
+  addr_t addr = load_addr - func_load_addr + m_func_file_addr;
+  uint32_t index = m_exprs.FindEntryIndexThatContains(addr);
+  if (index == UINT32_MAX) {
+    return llvm::createStringError(llvm::inconvertibleErrorCode(),
+                                   "No DWARF expression found for address 0x%llx", addr);
+  }
+
+  const Entry &entry = *m_exprs.GetEntryAtIndex(index);
+  return DWARFExpressionEntry{entry.base, entry.GetRangeEnd(), &entry.data};
+}
+
 const DWARFExpression *
 DWARFExpressionList::GetExpressionAtAddress(lldb::addr_t func_load_addr,
                                             lldb::addr_t load_addr) const {

@UltimateForce21
Copy link
Contributor Author

@adrian-prantl @JDevlieghere I’ve opened my first PR for the new DWARFExpression API. Looking forward to your feedback!

Comment on lines 64 to 65
lldb::addr_t base;
lldb::addr_t end;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these file or load addresses? If you use lldb::addr_t (as opposed to the Address class), it's always a good idea to make that part of the variable name (like you did for func_load_addr) or at least add a Doxygen-style comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if you do want to use Address, there is already an AddressRange class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are file addresses (without offsets applied), I will try to make that more clear both in the comments and in the variable names. I'll also take a look into the AddressRange class and see if I can utilize it, thank you for the pointers.

uint32_t index = m_exprs.FindEntryIndexThatContains(addr);
if (index == UINT32_MAX) {
return llvm::createStringError(llvm::inconvertibleErrorCode(),
"No DWARF expression found for address 0x%llx", addr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally using Expected is desirable, especially if there is actionable information in the error message (and information that only this function can provide). Lookups are borderline cases. If it's expected during normal operation that client will call this API and the lookup fails, then I would return a std::optional instead. The client can turn this into a more targeted error message if needed.
In short — I would return a std::optional here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I will refactor it to return std::optional instead.

const Entry &entry = *m_exprs.GetEntryAtIndex(index);
return DWARFExpressionEntry{entry.base, entry.GetRangeEnd(), &entry.data};
}

const DWARFExpression *
DWARFExpressionList::GetExpressionAtAddress(lldb::addr_t func_load_addr,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: Do we still need this API, or is it just a subset of the function above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you know the new function GetExpressionEntryAtAddress, does almost exact same thing as this one. I don't see this function being used anywhere else in the lldb code base, so maybe it can be removed, but to be safe I did not want to alter it. But from what I can tell, it should be okay to remove it. We can also choose to keep it and maybe refactor it to just call the GetExpressionEntryAtAddress and just return the DWARFExpression *expr.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's completely unused, and there is no indication that was added in support of any downstream clients (either as a comment or in the commit message) we could consider deleting it in a separate PR. Not urgent.

UltimateForce21 and others added 5 commits June 19, 2025 11:55
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
updating code style for function GetExpressionEntryAtAddress

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and cleans up helper comments to follow Doxygen convention.

Using `AddressRange` makes the intent clearer, avoids duplication of basic `AddressRange` logic usage
Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.

/// Represents an entry in the DWARFExpressionList with all needed metadata.
struct DWARFExpressionEntry {
AddressRange file_range; /// Represents a DWARF location range in the DWARF unit’s file‐address space
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either:

AddressRange file_range; ///< Represents a DWARF location range in the DWARF unit’s file‐address space.

or (preferred)

 /// Represents a DWARF location range in the DWARF unit’s file‐address space.
 AddressRange file_range;

func_load_addr = m_func_file_addr;
lldb::addr_t file_pc = load_addr - func_load_addr + m_func_file_addr;

uint32_t idx = m_exprs.FindEntryIndexThatContains(file_pc);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use

  const Entry *FindEntryThatContains(B addr) const {
```?

Copy link
Contributor Author

@UltimateForce21 UltimateForce21 Jun 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sure, seems like it would be easier for implementation, I was just following the same logic, as

const DWARFExpression *
DWARFExpressionList::GetExpressionAtAddress(lldb::addr_t func_load_addr,
                                            lldb::addr_t load_addr) const ()

will update to use const Entry *FindEntryThatContains(B addr) const () instead

UltimateForce21 and others added 3 commits June 28, 2025 12:59
Updated commenting style for struct DWARFExpressionEntry
Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry>
DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`
func_load_addr = m_func_file_addr;

// translate to file-relative PC
lldb::addr_t file_pc = load_addr - func_load_addr + m_func_file_addr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be using subOverflow (https://llvm.org/doxygen/MathExtras_8h.html) here to guard against wrap-around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug into MathExtras and found that its AddOverflow and SubOverflow helpers are only defined for signed types. I they rely on two’s-complement signed overflow semantics and return the truncated signed result. Since lldb::addr_t is an unsigned address, I think using those helpers would either fail to compile or silently convert to signed semantics. So I tried to implement the signed overflow check into the function. Please let me know if you think it is alright.

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with under/overflow checks added.

UltimateForce21 and others added 2 commits July 2, 2025 11:45
Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>
This patch adds explicit checks:
  - ensure `load_addr >= func_load_addr` to avoid underflow,
  - compute and verify a temporary delta variable, then verify `delta + m_func_file_addr` does
    not exceed `addr_t` max to avoid overflow.
DWARFExpressionList::GetExpressionEntryAtAddress(lldb::addr_t func_load_addr,
lldb::addr_t load_addr) const {
if (const DWARFExpression *always = GetAlwaysValidExpr()) {
AddressRange full_range(m_func_file_addr, /*size=*/LLDB_INVALID_ADDRESS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the argument is byte_size so let's match that.

Suggested change
AddressRange full_range(m_func_file_addr, /*size=*/LLDB_INVALID_ADDRESS);
AddressRange full_range(m_func_file_addr, /*byte_size=*/LLDB_INVALID_ADDRESS);

I'm somewhat skeptical of LLDB_INVALID_ADDRESS as the size. Is there precedent for using that to indicate we don't know the size? I'm worried about whether the existing code is resilient to that. Would it make sense to make the AddressRange optional in the struct for the always-valid-single-expression case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe using an optional AddressRange the way you said is a safer option and probably more clear for users, I have committed the change.

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

github-actions bot commented Jul 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@JDevlieghere JDevlieghere merged commit 7cb8439 into llvm:main Jul 12, 2025
9 checks passed
Copy link

@UltimateForce21 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants