-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[lldb] Add DWARFExpressionEntry and GetExpressionEntryAtAddress() to … #144238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lldb] Add DWARFExpressionEntry and GetExpressionEntryAtAddress() to … #144238
Conversation
…DWARFExpressionList This introduces a new API for retrieving DWARF expression metadata associated with variable location entries at a given PC address. It provides the base, end, and expression pointer for downstream consumers such as disassembler annotations. Intended for use in richer instruction annotations in Instruction::Dump().
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-lldb Author: Abdullah Mohammad Amin (UltimateForce21) ChangesThis patch introduces a new struct and helper API in New struct
New API
RationaleLLDB already provides:
However, this only returns the DWARF expression itself, without the file‐address start (base) and end (end) of the location range. Those bounds are crucial for:
These primitives form the foundation for the Rich Disassembler feature proposed for GSoC 25. Full diff: https://github.com/llvm/llvm-project/pull/144238.diff 2 Files Affected:
diff --git a/lldb/include/lldb/Expression/DWARFExpressionList.h b/lldb/include/lldb/Expression/DWARFExpressionList.h
index d8f8ec247ed56..a329b37393018 100644
--- a/lldb/include/lldb/Expression/DWARFExpressionList.h
+++ b/lldb/include/lldb/Expression/DWARFExpressionList.h
@@ -59,6 +59,18 @@ class DWARFExpressionList {
lldb::addr_t GetFuncFileAddress() { return m_func_file_addr; }
+ /// Represents an entry in the DWARFExpressionList with all needed metadata
+ struct DWARFExpressionEntry {
+ lldb::addr_t base;
+ lldb::addr_t end;
+ const DWARFExpression *expr;
+ };
+
+ /// Returns the entry (base, end, data) for a given PC address
+ llvm::Expected<DWARFExpressionEntry>
+ GetExpressionEntryAtAddress(lldb::addr_t func_load_addr,
+ lldb::addr_t load_addr) const;
+
const DWARFExpression *GetExpressionAtAddress(lldb::addr_t func_load_addr,
lldb::addr_t load_addr) const;
diff --git a/lldb/source/Expression/DWARFExpressionList.cpp b/lldb/source/Expression/DWARFExpressionList.cpp
index 04592a1eb7ff4..b55bc7120c4af 100644
--- a/lldb/source/Expression/DWARFExpressionList.cpp
+++ b/lldb/source/Expression/DWARFExpressionList.cpp
@@ -53,6 +53,27 @@ bool DWARFExpressionList::ContainsAddress(lldb::addr_t func_load_addr,
return GetExpressionAtAddress(func_load_addr, addr) != nullptr;
}
+llvm::Expected<DWARFExpressionList::DWARFExpressionEntry>
+DWARFExpressionList::GetExpressionEntryAtAddress(lldb::addr_t func_load_addr,
+ lldb::addr_t load_addr) const {
+ if (const DWARFExpression *expr = GetAlwaysValidExpr()) {
+ return DWARFExpressionEntry{0, LLDB_INVALID_ADDRESS, expr};
+ }
+
+ if (func_load_addr == LLDB_INVALID_ADDRESS)
+ func_load_addr = m_func_file_addr;
+
+ addr_t addr = load_addr - func_load_addr + m_func_file_addr;
+ uint32_t index = m_exprs.FindEntryIndexThatContains(addr);
+ if (index == UINT32_MAX) {
+ return llvm::createStringError(llvm::inconvertibleErrorCode(),
+ "No DWARF expression found for address 0x%llx", addr);
+ }
+
+ const Entry &entry = *m_exprs.GetEntryAtIndex(index);
+ return DWARFExpressionEntry{entry.base, entry.GetRangeEnd(), &entry.data};
+}
+
const DWARFExpression *
DWARFExpressionList::GetExpressionAtAddress(lldb::addr_t func_load_addr,
lldb::addr_t load_addr) const {
|
@adrian-prantl @JDevlieghere I’ve opened my first PR for the new DWARFExpression API. Looking forward to your feedback! |
lldb::addr_t base; | ||
lldb::addr_t end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these file or load addresses? If you use lldb::addr_t
(as opposed to the Address
class), it's always a good idea to make that part of the variable name (like you did for func_load_addr
) or at least add a Doxygen-style comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if you do want to use Address, there is already an AddressRange class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are file addresses (without offsets applied), I will try to make that more clear both in the comments and in the variable names. I'll also take a look into the AddressRange class and see if I can utilize it, thank you for the pointers.
uint32_t index = m_exprs.FindEntryIndexThatContains(addr); | ||
if (index == UINT32_MAX) { | ||
return llvm::createStringError(llvm::inconvertibleErrorCode(), | ||
"No DWARF expression found for address 0x%llx", addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally using Expected is desirable, especially if there is actionable information in the error message (and information that only this function can provide). Lookups are borderline cases. If it's expected during normal operation that client will call this API and the lookup fails, then I would return a std::optional
instead. The client can turn this into a more targeted error message if needed.
In short — I would return a std::optional here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I will refactor it to return std::optional instead.
const Entry &entry = *m_exprs.GetEntryAtIndex(index); | ||
return DWARFExpressionEntry{entry.base, entry.GetRangeEnd(), &entry.data}; | ||
} | ||
|
||
const DWARFExpression * | ||
DWARFExpressionList::GetExpressionAtAddress(lldb::addr_t func_load_addr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity: Do we still need this API, or is it just a subset of the function above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you know the new function GetExpressionEntryAtAddress, does almost exact same thing as this one. I don't see this function being used anywhere else in the lldb code base, so maybe it can be removed, but to be safe I did not want to alter it. But from what I can tell, it should be okay to remove it. We can also choose to keep it and maybe refactor it to just call the GetExpressionEntryAtAddress and just return the DWARFExpression *expr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's completely unused, and there is no indication that was added in support of any downstream clients (either as a comment or in the commit message) we could consider deleting it in a separate PR. Not urgent.
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
updating code style for function GetExpressionEntryAtAddress Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and cleans up helper comments to follow Doxygen convention. Using `AddressRange` makes the intent clearer, avoids duplication of basic `AddressRange` logic usage
Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.
|
||
/// Represents an entry in the DWARFExpressionList with all needed metadata. | ||
struct DWARFExpressionEntry { | ||
AddressRange file_range; /// Represents a DWARF location range in the DWARF unit’s file‐address space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either:
AddressRange file_range; ///< Represents a DWARF location range in the DWARF unit’s file‐address space.
or (preferred)
/// Represents a DWARF location range in the DWARF unit’s file‐address space.
AddressRange file_range;
func_load_addr = m_func_file_addr; | ||
lldb::addr_t file_pc = load_addr - func_load_addr + m_func_file_addr; | ||
|
||
uint32_t idx = m_exprs.FindEntryIndexThatContains(file_pc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use
const Entry *FindEntryThatContains(B addr) const {
```?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for sure, seems like it would be easier for implementation, I was just following the same logic, as
const DWARFExpression *
DWARFExpressionList::GetExpressionAtAddress(lldb::addr_t func_load_addr,
lldb::addr_t load_addr) const ()
will update to use const Entry *FindEntryThatContains(B addr) const ()
instead
Updated commenting style for struct DWARFExpressionEntry
Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry> DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`
func_load_addr = m_func_file_addr; | ||
|
||
// translate to file-relative PC | ||
lldb::addr_t file_pc = load_addr - func_load_addr + m_func_file_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be using subOverflow (https://llvm.org/doxygen/MathExtras_8h.html) here to guard against wrap-around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug into MathExtras and found that its AddOverflow and SubOverflow helpers are only defined for signed types. I they rely on two’s-complement signed overflow semantics and return the truncated signed result. Since lldb::addr_t is an unsigned address, I think using those helpers would either fail to compile or silently convert to signed semantics. So I tried to implement the signed overflow check into the function. Please let me know if you think it is alright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with under/overflow checks added.
Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>
This patch adds explicit checks: - ensure `load_addr >= func_load_addr` to avoid underflow, - compute and verify a temporary delta variable, then verify `delta + m_func_file_addr` does not exceed `addr_t` max to avoid overflow.
DWARFExpressionList::GetExpressionEntryAtAddress(lldb::addr_t func_load_addr, | ||
lldb::addr_t load_addr) const { | ||
if (const DWARFExpression *always = GetAlwaysValidExpr()) { | ||
AddressRange full_range(m_func_file_addr, /*size=*/LLDB_INVALID_ADDRESS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the argument is byte_size
so let's match that.
AddressRange full_range(m_func_file_addr, /*size=*/LLDB_INVALID_ADDRESS); | |
AddressRange full_range(m_func_file_addr, /*byte_size=*/LLDB_INVALID_ADDRESS); |
I'm somewhat skeptical of LLDB_INVALID_ADDRESS
as the size. Is there precedent for using that to indicate we don't know the size? I'm worried about whether the existing code is resilient to that. Would it make sense to make the AddressRange optional in the struct for the always-valid-single-expression case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe using an optional AddressRange
the way you said is a safer option and probably more clear for users, I have committed the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
✅ With the latest revision this PR passed the C/C++ code formatter. |
@UltimateForce21 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
This patch introduces a new struct and helper API in
DWARFExpressionList
to exposevariable location metadata (base address, end address, and DWARFExpression pointer) for
a given PC address. It will be used in later patches to annotate disassembly instructions
with source-level variable locations.
New struct
New API
Rationale
LLDB already provides:
However, this only returns the DWARF expression itself, without the file‐address start (base) and end (end) of the location range. Those bounds are crucial for:
Detecting range beginnings: render a var = annotation exactly when a variable’s live‐range starts.
Detecting range continuation: optionally display a “|” on subsequent instructions in the same range.
Detecting state changes: know when a variable moves (e.g. from one register to another), becomes a constant, or goes out of scope.
These primitives form the foundation for the Rich Disassembler feature proposed for GSoC 25.