-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Summarize std::string's when created from data. #89110
base: main
Are you sure you want to change the base?
Summarize std::string's when created from data. #89110
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-lldb Author: Jacob Lalonde (Jlalond) Changesstd::string children are normally formatted as their summary [my_string_prop] = "hello world". Sbvalues/ValueObjects created from data do not follow the same summary formatter path and instead hit the value object printer. This change fixes that and adds tests for the future. Added tests in TestDataFormatterStdString.py and a formatter for a custom struct. Full diff: https://github.com/llvm/llvm-project/pull/89110.diff 4 Files Affected:
diff --git a/lldb/source/Plugins/Language/CPlusPlus/LibStdcpp.cpp b/lldb/source/Plugins/Language/CPlusPlus/LibStdcpp.cpp
index 86bb575af5ca34..0da01e9ca07fb6 100644
--- a/lldb/source/Plugins/Language/CPlusPlus/LibStdcpp.cpp
+++ b/lldb/source/Plugins/Language/CPlusPlus/LibStdcpp.cpp
@@ -254,13 +254,13 @@ bool lldb_private::formatters::LibStdcppStringSummaryProvider(
} else
addr_of_string =
valobj.GetAddressOf(scalar_is_load_addr, &addr_type);
- if (addr_of_string != LLDB_INVALID_ADDRESS) {
+ if (addr_of_string != LLDB_INVALID_ADDRESS ||
+ addr_type == eAddressTypeHost) {
switch (addr_type) {
case eAddressTypeLoad: {
ProcessSP process_sp(valobj.GetProcessSP());
if (!process_sp)
return false;
-
StringPrinter::ReadStringAndDumpToStreamOptions options(valobj);
Status error;
lldb::addr_t addr_of_data =
@@ -287,8 +287,31 @@ bool lldb_private::formatters::LibStdcppStringSummaryProvider(
} else
return true;
} break;
- case eAddressTypeHost:
- break;
+ case eAddressTypeHost: {
+ // We have the host address of our std::string
+ // But we need to read the pointee data from the debugged process.
+ ProcessSP process_sp(valobj.GetProcessSP());
+ DataExtractor data;
+ Status error;
+ valobj.GetData(data, error);
+ if (error.Fail())
+ return false;
+ // We want to read the address from std::string, which is the first 8 bytes.
+ lldb::offset_t offset = 0;
+ lldb::addr_t addr = data.GetAddress(&offset);
+ if (!addr)
+ {
+ stream.Printf("nullptr");
+ return true;
+ }
+
+ std::string contents;
+ process_sp->ReadCStringFromMemory(addr, contents, error);
+ if (error.Fail())
+ return false;
+ stream.Printf("%s", contents.c_str());
+ return true;
+ } break;
case eAddressTypeInvalid:
case eAddressTypeFile:
break;
diff --git a/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/ConvertToDataFormatter.py b/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/ConvertToDataFormatter.py
new file mode 100644
index 00000000000000..57e42c920f8294
--- /dev/null
+++ b/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/ConvertToDataFormatter.py
@@ -0,0 +1,50 @@
+# coding=utf8
+"""
+Helper formmater to verify Std::String by created via SBData
+"""
+
+import lldb
+
+class SyntheticFormatter:
+ def __init__(self, valobj, dict):
+ self.valobj = valobj
+
+ def num_children(self):
+ return 6
+
+ def has_children(self):
+ return True
+
+ def get_child_at_index(self, index):
+ name = None
+ match index:
+ case 0:
+ name = "short_string"
+ case 1:
+ name = "long_string"
+ case 2:
+ name = "short_string_ptr"
+ case 3:
+ name = "long_string_ptr"
+ case 4:
+ name = "short_string_ref"
+ case 5:
+ name = "long_string_ref"
+ case _:
+ return None
+
+ child = self.valobj.GetChildMemberWithName(name)
+ valType = child.GetType()
+ return self.valobj.CreateValueFromData(name,
+ child.GetData(),
+ valType)
+
+
+def __lldb_init_module(debugger, dict):
+ typeName = "string_container"
+ debugger.HandleCommand(
+ 'type synthetic add -x "'
+ + typeName
+ + '" --python-class '
+ + f"{__name__}.SyntheticFormatter"
+ )
diff --git a/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/TestDataFormatterStdString.py b/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/TestDataFormatterStdString.py
index 0b78fb7dc3911c..2adfe47f9a3cfd 100644
--- a/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/TestDataFormatterStdString.py
+++ b/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/TestDataFormatterStdString.py
@@ -37,6 +37,8 @@ def test_with_run_command(self):
substrs=["stopped", "stop reason = breakpoint"],
)
+ self.runCmd("command script import ./ConvertToDataFormatter.py", check=True)
+
# This is the function to remove the custom formats in order to have a
# clean slate for the next test case.
def cleanup():
@@ -60,6 +62,7 @@ def cleanup():
var_rQ = self.frame().FindVariable("rQ")
var_pq = self.frame().FindVariable("pq")
var_pQ = self.frame().FindVariable("pQ")
+ var_str_container = self.frame().FindVariable("sc")
self.assertEqual(var_wempty.GetSummary(), 'L""', "wempty summary wrong")
self.assertEqual(
@@ -89,6 +92,31 @@ def cleanup():
"pQ summary wrong",
)
+ self.assertEqual(
+ var_str_container.GetChildAtIndex(0).GetSummary(),
+ '"u22"',
+ "string container child wrong")
+ self.assertEqual(
+ var_str_container.GetChildAtIndex(1).GetSummary(),
+ '"quite a long std::string with lots of info inside it inside a struct"',
+ "string container child wrong")
+ self.assertEqual(
+ var_str_container.GetChildAtIndex(2).GetSummary(),
+ '"u22"',
+ "string container child wrong")
+ self.assertEqual(
+ var_str_container.GetChildAtIndex(3).GetSummary(),
+ '"quite a long std::string with lots of info inside it inside a struct"',
+ "string container child wrong")
+ self.assertEqual(
+ var_str_container.GetChildAtIndex(4).GetSummary(),
+ '"u22"',
+ "string container child wrong")
+ self.assertEqual(
+ var_str_container.GetChildAtIndex(5).GetSummary(),
+ '"quite a long std::string with lots of info inside it inside a struct"',
+ "string container child wrong")
+
self.runCmd("next")
self.assertEqual(var_S.GetSummary(), 'L"!!!!!"', "new S summary wrong")
diff --git a/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/main.cpp b/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/main.cpp
index eefb96c4573e4e..b169a2e6945568 100644
--- a/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/main.cpp
+++ b/lldb/test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/string/main.cpp
@@ -1,5 +1,14 @@
#include <string>
+struct string_container {
+ std::string short_string;
+ std::string long_string;
+ std::string *short_string_ptr = &short_string;
+ std::string *long_string_ptr = &long_string;
+ std::string &short_string_ref = short_string;
+ std::string &long_string_ref = long_string;
+};
+
int main()
{
std::wstring wempty(L"");
@@ -11,6 +20,10 @@ int main()
std::string Q("quite a long std::strin with lots of info inside it");
auto &rq = q, &rQ = Q;
std::string *pq = &q, *pQ = &Q;
+ string_container sc;
+ sc.short_string = "u22";
+ sc.long_string = "quite a long std::string with lots of info inside it inside a struct";
+
S.assign(L"!!!!!"); // Set break point at this line.
return 0;
}
|
@@ -254,13 +254,13 @@ bool lldb_private::formatters::LibStdcppStringSummaryProvider( | |||
} else | |||
addr_of_string = | |||
valobj.GetAddressOf(scalar_is_load_addr, &addr_type); | |||
if (addr_of_string != LLDB_INVALID_ADDRESS) { | |||
if (addr_of_string != LLDB_INVALID_ADDRESS || | |||
addr_type == eAddressTypeHost) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this since we iterate all cases anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the switch statement is within the if block, if we don't add this exception we will return false and fall back to the ValueObjectPrinter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh wow, so for this specific situation we will have addr_of_string == LLDB_INVALID_ADDRESS
?
Would you mind adding a comment for that?
Most of this change makes sense. However, you are assuming that if the address type of a ValueObject is Host, its children are necessarily going to be in the target. But there's actually an API that answers that question: ValueObject::GetAddressTypeOfChildren. You should consult that rather than assuming the children live in the target. |
What should we do if the child address type is File? I don't know how that could ever happen, and should we support it? Because currently we didn't support summarizing host or file. |
@@ -37,6 +37,8 @@ def test_with_run_command(self): | |||
substrs=["stopped", "stop reason = breakpoint"], | |||
) | |||
|
|||
self.runCmd("command script import ./ConvertToDataFormatter.py", check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of changing existing test method to test create from data case, let's add a new test method instead so that both default and create from data formatters are tested.
It doesn't look like C++ allows you to make |
if (addr_of_string != LLDB_INVALID_ADDRESS || | ||
addr_type == eAddressTypeHost) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does addr_of_string = valobj.GetAddressOf(scalar_is_load_addr, &addr_type);
return eAddressTypeHost
with addr_of_string
being set to LLDB_INVALID_ADDRESS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That I don't know. ValueObject does explicitly return LLDB_INVALID_ADDRESS
when addressType == eAddressTypeHost
. I thought this was weird and potentially returning the address of the in memory buffer made more sense, but that seemed like a major refactor for a minor string issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be more careful here. GetAddressOf is really meant to do "can you find an address in the target for this object". We use it that way in a whole bunch of places, e.g.:
cstr_address = GetAddressOf(true, &cstr_address_type);
} else {
// We have a pointer
cstr_address = GetPointerValue(&cstr_address_type);
}
if (cstr_address == 0 || cstr_address == LLDB_INVALID_ADDRESS) {
if (cstr_address_type == eAddressTypeHost && is_array) {
const char *cstr = GetDataExtractor().PeekCStr(0);
So in that case we are expecting a host address type to return an invalid address from GetAddressOf.
This change worries me, I don't think it will be what other code expects? Maybe you can get the value you need like the code above does instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could take the approach you provided. Greg and I originally talked and mentioned how GetAddressOf
will return the pointer within a buffer in certain cases. I'm not enough of an expert here to have strong opinions but I think we're in a confusing middleground, where GetAddressOf works for Load addresses, but fails for a host address even if we have a pointer internally to a data buffer.
I think solving that is more fundamental to my (2nd?) PR in lldb, but if we can all agree on the returning invaild for host addresses. I'm okay with that, as long as we also add a comment/documentation to the call that HostAddress addresses are invalid and to instead use a data extractor. My only opinion on that as a new contributor is the API to work with ValueObject's data is clunky, because I can seemingly only construct a copy via GetData
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be more careful here. GetAddressOf is really meant to do "can you find an address in the target for this object". We use it that way in a whole bunch of places, e.g.:
cstr_address = GetAddressOf(true, &cstr_address_type); } else { // We have a pointer cstr_address = GetPointerValue(&cstr_address_type); } if (cstr_address == 0 || cstr_address == LLDB_INVALID_ADDRESS) { if (cstr_address_type == eAddressTypeHost && is_array) { const char *cstr = GetDataExtractor().PeekCStr(0);
So in that case we are expecting a host address type to return an invalid address from GetAddressOf.
This change worries me, I don't think it will be what other code expects? Maybe you can get the value you need like the code above does instead?
The code above works, but I think it doesn't total sense from a readability standpoint. We have a GetAddressOf
function that return the address kind and a value, I am not sure why we wouldn't want to take advantage of that and use it? The above code is a bit interesting as it makes a bunch of assumptions like "is this a host address and if this is an array, I know my data is in the data extractor at offset zero". I mean it works, but does it make sense to duplicate these kind of assumptions?
Make sense to fix GetAddressOf to take advantage of the API it is implementing. If the address kind can be host and we can return a valid host address value, I would say we use it. We will need to look over all uses of this API internally if we do change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine too, but you have to be careful here since getting host vrs. target address attribution wrong is one quick way to crash lldb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to fix GetAddressOf to take advantage of the API it is implementing. If the address kind can be host and we can return a valid host address value, I would say we use it. We will need to look over all uses of this API internally if we do change it.
If we decide to go forward with this refactor, I think we should probably split this out into it's own independent PR and put this one on pause. As Jim mentioned above there are currently places that make assumptions based on the value object being host and getting back an invalid address, and we would need to correct for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to fix GetAddressOf to take advantage of the API it is implementing. If the address kind can be host and we can return a valid host address value, I would say we use it. We will need to look over all uses of this API internally if we do change it.
If we decide to go forward with this refactor, I think we should probably split this out into it's own independent PR and put this one on pause. As Jim mentioned above there are currently places that make assumptions based on the value object being host and getting back an invalid address, and we would need to correct for that.
I would be interesting to see how many places do this kind of pattern. If it is only a few locations, then yes a quick PR to enable this correctly would be nice, but if it is many, we can just copy the code that Jim mentions to work around the issue.
if (child_addressType == eAddressTypeLoad) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The {
goes at the end of the if
line per llvm coding guidelines
if (!addr) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The {
goes at the end of the if
line per llvm coding guidelines
if (child_addressType == eAddressTypeHost) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The {
goes at the end of the if
line per llvm coding guidelines
DataExtractor data; | ||
Status error; | ||
valobj.GetData(data, error); | ||
if (error.Fail()) | ||
return false; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the code that detects the short string optimization? I don't see it here anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't short string optimization be covered by the check at the address type of children (Line 303)? If the children are host we directly read the std::string from the data extractor
if (!dataStart) | ||
return false; | ||
|
||
const std::string* str = static_cast<const std::string*>(dataStart); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't assume that the host std::string will match the debugee std::string. If we are remote debugging to another architecture, then we will have problems.
return true; | ||
} | ||
|
||
if (child_addressType == eAddressTypeHost) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is never being run because no one will set the child address type to host memory.
✅ With the latest revision this PR passed the C/C++ code formatter. |
✅ With the latest revision this PR passed the Python code formatter. |
…ould not use the built in summary provider, but default to the valueobjectprinter std::string children are normally formatted as their summary [my_string_prop] = "hello world". Sbvalues created from data do not follow the same summary formatter path and instead hit the value object printer. This change fixes that and adds tests for the future. Added tests in TestDataFormatterStdString.py and a formatter for a custom struct.
…o the test class to reduce code duplication
…ge libstdcpp to dereference that address to get char* for std::string when formatting eAddressTypeHost
29b55fc
to
afc30d4
Compare
@jimingham it turned out when created by data, the value object's address points to it's buffer which itself contains the char* of the string. I'm not sure how we can support the small string optimization if this is the case, my testing was against a small string "test2", and it was always saved as a char* for the value object. I think for this PR we can probably ignore it and open an issue to support the small string optimization. Please review when you have time |
#include "lldb/Utility/Endian.h" | ||
#include "lldb/Utility/Status.h" | ||
#include "lldb/Utility/Stream.h" | ||
#include "lldb/lldb-types.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually put the top level lldb includes first in the lldb includes list.
<= GetCompilerType().GetByteSize(exe_scope) | ||
|| m_value.GetValueType() == Value::ValueType::LoadAddress) | ||
return DoCast(compiler_type); | ||
if (compiler_type.GetByteSize(exe_scope) <= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a reformatting, right? We try to only clang-format the code the PR actually changes, (using git-clang-format) as it makes reviews easier. Also clang-format changes its mind about how to format over time, and so doing other code introduces spurious diffs that make archeology harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah I ran git-clang-format
against this.
Specifically, git fetch upstream; git-clang-format upstream/main
and it produced this. I'll manually revert this one
@@ -0,0 +1,49 @@ | |||
# coding=utf8 | |||
""" | |||
Helper formmater to verify Std::String by created via SBData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formmater -> formatter
@@ -17,6 +17,15 @@ def setUp(self): | |||
# Find the line number to break at. | |||
self.line = line_number("main.cpp", "// Set break point at this line.") | |||
|
|||
# This is the function to remove the custom formats in order to have a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make all your custom summaries in a category (e.g. pass -w SOME_UNIQUE_NAME
then you can delete just the category with type category delete
. That way you know you are only deleting what you added.
Categories come into being disabled, so if you do it this way, you need to add:
type category enable SOME_UNIQUE_NAME
to your __lldb_init_module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see you didn't originally do that, so don't feel required to change what was there just moved... But it's still the better way to do it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is already old enough that we can lump it in!
We should definitely put more comments in the ValueObject.h header, a lot of those interfaces are mysterious. GetAddressOf is solving a very particular problem, which is that if someone does:
People really want that to work, but $0 is a "constant result" so where it lives is not clear. It's also important for data formatters that want to use a pointer to some ValueObject or ValueObject child. For expr results we make a shadow copy in the target so we can resolve this issue (this is one of the jobs of the So it is really for finding addresses in the target, and should fail if we can't realize that. It's other job is if an expression result is a pointer to a variable in the target, you want to be able to use |
std::string children are normally formatted as their summary [my_string_prop] = "hello world". Sbvalues/ValueObjects created from data do not follow the same summary formatter path and instead hit the value object printer. This change fixes that and adds tests for the future.
Example:
Will now summarize like all other strings to "test3"
Added tests in TestDataFormatterStdString.py and a formatter for a custom struct.