-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[lldb][Formatters] Do not recursively dereference pointer type when creating formatter candicates list. #124048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…reating formatter candicates list.
@llvm/pr-subscribers-lldb Author: Zequan Wu (ZequanWu) ChangesCurrently, the type llvm-project/lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp Lines 6927 to 6934 in 3ef90f8
The current behaviour might cause crash on pretty printers. If the pretty printer calls An example is the built-in pretty printer for libstdc++
After this change, data formatter for T can only be used for Full diff: https://github.com/llvm/llvm-project/pull/124048.diff 4 Files Affected:
diff --git a/lldb/include/lldb/DataFormatters/FormatManager.h b/lldb/include/lldb/DataFormatters/FormatManager.h
index db2fe99c44cafc..9329ee930125a6 100644
--- a/lldb/include/lldb/DataFormatters/FormatManager.h
+++ b/lldb/include/lldb/DataFormatters/FormatManager.h
@@ -163,7 +163,7 @@ class FormatManager : public IFormatChangeListener {
GetPossibleMatches(ValueObject &valobj, lldb::DynamicValueType use_dynamic) {
FormattersMatchVector matches;
GetPossibleMatches(valobj, valobj.GetCompilerType(), use_dynamic, matches,
- FormattersMatchCandidate::Flags(), true);
+ FormattersMatchCandidate::Flags(), true, true);
return matches;
}
@@ -180,6 +180,7 @@ class FormatManager : public IFormatChangeListener {
lldb::DynamicValueType use_dynamic,
FormattersMatchVector &entries,
FormattersMatchCandidate::Flags current_flags,
+ bool dereference_ptr = true,
bool root_level = false);
std::atomic<uint32_t> m_last_revision;
diff --git a/lldb/source/DataFormatters/FormatManager.cpp b/lldb/source/DataFormatters/FormatManager.cpp
index 3b891cecb1c8b9..d6d6935f3e002c 100644
--- a/lldb/source/DataFormatters/FormatManager.cpp
+++ b/lldb/source/DataFormatters/FormatManager.cpp
@@ -174,7 +174,8 @@ void FormatManager::DisableAllCategories() {
void FormatManager::GetPossibleMatches(
ValueObject &valobj, CompilerType compiler_type,
lldb::DynamicValueType use_dynamic, FormattersMatchVector &entries,
- FormattersMatchCandidate::Flags current_flags, bool root_level) {
+ FormattersMatchCandidate::Flags current_flags, bool dereference_ptr,
+ bool root_level) {
compiler_type = compiler_type.GetTypeForFormatters();
ConstString type_name(compiler_type.GetTypeName());
// A ValueObject that couldn't be made correctly won't necessarily have a
@@ -222,14 +223,15 @@ void FormatManager::GetPossibleMatches(
if (compiler_type.IsPointerType()) {
CompilerType non_ptr_type = compiler_type.GetPointeeType();
- GetPossibleMatches(valobj, non_ptr_type, use_dynamic, entries,
- current_flags.WithStrippedPointer());
+ if (dereference_ptr)
+ GetPossibleMatches(valobj, non_ptr_type, use_dynamic, entries,
+ current_flags.WithStrippedPointer(), false);
if (non_ptr_type.IsTypedefType()) {
CompilerType deffed_pointed_type =
non_ptr_type.GetTypedefedType().GetPointerType();
// this is not exactly the usual meaning of stripping typedefs
GetPossibleMatches(valobj, deffed_pointed_type, use_dynamic, entries,
- current_flags.WithStrippedTypedef());
+ current_flags.WithStrippedTypedef(), dereference_ptr);
}
}
diff --git a/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py b/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py
index f70162bf285839..fea40252a2a75c 100644
--- a/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py
+++ b/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py
@@ -53,3 +53,15 @@ def cleanup():
# the match.
self.expect("frame variable y", substrs=["(Foo &", ") y = 0x", "IntLRef"])
self.expect("frame variable z", substrs=["(Foo &&", ") z = 0x", "IntRRef"])
+
+ # Test lldb doesn't dereference pointer more than once.
+ # xp has type Foo**, so it can only uses summary for Foo* or int*, not
+ # summary for Foo or int.
+ self.expect("frame variable xp", substrs=["(Foo **) xp = 0x", "IntPointer"])
+
+ self.runCmd('type summary delete "int *"')
+ self.runCmd('type summary add --cascade true -s "MyInt" "int"')
+ self.expect("frame variable xp", substrs=["(Foo **) xp = 0x"])
+
+ self.runCmd('type summary add --cascade true -s "FooPointer" "Foo *"')
+ self.expect("frame variable xp", substrs=["(Foo **) xp = 0x", "FooPointer"])
diff --git a/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp b/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp
index 13102106551086..41b1d282344b91 100644
--- a/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp
+++ b/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp
@@ -5,6 +5,9 @@ int main() {
Foo* x = &lval;
Foo& y = lval;
Foo&& z = 1;
+
+ // Test lldb doesn't dereference pointer more than once.
+ Foo** xp = &x;
return 0; // Set breakpoint here
}
|
You can test this locally with the following command:darker --check --diff -r dd860bcfb57df429c0a1ad2e2d869ff3b795bc4d...83197bd326521352d145e4ef64edee7e5eaf41b5 lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py View the diff from darker here.--- TestPtrRef2Typedef.py 2025-01-23 02:32:11.000000 +0000
+++ TestPtrRef2Typedef.py 2025-01-23 18:45:48.665024 +0000
@@ -53,11 +53,11 @@
# the match.
self.expect("frame variable y", substrs=["(Foo &", ") y = 0x", "IntLRef"])
self.expect("frame variable z", substrs=["(Foo &&", ") z = 0x", "IntRRef"])
# Test lldb doesn't dereference pointer more than once.
- # xp has type Foo**, so it can only uses summary for Foo* or int*, not
+ # xp has type Foo**, so it can only uses summary for Foo* or int*, not
# summary for Foo or int.
self.expect("frame variable xp", substrs=["(Foo **) xp = 0x", "IntPointer"])
self.runCmd('type summary delete "int *"')
self.runCmd('type summary add --cascade true -s "MyInt" "int"')
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, but I'd like to hear what @jimingham thinks about this.
The current behaviour might cause crash on pretty printers. If the pretty printer calls SBValue::GetChildMemberWithName when compute the synthetic children or summary string and its type is T**, this might cause crash as it will return nullptr or None in this case
The code you linked to is problematic even with this fix. The returned value could be null due to bad/missing debug info or due to a new version of libstdc++ renaming the (private) member. That code should be using GetChildAtNamePath
, which checks that all the intermediate nodes are present.
How does this patch fit with the:
settings in And if a formatter says it can handle pointers, it's up to the formatter to do that right. |
This has no impact for reference. For pointers, if |
If you write the formatter knowing that you've asked for a match against all the pointers to this type, it's up to the formatter to not assume it's only being passed T or T*. I still don't see why we should be cutting this off in lldb. If I have a std::vector variable, I want to see its size in the variable printing, regardless of whether the handle to it is a pointer, or a pointer to a pointer. It seems artificial to cut that off for me. |
If anything, we should allow |
The reason that we should only dereference once is that many formatters relies on For example, when using the builtin libc++ std::vector formatter, lldb shows inconsistent size count for
If we really want to have the formatter apply to all pointer types (no matter how many layers there are), it can be achieved with regex like |
It still sounds like this is a change to accommodate formatters that aren't correctly written. Regex formatters are slower than name match ones and more fallible to write, so I'm not sure "you can write a regex instead" is a good substitute. |
It isn't that hard to have your formatter dereference the ValueObject it was provided till it's of the form that it wants to deal with (type or pointer) and then pass that to the main formatter. If the objection is that's a pain to do for every formatter, we could have the type matcher do the dereferencing and always pass the formatter the ValueObject that is the type they registered the formatter for. This just seems like a really arbitrary restriction, not particularly user-friendly, and is being imposed just to work around formatter bugs, which doesn't strike me as a good motivation. |
Do you mean pass the formatter the ValueObject with type |
@labath What do you think? The solution above will still use the formatters for |
I still feel like we should offer options here. If I as a formatter writer am interested in handling all the pointer types of the type I've registered my formatter for, I should be allowed to do that. If I say, my formatter is valid for all the pointer types as well as the type I've registered for, but I only want to write the formatter to handle Values of the original type, we could offer to do the dereferencing for them, and then pass the formatter in. And of course, if you only want to be asked about values of the type you registered your formatter for, you can do that already. |
Before I answer that, I want to go back to the question of dereferencing. While I don't think that skipping of arbitrary numbers of points is wrong (it's consistent and easy to explain), I also don't find it useful. One level -- sure, Even though I don't think it should be the default, I can definitely see reasons for why one would want to see the contents of I don't think this argument holds for more than one level of pointers. If I have a
Even if the users really does want to see the final object, they can still do that even if we don't dereference all the levels for them. It just means they need to do that themselves. As for how does this fit in with documented behavior, I don't see much of a problem with that. It's true that you could interpret In addition to that, the Another factor, which I think supports the fact that skipping all levels isn't the right default is that ~all of our pretty printers are currently broken for multiple pointer levels:
If this was one incident, you could dismiss it as a buggy formatter, but since it affects all of them, I think this points to a deeper problem. We can fix all of our formatters to dereference all pointers, but that won't change all the pretty printers out there, and I'm sure that most of them have the same bug. We could try to fix all of them by dereferencing the value before handing it off to the formatter, but I don't think that's useful. The fact that we didn't notice this until now tells me users don't have that many double pointers floating around, and so they likely will not complain if we stop printing this output (which was incorrect anyway). TL:DR: I think it would be better to skip just one level of pointers. If someone really wants to format an arbitrary number of pointers, they can always use a regex to register a pretty printer for |
I think that argument has some force for synthetic child providers, but less so for summaries. I'd like to see the size of a std::vector regardless of whether I'm holding it as a *, **, etc. |
If I was holding a vector as a It's true that a summary provider is less intrusive (it doesn't override existing children -- though it can hide them!), than a synthetic child provider, but I think it would be very confusing to have different behavior here. The thing I like about doing this for one level is that (as Zequan points out) it nicely matches what our SBValue API does, which makes most formatters "just work". |
On Jan 27, 2025, at 5:30 AM, Pavel Labath ***@***.***> wrote:
If I was holding a vector as a vector**, then I would would to see its summary/size as well. And maybe even the children too. But the thing I questioning is how often does one hold a C++ container as a double pointer. Is there like some common programming pattern that produces those? The only time I've ever done that is when I was experimenting with how "frame variable" handles pointers and references. I have seen and worked with double pointers, but this was always in some kind of legacy/C context. A combination of double pointers and a std::vector (which essentially makes it a triple pointer) would be very.. eclectic here. Modern C++ would likely turn at least one of those pointers into an iterator, or a smart pointer, or a reference..
If this were super difficult, I'd be uncomfortable about telling other people that their usages were too esoteric to support but willing to do so as a tradeoff. But I don't think it's a good idea to do that unless we have to. After all, you're basing the argument on common uses of vectors, but really it should be about "the usage of any data type someone might have pointers to, and want to see the summary of." We're providing the tools, not the uses.
It's true that a summary provider is less intrusive (it doesn't override existing children -- though it can hide them!), than a synthetic child provider, but I think it would be very confusing to have different behavior here. The thing I like about doing this for one level is that (as Zequan points out) it nicely matches what our SBValue API does, which makes most formatters "just work".
I still don't see the force of this argument, it's just arguing by analogy. But given one can write a regex matcher to implement more general formatters, so long as the limitations of the normal formatter matching is clearly spelled out, I'll go along with whatever the consensus is here.
Jim
… —
Reply to this email directly, view it on GitHub <#124048 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW77ZSDPBSJIY5Y7S7T2MYYG7AVCNFSM6AAAAABVWKKJIKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMJVG43DQNRQGA>.
You are receiving this because you were mentioned.
|
I'm fine with either this change or a change to add |
I'd rather keep the flexibility, if you don't mind doing the work to add that. For now, it seems okay for the current libc++/libstdc++ ones to declare what they support (which is 1 pointer depth). If someone has a reason to come back and extend these ones in the future, that's fine but a separate piece of work.
Jim
… On Jan 27, 2025, at 2:32 PM, Zequan Wu ***@***.***> wrote:
I'm fine with either this change or a change to add --skip-pointers [N] option to let users specify how many layer of pointers they want it matches to. The latter is more flexible. Then the question is if we should use --skip-pointers 1 (which means only dereference one layer pointer) for lldb's built-in libc++/libstdc++ pretty printers. I think we should do so and make that default.
—
Reply to this email directly, view it on GitHub <#124048 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW6LAMB7JGPL5EYY75L2M2XZPAVCNFSM6AAAAABVWKKJIKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMJXGAZTOMBYGU>.
You are receiving this because you were mentioned.
|
After rethinking about the option, it will cause some compatibility issues. Currently, --skip-pointers is boolean option and python formatters use the flag lldb.eTypeOptionSkipPointers to disable/enable it. If we switch it to accept a numeric number, we probably need a new SBAPI to specify the parameter, and this will break existing users who use the |
|
It's not super-difficult, but it's not free either. In particular, any user-facing API we add to support this use case is something that we may have to live with for a very long time. If the uses are there, then so be it, but I don't think that's the case, which is why I'm asking if you (or anyone) knows of something like that (and also providing some reasons as to why I think it's unlikely to exist).
It's not a killer argument for sure, but it's nice when things are consistent. I think this behavior is nice and consistent because the pretty printer does not have to worry about the kind of object it gets. It can just call
I don't know how easy would it be, but another option would be to keep a single option, but treat |
I find this is tricky because all the flags for data formatters are set with |
If you want to preserve the current behavior, then CreateWithSummaryString should set the summary up to recurse infinitely. Then people who wanted new behavior would have to call SetPointerMatchDepth. Since that's new behavior, it's fine that the old API's don't support it. We allow overloads that are going to be unambiguous on the Python side - so we could also make a version of the CreateSummary API's that take an extra depth parameter, you wouldn't have to add the SetDepth API. I think it would be weird to change these defining features of the summary after creating it, so I'd vote for that over a SetDepth call. TTTT, whenever we've made an APi that takes some set of options in the SB API's and we've done it with some bools or some flags, we always end up regretting it. We have an SBTypeSummaryOptions, maybe we should add the flags ivar to that and some API's to set it's various bits, and add the depth to that, and provide versions of the API that take this options object. But that's a bunch more work you maybe haven't signed up for... In which case, adding the depth variant would be fine too. |
…add` command. (#138209) Currently, the type `T`'s summary formatter will be matched for `T`, `T*`, `T**` and so on. This is unexpected in many data formatters. Such unhandled cases could cause the data formatter to crash. An example would be the lldb's built-in data formatter for `std::optional`: ``` $ cat main.cpp #include <optional> int main() { std::optional<int> o_null; auto po_null = &o_null; auto ppo_null = &po_null; auto pppo_null = &ppo_null; return 0; } $ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null" [lldb crash] ``` This change adds an options `--pointer-match-depth` to `type summary add` command to allow users to specify how many layer of pointers can be dereferenced at most when matching a summary formatter of type `T`, as Jim suggested [here](#124048). By default, this option has value 1 which means summary formatter for `T` could also be used for `T*` but not `T**` nor beyond. This option is no-op when `--skip-pointers` is set as well. I didn't add such option for `type synthetic add`, `type format add`, `type filter add`, because it useful for those command. Instead, they all have the pointer match depth of 1. When printing a type `T*`, lldb never print the children of `T` even if there is a synthetic formatter registered for `T`.
Closed this as #138209 is preferred. |
…pe summary add` command. (#138209) Currently, the type `T`'s summary formatter will be matched for `T`, `T*`, `T**` and so on. This is unexpected in many data formatters. Such unhandled cases could cause the data formatter to crash. An example would be the lldb's built-in data formatter for `std::optional`: ``` $ cat main.cpp #include <optional> int main() { std::optional<int> o_null; auto po_null = &o_null; auto ppo_null = &po_null; auto pppo_null = &ppo_null; return 0; } $ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null" [lldb crash] ``` This change adds an options `--pointer-match-depth` to `type summary add` command to allow users to specify how many layer of pointers can be dereferenced at most when matching a summary formatter of type `T`, as Jim suggested [here](llvm/llvm-project#124048). By default, this option has value 1 which means summary formatter for `T` could also be used for `T*` but not `T**` nor beyond. This option is no-op when `--skip-pointers` is set as well. I didn't add such option for `type synthetic add`, `type format add`, `type filter add`, because it useful for those command. Instead, they all have the pointer match depth of 1. When printing a type `T*`, lldb never print the children of `T` even if there is a synthetic formatter registered for `T`.
…add` command. (llvm#138209) Currently, the type `T`'s summary formatter will be matched for `T`, `T*`, `T**` and so on. This is unexpected in many data formatters. Such unhandled cases could cause the data formatter to crash. An example would be the lldb's built-in data formatter for `std::optional`: ``` $ cat main.cpp #include <optional> int main() { std::optional<int> o_null; auto po_null = &o_null; auto ppo_null = &po_null; auto pppo_null = &ppo_null; return 0; } $ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null" [lldb crash] ``` This change adds an options `--pointer-match-depth` to `type summary add` command to allow users to specify how many layer of pointers can be dereferenced at most when matching a summary formatter of type `T`, as Jim suggested [here](llvm#124048). By default, this option has value 1 which means summary formatter for `T` could also be used for `T*` but not `T**` nor beyond. This option is no-op when `--skip-pointers` is set as well. I didn't add such option for `type synthetic add`, `type format add`, `type filter add`, because it useful for those command. Instead, they all have the pointer match depth of 1. When printing a type `T*`, lldb never print the children of `T` even if there is a synthetic formatter registered for `T`.
…add` command. (llvm#138209) Currently, the type `T`'s summary formatter will be matched for `T`, `T*`, `T**` and so on. This is unexpected in many data formatters. Such unhandled cases could cause the data formatter to crash. An example would be the lldb's built-in data formatter for `std::optional`: ``` $ cat main.cpp #include <optional> int main() { std::optional<int> o_null; auto po_null = &o_null; auto ppo_null = &po_null; auto pppo_null = &ppo_null; return 0; } $ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null" [lldb crash] ``` This change adds an options `--pointer-match-depth` to `type summary add` command to allow users to specify how many layer of pointers can be dereferenced at most when matching a summary formatter of type `T`, as Jim suggested [here](llvm#124048). By default, this option has value 1 which means summary formatter for `T` could also be used for `T*` but not `T**` nor beyond. This option is no-op when `--skip-pointers` is set as well. I didn't add such option for `type synthetic add`, `type format add`, `type filter add`, because it useful for those command. Instead, they all have the pointer match depth of 1. When printing a type `T*`, lldb never print the children of `T` even if there is a synthetic formatter registered for `T`.
Currently, the type
T
's formatter will be matched forT
,T*
,T**
and so on. This is inconsistent with the behaviour ofSBValue::GetChildMemberWithName
which can only dereference the pointer type at most once if the sbvalue is a pointer type, because the API eventually callsTypeSystemClang::GetIndexOfChildMemberWithName
(llvm-project/lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
Lines 6927 to 6934 in 3ef90f8
The current behaviour might cause crash on pretty printers. If the pretty printer calls
SBValue::GetChildMemberWithName
when compute the synthetic children or summary string and its type isT**
, this might cause crash as it will return nullptr or None in this case.An example is the built-in pretty printer for libstdc++
std::optional
when it callsGetChildMemberWithName
on a nullptr returned from the previous call toGetChildMemberWithName
(llvm-project/lldb/source/Plugins/Language/CPlusPlus/GenericOptional.cpp
Lines 74 to 75 in 3ef90f8
After this change, data formatter for T can only be used for
T
andT*
(if skip pointer is set to false).