Skip to content

[lldb][Formatters] Do not recursively dereference pointer type when creating formatter candicates list. #124048

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

ZequanWu
Copy link
Contributor

Currently, the type T's formatter will be matched for T, T*, T** and so on. This is inconsistent with the behaviour of SBValue::GetChildMemberWithName which can only dereference the pointer type at most once if the sbvalue is a pointer type, because the API eventually calls TypeSystemClang::GetIndexOfChildMemberWithName (

case clang::Type::Pointer: {
CompilerType pointee_clang_type(GetPointeeType(type));
if (pointee_clang_type.IsAggregateType()) {
return pointee_clang_type.GetIndexOfChildMemberWithName(
name, omit_empty_base_classes, child_indexes);
}
} break;
) for C/C++.

The current behaviour might cause crash on pretty printers. If the pretty printer calls SBValue::GetChildMemberWithName when compute the synthetic children or summary string and its type is T**, this might cause crash as it will return nullptr or None in this case.

An example is the built-in pretty printer for libstdc++ std::optional when it calls GetChildMemberWithName on a nullptr returned from the previous call to GetChildMemberWithName (

engaged_sp = m_backend.GetChildMemberWithName("_M_payload")
->GetChildMemberWithName("_M_engaged");
):

$ cat main.cpp
#include <optional>

int main() {
  std::optional<int> o_null;
  auto po_null = &o_null;
  auto ppo_null = &po_null;
  auto pppo_null = &ppo_null;
  return 0;
}
$ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null"
[lldb crash]

After this change, data formatter for T can only be used for T and T* (if skip pointer is set to false).

@ZequanWu ZequanWu requested a review from labath January 23, 2025 02:53
@llvmbot llvmbot added the lldb label Jan 23, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-lldb

Author: Zequan Wu (ZequanWu)

Changes

Currently, the type T's formatter will be matched for T, T*, T** and so on. This is inconsistent with the behaviour of SBValue::GetChildMemberWithName which can only dereference the pointer type at most once if the sbvalue is a pointer type, because the API eventually calls TypeSystemClang::GetIndexOfChildMemberWithName (

case clang::Type::Pointer: {
CompilerType pointee_clang_type(GetPointeeType(type));
if (pointee_clang_type.IsAggregateType()) {
return pointee_clang_type.GetIndexOfChildMemberWithName(
name, omit_empty_base_classes, child_indexes);
}
} break;
) for C/C++.

The current behaviour might cause crash on pretty printers. If the pretty printer calls SBValue::GetChildMemberWithName when compute the synthetic children or summary string and its type is T**, this might cause crash as it will return nullptr or None in this case.

An example is the built-in pretty printer for libstdc++ std::optional when it calls GetChildMemberWithName on a nullptr returned from the previous call to GetChildMemberWithName (

engaged_sp = m_backend.GetChildMemberWithName("_M_payload")
->GetChildMemberWithName("_M_engaged");
):

$ cat main.cpp
#include &lt;optional&gt;

int main() {
  std::optional&lt;int&gt; o_null;
  auto po_null = &amp;o_null;
  auto ppo_null = &amp;po_null;
  auto pppo_null = &amp;ppo_null;
  return 0;
}
$ clang++ -g main.cpp &amp;&amp; lldb -o "b 8" -o "r" -o "v pppo_null"
[lldb crash]

After this change, data formatter for T can only be used for T and T* (if skip pointer is set to false).


Full diff: https://github.com/llvm/llvm-project/pull/124048.diff

4 Files Affected:

  • (modified) lldb/include/lldb/DataFormatters/FormatManager.h (+2-1)
  • (modified) lldb/source/DataFormatters/FormatManager.cpp (+6-4)
  • (modified) lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py (+12)
  • (modified) lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp (+3)
diff --git a/lldb/include/lldb/DataFormatters/FormatManager.h b/lldb/include/lldb/DataFormatters/FormatManager.h
index db2fe99c44cafc..9329ee930125a6 100644
--- a/lldb/include/lldb/DataFormatters/FormatManager.h
+++ b/lldb/include/lldb/DataFormatters/FormatManager.h
@@ -163,7 +163,7 @@ class FormatManager : public IFormatChangeListener {
   GetPossibleMatches(ValueObject &valobj, lldb::DynamicValueType use_dynamic) {
     FormattersMatchVector matches;
     GetPossibleMatches(valobj, valobj.GetCompilerType(), use_dynamic, matches,
-                       FormattersMatchCandidate::Flags(), true);
+                       FormattersMatchCandidate::Flags(), true, true);
     return matches;
   }
 
@@ -180,6 +180,7 @@ class FormatManager : public IFormatChangeListener {
                                  lldb::DynamicValueType use_dynamic,
                                  FormattersMatchVector &entries,
                                  FormattersMatchCandidate::Flags current_flags,
+                                 bool dereference_ptr = true,
                                  bool root_level = false);
 
   std::atomic<uint32_t> m_last_revision;
diff --git a/lldb/source/DataFormatters/FormatManager.cpp b/lldb/source/DataFormatters/FormatManager.cpp
index 3b891cecb1c8b9..d6d6935f3e002c 100644
--- a/lldb/source/DataFormatters/FormatManager.cpp
+++ b/lldb/source/DataFormatters/FormatManager.cpp
@@ -174,7 +174,8 @@ void FormatManager::DisableAllCategories() {
 void FormatManager::GetPossibleMatches(
     ValueObject &valobj, CompilerType compiler_type,
     lldb::DynamicValueType use_dynamic, FormattersMatchVector &entries,
-    FormattersMatchCandidate::Flags current_flags, bool root_level) {
+    FormattersMatchCandidate::Flags current_flags, bool dereference_ptr,
+    bool root_level) {
   compiler_type = compiler_type.GetTypeForFormatters();
   ConstString type_name(compiler_type.GetTypeName());
   // A ValueObject that couldn't be made correctly won't necessarily have a
@@ -222,14 +223,15 @@ void FormatManager::GetPossibleMatches(
 
   if (compiler_type.IsPointerType()) {
     CompilerType non_ptr_type = compiler_type.GetPointeeType();
-    GetPossibleMatches(valobj, non_ptr_type, use_dynamic, entries,
-                       current_flags.WithStrippedPointer());
+    if (dereference_ptr)
+      GetPossibleMatches(valobj, non_ptr_type, use_dynamic, entries,
+                        current_flags.WithStrippedPointer(), false);
     if (non_ptr_type.IsTypedefType()) {
       CompilerType deffed_pointed_type =
           non_ptr_type.GetTypedefedType().GetPointerType();
       // this is not exactly the usual meaning of stripping typedefs
       GetPossibleMatches(valobj, deffed_pointed_type, use_dynamic, entries,
-                         current_flags.WithStrippedTypedef());
+                         current_flags.WithStrippedTypedef(), dereference_ptr);
     }
   }
 
diff --git a/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py b/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py
index f70162bf285839..fea40252a2a75c 100644
--- a/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py
+++ b/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py
@@ -53,3 +53,15 @@ def cleanup():
         # the match.
         self.expect("frame variable y", substrs=["(Foo &", ") y = 0x", "IntLRef"])
         self.expect("frame variable z", substrs=["(Foo &&", ") z = 0x", "IntRRef"])
+
+        # Test lldb doesn't dereference pointer more than once.
+        # xp has type Foo**, so it can only uses summary for Foo* or int*, not 
+        # summary for Foo or int.
+        self.expect("frame variable xp", substrs=["(Foo **) xp = 0x", "IntPointer"])
+
+        self.runCmd('type summary delete "int *"')
+        self.runCmd('type summary add --cascade true -s "MyInt" "int"')
+        self.expect("frame variable xp", substrs=["(Foo **) xp = 0x"])
+
+        self.runCmd('type summary add --cascade true -s "FooPointer" "Foo *"')
+        self.expect("frame variable xp", substrs=["(Foo **) xp = 0x", "FooPointer"])
diff --git a/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp b/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp
index 13102106551086..41b1d282344b91 100644
--- a/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp
+++ b/lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/main.cpp
@@ -5,6 +5,9 @@ int main() {
 	Foo* x = &lval;
 	Foo& y = lval;
 	Foo&& z = 1;
+
+	// Test lldb doesn't dereference pointer more than once.
+	Foo** xp = &x;
 	return 0; // Set breakpoint here
 }
 

@ZequanWu ZequanWu requested a review from jimingham January 23, 2025 02:55
Copy link

github-actions bot commented Jan 23, 2025

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r dd860bcfb57df429c0a1ad2e2d869ff3b795bc4d...83197bd326521352d145e4ef64edee7e5eaf41b5 lldb/test/API/functionalities/data-formatter/ptr_ref_typedef/TestPtrRef2Typedef.py
View the diff from darker here.
--- TestPtrRef2Typedef.py	2025-01-23 02:32:11.000000 +0000
+++ TestPtrRef2Typedef.py	2025-01-23 18:45:48.665024 +0000
@@ -53,11 +53,11 @@
         # the match.
         self.expect("frame variable y", substrs=["(Foo &", ") y = 0x", "IntLRef"])
         self.expect("frame variable z", substrs=["(Foo &&", ") z = 0x", "IntRRef"])
 
         # Test lldb doesn't dereference pointer more than once.
-        # xp has type Foo**, so it can only uses summary for Foo* or int*, not 
+        # xp has type Foo**, so it can only uses summary for Foo* or int*, not
         # summary for Foo or int.
         self.expect("frame variable xp", substrs=["(Foo **) xp = 0x", "IntPointer"])
 
         self.runCmd('type summary delete "int *"')
         self.runCmd('type summary add --cascade true -s "MyInt" "int"')

Copy link

github-actions bot commented Jan 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, but I'd like to hear what @jimingham thinks about this.

The current behaviour might cause crash on pretty printers. If the pretty printer calls SBValue::GetChildMemberWithName when compute the synthetic children or summary string and its type is T**, this might cause crash as it will return nullptr or None in this case

The code you linked to is problematic even with this fix. The returned value could be null due to bad/missing debug info or due to a new version of libstdc++ renaming the (private) member. That code should be using GetChildAtNamePath, which checks that all the intermediate nodes are present.

@jimingham
Copy link
Collaborator

How does this patch fit with the:

   -p ( --skip-pointers )
        Don't use this format for pointers-to-type objects.

   -r ( --skip-references )
        Don't use this format for references-to-type objects.

settings in type <formatter> add? It seems like we let the formatters control whether they should also apply to pointers and references to those types. Shouldn't that be what controls whether formatters also apply to pointers?

And if a formatter says it can handle pointers, it's up to the formatter to do that right.

@ZequanWu
Copy link
Contributor Author

How does this patch fit with the:

   -p ( --skip-pointers )
        Don't use this format for pointers-to-type objects.

   -r ( --skip-references )
        Don't use this format for references-to-type objects.

settings in type <formatter> add? It seems like we let the formatters control whether they should also apply to pointers and references to those types. Shouldn't that be what controls whether formatters also apply to pointers?

And if a formatter says it can handle pointers, it's up to the formatter to do that right.

This has no impact for reference. For pointers, if -p is set in type <formatter> add, the new formatter will apply only the type T and this patch has no impact for that. This patch changes the behaviour when -p is not set. Currently, lldb apply the new formatter to T, T*, T** and so on if -p is not set. This change let lldb only apply the formatter to T and T*.

@jimingham
Copy link
Collaborator

jimingham commented Jan 23, 2025

How does this patch fit with the:

   -p ( --skip-pointers )
        Don't use this format for pointers-to-type objects.

   -r ( --skip-references )
        Don't use this format for references-to-type objects.

settings in type <formatter> add? It seems like we let the formatters control whether they should also apply to pointers and references to those types. Shouldn't that be what controls whether formatters also apply to pointers?
And if a formatter says it can handle pointers, it's up to the formatter to do that right.

This has no impact for reference. For pointers, if -p is set in type <formatter> add, the new formatter will apply only the type T and this patch has no impact for that. This patch changes the behaviour when -p is not set. Currently, lldb apply the new formatter to T, T*, T** and so on if -p is not set. This change let lldb only apply the formatter to T and T*.

If you write the formatter knowing that you've asked for a match against all the pointers to this type, it's up to the formatter to not assume it's only being passed T or T*. I still don't see why we should be cutting this off in lldb. If I have a std::vector variable, I want to see its size in the variable printing, regardless of whether the handle to it is a pointer, or a pointer to a pointer. It seems artificial to cut that off for me.

@jimingham
Copy link
Collaborator

If anything, we should allow --skip-pointers 1 for formatter writers that only want to handle one level of pointer to the type.

@ZequanWu
Copy link
Contributor Author

How does this patch fit with the:

   -p ( --skip-pointers )
        Don't use this format for pointers-to-type objects.

   -r ( --skip-references )
        Don't use this format for references-to-type objects.

settings in type <formatter> add? It seems like we let the formatters control whether they should also apply to pointers and references to those types. Shouldn't that be what controls whether formatters also apply to pointers?
And if a formatter says it can handle pointers, it's up to the formatter to do that right.

This has no impact for reference. For pointers, if -p is set in type <formatter> add, the new formatter will apply only the type T and this patch has no impact for that. This patch changes the behaviour when -p is not set. Currently, lldb apply the new formatter to T, T*, T** and so on if -p is not set. This change let lldb only apply the formatter to T and T*.

If you write the formatter knowing that you've asked for a match against all the pointers to this type, it's up to the formatter to not assume it's only being passed T or T*. I still don't see why we should be cutting this off in lldb. If I have a std::vector variable, I want to see its size in the variable printing, regardless of whether the handle to it is a pointer, or a pointer to a pointer. It seems artificial to cut that off for me.

The reason that we should only dereference once is that many formatters relies on ValueObject::GetChildMemberWithName to calculate the synthetic children or summary strings. For C/C++, that function eventually calls TypeSystemClang::GetIndexOfChildMemberWithName which will only dereference pointer type once. This will cause inconsistent result when printing T* and T** if they share the same data formatters for T.

For example, when using the builtin libc++ std::vector formatter, lldb shows inconsistent size count for std::vector<int> * and std::vector<int> ** (if you use libstd++ std::vector, you can still see the inconsistence):

Process 2570716 launched: '/tmp/a.out' (x86_64)
Process 2570716 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x000055555555523e a.out`main at main.cpp:7:3
   4      std::vector<int> v = {1,2};
   5      auto vp = &v;
   6      auto vpp = &vp;
-> 7      return 0;
   8    }
(lldb) v vp vpp
(std::vector<int> *) vp = 0x00007fffffffd520 size=2
(std::vector<int> **) vpp = 0x00007fffffffd500 size=1

If anything, we should allow --skip-pointers 1 for formatter writers that only want to handle one level of pointer to the type.

If we really want to have the formatter apply to all pointer types (no matter how many layers there are), it can be achieved with regex like type summary add -x -s "MyInt" "int[ ]*[\*]*".

@jimingham
Copy link
Collaborator

It still sounds like this is a change to accommodate formatters that aren't correctly written. Regex formatters are slower than name match ones and more fallible to write, so I'm not sure "you can write a regex instead" is a good substitute.

@jimingham
Copy link
Collaborator

It isn't that hard to have your formatter dereference the ValueObject it was provided till it's of the form that it wants to deal with (type or pointer) and then pass that to the main formatter.

If the objection is that's a pain to do for every formatter, we could have the type matcher do the dereferencing and always pass the formatter the ValueObject that is the type they registered the formatter for.

This just seems like a really arbitrary restriction, not particularly user-friendly, and is being imposed just to work around formatter bugs, which doesn't strike me as a good motivation.

@ZequanWu
Copy link
Contributor Author

If the objection is that's a pain to do for every formatter, we could have the type matcher do the dereferencing and always pass the formatter the ValueObject that is the type they registered the formatter for.

Do you mean pass the formatter the ValueObject with type T that was registered when printing T* or T**, etc? Then the dereferencing needs to happen when we trying to get the summary or the synthetic value, could be cached of course. Is that right? This sounds good to me.

@ZequanWu
Copy link
Contributor Author

@labath What do you think? The solution above will still use the formatters for T when T** is printed.

@jimingham
Copy link
Collaborator

jimingham commented Jan 23, 2025

I still feel like we should offer options here. If I as a formatter writer am interested in handling all the pointer types of the type I've registered my formatter for, I should be allowed to do that. If I say, my formatter is valid for all the pointer types as well as the type I've registered for, but I only want to write the formatter to handle Values of the original type, we could offer to do the dereferencing for them, and then pass the formatter in. And of course, if you only want to be asked about values of the type you registered your formatter for, you can do that already.

@labath
Copy link
Collaborator

labath commented Jan 24, 2025

@labath What do you think? The solution above will still use the formatters for T when T** is printed.

Before I answer that, I want to go back to the question of dereferencing. While I don't think that skipping of arbitrary numbers of points is wrong (it's consistent and easy to explain), I also don't find it useful. One level -- sure, Even though I don't think it should be the default, I can definitely see reasons for why one would want to see the contents of T when one has a variable of type T*. Reason: many codebases -- lldb included -- use T* for function "return" types or for optional arguments, so if you have a T*, it's quite likely you're interested in seeing the T object rather than the pointer itself.

I don't think this argument holds for more than one level of pointers. If I have a T** or T*** (I've seen those in the wild), what are the chances I want to see the T object at the end of the chain? I think they're very low. I think it's more likely, I'm looking at either:

  • some generic code which works with U*, where U happens to be a T*. In this case I'm more likely to want to see the pointer values that the final object.
  • or some C-like representation of an array (argc, argv), in which case formatter will just tell me something about the first element of the array (but maybe I want to see some other element, or information about the array itself)

Even if the users really does want to see the final object, they can still do that even if we don't dereference all the levels for them. It just means they need to do that themselves.

As for how does this fit in with documented behavior, I don't see much of a problem with that. It's true that you could interpret Don't use this format for pointers-to-type objects. as skipping any level of pointers, but I think you could also interpret it as skipping just one. It says "pointers to type", not "arbitrary level of pointers" or anything like that. For comparison, the cascade option (which skips arbitrary levels of typedefs) says this (emphasis mine) "If true, cascade through typedef chains.", which is a lot more explicit. It also makes more sense because typedefs are often chained and the language (the C family at least) treats typedefs (but not pointers) transparently.

In addition to that, the skip-references option uses the exact same language as for pointers, but it skips only one level of references (although that's because there are no reference chains at the language level, not because we chose to implement it that way). So, even though you can rightfully accuse me of retconning this, I think the result would be fine.

Another factor, which I think supports the fact that skipping all levels isn't the right default is that ~all of our pretty printers are currently broken for multiple pointer levels:

(std::map<int, int> *) pm = 0x00007fffffffd798 size=1
(std::optional<int> *) po =  Has Value=true 
(std::tuple<int, int> *) pt = 0x00007fffffffd780 size=2
(std::vector<int> *) pv = 0x00007fffffffd760 size=2
(std::map<int, int> **) ppm = 0x00007fffffffd758 size=0   # points to pm
(std::optional<int> **) ppo =  Has Value=false    # points to po
(std::tuple<int, int> **) ppt = 0x00007fffffffd748 size=0  # points to pt
(std::vector<int> **) ppv = 0x00007fffffffd740 size=1   # points to pv

If this was one incident, you could dismiss it as a buggy formatter, but since it affects all of them, I think this points to a deeper problem. We can fix all of our formatters to dereference all pointers, but that won't change all the pretty printers out there, and I'm sure that most of them have the same bug. We could try to fix all of them by dereferencing the value before handing it off to the formatter, but I don't think that's useful. The fact that we didn't notice this until now tells me users don't have that many double pointers floating around, and so they likely will not complain if we stop printing this output (which was incorrect anyway).

TL:DR: I think it would be better to skip just one level of pointers. If someone really wants to format an arbitrary number of pointers, they can always use a regex to register a pretty printer for T (\*)*

@jimingham
Copy link
Collaborator

I think that argument has some force for synthetic child providers, but less so for summaries. I'd like to see the size of a std::vector regardless of whether I'm holding it as a *, **, etc.

@labath
Copy link
Collaborator

labath commented Jan 27, 2025

If I was holding a vector as a vector**, then I would would to see its summary/size as well. And maybe even the children too. But the thing I questioning is how often does one hold a C++ container as a double pointer. Is there like some common programming pattern that produces those? The only time I've ever done that is when I was experimenting with how "frame variable" handles pointers and references. I have seen and worked with double pointers, but this was always in some kind of legacy/C context. A combination of double pointers and a std::vector (which essentially makes it a triple pointer) would be very.. eclectic here. Modern C++ would likely turn at least one of those pointers into an iterator, or a smart pointer, or a reference..

It's true that a summary provider is less intrusive (it doesn't override existing children -- though it can hide them!), than a synthetic child provider, but I think it would be very confusing to have different behavior here. The thing I like about doing this for one level is that (as Zequan points out) it nicely matches what our SBValue API does, which makes most formatters "just work".

@jimingham
Copy link
Collaborator

jimingham commented Jan 27, 2025 via email

@ZequanWu
Copy link
Contributor Author

I'm fine with either this change or a change to add --skip-pointers [N] option to let users specify how many layer of pointers they want it matches to. The latter is more flexible. Then the question is if we should use --skip-pointers 1 (which means only dereference one layer pointer) for lldb's built-in libc++/libstdc++ pretty printers. I think we should do so and make that default.

@jimingham
Copy link
Collaborator

jimingham commented Jan 27, 2025 via email

@ZequanWu
Copy link
Contributor Author

I'd rather keep the flexibility, if you don't mind doing the work to add that. For now, it seems okay for the current libc++/libstdc++ ones to declare what they support (which is 1 pointer depth). If someone has a reason to come back and extend these ones in the future, that's fine but a separate piece of work.

After rethinking about the option, it will cause some compatibility issues. Currently, --skip-pointers is boolean option and python formatters use the flag lldb.eTypeOptionSkipPointers to disable/enable it. If we switch it to accept a numeric number, we probably need a new SBAPI to specify the parameter, and this will break existing users who use the lldb.eTypeOptionSkipPointers flag.

@jimingham
Copy link
Collaborator

I'd rather keep the flexibility, if you don't mind doing the work to add that. For now, it seems okay for the current libc++/libstdc++ ones to declare what they support (which is 1 pointer depth). If someone has a reason to come back and extend these ones in the future, that's fine but a separate piece of work.

After rethinking about the option, it will cause some compatibility issues. Currently, --skip-pointers is boolean option and python formatters use the flag lldb.eTypeOptionSkipPointers to disable/enable it. If we switch it to accept a numeric number, we probably need a new SBAPI to specify the parameter, and this will break existing users who use the lldb.eTypeOptionSkipPointers flag.

type summary add and friends don't have that many options. Adding a "pointer matching depth" option, and having --skip-pointers and --pointer-match-depth be in separate option groups so you can't accidentally provide them together would also work.

@labath
Copy link
Collaborator

labath commented Jan 28, 2025

If this were super difficult, I'd be uncomfortable about telling other people that their usages were too esoteric to support but willing to do so as a tradeoff. But I don't think it's a good idea to do that unless we have to. After all, you're basing the argument on common uses of vectors, but really it should be about "the usage of any data type someone might have pointers to, and want to see the summary of." We're providing the tools, not the uses.

It's not super-difficult, but it's not free either. In particular, any user-facing API we add to support this use case is something that we may have to live with for a very long time. If the uses are there, then so be it, but I don't think that's the case, which is why I'm asking if you (or anyone) knows of something like that (and also providing some reasons as to why I think it's unlikely to exist).

The thing I like about doing this for one level is that (as Zequan points out) it nicely matches what our SBValue API does, which makes most formatters "just work".

I still don't see the force of this argument, it's just arguing by analogy.

It's not a killer argument for sure, but it's nice when things are consistent. I think this behavior is nice and consistent because the pretty printer does not have to worry about the kind of object it gets. It can just call GetChildMemberWithName and things will probably work. The current situation is a bit of a footgun, because you may not realize (or even want) that your formatter will work with double pointer values. And unless you're very thorough with testing, you won't even notice it until someone tries to print a double pointer and see that it prints bogus values. (And even they may not realize its a formatter problem -- they may think their program produces corrupt values).

type summary add and friends don't have that many options. Adding a "pointer matching depth" option, and having --skip-pointers and --pointer-match-depth be in separate option groups so you can't accidentally provide them together would also work.

I don't know how easy would it be, but another option would be to keep a single option, but treat true and false as aliases for 1 and 0, respectively. That won't help with the python API though...

@ZequanWu
Copy link
Contributor Author

I'd rather keep the flexibility, if you don't mind doing the work to add that. For now, it seems okay for the current libc++/libstdc++ ones to declare what they support (which is 1 pointer depth). If someone has a reason to come back and extend these ones in the future, that's fine but a separate piece of work.

After rethinking about the option, it will cause some compatibility issues. Currently, --skip-pointers is boolean option and python formatters use the flag lldb.eTypeOptionSkipPointers to disable/enable it. If we switch it to accept a numeric number, we probably need a new SBAPI to specify the parameter, and this will break existing users who use the lldb.eTypeOptionSkipPointers flag.

type summary add and friends don't have that many options. Adding a "pointer matching depth" option, and having --skip-pointers and --pointer-match-depth be in separate option groups so you can't accidentally provide them together would also work.

I find this is tricky because all the flags for data formatters are set with TypeSummaryImpl::Flags which is a 4 byte unsigned integer. I could add a new unsigned integer in TypeSummaryImpl for that. But for users who use SBAPI to create formatters, they need to do something like CreateWithSummaryString(..., options) and then SetPointerMatchDepth(depth) to specify the pointer match depth because we cannot break the existing CreateWithSummaryString APIs. That looks not desired, or maybe not that bad. What do you think?

@jimingham
Copy link
Collaborator

jimingham commented Jan 29, 2025

I'd rather keep the flexibility, if you don't mind doing the work to add that. For now, it seems okay for the current libc++/libstdc++ ones to declare what they support (which is 1 pointer depth). If someone has a reason to come back and extend these ones in the future, that's fine but a separate piece of work.

After rethinking about the option, it will cause some compatibility issues. Currently, --skip-pointers is boolean option and python formatters use the flag lldb.eTypeOptionSkipPointers to disable/enable it. If we switch it to accept a numeric number, we probably need a new SBAPI to specify the parameter, and this will break existing users who use the lldb.eTypeOptionSkipPointers flag.

type summary add and friends don't have that many options. Adding a "pointer matching depth" option, and having --skip-pointers and --pointer-match-depth be in separate option groups so you can't accidentally provide them together would also work.

I find this is tricky because all the flags for data formatters are set with TypeSummaryImpl::Flags which is a 4 byte unsigned integer. I could add a new unsigned integer in TypeSummaryImpl for that. But for users who use SBAPI to create formatters, they need to do something like CreateWithSummaryString(..., options) and then SetPointerMatchDepth(depth) to specify the pointer match depth because we cannot break the existing CreateWithSummaryString APIs. That looks not desired, or maybe not that bad. What do you think?

If you want to preserve the current behavior, then CreateWithSummaryString should set the summary up to recurse infinitely. Then people who wanted new behavior would have to call SetPointerMatchDepth. Since that's new behavior, it's fine that the old API's don't support it. We allow overloads that are going to be unambiguous on the Python side - so we could also make a version of the CreateSummary API's that take an extra depth parameter, you wouldn't have to add the SetDepth API. I think it would be weird to change these defining features of the summary after creating it, so I'd vote for that over a SetDepth call.

TTTT, whenever we've made an APi that takes some set of options in the SB API's and we've done it with some bools or some flags, we always end up regretting it. We have an SBTypeSummaryOptions, maybe we should add the flags ivar to that and some API's to set it's various bits, and add the depth to that, and provide versions of the API that take this options object. But that's a bunch more work you maybe haven't signed up for... In which case, adding the depth variant would be fine too.

ZequanWu added a commit that referenced this pull request May 28, 2025
…add` command. (#138209)

Currently, the type `T`'s summary formatter will be matched for `T`,
`T*`, `T**` and so on. This is unexpected in many data formatters. Such
unhandled cases could cause the data formatter to crash. An example
would be the lldb's built-in data formatter for `std::optional`:
```
$ cat main.cpp
#include <optional>

int main() {
  std::optional<int> o_null;
  auto po_null = &o_null;
  auto ppo_null = &po_null;
  auto pppo_null = &ppo_null;
  return 0;
}
$ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null"
[lldb crash]
```

This change adds an options `--pointer-match-depth` to `type summary
add` command to allow users to specify how many layer of pointers can be
dereferenced at most when matching a summary formatter of type `T`, as
Jim suggested
[here](#124048).
By default, this option has value 1 which means summary formatter for
`T` could also be used for `T*` but not `T**` nor beyond. This option is
no-op when `--skip-pointers` is set as well.

I didn't add such option for `type synthetic add`, `type format add`,
`type filter add`, because it useful for those command. Instead, they
all have the pointer match depth of 1. When printing a type `T*`, lldb
never print the children of `T` even if there is a synthetic formatter
registered for `T`.
@ZequanWu
Copy link
Contributor Author

Closed this as #138209 is preferred.

@ZequanWu ZequanWu closed this May 28, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 28, 2025
…pe summary add` command. (#138209)

Currently, the type `T`'s summary formatter will be matched for `T`,
`T*`, `T**` and so on. This is unexpected in many data formatters. Such
unhandled cases could cause the data formatter to crash. An example
would be the lldb's built-in data formatter for `std::optional`:
```
$ cat main.cpp
#include <optional>

int main() {
  std::optional<int> o_null;
  auto po_null = &o_null;
  auto ppo_null = &po_null;
  auto pppo_null = &ppo_null;
  return 0;
}
$ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null"
[lldb crash]
```

This change adds an options `--pointer-match-depth` to `type summary
add` command to allow users to specify how many layer of pointers can be
dereferenced at most when matching a summary formatter of type `T`, as
Jim suggested
[here](llvm/llvm-project#124048).
By default, this option has value 1 which means summary formatter for
`T` could also be used for `T*` but not `T**` nor beyond. This option is
no-op when `--skip-pointers` is set as well.

I didn't add such option for `type synthetic add`, `type format add`,
`type filter add`, because it useful for those command. Instead, they
all have the pointer match depth of 1. When printing a type `T*`, lldb
never print the children of `T` even if there is a synthetic formatter
registered for `T`.
google-yfyang pushed a commit to google-yfyang/llvm-project that referenced this pull request May 29, 2025
…add` command. (llvm#138209)

Currently, the type `T`'s summary formatter will be matched for `T`,
`T*`, `T**` and so on. This is unexpected in many data formatters. Such
unhandled cases could cause the data formatter to crash. An example
would be the lldb's built-in data formatter for `std::optional`:
```
$ cat main.cpp
#include <optional>

int main() {
  std::optional<int> o_null;
  auto po_null = &o_null;
  auto ppo_null = &po_null;
  auto pppo_null = &ppo_null;
  return 0;
}
$ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null"
[lldb crash]
```

This change adds an options `--pointer-match-depth` to `type summary
add` command to allow users to specify how many layer of pointers can be
dereferenced at most when matching a summary formatter of type `T`, as
Jim suggested
[here](llvm#124048).
By default, this option has value 1 which means summary formatter for
`T` could also be used for `T*` but not `T**` nor beyond. This option is
no-op when `--skip-pointers` is set as well.

I didn't add such option for `type synthetic add`, `type format add`,
`type filter add`, because it useful for those command. Instead, they
all have the pointer match depth of 1. When printing a type `T*`, lldb
never print the children of `T` even if there is a synthetic formatter
registered for `T`.
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025
…add` command. (llvm#138209)

Currently, the type `T`'s summary formatter will be matched for `T`,
`T*`, `T**` and so on. This is unexpected in many data formatters. Such
unhandled cases could cause the data formatter to crash. An example
would be the lldb's built-in data formatter for `std::optional`:
```
$ cat main.cpp
#include <optional>

int main() {
  std::optional<int> o_null;
  auto po_null = &o_null;
  auto ppo_null = &po_null;
  auto pppo_null = &ppo_null;
  return 0;
}
$ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null"
[lldb crash]
```

This change adds an options `--pointer-match-depth` to `type summary
add` command to allow users to specify how many layer of pointers can be
dereferenced at most when matching a summary formatter of type `T`, as
Jim suggested
[here](llvm#124048).
By default, this option has value 1 which means summary formatter for
`T` could also be used for `T*` but not `T**` nor beyond. This option is
no-op when `--skip-pointers` is set as well.

I didn't add such option for `type synthetic add`, `type format add`,
`type filter add`, because it useful for those command. Instead, they
all have the pointer match depth of 1. When printing a type `T*`, lldb
never print the children of `T` even if there is a synthetic formatter
registered for `T`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants