Compatible LLDB and GDB Python extensions #13136

Merged: gasche merged 7 commits into ocaml:trunk from NickBarnes:nick-lldb-python on May 2, 2024

NickBarnes commented Apr 29, 2024

Contributor

[Edited to show updated textual representations].

This PR improves the OCaml debugger extensions in several ways:

  1. The old gdb-macros script has been re-implemented in Python, and is now much faster;
  2. gdb_ocamlrun.py has been rewritten as separate debugger-agnostic ocaml.py and GDB-specific gdb.py;
  3. lldb.py has been added to support ocaml.py in LLDB (note that GDB is not available on macOS on Apple Silicon);
  4. gdb-macros and gdb_ocamlrun.py still exist and work, but give deprecation warnings when loaded.
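
To illustrate the split described in the list above, here is a minimal sketch of how debugger-agnostic logic can sit on top of a thin per-debugger shim. The names `Backend`, `FakeBackend`, `read_word` and `field` are hypothetical and are not taken from ocaml.py, gdb.py or lldb.py; the 8-byte word size is an assumption for a 64-bit target.

```python
from abc import ABC, abstractmethod

WORD_BYTES = 8  # assumption: 64-bit target


class Backend(ABC):
    """Hypothetical interface that a GDB or LLDB shim would implement."""

    @abstractmethod
    def read_word(self, address: int) -> int:
        """Return the unsigned machine word stored at `address`."""


class FakeBackend(Backend):
    """In-memory stand-in so the shared logic can run without a debugger."""

    def __init__(self, memory):
        self.memory = memory

    def read_word(self, address):
        return self.memory[address]


def field(backend: Backend, block: int, index: int) -> int:
    """Debugger-agnostic logic: read field `index` of the block at `block`."""
    return backend.read_word(block + index * WORD_BYTES)


if __name__ == "__main__":
    fake = FakeBackend({0x1000: 7, 0x1008: 9})
    print(field(fake, 0x1000, 1))
```

The shared code only ever talks to the `Backend` interface, so the same function can be exercised with a fake, with GDB, or with LLDB.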

Use

(gdb) source tools/gdb.py

# or

(lldb) command script import tools/lldb.py

Value printing

In either debugger, any value of type value will now print in a consistent way:

value          summary                  full representation
3              3                        caml:3
(3, 4)         (3, 4)                   caml(m):(3, 4)
Some 6         (6)                      caml(m):(6)
Some(Some 6)   ((<t0:1>))               caml(m):((6))
3.4            3.4                      caml(m):3.4
X 3.2          (t1: 3.2)                caml(m):(t1: 3.2)
"Foo"          'Foo'                    caml(m):'Foo'<3>
[|0.0; 1.0 |]  (Double_array:0.0, 1.0)  caml(m):(Double_array: 0.0, 1.0)

The "summary" form is what appears when the value is a field of a tuple or array. Within a summary, nested values are shortened further (so this is a two-level representation).
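
The two-level idea can be sketched as a small recursive renderer. This is an illustration only, not the code in ocaml.py: the `Block` class and `render` function are hypothetical, heap blocks are modelled as plain Python objects, and the GC color prefix is left out.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Block:
    """Hypothetical stand-in for a heap block: a tag plus its fields."""
    tag: int
    fields: List["Value"]


Value = Union[int, float, str, Block]


def render(v: Value, level: int = 0) -> str:
    """level 0 = full form, 1 = summary, 2 or more = maximally summarized."""
    if isinstance(v, Block):
        if level >= 2:
            # Only the tag and the size survive at this depth.
            return f"<t{v.tag}:{len(v.fields)}>"
        prefix = "" if v.tag == 0 else f"t{v.tag}: "  # tag zero is omitted
        inner = ", ".join(render(f, level + 1) for f in v.fields)
        return f"({prefix}{inner})"
    if isinstance(v, str):
        return f"'{v}'"
    return str(v)


if __name__ == "__main__":
    some_some_6 = Block(0, [Block(0, [6])])
    print(render(some_some_6))     # full form
    print(render(some_some_6, 1))  # summary form
```

Each nesting step moves one level down, so a block printed in full shows its fields as summaries, and those summaries show their own block fields only as tag and size.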

Tag zero is omitted. The (m) or (u) (or (g) or (-)) is the GC color.
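
As a rough sketch of where the tag, size and color come from, the following splits a header word and decodes an immediate integer. The layout used here (tag in bits 0-7, color in bits 8-9, size in words from bit 10 up, 64-bit words, no reserved header bits) is an assumption about the runtime configuration. The mapping from color bits to letters is also passed in as a parameter rather than hard-coded, since it has to come from the runtime's state; the `example_letters` table below is a made-up snapshot, and all function names are hypothetical.

```python
WORD_BITS = 64  # assumption: 64-bit target


def immediate(v: int):
    """Return the OCaml integer encoded in word `v`, or None for a pointer."""
    if v & 1 == 0:
        return None  # low bit clear: a pointer, not an immediate
    if v >= 1 << (WORD_BITS - 1):
        v -= 1 << WORD_BITS  # reinterpret the unsigned word as signed
    return v >> 1


def split_header(hd: int):
    """Split a header word into (tag, color bits, size in words)."""
    tag = hd & 0xFF
    color = (hd >> 8) & 0x3
    wosize = hd >> 10
    return tag, color, wosize


def color_letter(color: int, letters: dict) -> str:
    """Look up the display letter for a color, '?' if the table lacks it."""
    return letters.get(color, "?")


if __name__ == "__main__":
    # Made-up snapshot of the color table, for illustration only.
    example_letters = {0: "m", 1: "u", 2: "g", 3: "-"}
    tag, color, size = split_header((2 << 10) | (1 << 8) | 0)
    print(tag, color_letter(color, example_letters), size)
```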

Very long strings are abbreviated (and more abbreviated in a summary):

caml(m):'It was a bright cold day in April and the clocks were striking thir'...'Mansions'<217>
caml(m):(t4: 'It was a'...'ansions'<217>, 0, (42), 7)
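
A minimal sketch of this kind of string abbreviation follows. The `abbreviate` function and its `head` and `tail` defaults are hypothetical; the defaults are picked to match the summary example above, and the real cut-offs in ocaml.py may differ.

```python
def abbreviate(s: str, head: int = 8, tail: int = 7) -> str:
    """Quote `s`, eliding the middle when it is longer than head + tail.

    `tail` must be positive. The total length is appended when text is elided.
    """
    if len(s) <= head + tail:
        return f"'{s}'"
    return f"'{s[:head]}'...'{s[-tail:]}'<{len(s)}>"


if __name__ == "__main__":
    print(abbreviate("Foo"))
    print(abbreviate("It was a" + "x" * 202 + "ansions"))
```

Keeping both ends of the string and the total length makes it easier to tell apart long strings that share a prefix.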

Long tuples are abbreviated, in full display and in summary:

caml(-):(t5: 'big record', 7, 6, 5.0, (t4:'small record', 0, <t0:1>, 4), 4, ..., 1)<9>
caml(m):((0), (1, 1), (2, 2, 2, 2), (3, 3, 3, 3, 3, 3, ..., 3), (4, 4, 4, 4, 4, 4, ..., 4), (5, 5, 5, 5, 5, 5, ..., 5), ..., (14, 14, 14, 14, 14, 14, ..., 14))<15>
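
The elision pattern in these examples (the first six fields, an ellipsis, the last field, and the length in angle brackets for the full display) can be sketched as below. The `join_fields` name, the `limit` default and the exact threshold at which elision starts are assumptions read off the examples, not taken from ocaml.py.

```python
def join_fields(fields, limit=6, show_length=True):
    """Join already-rendered fields, eliding the middle of long blocks.

    Keeps the first `limit` fields and the last one. The length suffix is
    only added when something was elided and `show_length` is true.
    """
    n = len(fields)
    if n <= limit + 1:
        # Assumption: eliding a single field would save nothing, so show all.
        return "(" + ", ".join(fields) + ")"
    shown = list(fields[:limit]) + ["...", fields[-1]]
    suffix = f"<{n}>" if show_length else ""
    return "(" + ", ".join(shown) + ")" + suffix


if __name__ == "__main__":
    squares = [f"{float(i * i)}" for i in range(10)]
    print(join_fields(squares))
    print(join_fields(["3"] * 10, show_length=False))
```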

Closures show the symbol names of all their functions, the arity of the first, and the closure contents. When summarized, they just show the first function name, the number of functions, and the closure length:

caml(m):closure(camlTest.f_525, camlTest.g_526(caml_tuplify2), camlTest.h_527(caml_curry2)) arity 1 (12)
caml(m):(closure(camlTest.f_525, +2)<1>)

Infix pointers (to functions in shared closures) show the function symbol name, and the summary of the containing closure:

caml(m):infix(camlTest.g_526) in 0x1101fffa8 closure(camlTest.f_525, +2)<1>

Custom blocks show the custom operations block identifier:

caml(m):custom _bigarr02<6>

(Note that "_bigarr02" is the custom operations block "identifier" for big arrays, so this last example is how a Bigarray will print).

ocaml command

In both debuggers, this introduces an ocaml command, which allows for many sub-commands. At present, this provides just one sub-command:

ocaml find

The old heap-search code in gdb-macros is now invoked with the ocaml find sub-command:

(gdb) ocaml find arg
arg caml(u):(t5: 'big record', 7, 6, 5.0, (t4:'small record', 0, <t0:1>, 4), 4, ..., 1)<9> 0x7fffe7b49e08 found:
domain 0 unswept avail wsize=16: pool 0x7fffe7b42000-0x7fffe7b4a000
(gdb) 
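
One way to structure a command with sub-commands, independent of either debugger, is a small dispatch table. This is a sketch under assumptions: the `OCamlCommand` class, `register` and `invoke` are hypothetical names, and the handler here returns a placeholder string rather than searching the heap.

```python
class OCamlCommand:
    """Hypothetical dispatcher for 'ocaml <sub-command> <args>'."""

    def __init__(self):
        self.subcommands = {}

    def register(self, name, handler):
        """Associate a sub-command name with a callable taking the argument text."""
        self.subcommands[name] = handler

    def invoke(self, line: str) -> str:
        """Split off the sub-command name and hand the rest to its handler."""
        name, _, rest = line.strip().partition(" ")
        handler = self.subcommands.get(name)
        if handler is None:
            known = ", ".join(sorted(self.subcommands))
            return f"unknown sub-command {name!r}; available: {known}"
        return handler(rest.strip())


if __name__ == "__main__":
    cmd = OCamlCommand()
    cmd.register("find", lambda expr: f"searching the heap for {expr}")
    print(cmd.invoke("find arg"))
```

With this shape, each debugger shim only needs to forward the raw argument text to `invoke`, and new sub-commands can be added in one place.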

$Array in gdb

This PR retains the old $F convenience function in gdb, but renames it $Array (as it converts the value parameter to an array). LLDB does not support convenience functions, so there is no equivalent there.

Example of use

This is cut-and-paste from a shell:

bash-3.2$ ocamlopt test.ml
bash-3.2$ lldb a.out
(lldb) target create "a.out"
Current executable set to '/Users/nick/o/dev/a.out' (arm64).
(lldb) b caml_obj_tag
Breakpoint 1: where = a.out`caml_obj_tag [inlined] obj_tag at obj.c:39:7, address = 0x00000001000759c4
(lldb) command script import tools/lldb.py
command script import tools/lldb.py
OCaml support module loaded. Values of type 'value' will now
print as OCaml values, and an 'ocaml' command is available for
heap exploration (see 'help ocaml' for more information).
(lldb) run
Process 48498 launched: '/Users/nick/o/dev/a.out' (arm64)
Process 48498 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000759c4 a.out`caml_obj_tag [inlined] obj_tag(arg=caml(u):(t5: 'big record', 7, 6, 5.0, (t4:'small record', 0, <t0:1>, 4), 4, ..., 1)<9>) at obj.c:39:7 [opt]
   36  	{
   37  	  header_t hd;
   38  	
-> 39  	  if (Is_long (arg)) {
    	      ^
   40  	    return 1000;   /* int_tag */
   41  	  } else if ((long) arg & (sizeof (value) - 1)) {
   42  	    return 1002;   /* unaligned_tag */
Target 0: (a.out) stopped.
warning: a.out was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) p arg
(value) $0 = 4565499816 caml(u):(t5: 'big record', 7, 6, 5.0, (t4:'small record', 0, <t0:1>, 4), 4, ..., 1)<9>
(lldb) b *0x100004f50
Breakpoint 2: where = a.out`camlTest.entry + 760, address = 0x0000000100004f50
(lldb) c
Process 48498 resuming
Process 48498 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x0000000100004f50 a.out`camlTest.entry + 760
a.out`camlTest.entry:
->  0x100004f50 <+760>: str    x29, [sp, #-0x10]!
    0x100004f54 <+764>: mov    x29, sp
    0x100004f58 <+768>: ldr    x16, [x28, #0x40]
    0x100004f5c <+772>: mov    sp, x16
Target 0: (a.out) stopped.
(lldb) p (value)$x0
(value) $5 = 4565499840 caml(m):(Double_array: 0.0, 1.0, 4.0, 9.0, 16.0, 25.0, ..., 81.0)<10>
(lldb) 

@NickBarnes NickBarnes marked this pull request as ready for review April 29, 2024 16:32
@gasche

gasche commented Apr 29, 2024

Member
Some(3)         1@Tag0                 [Caml: 1@Tag0 (i6) [m]]
(3, Some(3))    2@Tag0                 [Caml: 2@Tag0 (i3, 1@Tag0) [m]]

I don't understand these two examples. Why would the first one print i6 and the second i3 for the integer 3?
(I also was confused by the printing of the second element of the tuple, but I guess it is intentional that only the summary is printed for arguments?)

@gasche gasche left a comment

Member

This looks overall fine to me, and I would propose to merge instead of nitpicking on the implementation.

I have one question: on the assumption that some people may be used to the current scripts, have muscle memory of its commands etc., why don't we also keep the old script around? (Maybe with a message to suggest using the new scripts instead?) Is there a cost to offering both?

Comment thread tools/lldb.py Outdated
    def get_short_help(self):
        return "Describe the location of the given OCaml value in the heap."
    def get_long_help(self):
        return "Wibble Wibble Wibble ##TODO"

Contributor

Could this include some examples of how to print out values?

(lldb) help ocaml find
Describe the location of the given OCaml value in the heap.  Expects 'raw' input (see 'help raw-input'.)

Syntax: find
Wibble Wibble Wibble ##TODO

Contributor Author

Hmm, yes. Lol.

@NickBarnes

Contributor Author
Some(3)         1@Tag0                 [Caml: 1@Tag0 (i6) [m]]
(3, Some(3))    2@Tag0                 [Caml: 2@Tag0 (i3, 1@Tag0) [m]]

I don't understand these two examples. Why would the first one print i6 and the second i3 for the integer 3? (I also was confused by the printing of the second element of the tuple, but I guess it is intentional that only the summary is printed for arguments?)

This is just a typo on my part as I wrote the matching comment. I think these textual representation choices I made are pretty ugly, and some of them should change, probably before merging. The important part of this PR is the structure: matching LLDB/GDB shims and the debugger-independent ocaml.py.

@gasche

gasche commented Apr 30, 2024

Member

I do find the "full" printing a bit heavyweight, and I thought in particular about suggesting to drop the arity and not show the tag when it is 0. But I don't have much experience using gdb on OCaml programs and I am not sure what amount of explicitness users need when they are deep in the bowels of a program like that.

@gasche

gasche commented Apr 30, 2024

Member

From a high-level perspective: I think it is up to the users of these scripts to suggest improvements, and I think we should be liberal in merging them as they don't affect the correctness of the compiler distribution. My only worry is that existing users of the previous script may not appreciate having their muscle-memorized workflow break with the new version, hence my suggestion to maybe keep the old script around to make everyone happy.

@NickBarnes

Contributor Author

I like the idea of keeping the old scripts around, and making them print deprecation and migration messages.

@NickBarnes

Contributor Author

I do find the "full" printing a bit heavyweight, and I thought in particular about suggesting to drop the arity and not show the tag when it is 0. But I don't have much experience using gdb on OCaml programs and I am not sure what amount of explicitness users need when they are deep in the bowels of a program like that.

I'm just tweaking the output a bit today. I agree that it is a bit heavyweight, especially with tag zero and low arity (we could just write [Caml: (3, "foo", 5) [m]] or something). It's hard to lose much more from that: the [Caml:...] is important IMO to make clear that we're stripping tag bits etc. The [m] is the GC colour bits (m: MARKED, u: UNMARKED, g: GARBAGE, -: NOT_MARKABLE), which is pretty important. I guess some of that could be switchable with an ocaml sub-command (or we could have a lighter version by default and add ocaml print to get the full version).

@NickBarnes NickBarnes marked this pull request as draft April 30, 2024 15:00
@NickBarnes

Contributor Author

I've reworked all the textual representations, and think they are now better (though doubtless still imperfect). The code for making them is improved too! I've also reinstated the old scripts, which will still load into GDB, and run, but produce deprecation messages.

@NickBarnes NickBarnes marked this pull request as ready for review May 1, 2024 16:24
@gasche

gasche commented May 1, 2024

Member

I'm not fond of the new printing.

value          summary                full representation
3              3                      Caml(3)
(3, 4)         (3, 4)                 Caml(3, 4) [m]
Some 6         ((6))                  Caml((6)) [m]
Some(Some 6)   (([:1:]))              Caml((((6)))) [m]
3.4            3.4                    Caml(3.4) [m]
X 3.2          Tag4(3.2)              Caml(Tag4 (3.2)) [m]
"Foo"          'Foo'                  Caml('Foo')[3] [m]
[|0.0; 1.0 |]  Double_array(0.0, 1.0) Caml(Double_array 0.0, 1.0) [m]

Pain points:

  • Caml((((6)))) is basically unreadable, I don't see a way for users to guess that this is Some(Some(6)).
  • this comes from the use of parentheses around Caml, and also to indicate blocks, which is confusing; I would rather have caml:3 and caml:((6)) for example.
  • the difference between (6) and Tag1 (6) is a bit frustrating. (I realize that this seems contradictory with the idea of eliding tag 0 by default)
  • (Double_array 0.0, 1.0) reads to me like Double_array applies to the first element 0.0 only

Possible suggestion:

  • caml:3
  • caml(m):[3 4]
  • caml(m):[[6]]
  • caml(m):3.4
  • caml(m):[4: 3.2]
  • caml(m):"Foo"(3)
  • caml(m):[Double_array: 0.0 1.0]

(Note: I never ever needed to ask for the GC marks of an OCaml value from gdb to track a bug in my code. This is obviously useful to debug GC issues and maybe some FFI bugs, but fairly niche; could this maybe be a separate command?)

@NickBarnes

Contributor Author

I'm happy to rework the representations again. I knew they were wrong but had run out of steam for experimentation. How's this:

caml:3
caml(-):[6]
caml(-):[[6]]
caml(-):3.4
caml(-):[T1: 3.2]
caml(-):'Hello, world!'[13]
caml(-):'It was a bright cold day in April and the clocks were striking thir'...'Mansions'[217]
caml(-):[T4: 'It was a'...'ansions'[217], 0, [42], 7]
caml(-):[T2: 'Hello, world!']
caml(m):[Double_array: 0.0, 1.0, 4.0, 9.0, 16.0, 25.0, ..., 81.0][10]
caml(m):[T3: [Double_array:0.0, 1.0, 4.0, 9.0, 16.0, 25.0, ..., 81.0]]
caml(m):[[0], [1, 1], [2, 2, 2, 2], [3, 3, 3, 3, 3, 3, ..., 3], [4, 4, 4, 4, 4, 4, ..., 4], [5, 5, 5, 5, 5, 5, ..., 5], ..., [14, 14, 14, 14, 14, 14, ..., 14]][15]
caml(m):[[*1, *2], [*3, *2]]
caml(-):[T5: 'big record', 7, 6, 5.0, [T4:'small record', 0, *1, 4], 4, ..., 1][9]
caml(-):[T4: 'small record', 0, [42], 4]
caml(m):custom _bigarr02[6]
caml(m):[custom _bigarr02[6]]
caml(m):closure (camlTest.f_525, camlTest.g_526(caml_tuplify2), camlTest.h_527(caml_curry2)) arity 1 (12)
caml(m):infix(camlTest.g_526) in 0x1101fffa8 closure(camlTest.f_525, +2)[1]
caml(m):infix(camlTest.h_527) in 0x1101fffa8 closure(camlTest.f_525, +2)[1]
caml(m):[closure(camlTest.f_525, +2)[1]]
caml(m):[infix(camlTest.g_526) in closure(camlTest.f_525, +2)[1]]
caml(m):[closure(camlTest.f_525, +2)[1], infix(camlTest.g_526) in closure(camlTest.f_525, +2)[1], infix(camlTest.h_527) in closure(camlTest.f_525, +2)[1]]
caml(-):closure (camlTest.f_270) arity 1 ()
caml(m):[closure(camlTest.f_270)[0]]

I think the worst part that remains is highly-summarized blocks: when we want to elide all the contents and just communicate "a block with tag N and size k". At present this is tag*k, where tag is either the empty string, or T<n> for some numeric tag, or Double_array. But this leads to this mysterious representation in the above list: caml(m):[[*1, *2], [*3, *2]]. That's obviously terrible. I think that this might be OK for non-zero tags, as in caml(m):[[T7*3, T2*4], [Double_array*6]], so maybe I just use T0 for the zero tag? Suggestions?

@NickBarnes

Contributor Author

User settings for structure depth and breadth, and of whether to show the GC color, could be added as an ocaml sub-command (ocaml set print-depth 5 etc etc) but I'd like to postpone that indefinitely, until somebody - likely myself - wants to scratch that itch.

@gasche

gasche commented May 2, 2024

Member

Another thing I am less fond of when seeing your examples (thanks!) is the visual overloading of [17] as both a size and a one-element tuple.

We could use "Foo"<3> instead of "Foo"[3] (or whatever other thing we like) to avoid this issue, and use your idea of ellipsis in the "compact" representation: tag*size becomes [tag: ...]<size>: 3*4 becomes [T3: ...]<4>. This is less compact, but more regular and more discoverable.

@NickBarnes

Contributor Author

Following your comment and in-person input from @dra27, I feel we are converging rapidly:

caml:3
caml(-):(6)
caml(-):((6))
caml(-):3.4
caml(-):(t1: 3.2)
caml(-):'Hello, world!'<13>
caml(-):'It was a bright cold day in April and the clocks were striking thir'...'Mansions'<217>
caml(-):(t4: 'It was a'...'ansions'<217>, 0, (42), 7)
caml(-):(t2: 'Hello, world!')
caml(m):(Double_array: 0.0, 1.0, 4.0, 9.0, 16.0, 25.0, ..., 81.0)<10>
caml(m):(t3: (Double_array:0.0, 1.0, 4.0, 9.0, 16.0, 25.0, ..., 81.0))
caml(m):closure(camlTest.f_525, camlTest.g_526(caml_tuplify2), camlTest.h_527(caml_curry2)) arity 1 (12)
caml(m):infix(camlTest.g_526) in 0x1101fffa8 closure(camlTest.f_525, +2)<1>
caml(m):infix(camlTest.h_527) in 0x1101fffa8 closure(camlTest.f_525, +2)<1>
caml(m):(closure(camlTest.f_525, +2)<1>)
caml(m):(infix(camlTest.g_526) in closure(camlTest.f_525, +2)<1>)
caml(m):(closure(camlTest.f_525, +2)<1>, infix(camlTest.g_526) in closure(camlTest.f_525, +2)<1>, infix(camlTest.h_527) in closure(camlTest.f_525, +2)<1>)
caml(-):closure(camlTest.f_270) arity 1 ()
caml(m):(closure(camlTest.f_270)<0>)
caml(m):((0), (1, 1), (2, 2, 2, 2), (3, 3, 3, 3, 3, 3, ..., 3), (4, 4, 4, 4, 4, 4, ..., 4), (5, 5, 5, 5, 5, 5, ..., 5), ..., (14, 14, 14, 14, 14, 14, ..., 14))<15>
caml(m):((<t0:1>, <t0:2>), (<t0:3>, <t0:2>))
caml(-):(t5: 'big record', 7, 6, 5.0, (t4:'small record', 0, <t0:1>, 4), 4, ..., 1)<9>
caml(-):(t4: 'small record', 0, (42), 4)
caml(m):custom _bigarr02<6>
caml(m):(custom _bigarr02<6>)
  • Consistent use of angle brackets for length: <7>
  • Parens rather than brackets for blocks and arrays, to match OCaml syntax more closely: (3, 4, 5)
  • Lower-case t for tags: (t3: "Foo", 5, 6.8)
  • Maximally-summarized blocks become <t0:3> or <t7:99>.

If you can live with this, let's stop bike-shedding at this point?

@gasche gasche added the merge-me label May 2, 2024
@gasche

gasche commented May 2, 2024

Member

Agreed!

@NickBarnes

Contributor Author

Thanks for all your input on the representations, @gasche: they are much better now than they were an hour ago!

@Octachron

Octachron commented May 2, 2024

Member

Unicode nitpicking (even if it is probably not applicable): angle brackets are 〈 〉, not < and >, which are binary operators (and for displaying contents, we don't have the input problem). Similarly, using the Unicode precomposed ellipsis might avoid relying on font ligatures for the decomposed form.

@gasche gasche merged commit 5369da4 into ocaml:trunk May 2, 2024
@gasche

gasche commented May 2, 2024

Member

I decided to go ahead and merge. If someone is motivated to use unicode bells and whistles, just open a new small PR.
