runtime/HACKING.adoc: tips on debugging the runtime #11058

gasche · 2022-02-25T09:12:43Z

I realized during the night that I don't know how to print a backtrace after observing a test program crash after an assertion failure. (I'm no C programmer; I was always too lazy to find that out, and would just run valgrind ./test to get a trace. But then valgrind is unusably slow on parallel-heavy tests.) I searched for it on the web, and here is a PR to try to save other runtime beginners the same embarrassing search.

gasche · 2022-02-25T09:17:55Z

You can view the rendered output at https://github.com/gasche/ocaml/blob/runtime-HACKING.adoc/runtime/HACKING.adoc

Current table of contents:

== Linking a test program with the debug runtime ==
== GC messages ==
== Heap verification ==
== Getting stack traces after assertion failures (Linux) ==
== (TODO) Using `rr` for deterministic replay debugging ==
== (TODO) Compiling with sanitizers ==

lthls · 2022-02-25T09:38:16Z

I'm not sure how it would fit in your PR (nice work, by the way), but when I just need a backtrace I re-run the program with `gdb --args myprogram ...`, then use `run`, then `bt`.
If you manage to merge some of the multicore documentation on using rr, this could fit in the same section.
All of that assumes that your failure is deterministic and reasonably fast to trigger (gdb has almost no overhead, unlike valgrind). For hard-to-reproduce bugs, retrieving the core dump is your best solution.
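
A minimal sketch of the gdb workflow above as a one-shot shell helper. The function name `bt_on_crash` is hypothetical; `-batch`, `-ex` and `--args` are standard gdb options. It assumes the crash reproduces when the program runs under gdb.

```shell
# Hypothetical helper: run a program under gdb non-interactively and print
# a backtrace once it stops (for example on SIGABRT after a failed assertion).
# Usage: bt_on_crash ./myprogram arg1 arg2
bt_on_crash() {
    # -batch: exit after the -ex commands have run
    # --args: everything after it is the program and its arguments
    gdb -batch -ex run -ex bt --args "$@"
}
```

This is equivalent to typing `run` and then `bt` at the gdb prompt, without the interactive session.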

gasche · 2022-02-25T09:43:31Z

I used to do this until last night, but in fact on my machine `coredumpctl debug` starts a debugger on the saved state of the last crash, so `echo bt | coredumpctl debug` does what you suggest after the fact. Try it at home!
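
The same command as a small shell function, for systems where systemd-coredump captures crashes. The name `last_crash_bt` is hypothetical. Any arguments are passed to `coredumpctl` as a match (a PID, a command name, or an executable path), which selects the most recent crash matching them instead of the most recent crash overall.

```shell
# Hypothetical helper: print the backtrace of the most recent recorded crash.
# Usage: last_crash_bt            (last crash of any program)
#        last_crash_bt ./test     (last crash of ./test)
last_crash_bt() {
    # coredumpctl debug starts the debugger on the stored core dump;
    # the debugger reads the single command "bt" from standard input.
    echo bt | coredumpctl debug "$@"
}
```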

Note: I am not planning to fill the sections marked TODO in the context of this specific PR, because I don't have the knowledge to do so myself. I think it's fine to leave TODO around here.

abbysmal · 2022-02-25T09:46:04Z

This is very nice!
I can take care of filling the rr part, as I originally wrote some of the linked Multicore OCaml wiki page, if you'd like.

xavierleroy · 2022-02-25T09:56:16Z

The classic Unix way to inspect a core dump is

    gdb <executable file> <core file>

Substitute lldb or the debugger of your choice for gdb.

You can print a backtrace (bt) but also inspect variables (print) including local variables of functions "up" the call chain.
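
A sketch of a non-interactive form of this, assuming a core file is already at hand. The function name `core_bt` is hypothetical. `bt full` is the gdb command that prints each frame's local variables along with the backtrace. Whether a core file is written at all, and where, depends on the system (the `ulimit -c` limit and the kernel's core pattern), so both paths are placeholders.

```shell
# Hypothetical helper: print a backtrace, with the local variables of each
# frame, from an executable and the core file it produced.
# Usage: core_bt ./myprogram ./core
core_bt() {
    # $1: executable file, $2: core file
    gdb -batch -ex 'bt full' "$1" "$2"
}
```

For interactive inspection, drop `-batch` and the `-ex` command, then use `bt`, `up` and `print` at the gdb prompt.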

I don't see the point of the systemd stuff, except perhaps for post-mortem debugging of services that run in the background and are launched directly by systemd.

gasche · 2022-02-25T10:07:59Z

> I don't see the point of the systemd stuff, except perhaps for post-mortem debugging of services that run in the background and are launched directly by systemd.

I'm not sure either, but this is what my (recent Fedora) system does by default, so I had to learn about it anyway. In particular, you can't guess the path of the core file without using this coredumpctl tool, and it's stored in compressed format so using the "classic way" would be a pain.
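
One possible bridge between the two workflows, as a sketch: `coredumpctl dump` can write the stored core to a regular file, which the classic `gdb <executable file> <core file>` invocation can then open. The function name `extract_last_core` is hypothetical, and the sketch assumes the crash is still present in the systemd-coredump storage.

```shell
# Hypothetical helper: copy the most recent matching core dump to a file.
# Usage: extract_last_core core.test ./test
extract_last_core() {
    # $1: output path; remaining arguments: optional match
    # (PID, command name, or executable path)
    core_out=$1
    shift
    coredumpctl -o "$core_out" dump "$@"
}
```

After `extract_last_core core.test ./test`, running `gdb ./test core.test` should behave like the classic workflow.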

If we want to also offer "classic Unix" instructions in HACKING.adoc (I guess that's a good idea if some commonly-used distribution doesn't use the systemd stuff), I would prefer if someone wrote it, as I cannot test those instructions on my own system. I guess it would also be nice to document the workflow on that proprietary operating system with good hardware.

@Engil contributing instructions for rr would be lovely. In general any help importing more documentation upstream when/where it makes sense is appreciated. Note that the Multicore Wiki section is fairly long, you may want to consider having a dedicated file (HACKING-rr.{md,adoc}?) if your import is equally long, to not get the other "tips" lost in the middle.

(I'm using the AsciiDoc format for consistency with the root HACKING.adoc, because it shows a table of contents by default. For new documentation files, people should feel free to use Markdown if they are more familiar with it.)

Then again, I would propose to get this PR merged quickly, and have further additions be done by direct pushes to trunk or follow-up PRs. (Please do add your name to the Changes entry.)

kayceesrk · 2022-02-26T03:53:17Z

There are two pages in the Multicore OCaml wiki which are relevant to debugging:

The "run until failure" hacks in the first link have been useful for debugging hard-to-reproduce, non-deterministic crashes in the runtime. In particular, once we've managed to capture a failing trace in rr, it is very likely that the bug can be fixed using just that single trace.
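
A minimal sketch of a "run until failure" loop. This is not the wiki's script; the function name `run_until_failure` and the cap on the number of runs are assumptions made for illustration.

```shell
# Hypothetical helper: re-run a command until it fails, up to a maximum
# number of runs, and report which run failed.
# Usage: run_until_failure 1000 ./test
run_until_failure() {
    max=$1
    shift
    n=0
    while [ "$n" -lt "$max" ]; do
        if ! "$@"; then
            echo "failure on run $((n + 1))"
            return 0
        fi
        n=$((n + 1))
    done
    echo "no failure in $max runs"
    return 1
}
```

To capture the failing run for replay, the command could be wrapped in `rr record`, for example `run_until_failure 1000 rr record ./test`, followed by `rr replay`. This assumes `rr record` reports the failure of the recorded program through its own exit status.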

gasche · 2022-03-03T15:23:19Z

Would anyone be willing to approve this? I don't have the time right now to integrate some of the nice suggestions that have been made so far, and I think (unless someone of course objects to the current content) it could still be useful to have this in the hands of contributors.

abbysmal · 2022-03-03T15:41:15Z

HACKING.adoc

+structures, etc. Mostly implemented in C, with some rare bits of
+assembly code in architecture-specific files. The "includes"
+corresponding to the `.c` files are in the link:runtime/caml[]
+subdirectory.


I like this paragraph, it would be nice in the future to extend it to include other oddities (like the domain state generation and other generated things in the runtime code).

abbysmal · 2022-03-03T15:43:15Z

I am willing to approve this, but I'm not sure what to take away from the conversation about the old Unix way vs. the weird modern Linux distribution way of doing things.

On the other hand, what is currently written already seems pretty sensible, and I agree that having the document sit in this PR is not likely to help refine it further.
I will definitely take a stab at the rr part in the future.

I will take another look and approve it. Maybe we should move some of the points highlighted here (rr, and weird modern mechanisms for debugging vs. how it was) into a specific issue, if it feels issue-worthy, if only for tracking purposes.

gasche · 2022-03-03T15:49:31Z

Sounds very reasonable. They could also be added as TODOs inside the document directly. (If you like this idea and you volunteer, I'll give you write access to my repo so that you can push a commit doing just that.)

gasche · 2022-04-05T13:20:37Z

Any volunteers for an approval?

(Every time I remember this PR, I go back to it, realize that I could help by indeed adding more TODOs to the document, and then I switch to something else and I forget about it. I would rather let us collectively forget about it after it is merged somewhere long-term.)

nojb

LGTM. Thanks! This is very useful.

gasche · 2022-04-05T13:55:44Z

Thanks! I rebased, and this should be good to go if/when the CI agrees.

gasche force-pushed the runtime-HACKING.adoc branch from 12bca36 to 4bb52fd · February 25, 2022 09:15

abbysmal reviewed Mar 3, 2022

gasche mentioned this pull request Apr 5, 2022

Restore frame-pointers support for amd64 #11144

Merged

nojb approved these changes Apr 5, 2022

nojb reviewed Apr 5, 2022

gasche force-pushed the runtime-HACKING.adoc branch from 4bb52fd to a92bdfb · April 5, 2022 13:52

runtime/HACKING.adoc: tips on debugging the runtime

7982ff9

gasche force-pushed the runtime-HACKING.adoc branch from a92bdfb to 7982ff9 · April 5, 2022 13:55

gasche added the merge-me label Apr 5, 2022

nojb merged commit 6b7301b into ocaml:trunk Apr 5, 2022
