runtime/HACKING.adoc: tips on debugging the runtime #11058

Merged on Apr 5, 2022 (1 commit)

Conversation

gasche
Member

@gasche gasche commented Feb 25, 2022

I realized during the night that I don't know how to print a backtrace when a test program crashes on an assertion failure. (I'm no C programmer; I was always too lazy to find that out, and would just run `valgrind ./test` to get a trace. But then valgrind is unusably slow on parallel-heavy tests.) I searched for it on the web, and here is a PR to try to save other runtime beginners the same embarrassing search.

@gasche
Member Author

gasche commented Feb 25, 2022

You can view the rendered output at https://github.com/gasche/ocaml/blob/runtime-HACKING.adoc/runtime/HACKING.adoc

Current table of contents:

== Linking a test program with the debug runtime ==
== GC messages ==
== Heap verification ==
== Getting stack traces after assertion failures (Linux) ==
== (TODO) Using `rr` for deterministic replay debugging ==
== (TODO) Compiling with sanitizers ==

@lthls
Contributor

lthls commented Feb 25, 2022

I'm not sure how it would fit in your PR (nice work, by the way), but when I just need a backtrace I re-run the program with `gdb --args myprogram ...`, then use `run`, then `bt`.
If you manage to merge some of the multicore documentation on using rr, this could fit in the same section.
All of that assumes that your failure is deterministic and reasonably fast to trigger (gdb has almost no overhead, unlike valgrind). For hard-to-reproduce bugs, retrieving the core dump is your best solution.
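
A minimal sketch of the `gdb` workflow described above, made non-interactive. The `-batch`, `-ex` and `--args` options are standard gdb options; the function name `backtrace_of` and the program name in the usage line are placeholders, not something from this PR.

```shell
# Run a program under gdb and print a backtrace at the point where it stops
# (for example on an abort after an assertion failure).
# Usage (hypothetical program name): backtrace_of ./myprogram arg1 arg2
backtrace_of() {
  if ! command -v gdb >/dev/null 2>&1; then
    echo "gdb not found" >&2
    return 127
  fi
  # -batch: exit once all commands have run; -ex: run one gdb command;
  # --args: everything after it is the program and its arguments.
  gdb -batch -ex run -ex bt --args "$@"
}
```

If the program exits normally there is no stack left to show, so this is only useful when the failure reproduces on the re-run.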

@gasche
Member Author

gasche commented Feb 25, 2022

I used to do this until last night, but in fact on my machine `coredumpctl debug` will run a debugger as if it was in the failed state of the last crash, so `echo bt | coredumpctl debug` does what you suggest after the fact. Try it at home!
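
For reference, a small wrapper around the command above. It assumes a system where systemd-coredump collects core dumps (as on the Fedora setup discussed in this thread); the function name `last_crash_bt` is hypothetical.

```shell
# Print a backtrace for the most recent crash recorded by systemd-coredump.
# Optional arguments are passed to coredumpctl as a match
# (for example a PID or a command name) to select a specific crash.
last_crash_bt() {
  if ! command -v coredumpctl >/dev/null 2>&1; then
    echo "coredumpctl not found (is systemd-coredump in use?)" >&2
    return 127
  fi
  # 'coredumpctl debug' opens a debugger on the latest matching core dump;
  # feeding "bt" on stdin makes it print the backtrace, then quit at end of input.
  echo bt | coredumpctl debug "$@"
}
```

`coredumpctl list` shows the recorded crashes, which helps when the most recent one is not the one of interest.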

Note: I am not planning to fill the sections marked TODO in the context of this specific PR, because I don't have the knowledge to do so myself. I think it's fine to leave TODO around here.

@abbysmal
Contributor

This is very nice!
I can take care of filling the rr part, as I originally wrote some of the linked Multicore OCaml wiki page, if you'd like.

@xavierleroy
Contributor

The classic Unix way to inspect a core dump is

    gdb <executable file> <core file>

Substitute lldb or the debugger of your choice for gdb.

You can print a backtrace (`bt`) but also inspect variables (`print`) including local variables of functions `up` the call chain.
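
A sketch of this classic workflow, assuming the kernel writes core files to a path you know (this depends on the system's `core_pattern` setting, and core dumps may first need to be enabled with `ulimit -c unlimited`). `inspect_core` and its arguments are illustrative names.

```shell
# Print a backtrace and the innermost frame's locals from a core file.
# Usage (hypothetical paths): inspect_core ./myprogram ./core
inspect_core() {
  exe=$1
  core=$2
  if [ ! -f "$core" ]; then
    echo "no core file at $core" >&2
    return 1
  fi
  # 'bt' prints the call stack; 'info locals' lists local variables of the
  # selected frame. Interactively, 'up' and 'print <expr>' go further.
  gdb -batch -ex bt -ex 'info locals' "$exe" "$core"
}
```

Dropping `-batch` and the `-ex` options leaves an interactive session on the same core file.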

I don't see the point of the systemd stuff, except perhaps for post-mortem debugging of services that run in the background and are launched directly by systemd.

@gasche
Member Author

gasche commented Feb 25, 2022

> I don't see the point of the systemd stuff, except perhaps for post-mortem debugging of services that run in the background and are launched directly by systemd.

I'm not sure either, but this is what my (recent Fedora) system does by default, so I had to learn about it anyway. In particular, you can't guess the path of the core file without using this coredumpctl tool, and it's stored in compressed format so using the "classic way" would be a pain.
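
If the classic `gdb <executable file> <core file>` workflow is still wanted on such a system, `coredumpctl dump` can write the stored core to a regular file first. A hedged sketch: `extract_last_core` is a made-up name, and it assumes the crash is still retained by systemd-coredump.

```shell
# Write the most recent matching core dump to a file of your choice.
# Usage (hypothetical names): extract_last_core ./core.dump myprogram
extract_last_core() {
  out=$1
  shift
  if ! command -v coredumpctl >/dev/null 2>&1; then
    echo "coredumpctl not found" >&2
    return 127
  fi
  # Remaining arguments are a coredumpctl match (for example a PID or a
  # command name); with no match, the latest recorded crash is used.
  coredumpctl dump --output="$out" "$@"
}
```

The resulting file can then be passed to the debugger together with the executable, as in the classic instructions.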

If we want to also offer "classic Unix" instructions in HACKING.adoc (I guess that's a good idea if some commonly-used distribution doesn't use the systemd stuff), I would prefer if someone wrote it, as I cannot test those instructions on my own system. I guess it would also be nice to document the workflow on that proprietary operating system with good hardware.

@Engil contributing instructions for rr would be lovely. In general any help importing more documentation upstream when/where it makes sense is appreciated. Note that the Multicore Wiki section is fairly long, you may want to consider having a dedicated file (HACKING-rr.{md,adoc}?) if your import is equally long, to not get the other "tips" lost in the middle.

(I'm using the AsciiDoc format for consistency with the root HACKING.adoc, because it shows a table of contents by default. For new documentation files, people should feel free to use Markdown if they are more familiar with it.)

Then again, I would propose to get this PR merged quickly, and have further additions be done by direct pushes to trunk or follow-up PRs. (Please do add your name to the Changes entry.)

@kayceesrk
Contributor

There are two pages in the Multicore OCaml wiki that are relevant to debugging:

  1. https://github.com/ocaml-multicore/ocaml-multicore/wiki/Debugger-hacks
  2. https://github.com/ocaml-multicore/ocaml-multicore/wiki/Debugging-the-OCaml-Multicore-runtime

The "run until failure" hacks in the first link have been useful for debugging hard-to-reproduce, non-deterministic crashes in the runtime. In particular, once we've managed to capture a failing trace in rr, it is very likely that the bug can be fixed using just that single trace.
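
The "run until failure" idea can be sketched as a small shell loop. This is not the script from the wiki page, only an illustration; `run_until_failure` is a made-up name. Wrapping the command in `rr record` is shown as a comment, under the assumption that `rr record` exits with the recorded program's status.

```shell
# Re-run a command up to MAX times and stop at the first failing run.
# Returns 0 if a failure was observed, 1 if every run succeeded.
run_until_failure() {
  max=$1
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    if ! "$@"; then
      echo "failed on run $i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "no failure in $max runs"
  return 1
}

# Hypothetical usage, recording each run so the failing one can be replayed:
#   run_until_failure 1000 rr record ./test && rr replay
```

Bounding the number of runs keeps the loop from spinning forever when the bug does not show up on a given machine.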

@gasche
Member Author

gasche commented Mar 3, 2022

Would anyone be willing to approve this? I don't have the time right now to integrate some of the nice suggestions that have been made so far, and I think (unless someone of course objects to the current content) it could still be useful to have this in the hands of contributors.

structures, etc. Mostly implemented in C, with some rare bits of
assembly code in architecture-specific files. The "includes"
corresponding to the `.c` files are in the link:runtime/caml[]
subdirectory.
Contributor

I like this paragraph. It would be nice in the future to extend it to cover other oddities (like the domain state generation and other generated things in the runtime code).

@abbysmal
Contributor

abbysmal commented Mar 3, 2022

I am willing to approve this, but I'm not sure what to take away from the conversation about the old Unix way vs. the weird modern Linux distribution way of doing things.

On the other hand, what is currently written already seems pretty sensible, and I agree that having the document sit in this PR is not likely to help refine it further.
I will definitely take a stab at the rr part in the future.

I will take another look and approve it. Maybe we should move some of the points highlighted here (rr, and weird modern mechanisms for debugging vs. how it was) into a specific issue, if it feels issue-worthy, if only for tracking purposes.

@gasche
Member Author

gasche commented Mar 3, 2022

Sounds very reasonable. They could also be added as TODOs inside the document directly. (If you like this idea and you volunteer, I'll give you write access to my repo so that you can push a commit doing just that.)

@gasche
Member Author

gasche commented Apr 5, 2022

Any volunteers for an approval?

(Every time I remember this PR, I go back to it, realize that I could help by indeed adding more TODOs to the document, and then I switch to something else and I forget about it. I would rather let us collectively forget about it after it is merged somewhere long-term.)

Contributor

@nojb nojb left a comment

LGTM. Thanks! This is very useful.

@gasche
Member Author

gasche commented Apr 5, 2022

Thanks! I rebased, and this should be good to go if/when the CI agrees.

@nojb nojb merged commit 6b7301b into ocaml:trunk Apr 5, 2022