Use faulthandler instead of isys signal handlers #4350

VladimirSlavik · 2022-09-20T14:55:22Z

This significantly changes the outputs, but eventually provides the same kinds of information, only in a different order.

Previously:

Print that a signal was received, and which one.
Log to syslog that this happened.
Print userland stack trace of main process.
Save core dump:
- always to /tmp/anaconda.core.
- always, independent of system settings
- somehow very slowly - maybe forking python and running gcore isn't the most performant idea
- file size ca. 1 GB
Journal gets only the syslog message.

Now:

Print that a fatal error happened, no mention of signals.
No unique message in syslog and journal to identify that "anaconda" crashed.
Print python stack of all the threads in the main process.
Save core dump:
- to default location
- according to system settings
- needs invoking coredumpctl manually to work with the actual core dump
- blazing fast saving compared to previous state
- size ca. 430 MB, initially stored compressed to ca. 60 MB
- apparently requires more debuginfo downloads, system can run out of space or memory
Print userland stack traces of all threads to journal.

In both cases, using the (coredumpctl+)gdb+debuginfod+dnf toolchain leads to a successful interactive debugging session.
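For readers unfamiliar with the module, the following is a minimal sketch of enabling Python's standard `faulthandler` so that a fatal signal prints the Python stack of all threads. It is not the code from this PR; the helper name `enable_crash_tracebacks` is made up for illustration, and the only library calls used are `faulthandler.enable()`, `faulthandler.is_enabled()`, and `faulthandler.disable()`.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Enable faulthandler for fatal signals (hypothetical helper, not anaconda code).

    With all_threads=True, the Python traceback of every thread is written
    to the given stream when a fatal signal such as SIGSEGV arrives.
    """
    if stream is None:
        # faulthandler needs a file object with a usable fileno().
        stream = sys.stderr
    if not faulthandler.is_enabled():
        faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Calling `enable_crash_tracebacks()` once early in startup should be enough. As far as I understand, faulthandler only prints the traceback and does not write any core dump itself, which matches the "according to system settings" behavior described above.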

VladimirSlavik · 2022-09-20T15:16:06Z

/kickstart-test --testtype smoke

rvykydal

Looks good to me, thank you!
I wonder if we need to add / update strings in log monitor of kickstart tests. May be worth checking, but not blocking the PR.

VladimirSlavik · 2022-09-21T06:57:51Z

Certainly, and livemedia-creator too, and who knows how many other places.

VladimirSlavik · 2022-09-21T11:25:32Z

A few more details:

Looks like we might have to watch for a line like "Process [0-9]+ (anaconda) of user [0-9]+ dumped core" - not sure where: kickstart tests, lorax? Fortunately the logging "filter" is mostly shared.
The signal is mentioned in coredumpctl listing (by name instead of number).
The "good enough" userland stack traces (same as before) are visible with coredumpctl info (...) even without going into gdb and loading all the debuginfos.
coredumpctl info without query gives the details of the last recorded core dump, which should be enough in most cases.

I will amend the commit message with that.

This changes how the core dump support works, as well as the intermediate outputs. However, it eventually provides the same kinds of information, only in different places and order.

Previously:
- Print that a signal was received, and which one.
- Log to syslog that this happened.
- Print userland stack trace of the main thread of the main process.
- Save core dump:
  - always to /tmp/anaconda.core.<PID>
  - always, independent of system settings
  - somehow very slowly - maybe forking python and running gcore isn't the most performant idea
  - file size ca. 1 GB
- Journal gets only the syslog message.

Now:
- Print that a fatal error happened, no mention of signals.
- No unique message in syslog to identify that "anaconda" crashed.
- Print python stack of all the threads in the main process.
- Save core dump:
  - to default location
  - according to system settings
  - needs invoking coredumpctl manually to work with the actual core dump
  - blazing fast saving compared to previous state
  - initially stored compressed to ca. 60 MB, exports to ca. 430 MB
  - apparently requires more debuginfo downloads for loading in gdb, system can run out of space or memory
- Journal gets more:
  - unique message that "Process <PID> (anaconda) of user <UID> dumped core."
  - userland stack traces of *all* threads
- To analyze, use coredumpctl:
  - `coredumpctl` with no arguments lists the PID, signal, and executable
  - `coredumpctl info` (with no query) shows metadata and the same information as journal, in pager

In both cases, using the (coredumpctl+)gdb+debuginfod+dnf toolchain leads to a successful interactive debugging session.
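The coredumpctl steps from the commit message could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, `--no-pager` is used so the output can be captured instead of opening the pager, and passing `anaconda` as the match assumes coredumpctl's usual matching on the command name.

```python
import shutil
import subprocess


def coredumpctl_command(action="info", match=None):
    """Build a coredumpctl command line (hypothetical helper).

    action: "list" or "info"; match: optional PID or command name.
    """
    if action not in ("list", "info"):
        raise ValueError("unsupported action: %s" % action)
    cmd = ["coredumpctl", "--no-pager", action]
    if match is not None:
        cmd.append(str(match))
    return cmd


def run_coredumpctl(action="info", match=None):
    """Run coredumpctl and return its stdout, or None if it is not installed."""
    if shutil.which("coredumpctl") is None:
        return None
    result = subprocess.run(
        coredumpctl_command(action, match),
        capture_output=True,
        text=True,
        check=False,  # coredumpctl exits non-zero when no dump matches
    )
    return result.stdout
```

For example, `run_coredumpctl("info")` would return the details of the last recorded core dump, and `run_coredumpctl("list", "anaconda")` would limit the listing to dumps of anaconda.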

VladimirSlavik · 2022-09-21T11:37:07Z

Based on the above, I think we can merge this and not lose anything.

VladimirSlavik · 2022-09-21T11:37:23Z

/kickstart-test --testtype smoke

poncovka

Thank you!

VladimirSlavik · 2022-09-26T13:01:14Z

Looks like we might have to watch for a line like "Process [0-9]+ (anaconda) of user [0-9]+ dumped core" - not sure where: kickstart tests, lorax?

@bcl just a heads up that this might change how fatal errors look...

bcl · 2022-09-26T16:12:03Z

The code that lmc uses is here - https://github.com/weldr/lorax/blob/f33-branch/src/pylorax/monitor.py#L38

So as long as the new output has either 'Traceback' or 'Call Trace:' in it lmc will catch it.

VladimirSlavik · 2022-09-27T11:24:12Z

Unfortunately, the new output contains neither of those.

In shell: Fatal Python error: (...)
In syslog: CRIT systemd-coredumpctl:Process [0-9]+ (anaconda) of user [0-9]+ dumped core.
In journal: systemd-coredumpctl\[[0-9]+\]: \[■\] Process [0-9]+ \(anaconda\) of user [0-9]+ dumped core.

I'll make a PR.
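To illustrate why the existing markers miss the new output, here is a small sketch of a log check that accepts both the old markers and the new messages. It is not the pylorax or kickstart-tests code; the names are hypothetical, and the regular expression covers only the part of the message that is common to the syslog and journal forms quoted above.

```python
import re

# Markers mentioned above as the ones lmc currently looks for.
OLD_MARKERS = ("Traceback", "Call Trace:")

# New shell output starts with this prefix.
FATAL_MARKER = "Fatal Python error:"

# Common part of the new syslog and journal messages.
CORE_DUMP_RE = re.compile(
    r"Process [0-9]+ \(anaconda\) of user [0-9]+ dumped core"
)


def is_crash_line(line):
    """Return True if a log line looks like an anaconda crash report."""
    if any(marker in line for marker in OLD_MARKERS):
        return True
    if FATAL_MARKER in line:
        return True
    return CORE_DUMP_RE.search(line) is not None
```

With this check, a made-up line such as `Process 2345 (anaconda) of user 0 dumped core.` is detected, while the old substring markers alone would let it pass unnoticed.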

VladimirSlavik added the f38 Fedora 38 label Sep 20, 2022

VladimirSlavik force-pushed the master-isys-to-faulthandler branch from 31d64a2 to 593dde4 Compare September 20, 2022 14:58

rvykydal approved these changes Sep 21, 2022

VladimirSlavik force-pushed the master-isys-to-faulthandler branch from 593dde4 to 1911c78 Compare September 21, 2022 11:36

VladimirSlavik added the release note required Write a release note for this change. label Sep 26, 2022

poncovka approved these changes Sep 26, 2022

VladimirSlavik merged commit 49a2d6d into rhinstaller:master Sep 26, 2022

VladimirSlavik deleted the master-isys-to-faulthandler branch September 26, 2022 12:59

VladimirSlavik mentioned this pull request Sep 27, 2022

Update anaconda's crash messages to watch weldr/lorax#1268

Merged

VladimirSlavik removed the release note required Write a release note for this change. label Apr 27, 2023
