Make unit failure verbose #4367

utezduyar · 2016-10-13T12:20:25Z

Something entering failed state should be more visible
in the system logs. Therefore, increase the log level
to at least warning.

keszybz · 2016-10-13T16:29:53Z

I don't think this is warranted. By default logs have level info, so this is already in the logs. Most things log their own failures at error level, and adding an additional warning from systemd would just increase the number of high-priority messages.

Do you have a specific case where this makes a positive difference?

utezduyar · 2016-10-13T18:45:12Z

Not if the failure is due to a signal. An example follows:

2016-09-09T19:22:53.681+10:00 [ NOTICE ] systemd[1]: duolith.service: Main process exited, code=killed, status=10/BUS
2016-09-09T19:22:54.284+10:00 [ NOTICE ] systemd[1]: duolith.service: Unit entered failed state.
2016-09-09T19:22:54.492+10:00 [ INFO ] systemd[1]: duolith.service: Service hold-off time over, scheduling restart.

Here we couldn't see that anything was wrong, because we filter the logs at ERROR level and none of these messages reach it.
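To make the filtering problem concrete, here is a minimal sketch that applies a severity threshold to the three lines quoted above. The `[ NOTICE ]` / `[ INFO ]` labels are taken from the example; the mapping to standard syslog priority numbers, the regular expression, and the function name `filter_by_level` are assumptions for illustration, not part of any real log collector.

```python
import re

# Standard syslog priorities: a lower number means a more severe message.
# Mapping the bracketed labels onto these numbers is an assumption.
PRIORITY = {
    "EMERG": 0, "ALERT": 1, "CRIT": 2, "ERROR": 3,
    "WARNING": 4, "NOTICE": 5, "INFO": 6, "DEBUG": 7,
}

# Matches "<timestamp> [ LEVEL ] <message>" as in the quoted example.
LINE_RE = re.compile(r"^(\S+) \[ (\w+) \] (.*)$")

LOG_LINES = [
    "2016-09-09T19:22:53.681+10:00 [ NOTICE ] systemd[1]: duolith.service: "
    "Main process exited, code=killed, status=10/BUS",
    "2016-09-09T19:22:54.284+10:00 [ NOTICE ] systemd[1]: duolith.service: "
    "Unit entered failed state.",
    "2016-09-09T19:22:54.492+10:00 [ INFO ] systemd[1]: duolith.service: "
    "Service hold-off time over, scheduling restart.",
]


def filter_by_level(lines, threshold):
    """Keep only the lines that are at least as severe as `threshold`."""
    limit = PRIORITY[threshold]
    kept = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # Skip lines that do not follow the expected layout.
        level = PRIORITY.get(match.group(2))
        if level is not None and level <= limit:
            kept.append(line)
    return kept


if __name__ == "__main__":
    for name in ("ERROR", "WARNING", "NOTICE"):
        print(name, len(filter_by_level(LOG_LINES, name)))
```

With an ERROR or WARNING threshold nothing from this crash survives the filter; only a NOTICE threshold keeps the two lines that report the failure.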

keszybz · 2016-10-13T23:36:30Z

So yeah, if the process dies because of a signal, and the signal is of the wrong kind (Lennart recently added a function to classify signals, which takes into account whether the process is oneshot or not), we should bump the log level here to ERROR. It is expected that in those cases the process did not log anything. But for the other cases, where the process died "normally", I think the current level of notice is OK.
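The rule proposed above could be sketched roughly as follows. This is only an illustration of the idea, not systemd's code: the function name `exit_log_level`, the `Level` enum, and in particular the set of signals treated as "clean" for long-running services are assumptions, as is the choice to treat no signal as clean for oneshot processes.

```python
import signal
from enum import IntEnum


class Level(IntEnum):
    """Syslog-style priorities; a lower number is more severe."""
    ERR = 3
    NOTICE = 5


# Assumption: signals that count as an ordinary way to stop a daemon.
# The real classification function may use a different set.
CLEAN_DAEMON_SIGNALS = frozenset(
    {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}
)


def exit_log_level(code, signo=None, oneshot=False):
    """Pick the log level for a 'main process exited' message.

    code    -- "exited" or "killed", as in the code= field of the log line
    signo   -- the terminating signal when code == "killed"
    oneshot -- whether the process belongs to a oneshot service
    """
    if code != "killed":
        # The process died "normally"; it is expected to have logged its
        # own error, so the existing notice level is kept.
        return Level.NOTICE
    # Assumption: a oneshot command is never expected to die of a signal.
    clean = frozenset() if oneshot else CLEAN_DAEMON_SIGNALS
    return Level.NOTICE if signo in clean else Level.ERR


if __name__ == "__main__":
    print(exit_log_level("killed", signal.SIGBUS).name)   # crash-like signal
    print(exit_log_level("killed", signal.SIGTERM).name)  # ordinary stop
    print(exit_log_level("exited").name)                  # normal exit
```

Under these assumptions, the SIGBUS crash from the earlier example would be reported at ERR, while a daemon stopped with SIGTERM stays at NOTICE.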

When this is done, as a second step, it'd be nice to decrease the log level from systemd-coredump. It currently logs the backtrace at ERROR level, but it really doesn't have to, especially considering that the bt can be rather long.

utezduyar · 2016-10-17T05:58:26Z

Don't you think it will create confusion that we sometimes see these messages at a higher severity and sometimes at a lower one? I believe this would be hard for someone filtering the logs, especially if you don't even collect logs less important than warning because of flash wear-out.

I also believe that, as a system manager, we shouldn't rely on applications informing us about their abnormal exit. I do understand your argument, though, that this patch would increase the number of higher-priority messages, in this case at warning level.

keszybz · 2016-10-17T06:25:13Z

> Don't you think it will create confusion that sometimes we see messages in the higher severity, sometimes lower.

We do that quite often... most log_fulls in the source tree are for that.

> if you don't even collect the logs that are less important than warning due to flash wear out.

True. But OTOH, you could say that if we keep the number of messages at high levels down, the most important ones will stand out. In my mind, the primary use case for systemd is Linux distros, where you expect failing applications to log their own errors. And if they don't, fix them. I agree that it's better to have an error from systemd than nothing, but systemd has limited knowledge, and the application can usually provide a much more useful message, so this should be preferred.

poettering · 2016-10-19T22:34:26Z

I have implemented @keszybz's suggestion to upgrade the log level when a process dies due to signal now. Let's continue discussion on the new PR I filed about this: #4415. Closing this one in favour of that.

Make unit failure verbose

1cb413a

poettering added a commit to poettering/systemd that referenced this pull request Oct 19, 2016

core: let's upgrade the log level for service processes dying of signal

c175022

As suggested in systemd#4367 (comment)

poettering mentioned this pull request Oct 19, 2016

core: let's upgrade the log level for service processes dying of signal #4415

Merged

poettering closed this Oct 19, 2016

poettering added the pid1 label Oct 19, 2016

keszybz pushed a commit that referenced this pull request Oct 19, 2016

core: let's upgrade the log level for service processes dying of sign…

5368222

…al (#4415) As suggested in #4367 (comment)
