
regression: crashing journal due to watchdog #125

Closed
utezduyar opened this issue Jun 9, 2015 · 6 comments

@utezduyar
Contributor

The for (;;) loop in server_process_datagram might prevent the journal
from feeding the watchdog if there is always something to receive on
the syslog socket. journald may then be killed and restarted by the
watchdog, and applications stall while the syslog socket stays full.
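
For readers following along, here is a minimal sketch of the pattern being described, assuming a simplified handler (illustrative only, not the actual server_process_datagram code):

/*
 * Simplified sketch of the problematic pattern: the datagram handler
 * drains the socket in a for (;;) loop. If senders keep the socket
 * full, recv() never fails with EAGAIN, the function never returns
 * to the event loop, and sd_notify("WATCHDOG=1") is never sent.
 */
#include <errno.h>
#include <sys/socket.h>

static void process_datagrams(int fd) {
        for (;;) {
                char buf[2048];
                ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

                if (n < 0) {
                        if (errno == EAGAIN)
                                return; /* socket drained: back to the event loop */
                        return;         /* real error: also bail out */
                }

                /* ... forward to console/wall/kmsg and store the message ... */
        }
}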

I thought about fixing it by checking the watchdog on every iteration
of the for (;;) loop, using watchdog_last and watchdog_period and
feeding the watchdog if necessary, but none of those properties are
public.

The current rate limit check is done right before we store the message
(after we have received it and forwarded it to console, wall and
kmsg). I think that is too late.

Maybe the best approach is to have a rate limit on sd-event (per
sd_event_source), so we can map the rate limit options in
journald.conf to the journal's sd-event sources.
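
For illustration only, a minimal sketch of what that mapping could look like. Note that sd_event_source_set_ratelimit() is an assumption here: an API of that shape was added to sd-event only years after this thread, and journald's real rate limiting lives elsewhere.

#include <sys/epoll.h>
#include <systemd/sd-event.h>

static int setup_syslog_source(sd_event *e, int fd,
                               sd_event_io_handler_t handler, void *userdata) {
        sd_event_source *s;
        int r;

        r = sd_event_add_io(e, &s, fd, EPOLLIN, handler, userdata);
        if (r < 0)
                return r;

        /* e.g. RateLimitInterval=30s, RateLimitBurst=1000 from journald.conf */
        return sd_event_source_set_ratelimit(s, 30 * 1000000ULL /* usec */, 1000);
}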

Thoughts?

PS: Moving the discussion off the mailing list to get it tracked.

@poettering
Member

Wouldn't it suffice to simply bump the priority of the watchdog event source to something very high, so that it is always dispatched before anything else?

@poettering
Member

Hmm, ignore that, that made no sense: we always dispatch the watchdog event in every single loop iteration, if it's due, and we do so before anything else.

Not sure I grok the issue then. Can you elaborate on how precisely this can happen?

@utezduyar
Contributor Author

Imagine there is always something to fetch from this fd:

n = recvmsg(fd, &msghdr, MSG_DONTWAIT|MSG_CMSG_CLOEXEC);

In that case the loop will never quit, and we never return to the event processing where we feed the watchdog.

Someone with bad intentions starts a few processes running while (1) logger "hello", and that should be enough to clog the journal.

poettering added a commit to poettering/systemd that referenced this issue Jun 10, 2015
Otherwise, if the socket is constantly busy, we will never return to the
event loop, but we really need to dispatch other (possibly
higher-priority) events too. Hence, return after dispatching one message
to the event handler, and rely on the event loop calling us back
right away.

Fixes systemd#125
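
To make the shape of that change concrete, here is a hedged sketch of an sd-event io handler that dispatches one message per invocation (simplified; not the literal commit):

/*
 * Sketch of the fix's shape: handle at most one datagram per
 * invocation and return to the event loop. Because the fd is still
 * readable, sd-event calls the handler again right away, but the
 * watchdog and other pending event sources get dispatched in between.
 */
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <systemd/sd-event.h>

static int on_syslog_io(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        char buf[2048];
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

        if (n < 0)
                return (errno == EAGAIN) ? 0 : -errno;

        /* ... forward and store exactly one message ... */
        return 0; /* yield; sd-event calls us again while the fd stays readable */
}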
@poettering
Member

Can you check whether PR #150 fixes the issue for you, @utezduyar? Thank you!

@poettering poettering added bug 🐛 Programming errors, that need preferential fixing journal labels Jun 10, 2015
@poettering poettering added this to the v221 milestone Jun 10, 2015
@poettering
Member

@utezduyar if this fixes the issue for you, this can be promptly merged!

@poettering poettering removed the bug 🐛 Programming errors, that need preferential fixing label Jun 10, 2015
@utezduyar
Contributor Author

@poettering Verified!

whot pushed a commit to whot/systemd that referenced this issue Oct 10, 2017
Otherwise, if the socket is constantly busy, we will never return to the
event loop, but we really need to dispatch other (possibly
higher-priority) events too. Hence, return after dispatching one message
to the event handler, and rely on the event loop calling us back
right away.

Fixes systemd#125

Related: #1318994
Cherry-picked from: a315ac4