
regression: crashing journal due to watchdog #125

Closed
utezduyar opened this issue Jun 9, 2015 · 6 comments

@utezduyar
Contributor

The for (;;) loop in server_process_datagram might prevent the journal
from feeding the watchdog if there is always something to receive on
the syslog socket. journald may then be killed and restarted by the
watchdog, and applications stall while the syslog socket stays full.
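
For readers following along, here is a minimal sketch of the pattern being described, assuming a simplified handler (illustrative only, not the actual server_process_datagram code):

/*
 * Simplified sketch of the problematic pattern: the datagram handler
 * drains the socket in a for (;;) loop. If senders keep the socket
 * full, recv() never fails with EAGAIN, the function never returns
 * to the event loop, and sd_notify("WATCHDOG=1") is never sent.
 */
#include <errno.h>
#include <sys/socket.h>

static void process_datagrams(int fd) {
        for (;;) {
                char buf[2048];
                ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

                if (n < 0) {
                        if (errno == EAGAIN)
                                return; /* socket drained: back to the event loop */
                        return;         /* real error: also bail out */
                }

                /* ... forward to console/wall/kmsg and store the message ... */
        }
}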

I thought about fixing it by checking the watchdog on every iteration
of the for (;;) loop, using watchdog_last and watchdog_period and
feeding the watchdog if necessary, but none of those properties are
public.

The current rate limit check is done right before we store the message
(after we have received it and forwarded it to console, wall and
kmsg). I think that is too late.

Maybe the best approach is to have a rate limit on sd-event (per
sd_event_source), so we can map the rate limit options in
journald.conf to the journal's sd-event sources.
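
For illustration only, a minimal sketch of what that mapping could look like. Note that sd_event_source_set_ratelimit() is an assumption here: an API of that shape was added to sd-event only years after this thread, and journald's real rate limiting lives elsewhere.

#include <sys/epoll.h>
#include <systemd/sd-event.h>

static int setup_syslog_source(sd_event *e, int fd,
                               sd_event_io_handler_t handler, void *userdata) {
        sd_event_source *s;
        int r;

        r = sd_event_add_io(e, &s, fd, EPOLLIN, handler, userdata);
        if (r < 0)
                return r;

        /* e.g. RateLimitInterval=30s, RateLimitBurst=1000 from journald.conf */
        return sd_event_source_set_ratelimit(s, 30 * 1000000ULL /* usec */, 1000);
}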

Thoughts?

PS: Moving the discussion off the mailing list to get it tracked.

@poettering
Member

Wouldn't it suffice to simply bump the priority of the watchdog event source to something very high, so that it is always dispatched before anything else?

@poettering
Member

Hmm, ignore that, that made no sense: we always dispatch the watchdog event in every single loop iteration, if it's due, and we do so before anything else.

Not sure I grok the issue then. Can you elaborate on how precisely this can happen?

@utezduyar
Contributor Author

Imagine there is always something to fetch from this fd:

n = recvmsg(fd, &msghdr, MSG_DONTWAIT|MSG_CMSG_CLOEXEC);

In that case the loop will never quit, and we never return to the event processing where we feed the watchdog.

Someone with bad intentions starts a few processes running while (1) logger "hello", and that should be enough to clog the journal.

poettering added a commit to poettering/systemd that referenced this issue Jun 10, 2015
Otherwise, if the socket is constantly busy, we will never return to the
event loop, but we really need to dispatch other (possibly
higher-priority) events too. Hence, return after dispatching one message
to the event handler, and rely on the event loop calling us back
right away.

Fixes systemd#125
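
To make the shape of that change concrete, here is a hedged sketch of an sd-event io handler that dispatches one message per invocation (simplified; not the literal commit):

/*
 * Sketch of the fix's shape: handle at most one datagram per
 * invocation and return to the event loop. Because the fd is still
 * readable, sd-event calls the handler again right away, but the
 * watchdog and other pending event sources get dispatched in between.
 */
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <systemd/sd-event.h>

static int on_syslog_io(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        char buf[2048];
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

        if (n < 0)
                return (errno == EAGAIN) ? 0 : -errno;

        /* ... forward and store exactly one message ... */
        return 0; /* yield; sd-event calls us again while the fd stays readable */
}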
@poettering
Member

Can you check whether PR #150 fixes the issue for you, @utezduyar? Thank you!

@poettering poettering added bug 🐛 Programming errors, that need preferential fixing journal labels Jun 10, 2015
@poettering poettering added this to the v221 milestone Jun 10, 2015
@poettering
Member

@utezduyar if this fixes the issue for you, this can be promptly merged!

@poettering poettering removed the bug 🐛 Programming errors, that need preferential fixing label Jun 10, 2015
@utezduyar
Contributor Author

@poettering Verified!

whot pushed a commit to whot/systemd that referenced this issue Oct 10, 2017
Otherwise, if the socket is constantly busy, we will never return to the
event loop, but we really need to dispatch other (possibly
higher-priority) events too. Hence, return after dispatching one message
to the event handler, and rely on the event loop calling us back
right away.

Fixes systemd#125

Related: #1318994
Cherry-picked from: a315ac4