systemd-journal-upload dies due to watchdog #2010

Closed
DSpeichert opened this issue Nov 23, 2015 · 9 comments
Labels: bug 🐛 (Programming errors, that need preferential fixing), journal-remote

@DSpeichert

It seems that systemd-journal-upload is not properly notifying the watchdog, which is configured in /lib/systemd/system/systemd-journal-upload.service:

WatchdogSec=3min

Nov 23 21:58:48 goweb1 systemd[1]: Started Journal Remote Upload Service.
Nov 23 22:01:48 goweb1 systemd[1]: systemd-journal-upload.service: Watchdog timeout (limit 3min)!
Nov 23 22:01:48 goweb1 systemd[1]: systemd-journal-upload.service: Main process exited, code=dumped, status=6/ABRT
Nov 23 22:01:48 goweb1 systemd[1]: systemd-journal-upload.service: Unit entered failed state.
Nov 23 22:01:48 goweb1 systemd[1]: systemd-journal-upload.service: Failed with result 'core-dump'.
Nov 23 22:57:35 goweb1 systemd[1]: Started Journal Remote Upload Service.
Nov 23 23:00:35 goweb1 systemd[1]: systemd-journal-upload.service: Watchdog timeout (limit 3min)!
Nov 23 23:00:35 goweb1 systemd[1]: systemd-journal-upload.service: Main process exited, code=dumped, status=6/ABRT
Nov 23 23:00:35 goweb1 systemd[1]: systemd-journal-upload.service: Unit entered failed state.
Nov 23 23:00:35 goweb1 systemd[1]: systemd-journal-upload.service: Failed with result 'core-dump'.

[root@goweb1 ~]# /lib/systemd/systemd-journal-upload --version
systemd 227
+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD +IDN

@evverx
Member

evverx commented Nov 24, 2015

@DSpeichert, could you attach the output of strace -s500 -p run against the systemd-journal-upload pid?

@evverx
Member

evverx commented Nov 25, 2015

Looks like the watchdog timer is being starved.
Are there any timeouts here?

@heftig
Contributor

heftig commented Nov 25, 2015

Sounds like journal-upload should be ported to curl_multi_perform in order to do non-blocking uploads. Or perhaps use curl_easy_* from another thread.
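
To illustrate the idea of keeping the watchdog fed while a transfer proceeds in small non-blocking steps, here is a minimal sketch. It is not the systemd code and not the eventual fix: `notify_state`, `step_fn`, `countdown_step` and `run_with_watchdog` are hypothetical names, and the step function only stands in for a bounded, non-blocking transfer call. The one real protocol detail used is that a service pings the watchdog by sending the datagram "WATCHDOG=1" to the socket named in $NOTIFY_SOCKET, which is what sd_notify() does.

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Send one state string (e.g. "WATCHDOG=1") as a datagram to the socket
 * named by $NOTIFY_SOCKET, the protocol behind sd_notify().
 * Returns 1 if sent, 0 if no socket is configured, -1 on error. */
static int notify_state(const char *state) {
    const char *path = getenv("NOTIFY_SOCKET");
    if (path == NULL || path[0] == '\0')
        return 0;

    struct sockaddr_un addr;
    size_t len = strlen(path);
    if (len >= sizeof(addr.sun_path))
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, len);

    socklen_t addr_len =
        (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len + 1);
    if (path[0] == '@') {
        /* A leading '@' denotes the Linux abstract socket namespace. */
        addr.sun_path[0] = '\0';
        addr_len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len);
    }

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    size_t state_len = strlen(state);
    ssize_t sent = sendto(fd, state, state_len, 0,
                          (const struct sockaddr *)&addr, addr_len);
    close(fd);
    return sent == (ssize_t)state_len ? 1 : -1;
}

/* Hypothetical stand-in for one non-blocking transfer step: does a bounded
 * amount of work and returns how many units are still pending. */
typedef int (*step_fn)(void *userdata);

static int countdown_step(void *userdata) {
    int *remaining = userdata;
    if (*remaining > 0)
        (*remaining)--;
    return *remaining;
}

/* Run steps until nothing is pending, pinging the watchdog whenever at
 * least interval_sec seconds have passed since the last ping.
 * Returns the number of pings that were delivered. */
static int run_with_watchdog(step_fn step, void *userdata, double interval_sec) {
    struct timespec last, now;
    int pings = 0;

    clock_gettime(CLOCK_MONOTONIC, &last);
    while (step(userdata) > 0) {
        clock_gettime(CLOCK_MONOTONIC, &now);
        double elapsed = (double)(now.tv_sec - last.tv_sec)
                       + (double)(now.tv_nsec - last.tv_nsec) / 1e9;
        if (elapsed >= interval_sec) {
            if (notify_state("WATCHDOG=1") > 0)
                pings++;
            last = now;
        }
    }
    return pings;
}
```

In a real service the step would be something like one curl_multi_perform round rather than a counter, and the ping interval would normally be chosen well below WatchdogSec (the sd_watchdog_enabled documentation suggests half of it). The point of the sketch is only that control returns to the caller between steps, so the keep-alive can be sent no matter how long the whole upload takes.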

@DSpeichert
Author

@evverx I emailed you the link to the strace log.

@evverx
Member

evverx commented Nov 26, 2015

@DSpeichert, thanks.

poll, write, poll, write, ..., poll, write, +++ killed by SIGABRT (core dumped) +++
That poll is Curl_poll.
So the watchdog timer is being starved.

A temporary solution: increase your WatchdogSec (or remove it).
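
One way to apply this workaround without editing the shipped unit file is a drop-in override. The following is a sketch under assumptions: the path is the conventional location that `systemctl edit systemd-journal-upload.service` writes to, and the 10min value is an arbitrary example, not a recommendation from this thread. Setting WatchdogSec=0 would disable the watchdog for the unit entirely.

```ini
# /etc/systemd/system/systemd-journal-upload.service.d/override.conf
# Assumed example value; pick something longer than the slowest expected upload.
[Service]
WatchdogSec=10min
```

If the file is created by hand instead of through `systemctl edit`, run `systemctl daemon-reload` and restart the service for the change to take effect.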

@DSpeichert
Author

The same thing happens with systemd-journal-remote.service.

@chaloulo
Contributor

PR #2923 should solve this issue.

@poettering
Member

Closing this as #2968 has been merged now.

@kaihendry

This is still a problem for me on v230
http://s.natalian.org.s3.amazonaws.com/2016-06-28/systemd-journal-remote.export
