Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weird behaviour using Restart=on-watchdog #1970

Closed
aborrero opened this issue Nov 20, 2015 · 1 comment
Closed

weird behaviour using Restart=on-watchdog #1970

aborrero opened this issue Nov 20, 2015 · 1 comment

Comments

@aborrero
Copy link

I've noticed some inconsistent behaviour when a service is using
Restart=on-watchdog.

TL;DR: systemd restart the daemon but no more than 3 times.

The behaviour described below is unexpected and I was unable to find
any documentation about it.

[Unit]
Description=Conntrack Daemon

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/usr/share/conntrackd/environmentfile
EnvironmentFile=-/etc/default/conntrackd
ExecStartPre=-/bin/rm -f /var/lock/conntrackd.lock
ExecStart=/usr/sbin/conntrackd -d -C "$CONFIG" $OPTIONS
ProtectSystem=full
ProtectHome=true
WatchdogSec=10
Restart=on-watchdog

[Install]
WantedBy=multi-user.target

This is how I trigger this situation.

Get more verbose information:
% sudo systemd-analyze set-log-level debug

I start the service by:
% sudo systemctl start conntrackd.service

At some point, I will manually stop the daemon:
% sudo kill -SIGSTOP <pid>

nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Trying to enqueue job conntrackd.service/start/replace
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Installed new job conntrackd.service/start as 2065
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Enqueued job conntrackd.service/start as 2065
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: About to execute: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Forked /bin/rm as 7847
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Changed failed -> start-pre
nov 20 12:15:12 cfdev02 systemd[1]: Starting Conntrack Daemon...
nov 20 12:15:12 cfdev02 systemd[7847]: conntrackd.service: Executing: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Child 7847 belongs to conntrackd.service
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Control process exited, code=exited status=0
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Got final SIGCHLD for state start-pre.
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: About to execute: /usr/sbin/conntrackd -d -C $CONFIG $OPTIONS
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Forked /usr/sbin/conntrackd as 7850
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Changed start-pre -> start
nov 20 12:15:12 cfdev02 systemd[7850]: conntrackd.service: Executing: /usr/sbin/conntrackd -d -C /etc/conntrackd/conntrackd.conf
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: disabling internal cache
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: disabling external cache
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: using kernel-space event filtering
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: netlink event socket buffer size has been set to 8000000 bytes
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: initialization completed
nov 20 12:15:12 cfdev02 conntrack-tools[7875]: -- starting in daemon mode --
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7850 (MAINPID=7875)
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Supervising process 7875 which is not our child. We'll most likely not notice when it exits.
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7875 (READY=1)
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Changed start -> running
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Job conntrackd.service/start finished, result=done
nov 20 12:15:12 cfdev02 systemd[1]: Started Conntrack Daemon.
nov 20 12:15:17 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7875 (WATCHDOG=1)
nov 20 12:15:19 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7875 (WATCHDOG=1)
nov 20 12:15:24 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7875 (WATCHDOG=1)

[ 1) send SIGSTOP to the daemon ]

nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Watchdog timeout (limit 10s)!
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed running -> failed
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Unit entered failed state.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Failed with result 'watchdog'.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed failed -> auto-restart
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: cgroup is empty
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Service hold-off time over, scheduling restart.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Trying to enqueue job conntrackd.service/restart/fail
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Installed new job conntrackd.service/restart as 2172
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Enqueued job conntrackd.service/restart as 2172
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Scheduled restart job.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed auto-restart -> dead
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Job conntrackd.service/restart finished, result=done
nov 20 12:15:34 cfdev02 systemd[1]: Stopped Conntrack Daemon.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Converting job conntrackd.service/restart -> conntrackd.service/start
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: About to execute: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Forked /bin/rm as 7931
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed dead -> start-pre
nov 20 12:15:34 cfdev02 systemd[1]: Starting Conntrack Daemon...
nov 20 12:15:34 cfdev02 systemd[7931]: conntrackd.service: Executing: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Child 7931 belongs to conntrackd.service
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Control process exited, code=exited status=0
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Got final SIGCHLD for state start-pre.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: About to execute: /usr/sbin/conntrackd -d -C $CONFIG $OPTIONS
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Forked /usr/sbin/conntrackd as 7933
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed start-pre -> start
nov 20 12:15:34 cfdev02 systemd[7933]: conntrackd.service: Executing: /usr/sbin/conntrackd -d -C /etc/conntrackd/conntrackd.conf
nov 20 12:15:34 cfdev02 conntrack-tools[7933]: disabling internal cache
nov 20 12:15:34 cfdev02 conntrack-tools[7933]: disabling external cache
nov 20 12:15:34 cfdev02 conntrack-tools[7933]: using kernel-space event filtering
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7933 (MAINPID=7959)
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Supervising process 7959 which is not our child. We'll most likely not notice when it exits.
nov 20 12:15:34 cfdev02 conntrack-tools[7959]: -- starting in daemon mode --
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7959 (READY=1)
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed start -> running
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Job conntrackd.service/start finished, result=done
nov 20 12:15:34 cfdev02 systemd[1]: Started Conntrack Daemon.
nov 20 12:15:38 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7959 (WATCHDOG=1)
nov 20 12:15:40 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7959 (WATCHDOG=1)

[ 2) send SIGSTOP to the daemon ]

nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Watchdog timeout (limit 10s)!
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed running -> failed
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Unit entered failed state.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Failed with result 'watchdog'.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed failed -> auto-restart
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: cgroup is empty
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Service hold-off time over, scheduling restart.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Trying to enqueue job conntrackd.service/restart/fail
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Installed new job conntrackd.service/restart as 2279
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Enqueued job conntrackd.service/restart as 2279
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Scheduled restart job.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed auto-restart -> dead
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Job conntrackd.service/restart finished, result=done
nov 20 12:15:50 cfdev02 systemd[1]: Stopped Conntrack Daemon.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Converting job conntrackd.service/restart -> conntrackd.service/start
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: About to execute: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Forked /bin/rm as 7989
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed dead -> start-pre
nov 20 12:15:50 cfdev02 systemd[1]: Starting Conntrack Daemon...
nov 20 12:15:50 cfdev02 systemd[7989]: conntrackd.service: Executing: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Child 7989 belongs to conntrackd.service
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Control process exited, code=exited status=0
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Got final SIGCHLD for state start-pre.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: About to execute: /usr/sbin/conntrackd -d -C $CONFIG $OPTIONS
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Forked /usr/sbin/conntrackd as 7991
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed start-pre -> start
nov 20 12:15:50 cfdev02 systemd[7991]: conntrackd.service: Executing: /usr/sbin/conntrackd -d -C /etc/conntrackd/conntrackd.conf
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: disabling internal cache
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: disabling external cache
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: using kernel-space event filtering
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: netlink event socket buffer size has been set to 8000000 bytes
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: initialization completed
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 7991 (MAINPID=8017)
nov 20 12:15:50 cfdev02 conntrack-tools[8017]: -- starting in daemon mode --
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Supervising process 8017 which is not our child. We'll most likely not notice when it exits.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 8017 (READY=1)
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed start -> running
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Job conntrackd.service/start finished, result=done
nov 20 12:15:50 cfdev02 systemd[1]: Started Conntrack Daemon.
nov 20 12:15:55 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 8017 (WATCHDOG=1)
nov 20 12:16:00 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 8017 (WATCHDOG=1)

[ 3) send SIGSTOP ]

nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Watchdog timeout (limit 10s)!
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Changed running -> failed
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Unit entered failed state.
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Failed with result 'watchdog'.
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Changed failed -> auto-restart
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: cgroup is empty
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Service hold-off time over, scheduling restart.
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Trying to enqueue job conntrackd.service/restart/fail
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Installed new job conntrackd.service/restart as 2386
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Enqueued job conntrackd.service/restart as 2386
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Scheduled restart job.
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Changed auto-restart -> dead
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Job conntrackd.service/restart finished, result=done
nov 20 12:16:11 cfdev02 systemd[1]: Stopped Conntrack Daemon.
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Converting job conntrackd.service/restart -> conntrackd.service/start
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: About to execute: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Forked /bin/rm as 8047
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Changed dead -> start-pre
nov 20 12:16:11 cfdev02 systemd[1]: Starting Conntrack Daemon...
nov 20 12:16:11 cfdev02 systemd[8047]: conntrackd.service: Executing: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Child 8047 belongs to conntrackd.service
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Control process exited, code=exited status=0
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Got final SIGCHLD for state start-pre.
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: About to execute: /usr/sbin/conntrackd -d -C $CONFIG $OPTIONS
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Forked /usr/sbin/conntrackd as 8049
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Changed start-pre -> start
nov 20 12:16:11 cfdev02 systemd[8049]: conntrackd.service: Executing: /usr/sbin/conntrackd -d -C /etc/conntrackd/conntrackd.conf
nov 20 12:16:11 cfdev02 conntrack-tools[8049]: disabling internal cache
nov 20 12:16:11 cfdev02 conntrack-tools[8049]: disabling external cache
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 8049 (MAINPID=8075)
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 8075 (READY=1)
nov 20 12:16:11 cfdev02 conntrack-tools[8075]: -- starting in daemon mode --
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Changed start -> running
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Job conntrackd.service/start finished, result=done
nov 20 12:16:11 cfdev02 systemd[1]: Started Conntrack Daemon.
nov 20 12:16:16 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 8075 (WATCHDOG=1)
nov 20 12:16:21 cfdev02 systemd[1]: conntrackd.service: Got notification message from PID 8075 (WATCHDOG=1)

[ 4) send SIGSTOP ]

nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Watchdog timeout (limit 10s)!
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Changed running -> stop-sigabrt
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Child 8075 belongs to conntrackd.service
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Main process exited, code=killed, status=6/ABRT
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Changed stop-sigabrt -> failed
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Unit entered failed state.
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Failed with result 'signal'.
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: cgroup is empty

As you can see, after this SIGSTOP, somehow the service enters stop-sigabrt state instead of failed with reason watchdog.

@aborrero
Copy link
Author

Ok, from IRC:

<sztanpet> aborrero see http://www.freedesktop.org/software/systemd/man/systemd.service.html#StartLimitInterval=
<sztanpet> or in other words, its working as expected

So, closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant