Systemd fails to restart when ExecStart script fails after sending notification and before ExecStartPost script ends #8929

@ritz123

Submission type

  • Bug report

systemd version the issue has been seen with

v237 and v233

Used distribution

Custom Distribution with Linux 4.9.87 x86_64 GNU/Linux

In case of bug report: Expected behaviour you didn't see

Systemd should restart a failed service

In case of bug report: Unexpected behaviour you saw

Systemd failed to restart a failed service

In case of bug report: Steps to reproduce the problem

Start a service using the following service file and watch the service's log in the journal. Even after the ExecStart script fails, systemd does not try to restart the service.

To hit the timing window reliably, I used sleep in the scripts. The main (ExecStart) script must send the notification to systemd and then exit with a failure before the ExecStartPost script ends. This is the precise condition under which the problem is observed.

===========  test.service ========
[Unit]
Description=Test Service

[Service]
Type=notify
NotifyAccess=all
StandardOutput=journal
ExecStart=/start.sh
ExecStartPost=/startpost.sh
ExecStopPost=/stoppost.sh
TimeoutStartSec=0
RemainAfterExit=yes
Restart=on-failure
StartLimitInterval=0
StartLimitBurst=0

[Install]

# cat /start.sh
#!/bin/bash
systemd-notify --ready --status="sleeping"
sleep 3s
exit 1

# cat /startpost.sh 
#!/bin/bash
sleep 9s
exit 0

# cat /stoppost.sh
#!/bin/bash
exit 0

An illustration of when each script runs

# job's timing

|=^=======3s==========|(start.sh - exit 1)
  | (notify)
      |====================9s===========|(startpost.sh - exit 0)

Sample output

# systemctl status test
● test.service - Ram Service
   Loaded: loaded (/etc/systemd/system/test.service; static; vendor preset: enabl
   Active: active (exited) (Result: exit-code) since Tue 2018-05-08 14:55:00 IST
   Process: 1935 ExecStartPost=/startpost.sh (code=exited, status=0/SUCCESS)
   Process: 1931 ExecStart=/start.sh (code=exited, status=1/FAILURE)
   Main PID: 1931 (code=exited, status=1/FAILURE)
   Status: "sleeping"
   CPU: 4ms
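
The state shown above (active, sub-state exited, but Result: exit-code) can also be checked from a script. The following is only a minimal sketch: `is_stuck` is a hypothetical helper name, and it assumes the `ActiveState`, `SubState` and `Result` properties printed as KEY=VALUE lines by `systemctl show`. It reads those lines on stdin and succeeds only when they match the combination seen in this report.

```shell
#!/bin/bash
# Hypothetical helper (not part of systemd): reads KEY=VALUE lines on stdin,
# in the format printed by `systemctl show`, and returns 0 only if they
# describe a unit that is "active (exited)" with Result=exit-code.
is_stuck() {
    local active="" sub="" result="" key value
    while IFS='=' read -r key value; do
        case "$key" in
            ActiveState) active=$value ;;
            SubState)    sub=$value ;;
            Result)      result=$value ;;
        esac
    done
    [ "$active" = "active" ] && [ "$sub" = "exited" ] && [ "$result" = "exit-code" ]
}

# Usage on a live system (left commented out; needs the test.service unit above):
#   systemctl show -p ActiveState -p SubState -p Result test.service | is_stuck \
#       && echo "test.service failed but was not restarted"
```

Reading from stdin keeps the check independent of a running systemd, so it can be tried against saved output as well.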

Systemd logs

May 08 14:54:54 host-0 systemd[1]: test.service: Child 1931 belongs to test.service
May 08 14:54:54 host-0 systemd[1]: test.service: Main process exited, code=exited,
May 08 14:55:00 host-0 systemd[1]: test.service: Child 1935 belongs to test.service
May 08 14:55:00 host-0 systemd[1]: test.service: Control process exited, code=exit
May 08 14:55:00 host-0 systemd[1]: test.service: Got final SIGCHLD for state start
May 08 14:55:00 host-0 systemd[1]: test.service: Changed start-post -> exited
May 08 14:55:00 host-0 systemd[1]: test.service: Job test.service/start finished, r
May 08 14:55:00 host-0 systemd[1]: Started Test Service.
May 08 14:55:00 host-0 systemd[1]: test.service: cgroup is empty
May 08 14:55:00 host-0 systemd[1]: test.service: Failed to send unit change signal
