Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`systemctl` should not consider active->failed as a successful operation #6478

Open
sourcejedi opened this Issue Jul 28, 2017 · 6 comments

Comments

3 participants
@sourcejedi
Copy link
Contributor

sourcejedi commented Jul 28, 2017

Submission type

  • Bug report

systemd version the issue has been seen with

systemd-233-6.fc26.x86_64

Used distribution

Fedora 26

In case of bug report: Expected behaviour you didn't see

systemctl stop should show if my service transitions to state failed instead of inactive.

The documentation explicitly mentions services with ExecStop and no ExecStart. So we expect support for "services" that are implemented negatively, i.e. only affect the system when stopped. Similarly there's a pattern of one-shot service units where ExecStop is not limited to cleanup, e.g. systemd-backlight@.service "Load/Save backlight".

If saving the backlight fails when running systemctl stop, it is a failure which should be reported to the user. And for a service which only has ExecStop, it doesn't seem to make sense that you could never see an error from running systemctl start/stop.

Even if ExecStop is just doing some cleanup, failures are still interesting.

It's possible the ExecStop failure could be a factor in the failure of a service restart, but only the later error would be shown, which would be misleading. I'm not certain what the correct behaviour is in this case, however. I want restart to show the ExecStop error... but if it then goes ahead with running ExecStart anyway, we need a message to say that. Otherwise, if ExecStart succeeds, we will keep quiet in traditional Unix fashion, but the user will see the previous error message and assume the service was not restarted. If we could redefine restart so it doesn't proceed after ExecStop fails, that would also solve the problem, but that doesn't quite sound like a good idea to me.

In case of bug report: Unexpected behaviour you saw

systemctl stop returns EXIT_SUCCESS and shows no message, when my service enters state failed instead of inactive.

In case of bug report: Steps to reproduce the problem

# systemctl cat test.service
# /etc/systemd/system/test.service
[Service]
Type=oneshot
ExecStop=/bin/false
RemainAfterExit=yes

# systemctl start test.service
# echo $?
0
# systemctl stop test.service
# echo $?
0
systemctl status test.service
● test.service
   Loaded: loaded (/etc/systemd/system/test.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-07-28 14:23:10 BST; 5s ago
  Process: 11025 ExecStop=/bin/false (code=exited, status=1/FAILURE)

Jul 28 14:23:04 alan-laptop systemd[1]: Started test.service.
Jul 28 14:23:10 alan-laptop systemd[1]: Stopping test.service...
Jul 28 14:23:10 alan-laptop systemd[1]: test.service: Control process exited, code=exited status=1
Jul 28 14:23:10 alan-laptop systemd[1]: Stopped test.service.
Jul 28 14:23:10 alan-laptop systemd[1]: test.service: Unit entered failed state.
Jul 28 14:23:10 alan-laptop systemd[1]: test.service: Failed with result 'exit-code'.

# systemctl restart test.service
# echo $?
0

@poettering poettering added this to the v235 milestone Jul 28, 2017

@sourcejedi

This comment has been minimized.

Copy link
Contributor Author

sourcejedi commented Aug 12, 2017

I worked out more-or-less how to do this. Only then did I notice reload_result, which does exactly this for ExecReload=. Not sure yet whether I want to literally copy+paste that, but it looks promising.

@sourcejedi

This comment has been minimized.

Copy link
Contributor Author

sourcejedi commented Aug 12, 2017

Note: However it looks wrong that s->result is currently set to the same value as s->reload_result. It causes systemctl status to show active (exited) (Result: exit-code), and followed by a timestamp which applies to the active (exited) part but not the (Result: exit-code) part.

# systemctl cat test.service
# /etc/systemd/system/test.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecReload=/bin/false
# systemctl start test
# systemctl status test
● test.service
   Loaded: loaded (/etc/systemd/system/test.service; static; vendor preset: disabled)
   Active: active (exited) since Sat 2017-08-12 20:25:37 BST; 1s ago
  Process: 7769 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 7769 (code=exited, status=0/SUCCESS)

Aug 12 20:25:37 alan-laptop systemd[1]: Starting test.service...
Aug 12 20:25:37 alan-laptop systemd[1]: Started test.service.
# systemctl reload test
Job for test.service failed because the control process exited with error code.
See "systemctl  status test.service" and "journalctl  -xe" for details.
# systemctl status test
● test.service
   Loaded: loaded (/etc/systemd/system/test.service; static; vendor preset: disabled)
   Active: active (exited) (Result: exit-code) since Sat 2017-08-12 20:25:37 BST; 1min 28s ago
  Process: 8135 ExecReload=/bin/false (code=exited, status=1/FAILURE)
 Main PID: 7769 (code=exited, status=0/SUCCESS)

Aug 12 20:25:37 alan-laptop systemd[1]: Starting test.service...
Aug 12 20:25:37 alan-laptop systemd[1]: Started test.service.
Aug 12 20:27:04 alan-laptop systemd[1]: Reloading test.service.
Aug 12 20:27:04 alan-laptop systemd[1]: test.service: Control process exited, code=exited status=1
Aug 12 20:27:04 alan-laptop systemd[1]: Reload failed for test.service.
@sourcejedi

This comment has been minimized.

Copy link
Contributor Author

sourcejedi commented Aug 12, 2017

I wondered how systemctl is already smart enough to know "because the control process exited", as opposed the main process. The answer is it doesn't. It will tell you the control process failed, even if success would have shown it as the Main PID & no Control PID. The daemon code seems fairly clear about the distinction between control and main processes, apart from one confusing comment about setting s->main_command->exec_status.

@sourcejedi

This comment has been minimized.

Copy link
Contributor Author

sourcejedi commented Aug 13, 2017

wups, we have a whole state implemented for MOUNT_REMOUNTING_SIGTERM (and _SIGKILL after that), but it looks like we never transition into it. Timed out remounts are left to just hang around.

sourcejedi added a commit to sourcejedi/systemd that referenced this issue Aug 13, 2017

WIP stop failure
systemd#6478

create regression test from issue?

# systemctl cat test.service
# /etc/systemd/system/test.service
[Service]
Type=oneshot
ExecStop=/bin/false
RemainAfterExit=yes

# systemctl start test.service
# echo $?
0
# systemctl stop test.service
# echo $?
0
systemctl status test.service
● test.service
   Loaded: loaded (/etc/systemd/system/test.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-07-28 14:23:10 BST; 5s ago
  Process: 11025 ExecStop=/bin/false (code=exited, status=1/FAILURE)

Jul 28 14:23:04 alan-laptop systemd[1]: Started test.service.
Jul 28 14:23:10 alan-laptop systemd[1]: Stopping test.service...
Jul 28 14:23:10 alan-laptop systemd[1]: test.service: Control process exited, code=exited status=1
Jul 28 14:23:10 alan-laptop systemd[1]: Stopped test.service.
Jul 28 14:23:10 alan-laptop systemd[1]: test.service: Unit entered failed state.
Jul 28 14:23:10 alan-laptop systemd[1]: test.service: Failed with result 'exit-code'.

# systemctl restart test.service
# echo $?
0

sourcejedi added a commit to sourcejedi/systemd that referenced this issue Aug 16, 2017

core: fail stop job when ExecStop fails
This fixes the first part of systemd#6478.  It does not record or display failures
which occur during a restart.

This was less work than it perhaps looks at first.  I took the existing
`->reload_result`, mechanically created `->stop_result`, and then it passed
the test case from the issue.

Test cases are also included for ExecReload and ExecStart failure.

sourcejedi added a commit to sourcejedi/systemd that referenced this issue Aug 16, 2017

core: fail stop job when ExecStop fails
This fixes the first part of systemd#6478.  It does not display failures
which occur during a restart.

This was less work than it perhaps looks at first.  I took the existing
`->reload_result`, mechanically created `->stop_result`, and then it passed
the test case from the issue.

Test cases are also included for ExecReload and ExecStart failure.

sourcejedi added a commit to sourcejedi/systemd that referenced this issue Sep 1, 2017

Show failures which occur during `systemctl stop`
Fix systemd#6478.

Also failures during stop could be part of a stop+start job aka restart.
We show the same message for this case.  It is marked "Warning: ", to try
and show that it is not a fatal error, and we continued with the start.

If there are failures during both stop and start, the messages get a little
messy :(.  We add an extra message between the two failures.  Without it
the second failure message would appear to be extra details about the stop
failure.

# Conflicts:
#	test/TEST-03-JOBS/test-jobs.sh
@poettering

This comment has been minimized.

Copy link
Member

poettering commented Sep 29, 2017

There are two PRs in progress addressing this: #6623 and #6373.

@keszybz

This comment has been minimized.

Copy link
Member

keszybz commented Oct 5, 2017

Let's postpone this for now. It seems that we don't have a clear vision how this should be handled so let's not delay the release.

@keszybz keszybz modified the milestones: v235, v236 Oct 5, 2017

@poettering poettering removed this from the v236 milestone Nov 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.