Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable #3818

Merged
merged 9 commits into from
Aug 6, 2016

Conversation

poettering
Copy link
Member

@poettering poettering commented Jul 27, 2016

This beefs up three areas:

  • The code handling /tmp and/var/tmp discovery:
  • nss-systemd will now resolve the "root" and "nobody" users if /etc/passwd doesn't list them or the file is missing
  • We'll now set the EXIT_CODE, EXIT_STATUS and SERVICE_RESULT env vars in ExecStop= and ExecStopPost=, so that monitoring tools can easily make use of this information when invoked from these stanzas, without having to go via the bus.

@poettering poettering changed the title beef up /var/tmp and /tmp handline; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable Jul 27, 2016

*ret = "/tmp";
return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this asks for a helper function which takes either "/tmp" or "/var/tmp" as parameter, and tmp_dir() and var_tmp_dir() should be thin wrappers around that.

@keszybz keszybz added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Aug 3, 2016
@keszybz
Copy link
Member

keszybz commented Aug 3, 2016

Looks nice. Only a few minor comments and questions.

@poettering
Copy link
Member Author

Force pushed a new version now that makes all suggested changes, except the "dumped" rename, due to the mentioned reasons. Please have another look and merge!

@poettering poettering removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Aug 3, 2016
is one of <literal>exited</literal>, <literal>killed</literal>,
<literal>dumped</literal>. <varname>$EXIT_STATUS</varname> contains the numeric exit code formatted as string
if <varname>$EXIT_CODE</varname> is <literal>exited</literal>, and the signal name in all other cases. Note
that these environment variables are only set if the service manager succeeded to start and identify the main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only set if the service manager succeeded to start

I think we should mention (explicitly) that this doesn't work for Type=oneshot

https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStop=

Note that the commands specified in ExecStop= are only executed when the service started successfully first.

-bash-4.3# systemctl cat self-abort --no-pager
# /etc/systemd/system/self-abort.service
[Service]
Type=oneshot
ExecStart=/bin/sh -x -c 'kill -ABRT $$$$'
ExecStop=/bin/sh -x -c 'set'

-bash-4.3# systemctl start self-abort
Job for self-abort.service failed because a fatal signal was delivered causing the control process to dump core.
See "systemctl status self-abort.service" and "journalctl -xe" for details.

-bash-4.3# journalctl -u self-abort --no-pager -o cat
self-abort.service: Collecting.
self-abort.service: Collecting.
self-abort.service: Collecting.
self-abort.service: Collecting.
self-abort.service: Trying to enqueue job self-abort.service/start/replace
self-abort.service: Installed new job self-abort.service/start as 380
self-abort.service: Enqueued job self-abort.service/start as 380
self-abort.service: About to execute: /bin/sh -x -c 'kill -ABRT $$$$'
self-abort.service: Forked /bin/sh as 298
self-abort.service: Changed dead -> start
Starting self-abort.service...
+ kill -ABRT 298
self-abort.service: Executing: /bin/sh -x -c 'kill -ABRT $$'
self-abort.service: Child 298 belongs to self-abort.service
self-abort.service: Main process exited, code=dumped, status=6/ABRT
self-abort.service: Changed start -> failed
self-abort.service: Job self-abort.service/start finished, result=failed
Failed to start self-abort.service.
self-abort.service: Unit entered failed state.
self-abort.service: Failed with result 'core-dump'.
self-abort.service: cgroup is empty

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXIT_CODE/EXIT_STATUS do work for ExecStop= as well as ExecStopPost=, but of course by default we don't invoke ExecStop= if ExecStart= failed to start if Type=oneshot is used, except if people used SuccessExitStatus= to mark the a failure as a non-failure...

But yeah, this deserves documentaiton.

Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).

Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limite
exposure in suid binaries.

This also ports a couple of users over to these new APIs.

The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.

Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.
The ExecParameters structure contains a number of bit-flags, that were so far
exposed as bool:1, change this to a proper, single binary bit flag field. This
makes things a bit more expressive, and is helpful as we add more flags, since
these booleans are passed around in various callers, for example
service_spawn(), whose signature can be made much shorter now.

Not all bit booleans from ExecParameters are moved into the flags field for
now, but this can be added later.
Let's fix up the flags fields in service_spawn() rather than its callers, in
order to simplify things a bit.
…xecStopPost= commands

This should simplify monitoring tools for services, by passing the most basic
information about service result/exit information via environment variables,
thus making it unnecessary to retrieve them explicitly via the bus.
Let's extend nss-systemd to also synthesize user/group entries for the
UIDs/GIDs 0 and 65534 which have special kernel meaning. Given that nss-systemd
is listed in /etc/nsswitch.conf only very late any explicit listing in
/etc/passwd or /etc/group takes precedence.

This functionality is useful in minimal container-like setups that lack
/etc/passwd files (or only have incompletely populated ones).
Previously, the result value of a unit was overriden with each failure that
took place, so that the result always reported the last failure that took
place.

With this commit this is changed, so that the first failure taking place is
stored instead. This should normally not matter much as multiple failures are
sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT
to a service due to a watchdog timeout, then this currently would be reported
as "coredump" failure, rather than the "watchodg" failure it really is. Hence,
in order to report information about the type of the failure, and not about
the effect of it, let's change this from all unit type to store the first, not
the last failure.

This addresses the issue pointed out here:

systemd#3818 (comment)
@poettering
Copy link
Member Author

I force pushed a new version now, that addresses @evverx's points. Specifically I added one patch that makes sure the first failure is reported as service result, and no longer the last failure. This has the benefit that a watchdog timeout will result in a correctly reported "watchdog" result code, instead of a "core-dump". I also updated the man page, and dropped the "start-limit" reference entirely, as this can never actually be set for ExecStopPost= processes. I also added some wording clarifying that the result is primarily useful in ExecStopPost=, and not so much in ExecStop=, as the former is called for services that failed during start or runtime, and the latter only for those which failed during runtime. This distinction is relevant, as Type=oneshot/RemainAfterExit=no units have no runtime, and hence ExecStop= is useless, and hence not appropriate for checking $SERVICE_RESULT. Note that I don't explicitly explain the Type=oneshot/RemainAfterExit=no case, as that appeared to specific to me. Instead it says now that ExecStopPost= is more useful because it covers both runtime and startup errors, without going into details that Type=oneshot have no runtime

@poettering
Copy link
Member Author

@evverx and thanks again for the thorough review! Much appreciated!

@evverx
Copy link
Member

evverx commented Aug 5, 2016

@poettering , thanks! I'll take a look tomorrow.

@keszybz
Copy link
Member

keszybz commented Aug 6, 2016

Looks all good. I tested the status settings in a few different ways, and the result is always as expected.

@keszybz keszybz merged commit 3bb81a8 into systemd:master Aug 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants