Skip to content

Conversation

poettering
Copy link
Member

This adds two:

  1. if memfds are available we'll serialize system state into a memfd instead of a file in /run/systemd now, making us less dependent on /run being usable and not full.
  2. before doing the serialization we check if /run is full, and if it is we refuse the reload with a clear error message, explaining why.

Why do 2 if 1 removes the problem? Well, that's because it actually doesn't while we might be successful with the actual serialization/deserialization as long as we can use a memfd, it's really not sufficient to just deal with that. That's because our start-up process requires write access to /run that works, in order to create sockets, write runtime files, and most importanly provide a place where generators can put their stuff.

@poettering poettering added the pid1 label Feb 3, 2017
@poettering poettering changed the title Run size check before reloading, check that /run/systemd has enough space Feb 3, 2017
Copy link
Contributor

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice, thanks!

m->exit_code = MANAGER_REEXECUTE;
r = verify_run_space_and_log("Refusing to reexecute");
if (r >= 0)
m->exit_code = MANAGER_REEXECUTE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not doing anything here on ENOSPC means that the error is just ignored (other than logging), as manager_dispatch_signal_fd() then returns 0. This sounds plausible indeed, I just wanted to make sure it DTRT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's little else we can do, is there? signals are async after all, besides logging we have no backchannel to the user or sender of the signal...

@@ -48,6 +50,10 @@
#include "virt.h"
#include "watchdog.h"

/* Require 16MiB free in /run/systemd for reloading/reexecing. After all we need to serialize our state there, and if
* we can't we'll fail badly. */
#define RELOAD_DISK_SPACE_MIN (UINT64_C(16) * UINT64_C(1024) * UINT64_C(1024))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it matters in any way, but for the learning exercise: do the casts make any practical difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, they aren't casts, and yes, they do make a difference here...

They do two things: they suffix the numbers with "U" and with "LL". The "U" has the effect that the numeric constant is unsigned. That's a good thing because otherwise gcc with the fascist compiler options we use would complain about comparing an unsigned with a signed if we compare the value with disk sizes (and disk sizes are usually unsigned). The "LL" actually makes it possible to use large values above 32bit here... If we don't, then the compiler might use something smaller here at its discretion (and in fact, gcc will)...

The suffixes are actually weird, because unlike the name suggests UINT64_C doesn't actually result in something of type uint64_t. It just means that it adds a suffix sufficient to make constant compatible with that type...

We could also write this as 16U1024U1024U btw. On systems where "int"s are 32bit, that's sufficient. However, I actually want to clarify here that disk sizes are stored in uint64_t, and hence I am a bit more explicit here. Also, while developing this i changed this for testing purposes to 16U1024U1024U*1024U (i.e. 16GiB instea dof 16MiB, to ensure the limit is hit on my testin machine) and in that case, if you just use U then things blow up in your face as the resulting product cannot be stored in a 32bit integer anymore... Hence you have to use ULL int that case, and that's precisely what UINT64_C resolves to, but in a way that doesn't encode the lengths of a long or long long...

Yeah, C is weird...

@martinpitt martinpitt added the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Feb 3, 2017
@keszybz
Copy link
Member

keszybz commented Feb 3, 2017

xenial-i386:

seccomp              FAIL non-zero exit status 1

@vcaputo
Copy link
Member

vcaputo commented Feb 4, 2017

neat, looks good to me!

r = verify_run_space(message, &error);
if (r < 0) {
log_error_errno(r, "%s", bus_error_message(&error, r));
return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just drop the return 0, and unconditionally return r.

@@ -1471,6 +1520,10 @@ static int method_switch_root(sd_bus_message *message, void *userdata, sd_bus_er
Manager *m = userdata;
int r;

r = verify_run_space("Refusing to switch root", error);
if (r < 0)
return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's good. For all intents and purposes, failing to switch root is as good as crashing. And since we don't really know how much space we're going to need, I think here this should only be a warning ("Dangerously low amount of free space on /var, we might crash" or so). In particular, for embedded systems with really low amounts of memory this check might be unnecessarily punitive.

char fb_available[FORMAT_BYTES_MAX], fb_need[FORMAT_BYTES_MAX];
return sd_bus_error_setf(error,
BUS_ERROR_DISK_FULL,
"%s, not enough space available on /run/systemd: %s < %s.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this message could be misleading: we don't really know how much space will be needed, but it implies a strict limit. It'd be better to reword it so that when a person sees that, they'll understand that it's just an arbitrary threshold.

@keszybz keszybz removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Feb 4, 2017
@martinpitt martinpitt added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Feb 5, 2017
Let's add an extra safety check: before entering a reload/reexec, let's
verify that there's enough room in /run for it.

Fixes: systemd#5016
If we can, use a memfd for serializing state during a daemon reload or
reexec. Fall back to a file in /run/systemd or /tmp only if memfds are
not available.

See: systemd#5016
@poettering
Copy link
Member Author

Force pushed a new version now, addressing @keszybz's concerns. Please have a look.

I called th 16MB requirement now a "safety buffer". I hope that gets the message across that this isn't actually a real size requirement, but just a safety buffer...

@poettering poettering removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Feb 7, 2017
@keszybz keszybz merged commit 55295fd into systemd:master Feb 7, 2017
@mbiebl
Copy link
Contributor

mbiebl commented Feb 8, 2017

Hm, so if I see this correctly, we refuse to re-exec even if memfd support is available? Why do we need /run in this case?

@keszybz
Copy link
Member

keszybz commented Feb 8, 2017

See original PR description.

@mbiebl
Copy link
Contributor

mbiebl commented Feb 8, 2017

@keszybz that doesn't answer the question, really. Afaics, if we have memfd support we can safely reexec/reload even if /run is full. Which means we should apply this check only in the fallback case when memfd is not available.

@mbiebl
Copy link
Contributor

mbiebl commented Feb 8, 2017

I guess I need to be a bit more verbose. While I can see the point that running generators might need space in /run, why not only guard the bits which need space in /run with the is-full check, but still allow a re-exec of the systemd binary?

@keszybz
Copy link
Member

keszybz commented Feb 8, 2017 via email

@poettering
Copy link
Member Author

I guess I need to be a bit more verbose. While I can see the point that running generators might need space in /run, why not only guard the bits which need space in /run with the is-full check, but still allow a re-exec of the systemd binary?

We rerun generators on reload. If the generators don#t succeed we will be missing unit files. Hence it's better to check this beforehand, before we enter an "incomplete" state.

To make this even clearer: we explicitly remove all generated units right before the reload, hence we better make sure we can replace them with new ones after the reload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

5 participants