Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

beef up random seed logic, add boot loader entropy privisioning, improve docs about it #13137

Merged
merged 26 commits into from Jul 26, 2019

Conversation

poettering
Copy link
Member

Fixes for #9428 and #4271

@lgtm-com
Copy link

lgtm-com bot commented Jul 22, 2019

This pull request introduces 1 alert when merging c31ec80 into 670fb0b - view on LGTM.com

new alerts:

  • 1 for Missing header guard

@poettering poettering force-pushed the efi-random branch 4 times, most recently from 2742dbd to 8a19372 Compare July 24, 2019 07:03
@mrc0mmand
Copy link
Member

CentOS 7 no likey:

[802/2682] Compiling C object 'systemd-random-seed@exe/src_random-seed_random-seed.c.o'.
FAILED: systemd-random-seed@exe/src_random-seed_random-seed.c.o 
cc -Isystemd-random-seed@exe -I. -I.. -Isrc/basic -I../src/basic -Isrc/boot -I../src/boot -Isrc/shared -I../src/shared -Isrc/systemd -I../src/systemd -Isrc/journal -I../src/journal -Isrc/journal-remote -I../src/journal-remote -Isrc/nspawn -I../src/nspawn -Isrc/resolve -I../src/resolve -Isrc/timesync -I../src/timesync -I../src/time-wait-sync -Isrc/login -I../src/login -Isrc/udev -I../src/udev -Isrc/libudev -I../src/libudev -Isrc/core -I../src/core -Isrc/shutdown -I../src/shutdown -I../src/libsystemd/sd-bus -I../src/libsystemd/sd-device -I../src/libsystemd/sd-event -I../src/libsystemd/sd-hwdb -I../src/libsystemd/sd-id128 -I../src/libsystemd/sd-netlink -I../src/libsystemd/sd-network -I../src/libsystemd/sd-resolve -Isrc/libsystemd-network -I../src/libsystemd-network -I../ -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -std=gnu99 -g -Wextra -Werror=undef -Wlogical-op -Wmissing-include-dirs -Wold-style-definition -Wpointer-arith -Winit-self -Wfloat-equal -Wsuggest-attribute=noreturn -Werror=missing-prototypes -Werror=implicit-function-declaration -Werror=missing-declarations -Werror=return-type -Werror=format=2 -Wstrict-prototypes -Wredundant-decls -Wmissing-noreturn -Wshadow -Wendif-labels -Wstrict-aliasing=2 -Wwrite-strings -Werror=overflow -Wnested-externs -Wno-unused-parameter -Wno-missing-field-initializers -Wno-unused-result -Wno-error=nonnull -Wno-maybe-uninitialized -ffast-math -fno-common -fdiagnostics-show-option -fno-strict-aliasing -fvisibility=hidden -fstack-protector -fstack-protector-strong --param=ssp-buffer-size=4 -Werror=shadow -include config.h -g -O1 -fno-omit-frame-pointer -ftrapv -MD -MQ 'systemd-random-seed@exe/src_random-seed_random-seed.c.o' -MF 'systemd-random-seed@exe/src_random-seed_random-seed.c.o.d' -o 'systemd-random-seed@exe/src_random-seed_random-seed.c.o' -c ../src/random-seed/random-seed.c
../src/random-seed/random-seed.c: In function ‘run’:
../src/random-seed/random-seed.c:270:17: error: implicit declaration of function ‘getrandom’ [-Werror=implicit-function-declaration]
                 k = getrandom(buf, buf_size, GRND_NONBLOCK);
                 ^
../src/random-seed/random-seed.c:270:17: error: nested extern declaration of ‘getrandom’ [-Werror=nested-externs]
cc1: all warnings being treated as errors

Arch in KVM timed out during reboot though, will give it a second spin if it's not just a fluke.

@poettering
Copy link
Member Author

ah, indeed! thanks for the hint, force pushed a new version which adds the missing prototype definition to that .c file

@mrc0mmand
Copy link
Member

mrc0mmand commented Jul 24, 2019

What's currently failing:

Ubuntu CI fails around installing /bin/bootctl:

D: install any Execs from the service files
I: Attempting to install /bin/bash
I: Attempting to install /bin/bootctl
Makefile:4: recipe for target 'setup' failed
make: Leaving directory '/tmp/autopkgtest.QqpWOh/build.ht8/systemd/test/TEST-32-OOMPOLICY'

Not sure what's going on there, since in some tests it passes, in most, however, it does not.

As for CentOS CI (Arch in KVM), my first attempt in reproducing it locally went nowhere (it just works), yet the random seed initialization seemed to take a little bit more time than usual. I'll try to debug it with the same image we use in CentOS CI, if that's the case. Right now I still have no idea why it takes it more than 5 minutes to reboot (will try to temporarily bump the timeout a bit).

@mrc0mmand
Copy link
Member

mrc0mmand commented Jul 24, 2019

I guess it would be worth pinging @evverx & @ddstreet since the Ubuntu fail looks similarly to something we dealt with a while ago in #12861 (comment) and the patch we used to fix it seems to affect the currently failing code (3cdb93d & 71a0de3)

@evverx
Copy link
Member

evverx commented Jul 24, 2019

I think @rootbindir@ should be replaced with @rootlibexecdir@ in systemd-boot-system-token.service.in to get past that particular point.

@mrc0mmand
Copy link
Member

Also, this issue forced me, again, into trying to get the output from serial console into a file (in the Vagrant runs), and I finally got a working PoC. Will try to deploy it ASAP, so we can debug this further.

@poettering
Copy link
Member Author

hmm, we installed bootctl to rootlibexecdir? i.e. to /usr/lib? that doesn't look right?

@evverx
Copy link
Member

evverx commented Jul 24, 2019

Looks like install_rpath : rootlibexecdir misled me. Anyway, to judge from the logs bootctl is installed to /usr/bin on Ubuntu CI while the unit file points to /bin apparently.

@mrc0mmand
Copy link
Member

@poettering looks like bumping the boot timeout from 5 to 15 minutes helped, as the Arch in KVM is now executing the last test. However, I'd like to debug this further, so if the merge could wait (e.g. to tomorrow), it would be much appreciated. I've prepared a patch for the serial console issue in systemd/systemd-centos-ci#153 and would like to use it to look at what's happening during the failed boots.

@poettering
Copy link
Member Author

(btw, the most recent pushed version now uses @bindir@ when referencing the bootctl binary)

@poettering
Copy link
Member Author

@mrc0mmand yeah, no hurries, @keszybz suggested to delay the merge of this after v243 (or at least the pre for it), so it will stay open at least until tomorrow

@evverx
Copy link
Member

evverx commented Jul 24, 2019

Looking at the logs I noticed gcc complained about "incompatible pointer types" on i386

[583/1821] cc -c ../src/boot/efi/linux.c -o src/boot/efi/linux.c.o -Wall -Wextra -std=gnu90 -nostdinc -ggdb -O0 -fpic -fshort-wchar -ffreestanding -fno-strict-aliasing -fno-stack-protector -Wsign-compare -Wno-missing-field-initializers -isystem /usr/include/efi -isystem /usr/include/efi/ia32 -include src/boot/efi/efi_config.h -include version.h
In file included from /usr/include/efi/efi.h:41:0,
                 from ../src/boot/efi/linux.c:3:
../src/boot/efi/linux.c: In function â<80><98>linux_execâ<80><99>:
../src/boot/efi/linux.c:48:60: warning: passing argument 4 of â<80><98>BS->AllocatePagesâ<80><99> from incompatible pointer type [-Wincompatible-pointer-types]
                                 EFI_SIZE_TO_PAGES(0x4000), (UINTN *) &boot_params);
                                                            ^
../src/boot/efi/linux.c:48:60: note: expected â<80><98>EFI_PHYSICAL_ADDRESS * {aka long long unsigned int *<80><99> but argument is of type â<80><98>UINTN * {aka unsigned int *<80><99>

I'm not sure whether it has anything to do with this PR though. @ddstreet would it be possible to pass --werror to meson as we do everywhere else as far as I know?

@poettering
Copy link
Member Author

@evverx yeah, looks like a bug, not introduced by this PR. please open an issue of its own?

@ddstreet
Copy link
Contributor

I guess it would be worth pinging @evverx & @ddstreet since the Ubuntu fail looks similarly to something we dealt with a while ago in #12861 (comment) and the patch we used to fix it seems to affect the currently failing code (3cdb93d & 71a0de3)

I may not have all the context, but specifically re: systemd-bless-boot, that's installed into /lib/systemd/ for Ubuntu (and Debian), not /usr/lib/systemd.

would it be possible to pass --werror to meson as we do everywhere else as far as I know?

I think so, for the Ubuntu CI runs, let me see if I can work out how to add that.

@mrc0mmand
Copy link
Member

After the next force push the Arch in KVM job should provide logs from the serial console, which should make this (hopefully) debug-able. I also reverted the temporary reboot timeout bump, so it's back at 5 minutes, let's see if it's going to show us something useful.

@evverx
Copy link
Member

evverx commented Jul 25, 2019

I think so, for the Ubuntu CI runs, let me see if I can work out how to add that.

Thank you. Apparently to judge from src/boot/efi/meson.build that part of systemd lives in its own universe where --werror passed to meson doesn't help much but it can be fixed later on the systemd side.

…doesn't work

There's no reason why writing should work if reading and writing
doesn't. Let's simplify this hence. /dev/urandom is generally an r/w
device, and everything else would be a serious system misconfiguration.
This makes two major changes to the way systemd-random-seed operates:

1. We now optionally credit entropy if this is configured (via an env
var). Previously we never would do that, with this change we still don't
by default, but it's possible to enable this if people acknowledge that
they shouldn't replicate an image with a contained random seed to
multiple systems. Note that in this patch crediting entropy is a boolean
thing (unlike in previous attempts such as systemd#1062), where only a relative
amount of bits was credited. The simpler scheme implemented here should
be OK though as the random seeds saved to disk are now written only with
data from the kernel's entropy pool retrieved after the pool is fully
initialized. Specifically:

2. This makes systemd-random-seed.service a synchronization point for
kernel entropy pool initialization. It was already used like this, for
example by systemd-cryptsetup-generator's /dev/urandom passphrase
handling, with this change it explicitly operates like that (at least
systems which provide getrandom(), where we can support this). This
means services that rely on an initialized random pool should now place
After=systemd-random-seed.service and everything should be fine. Note
that with this change sysinit.target (and thus early boot) is NOT
systematically delayed until the entropy pool is initialized, i.e.
regular services need to add explicit ordering deps on this service if
they require an initialized random pool.

Fixes: systemd#4271
Replaces: systemd#10621 systemd#4513
@poettering
Copy link
Member Author

Force pushed a new version. Addresses all issues @keszybz found, except for the xattr thing, see above for the rationale.

I also made one more change in light of @mrc0mmand's trouble: i dropped the ordering between systemd-random-seed.service and sysinit.target. This means we are not going to delay boot anymore until the pool is initialized (since this apparently caused major delays on those VMs). This means not only early-boot need to order themselves after the service if they require an initialized pool, but regular services too. i.e. it's still nice to have the sycnrhonization pool, but it's only one if people actually use it. This should remove the huge delay on @mrc0mmand's CI machines.

(Apparently those CI machines have RDRAND but no TRUST_CPU kerne config option set?)

@poettering
Copy link
Member Author

thanks a ton for the thorough review, @keszybz, of course!

@mrc0mmand
Copy link
Member

I also made one more change in light of @mrc0mmand's trouble: i dropped the ordering between systemd-random-seed.service and sysinit.target. This means we are not going to delay boot anymore until the pool is initialized (since this apparently caused major delays on those VMs). This means not only early-boot need to order themselves after the service if they require an initialized pool, but regular services too. i.e. it's still nice to have the sycnrhonization pool, but it's only one if people actually use it. This should remove the huge delay on @mrc0mmand's CI machines.

Thanks! The random pool init still takes ~25 seconds, but if it stays this way, I have no problem with that:

[*     ] A start job is running for Load/Save Random Seed (7s / 10min)
^M[**    ] A start job is running for Load/Save Random Seed (8s / 10min)
^M[***   ] A start job is running for Load/Save Random Seed (8s / 10min)
^M[ ***  ] A start job is running for Load/Save Random Seed (9s / 10min)
^M[  *** ] A start job is running for Load/Save Random Seed (10s / 10min)
^M[   ***] A start job is running for Load/Save Random Seed (10s / 10min)
^M[    **] A start job is running for Load/Save Random Seed (11s / 10min)
^M[     *] A start job is running for Load/Save Random Seed (11s / 10min)
^M[    **] A start job is running for Load/Save Random Seed (12s / 10min)
^M[   ***] A start job is running for Load/Save Random Seed (12s / 10min)
^M[  *** ] A start job is running for Load/Save Random Seed (13s / 10min)
^M[ ***  ] A start job is running for Load/Save Random Seed (14s / 10min)
^M[***   ] A start job is running for Load/Save Random Seed (14s / 10min)
^M[**    ] A start job is running for Load/Save Random Seed (15s / 10min)
^M[*     ] A start job is running for Load/Save Random Seed (15s / 10min)
^M[**    ] A start job is running for Load/Save Random Seed (16s / 10min)
^M[***   ] A start job is running for Load/Save Random Seed (17s / 10min)
^M[ ***  ] A start job is running for Load/Save Random Seed (17s / 10min)
^M[  *** ] A start job is running for Load/Save Random Seed (18s / 10min)
^M[   ***] A start job is running for Load/Save Random Seed (18s / 10min)
^M[    **] A start job is running for Load/Save Random Seed (19s / 10min)
^M[     *] A start job is running for Load/Save Random Seed (20s / 10min)
^M[    **] A start job is running for Load/Save Random Seed (20s / 10min)
^M[   ***] A start job is running for Load/Save Random Seed (21s / 10min)
^M[  *** ] A start job is running for Load/Save Random Seed (21s / 10min)
^M[ ***  ] A start job is running for Load/Save Random Seed (22s / 10min)
^M[***   ] A start job is running for Load/Save Random Seed (22s / 10min)
^M[**    ] A start job is running for Load/Save Random Seed (23s / 10min)
^M[*     ] A start job is running for Load/Save Random Seed (24s / 10min)
^M[**    ] A start job is running for Load/Save Random Seed (24s / 10min)
^M[***   ] A start job is running for Load/Save Random Seed (25s / 10min)
^M[ ***  ] A start job is running for Load/Save Random Seed (25s / 10min)
^M[  *** ] A start job is running for Load/Save Random Seed (26s / 10min)
^M[  OK  ] Started Load/Save Random Seed.
[  OK  ] Reached target System Initialization.

(Apparently those CI machines have RDRAND but no TRUST_CPU kerne config option set?)

The Arch machines are QEMU VMs (with KVM) and they don't seem to have RDRAND according to their CPU flags.

As for the TRUST_CPU kernel option, it's indeed not set:

# zgrep TRUST_CPU /proc/config.gz
# CONFIG_RANDOM_TRUST_CPU is not set

@mrc0mmand
Copy link
Member

mrc0mmand commented Jul 25, 2019

@poettering out of curiosity I tried to pass through the host's /dev/random into the VM in systemd/systemd-centos-ci#154, and the random init pool initialization time is back to "normal":

         Starting Load/Save Random Seed...
         Starting Create System Users...
[  OK  ] Started Load/Save Random Seed.

so I guess we should make that change permanent, as the QEMU's entropy pool is not enough?

@poettering
Copy link
Member Author

please pass through /dev/urandom though, but yes, always do that.

@mrc0mmand
Copy link
Member

please pass through /dev/urandom though, but yes, always do that.

FWIK /dev/urandom is handled automatically, but for /dev/random additional shenanigans are required. I'll extend the CentOS CI PR to the other Arch VMs, so it works everywhere as expected.

@mrc0mmand
Copy link
Member

In the latest CentOS CI KVM run the boot time is back to "normal", so there's nothing blocking this PR from the CentOS CI point of view.

Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do a full re-review, just the changed parts. There's still some wording changes, but I think we can just as well do that in a later PR.


tmp = mfree(tmp);

log_info("Successfully written random seed file %s with %zu bytes.", path, sz);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's grammatically incorrect. Maybe "Random seed file %s successfully written with %zu bytes".

<filename>/boot/</filename>, and <filename>/boot/efi</filename> are checked in turn. It is recommended to mount
the ESP to <filename>/efi/</filename>, if possible.</para></listitem>
<filename>/boot/</filename>, and <filename>/boot/efi/</filename> are checked in turn. It is
recommended to mount the ESP to <filename>/efi/</filename>, if possible.</para></listitem>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the time has come to relax this, and accept that everybody mounts it on /boot/efi...

Copy link
Contributor

@floppym floppym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that with this change sysinit.target (and thus early boot) is NOT systematically delayed until the entropy pool is initialized

@poettering systemd-random-seed.service still has Before=sysinit.target. Is this a mistake?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

6 participants