beef up random seed logic, add boot loader entropy provisioning, improve docs about it #13137
This pull request introduces 1 alert when merging c31ec80 into 670fb0b - view on LGTM.com new alerts:
|
2742dbd
to
8a19372
Compare
CentOS 7 no likey:
Arch in KVM timed out during reboot though, will give it a second spin if it's not just a fluke. |
ah, indeed! thanks for the hint, force pushed a new version which adds the missing prototype definition to that .c file |
What's currently failing: Ubuntu CI fails around installing
Not sure what's going on there, since in some tests it passes, in most, however, it does not. As for CentOS CI (Arch in KVM), my first attempt at reproducing it locally went nowhere (it just works), yet the random seed initialization seemed to take a little more time than usual. I'll try to debug it with the same image we use in CentOS CI, if that's the case. Right now I still have no idea why it takes more than 5 minutes to reboot (will try to temporarily bump the timeout a bit). |
I guess it would be worth pinging @evverx & @ddstreet since the Ubuntu failure looks similar to something we dealt with a while ago in #12861 (comment) and the patch we used to fix it seems to affect the currently failing code (3cdb93d & 71a0de3) |
I think |
Also, this issue forced me, again, into trying to get the output from serial console into a file (in the Vagrant runs), and I finally got a working PoC. Will try to deploy it ASAP, so we can debug this further. |
hmm, we installed bootctl to rootlibexecdir? i.e. to /usr/lib? that doesn't look right? |
Looks like |
@poettering looks like bumping the boot timeout from 5 to 15 minutes helped, as the Arch in KVM is now executing the last test. However, I'd like to debug this further, so if the merge could wait (e.g. to tomorrow), it would be much appreciated. I've prepared a patch for the serial console issue in systemd/systemd-centos-ci#153 and would like to use it to look at what's happening during the failed boots. |
(btw, the most recent pushed version now uses @bindir@ when referencing the bootctl binary) |
@mrc0mmand yeah, no hurries, @keszybz suggested to delay the merge of this after v243 (or at least the pre for it), so it will stay open at least until tomorrow |
Looking at the logs I noticed
[583/1821] cc -c ../src/boot/efi/linux.c -o src/boot/efi/linux.c.o -Wall -Wextra -std=gnu90 -nostdinc -ggdb -O0 -fpic -fshort-wchar -ffreestanding -fno-strict-aliasing -fno-stack-protector -Wsign-compare -Wno-missing-field-initializers -isystem /usr/include/efi -isystem /usr/include/efi/ia32 -include src/boot/efi/efi_config.h -include version.h
In file included from /usr/include/efi/efi.h:41:0,
from ../src/boot/efi/linux.c:3:
../src/boot/efi/linux.c: In function ‘linux_exec’:
../src/boot/efi/linux.c:48:60: warning: passing argument 4 of ‘BS->AllocatePages’ from incompatible pointer type [-Wincompatible-pointer-types]
EFI_SIZE_TO_PAGES(0x4000), (UINTN *) &boot_params);
^
../src/boot/efi/linux.c:48:60: note: expected ‘EFI_PHYSICAL_ADDRESS * {aka long long unsigned int *}’ but argument is of type ‘UINTN * {aka unsigned int *}’
I'm not sure whether it has anything to do with this PR though. @ddstreet would it be possible to pass |
@evverx yeah, looks like a bug, not introduced by this PR. please open an issue of its own? |
I may not have all the context, but specifically re: systemd-bless-boot, that's installed into /lib/systemd/ for Ubuntu (and Debian), not /usr/lib/systemd.
I think so, for the Ubuntu CI runs, let me see if I can work out how to add that. |
After the next force push the Arch in KVM job should provide logs from the serial console, which should make this (hopefully) debug-able. I also reverted the temporary reboot timeout bump, so it's back at 5 minutes, let's see if it's going to show us something useful. |
Thank you. Apparently to judge from |
That way we can reuse it elsewhere.
…doesn't work There's no reason why writing should work if reading and writing doesn't. Let's simplify this hence. /dev/urandom is generally an r/w device, and everything else would be a serious system misconfiguration.
This makes two major changes to the way systemd-random-seed operates:

1. We now optionally credit entropy if this is configured (via an env var). Previously we never would do that, with this change we still don't by default, but it's possible to enable this if people acknowledge that they shouldn't replicate an image with a contained random seed to multiple systems. Note that in this patch crediting entropy is a boolean thing (unlike in previous attempts such as systemd#1062), where only a relative amount of bits was credited. The simpler scheme implemented here should be OK though as the random seeds saved to disk are now written only with data from the kernel's entropy pool retrieved after the pool is fully initialized. Specifically:

2. This makes systemd-random-seed.service a synchronization point for kernel entropy pool initialization. It was already used like this, for example by systemd-cryptsetup-generator's /dev/urandom passphrase handling, with this change it explicitly operates like that (at least on systems which provide getrandom(), where we can support this). This means services that rely on an initialized random pool should now place After=systemd-random-seed.service and everything should be fine. Note that with this change sysinit.target (and thus early boot) is NOT systematically delayed until the entropy pool is initialized, i.e. regular services need to add explicit ordering deps on this service if they require an initialized random pool.

Fixes: systemd#4271
Replaces: systemd#10621 systemd#4513
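To illustrate the "synchronization point" idea from the commit message above, here is a minimal sketch of how a program can wait for the kernel entropy pool using getrandom(). It is not the code from this PR: the function names `pool_is_initialized()` and `read_seed_blocking()` are hypothetical, and the sketch assumes Linux with glibc 2.25 or newer, which provides `<sys/random.h>`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the kernel entropy pool is initialized,
 * 0 if it is not yet, and -errno on any other failure (for example -ENOSYS
 * on kernels without getrandom()). */
int pool_is_initialized(void) {
        unsigned char b;

        /* With GRND_NONBLOCK the call fails with EAGAIN instead of blocking
         * while the pool is still uninitialized. */
        if (getrandom(&b, sizeof b, GRND_NONBLOCK) >= 0)
                return 1;

        return errno == EAGAIN ? 0 : -errno;
}

/* Hypothetical helper: fills buf with n random bytes. With flags == 0,
 * getrandom() blocks until the pool has been initialized, so returning
 * from this function doubles as "the pool is ready". Returns 0 or -errno. */
int read_seed_blocking(void *buf, size_t n) {
        unsigned char *p = buf;

        assert(buf != NULL);

        while (n > 0) {
                ssize_t k = getrandom(p, n, 0);

                if (k < 0) {
                        if (errno == EINTR)
                                continue; /* interrupted by a signal, retry */
                        return -errno;
                }

                p += k;
                n -= (size_t) k;
        }

        return 0;
}
```

A service ordered After=systemd-random-seed.service would not need such a loop itself; the sketch only shows what blocking on pool initialization looks like at the syscall level.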
Force pushed a new version. Addresses all issues @keszybz found, except for the xattr thing, see above for the rationale. I also made one more change in light of @mrc0mmand's trouble: I dropped the ordering between systemd-random-seed.service and sysinit.target. This means we are not going to delay boot anymore until the pool is initialized (since this apparently caused major delays on those VMs). This means not only early-boot services need to order themselves after the service if they require an initialized pool, but regular services too. i.e. it's still nice to have the synchronization point, but it's only one if people actually use it. This should remove the huge delay on @mrc0mmand's CI machines. (Apparently those CI machines have RDRAND but no TRUST_CPU kernel config option set?) |
thanks a ton for the thorough review, @keszybz, of course! |
Thanks! The random pool init still takes ~25 seconds, but if it stays this way, I have no problem with that:
The Arch machines are QEMU VMs (with KVM) and they don't seem to have RDRAND according to their CPU flags. As for the TRUST_CPU kernel option, it's indeed not set:
|
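For readers who want to repeat the CPU flag check mentioned above, the following is a rough sketch that looks for a flag such as rdrand in /proc/cpuinfo. The helper names `line_has_flag()` and `cpu_has_flag()` are made up for this example, and it assumes the x86 layout where each CPU has a line starting with "flags"; other architectures use different field names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` occurs as a whole
 * whitespace-delimited word in `line`, 0 otherwise. */
int line_has_flag(const char *line, const char *flag) {
        size_t n;
        const char *p = line;

        assert(line != NULL && flag != NULL);

        n = strlen(flag);
        if (n == 0)
                return 0;

        while ((p = strstr(p, flag)) != NULL) {
                /* Accept the match only if it is not part of a longer word. */
                int start_ok = p == line || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
                int end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

                if (start_ok && end_ok)
                        return 1;
                p += n;
        }

        return 0;
}

/* Hypothetical helper: returns 1 if the first "flags" line in /proc/cpuinfo
 * lists `flag`, 0 if it does not, -1 if the file cannot be opened. */
int cpu_has_flag(const char *flag) {
        char line[8192];
        int found = 0;
        FILE *f = fopen("/proc/cpuinfo", "r");

        if (!f)
                return -1;

        while (fgets(line, sizeof line, f))
                if (strncmp(line, "flags", 5) == 0) {
                        found = line_has_flag(line, flag);
                        break;
                }

        fclose(f);
        return found;
}
```

Calling `cpu_has_flag("rdrand")` inside the guest would then report whether the virtual CPU exposes the instruction, which depends on the CPU model QEMU is configured to emulate.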
@poettering out of curiosity I tried to pass through the host's
so I guess we should make that change permanent, as the QEMU's entropy pool is not enough? |
please pass through /dev/urandom though, but yes, always do that. |
FWIK |
In the latest CentOS CI KVM run the boot time is back to "normal", so there's nothing blocking this PR from the CentOS CI point of view. |
I didn't do a full re-review, just the changed parts. There's still some wording changes, but I think we can just as well do that in a later PR.
|
||
tmp = mfree(tmp); | ||
|
||
log_info("Successfully written random seed file %s with %zu bytes.", path, sz); |
That's grammatically incorrect. Maybe "Random seed file %s successfully written with %zu bytes".
<filename>/boot/</filename>, and <filename>/boot/efi</filename> are checked in turn. It is recommended to mount | ||
the ESP to <filename>/efi/</filename>, if possible.</para></listitem> | ||
<filename>/boot/</filename>, and <filename>/boot/efi/</filename> are checked in turn. It is | ||
recommended to mount the ESP to <filename>/efi/</filename>, if possible.</para></listitem> |
I wonder if the time has come to relax this, and accept that everybody mounts it on /boot/efi...
Note that with this change sysinit.target (and thus early boot) is NOT systematically delayed until the entropy pool is initialized
@poettering systemd-random-seed.service still has Before=sysinit.target. Is this a mistake?
Fixes for #9428 and #4271