Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

many sd-boot/bootctl fixes, and a new "boot counting" concept, for automatic fallback to older kernels on persistent failures #9437

Merged
merged 15 commits into from Oct 19, 2018

Conversation

@poettering
Copy link
Member

commented Jun 26, 2018

Yupp, a lot of patches in a single PR, but most are rather simple and the stuff at the end is mostly docs...

In order to grok the concepts introduced here it might make sense to read through the last commit first which contains docs explaining how this all fits together.

@poettering poettering added the boot label Jun 26, 2018

For unattended systems such as servers, IoT/embedded devices or appliances an
automatic logic of reverting back to previous versions of the OS or kernel in
case system boots consistently fail is important. systemd provides support for
this in various of its components. When used together they provide a complete

This comment has been minimized.

Copy link
@vcaputo

vcaputo Jun 26, 2018

Member

s/various/a variety/

* The
[`systemd-boot(7)`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
boot loader optionally maintains a per-boot-loader-entry counter that is
decreased by one on each attempt to boot the entry, priorizing entries that

This comment has been minimized.

Copy link
@vcaputo

vcaputo Jun 26, 2018

Member

s/priorizing/prioritizing/


* The
[`systemd-boot-check.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-boot-check.service.html)
service is a simple health check tool that determines whether the boot

This comment has been minimized.

Copy link
@vcaputo

vcaputo Jun 26, 2018

Member

s/whether/wether/


## Walkthrough

Here's an example walkthrough how this all fits together.

This comment has been minimized.

Copy link
@vcaputo

vcaputo Jun 26, 2018

Member

s/how/of how/

@vcaputo

This comment has been minimized.

Copy link
Member

commented Jun 26, 2018

@poettering I can see the appeal of overloading the entry filenames to contain the attempts/remaining counters, but I think it would probably be better to leave the entries alone. What if someone wanted to have the directory or fs containing the entries be non-writable outside of privileged update windows when they are manipulated?

Maybe have a separate directory containing the per-entry counters, and use symlinks named after the entry files. Make the link target contain the counters rather than actually pointing anywhere meaningful in the filesystem.

This also would eliminate the issue of entries unintentionally containing counters in their name, if that was a possibility (I didn't examine if the naming of those entries was entirely user-defined).

@evverx

This comment has been minimized.

Copy link
Member

commented Jun 26, 2018

This pull request introduces 1 alert when merging 1d8536b into ad4bc33 - view on LGTM.com

new alerts:

  • 1 for Comparison result is always the same

Comment posted by LGTM.com

service automatically marks a boot loader entry for which boot counting as
mentioned above is enabled, as "good" when a boot has been determined to be
successful, thus turning off boot counting for it again.

This comment has been minimized.

Copy link
@dnicolodi

dnicolodi Jun 26, 2018

One comma is missing: "marks a boot loader entry, for which boot counting as mentioned above is enabled, as "good" when a boot has been determined to be successful".

transaction.

8. The `boot-complete.target` unit is ordered after and pulls in various units
that shall are required to succeed for the boot process to be considered

This comment has been minimized.

Copy link
@dnicolodi
@medhefgo
Copy link
Contributor

left a comment

That was an exhausting read :D

I generally like this but I really disagree on using rename to take track of tries. We already distrust EFI firmwares (for good reason), so using EFI variables for this is out of the question. But then we also cannot rely on them implementing renames in an atomic fashion. What if we are really lucky and lose power while the EFI is touching the fs? We could end up losing the boot config file or it could end up with garbage. And if it happens to be the only one the user has (for whatever reason) we wouldn't even have anything to boot anymore.

@vcaputo We cannot rely on fancy things like links on ESP because vfat is such an awesome filesystem.

<variablelist>
<varlistentry>
<term><varname>LoaderBootCountPath</varname></term>
<listitem><para>If boot counting is enabled contains the path the of the file the counter is encoded in the

This comment has been minimized.

Copy link
@medhefgo

medhefgo Jun 27, 2018

Contributor

This sentence does not parse for me.


<varlistentry>
<term><varname>LoaderConfigTimeout</varname></term>
<listitem><para>The menu time-out. Read by the boot loader.</para></listitem>

This comment has been minimized.

Copy link
@medhefgo

medhefgo Jun 27, 2018

Contributor

Technically, sd-boot can also change that (t/T).

<term><varname>LoaderEntryDefault</varname></term>
<term><varname>LoaderEntryOneShot</varname></term>

<listitem><para>The identifier of the default boot loader entry. Set by the OS and read by the boot

This comment has been minimized.

Copy link
@medhefgo

medhefgo Jun 27, 2018

Contributor

Once again, LoaderEntryDefault can also be set by the bootloader (d/D).

newest boot entry is now tried, i.e. the system automatically reverted back
to an earlier version.

The above describes the a walkthrough when the selected boot entry continously

This comment has been minimized.

Copy link
@medhefgo

medhefgo Jun 27, 2018

Contributor

Superfulous "a" here.

combined with `Requires=` dependencies from the target, so that the target
cannot be reached when any of the units fail. You may add any number of
units like this, and only if they all succeed the boot entry is marked as
good. Note that the target unit shal pull in these boot checking units, not

This comment has been minimized.

Copy link
@medhefgo

medhefgo Jun 27, 2018

Contributor

s/shal/shall

wrap them in a unit and order them after `boot-complete.target`, pulling it
in.

# FAQ

This comment has been minimized.

Copy link
@medhefgo

medhefgo Jun 27, 2018

Contributor

Maybe also add why this is not stored in UEFI variables too.

@@ -41,6 +41,11 @@ typedef struct {
EFI_STATUS (*call)(VOID);
BOOLEAN no_autoselect;
BOOLEAN non_unique;
UINTN tries_done;

This comment has been minimized.

Copy link
@medhefgo

medhefgo Jun 27, 2018

Contributor

I feel like using INTN would make this much nicer without all the (UINTN)-1 casts all over the place.

This comment has been minimized.

Copy link
@poettering

poettering Jun 27, 2018

Author Member

well, but that would suggest the counter could actually go negative, but it can't... We use (UINTN) -1 as a special way to indicate one particular exceptional case (and we do this similar elsewhere in the tree), hence I don't think we should really change the type altogether....

arg_path, good,
arg_path, bad);

log_debug("Tries left: %" PRIu64"\n"

This comment has been minimized.

Copy link
@medhefgo

medhefgo Jun 27, 2018

Contributor

You can unify these 2 log_debug() calls-

@medhefgo

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2018

Also, I feel like turning boot counting off after the first successful boot is the wrong way to go. The whole point of this is to be failsafe, no? And booting could go wrong later again...

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2018

@poettering I can see the appeal of overloading the entry filenames to contain the attempts/remaining counters, but I think it would probably be better to leave the entries alone. What if someone wanted to have the directory or fs containing the entries be non-writable outside of privileged update windows when they are manipulated?

Maybe have a separate directory containing the per-entry counters, and use symlinks named after the entry files. Make the link target contain the counters rather than actually pointing anywhere meaningful in the filesystem.

sd-boot operates on the ESP, i.e. FAT. FAT doesn't have symlinks. And even if we loosen the requirement for FAT/ESP one day we should still stick to operations that are implementable on FAT.

We need to be able to manipulate the counters from two sides: the boot loader, before we start a boot process. And the OS, after we completed a boot process. (And of course, also during update, to install the image and set an initial count on it.) These time windows are all very brief and well defined I'd say, on their own.

This also would eliminate the issue of entries unintentionally containing counters in their name, if that was a possibility (I didn't examine if the naming of those entries was entirely user-defined).

I thought about that, and if it indeed turns out to be a problem I figure we should simply make the whole logic something one can turn off, so that the "+" stuff looses its magic effect. After all the whole design is put together so that a boot loader that implements the boot loader spec but doesn't do boot counting will just work, but ignore the counters.

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2018

I generally like this but I really disagree on using rename to take track of tries. We already distrust EFI firmwares (for good reason), so using EFI variables for this is out of the question.

The distrust in EFI variables is probably more caused by the quality of memory chips backing them, i.e. it's doubt about the hardware more than the software.

But then we also cannot rely on them implementing renames in an atomic fashion. What if we are really lucky and lose power while the EFI is touching the fs? We could end up losing the boot config file or it could end up with garbage. And if it happens to be the only one the user has (for whatever reason) we wouldn't even have anything to boot anymore.

Well, that leaves very little wiggle room... We only really have the ESP and the vars as storage space.

@medhefgo

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2018

It's fine putting them in ESP, just use a separate file instead of the loader entries. You could put them in /loader/tries/* for example.

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2018

It's fine putting them in ESP, just use a separate file instead of the loader entries. You could put them in /loader/tries/* for example.

I think that's more risky actually, as writing files means actually allocating/releasing new blocks usually, and that is hard to do atomically (and also involves allocation algorithms, and there's a good chance firmware will fuck that up). Renaming otoh is very likely implementable atomically even on crappy, simpler file systems... I mean, we have no golden bullet, we can just pick the least bad option from a pool of crappy ones. I am pretty sure renaming is safer than writing file contents...

What I like about the rename approach is firstly that the operation it implements has a chance of being atomic and reliable, but also that this means the counting info is directly associated with the entry, and the lifecycle of this data always bound together. You cannot remove the entry without also removing the associated counting data, and that's a really good thing. If you split the counting data into a separate file you need to be careful to always explicitly lifecycle that counting data along with the entry as they are not directly and implicitly attached to each other.

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2018

Also, I feel like turning boot counting off after the first successful boot is the wrong way to go. The whole point of this is to be failsafe, no? And booting could go wrong later again...

Hmm, so downgrade attacks are a major problem in computer security, in particular in cryptography. After we verified that a kernel works sufficiently well for us to mark the entry good, we should really stick to it I think. There are two opposing concepts here I figure: "don't make yourself vulnerable to downgrade attacks" and "downgrade if upgrade is fucked". I think we can only marry these two if we say we never downgrade once an upgrade worked at least once.

@vcaputo

This comment has been minimized.

Copy link
Member

commented Jun 27, 2018

Oops, I keep forgetting about UEFI/ESP stuff, still living in the 90s with my trusty X61s.

With regards to the rename atomicity argument, the last time I checked actual on-disk atomicity of rename was not something guaranteed by most if not all filesystems. POSIX only speaks to runtime atomicity with regards to visibility of the replaced name to other processes. It's a common misconception to believe this equates durably atomic renames across unclean mounts of the filesystem, but that's not really the case IIRC.

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2018

With regards to the rename atomicity argument, the last time I checked actual on-disk atomicity of rename was not something guaranteed by most if not all filesystems. POSIX only speaks to runtime atomicity with regards to visibility of the replaced name to other processes. It's a common misconception to believe this equates durably atomic renames across unclean mounts of the filesystem, but that's not really the case IIRC.

Yeah, that's why I say there's a "chance" it's atomic. But you are right, there's no guarantee, and shitty code will always make this hard. But it's taking the least bad from a pile of crappy options.

@medhefgo

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2018

I can see the point on downgrading attacks.

Another thing: I feel like counting tries is overkill if you just want to make sure upgrades work. If the new kernel+initrd don't work, retrying would be futile, no? So making this into a boolean signaling if this booted successful once should be enough.

@medhefgo

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2018

I just had a crazy idea: in theory you could store this information in the file mtime... :D

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2018

Another thing: I feel like counting tries is overkill if you just want to make sure upgrades work. If the new kernel+initrd don't work, retrying would be futile, no? So making this into a boolean signaling if this booted successful once should be enough.

Well, you never know what people do: maybe they turn off the machine during boot (think: some random appliance handled by random people, or even a desktop PC with an encrypted harddisk and a password prompt in the initrd). There might be many reasons why a kernel/os image doesn't boot properly and only a subset of them have anything to do with the image itself. Hence I think there should be some room for configuration here. If deployments are in particular volatile environments it might make sense to initialize the counter to something high to allow a larger number of tries before giving up and reverting back. If deployments are in very well defined, very deterministic and supervised environments then it might make sense to initialize it to something low, maybe even 1, to only offer one attempt before reverting back.


r = make_good(prefix, suffix, &good);
if (r < 0)
return log_oom(),

This comment has been minimized.

Copy link
@evverx

evverx Jun 27, 2018

Member

I thought the alert was a false positive until I noticed this comma. I guess a semicolon should be here instead.

This comment has been minimized.

Copy link
@poettering

poettering Jun 27, 2018

Author Member

ouch! I kept scratching my head about this warning, and couldn't find what was wrong! Thanks!

@poettering poettering force-pushed the poettering:sd-boot-count branch from 1d8536b to 5ccca21 Jun 27, 2018

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2018

I force pushed a new version with all reported typos fixed!

@boucman

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2018

I know non-EFI system are not well handled by systemd, but having a systemd-provided point for "system is good" would be very interesting for embedded systems, and embedded/IoT systems don't have EFI...

Would it be possible to split the various units into a generic bootloader-independant part and a EFI-specific part, the same way the wait-online in network-manager-independant ?

This way we could andd u-boot or barebox specific implementations easily...

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jun 28, 2018

I know non-EFI system are not well handled by systemd, but having a systemd-provided point for "system is good" would be very interesting for embedded systems, and embedded/IoT systems don't have EFI...

Would it be possible to split the various units into a generic bootloader-independant part and a EFI-specific part, the same way the wait-online in network-manager-independant ?

That's actually excactly what this is already doing, and exactly this is even explicitly mentioned in the markdown document explaining how this all fits together, at the second to last commit of this PR. Please have a look.

This way we could andd u-boot or barebox specific implementations easily...

Yes, absolutely, the "boot-complete.target" is supposed to be generic, and usable with other boot loaders and other "boot ok" schemes.

@boucman

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2018

right, my bad...

I'll reread the whole PR in more details. I was kinda rushed when I first read it.

sorry

completed successfuly. When enabled it becomes an indirect dependency of
`systemd-boot-complete.service` (by means of `boot-complete.target`, see
below), ensuring that the boot will not be considered successful if there are
any failed services.

This comment has been minimized.

Copy link
@boucman

boucman Jun 28, 2018

Contributor

Ok, this is where I got confused... systemd-boot-check is actually independent of using systemd-boot. maybe rename it boot-check.service ?

This comment has been minimized.

Copy link
@poettering

poettering Jun 29, 2018

Author Member

We generally prefix stuff units that run code provided by us with "systemd-", and this service does invoke C code we hacked up, hence it carries that prefix.

off.

12. On the following boot (and all subsequent boots after that) the entry is
now seen with boot counting turned off, no further renaming takes place.

This comment has been minimized.

Copy link
@boucman

boucman Jun 28, 2018

Contributor

Maybe consider reseting the counter instead of disabling boot-counting entirely.

I understand the idea of "if it booted once properly, the image is correct and it won't magically stop working" but I have actually seen that happen in real life.

Suppose you have two completely separate (i.e different rootfs) boot entries. One is the normal operation one, the other is a recovery system that can do only one thing : reformat and reinstall the normal partition.

The normal partition might become bad for various reasons on a completely autonomous system. Bad blocks, bad handling of disk full, bad recovery on incorrect shutdown... Those kind of things are very rare, but on autonomous systems without any sysadmin they can happen and, statistically, they do.

At this point, having the boot counter still active provides a simple "parachute" for a complete reinstall, and that's very welcome.

This comment has been minimized.

Copy link
@poettering

poettering Jun 29, 2018

Author Member

@medhefgo also suggested this, see above. and also see my comments on that. Besides the downgrade-attack thing I also think there's value in putting boundaries on the time where we keep changing the $BOOT file system. With the implemented approach, if an update cycle is clean we'll update $BOOT three times: once to install the new kernel, once on the next boot to count down, and once after it completed to mark it OK. After that $BOOT isn't touched anymore, regardless how many cycles follow, until the next update. I think that's much better behaviour than continuing to count forerver.

I mean, the current scheme makes its changes to the ESP, which is VFAT, and hence has less than ideal integrity behaviour. If we had something more reliable we'd use it, but it's what we have to work with, but we should still optimize the write cycles, and I think O(1) instead of O(n) write cycles per update is highly preferable.

But dunno. we can certainly look into making other schemes workable if there's really the need, once the initial support is in.

This comment has been minimized.

Copy link
@boucman

boucman Jun 29, 2018

Contributor

That's no big deal, I'll have to reimplement that part of the logic for u-boot anyway...

that was more a case of "i've been there before" but there are many use cases and only one can be the default...

units/boot-complete.target Show resolved Hide resolved
@keszybz
Copy link
Member

left a comment

My major gripe is with the positioning of of the boot counter update service. IIUC, it will be called very early, when it actually needs to be called very late.

I also think the names of many things would benefit from being changed. If those somewhat misleading names are established early on, people will have a harder time using this correctly.

But overall, this looks nice. The counting scheme is a bit complex with the two counters, but not too much, and I see the value in having the two counters.

(meson-mode . ((meson-indent-basic . 8)))
(nil . ((indent-tabs-mode . nil)
(tab-width . 8)
(fill-column . 79))))

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

Didn't we establish in the other PR that this doesn't actually work?

* Names we grok, and the series they result in:
*
* foobar+3.efi → foobar+2-1.efi → foobar+1-2.efi → foobar+0-3.efi → STOP!
* foobar+4.0.efi → foobar+3-1.efi → foobar+2-2.efi → foobar+1-3.efi → foobar+0-4.efi → STOP!

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

Typo here "4.0".

*
* foobar+3.efi → foobar+2-1.efi → foobar+1-2.efi → foobar+0-3.efi → STOP!
* foobar+4.0.efi → foobar+3-1.efi → foobar+2-2.efi → foobar+1-3.efi → foobar+0-4.efi → STOP!
*

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

Spurious line?

UINTN new_left, digit;
digit = file[i] - '0';
if (digit > (((UINTN) -1) / factor)) /* overflow check */

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

Those extra parens around the division operation are not necessary…

<listitem><para>Show a help screen</para></listitem>
</varlistentry>

<varlistentry>
<term>Ctrl + l</term>
<term><keycombo><keycap>Ctrl</keycap><keycap>l</keycap></keycombo></term>

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

Nice, never occurred to me to check if docbook has something like this.

#include "util.h"
#include "virt.h"

/* This generator pulls in systemd-boot-complete.service if the "LoaderBootCountPath" EFI variable is set, i.e. the

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

pulls ... into what ?


/* We pull this in from basic.target so that it ends up in all "regular" boot ups, but not in rescue.target or
* even emergency.target. */
p = strjoina(arg_dest, "/" SPECIAL_BASIC_TARGET ".wants/systemd-boot-complete.service");

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

No, this is cannot be. This needs to be hooked up late into boot, after all the checks and whatnot have happened. Marking a boot as successful early in the boot is pointless.

This comment has been minimized.

Copy link
@poettering

poettering Jul 17, 2018

Author Member

(misunderstanding, as discussed below)

mkosi.build Outdated
# Manually update the boot loader from the one we just built
mkdir -p "$DESTDIR"/boot/efi/EFI/systemd "$DESTDIR"/boot/efi/EFI/BOOT
cp "$DESTDIR"/usr/lib/systemd/boot/efi/systemd-bootx64.efi "$DESTDIR"/boot/efi/EFI/systemd/systemd-bootx64.efi
cp "$DESTDIR"/usr/lib/systemd/boot/efi/systemd-bootx64.efi "$DESTDIR"/boot/efi/EFI/BOOT/bootx64.efi

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

Hmm, are those paths (with /boot/efi) always correct?

mkosi.build Outdated
# Manually update the boot loader from the one we just built
mkdir -p "$DESTDIR"/boot/efi/EFI/systemd "$DESTDIR"/boot/efi/EFI/BOOT
cp "$DESTDIR"/usr/lib/systemd/boot/efi/systemd-bootx64.efi "$DESTDIR"/boot/efi/EFI/systemd/systemd-bootx64.efi
cp "$DESTDIR"/usr/lib/systemd/boot/efi/systemd-bootx64.efi "$DESTDIR"/boot/efi/EFI/BOOT/bootx64.efi

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

Hmm, are those paths (with /boot/efi) always correct?

This comment has been minimized.

Copy link
@poettering

poettering Jul 17, 2018

Author Member

in the mkosi image we build they are...

@@ -28,7 +28,7 @@ enum loader_type {
};
typedef struct {
CHAR16 *file;
CHAR16 *id; /* The identifier for this entry (not that this id is not necessarily unique though!) */

This comment has been minimized.

Copy link
@keszybz

keszybz Jul 3, 2018

Member

Too many "not"s.

@poettering

This comment has been minimized.

Copy link
Member Author

commented Jul 4, 2018

My major gripe is with the positioning of of the boot counter update service. IIUC, it will be called very early, when it actually needs to be called very late.

The boot counter update service systemd-boot-complete.service is ordered after sysinit.target by default, as well as boot-complete.target. The latter is supposed to be the extension point which people can use to configure precisely how early or late they want the boot counter update to happen. If they don't order anything else before it, yes it will run pretty early on. But they can order anything they like before it, for example:

  1. let's say httpd.service and mysql.service is extremely imporant to them. In that case they can add Requires= pointing from boot-complete.target towards them (through RequiredBy=), and the marking will be delayed until httpd and mysql are known to have started up successfuly.

  2. They can also enable our own "systemd-boot-check.service", which is mostly an example but can be useful too. It is ordered by default after multi-user.target and graphical.target, i.e very late and it checks for any failed services. They can also hack up their own services like this, possible with simple scripts, and order them properly to move the marking as early or late into the boot process as they wish.

So, by default we'll mark things as good early on, but people have total freedom there, and can move it somewhere in the middle of the boot process (which option 1 above would do) or very later into it (which option 2 would do).

@poettering poettering force-pushed the poettering:sd-boot-count branch from 5b0a088 to 97e5e96 Oct 8, 2018

@poettering poettering force-pushed the poettering:sd-boot-count branch from 97e5e96 to 7cb5318 Oct 16, 2018

keszybz added a commit that referenced this pull request Oct 18, 2018
Merge pull request #10174 from poettering/sd-boot-counter-efi
the EFI changes from PR #9437 (the boot counting PR)
@keszybz

This comment has been minimized.

Copy link
Member

commented Oct 18, 2018

I'll force push this PR (with no changes) so github sees that some of the commits are already merged.

@keszybz keszybz force-pushed the poettering:sd-boot-count branch from 7cb5318 to d41845e Oct 18, 2018

@poettering poettering force-pushed the poettering:sd-boot-count branch from 7cb5318 to a361b76 Oct 18, 2018

@keszybz
Copy link
Member

left a comment

Looks nice, just some typos.

support is built into various of its components. When used together these
components provide a complete solution on UEFI systems, built as add-on to the
[Boot Loader
Specification](https://github.com/systemd/systemd/blob/master/docs/BOOT_LOADER_SPECIFICATION.md). However,

This comment has been minimized.

Copy link
@keszybz

keszybz Oct 19, 2018

Member

This should be updated like in a084849.


* The
[`systemd-boot-check-no-failures.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-boot-check-no-failures.service.html)
service is a simple health check tool that determines wether the boot

This comment has been minimized.

Copy link
@keszybz

keszybz Oct 19, 2018

Member

whether

completed successfuly. When enabled it becomes an indirect dependency of
`systemd-bless-boot.service` (by means of `boot-complete.target`, see
below), ensuring that the boot will not be considered successful if there are
any failed services.

This comment has been minimized.

Copy link
@keszybz

keszybz Oct 19, 2018

Member

On my Fedora laptop, failing services are always a thing. After upgrading to F29 I have three, and so far I didn't bother to investigate. This would prevent me from ever blessing a new kernel. But the man page makes it clear that this is not supposed to be used for real, so it's all OK.

<literal>+</literal> followed by one or two numbers (if two they need to be separated by a <literal>-</literal>),
before the <filename>.conf</filename> or <filename>.efi</filename> suffix is subject to boot counting: the first of
the two numbers ('tries left') is decreased by one on every boot attempt, the second of the two numbers ('done
tries') is increased by one (if 'tries done' is absent it is considered equivalent to 0). Depending on the current

This comment has been minimized.

Copy link
@keszybz

keszybz Oct 19, 2018

Member

'tries done'

<para>Internally, the service operates based on the <varname>LoaderBootCountPath</varname> EFI variable (of the
vendor UUID <constant>4a67b082-0a4c-41cf-b6c7-440b29bb8c4</constant>), which is passed from the boot loader to the
OS. It contains a file system path (relative to the EFI system partition) of the <ulink
url="https://github.com/systemd/systemd/blob/master/docs/BOOT_LOADER_SPECIFICATION.md">Boot Loader

This comment has been minimized.

Copy link
@keszybz

keszybz Oct 19, 2018

Member

systemd.io...


/* This generator pulls systemd-bless-boot.service into the initial transaction if the "LoaderBootCountPath" EFI
* variable is set, i.e. the system boots up with boot counting in effect, which means we should mark the boot as
* "good" if we manage to boot up this far. */

This comment has been minimized.

Copy link
@keszybz

keszybz Oct 19, 2018

Member

"this far" is unclear. The generator runs very early, but the blessing happens very late.

This comment has been minimized.

Copy link
@poettering

poettering Oct 19, 2018

Author Member

i changed this to "far enough" now. It's supposed to be vague, since after all it's supposed to open what is meant by "successful boot". But of course, the "this" is confusing indeed, since it's not just vague but potentially referencing something in the same paragraph and that's something we should fix which I hope "far enough" does. Hope that makes sense.

poettering added 15 commits Jun 22, 2018
add "systemd-bless-boot-generator"
This generator automatically pulls in "systemd-bless-boot.service" if
a boot with boot counting is detected.
kernel-install: teach bot loader spec entry generator to generate ent…
…ries with "tries"

This makes two changes:

1. When called for "remove" any drop-ins with "+" suffix are removed
   too, so that the logic works for entries with boot counting enabled
   too and we don't lose track of configuration snippets created that
   way.

2. When called for "add" we optionally generate a "+" suffix, based on
   the data in /etc/kernel/tries if it exists.

   This basically means after "echo 5 > /etc/kernel/tries" any installed
   kernels will automatically set up for 5 boot tries before older
   kernels will be tried.
man: update kernel-install(8) documentation
Many general updates, but most importantly, document the
/etc/kernel/tries logic briefly.
units: add simple boot check unit
This is might be useful in some cases, but it's primarily an example for
a boot check service that can be plugged before boot-complete.target.

It's disabled by default.

All it does is check whether the failed unit count is zero
add new systemd-bless-boot.service that marks boots as successful
This is the counterpiece to the boot counting implemented in
systemd-boot: if a boot is detected as successful we mark drop the
counter again from the booted snippet or kernel image.

@poettering poettering force-pushed the poettering:sd-boot-count branch from a361b76 to 8c073dd Oct 19, 2018

@poettering

This comment has been minimized.

Copy link
Member Author

commented Oct 19, 2018

OK, I rebased it on top of current git and pushed it, so that it applies again. I made all suggested changes. I am taking the liberty to put the green label on it, given that the new changes are only about typo fixes. Thanks for the review!

@poettering poettering merged commit a2689fa into systemd:master Oct 19, 2018

2 of 6 checks passed

LGTM analysis: C/C++ Running analyses for revisions
Details
bionic-amd64 autopkgtest running
Details
bionic-i386 autopkgtest running
Details
bionic-s390x autopkgtest running
Details
LGTM analysis: JavaScript No code changes detected
Details
semaphoreci The build passed on Semaphore.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
8 participants
You can’t perform that action at this time.