ostree rollbacks with alternate systemd boot target (with read-only composefs /etc). How do we detect rollbacks reliably? #3115

Closed
ericcurtin opened this issue Dec 7, 2023 · 33 comments
Labels
area/greenboot Issues related to greenboot integration reward/medium Fixing this will be notably useful triaged This issue has been evaluated and is valid

Comments

@ericcurtin
Collaborator

ericcurtin commented Dec 7, 2023

We have a feature request: when a rollback is triggered, instead of just rolling back, we want to roll back and boot with an alternate systemd target. This may not have to be directly integrated into the ostree project.

There are typically two ways of doing this: setting the systemd.unit= karg, or setting up a default.target symlink in one of:

/usr/lib/systemd/system/
/run/systemd/system/
/etc/systemd/system/

directories.

By setting up this default.target symlink in early boot, we think we can achieve this.

But in order to do this in early boot, we need a reliable (and ideally bootloader-agnostic, since we are not using grub) way of detecting the rollback programmatically. What we are unsure of is how to detect this reliably in early boot.

Tagging @alexlarsson for visibility

@ericcurtin
Collaborator Author

ericcurtin commented Jan 3, 2024

So the systemd side of this is not so tricky:

#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Rollback to rescue
DefaultDependencies=no
OnFailure=emergency.target
OnFailureJobMode=replace-irreversibly
After=initrd-root-fs.target initrd-fs.target initrd.target

[Service]
Type=oneshot
ExecStart=systemctl --no-block isolate rescue.target

[Install]
WantedBy=sysinit.target

I tested this and it seems to work fine. We just need a reliable detector of whether we have rolled back, so the unit can be made conditional.

@ericcurtin
Collaborator Author

ericcurtin commented Jan 3, 2024

My initial thoughts are to parse:

rpm-ostree status --json

with either a python json parser or jq, and if it's not the head version assume it's a rollback.
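A minimal Python sketch of that check, assuming the status JSON has a top-level `deployments` array whose entries carry boolean `booted` and `staged` fields. The function names are hypothetical, and skipping staged deployments is a design choice of this sketch rather than something stated in the thread:

```python
import json


def booted_index(status_json: str) -> int:
    """Return the position of the booted deployment, ignoring staged ones."""
    deployments = json.loads(status_json)["deployments"]
    # A staged deployment only takes effect on the next boot, so it cannot
    # be the one we are running now (assumption of this sketch).
    candidates = [d for d in deployments if not d.get("staged", False)]
    for index, deployment in enumerate(candidates):
        if deployment.get("booted", False):
            return index
    raise LookupError("no booted deployment in status output")


def looks_like_rollback(status_json: str) -> bool:
    """Heuristic from the comment: not the head deployment means rollback."""
    return booted_index(status_json) != 0
```

The input would come from something like `subprocess.run(["rpm-ostree", "status", "--json"], capture_output=True, text=True).stdout`, which is exactly the process spawn that is a concern for boot time.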

@ericcurtin
Collaborator Author

ericcurtin commented Jan 3, 2024

So what I am thinking of doing:

/ostree/deploy
/ostree/root.a -> deploy/centos/deploy/76b0919b393aa2254dd4b72950514422bb213992c3885f2196ec9cb278c57c5a.0
/ostree/root.b -> deploy/centos/deploy/some-sha.0
/ostree/root.expected -> root.a

/ostree/root.expected is a new symlink I will add.

If root.expected doesn't match the androidboot.slot_suffix we can tell we rolled back.

I'd like to use something more generic baked into rpm-ostree/libostree, but the CLIs etc. are too slow for this check, this really has to be a handful of milliseconds thing.

This will only work on this platform but at present it's just one Automotive company requesting this.
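The symlink comparison could look roughly like the following Python sketch. It assumes the slot suffix arrives on the kernel command line as `androidboot.slot_suffix=_a` or `_b` and that suffix `_a` corresponds to `root.a`; that mapping and both function names are illustrative assumptions, not confirmed behaviour of aboot-deploy:

```python
import os


def slot_from_cmdline(cmdline: str) -> str:
    """Extract the slot letter from an androidboot.slot_suffix=_a style argument."""
    for arg in cmdline.split():
        key, _, value = arg.partition("=")
        if key == "androidboot.slot_suffix":
            return value.lstrip("_")
    raise LookupError("androidboot.slot_suffix not found on the kernel command line")


def rolled_back(ostree_dir: str, cmdline: str) -> bool:
    """True when the booted slot differs from the one root.expected points at."""
    # root.expected is the proposed new symlink, e.g. root.expected -> root.a
    expected = os.readlink(os.path.join(ostree_dir, "root.expected"))
    booted = "root." + slot_from_cmdline(cmdline)
    return os.path.basename(expected) != booted
```

On a real system this would be called as `rolled_back("/ostree", open("/proc/cmdline").read())`. It needs one `readlink` and one small file read, which is why it is attractive for a boot-time budget measured in milliseconds.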

@jlebon
Member

jlebon commented Jan 3, 2024

> We have a feature request that when a rollback is triggered

It might be worth clarifying what rollback means in this context. rpm-ostree rollback? Selecting a previous entry in the boot menu?

Also, definitely worth looking at https://github.com/fedora-iot/greenboot/ if you haven't already.

@cgwalters
Member

> ExecStart=systemctl --no-block isolate rescue.target

(I think it's better instead to change the default target from a generator...i.e. the rollback detection happens in a generator. It's a bit cleaner.)

@cgwalters
Member

> My initial thoughts are to parse:
> rpm-ostree status --json
> with either a python json parser or jq, and if it's not the head version assume it's a rollback.

This topic heavily relates to #3032 - if we start logging metadata as to whether a deployment was ever booted, we can key off that more reliably.

Even without metadata today we could add ostree admin status --is-rollback that would just return that one bit of information. But I think in the general case we do want to be able to correctly distinguish the cases:

  • We're booted into the latest deployment

Otherwise:

  • We're in rollback, new deployment never reached the initramfs (where we'd record the booted flag)
  • We're in rollback, got as far as initramfs but it didn't pass a health check...this is where things start to overlap/intersect with https://www.freedesktop.org/software/systemd/man/latest/systemd-bless-boot.service.html
  • We're in rollback, but the new deployment apparently also booted successfully - this can happen when the user picks something manually in the bootloader menu

(Of course all of this deeply ties into having a JSON or other API for ostree, like both bootc and rpm-ostree have)
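The four cases above can be written down as a small decision function. This is only a sketch: the three boolean inputs are hypothetical, since (as noted) ostree does not record "was ever booted" or health metadata today, and the enum names are made up for illustration:

```python
from enum import Enum


class BootState(Enum):
    LATEST = "booted into the latest deployment"
    ROLLBACK_NEVER_BOOTED = "rollback, latest never reached the initramfs"
    ROLLBACK_UNHEALTHY = "rollback, latest booted but failed its health check"
    ROLLBACK_MANUAL = "rollback, latest also booted successfully"


def classify(booted_is_latest: bool, latest_was_booted: bool,
             latest_was_healthy: bool) -> BootState:
    """Map the (hypothetical) recorded flags onto the four cases listed above."""
    if booted_is_latest:
        return BootState.LATEST
    if not latest_was_booted:
        return BootState.ROLLBACK_NEVER_BOOTED
    if not latest_was_healthy:
        return BootState.ROLLBACK_UNHEALTHY
    # The newer deployment works too, so the older one was probably picked by hand.
    return BootState.ROLLBACK_MANUAL
```

A single `--is-rollback` bit collapses the last three values into one, which is enough for the rescue-target use case but not for distinguishing a manual menu selection from a failed update.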

@cgwalters
Member

> I'd like to use something more generic baked into rpm-ostree/libostree, but the CLIs etc. are too slow for this check, this really has to be a handful of milliseconds thing.

I am not sure how an ostree CLI invocation would be a slow point of this flow; I can't imagine /bin/ostree is significantly slower than ExecStart=/bin/systemctl etc.

@cgwalters cgwalters added triaged This issue has been evaluated and is valid reward/medium Fixing this will be notably useful area/greenboot Issues related to greenboot integration labels Jan 3, 2024
@ericcurtin
Collaborator Author

> We have a feature request that when a rollback is triggered

> It might be worth clarifying what rollback means in this context. rpm-ostree rollback? Selecting a previous entry in the boot menu?

> Also, definitely worth looking at https://github.com/fedora-iot/greenboot/ if you haven't already.

@jlebon a rollback in this context means we updated, but we haven't booted into that version because it's unhealthy for some reason. We are using greenboot to at least mark a slot as healthy. This user wants to roll back with a caveat: in this case they want to boot into an alternate systemd boot target.

@cgwalters
Member

> /ostree/root.expected is a new symlink I will add.

I'd really like to not add things that are specific just to androidboot or other bootloaders; I think a key goal of ostree here is to abstract over these things. We definitely want functionality similar to this across many OS variants and footprints.

ostree admin status --print-state, or something that behaves similarly to systemctl is-system-running to start, seems like it should suffice without getting us into the big general problem of an extensible ostree API and without adding new "APIs" that are filesystem values.

@ericcurtin
Collaborator Author

> ExecStart=systemctl --no-block isolate rescue.target

> (I think it's better instead to change the default target from a generator...i.e. the rollback detection happens in a generator. It's a bit cleaner.)

@cgwalters we can do that in a generator (it will be the first generator I have written 😄), but is /etc writable at that point with an overlay (this thing will have a read-only /etc)?

Note the systemctl --no-block isolate technique comes from initrd-cleanup.service, it's not completely new.

@ericcurtin
Collaborator Author

ericcurtin commented Jan 3, 2024

> I'd like to use something more generic baked into rpm-ostree/libostree, but the CLIs etc. are too slow for this check, this really has to be a handful of milliseconds thing.

> I am not sure how an ostree CLI invocation would be a slow point of this flow; I can't imagine /bin/ostree is significantly slower than ExecStart=/bin/systemctl etc.

The CLIs are not slow for normal usage, but when you want to use them in the boot sequence and the aim is a 2-second boot, some of the options are not fast enough. The JSON output, for example, is parsable but too slow to produce.

rpm-ostree status is ~150ms on my x86 machine here on Silverblue and ostree admin status is ~350ms

That's not to say we couldn't make ostree admin status --is-rollback, etc. super-fast

How about this for an idea, instead of my previous proposal, we do a:

/ostree/deploy
/ostree/root.expected -> deploy/centos/deploy/76b0919b393aa2254dd4b72950514422bb213992c3885f2196ec9cb278c57c5a.0

symlink on each deployment, that way we have a generic fast thing to check?

@ericcurtin
Collaborator Author

One of the things I like about not using a generator, is I can keep this stuff out of the initrd/initoverlayfs (although initoverlayfs doesn't really have a boot cost), I like to do stuff in the rootfs if at all possible.

@cgwalters
Member

Generators don't write to /etc, they write to /run.

@ericcurtin
Collaborator Author

ericcurtin commented Jan 4, 2024

Taking a quick read around, this is the precedence order:

/usr/lib/systemd/system/
/run/systemd/system/
/etc/systemd/system/

The /etc one takes precedence over /run, and /run takes precedence over /usr.

Given the symlinks are set up like this by default:

$ echo "initrd:"; lsinitrd | grep default.target; echo -e "\nrootfs"; ls -ltr /etc/systemd/system/default.target /usr/lib/systemd/system/default.target
initrd:
lrwxrwxrwx   1 root     root           37 Mar 14  2023 etc/systemd/system/default.target -> /usr/lib/systemd/system/initrd.target
lrwxrwxrwx   1 root     root           13 Mar 14  2023 usr/lib/systemd/system/default.target -> initrd.target

rootfs
lrwxrwxrwx. 5 root root 16 Jan  3 15:05 /usr/lib/systemd/system/default.target -> graphical.target
lrwxrwxrwx. 1 root root 40 Jan  3 15:09 /etc/systemd/system/default.target -> /usr/lib/systemd/system/graphical.target

Even if we generated the symlink in /run, wouldn't the /etc one take precedence anyway, making the /run symlink redundant?

Of course we could just remove the /etc symlinks altogether to resolve this.
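To make the precedence question concrete, here is a small Python sketch that reports which `default.target` symlink would win among the three directories discussed above. It covers only that simplified subset of systemd's unit search path (generator directories are left out), and the `root` parameter and function name exist only so the logic can be tried against a scratch directory:

```python
import os

# Highest precedence first; a simplified subset of systemd's system unit search path.
SEARCH_PATH = ("etc/systemd/system", "run/systemd/system", "usr/lib/systemd/system")


def effective_default_target(root: str):
    """Return (directory, link target) for the default.target that wins, or None."""
    for directory in SEARCH_PATH:
        link = os.path.join(root, directory, "default.target")
        if os.path.islink(link):
            return directory, os.readlink(link)
    return None
```

With the layout shown in the listing above, `effective_default_target("/")` would report the /etc symlink, so a symlink written under /run/systemd/system would indeed be shadowed unless the /etc one is removed.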

@ericcurtin
Collaborator Author

ericcurtin commented Jan 4, 2024

But to take this in a step by step process, I'm gonna look into creating a:

/ostree/root.expected -> deploy/centos/deploy/76b0919b393aa2254dd4b72950514422bb213992c3885f2196ec9cb278c57c5a.0

type of symlink on all deploys as a quick check to see if we have rolled back or not in future boots. If this sounds ok?

After this I guess look into:

ostree-prepare-root is-rollback

as step 2. That leaves us free to run the check in the initrd, initoverlayfs, or rootfs, and ostree-prepare-root already knows the small difference between an aboot system and a UEFI system when working out what we are about to boot into.

Then step 3 is to write the generator (or equivalent solution).

Step 4 is to make it configurable, I guess? Or maybe even just ship the generator/systemd service file as a separate rpm to solve that problem.

@cgwalters
Member

I don't understand what value root.expected provides here - the "source of truth" for ostree is the bootloader entries, really everything is oriented around that. So root.expected is just "deployment 0" right?

@ericcurtin
Collaborator Author

Yeah it's just "deployment 0".

I could parse BLS files. If there's another fast way of detecting the last deployed version of the operating system, I'm very open to ideas.

@ericcurtin
Collaborator Author

ericcurtin commented Jan 4, 2024

So in this case:

ostree-prepare-root is-rollback

BLS parsing functionality must be added to this binary, in order to detect in early boot whether we are booting deployment 0 or not.
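As an illustration of what that BLS parsing would involve, here is a Python sketch (the real code would be C inside ostree-prepare-root). It assumes each entry has a numeric `version` key and an `options` key containing an `ostree=` argument, and that the entry with the highest `version` is "deployment 0"; that ordering rule and all function names here are assumptions of the sketch:

```python
def parse_bls(text: str) -> dict:
    """Parse one BLS entry into a key -> value dict (the last duplicate key wins)."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        entry[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return entry


def ostree_karg(options: str):
    """Return the value of the ostree= argument, or None when it is absent."""
    for arg in options.split():
        if arg.startswith("ostree="):
            return arg[len("ostree="):]
    return None


def booted_is_default(entries: list, cmdline: str) -> bool:
    """Compare the booted ostree= karg against the highest-version BLS entry."""
    parsed = [parse_bls(text) for text in entries]
    default = max(parsed, key=lambda e: int(e.get("version", "0")))
    return ostree_karg(default.get("options", "")) == ostree_karg(cmdline)
```

Here `entries` would be the contents of the BLS files and `cmdline` the contents of /proc/cmdline. Reading the entries is the awkward part, because it requires /boot to be available at that point in boot.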

@cgwalters
Member

Oh sorry, you want to parse this stuff from the initramfs. OK yes, all that code today is in libostree, which we aren't linking to from the initramfs code, because among other things it would drag in HTTP libraries.

Also to do this, we'd have to mount the /boot filesystem. Mmmmmm...messy. I see why you are arguing for a symlink. The other alternative is for us to inject a kernel argument like ostree.default=1 (or perhaps even better, change the name of our ostree= karg in this case because the linux kernel warns about it today, so we'd generate ostree.default=/path/to/root and everything else is ostree=)

My main concern here is just to keep the BLS entries as source of truth for system state, and a symlink is extra persistent state.

> even if we generated the symlink in /run wouldn't the /etc one take precedence anyway, making the /run symlink redundant? Of course we could just remove the /etc symlinks altogether to resolve this.

Right, in an image-based world it is definitely better to have the default target be in /usr, and /etc should mostly be local overrides.

That said, it is possible for generators to override; from man systemd.generator:

    2. early-dir
      In normal use this is /run/systemd/generator.early in case of the system generators and $XDG_RUNTIME_DIR/systemd/generator.early in case of the user generators. Unit files placed in this directory override unit files in /usr/, /run/ and /etc/.
      This means that unit files placed in this directory take precedence over all normal configuration, both vendor and user/administrator.
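A generator that uses the early-dir for this could be sketched in Python as below. systemd passes generators three directory arguments (normal-dir, early-dir, late-dir, in that order); how the rollback is detected is deliberately left as a `rolled_back` parameter, and the function names and the choice of `rescue.target` are assumptions for illustration:

```python
import os

UNIT_DIR = "/usr/lib/systemd/system"


def override_default_target(early_dir: str, target: str = "rescue.target") -> str:
    """Point default.target in the generator early-dir at the given target."""
    os.makedirs(early_dir, exist_ok=True)
    link = os.path.join(early_dir, "default.target")
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(os.path.join(UNIT_DIR, target), link)
    return link


def run_generator(argv: list, rolled_back: bool) -> int:
    """argv mirrors a generator invocation: [program, normal-dir, early-dir, late-dir]."""
    if len(argv) != 4:
        return 1
    early_dir = argv[2]
    if rolled_back:
        # Units in early-dir override /etc, /run and /usr, so no /etc write is needed.
        override_default_target(early_dir)
    return 0
```

A real generator would end with something like `sys.exit(run_generator(sys.argv, detect_rollback()))`, where `detect_rollback` is whatever fast check ends up being available. Because the symlink lands under /run, this works with a read-only /etc.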

@cgwalters
Member

But going back to the generator point...I think given that we can change the default target via a generator in the real root, there's no need to do this in ostree-prepare-root, right? And so we have the full flexibility and software of the real root to do things here.

@ericcurtin
Collaborator Author

> Oh sorry, you want to parse this stuff from the initramfs. OK yes, all that code today is in libostree, which we aren't linking to from the initramfs code, because among other things it would drag in HTTP libraries.

I am open to doing this from the rootfs (I would have preferred this because rootfs allows you to use more generic ostree stuff). But if you do it in a generator, you are forced to do this in initrd, because the generators run so early?

> Also to do this, we'd have to mount the /boot filesystem. Mmmmmm...messy. I see why you are arguing for a symlink. The other alternative is for us to inject a kernel argument like ostree.default=1 (or perhaps even better, change the name of our ostree= karg in this case because the linux kernel warns about it today, so we'd generate ostree.default=/path/to/root and everything else is ostree=)

I don't fully understand this approach, only the client knows what the latest deployed version is. If you are using Android Boot Image or UKI the client side cannot manipulate any karg, because it breaks the signature. I may be misinterpreting this approach though.

An initrd can't be altered client-side either, for the same reason.

> My main concern here is just to keep the BLS entries as source of truth for system state, and a symlink is extra persistent state.

I've never written a generator before so again, apologies if I'm misinterpreting the use for them. They kinda seem inflexible because they run so early. And I don't fully understand why we need the generator dynamic functionality, because you can do the same thing in a simple systemd service file but you have the choice of putting it anywhere in the boot sequence.

> even if we generated the symlink in /run wouldn't the /etc one take precedence anyway, making the /run symlink redundant? Of course we could just remove the /etc symlinks altogether to resolve this.

> Right, in an image-based world it is definitely better to have the default target be in /usr, and /etc should mostly be local overrides.

Yeah this makes sense to me, the /etc symlinks should probably go away by default in an image-based world. I think this is just something inherited by the ostree family of distros, because these default.target symlinks are in /etc in non-ostree distros.

> That said, it is possible for generators to override; from man systemd.generator:

>     2. early-dir
>       In normal use this is /run/systemd/generator.early in case of the system generators and $XDG_RUNTIME_DIR/systemd/generator.early in case of the user generators. Unit files placed in this directory override unit files in /usr/, /run/ and /etc/.
>       This means that unit files placed in this directory take precedence over all normal configuration, both vendor and user/administrator.

@cgwalters
Member

> because these default.target symlinks are in /etc in non-ostree distros.

I think that's mainly because Anaconda always writes it, which it should stop doing.

@cgwalters
Member

> I've never written a generator before so again, apologies if I'm misinterpreting the use for them. They kinda seem inflexible because they run so early. And I don't fully understand why we need the generator dynamic functionality, because you can do the same thing in a simple systemd service file but you have the choice of putting it anywhere in the boot sequence.

Yes, I believe your unit will generally work fine. The main downside is that some units may still be launched from the default target that wouldn't be if we'd explicitly started the desired target from very early on.

Anyway, I think we're iterating towards just having a raw ostree API like ostree admin status --query-booted or so that would output default or not-default to start, and then other units could make a decision based on that?

Actually, to simplify this even more, we can change ostree-system-generator to write /run/ostree/booted-not-default if it detects this situation, then your unit can just do ConditionPathExists=/run/ostree/booted-not-default or so?

@ericcurtin
Collaborator Author

> I've never written a generator before so again, apologies if I'm misinterpreting the use for them. They kinda seem inflexible because they run so early. And I don't fully understand why we need the generator dynamic functionality, because you can do the same thing in a simple systemd service file but you have the choice of putting it anywhere in the boot sequence.

> Yes, I believe your unit will generally work fine. The main downside is that some units may still be launched from the default target that wouldn't be if we'd explicitly started the desired target from very early on.

> Anyway, I think we're iterating towards just having a raw ostree API like ostree admin status --query-booted or so that would output default or not-default to start, and then other units could make a decision based on that?

> Actually, to simplify this even more, we can change ostree-system-generator to write /run/ostree/booted-not-default if it detects this situation, then your unit can just do ConditionPathExists=/run/ostree/booted-not-default or so?

This means ostree-system-generator has to call ostree admin status --query-booted to do the query, which means pulling the world into initrd (which we may be able to do without cost with initoverlayfs, not 100% sure yet, it's not ready yet). We can't really afford to pull the world in, at least in Automotive because of tight boot time metrics. And you have to mount /boot as you said.

Anyway I'll start trying to take a stab at ostree admin status --query-booted thanks for the feedback @cgwalters @jlebon

@ericcurtin
Collaborator Author

If it's as slow as this command, it's not gonna fly in our boot stack, but maybe it's the parts we don't need in this command that are slow:

time sudo ostree admin status
  fedora 23a97c7c6a9c4ed500a41a1a46c807f0a4617d00866582a36f5873f406bb15ab.0 (staged)
    Version: 38.20240104.0
    origin refspec: fedora:fedora/38/x86_64/silverblue
    GPG: Signature made Thu 04 Jan 2024 00:55:29 using RSA key ID 809A8D7CEB10B464
    GPG: Good signature from "Fedora <fedora-38-primary@fedoraproject.org>"
* fedora ae8facf4c86ef40b54f2cae494a95d8349131d24aa6d4af6697d37a39fab50f0.0
    Version: 38.1.6
    origin refspec: fedora:fedora/38/x86_64/silverblue
    GPG: Signature made Thu 13 Apr 2023 20:07:30 using RSA key ID 809A8D7CEB10B464
    GPG: Good signature from "Fedora <fedora-38-primary@fedoraproject.org>"

real	0m0.613s
user	0m0.059s
sys	0m0.113s

@cgwalters
Member

cgwalters commented Jan 4, 2024

That's very expensive because it involves gpg verification, the bulk cost of which is parsing all the keys in /etc/pki/rpm-gpg. This has caused problems at other times.

We definitely don't need to do it for this case.

I also discovered a really longstanding bug when digging into this: #3131

And I just pushed a commit there; try out time ostree admin status -S.

@cgwalters
Member

Hmm in this I keep having to refresh my knowledge on androidboot...can you write up something in docs/bootloaders.md about this? Or we can collaborate in a hackmd or so.

I find myself a bit confused now as to what actually writes the /ostree/root.a|b (symlinks?).
Is it https://gitlab.com/CentOS/automotive/rpms/aboot-update/ ? But it doesn't seem to...

@ericcurtin
Collaborator Author

ericcurtin commented Jan 4, 2024

It's https://gitlab.com/CentOS/automotive/rpms/aboot-deploy, on the client side, that does this.

https://gitlab.com/CentOS/automotive/rpms/aboot-update creates the Android Boot Image on the repo/server side.

I will start a hackmd on this.

@ericcurtin
Collaborator Author

As requested @cgwalters:

https://hackmd.io/@7wAdmxHWRI6dhoDAwOtrpw/ByAeouBua

we could go into more detail if required.

The intent is to put this under the ## OSTree and grub section in docs/bootloaders.md.

@cgwalters
Member

> Actually, to simplify this even more, we can change ostree-system-generator to write /run/ostree/booted-not-default if it detects this situation, then your unit can just do ConditionPathExists=/run/ostree/booted-not-default or so?

We can't do this in the generator as /boot may not be mounted then. But we already have a unit which is pretty close to what we need: ostree-boot-complete.service that was added for informational purposes. I did #3133 which will help in running this code very early on - then we can have it write out the simple file in /run e.g. that is used as a condition for other logic.
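The "write a simple file in /run" half of that idea might look like the Python sketch below. The marker name is taken from the earlier comment, while the function name and the `run_dir` parameter are hypothetical; this is not the code from #3133:

```python
import os

MARKER = "ostree/booted-not-default"  # relative to /run; name from the earlier comment


def record_boot_state(run_dir: str, booted_is_default: bool) -> bool:
    """Create or clear the marker; return True when it is present afterwards."""
    path = os.path.join(run_dir, MARKER)
    if booted_is_default:
        if os.path.exists(path):
            os.remove(path)
        return False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w"):
        pass  # an empty file is enough for a ConditionPathExists= check
    return True
```

A unit such as the rescue one earlier in the thread could then carry `ConditionPathExists=/run/ostree/booted-not-default` and be ordered after whichever service writes the marker, so the expensive detection runs once and every consumer only pays for a path check.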

@cgwalters
Member

> The intent is to put this under the ## OSTree and grub section in docs/bootloaders.md.

Can you DM me a writable link to that hackmd?

@ericcurtin
Collaborator Author

So I created this as a reference rpm that can be included in an image build to turn on this feature.

https://github.com/ericcurtin/ostree-rollback-to-rescue

Maybe it's best to leave this as a reference rpm for people to base their own packages on. Or we could integrate it into the Fedora or CentOS Automotive repos directly somehow.

In Automotive for example, it's possible that someone creates their own target and UI for rescue mode.

We could regard this issue closed at this point.

@ericcurtin
Collaborator Author

On completion of this PR, we can close this issue: #3171
