Add a repo option to auto-prune other deployments (e.g. rollback) when starting upgrade #2670

Closed
cgwalters opened this issue Jul 8, 2022 · 23 comments · Fixed by #2847

@cgwalters
Member

In RHCOS we're running up against space constraints: https://bugzilla.redhat.com/show_bug.cgi?id=2104619

I think we should support something like:

[sysroot]
deployments-max=2

This would tell ostree to auto-prune the rollback deployment (and others) when starting an upgrade.

@travier
Member

travier commented Jul 8, 2022

This would also help with an often-requested feature: some folks want to keep more than 2 deployments by default.

@jmarrero jmarrero self-assigned this Jul 8, 2022
@dbnicholson
Member

Isn't that already the logic for sysroot cleanup, or are you basically describing #2510? Or do you mean you don't want the temporary ballooning to 3 deployments prior to rebooting into the new deployment? It would be nice to have a config option for the number of deployments to keep, though.

@jmarrero
Member

jmarrero commented Jul 8, 2022

I think the intent is to allow the user to set the number of deployments to keep. So if it's set to 1, you would still temporarily balloon to 2 prior to rebooting.

@cgwalters
Member Author

Or do you mean you don't want the temporary ballooning to 3 deployments prior to rebooting into the new deployment?

Yep, this.

@travier
Member

travier commented Jul 8, 2022

If I read this correctly, the number could never be less than 2.

@jmarrero
Member

jmarrero commented Jul 8, 2022

Ahh, I guess that makes sense. But that means the minimum number is 2, right?

@dbnicholson
Member

Bikeshedding, but I think I'd expect the maximum deployments to be the non-ballooned value. The ballooned number of deployments is a temporary implementation detail. As a user it would bug me that I said deployments-max=2 but there was only 1 deployment under normal circumstances.

Also, when you're in the ballooned situation, one of the deployments is shown as staged. I think it would be reasonable to interpret deployments-max as the maximum number of non-staged deployments on disk.

In other words, I think the deployments-max minimum can be 1 and that means you'll normally have 1 deployment on disk and temporarily 2 during an upgrade.

@travier
Member

travier commented Jul 8, 2022

I think the idea is that 2 means we keep 2 deployments on disk, and when performing an upgrade, just before deploying the new one, we remove the old one so that disk usage never exceeds 2 deployments' worth.

@dbnicholson
Member

Right, I thought of that after I walked away.

They're actually 2 orthogonal concepts to me. Specifying the maximum number of deployments allows you to say you want no rollback deployment or more than 1 rollback deployment. Saying that you want to delete a rollback deployment before upgrading so that the number of deployments is strictly capped is slightly different.

For example, does deployments-max=2 mean that you should pre-delete the rollback deployment, or that you want to allow ballooning to 2 deployments and only have 1 under normal circumstances?

So, I really think there are 2 knobs you want (sketched below):

  • deployments-max - The number of active + rollback deployments to keep under normal circumstances.
  • pre-delete-rollback (can't think of a better name at the moment) - Whether to delete a rollback deployment prior to pulling a new deployment.
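
For illustration, here's a hedged sketch of how those two knobs might look in the sysroot config, using the names proposed above (neither key exists in ostree today, and pre-delete-rollback is just a placeholder name):

[sysroot]
# Number of active + rollback deployments to keep under normal circumstances.
deployments-max=2
# Delete the rollback before pulling/deploying an update, capping peak disk usage.
pre-delete-rollback=true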

I'd say for the RHCOS bug you want the second knob. I.e., don't worry about changing the number of deployments right now, but allow systems to opt in to aggressively pruning the rollback deployment before upgrading to keep disk space constrained.

@cgwalters
Member Author

Yeah fair. I think they're strongly related, but yes, viewing them orthogonally makes sense too.

Perhaps we start with just prune-rollback-on-upgrade=true.

But... there are corner cases here. Specifically: what happens if there are more than 2 deployments (do we only remove 1?)? This is really the grey area between deployments-max and prune-rollback-on-upgrade=true.

(Also, another corner case is "what happens if the rollback is pinned?", but I think we should probably silently have the pin win.)

@dbnicholson
Member

Good points. I think if there are pinned deployments, they should just be ignored for the purposes of pruning. If there are more than 2 non-pinned deployments, I think they should all be removed. Basically, the same thing ostree_sysroot_cleanup would do after finalizing a new deployment. But I don't have all the logic in front of me, so consider that pretty handwavey.
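
As a rough illustration of that policy, a minimal C sketch over a simplified deployment list (the types and function name are hypothetical, not the actual libostree API):

#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for an entry in the deployment list. */
typedef struct {
  bool booted;  /* the deployment we are currently running from */
  bool pinned;  /* pinned deployments are never pruned */
} deployment_t;

/* Mark every deployment for pruning except the booted one and any pinned
 * ones (the new deployment has not been written yet at this point).
 * Returns the number of deployments marked. */
static size_t
select_deployments_to_prune (const deployment_t *deployments, size_t n,
                             bool *prune_out /* length n */)
{
  size_t pruned = 0;
  for (size_t i = 0; i < n; i++)
    {
      prune_out[i] = !deployments[i].booted && !deployments[i].pinned;
      if (prune_out[i])
        pruned++;
    }
  return pruned;
}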

@jlebon
Member

jlebon commented Jul 8, 2022

One risk with this approach worth highlighting is that a regression in the upgrade path where something fails after the rollback cleanup could leave the host with no means of going back to a deployment with working upgrade code. (E.g. failing to merge /etc, or copy binaries into /boot, or update the bootloader.)

@jmarrero
Member

jmarrero commented Jul 8, 2022

How so? If the current deployment works, the one you are executing the upgrade from is kept "safe" from deletion.

@travier
Member

travier commented Jul 11, 2022

If you update from A to a broken code base B that is not capable of completing updates past the cleanup stage, then once B removes A to make room for C and the update fails, you are stuck on B with no rollback option.

@jlebon
Member

jlebon commented Jul 11, 2022

Right. This is why we have upgrade tests. Container Linux was especially susceptible to this with its A/B partition update scheme. If an update bug happened before the secondary partition was nuked, you could just roll back. But if it happened after, you'd be stuck (see e.g. coreos/bugs#2457 (comment)).

libostree is better in this regard because it only cleans up the rollback after most fallible operations are done. This change would re-introduce some of that fallibility. The tradeoff might be worth it, though, in scenarios where reprovisioning is easier, such as clusters.

@cgwalters
Member Author

OK, I'd like to propose that this option be:

[sysroot]
auto-cleanup=space-limited

Basically we only do the cleanup if doing so would allow us to install a kernel/initramfs when we otherwise couldn't.

@dustymabe
Contributor

Basically we only do the cleanup if doing so would allow us to install a kernel/initramfs when we otherwise couldn't.

This sounds ideal... Almost like it should be the default, though? We only take this (slightly more risky) code path if we couldn't succeed otherwise (not enough space).

@dustymabe
Contributor

Another thing we could do to mitigate risk is to move the old kernel/initrd to a different filesystem (tmpfs or any kind of tmp) rather than deleting it. Upon failure we could attempt to restore the original files.

I think it would be nice to make progress on this sooner rather than later, as it appears the compression mitigation might not be enough for ppc64le in FCOS.

@cgwalters cgwalters self-assigned this Oct 3, 2022
@cgwalters
Member Author

I briefly looked at this; it's quite messy due to the internal design of trying to do a "transactional swap" of the deployments - we end up needing something like a "pre-pass". Or maybe the higher-level code can pass down a separate "list of deployments to keep if you can". Needs a bit of thought/design.

@jmarrero jmarrero removed their assignment Oct 5, 2022
@ericcurtin
Collaborator

Just curious, @cgwalters @jmarrero: what are the options today for the use case where you just want to store 2 or 3 rollbacks, etc.?

@jlebon
Member

jlebon commented Feb 15, 2023

Is the complexity in handling this arising from trying to handle ENOSPC at the last minute and unwinding? Another approach that should be more tractable is doing space calculations up front and engaging the auto-prune behaviour if (new kernel + new initrd) - (old kernel + old initrd) > space remaining. That's along the lines of a "pre-pass" as suggested by @cgwalters above.
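
A minimal C sketch of that up-front check, mirroring the rough heuristic stated above (the sizes would come from the kernel/initrd of the old and new deployments and the free space on the bootfs; the function name is hypothetical, not a libostree API):

#include <stdbool.h>
#include <stdint.h>

/* True if the growth in boot artifacts exceeds the space remaining on the
 * bootfs, i.e. the upgrade can't proceed without pruning first. */
static bool
should_engage_auto_prune (uint64_t new_kernel, uint64_t new_initrd,
                          uint64_t old_kernel, uint64_t old_initrd,
                          uint64_t space_remaining)
{
  uint64_t new_total = new_kernel + new_initrd;
  uint64_t old_total = old_kernel + old_initrd;

  /* No growth: the update fits in the space the old artifacts already use. */
  if (new_total <= old_total)
    return false;

  return (new_total - old_total) > space_remaining;
}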

@cgwalters
Member Author

Yeah, agree that's easier.

@dustymabe
Contributor

dustymabe commented Feb 23, 2023

I just hit a similar problem here in rawhide on an aarch64 machine. I was running sudo rpm-ostree override replace --reboot https://bodhi.fedoraproject.org/updates/FEDORA-2023-22011eaa7c to test a kernel (this one happened to be a debug kernel, so it's larger than usual).

After reboot I see the update didn't apply and:

[core@cosa-devsh ~]$ systemctl --failed
  UNIT                         LOAD   ACTIVE SUB    DESCRIPTION         
● ostree-boot-complete.service loaded failed failed OSTree Complete Boot

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
[core@cosa-devsh ~]$ 
[core@cosa-devsh ~]$ journalctl -b0 -u ostree-boot-complete.service 
Feb 23 19:32:31 localhost systemd[1]: Starting ostree-boot-complete.service - OSTree Complete Boot...
Feb 23 19:32:31 localhost ostree[838]: error: ostree-finalize-staged.service failed on previous boot: Installing kernel: Copying sun50i-a64-amarula-relic.dtb: regfile copy: No space left on device
Feb 23 19:32:31 localhost systemd[1]: ostree-boot-complete.service: Main process exited, code=exited, status=1/FAILURE
Feb 23 19:32:31 localhost systemd[1]: ostree-boot-complete.service: Failed with result 'exit-code'.
Feb 23 19:32:31 localhost systemd[1]: Failed to start ostree-boot-complete.service - OSTree Complete Boot.
[core@cosa-devsh ~]$ df -kh /boot/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3       350M  346M     0 100% /boot


@jlebon jlebon assigned jlebon and unassigned cgwalters Mar 27, 2023
@jlebon jlebon added the jira label Mar 27, 2023
jlebon added a commit to jlebon/ostree that referenced this issue May 1, 2023
During the early design of FCOS and RHCOS, we chose a value of 384M
for the boot partition. This turned out to be too small: some arches
other than x86_64 have larger initrds, kernel binaries, or additional
artifacts (like device tree blobs). We'll likely bump the boot partition
size in the future, but we don't want to abandon all the nodes deployed
with the current size.[[1]]

Because stale entries in `/boot` are cleaned up after new entries are
written, there is a window in the update process during which the bootfs
temporarily must host all the `(kernel, initrd)` pairs for the union of
current and new deployments.

This patch determines if the bootfs is capable of holding all the
pairs. If it can't but it could hold all the pairs from just the new
deployments, the outgoing deployments (e.g. rollbacks) are deleted
*before* new deployments are written. This is done by updating the
bootloader in two steps to maintain atomicity.

Since this is a lot of new logic in an important section of the
code, this feature is gated for now behind an environment variable
(`OSTREE_ENABLE_AUTO_EARLY_PRUNE`). Once we gain more experience with
it, we can consider turning it on by default.

This strategy increases the fallibility of the update system since one
would no longer be able to rollback to the previous deployment if a bug
is present in the bootloader update logic after auto-pruning (see [[2]]
and following). This is however mitigated by the fact that the heuristic
is opportunistic: the rollback is pruned *only if* it's the only way for
the system to update.

[1]: coreos/fedora-coreos-tracker#1247
[2]: ostreedev#2670 (comment)

Closes: ostreedev#2670
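
To summarize the gating logic described in the commit message, a hedged C sketch of the three-way decision (the names and the exact space accounting are simplified and illustrative; the real logic lives in libostree's deploy code and measures the actual (kernel, initrd) pairs):

#include <stdbool.h>
#include <stdint.h>

typedef enum
{
  BOOTFS_PLAN_NORMAL,      /* everything fits; keep the rollback until after finalization */
  BOOTFS_PLAN_EARLY_PRUNE, /* prune outgoing deployments before writing new boot entries */
  BOOTFS_PLAN_ENOSPC       /* not enough space even after pruning; fail the update */
} bootfs_plan_t;

static bootfs_plan_t
plan_bootfs_update (uint64_t union_pairs_bytes,    /* (kernel, initrd) pairs of current + new deployments */
                    uint64_t new_only_pairs_bytes, /* (kernel, initrd) pairs of the new deployments only */
                    uint64_t bootfs_budget_bytes,  /* space on the bootfs usable for boot artifacts */
                    bool auto_early_prune_enabled) /* e.g. OSTREE_ENABLE_AUTO_EARLY_PRUNE is set */
{
  if (union_pairs_bytes <= bootfs_budget_bytes)
    return BOOTFS_PLAN_NORMAL;
  if (auto_early_prune_enabled && new_only_pairs_bytes <= bootfs_budget_bytes)
    return BOOTFS_PLAN_EARLY_PRUNE;
  return BOOTFS_PLAN_ENOSPC;
}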