
apply: Use staged deployment on booted systems #298

Merged
merged 3 commits into master from T5658-ostree-staged-deployment
Jan 13, 2022

Conversation

dbnicholson
Member

When finalizing an OSTree deployment, the current `/etc` is merged with
the new commit's `/usr/etc`. Any changes that happen in the current
`/etc` after the deployment has been finalized will not appear in the
new deployment. Since eos-updater is often run in the background, it's
likely the user will make changes in `/etc` (such as creating a new
user) long before the new deployment is booted into.

To address this issue, OSTree has provided the concept of a staged
deployment since 2018.5. The new deployment is initialized but not
finalized until shutdown via the `ostree-finalize-staged.service`
systemd unit. Since staged deployments only work on OSTree booted
systems that can initiate systemd units, this can't really work in the
current test suite. The old full deployment method is kept for that
case.

Note that staged deployment finalization depends on the
`ostree-finalize-staged.path` systemd unit being activated. Currently,
OSTree does this on demand but in the future it may require the OS to
explicitly activate the unit via a systemd preset or similar mechanism.

https://phabricator.endlessm.com/T5658
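
For illustration, a minimal sketch of how the two paths could be selected with
the OSTree API (not the actual eos-updater code; the helper name and variables
are placeholders):

#include <ostree.h>

/* Minimal sketch (not eos-updater's actual code): stage the deployment on a
 * booted system so finalization happens at shutdown, and fall back to the old
 * full deployment otherwise. Assumes `sysroot` is already loaded and that
 * `osname`, `revision` and `origin` describe the update being applied. */
static gboolean
apply_deployment (OstreeSysroot *sysroot,
                  const char    *osname,
                  const char    *revision,
                  GKeyFile      *origin,
                  GCancellable  *cancellable,
                  GError       **error)
{
  g_autoptr(OstreeDeployment) merge =
    ostree_sysroot_get_merge_deployment (sysroot, osname);
  g_autoptr(OstreeDeployment) new_deployment = NULL;

  if (ostree_sysroot_get_booted_deployment (sysroot) != NULL)
    {
      /* Booted: finalization is deferred to ostree-finalize-staged.service,
       * so /etc changes made after this point are still merged. */
      return ostree_sysroot_stage_tree (sysroot, osname, revision, origin,
                                        merge, NULL, &new_deployment,
                                        cancellable, error);
    }

  /* Not booted (e.g. the test suite): deploy and write out immediately. */
  if (!ostree_sysroot_deploy_tree (sysroot, osname, revision, origin,
                                   merge, NULL, &new_deployment,
                                   cancellable, error))
    return FALSE;

  return ostree_sysroot_simple_write_deployment (sysroot, osname,
                                                 new_deployment, merge,
                                                 OSTREE_SYSROOT_SIMPLE_WRITE_DEPLOYMENT_FLAGS_NONE,
                                                 cancellable, error);
}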

@dbnicholson dbnicholson marked this pull request as draft January 5, 2022 18:43
@dbnicholson
Member Author

I tested this in a VM and it works. However, one thing I noticed that's lost is the repo pruning. The deployment is now written out in the shutdown path and ostree makes the decision not to trigger pruning there. I think we want to add a call to ostree_sysroot_cleanup_prune_repo somewhere, but I don't know where yet. I'm going to see where they do this in rpm-ostree.

@dbnicholson dbnicholson force-pushed the T5658-ostree-staged-deployment branch 2 times, most recently from 8bcc0eb to 4e943ea Compare January 5, 2022 18:57
Member

@wjt wjt left a comment


LGTM, modulo the prune question.

@dbnicholson
Member Author

Hmm, CI hit a test timeout, which I'd also seen in OBS. There's currently a 360 second timeout for the slower tests and I routinely see the flatpak install test take nearly 300 seconds on my fast laptop. I think I'll add a commit to bump that to 600 seconds.

@dbnicholson
Member Author

More detail in phabricator, but for pruning it appears that rpm-ostree just does the normal cleanup once the deployment is staged. Since the rollback deployment hasn't been deleted at that point (it's deleted when finalizing), you don't get the disk space back immediately. That would happen the next time you apply an update (with the current code).

If we want to be more aggressive about reclaiming the disk space, we'd have to add another ostree_sysroot_cleanup call at an earlier stage. One thought I had was to do it when polling indicates no updates. But that could thrash the disk unnecessarily often.
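
For concreteness, a sketch of that second option (illustrative only, not something this PR adds): run the normal cleanup from the poll path when no update is available, so objects left behind by a previously deleted rollback deployment get pruned sooner.

/* Illustrative only (not part of this PR): prune opportunistically after a
 * poll that found no update, at the cost of extra disk activity. */
static gboolean
maybe_cleanup_after_poll (OstreeSysroot *sysroot,
                          gboolean       update_available,
                          GCancellable  *cancellable,
                          GError       **error)
{
  if (update_available)
    return TRUE;  /* cleanup will happen when the update is applied */

  return ostree_sysroot_cleanup (sysroot, cancellable, error);
}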

On our CI and package builders `test-update-install-flatpaks` often
exceeds the 360 second timeout. Even on my fast laptop it routinely
takes nearly 300 seconds. Bump the timeout for slow tests to 600 seconds
to ensure it has time to complete.

https://phabricator.endlessm.com/T5658
@wjt
Member

wjt commented Jan 5, 2022

An idea I haven't thought through:

  • eos-updater-cleanup.service, which (calls a D-Bus method on eos-updater which) runs ostree_sysroot_cleanup, and is disabled by default
  • eos-updater-trigger-cleanup.service, enabled with ConditionNeedsUpdate=/etc, which just runs systemctl enable --runtime eos-updater-cleanup.service

@dbnicholson dbnicholson force-pushed the T5658-ostree-staged-deployment branch from 4e943ea to a49c5e9 Compare January 5, 2022 21:57
@wjt
Member

wjt commented Jan 5, 2022

I'm sure the answer is "not easily" but can the Flatpak install test be made to take less than 300 seconds, perhaps with a crude implement such as eatmydata or running entirely on tmpfs?

@dbnicholson dbnicholson marked this pull request as ready for review January 5, 2022 21:57
@dbnicholson
Member Author

I'm sure the answer is "not easily" but can the Flatpak install test be made to take less than 300 seconds, perhaps with a crude implement such as eatmydata or running entirely on tmpfs.

I would like that since the test suite turnaround is slow for development. I need to figure out where it's taking so much time. I added the commit here which we can banter about. I also punted on the more aggressive pruning, which we can also discuss.

@dbnicholson
Member Author

An idea I haven't thought through:

  • eos-updater-cleanup.service, which (calls a D-Bus method on eos-updater which) runs ostree_sysroot_cleanup, and is disabled by default
  • eos-updater-trigger-cleanup.service, enabled with ConditionNeedsUpdate=/etc, which just runs systemctl enable --runtime eos-updater-cleanup.service

I like this. It could be even simpler since ostree admin cleanup is just a wrapper around ostree_sysroot_cleanup. Any reason not to have eos-updater-cleanup.service carry ConditionNeedsUpdate=/etc directly?

@wjt
Member

wjt commented Jan 5, 2022

My thinking was that we don't want it to block the boot process (cf. eos-split-flatpak-repos.service), so it wouldn't be triggered early enough. But perhaps it could be Type=simple so systemd doesn't wait for it to finish. Add a sprinkling of IOSchedulingClass=idle, ...

@dbnicholson
Member Author

My thinking was that we don't want it to block the boot process (cf. eos-split-flatpak-repos.service), so it wouldn't be triggered early enough. But perhaps it could be Type=simple so systemd doesn't wait for it to finish. Add a sprinkling of IOSchedulingClass=idle, ...

Ah, right. Especially since this has to run Before=systemd-update-done.service, so it definitely would block early boot.

@dbnicholson
Member Author

Another idea that's a little simpler and avoids ConditionNeedsUpdate is to use a separate stamp file. After all, eos-updater knows exactly when the system is updated. For instance, it could create /sysroot/.cleanup and then we would have a systemd unit like so:

[Unit]
ConditionPathExists=/sysroot/.cleanup

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/ostree admin cleanup
ExecStartPost=/bin/rm -f /sysroot/.cleanup

To keep it from blocking multi-user.target, you could use a path unit. Or we could just set DefaultDependencies=no so that multi-user.target doesn't automatically add After to it.

I actually think the above would be useful upstream as a complement to the existing ostree-finalize-staged.service.

@wjt
Member

wjt commented Jan 6, 2022

Perhaps it should put the stamp file into /etc in the new deployment? Otherwise, consider these steps:

  1. Boot into deployment A
  2. Stage deployment B, write stamp file
  3. Unclean shutdown
  4. Boot into deployment A
  5. ostree admin cleanup runs, stamp file removed
  6. Shut down cleanly. This causes B to be finalized
  7. Boot into B. No cleanup

@dbnicholson
Member Author

Good point about /etc and an unclean shutdown. I opened ostreedev/ostree#2510 upstream. If the file was written out by the ostree finalization routine, then it could be in /sysroot.

Member

@wjt wjt left a comment


This branch LGTM. Do you want to wait until you have a version of the cleanup-on-next-boot logic implemented or land this and have that as a follow-up?

@dbnicholson
Member Author

It looks like ostreedev/ostree#2510 is probably not imminent, so I'm going to do something similar downstream. I think I'm going to try to address it here, but if I get stuck I think we can merge this and fix the pruning in a follow up.

@dbnicholson
Member Author

I added a commit to handle cleanup with a drop-in for ostree-finalize-staged.service + a new eos-updater-autocleanup.service unit. For the latter it's just running ostree admin cleanup and rm -f /sysroot/.cleanup, but I could also write a small utility in eos-updater that handles /sysroot/.cleanup itself.

I tested this in a VM and it seems to DTRT. On my SSD system it took 5 seconds and didn't block gdm:

Jan 11 11:22:14 endless systemd[1]: Starting Automatically cleanup after staged Endless OS Updater deployment...
...
Jan 11 11:22:18 endless systemd[1]: Reached target Multi-User System.
Jan 11 11:22:18 endless systemd[1]: Reached target Graphical Interface.
...
Jan 11 11:22:19 endless systemd[1]: Finished Automatically cleanup after staged Endless OS Updater deployment.

Member

@wjt wjt left a comment


Lgtm


# Only /sysroot and /boot need to be written to.
ProtectSystem=strict
ReadWritePaths=/sysroot /boot
Member


But if you can write to /sysroot, what can't you write to on the main disk?

Member Author


True. I guess the only way to really narrow this is to make it truly pruning-only, and then you could limit it to /sysroot/ostree/repo. ostree admin cleanup does do other things in /sysroot/boot, /sysroot/ostree/boot.* and /sysroot/ostree/deploy.

Member Author


While it's true that everything is really under /sysroot, I don't believe you meant this as a blocker. It does mean that inadvertent writing to /etc isn't possible. Unless you have objections, I think we should just carry on with this.

Member


Agreed

When OSTree staged deployments are used, the old rollback deployment is
deleted during system shutdown. To keep from slowing down shutdown, the
OSTree repo is not pruned at that time. That means that even though the
deployment was deleted, the objects are still on disk. Since that may be
a significant amount of wasted disk space, the full cleanup with repo
pruning needs to be run at some time after rebooting. See
ostreedev/ostree#2510 for details.

To detect when cleanup is necessary, a systemd drop-in is added to touch
the `/sysroot/.cleanup` file after `ostree-finalize-staged.service` has
finalized the new deployment. The reason to use a drop-in for
`ostree-finalize-staged.service` rather than creating the file from
`eos-updater` is to avoid the situation where an unclean shutdown occurs
and the new deployment is not finalized. In that case, cleanup would be
run unnecessarily on the next boot.

A new systemd service, `eos-updater-autocleanup.service`, is added to
run `ostree admin cleanup` when `/sysroot/.cleanup` exists and then
delete the file afterwards. This adds a dependency on the `ostree` CLI,
but a separate program could be provided that calls the
`ostree_sysroot_cleanup` API and deletes the `/sysroot/.cleanup` file
itself.

https://phabricator.endlessm.com/T5658
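
For reference, a hedged sketch of the separate program mentioned in the commit message above (hypothetical; the PR itself just runs ostree admin cleanup from the unit):

#include <glib/gstdio.h>
#include <ostree.h>

/* Hypothetical helper, not part of this PR: run the full sysroot cleanup
 * (including the repo prune) and remove the stamp file written by the
 * ostree-finalize-staged.service drop-in. */
int
main (void)
{
  g_autoptr(GError) error = NULL;
  g_autoptr(OstreeSysroot) sysroot = ostree_sysroot_new_default ();

  if (!ostree_sysroot_load (sysroot, NULL, &error) ||
      !ostree_sysroot_cleanup (sysroot, NULL, &error))
    {
      g_printerr ("Cleanup failed: %s\n", error->message);
      return 1;
    }

  g_unlink ("/sysroot/.cleanup");
  return 0;
}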
@dbnicholson
Member Author

Something I thought of later is that the ostree_sysroot_cleanup run after the deployment is initially staged has basically no effect with the addition of the autocleanup handling. In particular, it will run a repo prune that is unlikely to prune anything since the refs for the old rollback deployment haven't been deleted yet. I added a comment that it could be skipped in that case. There's no harm in keeping it except that there's a process hammering your disk for likely no gain.

@wjt wjt merged commit 5c7e97e into master Jan 13, 2022
@wjt wjt deleted the T5658-ostree-staged-deployment branch January 13, 2022 09:20