@learmj
Collaborator

@learmj learmj commented Oct 27, 2025

Make each system slot rootfs read-only and move mutable state to a persistent partition, with per-slot isolation for /var and shared identity/data/logs across slots.

Key changes:

  • Root immutability:
    • Mount / as ro; no remounting rw during boot.
  • Per-slot /var:
    • Add early systemd generator to mount persistent storage at /persistent and bind /var to /persistent/slots/<slot>/var based on the active slot. Requires GPT PARTLABEL support.
  • Shared home:
    • Bind /home to /persistent/home so user data is retained across slot rotations.
  • Stable machine-id across slots:
    • /etc/machine-id shipped as an empty regular file (no change).
    • Symlink /var/lib/dbus/machine-id to /etc/machine-id for legacy clients.
    • Add new machine-id-sync.service to:
      • first boot: copy /run/machine-id to /persistent/common/etc/machine-id
      • subsequent boots: write the persistent machine-id in place to /run/machine-id
  • Persistent journald across slots:
    • Bind /var/log/journal to /persistent/log/journal.
    • Add journalling preferences.
  • Image build tweaks:
    • Stage persistent (slots/*/var, common/etc).
    • Enable units under basic.target.wants and sysinit.target.wants.
  • Hardening:
    • Add ConditionPathExists=/run/machine-id and /persistent/common/etc to units.
    • Use RequiresMountsFor=/persistent/common/etc/machine-id where relevant.
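
As a rough illustration of the per-slot /var item above, a generator could emit a bind-mount unit along the following lines. This is a minimal sketch under stated assumptions, not the generator shipped in this PR: the function names `emit_var_mount_unit` and `write_var_mount` are hypothetical, and the slot name is assumed to be usable directly as the directory name under /persistent/slots.

```shell
#!/bin/sh
# Hypothetical sketch, not the generator shipped in this PR.

# Print a bind-mount unit for /var backed by the given slot directory.
# $1: slot name (assumed here to be derived from the root PARTLABEL)
emit_var_mount_unit() {
    slot="$1"
    cat <<EOF
[Unit]
Description=Per-slot /var for slot ${slot}
DefaultDependencies=no
RequiresMountsFor=/persistent
Before=local-fs.target

[Mount]
What=/persistent/slots/${slot}/var
Where=/var
Type=none
Options=bind
EOF
}

# Write var.mount into a generator output directory and hook it into
# local-fs.target.
# $1: output directory (systemd passes generators three such directories)
# $2: slot name
write_var_mount() {
    out_dir="$1"
    slot="$2"
    mkdir -p "${out_dir}/local-fs.target.wants"
    emit_var_mount_unit "${slot}" > "${out_dir}/var.mount"
    ln -sf ../var.mount "${out_dir}/local-fs.target.wants/var.mount"
}
```

Calling `write_var_mount "$1" a` from a generator would place `var.mount` in the first output directory for a slot named `a`. The unit must be named `var.mount` because systemd derives mount unit names from the mount point.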

Results:

  • Consistent machine-id in /run, /etc, and /persistent across reboots and slot rotations.
  • Journald writes to /persistent/log/journal/ preserving machine-id across slots.
  • /home persists across A/B.
  • /var remains per-slot enabling predictable state.

Note:
Because systemd 252 (Bookworm) and 257 (Trixie) boot differently, some services fail to set up their per-service mount namespace (error 226/NAMESPACE) when sandboxing is used on 252; these failures do not occur on 257. As a workaround, drop PrivateDevices for the affected services (timesyncd and resolved). This reduces the sandboxing for these units.
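
One way such a workaround could be expressed is with systemd drop-in files written into the image tree at build time. The sketch below is an assumption about the shape of the fix, not a copy of what pre-image.sh does; the helper names and the drop-in file name `10-no-private-devices.conf` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that turns off PrivateDevices for one unit.
# $1: root of the image tree (for example a chroot path)
# $2: unit name
drop_private_devices() {
    root="$1"
    unit="$2"
    dropin_dir="${root}/etc/systemd/system/${unit}.d"
    mkdir -p "${dropin_dir}"
    printf '[Service]\nPrivateDevices=no\n' > "${dropin_dir}/10-no-private-devices.conf"
}

# Apply the drop-in to both affected services under the given root.
# $1: root of the image tree
apply_sandbox_workaround() {
    for unit in systemd-timesyncd.service systemd-resolved.service; do
        drop_private_devices "$1" "${unit}"
    done
}
```

A drop-in keeps the vendor unit files untouched, so the override is easy to remove once systemd 252 no longer needs to be supported.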

Other:

  • Relocate persistent partition inside LUKS for crypt PMAP

@learmj learmj requested a review from pelwell October 27, 2025 12:33
@learmj
Collaborator Author

learmj commented Oct 27, 2025

Tested on both Bookworm and Trixie with a Connect authkey. This PR enables support for A/B delta/incremental updates via an immutable root partition, and provides unified logging across slots, a unified machine-id across slots, persistent per-slot writable storage (i.e. a slot-specific rw /var), and a slot-agnostic HOME.

fyi @tdewey-rpi @roliver-rpi

Some eyes on the journald settings would be appreciated. See image/gpt/ab_userdata/device/rootfs-overlay/usr/lib/systemd/system-generators/slot-perst-generator.

I've extensively tested the machine-id sync service and it seems solid across many reboot and tryboot cycles.

  • journalctl --list-boots counts correctly per boot cycle (tracks machine-id)
  • machine-id in sync for all boot cycles
  • journald settings intended to balance endurance and reliability without impacting performance too much
  • persistent ext4 writable partition mounted with rw,noatime,lazytime,commit=60,errors=remount-ro
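
The two-phase behaviour of the machine-id sync service (seed the persistent copy on first boot, restore it on later boots) can be sketched as below. This is an illustrative reduction, not the service's actual script: the function name `sync_machine_id` is hypothetical, and the paths are parameters so the logic can be tried on scratch files rather than /run/machine-id and /persistent/common/etc/machine-id.

```shell
#!/bin/sh
# Hypothetical sketch of the sync logic described in this PR.
# $1: runtime id file (/run/machine-id in the PR)
# $2: persistent copy (/persistent/common/etc/machine-id in the PR)
sync_machine_id() {
    run_id="$1"
    persistent_id="$2"
    if [ -s "${persistent_id}" ]; then
        # Subsequent boots: overwrite in place (same inode), so anything
        # that already has the runtime file bind-mounted sees the new content.
        cat "${persistent_id}" > "${run_id}"
    else
        # First boot: seed the persistent copy from the runtime id.
        mkdir -p "$(dirname "${persistent_id}")"
        cp "${run_id}" "${persistent_id}"
    fi
}
```

Writing through a redirection rather than replacing the file is the point of "in place": a rename-based update would create a new inode and leave existing bind mounts pointing at the old content.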

This seems a reasonable initial implementation on which we can develop delta updates.

Had to add a small workaround for systemd 252 (Bookworm) in order to support immutable root for services that tried to have a private /dev. See image/gpt/ab_userdata/pre-image.sh.

Immutable root support now mandates GPT PARTLABEL support (GPT label for the root device is used to bind mount the slot specific /var).
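
To illustrate what relying on the GPT PARTLABEL involves, the sketch below resolves a block device to its partition label through udev's /dev/disk/by-partlabel symlinks. It is only one possible approach and not necessarily how the generator in this PR does it; the function name `partlabel_for_device` is hypothetical, and whether these symlinks already exist at the point a generator runs depends on the boot setup.

```shell
#!/bin/sh
# Hypothetical helper: print the GPT partition label whose by-partlabel
# symlink resolves to the given device.
# $1: device path
# $2: symlink directory (defaults to udev's /dev/disk/by-partlabel)
partlabel_for_device() {
    target="$(readlink -f "$1")"
    dir="${2:-/dev/disk/by-partlabel}"
    for link in "${dir}"/*; do
        # Skip the unexpanded glob and any dangling symlinks.
        [ -e "${link}" ] || continue
        if [ "$(readlink -f "${link}")" = "${target}" ]; then
            basename "${link}"
            return 0
        fi
    done
    return 1
}
```

The second argument exists so the lookup can be exercised against a scratch directory of symlinks; on a running system it would normally be omitted.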

@learmj
Collaborator Author

learmj commented Oct 27, 2025

The Connect client is not currently part of either AB config (i.e. config/bookworm-minbase-ab.yaml or config/trixie-minbase-ab.yaml), but it can easily be added, e.g. with a simple change:

 layer:
   base: bookworm-minbase
+  app: rpi-connect-lite

It's probably worth me adding this since it will add Connect support to these builds by default, therefore only requiring the authkey to be added at build time. I can do that in a separate commit.

Contributor

@pelwell pelwell left a comment

That's working for me. Using a custom config/otatest.yaml, I don't see any difference except for the persistent partition and the read-only root.

@adeldigital

This is what I was just looking for. I'm still not up to speed with rpi-image-gen, but I have experimented with building the webkiosk example. It runs fine, but I'm a bit confused about a few things (for example, which user is it actually running as, and what are the user/SSH credentials for it?). It seems that specifying the user and config inside rpi-imager before flashing has no effect?

Anyway, I don't know how familiar you are with Mender: it's an open-source OTA solution with A/B partitioning, and I'm trying to get it working with rpi-sb-provisioner to get SB and FDE for my CM4 fleet. Mender has a similar partition setup, with A/B rootfs partitions and a persistent partition (they call it /data). Any ideas how well rpi-image-gen and Mender would work together, in combination with an A/B partitioned image generated with trixie-minbase-ab.yaml? Would it be feasible to integrate Mender’s update agent and boot logic into that layout without breaking the slot-mounting and persistent partition setup you described?

Cheers

@learmj
Collaborator Author

learmj commented Oct 29, 2025

Hi @adeldigital
For the time being, you can consider rpi-imager and rpi-image-gen complete strangers: they know nothing about each other, except that rpi-imager can write an image created by rpi-image-gen to storage. rpi-image-gen is purely driven by its config system.

To get familiar with how rpi-image-gen works, have a read of the docs. Understanding how the layer system works is critical to getting the best out of the tool. Assuming usage of the in-tree layers, anything running as a regular user runs as the user defined by the rpi-user-credentials layer. If this layer is not included in the build, there is no user account created. SSH keys are set up via layer openssh-server.

rpi-image-gen is an image build/creation tool, so technically you can install anything you like in what it generates.

@adeldigital

Thanks for the pointer, @learmj! I found that the user is defined in device-base.yaml. Will dig into it more when I get the chance. My use case is pretty close to the kiosk example: essentially Chromium running a locally hosted app. For now, I’ve had good results using the Lite version of Pi OS. With Trixie, it’s now possible to install the new “base” packages for pd-wayland-core and then install Chromium and my locally hosted app separately, which makes it easy to build a minimal setup tailored to my use case.

Regarding your AB partitioning base layer, do you already have an update strategy in mind, something that handles rootfs switching and rollback on failed updates? I’d love to hear more about what you’re planning for this.

@learmj
Collaborator Author

learmj commented Oct 29, 2025

> I found that the user is defined in device-base.yaml.

Yes it's defined there, but no local account will be created in the chroot unless the creds layer is pulled in.

Re: update strategy etc. - yes, we do. Stay tuned...

@pelwell
Contributor

pelwell commented Oct 29, 2025

Also, the three machine-id values (/etc/machine-id, /run/machine-id and /persistent/common/etc/machine-id) have the same value, and they don't change after an A/B update.
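
A check like this one can be scripted so that it is easy to repeat after each A/B update. The helper below is a minimal sketch; the function name `all_same` is hypothetical and it simply compares file contents byte for byte.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every listed file has the same
# content as the first one.
all_same() {
    first="$1"
    shift
    for f in "$@"; do
        cmp -s "${first}" "${f}" || return 1
    done
    return 0
}
```

On a booted device this could be run as `all_same /etc/machine-id /run/machine-id /persistent/common/etc/machine-id && echo "machine-id consistent"`.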

@learmj learmj merged commit f8388b7 into raspberrypi:master Oct 29, 2025
@adeldigital

@learmj To help us track progress for our new product development (targeting launch in early 2026), is there an existing or planned public repo where we might follow the development of the AB update agent? We're very eager to integrate this into our fleet management.

@learmj
Collaborator Author

learmj commented Nov 11, 2025

rpi-image-gen is just the build tool that lays down the foundation for an AB system. Once we release the update functionality, there will be more information available.

3 participants