add new "vpick" concept for automatically picking newest resource from .v/ dir containing versioned files #26663

Merged
merged 12 commits into from Jan 3, 2024

Conversation

poettering
Member

@poettering poettering commented Mar 3, 2023

The idea is this: at various places where we currently accept a path name to some file, we also accept ".v/" directories, whose contents we automatically enumerate to pick the newest file.

i.e. if you call this:

# systemd-nspawn --image=/srv/myproject.v/foo___.raw

And you have the dir populated like this:

/srv/myproject.v/foo_1.3.45_x86-64.raw
/srv/myproject.v/foo_1.5.35_x86-64.raw
/srv/myproject.v/foo_1.1.7_x86-64.raw
/srv/myproject.v/foo_1.55.3_arm64.raw

and we run on x86-64, then we'll automatically resolve /srv/myproject.v/foo___.raw to /srv/myproject.v/foo_1.5.35_x86-64.raw. On 64bit arm, we'd instead resolve it to /srv/myproject.v/foo_1.55.3_arm64.raw.

The .v/ in the penultimate path element enables this versioned mode, so that a dir ending in ".v" is clearly marked as "this contains versioned things". (Note: this is a new concept, so if people already have dirs named like this, it creates a minor incompatibility, since we'd interpret them differently than before.)

And "___" is the wildcard that we expect the version and arch in.

The whole matching logic is a bit more complex, see patch. And matching by architecture is optional.
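As a rough illustration of the basic idea (not the patch's actual logic, which is more involved), the matching could be sketched as follows. The function names, the `_<version>(_<arch>)` layout of the wildcard part, and the dotted-numeric version ordering are assumptions made for this sketch only.

```python
import re

def version_key(version):
    # Simplified stand-in for a real version comparison:
    # "1.5.35" -> (1, 5, 35). Assumes dot-separated numeric versions.
    return tuple(int(part) for part in version.split("."))

def pick_newest(pattern, names, arch=None):
    """Return the newest entry of `names` matching `pattern`, or None.

    `pattern` is the last path component, containing the "___" wildcard.
    If `arch` is given, entries naming a different architecture are skipped;
    entries without an architecture are always eligible.
    """
    prefix, suffix = pattern.split("___", 1)
    best = None
    for name in names:
        if not (name.startswith(prefix) and name.endswith(suffix)):
            continue
        middle = name[len(prefix):len(name) - len(suffix)]
        # Hypothetical layout of the wildcard part: _<version>[_<arch>]
        m = re.fullmatch(r"_(\d+(?:\.\d+)*)(?:_(.+))?", middle)
        if m is None:
            continue
        version, entry_arch = m.group(1), m.group(2)
        if arch is not None and entry_arch is not None and entry_arch != arch:
            continue
        key = version_key(version)
        if best is None or key > best[0]:
            best = (key, name)
    return best[1] if best else None

names = [
    "foo_1.3.45_x86-64.raw",
    "foo_1.5.35_x86-64.raw",
    "foo_1.1.7_x86-64.raw",
    "foo_1.55.3_arm64.raw",
]
print(pick_newest("foo___.raw", names, arch="x86-64"))
print(pick_newest("foo___.raw", names, arch="arm64"))
```

The two calls mirror the x86-64 and arm64 cases from the example directory above.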

This takes inspiration from how distros organize download dirs on their download servers anyway, as well as from our ".d/" dirs, which contain fragments of configuration. With this new concept, ".d/" means "merge everything in here, ordered by name", while ".v/" means "pick one from the stuff in here, the newest".

This also nicely matches systemd-sysupdate's pattern logic.

@AdrianVovk
Contributor

Maybe it'd be simpler (and more elegant) to restrict the contents of one of these .v directories to one resource only (for example, if you have a .v directory containing some sysext, that directory contains only versions of that one sysext and nothing more). Then there's no need for the wildcard (which is quite ugly IMO)

So, translating your example:

# ls /srv/myproject/foo.raw.v/
/srv/myproject/foo.raw.v/1.3.45_x86-64
/srv/myproject/foo.raw.v/1.5.35+0-3_x86-64
/srv/myproject/foo.raw.v/1.1.7_x86-64
/srv/myproject/foo.raw.v/1.55.3_arm64
# systemd-nspawn --image=/srv/myproject/foo.raw.v

On amd64 hosts it picks /srv/myproject/foo.raw.v/1.3.45_x86-64 and on aarch64 it picks /srv/myproject/foo.raw.v/1.55.3_arm64.

Rather than checking if the penultimate path segment has a .v suffix, you can just check if the last path segment ends with .v AND it is a directory. If both conditions are met do the version substitution.

As an alternative naming scheme, you could use the directory name as the wildcard. For instance (() means optional): /path/to/<prefix>.<suffix>.v/<prefix>_<version>(_<arch>)(<bootcounting>).<suffix>. You can combine this with the approach I showcase above like so: /path/to/<prefix>.<suffix>.v/(<prefix>_)<version>(_<arch>)(<bootcounting>)(.<suffix>)
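To make the first naming scheme concrete, here is a sketch of how a file name could be split into its fields when the directory name supplies prefix and suffix. This is only an illustration of the proposal: the function name is hypothetical, the set of characters allowed in a version is a guess, and the boot-counting syntax is assumed to be `+LEFT-DONE`.

```python
import re

def parse_entry(dir_name, file_name):
    """Split file_name into version/arch/boot-counting fields, or return None.

    Follows the layout <prefix>_<version>(_<arch>)(<bootcounting>).<suffix>,
    with <prefix> and <suffix> taken from a directory named <prefix>.<suffix>.v
    """
    if not dir_name.endswith(".v"):
        return None
    prefix, _, suffix = dir_name[:-len(".v")].partition(".")
    suffix_re = r"\." + re.escape(suffix) if suffix else ""
    pattern = (
        re.escape(prefix)
        + r"_(?P<version>[0-9][0-9A-Za-z.~^]*)"      # assumed version charset
        + r"(?:_(?P<arch>[A-Za-z0-9-]+))?"           # optional architecture
        + r"(?:\+(?P<left>\d+)(?:-(?P<done>\d+))?)?"  # assumed +LEFT-DONE counter
        + suffix_re
    )
    m = re.fullmatch(pattern, file_name)
    if m is None:
        return None
    return {
        "version": m.group("version"),
        "arch": m.group("arch"),
        "tries_left": int(m.group("left")) if m.group("left") else None,
        "tries_done": int(m.group("done")) if m.group("done") else None,
    }

print(parse_entry("foo.raw.v", "foo_1.5.35_x86-64+0-3.raw"))
print(parse_entry("foo.raw.v", "foo_1.1.7.raw"))
```

A picker built on top of this could skip entries whose `tries_left` is 0 before sorting by version.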

@AdrianVovk
Contributor

Nice-to-have: mapping other common arch names to systemd's arch names:

  • aarch64 -> arm64
  • amd64, x86_64 -> x86-64
  • i386/i486/i586/i686 -> x86
  • (etc)
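A minimal sketch of such a mapping, covering only the aliases listed above. The table and function names are hypothetical, and unknown names are passed through unchanged.

```python
# Aliases from the list above, mapped to systemd's architecture names.
ARCH_ALIASES = {
    "aarch64": "arm64",
    "amd64": "x86-64",
    "x86_64": "x86-64",
    "i386": "x86",
    "i486": "x86",
    "i586": "x86",
    "i686": "x86",
}

def normalize_arch(name):
    """Map a common architecture alias to the systemd-style name."""
    return ARCH_ALIASES.get(name, name)

print(normalize_arch("amd64"))
print(normalize_arch("aarch64"))
```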

@AdrianVovk
Contributor

This also contains a wip concept "systemd-fsrebind" which then uses this on some new type of rootfs where we'll bind mount dirs based on this logic, replicating what the gpt auto spec does for partitions, but based on dirs. Usecase is btrfs subvolume systems for example, but more. This is very incomplete though

So from a quick skim of the code, as far as I can tell this is how you envision this working:

  • You look for a hard-coded list of hierarchies to possibly rebind (Including /, /usr, /home, /var, /var/tmp, and /srv)
  • For each hierarchy, you look for
    1. A directory
    2. A .raw file
    3. A versioned directory
    4. A versioned .raw file
  • If you find something, you bind-mount it

Is this correct? I'll proceed with this comment assuming it is


Feedback 0

This fs-rebinding concept should probably be its own separate MR

Feedback 1

This should probably be a generator, and there should be a defined partition UUID that gets auto-mounted and then the generator runs for it

Feedback 2

systemd could do better than a hard-coded list of hierarchies. What if the user wants to encrypt /srv/whatever but not /srv/somethingelse? This way of doing it makes it impossible. How about this: we combine this PR's versioning magic with my filename-defines-mountpoint proposal from the mailing list. That seems to cover most of the usecases I can think of. Here's what I mean:

  • /state (or some other place): Mountpoint of this partition
    • @auto: This behavior. Allows other things to store data in this partition (i.e. ostree), handles ext4's lost+found, and also allows us to reuse this partition for slightly different semantics down the line (described later)
      • A path here is a systemd escaped path, reusing the same unit-name escaping logic and relative to / (so treated like systemd-escape --path)
      • path/: Bind-mounted to /path
      • path.raw: Mounted (decrypted, verity, etc etc etc) to /path
      • path.v/: Versioned contents of /path. We pick the latest version's folder using logic from this PR and bind-mount it
      • path.raw.v/: Ditto, except inside are partition images to mount (w/ decryption, etc)
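To illustrate the name-to-mountpoint step, here is a sketch of the reverse of `systemd-escape --path`: "-" separates path components, `\xXX` encodes a literal byte, and a lone "-" stands for the root directory. The function name is hypothetical, and the sketch only handles ASCII escapes.

```python
import re

def unescape_path(escaped):
    """Turn a systemd-escaped path name back into an absolute path."""
    if escaped == "-":
        return "/"

    def decode(component):
        # Replace each \xXX sequence with the character it encodes.
        return re.sub(
            r"\\x([0-9a-fA-F]{2})",
            lambda m: chr(int(m.group(1), 16)),
            component,
        )

    # Split on the separator first, so that a decoded "-" is not
    # mistaken for a path separator.
    return "/" + "/".join(decode(c) for c in escaped.split("-"))

print(unescape_path("srv-www"))
print(unescape_path(r"var-lib-flatpak\x2druntimes"))
print(unescape_path("-"))
```

Any `.raw`, `.v` or `.raw.v` ending would have to be stripped from the entry name before unescaping.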

Here's a more practical example for an A/B-partition-based OS (specifically for carbonOS):

  • /state
    • @auto
      • var.raw: Encrypted /var tree
      • etc.local.raw: Encrypted /etc.local tree for systemd-sysconf
      • (note, I'm not combining the two above into a root.raw because I want / to stay pristine on every boot, forcing users to use this state mechanism instead)
      • home/: Bind-mounted to /home to store homed files
      • var-lib-flatpak\x2druntimes: Bind-mounted to /var/lib/flatpak-runtimes to store shared Flatpak-runtimes

Here's an alternative example of a btrfs-subvolume versioned OS:

  • /state
    • @auto
      • -.raw: Encrypted / tree
      • home/: Bind-mounted to /home
      • usr.v/: Versioned /usr tree
        • 1.0/<content>: Contents of /usr for 1.0
        • 2.0_arm64/<content>: Contents of /usr for 2.0 for arm64

And here's what an ostree-esque OS could look like:

  • /state
    • @auto
      • -.raw: Encrypted / tree
      • home/: Bind-mounted to /home
      • usr.v/: Versioned /usr tree
        • 1.0: Symlink to /state/ostree/deploy/HASH
        • 2.0: Symlink to /state/ostree/deploy/HASH
    • ostree: Normal ostree file structure

Or composefs:

  • /state
    • @auto
      • -.raw: Encrypted / tree
      • home/: Bind-mounted to /home
      • usr.raw.v/: Versioned /usr tree
        • 1.0: Composefs blob for 1.0 of the OS
        • 2.0_arm64: Composefs blob for 2.0 of the OS for arm64
    • composefs: Content-addressed store for composefs

Or an A/B image-based server OS customized by the user:

  • /state
    • @auto
      • (/usr is handled by the GPT partition table)
      • root.raw: Encrypted / (contains /var and /etc. Also contains /home since for a server homed isn't necessarily useful)
      • srv-userdata.raw: Encrypted /srv/userdata tree. Contains some kind of user data that must be encrypted on disk
      • srv-www.v/: Versioned contents of /srv/www tree (static HTTP content)
      • var-lib-portables: Portable services (bind-mounted to /var/lib/portables) - no need to encrypt
        • our-backend.v/: Versioned backend server
          • 1.0/<contents>
          • ...
        • httpd.v/: Versioned HTTPD
          • ...

I think you get the picture. Here are some of my ideas for alternate editions of @auto:

  • @auto-resize: Like @auto, but there's a userspace service that runs and tries to dynamically resize the filesystems inside of there. Restricts contents to .raw (and versioned .raw) files that contain live-resizable filesystems
  • @auto-encrypted.raw: Exactly like @auto, and the generator mounts it to @auto-encrypted then subsequently treats it exactly like @auto. Since the things inside of @auto-encrypted will always be encrypted, this will eliminate the need for the user to create their own encrypted raw disk images whenever they just want to encrypt some subtree somewhere on their system. I really do like this one

So the final carbonOS example would look like:

  • /state
    • @auto
      • home: homedir files
      • var-lib-flatpak\x2druntimes: Shared flatpak runtimes
    • @auto-encrypted.raw: Encrypted contents to rebind
    • @auto-encrypted: Mounted from @auto-encrypted.raw
      • var: Contents of /var tree
      • etc.local: Contents of /etc.local tree for systemd-sysconf

@poettering poettering removed the please-review PR is ready for (re-)review by a maintainer label Jun 16, 2023
@poettering
Member Author

Maybe it'd be simpler (and more elegant) to restrict the contents of one of these .v directories to one resource only (for example, if you have a .v directory containing some sysext, that directory contains only versions of that one sysext and nothing more). Then there's no need for the wildcard (which is quite ugly IMO)

So I am pretty sure we should not do deeper nesting. I made this a single level only, so that we can easily search along three axes (type, arch, version) without having to choose which is more important and gets reflected in the dir hierarchy. Moreover, I wanted to make sure that the logic sd-boot implements via the UKI dirs is made generic and usable elsewhere. Hence any logic that uses deeper nesting doesn't fulfill that goal anymore.

moreover, depending on the resource, arch or version might not apply (i.e. only /usr/ trees have archs, and maybe rootfs, but certainly not home dirs; similarly, /home is typically not versioned, only /usr/ is). hence forcing things into a rigid dir hierarchy just complicates things, I think.

I think an approach of "everything in one dir", and then just have one kind of resource there ("files"), is the simplest, prettiest approach.

this also has various other benefits. for example you can just do "mkdir foo.v" and then download various files into that dir via "curl -O" and it will just work. I think the simplicity and usefulness of this is not to be underestimated. If we'd first have to create a rigid dir structure underneath things would be much less sexy I am sure.

@poettering
Member Author

also, note that we want similar behaviour for picking stuff from a gpt partition table, using the partition label strings. For those we don't have a hierarchy either, hence relying on a hierarchy removes our ability to nicely mirror things on this.

And this goes on: by having a single, shallow list of files we can implement .v/ logic on top of a SHA256SUMS file. We couldn't do that with a deep tree of files.
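The reason a flat list matters here is that a SHA256SUMS file is itself just a flat list of names. A sketch of the idea, assuming the usual `sha256sum` output format (a 64-character hex digest, whitespace, then the file name, optionally marked with `*`); the function names and the dotted-numeric version ordering are assumptions for illustration, not what systemd implements.

```python
def names_from_sha256sums(text):
    """Extract the file names listed in a SHA256SUMS-style manifest."""
    names = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or len(parts[0]) != 64:
            continue  # skip blank or malformed lines
        name = parts[1].strip()
        if name.startswith("*"):  # binary-mode marker used by sha256sum
            name = name[1:]
        names.append(name)
    return names

def newest(names, prefix, suffix):
    """Pick the entry with the highest dotted-numeric version, or None."""
    def key(name):
        version = name[len(prefix):len(name) - len(suffix)]
        return tuple(int(part) for part in version.split("."))
    matching = [n for n in names if n.startswith(prefix) and n.endswith(suffix)]
    return max(matching, key=key) if matching else None

# Placeholder digests, for illustration only.
manifest = "\n".join([
    "a" * 64 + "  foo_1.3.45.raw",
    "b" * 64 + " *foo_1.5.35.raw",
    "c" * 64 + "  foo_1.1.7.raw",
])
print(newest(names_from_sha256sums(manifest), "foo_", ".raw"))
```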

@AdrianVovk
Contributor

Sorry it's not super clear to me what parts of your response are about what parts of my feedback...

This PR had two distinctly separate parts: versioned directories (which I responded to in my first two comments) and fs-rebind (which I talked about in my last comment)

The part you quoted isn't talking about nesting directory hierarchies. I'm just saying that instead of using a wildcard the directory should contain exactly one resource (i.e. the /usr tree, or a single sysext, or a portable service, etc) that gets versioned. I don't think it makes sense for a versioning directory to version multiple unrelated resources. Since the directory contains exactly one resource, the directory name can determine the name of the resource. Then inside of the folder you pick that resource, and search through version numbers and (optionally) arch and (optionally) boot counting.

moreover, depending on the resource, arch or version might not apply (i.e. only /usr/ trees have archs, and maybe rootfs, but certainly not home dirs; similarly, /home is typically not versioned, only /usr/ is). hence forcing things into a rigid dir hierarchy just complicates things, I think.

Are you talking about fs-rebind now?

If something shouldn't be versioned then it shouldn't be in a .v directory, right?

If arch isn't relevant then the filenames in the .v directory just don't include an arch and the arch stops being used to pick the version

What do you mean by rigid dir hierarchy?

@github-actions github-actions bot added the please-review PR is ready for (re-)review by a maintainer label Jun 21, 2023
@poettering
Member Author

I force pushed a new version now. I moved the fsrebind feature out of the PR, so it now focuses solely on the vpick stuff.

@poettering
Member Author

Sorry it's not super clear to me what parts of your response are about what parts of my feedback...

the first comment of yours, but a bit also the second.

This PR had two distinctly separate parts: versioned directories (which I responded to in my first two comments) and fs-rebind (which I talked about in my last comment)

Yeah, i dropped the fsrebind stuff for now. needs more thinking.

The part you quoted isn't talking about nesting directory hierarchies. I'm just saying that instead of using a wildcard the directory should contain exactly one resource (i.e. the /usr tree, or a single sysext, or a portable service, etc) that gets versioned. I don't think it makes sense for a versioning directory to version multiple unrelated resources. Since the directory contains exactly one resource, the directory name can determine the name of the resource. Then inside of the folder you pick that resource, and search through version numbers and (optionally) arch and (optionally) boot counting.

As mentioned I think each file should be standalone, so that you can "curl -O" it. And I think it's important to allow multiple related but different objects in the same dir. Ideally I want that distros can just put .v dirs as download dirs on their servers.

moreover, depending on the resource, arch or version might not apply (i.e. only /usr/ trees have archs, and maybe rootfs, but certainly not home dirs; similarly, /home is typically not versioned, only /usr/ is). hence forcing things into a rigid dir hierarchy just complicates things, I think.

Are you talking about fs-rebind now?

The arch + version filtering is implemented generically in the vpick part. the fsrebind then makes use of this for the various key fs trees.

If something shouldn't be versioned then it shouldn't be in a .v directory, right?

If it's multi-arch then I think it might still make sense to put it in one so that we can automatically pick the version for the local arch, even though the primary purpose of vpick is of course the version picking, not the arch picking.

If arch isn't relevant then the filenames in the .v directory just don't include an arch and the arch stops being used to pick the version

correct.

What do you mean by rigid dir hierarchy?

by that i mean multiple levels of dirs arranged in a certain way. I'd really minimize the requirement for arranging things deep, and instead just have one .v/ level per resource and that's it.

@poettering
Member Author

Feedback 0

This fs-rebinding concept should probably be its own separate MR

yeah, true. i dropped it from this PR for now.

@poettering
Member Author

Feedback 1

This should probably be a generator, and there should be a defined partition UUID that gets auto-mounted and then the generator runs for it

a generator on the host runs with the root fs already mounted. if we want to allow fsrebind to pick a root fs (and I think that'd be key) then we need to run it from the initrd at a late stage (i.e. where the basic root fs is already mounted, and now we have to pick a dir inside it). hence generators don't really cut it. It's a bit like systemd-volatile-root in that regard which also runs after the root fs is mounted, but before the initrd transition.

@poettering
Member Author

systemd could do better than a hard-coded list of hierarchies. What if the user wants to encrypt /srv/whatever but not /srv/somethingelse? This way of doing it makes it impossible. How about this: we combine this PR's versioning magic with my filename-defines-mountpoint proposal from the mailing list. That seems to cover most of the usecases I can think of. Here's what I mean:

I think fstab can help out with many of these usecases. I'd like to use fsrebind though as a vehicle to drive some limited form of uniformity, i.e. not allow arbitrary .v stuff to be arranged. i.e. it's a good thing to make people place resources into the FHS (well, the modernized form of it we define), and hence it's a good thing to allow the broad hierarchies FHS defines to be handled by fsrebind, but not arbitrary ones that make it too easy to depart from that. That said, let's see how this plays out in the end.

@github-actions github-actions bot removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Dec 13, 2023
@poettering
Member Author

Addressed all raised issues, but didn't do a portabled port, or the other mentioned ports. Let's do that in follow-up PRs.

I also kept the /var/lib/machines/ test mostly the way it is, i.e. without overmounting, since there are other tests that want it too.

@bluca
Member

bluca commented Dec 13, 2023

Looks good, several comments - also with the integration in the dissect logic, do MountImages= and ExtensionImages= also work out of the box?

No not yet. The patch set was large enough already. There are a bunch of things that could be ported over still:

  1. portabled
  2. MountImages=, ExtensionImages=, ExtensionDirectories=
  3. systemd-tmpfiles, systemd-sysusers, … --image= and --root= switches
  4. WorkingDirectory=
  5. BindPaths=, ReadOnlyBindPaths=
  6. EnvironmentFile=
  7. LoadCredential=

I mean, there's so much. My goal here was to start out with some basic components, and then later add more as we go.

That's absolutely fine, I was just wondering what's done and what's left to do. It can be done later.

@bluca
Member

bluca commented Dec 13, 2023

For tests, it would be good to add a real test with a real image in TEST-50-DISSECT as well, to ensure the integration also doesn't regress

The existing test already checks a "real image", no?

I meant via RootImage=. It's good that lots of detailed tests can be done with the tool, but given RootImage= is integrated, I'd like to see that covered immediately too. It doesn't need to test many variations and permutations, just one instance to make sure the integration works as expected is enough.

@poettering
Member Author

I meant via RootImage=. It's good that lots of detailed tests can be done with the tool, but given RootImage= is integrated, I'd like to see that covered immediately too. It doesn't need to test many variations and permutations, just one instance to make sure the integration works as expected is enough.

Added.

…rsioned resources

This adds a new concept for handling paths. At appropriate places, if a
path such as /foo/bar/baz.v/ is specified, we'll
automatically enumerate all entries in /foo/bar/baz.v/baz* and then
do a version sort and pick the newest file.

A slightly more complex syntax is available, too:

/foo/bar/baz.v/quux___waldo

if that's used, then we'll look for all files matching
/foo/bar/baz.v/quux*waldo, and split out the middle, and version sort
it, and pick the newest.

The ___ wildcard indicates both a version string, and if needed an
architecture ID, in case per-arch entries shall be supported.
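The two path forms described above can be reduced to a single glob, roughly as in the following sketch. The function name is hypothetical, and this only mirrors the description in this commit message, not necessarily the final implementation.

```python
import posixpath

def search_glob(path):
    """Return the glob to enumerate for a ".v/" path, or None if not versioned."""
    # Short form: /foo/bar/baz.v/  ->  /foo/bar/baz.v/baz*
    if path.endswith(".v/") or path.endswith(".v"):
        directory = path.rstrip("/")
        stem = posixpath.basename(directory)[:-len(".v")]
        return posixpath.join(directory, stem + "*")
    # Long form: /foo/bar/baz.v/quux___waldo  ->  /foo/bar/baz.v/quux*waldo
    directory, last = posixpath.split(path)
    if directory.endswith(".v") and "___" in last:
        prefix, suffix = last.split("___", 1)
        return posixpath.join(directory, prefix + "*" + suffix)
    return None

print(search_glob("/foo/bar/baz.v/"))
print(search_glob("/foo/bar/baz.v/quux___waldo"))
print(search_glob("/etc/fstab"))
```

The entries matched by the glob would then have the wildcard part split out and version-sorted.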

This is a very simple way to maintain versioned resources in a dir, and
make systemd's components automatically pick the newest. Example:

    /srv/myimages.v/foobar_1.32.65_x86-64.raw
    /srv/myimages.v/foobar_1.33.45_x86-64.raw
    /srv/myimages.v/foobar_1.31.5_x86-64.raw
    /srv/myimages.v/foobar_1.31.5_arm64.raw

If now nspawn is invoked like this:

    systemd-nspawn --image=/srv/myimages.v/foobar___.raw

Then it will automatically pick
/srv/myimages.v/foobar_1.33.45_x86-64.raw as the version to boot on
x86-64, and /srv/myimages.v/foobar_1.31.5_arm64.raw on arm64.

This commit only adds the basic implementation for picking files from a
dir, but no hook-up anywhere.
…d line

Usecase:

    $ du $(systemd-vpick /srv/myimages.v/foo___.raw)

In order to determine size of newest image in /srv/myimages.v/
@bluca bluca added good-to-merge/with-minor-suggestions and removed please-review PR is ready for (re-)review by a maintainer labels Jan 3, 2024
@poettering
Member Author

thanks for the review.

@poettering poettering added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed good-to-merge/with-minor-suggestions labels Jan 3, 2024
@poettering poettering merged commit 2a02a8d into systemd:main Jan 3, 2024
45 of 49 checks passed
Labels
build-system dissect documentation good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed meson new-feature nspawn pid1 tests util-lib

3 participants