toolbox: prevent mounted snapshots from being gc'ed #9

daMupfel
Contributor

@daMupfel daMupfel commented Mar 12, 2024

The toolbox uses ctr under the hood if available. By default, the 'ctr image mount' command adds a 1 day lease on the created snapshot that is mounted as the upperdir. Once that lease expires, the snapshot is gc'ed and the toolbox no longer functions correctly because the upperdir no longer exists.

mounted upperdir snapshot is gc'ed after lease times out

The toolbox script uses the ctr client (which it seems should only be used for testing and administrative purposes?). When the ctr image mount command is used, this client creates a lease that lasts for one day and references the newly created rw snapshot.

When said lease expires, the upperdir of the mount is removed, which leaves the toolbox somewhat non-functional.

In detail, if we run toolbox it will mount the fedora image and a rw snapshot, which can be seen here:

$ cat /proc/mounts | grep toolbox
overlay /var/lib/toolbox/core-docker.io_library_fedora-38 overlay rw,seclabel,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/27/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/27/work,uuid=on 0 0
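
To check whether the upperdir behind a toolbox mount still exists, its path can be pulled out of the mount options. The following is only a minimal sketch: `overlay_upperdir` is a hypothetical helper that is not part of the toolbox script, and the sample line is the mount entry above with a few options left out.

```shell
#!/bin/sh
# Hypothetical helper (not part of the toolbox script): print the upperdir
# option from a single /proc/mounts line passed as the first argument.
overlay_upperdir() {
    printf '%s\n' "$1" | sed -n 's|.*upperdir=\([^, ]*\).*|\1|p'
}

# Sample entry based on the output above (some mount options omitted).
line='overlay /var/lib/toolbox/core-docker.io_library_fedora-38 overlay rw,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/27/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/27/work 0 0'

upper=$(overlay_upperdir "$line")
if [ -d "$upper" ]; then
    echo "upperdir present: $upper"
else
    # This is the state expected after the snapshot has been gc'ed.
    echo "upperdir missing: $upper"
fi
```

On a live system the line would come from `grep toolbox /proc/mounts` instead of the hardcoded sample.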

When inspecting the leases with ctr we can see that it creates a lease valid for a day:

$ sudo ctr leases ls
ID                                                CREATED AT           LABELS                                                                                                                                 
/var/lib/toolbox/core-docker.io_library_fedora-38 2024-03-12T15:38:53Z containerd.io/gc.expire=2024-03-13T15:38:53Z,containerd.io/gc.ref.snapshot.overlayfs=/var/lib/toolbox/core-docker.io_library_fedora-38 

After the lease expires it will remove the snapshot and therefore the mounted upperdir.
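
The expiry time is carried in the `containerd.io/gc.expire` label of the lease, so it can be read straight from the listing. A small sketch, assuming the output format shown above; `lease_expiry` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the containerd.io/gc.expire label
# found in one line of `ctr leases ls` output (first argument).
lease_expiry() {
    printf '%s\n' "$1" | sed -n 's|.*containerd\.io/gc\.expire=\([^, ]*\).*|\1|p'
}

# Sample row copied from the listing above.
row='/var/lib/toolbox/core-docker.io_library_fedora-38 2024-03-12T15:38:53Z containerd.io/gc.expire=2024-03-13T15:38:53Z,containerd.io/gc.ref.snapshot.overlayfs=/var/lib/toolbox/core-docker.io_library_fedora-38'

echo "lease expires at: $(lease_expiry "$row")"
# prints: lease expires at: 2024-03-13T15:38:53Z
```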

The lease duration can't be changed, as it seems to be hardcoded in the client (https://github.com/containerd/containerd/blob/main/cmd/ctr/commands/images/mount.go#L76-L80).

I worked around this by manually setting the label containerd.io/gc.root=true on the snapshot. This prevents the GC from removing the snapshot:

$ sudo ctr snapshots info /var/lib/toolbox/core-docker.io_library_fedora-38
{
    "Kind": "Active",
    "Name": "/var/lib/toolbox/core-docker.io_library_fedora-38",
    "Parent": "sha256:fdc6e90d218ce752383b418a061d823f1dd876cab4514aa86ca02811498f60f4",
    "Labels": {
        "containerd.io/gc.root": "true"
    },
    "Created": "2024-03-12T15:38:53.586782845Z",
    "Updated": "2024-03-12T15:38:53.633095228Z"
}
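
One way to set that label by hand is the `ctr snapshots label` subcommand. This is a sketch of the manual workaround only, not necessarily what the patched toolbox script does; `pin_snapshot` and the `CTR` override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: mark a snapshot as a GC root so the garbage collector
# keeps it. $1 is the snapshot key (here the toolbox mount path).
# CTR can be overridden, e.g. CTR=echo to print the command instead of running it.
pin_snapshot() {
    "${CTR:-ctr}" snapshots label "$1" containerd.io/gc.root=true
}
```

It would be called as root, for example `pin_snapshot /var/lib/toolbox/core-docker.io_library_fedora-38`. Whether a key under a non-default snapshotter or namespace needs extra flags is not covered here.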

There is also a --label flag which does not seem to work.

I'm not sure if setting containerd.io/gc.root is the best solution or if there is a better alternative, since I don't 100% understand what containerd is doing.

For testing purposes I built my own version of ctr which sets the lease time to 1 minute. This made the problem quick to reproduce.

How to use

Run the toolbox command on flatcar as usual and inspect manually with ctr leases ls, ctr snapshots ls and ctr snapshots info.
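
The manual inspection can also be scripted by checking the JSON printed by `ctr snapshots info` for the label. A rough sketch, assuming the JSON layout shown earlier in this description; `has_gc_root` is a hypothetical helper and uses plain grep rather than a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: read `ctr snapshots info` JSON on stdin and succeed
# only if the containerd.io/gc.root label is set to "true".
has_gc_root() {
    grep -Eq '"containerd\.io/gc\.root"[[:space:]]*:[[:space:]]*"true"'
}

# Sample input reduced from the snapshot info shown earlier.
info='{ "Kind": "Active", "Labels": { "containerd.io/gc.root": "true" } }'

if printf '%s\n' "$info" | has_gc_root; then
    echo "snapshot is pinned"
else
    echo "snapshot is not pinned"
fi
# prints: snapshot is pinned
```

On a real host the input would be piped in, for example `sudo ctr snapshots info /var/lib/toolbox/core-docker.io_library_fedora-38 | has_gc_root`.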

Testing done

Copied the toolbox script, updated it according to this PR, and called the copy directly instead of the provided toolbox script.

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

@jepio
Member

jepio commented Mar 22, 2024

I have one concern about the change: does this mean toolbox snapshots will never be GCed and will accumulate over time?

@daMupfel
Contributor Author

I would assume so. I'm not 100% sure what containerd is doing internally, but the documentation would suggest it.

Since there is no defined lifecycle (there is no toolbox reset or destroy command, for example), I don't know what the expected behaviour is here.

With docker, the toolbox just extracted all files into the folder, and that folder never gets deleted either.

A non-functional toolbox, however, does not seem to be what we want :).

It is a bit weird to me that we have something in containerd (the image and mount) when we are not actually using containerd to run the toolbox. There seems to be a mismatch of who is responsible for what (with docker, at least, we only used it to extract everything; after that docker was not involved anymore, only systemd nspawn).

@jepio
Member

jepio commented Mar 22, 2024

Ok, that makes sense.

Could I ask you to prepare a github.com/flatcar/scripts PR that:

  • updates the commit in toolbox-9999.ebuild
  • renames the toolbox ebuild symlink to increment the revision

@jepio jepio merged commit fce9ba2 into flatcar:flatcar-master Mar 22, 2024
@daMupfel
Contributor Author

Sure can do!

Thank you very much jepio!
