Lease-lock airgap images to prevent Kubelet GC from deleting them #3193

Closed
brandond opened this issue Apr 14, 2021 · 6 comments

@brandond
Contributor

brandond commented Apr 14, 2021

We want to do what was discussed at #2961 (comment):

The Kubelet’s attempts to delete images don’t actually remove them, they just mark them as unused and then at some point later on containerd actually cleans them up. It looks like we could actually put a lease on things that are imported from the airgap tarball to ensure that it can’t be deleted. The only gotcha to that would be needing to have some way to prune out things that were previously imported from a tarball, but were no longer actually needed. Maybe just do a sweep at startup and clear the lease on everything that was previously protected, and then re-lease everything that’s imported that round? It would mean that the tarball needed to be retained in order to retain protection on things it contains.

  • On K3s startup, find all images in the containerd image store with a well-known lease on them, and clear the lease.
  • When importing images, do so with a well-known lease to prevent them from being deleted.

This will prevent containerd from deleting any images from the image store that were imported from airgap tarballs during the most recent startup. Images pulled by the kubelet, and images that were previously found in tarballs but are not any longer, may be garbage-collected as usual.
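
The two steps above can be sketched with a toy in-memory model. This is only an illustration of the clear-then-re-lease idea, not containerd's or K3s's actual API: the `ImageStore` class, its methods, the `startup` helper, the lease ID `airgap-images`, and the `example/app:1.0` image are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical well-known lease ID; not the name K3s actually uses.
AIRGAP_LEASE = "airgap-images"


@dataclass
class ImageStore:
    """Toy stand-in for an image store with lease-based GC protection."""
    images: set = field(default_factory=set)
    leases: dict = field(default_factory=dict)  # lease ID -> set of image names

    def clear_lease(self, lease_id):
        # Startup sweep: drop protection from everything leased last time.
        self.leases.pop(lease_id, None)

    def import_with_lease(self, lease_id, names):
        # Import images and attach them to the lease in one step.
        self.images.update(names)
        self.leases.setdefault(lease_id, set()).update(names)

    def pull(self, name):
        # Images pulled on demand (for example by the kubelet) get no lease.
        self.images.add(name)

    def garbage_collect(self, in_use):
        # Delete every image that is neither in use nor held by a lease.
        protected = set().union(*self.leases.values())
        removed = self.images - protected - set(in_use)
        self.images -= removed
        return removed


def startup(store, tarball_images):
    # Step 1: clear the well-known lease. Step 2: re-lease this round's imports.
    store.clear_lease(AIRGAP_LEASE)
    store.import_with_lease(AIRGAP_LEASE, tarball_images)


store = ImageStore()
startup(store, ["docker.io/rancher/mirrored-pause:3.6", "example/app:1.0"])
store.pull("example/pulled:latest")
print(sorted(store.garbage_collect(in_use=[])))  # ['example/pulled:latest']

# Next startup: example/app:1.0 is no longer in the tarball, so it loses protection.
startup(store, ["docker.io/rancher/mirrored-pause:3.6"])
print(sorted(store.garbage_collect(in_use=[])))  # ['example/app:1.0']
```

In this model the tarball contents are the only source of protection, which mirrors the caveat that the tarball has to be retained for its images to stay leased.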

We may also want to hold a lease on any images that are pre-pulled due to being found in .txt files in the agent/images directory.

@brandond brandond added this to the v1.21.1+k3s1 milestone Apr 14, 2021
@brandond brandond added this to To Triage in Development [DEPRECATED] via automation Apr 14, 2021
@brandond brandond moved this from To Triage to Backlog in Development [DEPRECATED] Apr 14, 2021
@brandond brandond added the kind/enhancement An improvement to existing functionality label Apr 14, 2021
@dereknola dereknola self-assigned this Jun 9, 2021
@dereknola dereknola moved this from Backlog to Working in Development [DEPRECATED] Jun 10, 2021
@dereknola dereknola moved this from Working to To Test in Development [DEPRECATED] Jun 17, 2021
xiaods added a commit to xiaods/k8e that referenced this issue Jul 4, 2021
fix k3s-io/k3s#3193

porting from k3s upstream pr: k3s-io/k3s#3464
Signed-off-by: Deshi Xiao <xiaods@gmail.com>
@rancher-max
Contributor

In previous versions, k3s ctr leases ls generally showed nothing, since the leases were removed as soon as the import function finished. With the new implementation, the command lists a lease that does not expire, which satisfies the requirement.
Moving this to Waiting for RC so we can run a quick check with the actual tarball artifacts in an airgap setup once those are created.

@dereknola
Contributor

A new PR that further addresses this issue has merged: #3755

@rancher-max
Contributor

I see this working in an airgap environment using tarball install method. I used boltbrowser to navigate the directory and see the leases.

Development [DEPRECATED] automation moved this from Waiting for RC to Done Issue / Merged PR Sep 21, 2021
@bflorat

bflorat commented Jan 28, 2022

k3s version v1.23.3+k3s1 (5fb370e)
go version go1.17.5
airgap, embedded containerd, no docker, no private repository

My docker.io/rancher/mirrored-pause:3.6 image is still deleted, causing the classic 'Pending' pod because k3s cannot fetch docker.io/rancher/mirrored-pause:3.6 in an air-gapped environment. Running k3s-killall.sh and then systemctl start k3s fixes it.

/var/lib/rancher/k3s/agent/containerd/containerd.log extract:

time="2022-01-28T13:09:59.718387598+01:00" level=info msg="ImageDelete event &ImageDelete{Name:docker.io/rancher/mirrored-pause:3.6,XXX_unrecognized:[],}"

crictl image | grep pause

returns nothing
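
To check which images containerd has removed, the ImageDelete events can be pulled out of the log. Below is a minimal sketch that assumes every event line has the same shape as the extract above; the `deleted_images` helper and the regular expression are illustrative and not part of K3s or containerd.

```python
import re

# Captures the image name from an ImageDelete event line like the one quoted above.
IMAGE_DELETE = re.compile(r"ImageDelete event &ImageDelete\{Name:([^,}]+)")


def deleted_images(log_lines):
    """Return the image names named in ImageDelete events, in log order."""
    names = []
    for line in log_lines:
        match = IMAGE_DELETE.search(line)
        if match:
            names.append(match.group(1))
    return names


sample = (
    'time="2022-01-28T13:09:59.718387598+01:00" level=info '
    'msg="ImageDelete event &ImageDelete{Name:docker.io/rancher/mirrored-pause:3.6,'
    'XXX_unrecognized:[],}"'
)
print(deleted_images([sample]))  # ['docker.io/rancher/mirrored-pause:3.6']
```

To scan a real log, pass an open file handle for /var/lib/rancher/k3s/agent/containerd/containerd.log instead of the sample list.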

@sdemura

sdemura commented May 17, 2023

Can we re-open this? This is still happening on at least 1.23.14+k3s1.

@rancher-max
Contributor

@sdemura v1.23 is EOL now. If you're able to reproduce this on a more current version (v1.24+), please create a new issue with the details and reference this one.

@k3s-io k3s-io locked and limited conversation to collaborators May 17, 2023