The image of a running container was deleted by the image garbage collection #123631
Comments
/sig node
@tomergayer do you only see this issue with CRI-O? AFAIK, if you are using Docker as your container runtime, Docker will block image deletion for any image that is being used by a running container (at least this is how the Docker CLI behaves; I'm not sure whether Docker's API behaves the same way). So even if the kubelet GC triggers image deletion, Docker will eventually block it if a running container is using that image. Also, what effect did you see on the container that was using this image once the underlying image was deleted? Did the container exit?
@tomergayer this is how containerd prevents the image from being deleted if it is referenced by any running container.
@saschagrunert @haircommander I know that we had some CRI-O garbage collection bugs reported recently. Any ideas on this one?
The issue I am thinking of is cri-o/cri-o#7143, but I'm not sure if this is related.
@kannon92 no, that's not related. This issue is about a race between image GC and container creation.
Hm, CRI-O also complains when an image is in use (as expected): https://github.com/cri-o/cri-o/blob/0f7786ab6b671828dc57d4c16b42dff1f32ec3cf/vendor/github.com/containers/storage/store.go#L2560
If the image is not in use by the runtime, then I can only imagine that the container creation RPC is still in progress but the container has already started; otherwise the creation RPC would fail.
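For illustration, here is a simplified, hypothetical sketch of the kind of in-use check a runtime performs before removing an image (not CRI-O's actual implementation): removal is refused only for images referenced by containers the runtime already knows about, so a container whose creation RPC is still in flight cannot protect its image.

```go
// Hypothetical in-use check, for illustration only.
package main

import "fmt"

type container struct{ id, imageID string }

func removeImage(known []container, imageID string) error {
	for _, c := range known {
		if c.imageID == imageID {
			return fmt.Errorf("image %s is in use by container %s", imageID, c.id)
		}
	}
	fmt.Println("removed", imageID) // actual layer deletion omitted
	return nil
}

func main() {
	// The container that will use "app:v1" is not registered yet, so the
	// removal succeeds even though the image is about to be needed.
	known := []container{{id: "c1", imageID: "db:v2"}}
	_ = removeImage(known, "app:v1")
}
```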
Avoid images being deleted which are still required because a container creation is currently in progress. This fixes a rare race between the image garbage collection and the kuberuntime manager. Fixes kubernetes#123631 Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
/triage accepted
We now avoid removing images when a sandbox or container creation is in progress. This should close the gap in the outlined corner case in: kubernetes/kubernetes#123631 Refers to kubernetes/kubernetes#123711 as well. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
@Shubham1320 you are right that the runtime might block the remove request if the image is in use, but I don't think the kubelet should make any assumption about that.
Hm, it looks like you're referring to the wrong place. Actually, I don't know how containerd behaves when you try to remove an image referenced by a running container; you should take a look here: https://github.com/containerd/containerd/blob/e53663cca75a3dd9c688a65a04f79350d6bb1fbd/internal/cri/server/images/image_remove.go#L36C31-L36C42
@saschagrunert thanks for the PRs you opened, in the kubelet as well as in CRI-O. I have some thoughts about your fix in the kubelet. I wonder about the side effect of using a shared lock: this lock will prevent any container (already scheduled on that node) from starting while the GC is running, and I think that might not be acceptable in some cases (the delay it could add to the GC can also be a problem).
The second approach makes more sense to me, as it prevents any locking or changes in … Please let me know what you think about it.
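To make the shared-lock concern concrete, here is a minimal sketch of the pattern being discussed (hypothetical names, not the actual code from the kubelet PR): while the image GC holds the lock exclusively, every container creation on the node has to wait.

```go
// Minimal illustration of a node-wide shared lock between image GC and
// container creation (hypothetical, for discussion only).
package main

import (
	"fmt"
	"sync"
	"time"
)

var imageGCLock sync.RWMutex

func runImageGC() {
	imageGCLock.Lock() // exclusive: blocks all container creations
	defer imageGCLock.Unlock()
	time.Sleep(2 * time.Second) // GC may be slow on nodes with many images
	fmt.Println("image GC finished")
}

func createContainer(name string) {
	imageGCLock.RLock() // shared: waits whenever the GC holds the lock
	defer imageGCLock.RUnlock()
	fmt.Println("created container", name)
}

func main() {
	go runImageGC()
	time.Sleep(100 * time.Millisecond)
	createContainer("web") // delayed until the GC run completes
}
```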
Yeah locking by container image would be an option for the fix.
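A rough sketch of what per-image locking could look like (hypothetical helper, not kubelet code): removing one image only blocks creations that use that same image, instead of serializing every container start on the node.

```go
// Hypothetical per-image ("keyed") locking sketch.
package main

import (
	"fmt"
	"sync"
	"time"
)

type imageLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

// forImage returns the mutex guarding a single image reference.
func (l *imageLocks) forImage(ref string) *sync.Mutex {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.locks == nil {
		l.locks = map[string]*sync.Mutex{}
	}
	if _, ok := l.locks[ref]; !ok {
		l.locks[ref] = &sync.Mutex{}
	}
	return l.locks[ref]
}

var locks imageLocks

func removeImage(ref string) {
	m := locks.forImage(ref)
	m.Lock()
	defer m.Unlock()
	fmt.Println("removing image", ref)
}

func createContainer(imageRef string) {
	m := locks.forImage(imageRef)
	m.Lock() // waits only if this exact image is being removed
	defer m.Unlock()
	fmt.Println("creating container with", imageRef)
}

func main() {
	go removeImage("registry.example/unused:v1")
	createContainer("registry.example/app:v1") // not blocked by the removal above
	time.Sleep(100 * time.Millisecond)         // let the removal goroutine finish
}
```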
I like that idea, but that would mean we have to integrate it into a feature like …
Hm, unlike the …
I would prefer having it be part of an existing one.
I have some thoughts about the implementation of the idea we discussed. Currently, how often the kubelet checks for changes in node conditions depends on …
@saschagrunert I'd love to hear what you think about it, and whether you think it's worth adding this function, and the changes that would be required for it, to the kubelet for the benefit of this idea.
I think this would be a good way forward. It may have other implications performance-wise, but do you think you could propose that as a draft PR?
Yes, I just created one.
What happened?
The image of the container was deleted while the pod was running.
When I noticed it, almost two weeks had already passed since the pod was created. From Prometheus metrics I could see that, at the time the pod was created, the kubelet performed image garbage collection on the node and images were deleted.
It might be worth mentioning that nothing other than the kubelet could have deleted the image from the node.
This situation happened a while ago, and I couldn't investigate it for long while it was happening, so unfortunately I don't have data to share. But looking at the code, the following scenario seems to make sense:
kubernetes/pkg/kubelet/images/image_gc_manager.go
Lines 232 to 247 in b340ef2
If the imagePullPolicy is IfNotPresent, there is no need to pull and the container starts running; nothing indicates that the image is about to be deleted. Ref:
kubernetes/pkg/kubelet/images/image_manager.go
Lines 126 to 143 in b340ef2
In short, I think that between the time running pods are read from the runtime to determine which images are unused and the time those images are actually deleted (the time between step 2 and step 4), new pods can be scheduled on the node and make use of images that the garbage collection will delete immediately afterwards. Although the chances of this scenario are rare, the window I described becomes bigger when the node holds a lot of images and pods, and the difference between HighThresholdPercent and LowThresholdPercent has an effect as well.
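A minimal sketch of that window, using hypothetical names (this is not the kubelet code, just the shape of the problem): the unused-image list is built from a snapshot of running containers, and a pod that starts after the snapshot but before the deletions is invisible to the GC.

```go
// Illustration of the stale-snapshot window between detection and deletion.
package main

import "fmt"

func main() {
	imagesOnNode := []string{"app:v1", "db:v2"}
	inUseAtSnapshot := map[string]bool{"db:v2": true} // "step 2": containers read from the runtime

	var unused []string
	for _, img := range imagesOnNode {
		if !inUseAtSnapshot[img] {
			unused = append(unused, img)
		}
	}

	// ... a new pod using app:v1 starts here; with imagePullPolicy IfNotPresent
	// no pull happens and nothing updates the GC's stale view ...

	for _, img := range unused { // "step 4": deletion still uses the stale list
		fmt.Println("deleting", img) // removes app:v1 although it is now in use
	}
}
```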
What did you expect to happen?
The image garbage collection should not delete used images.
How can we reproduce it (as minimally and precisely as possible)?
Suppose an unused image X has already been on the node for some time.
The timeline is designed as follows (timeline figure not included here):
Any pod that uses image X and is scheduled on the node within the orange zone of that timeline, i.e. between the GC detecting the unused images and actually deleting them, will cause this issue.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
No response
Install tools
No response
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response