tests: list the current k8s pods #8772
Conversation
Force-pushed from e8120f5 to 7dc061d
Hey @danmihai1 - I might have misunderstood something, but it looks like wait=true is the default value, according to https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#delete ?
Thanks Steve! I didn't notice the Default column in that doc when I looked at it yesterday. It looks like "delete" returns without waiting for some of the K8s YAML files that I tested. Unfortunately --wait=true doesn't change that behavior. For example, my 3 VMs below are still around after deleting my Deployment. I will keep looking for a way to avoid those temporary zombie VMs - unless you already know what the solution is.
This delete behavior is the same with vanilla runc containers too, not just with Kata. My YAML file:
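For illustration only (this is not the YAML from the comment above), a minimal sketch of the delete step being discussed, assuming a hypothetical Deployment named nginx-deployment:

```sh
# Delete a Deployment with the wait flag spelled out explicitly.
# kubectl returns once the Deployment object itself is finalized; the pods it
# owned are deleted asynchronously by the controller, so they (and their VMs,
# with Kata) can still be visible for a while afterwards.
kubectl delete deployment nginx-deployment --wait=true

# The pods may still show up here right after the delete returns:
kubectl get pods
```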
Thanks a lot for the recreate steps; that lets me have a play. I think what I've seen so far is that the problem here is that
I'll have a play and see if I find a nicer way to wait for the pods as well, and if I do, I'll let you know.
So, at first attempt, the only way I can see to wait for the deletion of the pods in a deployment is using the deployment's selector, e.g. the above example has a selector
That's admittedly not super helpful for the generic case, where we don't necessarily know the labels, but from a quick scan through the PR, it looks like most of the cases are directly deleting the pod, which should just wait OK, IIUC?
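A hedged sketch of the selector-based wait being suggested, with a placeholder label app=nginx standing in for whatever the Deployment's .spec.selector.matchLabels actually contains:

```sh
# Delete the Deployment, then wait until every pod matching its selector is gone.
# The label app=nginx and the timeout value are illustrative, not from the PR.
kubectl delete deployment nginx-deployment
kubectl wait --for=delete pod -l app=nginx --timeout=120s
```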
That sounds very promising, thanks! I will give this a try when I can come back to it - hopefully in the next couple of days.
Log the list of the current pods between tests because these pods might be related to cluster nodes occasionally running out of memory. Fixes: kata-containers#8769 Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Force-pushed from 7dc061d to 90c782f
/test
It would be nice to see some evidence that Kata CI is dealing with leaked pods (short-term or longer-term) before making relatively intrusive changes to wait for pod termination. I didn't get such evidence here, so maybe my guess about zombie pods running the node out of memory was incorrect. So, I think we could proceed in one of the following ways:
I vote for (1), but I wouldn't mind (2) either.
I feel like getting more information is a good plan if we don't quite understand what's going wrong. I don't think it's an excessive amount of logging, or harmful, so I'm okay with it.
LGTM - as discussed in the comments. Thanks
Log the list of the current pods between tests because these pods might be related to cluster nodes occasionally running out of memory.
Fixes: #8769
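As a rough illustration of the kind of logging being added (the exact command and helper used by the PR are not shown here), a bash test helper might look something like:

```sh
# Hypothetical helper: dump the current pods between tests so that any leaked
# pods show up in the CI logs. The function name and the --all-namespaces flag
# are assumptions for this sketch, not necessarily what the PR itself uses.
info_about_current_pods() {
    echo "Pods currently running on the cluster:"
    kubectl get pods --all-namespaces -o wide || true
}
```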