flaky test: kill KILL [host pidns] #4163
Comments
Interestingly, I've never seen it, either in CI or locally. I see that you're using cgroupv1 -- any other details that might be relevant for a repro?
e.g. kernel version and distro?
Another flaky failure I saw in GitHub Actions:
Maybe we should wait some time to see
This last one is different from the previous few (it was line 66 before). Now it is this:
Here it is: runc/tests/integration/kill.bats, line 56 in 313ec8b
What's happening here is:
Ah! I think what happens is that we kill init, but its pid is still listed in cgroup.procs.
Now, the older failures (line 66) happen because this code (runc/tests/integration/kill.bats, line 61 in 313ec8b) does not do anything if the container's init is already killed. Again, apparently we read cgroup.procs too fast for the kernel to have updated the list of processes.
For both cases, I can't think of anything but to add a sleep after kill.
Should be fixed by #4179. As much as I don't like adding kludges like this, I can't think of any other way to solve this.
I was wrong. If the pid is known, one can wait in a loop doing
Implemented via kill -0 in a loop to wait for the process to be gone. I think that maybe it's better to implement it in runc (rather than the test case). IOW make
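For readers following along, the "kill -0 in a loop" idea can be sketched roughly as below. This is not the code from the actual fix: the helper name `wait_pid_gone`, the attempt count, and the 0.1 s interval are all made up for illustration, and it assumes a `sleep` that accepts fractional seconds (GNU coreutils and busybox do).

```shell
# Hypothetical helper (not taken from runc): poll until PID $1 is gone.
# $2 is the maximum number of checks (default 50), 0.1 s apart.
# Returns 0 once kill -0 fails, 1 if the process is still there at the end.
wait_pid_gone() {
    _pid="$1"
    _attempts="${2:-50}"
    _i=0
    while [ "$_i" -lt "$_attempts" ]; do
        # kill -0 delivers no signal; it only checks that the PID can be
        # signalled. Any failure is treated as "gone" here, although it
        # could also mean the caller lacks permission (EPERM).
        if ! kill -0 "$_pid" 2>/dev/null; then
            return 0
        fi
        sleep 0.1
        _i=$((_i + 1))
    done
    return 1
}

# Possible use in a test, after the container's init has been killed:
#   wait_pid_gone "$init_pid" || echo "init still alive" >&2
```

Note that `kill -0` keeps succeeding for a zombie until its parent reaps it, so a loop like this waits for the process to be reaped, not only for it to die.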
When I run the integration tests on my local machine, the tests that kill a container with host pidns always fail.