Hung volumes can wedge the kubelet #31272
Gonna fix this one now.
v1.4?
@luxas you betcha, this is muy no bueno imo
Ohh man...
One thing that might be related is during tearDown() for the volume.
Yes, there definitely could be multiple places that are affecting this.
Yeah, I think I want to audit this entire subsystem before we start making code changes
At least that, yes. Not sure if there is more we need to be looking at.
I think it has to be anything that can do a stat.
In general, the kubelet sync loop should run operations asynchronously and not block on them.
@vishh are you saying we should have made all things that can do a stat have a timeout from the get-go? :)
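To make the pattern concrete, here is a minimal sketch (the `statWithTimeout` helper and the path are hypothetical, not the actual kubelet change): run the potentially hanging os.Stat in a separate goroutine and abandon it after a deadline, so a dead NFS mount can't wedge the caller.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// statWithTimeout is a hypothetical wrapper: it runs os.Stat in a
// goroutine and gives up after the deadline. The goroutine may stay
// blocked in the kernel on a hung NFS mount, but the caller (e.g. the
// sync loop) is no longer stuck behind it.
func statWithTimeout(path string, timeout time.Duration) (os.FileInfo, error) {
	type result struct {
		info os.FileInfo
		err  error
	}
	ch := make(chan result, 1) // buffered so the goroutine can always exit
	go func() {
		info, err := os.Stat(path)
		ch <- result{info, err}
	}()
	select {
	case r := <-ch:
		return r.info, r.err
	case <-time.After(timeout):
		return nil, fmt.Errorf("stat %s timed out after %v", path, timeout)
	}
}

func main() {
	if _, err := statWithTimeout("/mnt/nfs-vol", 2*time.Second); err != nil {
		fmt.Println("volume check failed:", err)
	}
}
```

The abandoned goroutine may linger in uninterruptible sleep; a stat against a dead NFS mount cannot be cancelled from userspace, so leaking it is the usual price of this pattern.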
I'm saying that each pod setup should be asynchronous & should not affect other pods.
@vishh so actually -- setup is asynchronous but the syncloop waits for it to complete 😭
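A rough sketch of the non-blocking shape @vishh is describing (hypothetical names, not actual kubelet code): setup runs in a goroutine and publishes completion through shared state, and the loop merely checks that state and requeues the pod instead of waiting on the mount.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// volumesReady stands in for per-pod setup state. The setup goroutine
// flips it when the (possibly hung) mount finishes; the sync loop only
// reads it and never blocks on the mount itself.
var volumesReady atomic.Bool

func setupVolumesAsync() {
	go func() {
		time.Sleep(3 * time.Second) // stand-in for a slow or hung mount
		volumesReady.Store(true)
	}()
}

func main() {
	setupVolumesAsync()
	for tick := 0; tick < 5; tick++ { // stand-in for sync-loop iterations
		if volumesReady.Load() {
			fmt.Println("volumes ready; start containers")
			return
		}
		fmt.Println("volumes not ready; requeue pod and keep syncing other pods")
		time.Sleep(time.Second)
	}
}
```

The volume manager's reconciler is meant to play this decoupling role in the real kubelet; the sketch just compresses the idea to its essentials.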
Scratchpad for call sites that might be problematic:
@pmorie, I didn't find any others.
@pmorie Thanks for jumping on this. Yes, that captures it. The problematic ones are 1 and 2, since those block the volume manager reconciler and the kubelet sync loop. 3 is less of an issue since it only blocks plugin-level operations, which are async.
@saad-ali in this case I feel like (3) is still an issue, even though with the attach / detach refactor, theoretically, |
So, I have some code wherein I introduced timeouts at (1) and (2), and realized I'm not quite certain what the desired behavior is when this happens. The behavior I see with my WIP code is:
I think this is the correct behavior -- does anyone disagree?
What if the NFS server is never restarted, or its exports are changed, or its IP is changed? Will the pod remain stuck forever?
No disagreement. The user's intent is for the pod to go away.
@jeffvance you can force the deletion of the pod using a grace period of 0.
@jeffvance, force delete doesn't go through the kubelet.
This did not make it into 1.7. Pushing to 1.8 |
I was trying to reproduce this and I do not think the original steps work anymore. The steps @sjenning used don't work either.
The main thing is, we do not clean up volumes for pods that have not been deleted from the API server (hence a pod stuck in the Terminating state doesn't count). The e2e tests @jeffvance wrote also don't reproduce this. I am going to keep looking, and it is possible that some other refactoring could bring the issue back, so it is better to safeguard against it. @jingxu97 @sjenning if you can reproduce this somehow, let me know.
[MILESTONENOTIFIER] Milestone Labels Complete
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Today I realized we are seeing the same behavior with the EFS provider (https://github.com/kubernetes-incubator/external-storage/tree/master/aws/efs). EFS is Amazon's NFS implementation; the efs-provider basically creates a directory for each volume you need, and internally it is handled as an NFS PV. The problem is that the efs-provider deletes the volume (i.e., removes a directory on the NFS server) before the kubelet unmounts it, resulting in an error in TearDown:
and resulting in the pods getting stuck in the Terminating state:
As I understand it, the kubelet should be able to tear down an NFS volume that is in the "stale NFS file handle" state.
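A minimal, Linux-only sketch of that expectation (the `tearDownNFS` helper and the path are hypothetical, not the kubelet's actual TearDown): treat ESTALE from the pre-unmount check as non-fatal and fall through to a lazy unmount so the pod can finish terminating.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// tearDownNFS is a hypothetical helper: if the pre-unmount stat fails
// with ESTALE (the server deleted the export behind us), don't abort;
// proceed straight to a lazy unmount instead.
func tearDownNFS(mountPath string) error {
	if _, err := os.Stat(mountPath); err != nil {
		var errno syscall.Errno
		if errors.As(err, &errno) && errno == syscall.ESTALE {
			fmt.Printf("stale NFS handle at %s; unmounting anyway\n", mountPath)
		} else if !os.IsNotExist(err) {
			return fmt.Errorf("stat %s: %w", mountPath, err)
		}
	}
	// MNT_DETACH detaches the mount immediately even if it is busy or
	// the server is gone, so a dead export cannot block teardown.
	return syscall.Unmount(mountPath, syscall.MNT_DETACH)
}

func main() {
	if err := tearDownNFS("/var/lib/kubelet/pods/example/volumes/nfs-vol"); err != nil {
		fmt.Println("teardown failed:", err)
	}
}
```

In practice the stat itself could also hang on a dead server, so it would need the same timeout guard sketched earlier in the thread.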
Now with the StorageProtection beta in 1.10, we won't allow deleting the PVC (and PV) until all Pods using it are terminated. |
Does PVCProtection (alpha in 1.9) provide the same functionality?
Yes, the 1.9 PVCProtection alpha feature is the same.
see kubernetes#31272, kubernetes#101622 Signed-off-by: j4ckstraw <j4ckstraw@foxmail.com>
If you have pods that use something like NFS storage, and the system is unable to read the mounted directory or unmount it, it is possible to completely wedge the kubelet such that it can't successfully run any new pods that use volumes (which is basically all of them, since they use secret tokens) until either the storage issue is resolved or you restart the kubelet.
To reproduce:
kubectl run --rm --attach --restart Never --image busybox bbox date
The busybox pod will be stuck in ContainerCreating with events such as these:
This stack trace, gathered after I deleted the pod, shows that the volume reconciler is still trying to get the volumes for the pod I just deleted. You'll also see a goroutine trying to stop the Docker container, but it is stuck.
This stack trace, gathered after I tried to create the bbox pod, shows that the new pod (bbox) is waiting for its volumes to attach/mount (in this case, secrets).
We've seen this in 1.2.x and I just reproduced it in master (commit f297ea9).
cc @kubernetes/sig-storage @kubernetes/sig-node @kubernetes/rh-cluster-infra @pmorie @derekwaynecarr @timothysc @saad-ali