
Hung volumes can wedge the kubelet #31272

Open
ncdc opened this issue Aug 23, 2016 · 97 comments
Assignees
Labels
  • area/kubelet
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.

Comments

@ncdc
Member

ncdc commented Aug 23, 2016

If you have pods that use something like NFS storage and the system is unable to read or unmount the mounted directory, it is possible to completely wedge the kubelet so that it can't successfully run any new pods that use volumes (which is basically all of them, since they use secret tokens) until either the storage issue is resolved or you restart the kubelet.

To reproduce:

  1. Create a pod that uses an NFS volume
  2. Stop the NFS server
  3. Try to delete the pod
  4. Wait a couple of minutes - you'll see the pod stuck Terminating
  5. Try to create a new pod (kubectl run --rm --attach --restart Never --image busybox bbox date)

The busybox pod will be stuck ContainerCreating with events such as these:

Events:
  FirstSeen     LastSeen        Count   From                    SubobjectPath   Type            Reason          Message
  ---------     --------        -----   ----                    -------------   --------        ------          -------
  8m            8m              1       {default-scheduler }                    Normal          Scheduled       Successfully assigned bbox to 127.0.0.1
  6m            6s              4       {kubelet 127.0.0.1}                     Warning         FailedMount     Unable to mount volumes for pod "bbox_default(12a20cdb-694f-11e6-baa4-001c42e13e5d)": timeout expired waiting for volumes to attach/mount for pod "bbox"/"default". list of unattached/unmounted volumes=[default-token-joiwi]
  6m            6s              4       {kubelet 127.0.0.1}                     Warning         FailedSync      Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "bbox"/"default". list of unattached/unmounted volumes=[default-token-joiwi]

This stack trace, gathered after I deleted the pod, shows that the volume reconciler is still trying to get the volumes for the pod I just deleted. You'll also see a goroutine that is stuck trying to stop the Docker container.

This stack trace, gathered after I tried to create the bbox pod, shows that the new pod (bbox) is waiting for its volumes (in this case, secrets) to attach/mount.

We've seen this in 1.2.x and I just reproduced it in master (commit f297ea9).

cc @kubernetes/sig-storage @kubernetes/sig-node @kubernetes/rh-cluster-infra @pmorie @derekwaynecarr @timothysc @saad-ali

@ncdc ncdc added the sig/storage label Aug 23, 2016
@pmorie
Member

pmorie commented Aug 23, 2016

Gonna fix this one now.

@luxas
Member

luxas commented Aug 23, 2016

v1.4?

@pmorie
Member

pmorie commented Aug 23, 2016

@luxas you betcha, this is muy no bueno imo

@dchen1107 dchen1107 added the kind/bug and priority/important-soon labels Aug 23, 2016
@dchen1107 dchen1107 added this to the v1.4 milestone Aug 23, 2016
@timothysc
Member

Ohh man...

@jingxu97
Contributor

One thing that might be related: during tearDown() for the volume, if IsLikelyNotMountPoint() fails while checking the dir, it fails the whole tearDown() operation, which blocks the following operations on the volume.

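For context, a minimal Go sketch of the failure mode described above, using simplified hypothetical names (the Mounter interface and tearDown signature here are illustrative, not the kubelet's actual code): the mount-point check runs before the unmount, so an error from the check aborts the whole teardown and the volume never gets cleaned up.

package volumesketch

import "os"

// Mounter is a simplified stand-in for the kubelet's mount interface.
type Mounter interface {
    IsLikelyNotMountPoint(dir string) (bool, error)
    Unmount(dir string) error
}

// tearDown illustrates the ordering problem: the mount-point check happens
// first, and any error from it fails the entire teardown.
func tearDown(m Mounter, dir string) error {
    notMnt, err := m.IsLikelyNotMountPoint(dir) // a stat() against a dead NFS server can hang or return an error
    if err != nil {
        return err // teardown aborts here; the unmount below never runs, and later operations on the volume block
    }
    if notMnt {
        return os.Remove(dir)
    }
    if err := m.Unmount(dir); err != nil {
        return err
    }
    return os.Remove(dir)
}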

@ncdc
Member Author

ncdc commented Aug 23, 2016

Yes, there definitely could be multiple places that are affecting this.

@pmorie
Member

pmorie commented Aug 23, 2016

Yeah, I think I want to audit this entire subsystem before we start making code changes

@pmorie
Member

pmorie commented Aug 23, 2016

@ncdc @jingxu97 I think the long and short of this is that we need to audit for anything that tries to read a directory that might be a volume mountpoint and pass a timeout to it. Agree?
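
For illustration, a minimal sketch of one way to bound such a call; statWithTimeout is a hypothetical helper, not existing kubelet API. The filesystem call runs in its own goroutine, which may stay stuck in the syscall against a dead NFS server, but the caller regains control after the timeout.

package fssketch

import (
    "errors"
    "os"
    "time"
)

var errTimeout = errors.New("timed out waiting for filesystem operation")

// statWithTimeout runs os.Stat in a separate goroutine so that a hung network
// filesystem cannot block the caller indefinitely. The goroutine itself may
// remain stuck in the syscall, but the caller is unblocked after the timeout.
func statWithTimeout(path string, timeout time.Duration) (os.FileInfo, error) {
    type result struct {
        info os.FileInfo
        err  error
    }
    ch := make(chan result, 1) // buffered so the goroutine can finish even after the timeout fires
    go func() {
        info, err := os.Stat(path)
        ch <- result{info, err}
    }()
    select {
    case r := <-ch:
        return r.info, r.err
    case <-time.After(timeout):
        return nil, errTimeout
    }
}

The trade-off is a potentially leaked goroutine per hung call, which is why a timeout alone is a mitigation rather than a fix.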

@ncdc
Member Author

ncdc commented Aug 23, 2016

At least that, yes. Not sure if there is more we need to be looking at.


@pmorie
Member

pmorie commented Aug 23, 2016

I think it has to be anything that can do a stat.

@vishh
Contributor

vishh commented Aug 23, 2016

In general, the kubelet sync loop should run operations asynchronously and not make assumptions about libraries unless the library provides an explicit contract.


@pmorie
Member

pmorie commented Aug 23, 2016

@vishh are you saying we should have made all things that can do a stat have a timeout from the get-go? :)

@vishh
Contributor

vishh commented Aug 23, 2016

I'm saying that each pod's setup should be asynchronous and should not affect the sync loop :)
Having a timeout on all kernel syscalls would be utopia :)


@pmorie
Member

pmorie commented Aug 23, 2016

@vishh so actually -- setup is asynchronous, but the sync loop waits for it to complete 😭
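
For illustration, a minimal sketch of the decoupling being described, with hypothetical names (podWorker and syncPod are not the kubelet's actual pod-worker code): the loop records that setup is in flight and moves on instead of waiting for it.

package syncsketch

import (
    "log"
    "sync"
)

// podWorker dispatches volume setup to a per-pod goroutine so that a hung
// mount only blocks that goroutine, never the sync loop itself.
type podWorker struct {
    mu      sync.Mutex
    pending map[string]bool // pods whose volume setup is still in flight
}

// newPodWorker returns a worker with an empty pending set.
func newPodWorker() *podWorker {
    return &podWorker{pending: make(map[string]bool)}
}

// syncPod is called from the sync loop; it never blocks on setupVolumes.
func (w *podWorker) syncPod(podUID string, setupVolumes func() error) {
    w.mu.Lock()
    if w.pending[podUID] {
        w.mu.Unlock()
        return // setup already running for this pod; check again on the next iteration
    }
    w.pending[podUID] = true
    w.mu.Unlock()

    go func() {
        defer func() {
            w.mu.Lock()
            delete(w.pending, podUID) // allow a retry on the next sync iteration
            w.mu.Unlock()
        }()
        if err := setupVolumes(); err != nil { // may hang on a dead NFS mount, but only in this goroutine
            log.Printf("volume setup for pod %s failed: %v", podUID, err)
        }
    }()
}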

@pmorie
Member

pmorie commented Aug 23, 2016

Scratchpad for call sites that might be problematic:

  1. pkg/kubelet/volumemanager/reconciler/reconciler.go#getVolumesFromPodDir - ioutil.ReadDir
  2. pkg/kubelet/kubelet_getters.go#getPodVolumeNameListFromDisk - ioutil.ReadDir,util.ReadDirNoExit
  3. mounter.IsLikelyNotMountPoint implementations

@jingxu97 @saad-ali does that seem accurate?

@jingxu97
Contributor

@pmorie, I didn't find any others.

@saad-ali
Member

Scratchpad for call sites that might be problematic:

  1. pkg/kubelet/volumemanager/reconciler/reconciler.go#getVolumesFromPodDir - ioutil.ReadDir
  2. pkg/kubelet/kubelet_getters.go#getPodVolumeNameListFromDisk - ioutil.ReadDir,util.ReadDirNoExit
  3. mounter.IsLikelyNotMountPoint implementations

@jingxu97 @saad-ali does that seem accurate?

@pmorie Thanks for jumping on this. Yes, that captures it. The problematic ones are 1 and 2, since those block the volume manager reconciler and the kubelet sync loop. 3 is less of an issue since it only blocks plugin-level operations, which are async.

@pmorie
Member

pmorie commented Aug 24, 2016

@saad-ali in this case I feel like (3) is still an issue, even though with the attach/detach refactor, theoretically, IsLikelyNotMountPoint should never be called when a mountpoint already exists. In the case of NFS it is entirely possible to block forever, and not even the insanely high TCP timeout will save you, because the NFS protocol uses UDP by default. So I'd like to be safe. Safety first, after all.

@pmorie
Member

pmorie commented Aug 24, 2016

So, I have some code wherein I introduced timeouts at (1) and (2), and realized I'm not quite certain what the desired behavior is when this happens. The behavior I see with my WIP code is:

  1. Pod with NFS mount is running
  2. Stop NFS server
  3. Delete pod
  4. Pod gets stuck in terminating, timeout errors show in log
  5. Start NFS server
  6. Pod get terminated

I think this is the correct behavior -- does anyone disagree?

@jeffvance
Contributor

What if the NFS server is never restarted, or its exports are changed, or its IP is changed? Will the pod remain stuck forever?

@ncdc
Member Author

ncdc commented Aug 24, 2016

I think this is the correct behavior -- does anyone disagree?

No disagreement. The user's intent is for the pod to go away.

@ncdc
Member Author

ncdc commented Aug 24, 2016

@jeffvance you can force the deletion of the pod using a grace period of 0.

@pmorie
Member

pmorie commented Aug 24, 2016

@jeffvance, force delete doesn't go through the kubelet.

@saad-ali
Member

saad-ali commented Jun 7, 2017

This did not make it into 1.7. Pushing to 1.8

@gnufied
Member

gnufied commented Jun 27, 2017

I was trying to reproduce this and I do not think the original steps work anymore. The following steps, which @sjenning used, also do not work:

  1. start nfs server
  2. start nfs pod
  3. stop nfs server
  4. delete nfs pod
  5. stop kubelet
  6. start kubelet
  7. try to start a pod

The main thing is that we do not clean up volumes for pods that have not been deleted from the API server (so a pod stuck in the "Terminating" state doesn't count). The e2e tests @jeffvance wrote also don't reproduce this.

I am going to keep looking. It is possible that some other refactoring could bring the issue back, so it is better to safeguard against it. @jingxu97 @sjenning, if you can reproduce this somehow, let me know.

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Labels Complete

@ncdc @pmorie

Issue label settings:

  • sig/storage: Issue will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jan 7, 2018
@errordeveloper
Member

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label Jan 22, 2018
@sp-joseluis-ledesma

Today I realized we are seeing the same behavior with the EFS provider (https://github.com/kubernetes-incubator/external-storage/tree/master/aws/efs). EFS is Amazon's NFS implementation; the efs-provider basically creates a directory for each volume you need, and internally it is handled as an NFS PV. The problem is that the efs-provider deletes the volume (that is, it removes a directory on the NFS share) before the kubelet unmounts it, resulting in an error in TearDown:

Jun 15 10:58:16 ip-172-17-96-143 kubelet[2488]: E0615 10:58:16.423138    2488 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/nfs/d2e22f1c-7086-11e8-939c-0a8a94845282-pvc-d25050ba-7086-11e8-b2ec-0e3c06e0fed6\" (\"d2e22f1c-7086-11e8-939c-0a8a94845282\")" failed. No retries permitted until 2018-06-15 11:00:18.423105953 +0000 UTC m=+174041.026910867 (durationBeforeRetry 2m2s). Error: "UnmountVolume.TearDown failed for volume \"dynamodb-pd\" (UniqueName: \"kubernetes.io/nfs/d2e22f1c-7086-11e8-939c-0a8a94845282-pvc-d25050ba-7086-11e8-b2ec-0e3c06e0fed6\") pod \"d2e22f1c-7086-11e8-939c-0a8a94845282\" (UID: \"d2e22f1c-7086-11e8-939c-0a8a94845282\") : Error checking if path exists: stat /var/lib/kubelet/pods/d2e22f1c-7086-11e8-939c-0a8a94845282/volumes/kubernetes.io~nfs/pvc-d25050ba-7086-11e8-b2ec-0e3c06e0fed6: stale NFS file handle"

and resulting in the pods getting stuck in the Terminating state:

NAME                        READY     STATUS        RESTARTS   AGE
dynamodb-5bbcb7b45c-62mtn   0/1       Terminating   0          35m

As I understand it, the kubelet should be able to tear down an NFS volume that is in the "stale NFS file handle" state.
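
For what it's worth, a minimal sketch of one way a path-existence check could treat a stale handle as "the mount point is still there", so teardown proceeds to the unmount instead of aborting; pathExistsIgnoringStale is a hypothetical helper, not the kubelet's actual check.

package stalesketch

import (
    "errors"
    "os"
    "syscall"
)

// pathExistsIgnoringStale treats ESTALE ("stale NFS file handle") from stat
// as "the mount point exists", so the caller can go on to unmount it rather
// than failing the teardown.
func pathExistsIgnoringStale(path string) (bool, error) {
    _, err := os.Stat(path)
    if err == nil {
        return true, nil
    }
    if os.IsNotExist(err) {
        return false, nil
    }
    if errors.Is(err, syscall.ESTALE) {
        return true, nil // the server-side object is gone, but the local mount point remains and should be unmounted
    }
    return false, err
}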

@msau42
Member

msau42 commented Jun 19, 2018

Now with the StorageProtection beta in 1.10, we won't allow deleting the PVC (and PV) until all Pods using it are terminated.

@sp-joseluis-ledesma

Does PVCProtection (alpha in 1.9) provide the same functionality?

@msau42
Member

msau42 commented Jun 25, 2018

Yes, the PVCProtection alpha feature in 1.9 provides the same functionality.

j4ckstraw added a commit to j4ckstraw/kubernetes that referenced this issue Aug 15, 2023
see kubernetes#31272, kubernetes#101622

Signed-off-by: j4ckstraw <j4ckstraw@foxmail.com>