Storage: devicePath is empty while WaitForAttach in StatefulSets #67342
Is this a BUG REPORT or FEATURE REQUEST?:
This is a BUG REPORT.

What happened:
I created a StatefulSet whose pods mount AWS EBS volumes through volumeClaimTemplates. When one of its pods is deleted and recreated on the same node, the volume mount sometimes fails with `WaitForAttach failed ... devicePath is empty` and the pod cannot start.
When that happens I can get it working by deleting the pod again; sometimes the error shows up once more after that, but usually not three times in a row.
The main issue is that if you don't act on it manually, the failed mount keeps being reconciled by the kubelet indefinitely and the pod never starts.
The issue seems to be on the kubelet side.
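For context, the error string appears to come from the devicePath check in the EBS attacher's `WaitForAttach`: if the devicePath recorded in the node status is empty, there is nothing to poll for on the node, so it fails immediately and the reconciler just retries. A simplified sketch of that check (not the verbatim kubelet source; the volume ID is made up):

```go
// Simplified illustration of the check behind the "devicePath is empty"
// error. Names and structure are approximations, not actual kubelet code.
package main

import (
	"fmt"
	"time"
)

// waitForAttach mimics the EBS attacher: without a devicePath from the
// attach/detach controller it cannot even start polling for the block
// device, so it fails right away.
func waitForAttach(volumeID, devicePath string, timeout time.Duration) (string, error) {
	if devicePath == "" {
		return "", fmt.Errorf("WaitForAttach failed for AWS Volume %q: devicePath is empty", volumeID)
	}
	// Normally this would poll until the block device appears at devicePath
	// or the timeout expires; that part is elided here.
	return devicePath, nil
}

func main() {
	// Hypothetical volume ID, for illustration only.
	_, err := waitForAttach("aws://eu-west-1a/vol-0123456789abcdef0", "", 10*time.Minute)
	fmt.Println(err) // the error the stuck pods keep reporting
}
```

This is why the pod never recovers on its own: every reconcile pass hits the same empty devicePath and fails in the same way.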
What you expected to happen:
The pod comes back with no error.
How to reproduce it (as minimally and precisely as possible):
Create the StatefulSet below, then delete one of its pods so that it gets recreated. At this point one of two things can happen: the pod comes back up normally, or it gets stuck on the `devicePath is empty` error.
I haven't found a reliable way to force one outcome or the other; it seems to be random, but I'm sure it only happens when the pod is recreated on the same node.
Statefulset to reproduce
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: repro-devicepath
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myrepro
  namespace: repro-devicepath
  labels:
    component: myrepro
spec:
  serviceName: myrepro
  selector:
    matchLabels:
      component: myrepro
  replicas: 4
  template:
    metadata:
      name: myrepro
      labels:
        component: myrepro
    spec:
      containers:
        - name: myrepro
          image: docker.io/fntlnz/caturday:latest
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        namespace: repro-devicepath
        name: data
      spec:
        storageClassName: ebs-1
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
```
Anything else we need to know?:
When this happens, if one looks at the volumes the node reports as attached:

```
kubectl get node -o json | jq '.items[].status.volumesAttached'
```
I found some other users on Slack that have this problem; @wirewc sent me this (note the empty `devicePath`):
I'm hitting this on a production cluster running 1.10.3, but I get the same error on a testing cluster running 1.11.
FWIW - devicePath being empty for iSCSI is expected. iSCSI does not perform a "real" attach/detach, so naturally there is no devicePath.
We need to see the corresponding entry for EBS when that happens. iSCSI could be a red herring.
@gnufied it just happened again, and I noted that when this error occurs the volume is marked as attached on the AWS side.
The device is also not listed in the mount table of any process on the node.
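For anyone who wants to double-check that on their own node, here is a small sketch (a hypothetical helper, not part of any tooling mentioned in this thread) that scans `/proc/mounts` for a given device:

```go
// checkmount: report whether a block device still appears in the mount
// table. Run on the node, e.g.: go run checkmount.go /dev/xvdbq
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s <device>", os.Args[0])
	}
	dev := os.Args[1]

	f, err := os.Open("/proc/mounts")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	found := false
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// /proc/mounts lines are: device mountpoint fstype options ...
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == dev {
			fmt.Printf("%s is still mounted at %s\n", dev, fields[1])
			found = true
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
	if !found {
		fmt.Printf("%s is not present in /proc/mounts\n", dev)
	}
}
```

Note that this only inspects the mount namespace it runs in; mounts held inside other mount namespaces (e.g. containers) won't show up here.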
The only relevant log I see in the kubelet is:
I'm starting to think that the problem here is caused by a dirty unmount rather than a bad mount.
Hi, we're also facing the same issue. We can confirm that the mount is left over from the previous pod. The difference in our scenario is that we were trying to upgrade from 1.11.1 to 1.11.2; we initially thought it had something to do with the versions. Here is our hypothesis:
When the pod gets deleted the first time, it leaves the mount behind.
When the scheduler puts the pod back on the same node, the kubelet tries to mount the volume again. But the stale mount from the previous pod is still there, so the new mount fails.
It makes me wonder why the kubelet does not clean up the stale mount before retrying.
Looks like the kube e2e tests are also running into this.
I ran the kubelet with --v=10 in my test cluster; in our scenario (with intermittent long write operations on the disk) the bug triggers about 10% of the time. By analyzing the logs I found the cause of the problem, and I succeeded in reproducing it by adding a new unit test that proves the bug exists.
The cause is that with AWS EBS, `attacher.UnmountDevice` sometimes completes slowly (10s or more). `UnmountDevice` runs asynchronously, and while it is still in flight the same volume can be picked up again.
This causes parts of `reconciler.reconcile` to not run as expected.
I'm glad to contribute code and want to help. BTW, I think the reconciler lifecycle is complex; I've spent a long time searching for PRs/issues about the reconciler lifecycle but didn't find any, so I don't know how to fix this correctly. Maybe we should ensure that `MarkVolumeAsDetached` is only executed after `UnmountDevice` has finished.
P.S. English is not my mother tongue; please excuse any errors on my part. If I haven't been clear, please see WPH95#1 or mention me :)
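To make the suspected interleaving concrete, here is a toy model of the ordering described above. Nothing in it is actual kubelet code; the `volumeState` type, the device name, and the timings are all made up for illustration:

```go
// Toy model of the suspected race: a slow, asynchronous UnmountDevice marks
// the volume detached *after* the volume has already been handed to the new
// pod, wiping the recorded devicePath. Illustration only, not kubelet code.
package main

import (
	"fmt"
	"sync"
	"time"
)

type volumeState struct {
	mu         sync.Mutex
	attached   bool
	devicePath string
}

// markDetached is what the tail end of the unmount effectively does.
func (v *volumeState) markDetached() {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.attached = false
	v.devicePath = ""
}

// reuse models the new pod's reconcile pass: the volume still looks
// attached, so the already-recorded devicePath is reused as-is.
func (v *volumeState) reuse() string {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.devicePath
}

func main() {
	v := &volumeState{attached: true, devicePath: "/dev/xvdbq"} // hypothetical device

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // slow asynchronous UnmountDevice (10s+ on EBS)
		defer wg.Done()
		time.Sleep(50 * time.Millisecond) // stands in for the slow unmount
		v.markDetached()                  // lands *after* the reuse below
	}()

	path := v.reuse() // new pod grabs the volume while unmount is in flight
	wg.Wait()

	fmt.Printf("new pod saw devicePath %q; recorded devicePath is now %q\n",
		path, v.devicePath)
}
```

With this interleaving, the recorded devicePath ends up empty, so the next `WaitForAttach` for the volume sees `devicePath == ""` and fails, matching the error in this issue. Making `MarkVolumeAsDetached` run strictly after `UnmountDevice` completes, as suggested above, would rule this ordering out.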
We see the same issue, in particular when the same volume is repeatedly mounted and unmounted on the same node. Here are some logs showing a successful mount, an unmount, and then a failed mount with an empty device path:
I see the same issue from time to time on one of my OpenStack k8s clusters (v1.11.3).
The cluster has only one worker node. When the error occurs, the node shows the respective volume as attached:
I hit this error in one pod of a StatefulSet on k8s v1.11.5:
When I look at the node, I see the devicePath looks normal:
After deleting the pod, the problem resolved itself.