Problem rescheduling POD with GCE PD disk attached #14642

Closed
rvrignaud opened this Issue Sep 28, 2015 · 37 comments


Hello,

I'm using a GKE (1.0.6) cluster. Today, for a yet unknown reason, a node rebooted. This node had a pod with a GCE PD attached, scheduled by an RC with only one replica.
When the node rebooted, the pod was rescheduled onto another node. However, for some reason, the PD was not detached from the old node.
As a result, Kubernetes tried multiple times to attach the disk to the new node, and I got a lot of errors in the GCE Operations Dashboard:

RESOURCE_IN_USE_BY_ANOTHER_RESOURCE: The disk resource 'projects/projectid/zones/europe-west1-c/disks/diskname' is already being used by 'projects/projectid/zones/europe-west1-c/instances/gke-nodename' 

In the end, the pod is in Waiting state with this reason:

Image: gcr.io/projectid/imagename:imagetag is not ready on the node

(which is, IMHO, not the right error message)

And events:

  Mon, 28 Sep 2015 07:09:34 +0200   Mon, 28 Sep 2015 10:50:26 +0200 124 {kubelet nodename}          failedMount Unable to mount volumes for pod "podname": Could not attach GCE PD "diskname". Timeout waiting for mount paths to be created.
  Mon, 28 Sep 2015 07:09:34 +0200   Mon, 28 Sep 2015 10:50:26 +0200 124 {kubelet nodename}          failedSync  Error syncing pod, skipping: Could not attach GCE PD "diskname". Timeout waiting for mount paths to be created.

As this is not a critical service, I'm happy to leave it in this state for a few days if that can help with debugging.
Is there anything else I could provide to help understand the problem?

saad-ali (Member) commented Oct 5, 2015

The sync loop in kubelet is responsible for detecting and unmounting/detaching PDs that are no longer being referenced by any pod.

The problem here is that when the node goes down (is rebooted), this logic does not get a chance to run, so the PD remains attached to the node.

The Kubernetes scheduler reschedules the pod that was previously running on the downed node to another node. The new node realizes that it needs to attach the PD for the pending pod. This process then fails because the PD is already attached to the previous node (GCE PDs can only be attached to one machine in RW mode).

Potential solutions:

  • If we detect that a GCE attach call is failing because the PD is attached to a different node, add logic to automatically detach the disk from the other node so that attachment will succeed
    • Problems with this approach: could end up with a race between machines trying to attach/detach the same PD from each other.
  • Force some sort of garbage collection that automatically detaches all disks when kubelet is killed.
    • Problems with this approach: how would this work?
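
One way to confirm this diagnosis from the GCE side is to check the disk's users field, which will still list the old node while the pod is stuck. A minimal sketch, using the disk and zone names from the original report as placeholders:

    # Show which instance(s) the PD is currently attached to.
    gcloud compute disks describe diskname \
        --zone europe-west1-c \
        --format='value(users)'
    # While stuck, this still prints .../zones/europe-west1-c/instances/gke-nodename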

saad-ali added the sig/storage label Oct 5, 2015

thockin (Member) commented Oct 6, 2015

If/When the "dead" kubelet comes back, won't it unmount the PD? This is far from ideal, obviously, but GCE VMs don't normally "die" and not come back. But they can if there's disk corruption or something, so we'll need to handle it.

Saad and I brainstormed a bit today - look for a design doc soon.


chrislovecnm (Member) commented Oct 6, 2015

Hey all. Appreciate you guys focusing on this! We are a Google Cloud customer, and we have had production downtime because of this bug. We are working now with support, and have verified this. Let us know if we can provide any information.

Once we have a tested patch, we would like to roll it into production.

Thanks again, don't want to have this kill a node again.


chrislovecnm (Member) commented Oct 6, 2015

Oh and I guess I cannot type on my phone without typos today


chrislovecnm (Member) commented Oct 12, 2015

@fgrzadkowski and other google gurus. Any update on this?


saad-ali (Member) commented Oct 12, 2015

@chrislovecnm I am going to send out a design proposal for this, since it is a major change (not a trivial bug fix). It is unlikely to make it into our v1.1 release, however it'll be my top priority thereafter.


chrislovecnm (Member) commented Oct 13, 2015

@saad-ali that is an awesome update. Peeps here will be happy!! We are adding monitoring to ensure that we catch this if we have another problem like this. Let me know when you think you will have a time estimate ... I know I know ... my bosses are asking ...


saad-ali (Member) commented Oct 16, 2015

@chrislovecnm I can't comment on an ETA, but I will say it's high pri post v1.1.


saad-ali (Member) commented Oct 27, 2015

While we work on a permanent fix, for folks running into this issue, here are a couple workarounds:

  1. Do nothing, wait for the downed node to come back up.
    • Once the original node is rebooted and kubelet comes up on it, it will realize that the pod/volume are no longer scheduled on it and will detach the PD. The other node will then be able to attach it.
    • This assumes that the original, downed node will come back up.
  2. Manually detach the disk.
    • Manually detach the disk from the downed node using the gcloud API (CLI, REST, or web UI); this will allow the other node to attach it (see the sketch below).
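
A minimal sketch of workaround 2 from the CLI, using the node, disk, and zone names from the original report as placeholders:

    # Detach the PD from the downed node so the new node can attach it.
    gcloud compute instances detach-disk gke-nodename \
        --disk diskname \
        --zone europe-west1-c

    # Then watch the pending pod; it should mount the volume and start.
    kubectl get pods -w
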
rvrignaud commented Oct 28, 2015

@saad-ali in my case the node came back up and kubelet did not detach the PD even though the pod was rescheduled on another node. I had to detach it manually.


paralin (Contributor) commented Nov 21, 2015

Just ran into this.


chrislovecnm (Member) commented Nov 21, 2015

@saad-ali your release is out. What is the status on releasing this?


saad-ali (Member) commented Dec 2, 2015

@chrislovecnm Yes v1.1 is out. I'm working on the detailed design for this at the moment. I can't comment on a release timeline.


MaxDaten commented Dec 2, 2015

Is this issue related to my problem with RCs, or do I have a fundamental misunderstanding of PVs & PVCs with RCs? Or is my problem different and a new issue?
https://github.com/MaxDaten/k8s-pvc-rc

Edit:

Okay I found this: #4052 with reference to https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/user-guide/volumes.md#gcepersistentdisk

 Unfortunately, PDs can only be mounted by a single consumer in read-write mode - no simultaneous readers allowed.

I wasn't aware of this.


chrislovecnm (Member) commented Dec 2, 2015

@saad-ali thanks ... went through an update on the production cluster, and this HAMMERED us again ... :(


saad-ali (Member) commented Dec 2, 2015

@chrislovecnm Can you send an email to me (email address in profile). I'd like to better understand why you guys are running into this so often, and if there's anything we can do in the immediate term to work around it. Thanks!


chrislovecnm (Member) commented Dec 3, 2015

@saad-ali it was during a gcloud update command :) We are not planning on running the update again any time soon.


housebolt commented Dec 15, 2015

+1, this has been a persistent problem with our internal apps and has made me very wary of moving any of our critical production workloads to kube/GC. During cluster updates or after a node goes down, I have no guarantee that PDs will attach correctly, and that leaves us open to an unacceptable level of potential downtime.


johanhaleby commented Feb 10, 2016

We're also experiencing this problem:

 FirstSeen  LastSeen    Count   From                    SubobjectPath   Reason      Message
  ─────────   ────────    ───── ────                    ───────────── ──────      ───────
  1h        1m      78  {kubelet gke-xxxxxxx-node-ky8x}         FailedMount Unable to mount volumes for pod "some-pod-124-noyvz_default": Could not attach GCE PD "mongodb-disk". Timeout waiting for mount paths to be created.
  1h        1m      78  {kubelet gke-yyyyyyy-117b8ac5-node-ky8x}            FailedSync  Error syncing pod, skipping: Could not attach GCE PD "mongodb-disk". Timeout waiting for mount paths to be created.

The disk seems to be fine though (gcloud compute disks list):

NAME                        ZONE           SIZE_GB TYPE        STATUS
mongodb-disk            europe-west1-c 1       pd-standard READY

saad-ali (Member) commented Feb 11, 2016

We're working on a fix. It is targeted for the next minor release v1.3.0.


johanhaleby commented Feb 11, 2016

Nice to see that this is being worked on. But what do we do in the meantime? We have a disk containing data for MongoDB that we can't seem to mount. I've tried deleting the pods and the replication controller, but whatever I try I can't get the disk to mount. Luckily this happened in our test environment, but we're running this in production as well, and if it happens there it's quite bad.


spark2ignite commented Feb 11, 2016

@saad-ali isn't the timeframe for v1.3.0 about 5 months away?


alexcouper commented Feb 11, 2016

@johanhaleby I see you're running on GKE. When I've had this issue in the past, it was because the disk was mounted on a different node in the cluster than the one the pod was booting on.
I went into the Google dev console, unmounted it manually from the node, then recreated the RC for the pod, and all worked.

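In case it's useful, a sketch of that recovery sequence (the RC name and manifest file are hypothetical; the disk is detached first via the console or gcloud):

    # 1. Detach the disk from the old node (console, or gcloud compute instances detach-disk).
    # 2. Recreate the RC so the pod is rescheduled and can attach the disk.
    kubectl delete rc mongodb-rc           # hypothetical RC name
    kubectl create -f mongodb-rc.yaml      # hypothetical manifest file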

saad-ali (Member) commented Feb 11, 2016

what do we do in the meantime?

@johanhaleby What @alexcouper recommended: detach the disk manually via the gcloud API, CLI, or web UI; that should unclog the pipes.

@saad-ali isn't the timeframe for v1.3.0 at least 5 months away?

The next minor release is likely 2-5 months out. That said, if we have something before then that unblocks folks, we'll patch it back into an intermediate 1.2.x release, so it may be sooner than that--but no promises.


johanhaleby commented Feb 12, 2016

@saad-ali @alexcouper Thanks, that seems to work.


chrislovecnm (Member) commented Mar 2, 2016

@saad-ali ouch man ... 2-5 months? This has been open a long time. I assume this problem has caused a major rewrite? This is still on my oh-crap tracking list with C-levels; please let us know when you have a closer ETA. We have not hit it again, but I am waiting for the next time it nips us ... Having MySQL not restart gracefully is ZERO fun.


saad-ali (Member) commented Mar 2, 2016

I hear ya @chrislovecnm. Hang tight! We're working on #21931. It'll be part of 1.2.x and will greatly reduce the bite of this issue by gracefully handling volumes on restart. And the new controller shouldn't be too much longer either (i.e. before the 1.3 release).


wangjia184 commented Mar 4, 2016

+1

gcloud compute instances detach-disk NODE --disk DISK


bussyjd (Contributor) commented Mar 8, 2016

I ran into this problem; my case might not be related, but here it is anyway:
I was migrating my DB to a new cluster located in another region, and solved it by moving the volume to the same region as the new k8s cluster:
gcloud compute disks move [disk] --destination-zone [zone]
Hope that helps.


saad-ali (Member) commented Mar 10, 2016

@bussyjd That's a different issue. GCE requires that a PD must be in the same zone as the instance that will use it.


discostur commented Apr 21, 2016

I think I ran into the same issue trying to deploy a pod with a GCE PD and a replication controller set to replicas=2.

According to the documentation, that should be possible:

Using a PD on a pod controlled by a ReplicationController will fail unless the PD is read-only or the replica count is 0 or 1.

So I set up a new cluster with one minion and Kubernetes 1.2.2. Then I created a PD, a PersistentVolume, a PersistentVolumeClaim, and my ReplicationController:


apiVersion: v1
kind: PersistentVolume
metadata:
  name: volume-nginx-data-disk
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: nginx-data-disk
    fsType: ext4

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: claim-nginx-data-disk
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi

apiVersion: v1
kind: ReplicationController
metadata:
  name: sslproxy-rc
  labels:
    name: sslproxy-rc
spec:
  replicas: 2
  selector:
    name: sslproxy-rc
  template:
    metadata:
      labels:
        name: sslproxy-rc
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - name: nginx-ssl
              containerPort: 443
          volumeMounts:
            - name: nginx-data-disk
              mountPath: /etc/nginx/conf.d
      volumes:
        - name: nginx-data-disk
          persistentVolumeClaim:
            claimName: claim-nginx-data-disk
            readOnly: true


So everything should be mounted in "readOnly" mode. When I try to deploy my RC, only the first pod is created; the second is always stuck with status "Pending".

kubectl describe pod xyz:

Unable to mount volumes for pod "sslproxy-rc-zxe52_default(a532eb8f-07f2-11e6-8707-42010af00108)": Could not attach GCE PD "nginx-data-disk". Timeout waiting for mount paths to be created.

{kubelet gke-sslproxy-cluster1-default-pool-4091b8f3-wys3} Warning FailedSync Error syncing pod, skipping: Could not attach GCE PD "nginx-data-disk". Timeout waiting for mount paths to be created.

kubelet.log:

GCE operation failed: googleapi: Error 400: The disk resource 'nginx-data-disk' is already being used by 'gke-sslproxy-cluster1-default-pool-4091b8f3-wys3'
gce_util.go:187] Error attaching PD "nginx-data-disk": googleapi: Error 400: The disk resource 'nginx-data-disk' is already being used by 'gke-sslproxy-cluster1-default-pool-4091b8f3-wys3'

I think that is because of #21931:

When two or more pods specify the same volume (allowed for some plugins in certain cases), the second pod will fail to start because the volume attach call for the 2nd pod will continuously fail since the volume is already attached (by the first pod).

Kilian

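If it helps with debugging this scenario, one way to see how the disk actually got attached to the node (including whether it ended up attached READ_WRITE rather than READ_ONLY) is to inspect the instance's attached disks; a sketch, assuming the node name from the logs above and a placeholder zone:

    # List the disks attached to the node, with the mode of each attachment.
    gcloud compute instances describe gke-sslproxy-cluster1-default-pool-4091b8f3-wys3 \
        --zone ZONE \
        --format='yaml(disks)'
    # Each entry shows deviceName, mode (READ_WRITE or READ_ONLY), and source.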

saad-ali (Member) commented Apr 21, 2016

@discostur That's a known symptom of the same issue.


ScyDev commented May 20, 2016

We've also experienced this multiple times. In our case it wasn't a new node having a problem because the disk hadn't been properly detached from another node; the node had existed for weeks. It was as if the node was complaining that it couldn't attach the disk because it was already attached to itself.

Stackdriver log for node gke-test-cluster-default-pool-a7afbbdb-i0sb

11:28:01.000
GCE operation failed: googleapi: Error 400: The disk resource 'test-mongo-disk-test' is already being used by 'gke-test-cluster-default-pool-a7afbbdb-i0sb'
{
  metadata: {…}   
  insertId: "2016-05-20|02:28:03.111393-07|10.194.248.228|101325354"   
  log: "kubelet"   
  structPayload: {…}   
}

Manually detaching the disk in question helps: https://cloud.google.com/sdk/gcloud/reference/compute/instances/detach-disk

After that the node can reattach the disk properly and the pod can start again.


saad-ali (Member) commented May 20, 2016

@ScyDev That's #19953. It should be fixed at the same time as this.


chrislovecnm (Member) commented Jun 1, 2016

@saad-ali is this getting fixed in 1.3?? What is going on with this never ending issue 😄


saad-ali (Member) commented Jun 1, 2016

It is indeed, my friend: #25457
Hang tight =)


k8s-merge-robot added a commit that referenced this issue Jun 3, 2016

Merge pull request #26351 from saad-ali/attachDetachControllerKubeletChanges

Automatic merge from submit-queue

Attach/Detach Controller Kubelet Changes

This PR contains changes to enable attach/detach controller proposed in #20262.

Specifically it:
* Introduces a new `enable-controller-attach-detach` kubelet flag to enable control by attach/detach controller. Default enabled.
* Removes all references to the `SafeToDetach` annotation from the controller.
* Adds the new `VolumesInUse` field to the Node Status API object.
* Modifies the controller to use `VolumesInUse` instead of `SafeToDetach` annotation to gate detachment.
* Modifies kubelet to set `VolumesInUse` before Mount and after Unmount.
  * There is a bug in the `node-problem-detector` binary that causes `VolumesInUse` to get reset to nil every 30 seconds. Issue kubernetes/node-problem-detector#9 (comment) opened to fix that.
  * There is a bug here in the mount/unmount code that prevents resetting `VolumesInUse` in some cases; this will be fixed by the mount/unmount refactor.
* Have controller process detaches before attaches so that volumes referenced by pods that are rescheduled to a different node are detached first.
* Fix misc bugs in controller.
* Modify GCE attacher to: remove retries, remove mutex, and not fail if volume is already attached or already detached.

Fixes #14642, #19953

```release-note
Kubernetes v1.3 introduces a new Attach/Detach Controller. This controller manages attaching and detaching volumes on-behalf of nodes that have the "volumes.kubernetes.io/controller-managed-attach-detach" annotation.

A kubelet flag, "enable-controller-attach-detach" (default true), controls whether a node sets the "controller-managed-attach-detach" or not.
```
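
For anyone trying out a build with this change, a quick way to check whether a node is using the controller-managed path, and which volumes it reports in use, is via the annotation and the new VolumesInUse status field described above; a sketch with a placeholder node name:

    # Check whether the node opted in to controller-managed attach/detach.
    kubectl get node gke-nodename \
        -o jsonpath='{.metadata.annotations.volumes\.kubernetes\.io/controller-managed-attach-detach}'

    # List the volumes the node currently reports as in use.
    kubectl get node gke-nodename -o jsonpath='{.status.volumesInUse}'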