Volume is not detached/attached to a new node when pod is scheduled to other node #28671

Closed
chao007 opened this issue Jul 8, 2016 · 8 comments



chao007 commented Jul 8, 2016

Version-Release number of selected component (if applicable):
Server Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-alpha.0.1310+8741217179860e", GitCommit:"8741217179860e9f0ce37997c810f61170a3672a", GitTreeState:"clean", BuildDate:"2016-07-07T06:54:04Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

Steps to Reproduce:
1. Install Kubernetes with one master and two nodes.
2. Create a pod using a replication controller:
apiVersion: v1
kind: ReplicationController
metadata:
  name: chaoyangwildfly-rc
  labels:
    name: chaoyangwildfly
    context: docker-k8s-lab
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: chaoyangwildfly
    spec:
      containers:
      - name: chaoyangwildfly-rc-pod
        image: jhou/hello-openshift
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: html-volume
          mountPath: "/usr/share/nginx/html"
      volumes:
      - name: html-volume
        awsElasticBlockStore:
          volumeID: aws://us-east-1d/vol-dde44879
          fsType: ext4
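To create the controller, the spec above can be saved to a file and submitted with kubectl (the filename here is only illustrative):

# create the replication controller from the spec above
kubectl create -f chaoyangwildfly-rc.yaml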
3. Check the pod status:
[root@ip-172-18-5-143 ~]# kubectl describe pods chaoyangwildfly-rc-ccx51
Name:           chaoyangwildfly-rc-ccx51
Namespace:      default
Node:           ip-172-18-0-61.ec2.internal/172.18.0.61
Start Time:     Fri, 08 Jul 2016 02:56:00 -0400
Labels:         name=chaoyangwildfly
Status:         Running
IP:             172.16.73.10
Controllers:    ReplicationController/chaoyangwildfly-rc
Containers:
  chaoyangwildfly-rc-pod:
    Container ID:       docker://0bb522c3d8305c13decef15c5624637bb3824021be78ae75b086be1560369817
    Image:              jhou/hello-openshift
    Image ID:           docker://sha256:3642a95271f490f9d618e29128a089ebaaf58f8f3f4e556c02660b54ebb881fd
    Port:               8080/TCP
    QoS Tier:
      memory:   BestEffort
      cpu:      BestEffort
    State:      Running
      Started:  Fri, 08 Jul 2016 02:56:08 -0400
    Ready:      True
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  html-volume:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-dde44879
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
  default-token-vs6y1:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-vs6y1
Events:
  FirstSeen  LastSeen  Count  From                                   SubobjectPath                            Type    Reason     Message
  5m         5m        1      {default-scheduler }                                                            Normal  Scheduled  Successfully assigned chaoyangwildfly-rc-ccx51 to ip-172-18-0-61.ec2.internal
  5m         5m        1      {kubelet ip-172-18-0-61.ec2.internal}  spec.containers{chaoyangwildfly-rc-pod}  Normal  Pulling    pulling image "jhou/hello-openshift"
  5m         5m        1      {kubelet ip-172-18-0-61.ec2.internal}  spec.containers{chaoyangwildfly-rc-pod}  Normal  Pulled     Successfully pulled image "jhou/hello-openshift"
  5m         5m        1      {kubelet ip-172-18-0-61.ec2.internal}  spec.containers{chaoyangwildfly-rc-pod}  Normal  Created    Created container with docker id 0bb522c3d830
  5m         5m        1      {kubelet ip-172-18-0-61.ec2.internal}  spec.containers{chaoyangwildfly-rc-pod}  Normal  Started    Started container with docker id 0bb522c3d830

4. After the pod is running, stop the kubelet service on node ip-172-18-0-61.ec2.internal.
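On the node itself, assuming the kubelet runs under systemd on this host, that amounts to:

# run on ip-172-18-0-61.ec2.internal
systemctl stop kubelet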
5. A new pod is scheduled to the other node:
[root@ip-172-18-5-143 ~]# kubectl describe pods chaoyangwildfly-rc-hn9kn
Name:           chaoyangwildfly-rc-hn9kn
Namespace:      default
Node:           ip-172-18-9-229.ec2.internal/172.18.9.229
Start Time:     Fri, 08 Jul 2016 03:02:31 -0400
Labels:         name=chaoyangwildfly
Status:         Pending
IP:
Controllers:    ReplicationController/chaoyangwildfly-rc
Containers:
  chaoyangwildfly-rc-pod:
    Container ID:
    Image:      jhou/hello-openshift
    Image ID:
    Port:       8080/TCP
    QoS Tier:
      cpu:      BestEffort
      memory:   BestEffort
    State:      Waiting
      Reason:   ContainerCreating
    Ready:      False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  html-volume:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-dde44879
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
  default-token-vs6y1:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-vs6y1
Events:
  FirstSeen  LastSeen  Count  From                                    SubobjectPath  Type     Reason       Message
  4m         4m        1      {default-scheduler }                                   Normal   Scheduled    Successfully assigned chaoyangwildfly-rc-hn9kn to ip-172-18-9-229.ec2.internal
  2m         1s        2      {kubelet ip-172-18-9-229.ec2.internal}                 Warning  FailedMount  Unable to mount volumes for pod "chaoyangwildfly-rc-hn9kn_default(f059eb80-44d9-11e6-ba51-0ecfeba772c9)": timeout expired waiting for volumes to attach/mount for pod "chaoyangwildfly-rc-hn9kn"/"default". list of unattached/unmounted volumes=[html-volume]
  2m         1s        2      {kubelet ip-172-18-9-229.ec2.internal}                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "chaoyangwildfly-rc-hn9kn"/"default". list of unattached/unmounted volumes=[html-volume]

6. From the AWS web console, I found volume vol-dde44879 is still attached to node ip-172-18-0-61.ec2.internal.
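The same attachment state can also be checked from the AWS CLI, for example:

# query the attachment state of the volume used by the pod
aws ec2 describe-volumes --volume-ids vol-dde44879 \
    --query 'Volumes[0].Attachments[].[InstanceId,State,Device]' \
    --output table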

dhawal55 commented Jul 8, 2016

Duplicate of #28643


chao007 commented Jul 11, 2016

This doesn't seem to be the same issue. I can mount the EBS volume to the node; the failure happens when the pod is rescheduled from one node to another, because the EBS volume is not detached from the first node.


rootfs commented Jul 26, 2016

@chao007 do you have kubelet logs from both nodes?
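For a systemd-managed kubelet, one way to capture them on each node (depending on how logging is configured in your deployment):

# dump the kubelet journal to a file for attaching to the issue
journalctl -u kubelet --no-pager > kubelet-$(hostname).log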


eparis commented Aug 15, 2016

@kubernetes/rh-storage @kubernetes/sig-storage

saad-ali commented

@chao007 Also, how long did you wait? The master expects a kubelet to be down for 5 min 40 sec before it considers the node down and evicts the pods that were scheduled to it. After the pods are evicted (and rescheduled to another node), the master waits another 6 min before it unilaterally detaches volumes from the downed node. So from the time you kill the kubelet to the time the master begins to detach, you're looking at 11 min 40 sec, plus the time to detach from the first node and attach to the new node.
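For reference, the 5 min 40 sec figure corresponds to the default kube-controller-manager settings; a sketch, assuming defaults (your deployment may override these flags):

# node is marked NotReady after --node-monitor-grace-period of missed heartbeats,
# then its pods are evicted after a further --pod-eviction-timeout
kube-controller-manager \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m0s

The additional 6 min force-detach wait (maxWaitForUnmountDuration in the attach/detach controller) is hard-coded in this release rather than flag-configurable.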


chao007 commented Aug 16, 2016

@saad-ali I don't remember exactly how long I waited.
I retried this on OpenShift today, following the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1335293#c6.
This time, the EBS volume was detached and attached to another node.
The OpenShift version is:
openshift v3.3.0.19
kubernetes v1.3.0+507d3a7

[root@ip-172-18-3-41 ~]# oc get pods recreate-example-1-9ce1q -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP         NODE
recreate-example-1-9ce1q   1/1       Running   0          40s       10.1.0.6   ip-172-18-11-49.ec2.internal

After re-deploying the pod:
[root@ip-172-18-3-41 ~]# oc get pods recreate-example-2-j6z26 -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP         NODE
recreate-example-2-j6z26   1/1       Running   0          48s       10.1.1.3   ip-172-18-3-41.ec2.internal

saad-ali commented

@chao007 If you run into this again, please share your /var/log/kubelet.log files from your node and /var/log/kube-controller-manager.log files from your master. That'll help us better debug the issue you experienced.
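When collecting those logs, grepping for the volume ID is a quick way to surface the relevant attach/detach activity, e.g.:

# on the master, using the log path mentioned above
grep -i 'vol-dde44879' /var/log/kube-controller-manager.log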


chao007 commented Aug 17, 2016

Thanks @saad-ali. Closing this since I could not reproduce it right now.
