Volume is not detached/attached to a new node when pod is scheduled to other node #28671

Closed
chao007 opened this issue Jul 8, 2016 · 8 comments



chao007 commented Jul 8, 2016

Version-Release number of selected component (if applicable):
Server Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-alpha.0.1310+8741217179860e", GitCommit:"8741217179860e9f0ce37997c810f61170a3672a", GitTreeState:"clean", BuildDate:"2016-07-07T06:54:04Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

Steps to Reproduce:
1. Install Kubernetes with one master and two nodes.
2. Create a pod using a replication controller:
apiVersion: v1
kind: ReplicationController
metadata:
  name: chaoyangwildfly-rc
  labels:
    name: chaoyangwildfly
    context: docker-k8s-lab
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: chaoyangwildfly
    spec:
      containers:
      - name: chaoyangwildfly-rc-pod
        image: jhou/hello-openshift
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: html-volume
          mountPath: "/usr/share/nginx/html"
      volumes:
      - name: html-volume
        awsElasticBlockStore:
          volumeID: aws://us-east-1d/vol-dde44879
          fsType: ext4
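To create the controller, the spec above can be saved to a file and submitted with kubectl (the filename here is only illustrative):

# create the replication controller from the spec above
kubectl create -f chaoyangwildfly-rc.yaml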
3. Check the pod status:
[root@ip-172-18-5-143 ~]# kubectl describe pods chaoyangwildfly-rc-ccx51
Name:           chaoyangwildfly-rc-ccx51
Namespace:      default
Node:           ip-172-18-0-61.ec2.internal/172.18.0.61
Start Time:     Fri, 08 Jul 2016 02:56:00 -0400
Labels:         name=chaoyangwildfly
Status:         Running
IP:             172.16.73.10
Controllers:    ReplicationController/chaoyangwildfly-rc
Containers:
  chaoyangwildfly-rc-pod:
    Container ID:       docker://0bb522c3d8305c13decef15c5624637bb3824021be78ae75b086be1560369817
    Image:              jhou/hello-openshift
    Image ID:           docker://sha256:3642a95271f490f9d618e29128a089ebaaf58f8f3f4e556c02660b54ebb881fd
    Port:               8080/TCP
    QoS Tier:
      memory:   BestEffort
      cpu:      BestEffort
    State:      Running
      Started:  Fri, 08 Jul 2016 02:56:08 -0400
    Ready:      True
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  html-volume:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-dde44879
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
  default-token-vs6y1:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-vs6y1
Events:
  FirstSeen  LastSeen  Count  From                                   SubobjectPath                            Type    Reason     Message
  5m         5m        1      {default-scheduler }                                                            Normal  Scheduled  Successfully assigned chaoyangwildfly-rc-ccx51 to ip-172-18-0-61.ec2.internal
  5m         5m        1      {kubelet ip-172-18-0-61.ec2.internal}  spec.containers{chaoyangwildfly-rc-pod}  Normal  Pulling    pulling image "jhou/hello-openshift"
  5m         5m        1      {kubelet ip-172-18-0-61.ec2.internal}  spec.containers{chaoyangwildfly-rc-pod}  Normal  Pulled     Successfully pulled image "jhou/hello-openshift"
  5m         5m        1      {kubelet ip-172-18-0-61.ec2.internal}  spec.containers{chaoyangwildfly-rc-pod}  Normal  Created    Created container with docker id 0bb522c3d830
  5m         5m        1      {kubelet ip-172-18-0-61.ec2.internal}  spec.containers{chaoyangwildfly-rc-pod}  Normal  Started    Started container with docker id 0bb522c3d830

4. After the pod is running, stop the kubelet service on node ip-172-18-0-61.ec2.internal.
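On the node itself, assuming the kubelet runs under systemd on this host, that amounts to:

# run on ip-172-18-0-61.ec2.internal
systemctl stop kubelet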
5. A new pod is scheduled to the other node:
[root@ip-172-18-5-143 ~]# kubectl describe pods chaoyangwildfly-rc-hn9kn
Name:           chaoyangwildfly-rc-hn9kn
Namespace:      default
Node:           ip-172-18-9-229.ec2.internal/172.18.9.229
Start Time:     Fri, 08 Jul 2016 03:02:31 -0400
Labels:         name=chaoyangwildfly
Status:         Pending
IP:
Controllers:    ReplicationController/chaoyangwildfly-rc
Containers:
  chaoyangwildfly-rc-pod:
    Container ID:
    Image:      jhou/hello-openshift
    Image ID:
    Port:       8080/TCP
    QoS Tier:
      cpu:      BestEffort
      memory:   BestEffort
    State:      Waiting
      Reason:   ContainerCreating
    Ready:      False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  html-volume:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-dde44879
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
  default-token-vs6y1:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-vs6y1
Events:
  FirstSeen  LastSeen  Count  From                                    SubobjectPath  Type     Reason       Message
  4m         4m        1      {default-scheduler }                                   Normal   Scheduled    Successfully assigned chaoyangwildfly-rc-hn9kn to ip-172-18-9-229.ec2.internal
  2m         1s        2      {kubelet ip-172-18-9-229.ec2.internal}                 Warning  FailedMount  Unable to mount volumes for pod "chaoyangwildfly-rc-hn9kn_default(f059eb80-44d9-11e6-ba51-0ecfeba772c9)": timeout expired waiting for volumes to attach/mount for pod "chaoyangwildfly-rc-hn9kn"/"default". list of unattached/unmounted volumes=[html-volume]
  2m         1s        2      {kubelet ip-172-18-9-229.ec2.internal}                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "chaoyangwildfly-rc-hn9kn"/"default". list of unattached/unmounted volumes=[html-volume]

6. From the AWS web console, I found volume vol-dde44879 is still attached to node ip-172-18-0-61.ec2.internal.
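The same attachment state can also be checked from the AWS CLI, for example:

# query the attachment state of the volume used by the pod
aws ec2 describe-volumes --volume-ids vol-dde44879 \
    --query 'Volumes[0].Attachments[].[InstanceId,State,Device]' \
    --output table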

dhawal55 commented Jul 8, 2016

Duplicate of #28643


chao007 commented Jul 11, 2016

This doesn't seem to be the same issue. I can mount the EBS volume to the node; the failure happens when the pod is rescheduled from one node to another, because the EBS volume is not detached from the first node.


rootfs commented Jul 26, 2016

@chao007 do you have kubelet logs from both nodes?
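For a systemd-managed kubelet, one way to capture them on each node (depending on how logging is configured in your deployment):

# dump the kubelet journal to a file for attaching to the issue
journalctl -u kubelet --no-pager > kubelet-$(hostname).log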


eparis commented Aug 15, 2016

@kubernetes/rh-storage @kubernetes/sig-storage

saad-ali commented

@chao007 Also, how long did you wait? The master expects a kubelet to be down for 5 min 40 sec before it considers the node down and evicts the pods that were scheduled to it. After the pods are evicted (and rescheduled to another node), the master waits another 6 min before it unilaterally detaches volumes from the downed node. So from the time you kill the kubelet to the time the master begins to detach, you're looking at 11 min 40 sec, plus the time to detach from the first node and attach to the new node.
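For reference, the 5 min 40 sec figure corresponds to the default kube-controller-manager settings; a sketch, assuming defaults (your deployment may override these flags):

# node is marked NotReady after --node-monitor-grace-period of missed heartbeats,
# then its pods are evicted after a further --pod-eviction-timeout
kube-controller-manager \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m0s

The additional 6 min force-detach wait (maxWaitForUnmountDuration in the attach/detach controller) is hard-coded in this release rather than flag-configurable.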


chao007 commented Aug 16, 2016

@saad-ali I don't remember exactly how long I waited.
I retried this on OpenShift today, following the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1335293#c6.
This time, the EBS volume was detached and attached to another node.
The OpenShift version is:
openshift v3.3.0.19
kubernetes v1.3.0+507d3a7

[root@ip-172-18-3-41 ~]# oc get pods recreate-example-1-9ce1q -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP         NODE
recreate-example-1-9ce1q   1/1       Running   0          40s       10.1.0.6   ip-172-18-11-49.ec2.internal

After re-deploying the pod:
[root@ip-172-18-3-41 ~]# oc get pods recreate-example-2-j6z26 -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP         NODE
recreate-example-2-j6z26   1/1       Running   0          48s       10.1.1.3   ip-172-18-3-41.ec2.internal

saad-ali commented

@chao007 If you run into this again, please share your /var/log/kubelet.log files from your node and /var/log/kube-controller-manager.log files from your master. That'll help us better debug the issue you experienced.
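When collecting those logs, grepping for the volume ID is a quick way to surface the relevant attach/detach activity, e.g.:

# on the master, using the log path mentioned above
grep -i 'vol-dde44879' /var/log/kube-controller-manager.log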


chao007 commented Aug 17, 2016

Thanks @saad-ali. Closing this since I could not reproduce it right now.
