Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

E2E test to verify pod failover during node power-off #54509

Merged

Conversation

shaominchen
Copy link

@shaominchen shaominchen commented Oct 24, 2017

What this PR does / why we need it:

This PR adds test to verify volume status after the node where the pod got provisioned being powered off and failed over to a different node.

Test performs following tasks:

  1. Create a StorageClass
  2. Create a PVC with the StorageClass
  3. Create a Deployment with 1 replica, using the PVC
  4. Verify the pod got provisioned on a node
  5. Verify the volume is attached to the node
  6. Power off the node where pod got provisioned
  7. Verify the pod got provisioned on a different node
  8. Verify the volume is attached to the new node
  9. Verify the volume is detached from the previous node
  10. Power on the previous node
  11. Delete the Deployment
  12. Delete the PVC
  13. Delete the StorageClass

Which issue this PR fixes:

Fixes vmware-archive#272

Special notes for your reviewer:

Test logs:

# go run hack/e2e.go --check-version-skew=false --v --test --test_args='--ginkgo.focus=Node\sPoweroff'
flag provided but not defined: -check-version-skew
Usage of /tmp/go-build212295472/command-line-arguments/_obj/exe/e2e:
  -get
                go get -u kubetest if old or not installed (default true)
  -old duration
                Consider kubetest old if it exceeds this (default 24h0m0s)
2017/10/24 11:48:28 e2e.go:55: NOTICE: go run hack/e2e.go is now a shim for test-infra/kubetest
2017/10/24 11:48:28 e2e.go:56:   Usage: go run hack/e2e.go [--get=true] [--old=24h0m0s] -- [KUBETEST_ARGS]
2017/10/24 11:48:28 e2e.go:57:   The separator is required to use --get or --old flags
2017/10/24 11:48:28 e2e.go:58:   The -- flag separator also suppresses this message
2017/10/24 11:48:28 e2e.go:77: Calling kubetest --check-version-skew=false --v --test --test_args=--ginkgo.focus=Node\sPoweroff...
2017/10/24 11:48:28 util.go:154: Running: ./cluster/kubectl.sh --match-server-version=false version
2017/10/24 11:48:28 util.go:156: Step './cluster/kubectl.sh --match-server-version=false version' finished in 350.700421ms
2017/10/24 11:48:28 util.go:154: Running: ./hack/e2e-internal/e2e-status.sh
Skeleton Provider: prepare-e2e not implemented
Client Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.0-alpha.1.1627+54fc02df4a3a2a", GitCommit:"54fc02df4a3a2a12e14fb72d84a1aaa658ba6689", GitTreeState:"clean", BuildDate:"2017-10-24T18:33:37Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.0-alpha.1.1437+ba66fcb63de9e9", GitCommit:"ba66fcb63de9e9b72e2ccf8b823df33a22df0522", GitTreeState:"clean", BuildDate:"2017-10-20T07:16:05Z", GoVersion:"go1.9.1", Compiler:"gc", Platform:"linux/amd64"}
2017/10/24 11:48:28 util.go:156: Step './hack/e2e-internal/e2e-status.sh' finished in 315.334518ms
2017/10/24 11:48:28 util.go:154: Running: ./hack/ginkgo-e2e.sh --ginkgo.focus=Node\sPoweroff
Conformance test: not doing test setup.
Oct 24 11:48:30.391: INFO: Overriding default scale value of zero to 1
Oct 24 11:48:30.391: INFO: Overriding default milliseconds value of zero to 5000
I1024 11:48:30.637436     409 e2e.go:378] Starting e2e run "ed9fdfc7-b8eb-11e7-a595-0050569c26b8" on Ginkgo node 1
Running Suite: Kubernetes e2e suite
===================================
Random Seed: 1508870909 - Will randomize all specs
Will run 1 of 717 specs
 
Oct 24 11:48:30.678: INFO: >>> kubeConfig: /root/.kube/config
Oct 24 11:48:30.685: INFO: Waiting up to 4h0m0s for all (but 0) nodes to be schedulable
Oct 24 11:48:30.719: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Oct 24 11:48:30.857: INFO: 17 / 17 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Oct 24 11:48:30.857: INFO: expected 4 pod replicas in namespace 'kube-system', 4 are Running and Ready.
Oct 24 11:48:30.863: INFO: Waiting for pods to enter Success, but no pods in "kube-system" match label map[name:e2e-image-puller]
Oct 24 11:48:30.863: INFO: Dumping network health container logs from all nodes...
Oct 24 11:48:30.877: INFO: Client version: v1.9.0-alpha.1.1627+54fc02df4a3a2a
Oct 24 11:48:30.879: INFO: Server version: v1.9.0-alpha.1.1437+ba66fcb63de9e9
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
[sig-storage] Node Poweroff [Feature:vsphere] [Slow] [Disruptive]
  verify volume status after node power off
  /root/divyenp/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/storage/vsphere_volume_node_poweroff.go:149
[BeforeEach] [sig-storage] Node Poweroff [Feature:vsphere] [Slow] [Disruptive]
  /root/divyenp/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:133
STEP: Creating a kubernetes client
Oct 24 11:48:30.885: INFO: >>> kubeConfig: /root/.kube/config
STEP: Building a namespace api object
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-storage] Node Poweroff [Feature:vsphere] [Slow] [Disruptive]
  /root/divyenp/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/storage/vsphere_volume_node_poweroff.go:64
Oct 24 11:48:30.984: INFO: Waiting up to 4h0m0s for all (but 0) nodes to be schedulable
[It] verify volume status after node power off
  /root/divyenp/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/storage/vsphere_volume_node_poweroff.go:149
STEP: Creating a Storage Class
STEP: Creating PVC using the Storage Class
STEP: Waiting for PVC to be in bound phase
Oct 24 11:48:31.141: INFO: Waiting up to 5m0s for PersistentVolumeClaim pvc-zxz56 to have phase Bound
Oct 24 11:48:31.150: INFO: PersistentVolumeClaim pvc-zxz56 found but phase is Pending instead of Bound.
Oct 24 11:48:33.155: INFO: PersistentVolumeClaim pvc-zxz56 found and phase=Bound (2.013403698s)
STEP: Creating a Deployment
I1024 11:48:33.180161     409 deployment_util.go:254] Waiting deployment "deployment-ef6b820e-b8eb-11e7-a595-0050569c26b8" to complete
Oct 24 11:48:33.192: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:0, Replicas:0, UpdatedReplicas:0, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:0, Conditions:[]v1beta1.DeploymentCondition(nil), CollisionCount:(*int32)(nil)}
Oct 24 11:48:35.197: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:37.197: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:39.196: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:41.197: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:43.197: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:45.197: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:47.198: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:49.198: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:51.196: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
Oct 24 11:48:53.197: INFO: deployment status: v1beta1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1beta1.DeploymentCondition{v1beta1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, LastTransitionTime:v1.Time{Time:time.Time{sec:63644467713, nsec:0, loc:(*time.Location)(0x5db10c0)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}}, CollisionCount:(*int32)(nil)}
STEP: Get pod from the deployement
STEP: Verify disk is attached to the node: kubernetes-node5
STEP: Power off the node: kubernetes-node5
Oct 24 11:49:07.337: INFO: Waiting for pod to be failed over from "kubernetes-node5"
Oct 24 11:49:17.336: INFO: Waiting for pod to be failed over from "kubernetes-node5"
Oct 24 11:49:27.340: INFO: Waiting for pod to be failed over from "kubernetes-node5"
Oct 24 11:49:37.340: INFO: The pod has been failed over from "kubernetes-node5" to "kubernetes-node7"
STEP: Waiting for disk to be attached to the new node: kubernetes-node7
Oct 24 11:49:47.534: INFO: Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" has successfully attached to "kubernetes-node7".
STEP: Waiting for disk to be detached from the previous node: kubernetes-node5
Oct 24 11:49:57.707: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:50:07.702: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:50:17.710: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:50:27.733: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:50:37.713: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:50:47.723: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:50:57.705: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:51:07.710: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:51:17.719: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:51:27.716: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:51:37.717: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:51:47.712: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:51:57.707: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:52:07.724: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:52:17.716: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:52:27.711: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:52:37.716: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:52:47.709: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:52:57.714: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:53:07.715: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:53:17.711: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:53:27.714: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:53:37.713: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:53:47.705: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:53:57.711: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:54:07.712: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:54:17.705: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:54:27.712: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:54:37.707: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:54:47.698: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:54:57.705: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:55:07.711: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:55:17.699: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:55:27.702: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:55:37.704: INFO: Waiting for Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" to detach from "kubernetes-node5".
Oct 24 11:55:47.703: INFO: Volume "[vsanDatastore] 165fee59-08d0-b42e-e5c4-020047ab7bb1/kubernetes-dynamic-pvc-ee347cf2-b8eb-11e7-8558-005056a2ed7b.vmdk" has successfully detached from "kubernetes-node5".
STEP: Power on the previous node: kubernetes-node5
Oct 24 11:55:49.168: INFO: Deleting PersistentVolumeClaim "pvc-zxz56"
[AfterEach] [sig-storage] Node Poweroff [Feature:vsphere] [Slow] [Disruptive]
  /root/divyenp/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:134
Oct 24 11:55:49.253: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-node-poweroff-l245b" for this suite.
Oct 24 11:55:57.630: INFO: namespace: e2e-tests-node-poweroff-l245b, resource: bindings, ignored listing per whitelist
Oct 24 11:55:57.643: INFO: namespace e2e-tests-node-poweroff-l245b deletion completed in 8.379395732s
 
• [SLOW TEST:446.758 seconds]
[sig-storage] Node Poweroff [Feature:vsphere] [Slow] [Disruptive]
/root/divyenp/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/storage/framework.go:22
  verify volume status after node power off
  /root/divyenp/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/storage/vsphere_volume_node_poweroff.go:149
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSOct 24 11:55:57.647: INFO: Running AfterSuite actions on all node
Oct 24 11:55:57.647: INFO: Running AfterSuite actions on node 1
 
Ran 1 of 717 Specs in 446.969 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 716 Skipped PASS
 
Ginkgo ran 1 suite in 7m27.797177022s
Test Suite Passed
2017/10/24 11:55:57 util.go:156: Step './hack/ginkgo-e2e.sh --ginkgo.focus=Node\sPoweroff' finished in 7m28.760818768s
2017/10/24 11:55:57 e2e.go:81: Done

VMware Reviewers: @divyenpatel @pshahzeb

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 24, 2017
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Oct 24, 2017
@shaominchen
Copy link
Author

Signed cncf-cla.

@bowei
Copy link
Member

bowei commented Oct 24, 2017

/assign @saad-ali this looks to be storage related
/unassign

@k8s-ci-robot k8s-ci-robot assigned saad-ali and unassigned bowei Oct 24, 2017
@divyenpatel
Copy link
Member

@jeffvance Can you help review this PR?

@dims
Copy link
Member

dims commented Oct 26, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 26, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 26, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 26, 2017
@shaominchen
Copy link
Author

@deads2k @saad-ali Can you help review this PR? Thanks!

if err != nil {
return nil, fmt.Errorf("deployment %q Create API error: %v", deploymentSpec.Name, err)
}
glog.Infof(fmt.Sprintf("Waiting deployment %q to complete", deploymentSpec.Name))
Copy link
Contributor

@jeffvance jeffvance Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to be using this vs. framework.Logf, which is the typical e2e log func?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right - we should use framework.Logf here for consistency. Will fix.


if currentState == expectedState {
if expectedState == volumeStateDetached {
framework.Logf("Volume %q has successfully detached from %q.", volumePath, nodeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A state map could simplify this and below. Eg:

var attachedState = map[bool]int{
            true: volumeStateAttached,
            false: volumeStateDetached,
}
var attachedStateMsgs = map[int]string{
            volumeStateAttached: "attached to",
            volumeStateDetached : "detached from",
}
...
// in wait.Poll func...
...
       currentState := attachedState[diskAttached]
       if currentState == expectedState {
             framework.Logf("Volume %q has successfully %s %q", volumePath, attachedStateMsgs[currentState], nodeName)
             return true, nil
       }
      framework.Logf("Waiting for Volume %q to %s %q.", volumePath, attachedStateMsgs[currentState], nodeName)
      return false, nil
})

return fmt.Errorf("Gave up waiting for Volume %q to detach from %q after %v", volumePath, nodeName, detachTimeout)

if currentState != expectedState {
if expectedState == volumeStateDetached {
Copy link
Contributor

@jeffvance jeffvance Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continuing...

if currentState != expectedState {
      err = fmt.Errorf("Gave up waiting for Volume %q to %s %q after %v", volumePath, attachedStateMsgs[expectedState], nodeName, timeout)
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/wait"
clientset "k8s.io/client-go/kubernetes"
vsphere "k8s.io/kubernetes/pkg/cloudprovider/providers/vsphere"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is "vsphere" alias needed here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

vsp, err = vsphere.GetVSphere()
Expect(err).NotTo(HaveOccurred())
workingDir = os.Getenv("VSPHERE_WORKING_DIR")
Expect(workingDir).NotTo(BeEmpty())
Copy link
Contributor

@jeffvance jeffvance Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any sanity check on the value of VSPHERE_WORKING_DIR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's nice to add some sanity check, but it's not really straightforward to verify the validity of this value. So I'd prefer to keep it as is for now.


By("Creating a Deployment")
deployment, err := framework.CreateDeployment(client, int32(1), map[string]string{"test": "app"}, namespace, pvclaims, "")
defer client.Extensions().Deployments(namespace).Delete(deployment.Name, &metav1.DeleteOptions{})
Copy link
Contributor

@jeffvance jeffvance Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see all the defers

f := find.NewFinder(govMoMiClient.Client, true)
ctx, _ := context.WithCancel(context.Background())

vmPath := workingDir + string(node1)
Copy link
Contributor

@jeffvance jeffvance Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if supplied workingDir does not end in a "/"? You may want to use filepath.Join here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will fix.

})

if err != nil {
framework.Logf("Pod did not fail over from %q with error: %v", oldNode, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the timeout value logged too? Can be useful in debugging in case 3m is later deemed too short...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a log for timeout.

node2, err := waitForPodToFailover(client, deployment, node1)
Expect(err).NotTo(HaveOccurred(), "Pod did not fail over to a different node")

By(fmt.Sprintf("Waiting for disk to be attached to the new node: %v", node2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why wait for attached before waiting for detach?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disk will be attached to new node almost immediately, but it takes a few minutes to detach from the previous node. The attach doesn't really depend on detach here.

Expect(err).NotTo(HaveOccurred(), "Disk is not attached to the node")

By(fmt.Sprintf("Waiting for disk to be detached from the previous node: %v", node1))
err = waitForVSphereDiskToDetach(vsp, volumePath, node1)
Copy link
Contributor

@jeffvance jeffvance Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this works ok even when node1 has powered off?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this check doesn't communicate with the node which has been powered off.

Expect(err).NotTo(HaveOccurred(), "Disk is not detached from the node")

By(fmt.Sprintf("Power on the previous node: %v", node1))
vm.PowerOn(ctx)
Copy link
Contributor

@jeffvance jeffvance Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this interfere in any way with the defer'd poweron call which will be unconditionally executed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't happen. If this operation gets executed, it means all test steps have passed smoothly. Then the defer'd poweron call will be just a no-op which will return successfully.

We add this step just to make this test case more robust - make sure the node gets powered on before running other tests.

@shaominchen
Copy link
Author

/retest

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2017
@deads2k
Copy link
Contributor

deads2k commented Nov 9, 2017

/unassign

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2017
@shaominchen
Copy link
Author

@jeffvance Can I get a ship-it for this PR?

@jeffvance
Copy link
Contributor

@shaominchen It looks like you need the CLA from the linux foundation.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2017
@shaominchen
Copy link
Author

/assign @saad-ali

Hi Saad, I have already signed up for the CLA from the linux foundation. I have no idea why it still shows "cncf-cla: no" label here. Could you let me know what I can do on my side?

@shaominchen
Copy link
Author

/test pull-kubernetes-node-e2e

@saad-ali
Copy link
Member

/approve

@saad-ali
Copy link
Member

Looks like you authored the commits in this PR with the email address shchen@shchen-m02.vmware.com which is different than the email address you signed the CLA with. Please update the commits to match the email address you signed the CLA for.

@saad-ali
Copy link
Member

/approve no-issue

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2017
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 14, 2017
@shaominchen
Copy link
Author

@saad-ali Thanks for pointing it out! I've set up my git user.email properly and re-pushed the commit (also squashed 2 commits into 1).

@jeffvance
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jeffvance, saad-ali, shaominchen

Associated issue requirement bypassed by: saad-ali

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@krzyzacy
Copy link
Member

/retest

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit ea66c00 into kubernetes:master Nov 14, 2017
@shaominchen shaominchen deleted the node_poweroff_test branch November 14, 2017 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

E2E Test - Node power off test
10 participants