Controller-manager/scheduler/rancher-kubernetes-agent containers are not able to start successfully in a different host when the host on which they are running is powered down. #4354

Closed
sangeethah opened this issue Apr 9, 2016 · 3 comments
Labels: area/kubernetes, kind/bug

Comments

@sangeethah
Contributor

Rancher Version: v1.0.1-rc1

Created a "kubernetes" environment.
Added 2 DigitalOcean machine hosts.

Once the Kubernetes stack had launched successfully on the hosts, added a few services.

Powered down the host where the Controller-manager/scheduler/rancher-kubernetes-agent containers were running.

These containers were marked as "unhealthy" and an attempt was made to start them on the other, still-running host. But the containers are stuck in "starting" state forever (they keep restarting):

| 1803 | Kubernetes_controller-manager_1                                | starting | updating-healthy | 2016-04-08 23:39:04 | NULL                |          11 |
| 1804 | Kubernetes_scheduler_1                                         | starting | updating-healthy | 2016-04-08 23:39:19 | NULL                |           1 |
| 1805 | Kubernetes_rancher-kubernetes-agent_1                          | starting | updating-healthy | 2016-04-08 23:39:19 | NULL                |           2 |
+------+----------------------------------------------------------------+----------+------------------+---------------------+---------------------+-------------+

Controller-manager logs:


4/8/2016 4:39:08 PM I0408 23:39:08.186267       1 replication_controller.go:208] Starting RC Manager
4/8/2016 4:39:08 PM I0408 23:39:08.464352       1 nodecontroller.go:143] Sending events to api server.
4/8/2016 4:39:08 PM I0408 23:39:08.470031       1 controllermanager.go:229] allocate-node-cidrs set to false, node controller not creating routes
4/8/2016 4:39:38 PM E0408 23:39:38.474858       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:39:38 PM E0408 23:39:38.475857       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:40:09 PM E0408 23:40:09.482658       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:40:13 PM E0408 23:40:13.495145       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:40:39 PM E0408 23:40:39.502791       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:40:39 PM F0408 23:40:39.503422       1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
4/8/2016 4:41:06 PM I0408 23:41:06.940177       1 replication_controller.go:208] Starting RC Manager
4/8/2016 4:41:07 PM I0408 23:41:07.204863       1 nodecontroller.go:143] Sending events to api server.
4/8/2016 4:41:07 PM I0408 23:41:07.211202       1 controllermanager.go:229] allocate-node-cidrs set to false, node controller not creating routes
4/8/2016 4:41:37 PM E0408 23:41:37.217587       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:41:37 PM E0408 23:41:37.221275       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:42:08 PM E0408 23:42:08.242383       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:42:12 PM E0408 23:42:12.220978       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:42:38 PM E0408 23:42:38.261289       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:42:38 PM F0408 23:42:38.261476       1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
4/8/2016 4:42:39 PM I0408 23:42:39.902177       1 replication_controller.go:208] Starting RC Manager
4/8/2016 4:42:40 PM I0408 23:42:40.190930       1 nodecontroller.go:143] Sending events to api server.
4/8/2016 4:42:40 PM I0408 23:42:40.192213       1 controllermanager.go:229] allocate-node-cidrs set to false, node controller not creating routes
4/8/2016 4:43:10 PM E0408 23:43:10.196704       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:43:10 PM E0408 23:43:10.197343       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:43:41 PM E0408 23:43:41.224404       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:43:45 PM E0408 23:43:45.200446       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:44:11 PM E0408 23:44:11.243485       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:44:11 PM F0408 23:44:11.243620       1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
4/8/2016 4:44:37 PM I0408 23:44:37.586751       1 replication_controller.go:208] Starting RC Manager
4/8/2016 4:44:38 PM I0408 23:44:38.002178       1 nodecontroller.go:143] Sending events to api server.
4/8/2016 4:44:38 PM I0408 23:44:38.008582       1 controllermanager.go:229] allocate-node-cidrs set to false, node controller not creating routes
4/8/2016 4:45:08 PM E0408 23:45:08.015835       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:45:08 PM E0408 23:45:08.016570       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:45:39 PM E0408 23:45:39.038441       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:45:43 PM E0408 23:45:43.019459       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:46:09 PM E0408 23:46:09.057808       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:46:09 PM F0408 23:46:09.057945       1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
4/8/2016 4:46:10 PM I0408 23:46:10.938089       1 replication_controller.go:208] Starting RC Manager
4/8/2016 4:46:11 PM I0408 23:46:11.221720       1 nodecontroller.go:143] Sending events to api server.
4/8/2016 4:46:11 PM I0408 23:46:11.223016       1 controllermanager.go:229] allocate-node-cidrs set to false, node controller not creating routes
4/8/2016 4:46:41 PM E0408 23:46:41.227634       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:46:41 PM E0408 23:46:41.228253       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:47:12 PM E0408 23:47:12.249490       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:47:16 PM E0408 23:47:16.231779       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:47:42 PM E0408 23:47:42.268763       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:47:42 PM F0408 23:47:42.268908       1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
4/8/2016 4:48:22 PM I0408 23:48:22.250512       1 replication_controller.go:208] Starting RC Manager
4/8/2016 4:48:22 PM I0408 23:48:22.515237       1 nodecontroller.go:143] Sending events to api server.
4/8/2016 4:48:22 PM I0408 23:48:22.516536       1 controllermanager.go:229] allocate-node-cidrs set to false, node controller not creating routes
4/8/2016 4:48:52 PM E0408 23:48:52.521146       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:48:52 PM E0408 23:48:52.521874       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:49:23 PM E0408 23:49:23.544650       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:49:27 PM E0408 23:49:27.524645       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:49:53 PM E0408 23:49:53.566534       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:49:53 PM F0408 23:49:53.566721       1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
4/8/2016 4:49:55 PM I0408 23:49:55.570340       1 replication_controller.go:208] Starting RC Manager
4/8/2016 4:49:55 PM I0408 23:49:55.984078       1 nodecontroller.go:143] Sending events to api server.
4/8/2016 4:49:55 PM I0408 23:49:55.985310       1 controllermanager.go:229] allocate-node-cidrs set to false, node controller not creating routes
4/8/2016 4:50:25 PM E0408 23:50:25.989695       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:50:25 PM E0408 23:50:25.990808       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:50:57 PM E0408 23:50:57.016277       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:51:00 PM E0408 23:51:00.994348       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:51:27 PM E0408 23:51:27.034411       1 controllermanager.go:259] Failed to get api versions from server: Get http://master/api: dial tcp 10.42.10.122:80: i/o timeout
4/8/2016 4:51:27 PM F0408 23:51:27.034556       1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
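
Side note for anyone triaging a log like the one above: each restart cycle ends in one fatal (`F`) line, so counting those gives a rough restart count. Below is a minimal Python sketch that does this. The regex is only my reading of the line layout shown in this log (severity letter, MMDD, time, thread id, `file:line]`, message), and `summarize` is a hypothetical helper name, not part of Rancher or Kubernetes.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log lines in this issue:
# <severity letter><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
GLOG = re.compile(
    r"(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)"
)

def summarize(lines):
    """Count entries per severity and collect the times of fatal exits."""
    counts = Counter()
    fatal_times = []
    for raw in lines:
        # search() rather than match(), so the UI timestamp prefix is skipped
        m = GLOG.search(raw)
        if not m:
            continue
        counts[m["sev"]] += 1
        if m["sev"] == "F":
            fatal_times.append(m["time"])
    return counts, fatal_times

# Three lines copied from the log above, one per severity.
sample = [
    "4/8/2016 4:39:08 PMI0408 23:39:08.186267       1 replication_controller.go:208] Starting RC Manager",
    "4/8/2016 4:39:38 PME0408 23:39:38.474858       1 nodecontroller.go:229] Error monitoring node status: Get http://master/api/v1/nodes: dial tcp 10.42.10.122:80: i/o timeout",
    "4/8/2016 4:40:39 PMF0408 23:40:39.503422       1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition",
]
counts, fatal_times = summarize(sample)
print(dict(counts), fatal_times)
```

Fed the full log, the number of `F` entries should equal the number of times the controller-manager gave up on `http://master/api` and exited.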

@sangeethah added the kind/bug and area/kubernetes labels Apr 9, 2016
@sangeethah modified the milestone: Release 1.1.0-dev1 Apr 9, 2016

@sangeethah
Contributor Author

When I powered the downed host back on, the Controller-manager/scheduler/rancher-kubernetes-agent containers were able to start successfully.

@alena1108

Health check was disabled on the k8s API service, so it was never recreated.
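
To illustrate why a disabled health check matters here, below is a rough Python sketch of the general pattern: probe a TCP port, and flag the target as unhealthy after several consecutive failures so that something can recreate it. This is not Rancher's implementation. `tcp_probe`, `HealthTracker`, and the threshold of 3 are hypothetical names and values chosen for the example.

```python
import socket

def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False

class HealthTracker:
    """Flags a target as unhealthy after N consecutive failed probes."""

    def __init__(self, unhealthy_threshold=3):
        self.unhealthy_threshold = unhealthy_threshold
        self.failures = 0

    def record(self, ok):
        # A success resets the streak; a failure extends it.
        self.failures = 0 if ok else self.failures + 1
        # True means "still considered healthy".
        return self.failures < self.unhealthy_threshold

# Simulated probe results: one success, then three failures in a row.
tracker = HealthTracker(unhealthy_threshold=3)
states = [tracker.record(ok) for ok in (True, False, False, False)]
print(states)
```

Without a check like this, nothing ever flags the API service as down, so the dependent containers keep timing out against the same address, as in the log above.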

@sangeethah
Contributor Author

Rancher Version: Build from master

Created a "kubernetes" environment.
Added 2 DigitalOcean machine hosts.

Once the Kubernetes stack had launched successfully on the hosts, added a few services.

Powered down the host where the Controller-manager/scheduler/rancher-kubernetes-agent containers were running.

The Controller-manager/scheduler/rancher-kubernetes-agent containers are able to start successfully on another host.

Able to start a new service/rc/pod after this.
