
TASK [openshift_service_catalog : wait for api server to be ready] failed after 120 attempts #7101

Closed
packagewjx opened this issue Feb 12, 2018 · 3 comments


@packagewjx

Description

I am trying to run playbooks/byo/config.yml from the release-3.7 branch to install OpenShift on my all-in-one server. The playbook cannot get past TASK [openshift_service_catalog : wait for api server to be ready].

Version
  • Your ansible version per ansible --version
[root@openshift-master ~]# ansible --version
ansible 2.4.2.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
  • The output of git describe
[root@openshift-master openshift-ansible]# git describe 
openshift-ansible-3.7.29-1-11-g6d8538a
Steps To Reproduce
  1. ansible-playbook openshift-ansible/playbooks/byo/config.yml
Expected Results

The config.yml playbook completes successfully.

Observed Results

The playbook failed at TASK [openshift_service_catalog : wait for api server to be ready], as shown below.

TASK [openshift_service_catalog : wait for api server to be ready] **************************************
Sunday 11 February 2018  23:25:49 +0800 (0:00:00.709)       0:17:10.818 ******* 
FAILED - RETRYING: wait for api server to be ready (120 retries left).
FAILED - RETRYING: wait for api server to be ready (119 retries left).
...
FAILED - RETRYING: wait for api server to be ready (1 retries left).
fatal: [116.56.140.108]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "-k", "https://apiserver.kube-service-catalog.svc/healthz"], "delta": "0:00:00.085110", "end": "2018-02-11 23:28:37.345533", "rc": 0, "start": "2018-02-11 23:28:37.260423", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100   180  100   180    0     0   2289      0 --:--:-- --:--:-- --:--:--  2307", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0", "100   180  100   180    0     0   2289      0 --:--:-- --:--:-- --:--:--  2307"], "stdout": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed", "stdout_lines": ["[+]ping ok", "[+]poststarthook/generic-apiserver-start-informers ok", "[+]poststarthook/start-service-catalog-apiserver-informers ok", "[-]etcd failed: reason withheld", "healthz check failed"]}

PLAY RECAP **********************************************************************************************
116.56.140.108             : ok=516  changed=80   unreachable=0    failed=1   
localhost                  : ok=12   changed=0    unreachable=0    failed=0   


INSTALLER STATUS ****************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
Hosted Install             : Complete
Service Catalog Install    : In Progress
	This phase can be restarted by running: playbooks/byo/openshift-cluster/service-catalog.yml

Sunday 11 February 2018  23:28:37 +0800 (0:02:48.105)       0:19:58.923 ******* 
=============================================================================== 
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ---------------- 606.81s
openshift_service_catalog : wait for api server to be ready ------------------------------------ 168.11s
Run health checks (install) - EL ---------------------------------------------------------------- 94.17s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------- 16.39s
openshift_service_catalog : oc_process ----------------------------------------------------------- 8.17s
openshift_hosted_facts : Set hosted facts -------------------------------------------------------- 7.81s
openshift_hosted_facts : Set hosted facts -------------------------------------------------------- 7.49s
openshift_hosted_facts : Set hosted facts -------------------------------------------------------- 7.35s
openshift_hosted_facts : Set hosted facts -------------------------------------------------------- 7.33s
openshift_excluder : Install docker excluder - yum ----------------------------------------------- 7.04s
openshift_master : restart master api ------------------------------------------------------------ 4.17s
restart master api ------------------------------------------------------------------------------- 4.13s
openshift_docker_facts : Set docker facts -------------------------------------------------------- 3.89s
openshift_manageiq : Configure role/user permissions --------------------------------------------- 3.71s
openshift_docker_facts : Set docker facts -------------------------------------------------------- 3.27s
openshift_docker_facts : Set docker facts -------------------------------------------------------- 2.92s
openshift_node_facts : Set node facts ------------------------------------------------------------ 2.88s
openshift_master : Update journald setup --------------------------------------------------------- 2.17s
openshift_hosted : Configure a passthrough route for docker-registry ----------------------------- 2.10s
Gather Cluster facts and set is_containerized if needed ------------------------------------------ 1.98s


Failure summary:


  1. Hosts:    116.56.140.108
     Play:     Service Catalog
     Task:     wait for api server to be ready
     Message:  Failed without returning a message.
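
For reference, this task essentially just polls the service catalog apiserver health endpoint. A rough shell equivalent of what it waits for (the 120 attempts match the output above; the delay and the exact success condition are my assumptions):

# Poll the healthz endpoint until it reports healthy.
# 10s delay and the plain "ok" body are assumptions, not the role's exact values.
for i in $(seq 1 120); do
  body=$(curl -k -s https://apiserver.kube-service-catalog.svc/healthz)
  if [ "$body" = "ok" ]; then
    echo "api server ready"
    break
  fi
  sleep 10
done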


Additional Information
  • OS version:
[root@openshift-master openshift-ansible]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core) 
  • Your inventory file (especially any non-standard configuration parameters)
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=root

openshift_deployment_type=origin
openshift_public_hostname=116.56.140.108
openshift_public_ip=116.56.140.108
os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant

[masters]
116.56.140.108

[etcd]
116.56.140.108

[nodes]
116.56.140.108
  • Other Info

Executing curl -k https://apiserver.kube-service-catalog.svc/healthz gives the following result:

[root@openshift-master ~]# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed

I checked the health status of my etcd service; the result is:

[root@openshift-master ~]# etcdctl -C https://116.56.140.108:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt \
--cert-file=/etc/origin/master/master.etcd-client.crt \
--key-file=/etc/origin/master/master.etcd-client.key cluster-health
member 63b3e6b3f1c410d5 is healthy: got healthy result from https://116.56.140.108:2379
cluster is healthy

And I tried to set and get a value from etcd (roughly as sketched below). It was fine.
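
Something along these lines, reusing the same client certificate flags as the cluster-health command above (the key /test is just an illustrative value):

# Write and read back a test key against the cluster etcd (etcdctl v2 API)
etcdctl -C https://116.56.140.108:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt \
  --cert-file=/etc/origin/master/master.etcd-client.crt \
  --key-file=/etc/origin/master/master.etcd-client.key set /test hello
etcdctl -C https://116.56.140.108:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt \
  --cert-file=/etc/origin/master/master.etcd-client.crt \
  --key-file=/etc/origin/master/master.etcd-client.key get /test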

I don't know whether checking the etcd state from inside the service catalog docker container is helpful, but here is the result. It seems the service catalog container has no problem reaching the etcd service.

[root@openshift-master ~]# docker exec k8s_apiserver_apiserver-76zv8_kube-service-catalog_4ce365ee-0f96-11e8-be29-000af7b00488_0 curl -k https://116.56.140.108:2379/health --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    18  100    18    0     0    168      0 --:--:-- --:--:-- --:--:--   168
{"health": "true"}

Inside the container, executing curl -k https://apiserver.kube-service-catalog.svc/healthz produces the following apiserver log entries:

I0212 01:57:26.336985       1 handler.go:160] service-catalog-apiserver: GET "/healthz" satisfied by nonGoRestful
I0212 01:57:26.337009       1 pathrecorder.go:240] service-catalog-apiserver: "/healthz" satisfied by exact match
I0212 01:57:26.337023       1 run_server.go:136] etcd checker called
E0212 01:57:26.338420       1 run_server.go:145] etcd failed to reach any server
I0212 01:57:26.338432       1 healthz.go:112] healthz check etcd failed: etcd failed to reach any server
I0212 01:57:26.338571       1 wrap.go:42] GET /healthz: (1.699786ms) 500
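
Since the check fails with "etcd failed to reach any server", it may also be worth looking at which etcd endpoint the catalog apiserver is actually configured with, e.g. by inspecting the pod spec (pod name taken from the docker container name above):

# List the service catalog pods and grep the apiserver pod spec for its etcd settings
oc -n kube-service-catalog get pods
oc -n kube-service-catalog get pod apiserver-76zv8 -o yaml | grep -i etcd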

And here is the etcd log; it keeps complaining about "Unavailable" errors:

Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Canceled desc = context canceled")

Thanks for your help in advance.

@maximmold

I am seeing the same error. Any ideas?

@packagewjx
Author

@maximmold I got the service catalog installed successfully after adding these lines to the Ansible hosts (inventory) file:

openshift_service_catalog_image_prefix=openshift/origin-
openshift_service_catalog_image_version=latest

openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_nfs_directory=/opt/osev3-etcd
openshift_hosted_etcd_storage_volume_name=etcd-vol2
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1G
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}

ansible_service_broker_image_prefix=openshift/
ansible_service_broker_registry_url="registry.access.redhat.com"

It seems that the etcd instance used by the service catalog could not get persistent storage, so it failed to start; with the NFS-backed storage variables above, the install completed.
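
A rough way to verify the storage side and then retry only this phase (the claim name is whatever shows up in the listing; the playbook path is the one suggested in the installer status above):

# Check that the claim backing the catalog/broker etcd is actually Bound
oc get pvc --all-namespaces
oc get pv
# Once storage is in place, retry just the service catalog phase
ansible-playbook openshift-ansible/playbooks/byo/openshift-cluster/service-catalog.yml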

@packagewjx
Author

Since I have solved this problem, I am going to close this issue.
