
TASK [openshift_service_catalog : wait for api server to be ready] failed after 120 attempts #7101

Closed
packagewjx opened this issue Feb 12, 2018 · 3 comments


@packagewjx

Description

I am trying to run playbooks/byo/config.yml from the release-3.7 branch to install OpenShift on my all-in-one server. The playbook cannot get past TASK [openshift_service_catalog : wait for api server to be ready].

Version
  • Your ansible version per ansible --version
[root@openshift-master ~]# ansible --version
ansible 2.4.2.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
  • The output of git describe
[root@openshift-master openshift-ansible]# git describe 
openshift-ansible-3.7.29-1-11-g6d8538a
Steps To Reproduce
  1. ansible-playbook openshift-ansible/playbooks/byo/config.yml
Expected Results

The config.yml playbook completes successfully.

Observed Results

The playbook failed at TASK [openshift_service_catalog : wait for api server to be ready], as shown below.

TASK [openshift_service_catalog : wait for api server to be ready] **************************************
Sunday 11 February 2018  23:25:49 +0800 (0:00:00.709)       0:17:10.818 ******* 
FAILED - RETRYING: wait for api server to be ready (120 retries left).
FAILED - RETRYING: wait for api server to be ready (119 retries left).
...
FAILED - RETRYING: wait for api server to be ready (1 retries left).
fatal: [116.56.140.108]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "-k", "https://apiserver.kube-service-catalog.svc/healthz"], "delta": "0:00:00.085110", "end": "2018-02-11 23:28:37.345533", "rc": 0, "start": "2018-02-11 23:28:37.260423", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100   180  100   180    0     0   2289      0 --:--:-- --:--:-- --:--:--  2307", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0", "100   180  100   180    0     0   2289      0 --:--:-- --:--:-- --:--:--  2307"], "stdout": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed", "stdout_lines": ["[+]ping ok", "[+]poststarthook/generic-apiserver-start-informers ok", "[+]poststarthook/start-service-catalog-apiserver-informers ok", "[-]etcd failed: reason withheld", "healthz check failed"]}

PLAY RECAP **********************************************************************************************
116.56.140.108             : ok=516  changed=80   unreachable=0    failed=1   
localhost                  : ok=12   changed=0    unreachable=0    failed=0   


INSTALLER STATUS ****************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
Hosted Install             : Complete
Service Catalog Install    : In Progress
	This phase can be restarted by running: playbooks/byo/openshift-cluster/service-catalog.yml

Sunday 11 February 2018  23:28:37 +0800 (0:02:48.105)       0:19:58.923 ******* 
=============================================================================== 
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ---------------- 606.81s
openshift_service_catalog : wait for api server to be ready ------------------------------------ 168.11s
Run health checks (install) - EL ---------------------------------------------------------------- 94.17s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------- 16.39s
openshift_service_catalog : oc_process ----------------------------------------------------------- 8.17s
openshift_hosted_facts : Set hosted facts -------------------------------------------------------- 7.81s
openshift_hosted_facts : Set hosted facts -------------------------------------------------------- 7.49s
openshift_hosted_facts : Set hosted facts -------------------------------------------------------- 7.35s
openshift_hosted_facts : Set hosted facts -------------------------------------------------------- 7.33s
openshift_excluder : Install docker excluder - yum ----------------------------------------------- 7.04s
openshift_master : restart master api ------------------------------------------------------------ 4.17s
restart master api ------------------------------------------------------------------------------- 4.13s
openshift_docker_facts : Set docker facts -------------------------------------------------------- 3.89s
openshift_manageiq : Configure role/user permissions --------------------------------------------- 3.71s
openshift_docker_facts : Set docker facts -------------------------------------------------------- 3.27s
openshift_docker_facts : Set docker facts -------------------------------------------------------- 2.92s
openshift_node_facts : Set node facts ------------------------------------------------------------ 2.88s
openshift_master : Update journald setup --------------------------------------------------------- 2.17s
openshift_hosted : Configure a passthrough route for docker-registry ----------------------------- 2.10s
Gather Cluster facts and set is_containerized if needed ------------------------------------------ 1.98s


Failure summary:


  1. Hosts:    116.56.140.108
     Play:     Service Catalog
     Task:     wait for api server to be ready
     Message:  Failed without returning a message.
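
For reference, this task essentially just polls the service catalog apiserver health endpoint. A rough shell equivalent of what it waits for (the 120 attempts match the output above; the delay and the exact success condition are my assumptions):

# Poll the healthz endpoint until it reports healthy.
# 10s delay and the plain "ok" body are assumptions, not the role's exact values.
for i in $(seq 1 120); do
  body=$(curl -k -s https://apiserver.kube-service-catalog.svc/healthz)
  if [ "$body" = "ok" ]; then
    echo "api server ready"
    break
  fi
  sleep 10
done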


Additional Information
  • OS version:
[root@openshift-master openshift-ansible]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core) 
  • Your inventory file (especially any non-standard configuration parameters)
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=root

openshift_deployment_type=origin
openshift_public_hostname=116.56.140.108
openshift_public_ip=116.56.140.108
os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant

[masters]
116.56.140.108

[etcd]
116.56.140.108

[nodes]
116.56.140.108
  • Other Info

Executing curl -k https://apiserver.kube-service-catalog.svc/healthz gives the following result:

[root@openshift-master ~]# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed

I checked the health status of my etcd service; the result is:

[root@openshift-master ~]# etcdctl -C https://116.56.140.108:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt \
--cert-file=/etc/origin/master/master.etcd-client.crt \
--key-file=/etc/origin/master/master.etcd-client.key cluster-health
member 63b3e6b3f1c410d5 is healthy: got healthy result from https://116.56.140.108:2379
cluster is healthy

And I tried to set and get a value from etcd (roughly as sketched below). It was fine.
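
Something along these lines, reusing the same client certificate flags as the cluster-health command above (the key /test is just an illustrative value):

# Write and read back a test key against the cluster etcd (etcdctl v2 API)
etcdctl -C https://116.56.140.108:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt \
  --cert-file=/etc/origin/master/master.etcd-client.crt \
  --key-file=/etc/origin/master/master.etcd-client.key set /test hello
etcdctl -C https://116.56.140.108:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt \
  --cert-file=/etc/origin/master/master.etcd-client.crt \
  --key-file=/etc/origin/master/master.etcd-client.key get /test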

I don't know whether checking the etcd state from inside the service catalog docker container is helpful, but here is the result. It seems the service catalog container has no problem reaching the etcd service.

[root@openshift-master ~]# docker exec k8s_apiserver_apiserver-76zv8_kube-service-catalog_4ce365ee-0f96-11e8-be29-000af7b00488_0 curl -k https://116.56.140.108:2379/health --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    18  100    18    0     0    168      0 --:--:-- --:--:-- --:--:--   168
{"health": "true"}

Inside the container, executing curl -k https://apiserver.kube-service-catalog.svc/healthz produces the following apiserver log entries:

I0212 01:57:26.336985       1 handler.go:160] service-catalog-apiserver: GET "/healthz" satisfied by nonGoRestful
I0212 01:57:26.337009       1 pathrecorder.go:240] service-catalog-apiserver: "/healthz" satisfied by exact match
I0212 01:57:26.337023       1 run_server.go:136] etcd checker called
E0212 01:57:26.338420       1 run_server.go:145] etcd failed to reach any server
I0212 01:57:26.338432       1 healthz.go:112] healthz check etcd failed: etcd failed to reach any server
I0212 01:57:26.338571       1 wrap.go:42] GET /healthz: (1.699786ms) 500
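
Since the check fails with "etcd failed to reach any server", it may also be worth looking at which etcd endpoint the catalog apiserver is actually configured with, e.g. by inspecting the pod spec (pod name taken from the docker container name above):

# List the service catalog pods and grep the apiserver pod spec for its etcd settings
oc -n kube-service-catalog get pods
oc -n kube-service-catalog get pod apiserver-76zv8 -o yaml | grep -i etcd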

And here is the etcd log; it keeps complaining about "Unavailable" errors:

Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 1; CANCEL")
Feb 12 10:04:00 openshift-master etcd[101639]: failed to receive watch request from gRPC stream ("rpc error: code = Canceled desc = context canceled")

Thanks for your help in advance.

@maximmold

I am seeing the same error. Any ideas?

@packagewjx
Author

@maximmold I got the service catalog installed successfully after adding these lines to the Ansible hosts (inventory) file:

openshift_service_catalog_image_prefix=openshift/origin-
openshift_service_catalog_image_version=latest

openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_nfs_directory=/opt/osev3-etcd
openshift_hosted_etcd_storage_volume_name=etcd-vol2
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1G
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}

ansible_service_broker_image_prefix=openshift/
ansible_service_broker_registry_url="registry.access.redhat.com"

It seems that the etcd instance used by the service catalog could not get persistent storage, so it failed to start; with the NFS-backed storage variables above, the install completed.
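
A rough way to verify the storage side and then retry only this phase (the claim name is whatever shows up in the listing; the playbook path is the one suggested in the installer status above):

# Check that the claim backing the catalog/broker etcd is actually Bound
oc get pvc --all-namespaces
oc get pv
# Once storage is in place, retry just the service catalog phase
ansible-playbook openshift-ansible/playbooks/byo/openshift-cluster/service-catalog.yml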

@packagewjx
Author

Since I have solved this problem, I am going to close this issue.
