[3.10] etcd fails to start after redeploy_certificates - cluster broken #10289

morsik · 2018-10-02T09:44:23Z

Description

etcd fails to start after running redeploy_certificates, possibly because of a broken task that leaves /etc/etcd owned by root with mode 0700:

TASK [etcd : Unarchive cert tarball] *******************************************************************************************************************
changed: [etcd1] => {"changed": true, "dest": "/etc/etcd", "extract_results": {"cmd": ["/usr/bin/gtar", "--extract", "-C", "/etc/etcd", "-z", "-f", "/root/.ansible/tmp/ansible-tmp-1538472935.26-171855511450945/source"], "err": "", "out": "", "rc": 0}, "failed": false, "gid": 0, "group": "root", "handler": "TgzArchive", "mode": "0700", "owner": "root", "size": 4096, "src": "/root/.ansible/tmp/ansible-tmp-1538472935.26-171855511450945/source", "state": "directory", "uid": 0}
changed: [etcd2] => {"changed": true, "dest": "/etc/etcd", "extract_results": {"cmd": ["/usr/bin/gtar", "--extract", "-C", "/etc/etcd", "-z", "-f", "/root/.ansible/tmp/ansible-tmp-1538472935.27-272334517936876/source"], "err": "", "out": "", "rc": 0}, "failed": false, "gid": 0, "group": "root", "handler": "TgzArchive", "mode": "0700", "owner": "root", "size": 4096, "src": "/root/.ansible/tmp/ansible-tmp-1538472935.27-272334517936876/source", "state": "directory", "uid": 0}
changed: [etcd3] => {"changed": true, "dest": "/etc/etcd", "extract_results": {"cmd": ["/usr/bin/gtar", "--extract", "-C", "/etc/etcd", "-z", "-f", "/root/.ansible/tmp/ansible-tmp-1538472935.28-199690342727042/source"], "err": "", "out": "", "rc": 0}, "failed": false, "gid": 0, "group": "root", "handler": "TgzArchive", "mode": "0700", "owner": "root", "size": 4096, "src": "/root/.ansible/tmp/ansible-tmp-1538472935.28-199690342727042/source", "state": "directory", "uid": 0}

The etcd error is:

rejected connection from "10.209.18.41:40906" (error "open /etc/etcd/peer.crt: permission denied", ServerName "")

Changing the ownership of /etc/etcd back to etcd:etcd fixed the permission error, but only as far as getting etcd to start. The cluster still can't come back up, because redeploy_certificates failed partway through.
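For anyone trying to confirm the same diagnosis, here is a minimal sketch that lists which paths under a directory a given user cannot read, judging only by owner, group and mode bits. The function names `can_read` and `unreadable_paths` are made up for this illustration and are not part of openshift-ansible. It ignores ACLs, SELinux and the execute bit on directories, so treat it as a rough check rather than a full access test.

```python
import os
import stat


def can_read(st, uid, gids):
    """Return True if the mode bits in `st` grant read access to uid/gids."""
    if uid == 0:
        return True  # root bypasses ordinary permission bits
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)


def unreadable_paths(root, uid, gids=()):
    """List paths under `root` (root included) that `uid` cannot read."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    # Decide from stat() results only; nothing is actually opened.
    return sorted(p for p in paths if not can_read(os.stat(p), uid, gids))
```

On an etcd host this could be run as root with something like `unreadable_paths("/etc/etcd", pwd.getpwnam("etcd").pw_uid)`. With the directory in the state shown in the task output above (owner root, mode 0700), the result would be expected to include /etc/etcd itself.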

Version

ansible 2.4.4.0
  config file = /opt/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 13 2018, 13:06:57) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

openshift-ansible-3.10.51-1-23-g48a823a

Steps To Reproduce

Install the cluster. The cluster works.
Run redeploy_certificates. The cluster no longer works, because Ansible tried to restart etcd but the restart failed due to the broken permissions.

Expected Results

redeploy_certificates should run correctly and not break the entire cluster.

sdodson · 2019-01-10T19:16:28Z

/close
fixed by #10943

openshift-ci-robot · 2019-01-10T19:16:29Z

@sdodson: Closing this issue.

In response to this:

/close
fixed by #10943

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

morsik added a commit to morsik/openshift-ansible that referenced this issue Oct 2, 2018

Setup correct permissions for etcd configuration, fixes openshift#10289

a7f929e

morsik added a commit to morsik/openshift-ansible that referenced this issue Oct 2, 2018

Setup correct permissiosn for etcd config dir, fixes openshift#10289

677547b

morsik mentioned this issue Oct 2, 2018

Setup correct permissions for etcd config dir, fixes #10289 #10291

Closed

openshift-ci-robot closed this as completed Jan 10, 2019

be4ndr added a commit to be4ndr/openshift-ansible that referenced this issue Feb 15, 2019

Setup correct permissiosn for etcd config dir.

58aa472

fixes openshift#10289
