[3.10] etcd fails to start after redeploy_certificates - cluster broken #10289

Closed
morsik opened this issue Oct 2, 2018 · 2 comments

morsik commented Oct 2, 2018

Description

etcd fails to start after running redeploy_certificates, possibly because of a broken task:

TASK [etcd : Unarchive cert tarball] *******************************************************************************************************************
changed: [etcd1] => {"changed": true, "dest": "/etc/etcd", "extract_results": {"cmd": ["/usr/bin/gtar", "--extract", "-C", "/etc/etcd", "-z", "-f", "/root/.ansible/tmp/ansible-tmp-1538472935.26-171855511450945/source"], "err": "", "out": "", "rc": 0}, "failed": false, "gid": 0, "group": "root", "handler": "TgzArchive", "mode": "0700", "owner": "root", "size": 4096, "src": "/root/.ansible/tmp/ansible-tmp-1538472935.26-171855511450945/source", "state": "directory", "uid": 0}
changed: [etcd2] => {"changed": true, "dest": "/etc/etcd", "extract_results": {"cmd": ["/usr/bin/gtar", "--extract", "-C", "/etc/etcd", "-z", "-f", "/root/.ansible/tmp/ansible-tmp-1538472935.27-272334517936876/source"], "err": "", "out": "", "rc": 0}, "failed": false, "gid": 0, "group": "root", "handler": "TgzArchive", "mode": "0700", "owner": "root", "size": 4096, "src": "/root/.ansible/tmp/ansible-tmp-1538472935.27-272334517936876/source", "state": "directory", "uid": 0}
changed: [etcd3] => {"changed": true, "dest": "/etc/etcd", "extract_results": {"cmd": ["/usr/bin/gtar", "--extract", "-C", "/etc/etcd", "-z", "-f", "/root/.ansible/tmp/ansible-tmp-1538472935.28-199690342727042/source"], "err": "", "out": "", "rc": 0}, "failed": false, "gid": 0, "group": "root", "handler": "TgzArchive", "mode": "0700", "owner": "root", "size": 4096, "src": "/root/.ansible/tmp/ansible-tmp-1538472935.28-199690342727042/source", "state": "directory", "uid": 0}

etcd error is:

rejected connection from "10.209.18.41:40906" (error "open /etc/etcd/peer.crt: permission denied", ServerName "")

Changing the ownership of /etc/etcd back to etcd:etcd resolved the permission error, but only as far as starting etcd is concerned. The cluster still can't start, because redeploy_certificates failed partway through.
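For anyone trying to confirm the same symptom, here is a minimal sketch that lists everything under a directory that is not owned by the expected user and group. The task output above reports /etc/etcd as owner root, group root, mode 0700, which would explain why the etcd process cannot open /etc/etcd/peer.crt. The function names `find_wrong_owner` and `report` are hypothetical helpers, not part of openshift-ansible, and the `etcd` user and group names are assumed from the etcd:etcd ownership mentioned above.

```python
import grp
import os
import pwd


def find_wrong_owner(root, expected_uid, expected_gid):
    """Return paths under root (root included) not owned by expected_uid:expected_gid."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk yields every directory as dirpath, so checking dirpath
        # plus its files covers the whole tree.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            st = os.lstat(path)  # lstat: do not follow symlinks
            if st.st_uid != expected_uid or st.st_gid != expected_gid:
                mismatched.append(path)
    return mismatched


def report(root="/etc/etcd", user="etcd", group="etcd"):
    """Print mismatched paths; user and group names are assumptions."""
    uid = pwd.getpwnam(user).pw_uid
    gid = grp.getgrnam(group).gr_gid
    for path in find_wrong_owner(root, uid, gid):
        print(path)
```

Calling `report()` as root on an affected etcd host should print /etc/etcd itself if the directory was left owned by root. This only diagnoses the ownership problem; it does not repair the interrupted certificate redeploy.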

Version

ansible 2.4.4.0
  config file = /opt/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 13 2018, 13:06:57) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
openshift-ansible-3.10.51-1-23-g48a823a

Steps To Reproduce

  1. Install cluster. Cluster works.
  2. Run redeploy_certificates. The cluster no longer works because Ansible tried to restart etcd, which failed due to the broken permissions.
Expected Results

redeploy_certificates should run correctly and not destroy the entire cluster.

sdodson (Member) commented Jan 10, 2019

/close
fixed by #10943

@openshift-ci-robot

@sdodson: Closing this issue.

In response to this:

/close
fixed by #10943

be4ndr added a commit to be4ndr/openshift-ansible that referenced this issue Feb 15, 2019