hook failed: "etcd-relation-changed" for etcd:db #17

Closed
arosales opened this Issue Jun 1, 2016 · 7 comments

Contributor

arosales commented Jun 1, 2016

When I deploy the latest bundle via:
$ juju deploy observable-kubernetes

on the following client

$ juju version 
2.0-beta7-xenial-amd64
$ uname -a
Linux x230 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"

the relation between etcd and kubernetes fails:

[Units]      
ID           WORKLOAD-STATUS JUJU-STATUS VERSION   MACHINE PORTS PUBLIC-ADDRESS MESSAGE                                                    
kubernetes/0 error           idle        2.0-beta7 6             54.149.249.246 hook failed: "etcd-relation-joined" for etcd:db            
  filebeat/4 unknown         allocating  2.0-beta7               54.149.249.246 Waiting for agent initialization to finish                 
  topbeat/4  blocked         executing   2.0-beta7               54.149.249.246 (start) Waiting on relationship: elasticsearch or logstash 
kubernetes/1 error           idle        2.0-beta7 7             54.187.231.166 hook failed: "etcd-relation-changed" for etcd:db           
  filebeat/3 active          idle        2.0-beta7               54.187.231.166 Filebeat ready                                             
  topbeat/3  active          idle        2.0-beta7               54.187.231.166 Topbeat ready                                              
kubernetes/2 error           idle        2.0-beta7 8             54.187.80.70   hook failed: "etcd-relation-joined" for etcd:db            
  filebeat/5 unknown         allocating  2.0-beta7               54.187.80.70   Waiting for agent initialization to finish                 
  topbeat/5  maintenance     executing   2.0-beta7               54.187.80.70   (install) Updating apt cache  

the unit logs have an error on etcd-relation-changed:

2016-06-01 22:36:06 INFO etcd-relation-changed + docker -H unix:///var/run/docker-bootstrap.sock run --net=host --rm gcr.io/google_containers/etcd:2.0.12 etcdctl -C http://172.31.12.168:4001,http://172.31.42.154:4001,http://172.31.25.164:4001 set /coreos.com/network/config '{ "Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}'
2016-06-01 22:36:14 INFO etcd-relation-changed Error:  501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
2016-06-01 22:36:14 INFO etcd-relation-changed Traceback (most recent call last):
2016-06-01 22:36:14 INFO etcd-relation-changed   File "/var/lib/juju/agents/unit-kubernetes-0/charm/hooks/etcd-relation-changed", line 19, in <module>
2016-06-01 22:36:14 INFO etcd-relation-changed     main()
2016-06-01 22:36:14 INFO etcd-relation-changed   File "/usr/local/lib/python3.4/dist-packages/charms/reactive/__init__.py", line 73, in main
2016-06-01 22:36:14 INFO etcd-relation-changed     bus.dispatch()
2016-06-01 22:36:14 INFO etcd-relation-changed   File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 421, in dispatch
2016-06-01 22:36:14 INFO etcd-relation-changed     _invoke(other_handlers)
2016-06-01 22:36:14 INFO etcd-relation-changed   File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 404, in _invoke
2016-06-01 22:36:14 INFO etcd-relation-changed     handler.invoke()
2016-06-01 22:36:14 INFO etcd-relation-changed   File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 280, in invoke
2016-06-01 22:36:14 INFO etcd-relation-changed     self._action(*args)
2016-06-01 22:36:14 INFO etcd-relation-changed   File "/var/lib/juju/agents/unit-kubernetes-0/charm/reactive/flannel.py", line 43, in run_bootstrap_daemons
2016-06-01 22:36:14 INFO etcd-relation-changed     check_call(split(cmd))
2016-06-01 22:36:14 INFO etcd-relation-changed   File "/usr/lib/python3.4/subprocess.py", line 561, in check_call
2016-06-01 22:36:14 INFO etcd-relation-changed     raise CalledProcessError(retcode, cmd)
2016-06-01 22:36:14 INFO etcd-relation-changed subprocess.CalledProcessError: Command '['scripts/bootstrap_docker.sh', 'http://172.31.12.168:4001,http://172.31.42.154:4001,http://172.31.25.164:4001']' returned non-zero exit status 4
2016-06-01 22:36:14 ERROR juju.worker.uniter.operation runhook.go:107 hook "etcd-relation-changed" failed: exit status 1
2016-06-01 22:36:14 INFO juju.worker.uniter resolver.go:107 awaiting error resolution for "relation-changed" hook
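
For anyone triaging the same `501: All the given peers are not reachable` error, here is a minimal sketch of how the peer list from the log could be probed from the failing unit. It is not part of the charm: `parse_peers` and `reachable_peers` are hypothetical helper names, and the only inputs assumed are the comma-separated connection string passed to `scripts/bootstrap_docker.sh` in the traceback above.

```python
import socket
from urllib.parse import urlparse


def parse_peers(connection_string):
    """Split a comma-separated etcd connection string into (host, port) pairs."""
    peers = []
    for url in connection_string.split(","):
        parsed = urlparse(url.strip())
        peers.append((parsed.hostname, parsed.port))
    return peers


def reachable_peers(peers, timeout=2.0):
    """Return the peers that accept a plain TCP connection within the timeout."""
    alive = []
    for host, port in peers:
        try:
            # Opening and immediately closing the socket is enough to show
            # that something is listening on the client port.
            with socket.create_connection((host, port), timeout=timeout):
                alive.append((host, port))
        except OSError:
            # Refused, timed out, or unroutable: treat the peer as unreachable.
            pass
    return alive


if __name__ == "__main__":
    # Connection string copied from the unit log above.
    conn = ("http://172.31.12.168:4001,"
            "http://172.31.42.154:4001,"
            "http://172.31.25.164:4001")
    peers = parse_peers(conn)
    print("reachable:", reachable_peers(peers))
```

An empty result would point at networking or at etcd not listening yet. A non-empty result only shows that the port is open; it says nothing about whether the cluster has formed a quorum.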

I saw this on two different attempts to deploy the bundle to AWS us-west-2

Additional logs attached.

unit-kubernetes-0.log.txt
observable-k8-juju-debug-log.txt

Collaborator

chuckbutler commented Jun 2, 2016

Seems related to juju-solutions/layer-etcd#16

@chuckbutler chuckbutler added the bug label Jun 2, 2016

@chuckbutler chuckbutler self-assigned this Jun 2, 2016

Contributor

battlemidget commented Jun 7, 2016

Hitting this as well:

http://paste.ubuntu.com/17088139/

Collaborator

chuckbutler commented Jun 7, 2016

We're close to having a stable fix for this. The quorum failure was largely due to critical information, emitted by the leader, being missed during static-node turn-up. We're now capturing this information, and it has eliminated the turn-up errors.

We're putting the finishing touches on the departing logic so that a unit is properly unregistered as it's terminated, using this new quorum routine.

A proper fix for this will ship with the 0.2.0 release of layer-etcd.

Collaborator

chuckbutler commented Jun 7, 2016

Upon further communication with mbruzek, this will be a 1.0.0 release, as it breaks backwards compatibility with non-TLS-terminated connections.

Contributor

battlemidget commented Jun 7, 2016

Just FYI, we're doing a major conjure-up release this week and I planned on including this bundle as part of the initial release notes blog.

Collaborator

chuckbutler commented Jun 16, 2016

OK, so we're nearly ready for a release, pending some MPs landing. However, @battlemidget, cs:~lazypower/bundle/kubernetes-core will give you a good feel for where we're headed. Expect this release to happen either by EOW or early next week. I'm having some trouble landing the open PRs that built the assembled bundle above; that's the only blocker to calling this done at the moment.

Contributor

battlemidget commented Jun 16, 2016

@chuckbutler nice will give it a go this week

@AdamIsrael AdamIsrael closed this in #22 Jun 17, 2016
