kube-apiserver 1.13.x refuses to work when first etcd-server is not available. #72102
How to reproduce the problem: give the kube-apiserver a list of etcd endpoints and make the first endpoint unavailable. The kube-apiserver is then no longer able to connect to etcd at all, and does not start.
If I upgrade etcd to version 3.3.10, it reports an error
I also experience this bug in an environment with a real etcd cluster.
Apologies, I just had another look, and it's indeed an api-machinery issue.
We are passing the server list straight into the etcd v3 client, which returns the error you reported. I'm not sure if this is by design.
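For illustration, here is a minimal sketch of that pass-through, assuming the etcd clientv3 API (the endpoint names and certificate paths are hypothetical, not taken from this issue):

```go
package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
	"go.etcd.io/etcd/pkg/transport"
)

func main() {
	// Hypothetical paths; in a kubeadm setup these come from the
	// --etcd-certfile/--etcd-keyfile/--etcd-cafile apiserver flags.
	tlsInfo := transport.TLSInfo{
		CertFile:      "/etc/kubernetes/pki/apiserver-etcd-client.crt",
		KeyFile:       "/etc/kubernetes/pki/apiserver-etcd-client.key",
		TrustedCAFile: "/etc/kubernetes/pki/etcd/ca.crt",
	}
	tlsConfig, err := tlsInfo.ClientConfig()
	if err != nil {
		log.Fatal(err)
	}

	// The --etcd-servers list is handed to the client as-is.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://etcd-a:2379", "https://etcd-b:2379", "https://etcd-c:2379"},
		DialTimeout: 5 * time.Second,
		TLS:         tlsConfig,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if _, err := cli.Get(ctx, "/registry"); err != nil {
		log.Fatalf("get failed: %v", err)
	}
}
```

With the first endpoint stopped, one would expect the `Get` to fail over to the remaining members, but on the affected client versions it errors out instead.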
I was able to reproduce this issue with the steps provided by @Cytrian. I also reproduced it with a real etcd cluster.
The problem seems to be that the etcd client library uses the first node's address as the TLS server name when validating certificates for connections to all of the etcd members.
An important thing to highlight is that when the first etcd server goes down, it also takes the Kubernetes API servers down, because they fail to connect to the remaining etcd servers.
With that said, the exact failure mode depends on what your etcd server certificates look like: if the serving certificates of the remaining members are not also valid for the first member's name, TLS verification against them fails.
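To make that concrete, here is a hand-rolled sketch (this is not etcd code; the host names are hypothetical) of what pinning the TLS server name to the first endpoint does to handshakes with the remaining members:

```go
package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// Host of the first --etcd-servers entry; the affected client
	// effectively verifies every member's certificate against this name.
	first := "etcd-a"

	for _, target := range []string{"etcd-b:2379", "etcd-c:2379"} {
		conn, err := tls.Dial("tcp", target, &tls.Config{ServerName: first})
		if err != nil {
			// Fails with "x509: certificate is valid for etcd-b, not etcd-a"
			// unless etcd-b's serving cert also carries a SAN for etcd-a.
			fmt.Printf("%s: %v\n", target, err)
			continue
		}
		conn.Close()
		fmt.Printf("%s: handshake OK\n", target)
	}
}
```

Whether the remaining members stay reachable after the first one goes down therefore depends on whether their serving certificates happen to also cover the first member's name.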
To reproduce the issue with a real etcd cluster, stop the etcd member that is listed first in the API server's --etcd-servers flag; the API server then crashes.
API server crash log: https://gist.github.com/alexbrand/ba86f506e4278ed2ada4504ab44b525b
I was unable to reproduce this issue with API server v1.12.5 (n.b. this was a somewhat unscientific test: I updated the image field of the API server static pod produced by kubeadm v1.13.2).
@dims Sure. The earliest we can do is August 13, 2019.
Today we entered the etcd 3.4 release code freeze, which means we will start running functional (failure-injection) tests plus kubemark to exercise the new etcd client changes, including etcd-io/etcd#10911.
If anything changes, we will post updates here.
Hi, is this the same issue:
The certificates are correct. I verified this with `curl -v --cert apiserver-etcd-client.crt --key apiserver-etcd-client.key --cacert etcd/ca.crt ...`
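For what it's worth, here is a rough Go equivalent of that curl check (the file paths and member names are examples). Note that, unlike the affected etcd client, it verifies each member's certificate against that member's own host name, which is why this check can pass even while the apiserver fails:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// Client certificate and etcd CA, as passed to curl above.
	cert, err := tls.LoadX509KeyPair("apiserver-etcd-client.crt", "apiserver-etcd-client.key")
	if err != nil {
		panic(err)
	}
	caPEM, err := ioutil.ReadFile("etcd/ca.crt")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{
			Certificates: []tls.Certificate{cert},
			RootCAs:      pool,
		},
	}}

	// Hypothetical member names; query each member's /health endpoint.
	for _, ep := range []string{"https://etcd-a:2379", "https://etcd-b:2379", "https://etcd-c:2379"} {
		resp, err := client.Get(ep + "/health")
		if err != nil {
			fmt.Printf("%s: %v\n", ep, err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%s: %s\n", ep, resp.Status)
	}
}
```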
@gyuho a quick clarification: is August 13, 2019 the earliest date the fix is going to land on 3.3? Thanks!
Update: https://github.com/grpc/grpc-go/releases/tag/v1.23.0 is out. We are bumping gRPC in the etcd master branch (etcd-io/etcd#11029), in addition to the Go runtime upgrade (https://groups.google.com/forum/#!topic/golang-announce/65QixT3tcmg). Once tests look good, we will start working on the 3.3 backports.
@dims @jpbetz https://github.com/etcd-io/etcd/releases/tag/v3.3.14-beta.0 has been released with all the fixes. Please give it a try. Once tests look good over the next few days, I will cut the final release.
@liggitt that leaves people on 1.13–1.15 without proper HA? I think this issue deserves to be fixed in all three supported releases of Kubernetes. The hotfix mentioned here looks simple enough to backport, but you are saying the proper fix will require much more. So this is all somewhat obscure and confusing to the community, IMO.
Don't get me wrong, I'm not complaining; I just think everyone would welcome some clarity on this issue. Maybe document it somewhere and provide workarounds for the people who are still on v1.13–v1.15? Because right now, bring down the first etcd member and, oops, the API is not working and the cluster is not working.