standby nodes in 0.5.0 #1416

Closed
vany-egorov opened this issue Oct 25, 2014 · 15 comments

Comments

@vany-egorov

In etcd v0.4.6 a standby node was able to act as a peer node if one of the peer nodes died; standbys are not part of the Raft cluster themselves.
A standby node was able to replace a dead peer node automatically.
This was very useful with a big cluster size (>20 machines).
It was possible to set cluster-active-size, cluster-remove-delay, and cluster-sync-interval.

  1. How can I get a similar effect using proxy=on mode?
  2. What will happen if all of the peer nodes die?

xiang90 commented Oct 25, 2014

We have deprecated the standby feature in 0.5 by introducing the new proxy mode. A proxy will not promote itself automatically. We found that automatic demotion/promotion confuses people, since its behavior is not very controllable or deterministic. A configuration change is an important event for an etcd cluster: it should not happen frequently, and we do expect human involvement. We may introduce a deterministic auto-recovery strategy to replace dead nodes.
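
As a concrete illustration of the explicit, operator-driven configuration change described above, the sketch below uses Go's standard library to call the members HTTP API of the 0.5/2.0 line (list members, remove the dead one, register its replacement). The endpoint path, port, addresses, and member ID are illustrative placeholders, not details taken from this issue.

```go
// Minimal sketch: an operator explicitly replacing a dead member through the
// v2 members HTTP API. All URLs and the member ID are placeholders; the exact
// endpoint path and default ports may differ between 0.5.0 alpha releases.
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

const clusterURL = "http://10.0.1.10:4001" // any healthy member (placeholder)

func main() {
	// 1. List the current members to find the ID of the dead node.
	resp, err := http.Get(clusterURL + "/v2/members")
	if err != nil {
		log.Fatal(err)
	}
	body, _ := ioutil.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Printf("current members: %s\n", body)

	// 2. Remove the dead member by its ID (placeholder value).
	deadID := "272e204152018c35"
	req, _ := http.NewRequest("DELETE", clusterURL+"/v2/members/"+deadID, nil)
	if _, err := http.DefaultClient.Do(req); err != nil {
		log.Fatal(err)
	}

	// 3. Register the replacement machine's peer URL, then start etcd on that
	//    machine telling it to join the existing cluster.
	newPeer := []byte(`{"peerURLs":["http://10.0.1.14:2380"]}`)
	if _, err := http.Post(clusterURL+"/v2/members", "application/json",
		bytes.NewReader(newPeer)); err != nil {
		log.Fatal(err)
	}
}
```

Each step is a deliberate human action, which is the point of the comment above: membership changes are rare, explicit events rather than something a proxy decides on its own.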

@vany-egorov

Auto recovery would be a great feature!
With a cluster of 30 nodes, even if 15 of them died the service would stay available, because the Raft cluster would recover itself.


xiang90 commented Oct 25, 2014

@vany-egorov etcd, like other datastores, consumes CPU/network resources. It is even more important than a normal db service, since it provides the source of truth for your whole cluster. We do not think people should ever put the actual etcd server on an arbitrary machine in the cluster. Also, the recovery process comes with a big cost.

We have a very rough plan to share with you at the moment:

  1. Have hot backups (observers or read-only etcd members that hold the actual data).
  2. The operators define the recovery priority (if one etcd machine goes down for X mins, we will try to join hot backup A, then hot backup B, etc.). In other words, the recovery operations should be pre-defined by human operators.
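
A rough sketch of what point 2 could look like in practice: a small watchdog with an operator-defined priority list of hot backups. None of this is an existing etcd feature; the policy type and the healthCheck/removeMember/addMember helpers are hypothetical placeholders for whatever mechanism an operator would actually plug in.

```go
// Hypothetical sketch of a pre-defined recovery policy: if a watched member
// stays down longer than the operator-chosen window, remove it and promote
// the hot backups in the operator-chosen order (A, then B, ...).
package main

import "time"

type policy struct {
	member     string        // peer URL of the member being watched (placeholder)
	downFor    time.Duration // how long it may stay down before recovery kicks in
	hotBackups []string      // backups to try, in priority order
}

// recoverMember takes the health check and membership calls as functions,
// because those mechanisms are not specified here.
func recoverMember(p policy, healthCheck func(string) bool,
	removeMember, addMember func(string) error) {
	deadline := time.Now().Add(p.downFor)
	for time.Now().Before(deadline) {
		if healthCheck(p.member) {
			return // the member came back; do nothing
		}
		time.Sleep(5 * time.Second)
	}
	// The member stayed down past the pre-defined window: replace it.
	_ = removeMember(p.member)
	for _, backup := range p.hotBackups {
		if addMember(backup) == nil {
			return // first backup that joins successfully wins
		}
	}
}

func main() {
	p := policy{
		member:     "http://10.0.1.12:2380", // placeholder
		downFor:    10 * time.Second,        // short window just for the demo
		hotBackups: []string{"http://10.0.2.1:2380", "http://10.0.2.2:2380"},
	}
	// Stub implementations; a real deployment would wire in real checks
	// and membership API calls here.
	alwaysDown := func(string) bool { return false }
	noop := func(string) error { return nil }
	recoverMember(p, alwaysDown, noop, noop)
}
```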


xiang90 commented Oct 25, 2014

@vany-egorov Moreover, we make two assumptions:

  1. The probability of machine failure is low.
  2. Cluster maintenance can be controlled (probably via the etcd cluster itself).

So the case you described should not happen in most cases. Otherwise you have a relatively high probability of losing all your etcd machines (if all of them are among the 15 machines that went down). If that is really the case, then you need to consider putting etcd on a more stable, dedicated cluster.

@vany-egorov

@xiangli-cmu Maybe you are right.

We shall try to implement recovery by adding/removing nodes to/from the etcd cluster, automatically or through a web UI.

Thanks!

@asiragusa

@xiangli-cmu What about automatic updates? During an update the machine is considered failed and it can't be controlled, as updates happen in the background and ALL the nodes are involved in the operation.


xiang90 commented Oct 26, 2014

@asiragusa
First we need to be clear about when you want to remove a member from an existing etcd cluster. Basically there are two failure cases:

  1. Member temporarily unavailable: in this case, we probably do not want to remove the member from the cluster. As long as a quorum of members is still available, the etcd cluster remains available. That is why we need etcd and how we provide HA. If the quorum is lost, the etcd cluster is temporarily unavailable; when the members recover, the etcd cluster will be available again.
  2. Member permanently unavailable: in this case, we need to remove the permanently gone member from the etcd cluster and have a new member join the cluster to maintain the same failure tolerance. We hope this is done explicitly by a human, though we can automate the process to some extent.

Automatic updates fall into case 1, so they have nothing to do with dynamic configuration change. Moreover, if you are using CoreOS they can be controlled.
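
To make the quorum arithmetic in case 1 concrete: a cluster of n members needs a majority (n/2 + 1) to keep serving writes, so it tolerates n minus quorum failures. A tiny sketch:

```go
// Quorum arithmetic for an n-member etcd cluster: the cluster stays available
// as long as no more than n - quorum members are unavailable.
package main

import "fmt"

func quorum(n int) int { return n/2 + 1 }

func main() {
	for _, n := range []int{3, 5, 9, 30} {
		q := quorum(n)
		fmt.Printf("cluster size %2d: quorum %2d, tolerated failures %2d\n", n, q, n-q)
	}
}
```

For example, a 5-member cluster keeps working with 2 members down, while the 30-node scenario discussed earlier loses availability once 15 members are gone (quorum is 16).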

@asiragusa

Yep, I see, but you stated that

Also the recovery process comes with a big cost.

What is the impact of this in the case of a whole-cluster update? And what about a node reinstall (#863)?

I am getting scared of etcd/fleet, as it seems quite unstable to me ATM...


xiang90 commented Oct 26, 2014

Recovery means adding a brand-new member to the cluster to replace the retired one. It comes with a big cost: the log/snapshot needs to be synced to the new member.

If you still have the data on all the members, restarting etcd comes at nearly zero cost.
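
As a rough back-of-envelope (not a figure from this thread): the data a brand-new member has to pull is roughly the latest snapshot plus the log entries written since it, and the sync time is that volume divided by the usable network bandwidth. All numbers below are made up for illustration.

```go
// Back-of-envelope estimate of the cost of syncing a brand-new member:
// bytes to transfer ~= snapshot size + entries appended since the snapshot.
package main

import "fmt"

func main() {
	const (
		snapshotMiB      = 512.0    // snapshot size in MiB (illustrative)
		entriesSinceSnap = 100000.0 // log entries written after the snapshot
		avgEntryKiB      = 0.25     // average entry size in KiB (illustrative)
		bandwidthMiBSec  = 50.0     // usable network bandwidth in MiB/s
	)
	totalMiB := snapshotMiB + entriesSinceSnap*avgEntryKiB/1024
	fmt.Printf("~%.0f MiB to sync, ~%.1f s at the assumed bandwidth\n",
		totalMiB, totalMiB/bandwidthMiBSec)
}
```

By contrast, the restart case above only replays data already on disk, which is why it is described as nearly free.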

@asiragusa

Can you estimate this cost, please? Is it based on the snapshot size plus the ops since the last snapshot?

@asiragusa

And does it freeze the whole cluster during this operation?

Sorry about that, I know that it's a lot of questions :)


xiang90 commented Oct 26, 2014

Can you please tell me what exact problem you want to solve first?

@asiragusa

I need a reliable platform to work with, and I have had a lot of trouble with etcd / fleet.

I have to start a good amount of short-lived units on my cluster, let's say 1/s, with each one living for 15 minutes. I know this involves fleet too, but during my tests etcd became unavailable too often, and this caused trouble for fleet as well.

Moreover, while rebooting my machines (one at a time), the cluster stopped working, so I had to create a new one with a new discovery URL. Because of the high load on etcd I disabled the snapshots, as stated in the docs, but this probably made things worse.

Now I am considering using Mesos / Marathon for that job and CoreOS / etcd just to deploy the Mesos slaves on each server, thanks to the easy and fast setup.

However, I still don't know if it will be able to handle that load. For sure etcd / fleet are not ready yet, and I have little hope that they will be soon :/


xiang90 commented Oct 26, 2014

@asiragusa I think the first thing you need to do is isolate the problems you hit in etcd.
I see you are referring to several quite different problems. Please feel free to create an issue for each of them when you have logs or can help us reproduce it.
Thanks.

@asiragusa

Sorry, I'd rather put this comment on that issue.
