standby nodes in 0.5.0 #1416

Closed
vany-egorov opened this issue Oct 25, 2014 · 15 comments

Comments

@vany-egorov

In etcd v0.4.6 a standby node was able to act as a peer node if one of the peer nodes died; standbys are not part of the Raft cluster themselves.
A standby node was able to replace a dead peer node automatically.
This was very useful with a big cluster size (>20 machines).
It was possible to set cluster-active-size, cluster-remove-delay, and cluster-sync-interval.

  1. How can I get a similar effect using proxy=on mode?
  2. What will happen if all of the peer nodes die?

xiang90 commented Oct 25, 2014

We have deprecated the standby feature in 0.5 by introducing the new proxy mode. A proxy will not promote itself automatically. We found that automatic demotion/promotion confuses people, since its behavior is not very controllable or deterministic. A configuration change is an important event for an etcd cluster: it should not happen frequently, and we do expect human involvement. We may introduce a deterministic auto-recovery strategy to replace dead nodes.
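
As a concrete illustration of the explicit, operator-driven configuration change described above, the sketch below uses Go's standard library to call the members HTTP API of the 0.5/2.0 line (list members, remove the dead one, register its replacement). The endpoint path, port, addresses, and member ID are illustrative placeholders, not details taken from this issue.

```go
// Minimal sketch: an operator explicitly replacing a dead member through the
// v2 members HTTP API. All URLs and the member ID are placeholders; the exact
// endpoint path and default ports may differ between 0.5.0 alpha releases.
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

const clusterURL = "http://10.0.1.10:4001" // any healthy member (placeholder)

func main() {
	// 1. List the current members to find the ID of the dead node.
	resp, err := http.Get(clusterURL + "/v2/members")
	if err != nil {
		log.Fatal(err)
	}
	body, _ := ioutil.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Printf("current members: %s\n", body)

	// 2. Remove the dead member by its ID (placeholder value).
	deadID := "272e204152018c35"
	req, _ := http.NewRequest("DELETE", clusterURL+"/v2/members/"+deadID, nil)
	if _, err := http.DefaultClient.Do(req); err != nil {
		log.Fatal(err)
	}

	// 3. Register the replacement machine's peer URL, then start etcd on that
	//    machine telling it to join the existing cluster.
	newPeer := []byte(`{"peerURLs":["http://10.0.1.14:2380"]}`)
	if _, err := http.Post(clusterURL+"/v2/members", "application/json",
		bytes.NewReader(newPeer)); err != nil {
		log.Fatal(err)
	}
}
```

Each step is a deliberate human action, which is the point of the comment above: membership changes are rare, explicit events rather than something a proxy decides on its own.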

@vany-egorov

Auto recovery would be a great feature!
With a cluster of 30 nodes, even if 15 of them died the service would stay available, because the Raft cluster would recover itself.


xiang90 commented Oct 25, 2014

@vany-egorov etcd, like other datastores, consumes CPU/network resources. It is even more important than a normal db service, since it provides the source of truth for your whole cluster. We do not think people should ever put the actual etcd server on an arbitrary machine in the cluster. Also, the recovery process comes with a big cost.

We have a very rough plan to share with you at the moment:

  1. Have hot backups (observers or read-only etcd members that hold the actual data).
  2. The operators define the recovery priority (if one etcd machine goes down for X mins, we will try to join hot backup A, then hot backup B, etc.). In other words, the recovery operations should be pre-defined by human operators.
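
A rough sketch of what point 2 could look like in practice: a small watchdog with an operator-defined priority list of hot backups. None of this is an existing etcd feature; the policy type and the healthCheck/removeMember/addMember helpers are hypothetical placeholders for whatever mechanism an operator would actually plug in.

```go
// Hypothetical sketch of a pre-defined recovery policy: if a watched member
// stays down longer than the operator-chosen window, remove it and promote
// the hot backups in the operator-chosen order (A, then B, ...).
package main

import "time"

type policy struct {
	member     string        // peer URL of the member being watched (placeholder)
	downFor    time.Duration // how long it may stay down before recovery kicks in
	hotBackups []string      // backups to try, in priority order
}

// recoverMember takes the health check and membership calls as functions,
// because those mechanisms are not specified here.
func recoverMember(p policy, healthCheck func(string) bool,
	removeMember, addMember func(string) error) {
	deadline := time.Now().Add(p.downFor)
	for time.Now().Before(deadline) {
		if healthCheck(p.member) {
			return // the member came back; do nothing
		}
		time.Sleep(5 * time.Second)
	}
	// The member stayed down past the pre-defined window: replace it.
	_ = removeMember(p.member)
	for _, backup := range p.hotBackups {
		if addMember(backup) == nil {
			return // first backup that joins successfully wins
		}
	}
}

func main() {
	p := policy{
		member:     "http://10.0.1.12:2380", // placeholder
		downFor:    10 * time.Second,        // short window just for the demo
		hotBackups: []string{"http://10.0.2.1:2380", "http://10.0.2.2:2380"},
	}
	// Stub implementations; a real deployment would wire in real checks
	// and membership API calls here.
	alwaysDown := func(string) bool { return false }
	noop := func(string) error { return nil }
	recoverMember(p, alwaysDown, noop, noop)
}
```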


xiang90 commented Oct 25, 2014

@vany-egorov Moreover, we make two assumptions:

  1. The probability of machine failure is low.
  2. Cluster maintenance can be controlled (probably via the etcd cluster itself).

So the case you described should not happen in most cases. Otherwise you have a relatively high probability of losing all your etcd machines (if all of them are among the 15 machines that went down). If that is really the case, then you need to consider putting etcd on a more stable, dedicated cluster.

@vany-egorov

@xiangli-cmu Maybe you are right.

We shall try to implement recovery by adding/removing nodes to/from the etcd cluster, automatically or through a web UI.

Thanks!

@asiragusa

@xiangli-cmu What about automatic updates? During an update the machine is considered failed and it can't be controlled, as updates happen in the background and ALL the nodes are involved in the operation.


xiang90 commented Oct 26, 2014

@asiragusa
First we need to be clear about when you want to remove a member from an existing etcd cluster. Basically there are two failure cases:

  1. Member temporarily unavailable: in this case, we probably do not want to remove the member from the cluster. As long as a quorum of members is still available, the etcd cluster remains available. That is why we need etcd and how we provide HA. If the quorum is lost, the etcd cluster is temporarily unavailable; when the members recover, the etcd cluster will be available again.
  2. Member permanently unavailable: in this case, we need to remove the permanently gone member from the etcd cluster and have a new member join the cluster to maintain the same failure tolerance. We hope this is done explicitly by a human, though we can automate the process to some extent.

Automatic updates fall into case 1, so they have nothing to do with dynamic configuration change. Moreover, if you are using CoreOS they can be controlled.
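
To make the quorum arithmetic in case 1 concrete: a cluster of n members needs a majority (n/2 + 1) to keep serving writes, so it tolerates n minus quorum failures. A tiny sketch:

```go
// Quorum arithmetic for an n-member etcd cluster: the cluster stays available
// as long as no more than n - quorum members are unavailable.
package main

import "fmt"

func quorum(n int) int { return n/2 + 1 }

func main() {
	for _, n := range []int{3, 5, 9, 30} {
		q := quorum(n)
		fmt.Printf("cluster size %2d: quorum %2d, tolerated failures %2d\n", n, q, n-q)
	}
}
```

For example, a 5-member cluster keeps working with 2 members down, while the 30-node scenario discussed earlier loses availability once 15 members are gone (quorum is 16).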

@asiragusa

Yep, I see, but you stated that

Also the recovery process comes with a big cost.

What is the impact of this in the case of a whole-cluster update? And what about a node reinstall (#863)?

I am getting scared of etcd/fleet, as it seems quite unstable to me ATM...


xiang90 commented Oct 26, 2014

Recovery means adding a brand-new member to the cluster to replace the retired one. It comes with a big cost: the log/snapshot needs to be synced to the new member.

If you still have the data on all the members, restarting etcd comes at nearly zero cost.
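
As a rough back-of-envelope (not a figure from this thread): the data a brand-new member has to pull is roughly the latest snapshot plus the log entries written since it, and the sync time is that volume divided by the usable network bandwidth. All numbers below are made up for illustration.

```go
// Back-of-envelope estimate of the cost of syncing a brand-new member:
// bytes to transfer ~= snapshot size + entries appended since the snapshot.
package main

import "fmt"

func main() {
	const (
		snapshotMiB      = 512.0    // snapshot size in MiB (illustrative)
		entriesSinceSnap = 100000.0 // log entries written after the snapshot
		avgEntryKiB      = 0.25     // average entry size in KiB (illustrative)
		bandwidthMiBSec  = 50.0     // usable network bandwidth in MiB/s
	)
	totalMiB := snapshotMiB + entriesSinceSnap*avgEntryKiB/1024
	fmt.Printf("~%.0f MiB to sync, ~%.1f s at the assumed bandwidth\n",
		totalMiB, totalMiB/bandwidthMiBSec)
}
```

By contrast, the restart case above only replays data already on disk, which is why it is described as nearly free.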

@asiragusa

Can you estimate this cost, please? Is it based on the snapshot size plus the ops since the last snapshot?

@asiragusa

And does it freeze the whole cluster during this operation?

Sorry about that, I know that it's a lot of questions :)


xiang90 commented Oct 26, 2014

Can you please tell me what exact problem you want to solve first?

@asiragusa

I need a reliable platform to work with, and I have had a lot of trouble with etcd / fleet.

I have to start a good amount of short-lived units on my cluster, let's say 1/s, with each one living for 15 minutes. I know this involves fleet too, but during my tests etcd became unavailable too often, and this caused trouble for fleet as well.

Moreover, while rebooting my machines (one at a time), the cluster stopped working, so I had to create a new one with a new discovery URL. Because of the high load on etcd I disabled the snapshots, as stated in the docs, but this probably made things worse.

Now I am considering using Mesos / Marathon for that job and CoreOS / etcd just to deploy the Mesos slaves on each server, thanks to the easy and fast setup.

However, I still don't know if it will be able to handle that load. For sure etcd / fleet are not ready yet, and I have little hope that they will be soon :/


xiang90 commented Oct 26, 2014

@asiragusa I think the first thing you need to do is isolate the problems you hit in etcd.
I see you are referring to several quite different problems. Please feel free to create an issue for each of them when you have logs or can help us reproduce it.
Thanks.

@asiragusa

Sorry, I'd rather put this comment on that issue.
