Ephemeral nodes? #174

Closed
derekchiang opened this issue Sep 16, 2013 · 16 comments

@derekchiang
Contributor

ZooKeeper supports the creation of "ephemeral znodes", which are znodes (key-value pairs in etcd parlance) that are automatically deleted when the session that creates them terminates, either deliberately or due to a failure. Do you guys think this feature is worth adding to etcd?

I'm bringing this up because I'm working on a set of sync primitives, and some of them could be implemented more easily if this feature existed. For example, in implementing distributed locks, I have to make sure that if a node dies while holding a lock, the lock gets deleted. The way I'm doing this right now is to set a TTL on the lock file and reset the file periodically, so that if the node dies the lock file will also disappear after some time. However, if etcd supported ephemeral keys, I would just make the lock file ephemeral, so that if the node dies the lock file is automatically deleted.
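
For illustration, a minimal sketch of that TTL-refresh workaround against the v1 HTTP API (key path, value, and intervals are made up for the example):

# take the lock with a short TTL
curl -L http://127.0.0.1:4001/v1/keys/lock -d value=node-1 -d ttl=10
# refresh on an interval well under the TTL; if this process dies,
# the key expires on its own and the lock is effectively released
while sleep 5; do
  curl -L http://127.0.0.1:4001/v1/keys/lock -d value=node-1 -d ttl=10
done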

There are many other potential applications of this feature of course; this is just an example.

@jzawodn

jzawodn commented Sep 16, 2013

+1

@xiang90
Contributor

xiang90 commented Sep 16, 2013

@derekchiang We do not want to keep the session concept in core etcd. We might leave ephemeral nodes to be implemented as modules.
Also, in your case I do not think an ephemeral node will help much unless you want to bind a lock to a particular client.

@derekchiang
Contributor Author

@xiangli-cmu I thought it'd be nice to have it in core. The interface could be really simple:

curl -L http://127.0.0.1:4001/v1/keys/message -d value="Hello world" -d ephemeral=1

Then, the node would be created, but etcd would just hold the connection instead of closing it. When the client closes the connection, either deliberately or due to failure, etcd would just delete the node.

Also, I'm not sure why you think having ephemeral nodes won't help in implementing distributed locks. Could you explain a bit?

@xiang90
Contributor

xiang90 commented Sep 16, 2013

@derekchiang
The interface could be simple, but the implementation is not. If the etcd node the client connects to fails, the client needs to reconnect to the leader to recover the session. Think about it: what if the client dies during session recovery? How long should the new leader wait for a client to recover its session? To do this, the server needs to store client state, which I do not want to do right now.

If you think of an ephemeral node as a node bound to a client, the node should disappear when the client dies or disconnects.
I was saying it will not help much under one condition: if you use an ephemeral node to implement a lock, the lock is bound to a client. Other clients will then try to create a node at the lock path, and the one that successfully creates it becomes the new lock owner.
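
For concreteness, a sketch of that create-to-acquire pattern (key path, value, and TTL invented for the example; this uses the v2 API's create-if-absent guard, prevExist=false):

# every client issues the same request; exactly one create succeeds
curl -L http://127.0.0.1:4001/v2/keys/locks/leader -XPUT -d value=client-42 -d ttl=10 -d prevExist=false
# the losers receive errorCode 105 ("Key already exists") and can watch
# the key to retry once the current owner's TTL lapses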

@progrium

Furthermore, Zookeeper ephemeral nodes are a trap because most use cases for them are to represent host availability (why else have it tied to the TCP session), but this can be misleading since a connection can be held open after the host has effectively died, or a host can be unresponsive despite an active session. This is why best practice in this area is to have active heartbeats, which are only slightly more work on the client side but are overall much simpler and more effective in most cases.


@derekchiang
Contributor Author

I see. Yeah I agree that having ephemeral nodes would complicate the design. Closing this issue.

@banks

banks commented Feb 9, 2014

Apologies for dredging this up five months after it was closed, but this was the top result when searching for etcd and ephemeral nodes, and I wanted to add something for other users who might search for that and not follow the conclusion here (I imagine quite a few people are trying to compare etcd to Zookeeper).

Zookeeper ephemeral nodes are a trap ... can be misleading since a connection can be held open after the host has effectively died, or a host can be unresponsive despite an active session ... This is why best practice in this area is to have active heartbeats

To be clear, Zookeeper ephemeral nodes are implemented using heartbeats. Relying on the TCP connection not closing is obviously very flawed, as stated, and is a great reason that this request is an invalid suggestion for etcd. ZooKeeper works because every connected client sends a heartbeat every few seconds; if the server stops receiving them, it closes the session and THEN removes the ephemeral nodes. That is why they are exactly the right thing to use for host availability, as suggested above.

The reason this is not a good fit for etcd, as far as I see it, is that etcd has chosen to model the service as a RESTful API. RESTful APIs explicitly choose not to be stateful or session based. Even ignoring that, the HTTP protocol is not really flexible enough to support the kind of constant, tiny heartbeat messages sent periodically over a single socket (ignoring keep-alive) that ephemeral nodes require to be any use at all. The closest you can get is to write a value with a short TTL and then refresh it periodically to keep the "session" active -- exactly as the OP said his current availability markers/locks work.
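
As a concrete sketch of that emulation (key name and intervals are illustrative, against the v2 API): each client keeps one short-TTL key refreshed as its availability marker, and a blocking watch notices when the refreshes stop and the key expires:

# heartbeat: re-set the marker more often than its TTL
while sleep 2; do
  curl -L http://127.0.0.1:4001/v2/keys/alive/node-1 -XPUT -d value=up -d ttl=5
done

# elsewhere: wait for the next change (including expiry) of the marker
curl -L 'http://127.0.0.1:4001/v2/keys/alive/node-1?wait=true'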

Hope this helps clarify for anyone else who might be wondering about this and finds this thread.

Sorry again for resurrecting a long-dead thread.

@ghost

ghost commented Mar 20, 2014

+1

Also, just to repeat what @banks said, ZooKeeper ephemeral nodes are implemented using active heartbeats.

@kjeet

kjeet commented Apr 16, 2014

@m1ch1 @banks
ZooKeeper does not have heartbeats for ephemeral nodes; it heartbeats per session, which I believe is not a concept in etcd.

@progrium
I'm not trying to make the case that etcd needs to be all things to all people (least of all a fit for every ZooKeeper use-case), but this does represent a substantial difference in what etcd is good at. etcd always comes up as an alternative to ZooKeeper, so the difference is worth noting. Ephemeral nodes are extremely useful for most distributed algorithms, membership tracking, etc.

I'm just starting to explore etcd, so correct me if I'm wrong, but etcd doesn't have sessions, and that means scalability issues for a few reasons:

  • Heartbeating a TTL requires a heartbeat per lock in etcd instead of one per node as in ZooKeeper. If I have a single node holding a thousand locks while participating in multiple semaphores or leader elections, ZooKeeper's single session saves us from the trouble of thousands of independent heartbeats.
  • Additionally, etcd's TTL heartbeats cost a hell of a lot more, since (I presume) they are updates that require consensus and notify watchers. In ZooKeeper, only a disconnect would start triggering watchers and changes in the nodes themselves.

I suppose there may be a clever solution in etcd around this, but it's late and I'm tired.

@banks

banks commented Apr 16, 2014

I realise the heartbeats were per session, but thanks for your comment - it more eloquently sums up the significant difference between a session-based and a RESTful design, which was largely what I was trying to point out :)


@ghost

ghost commented Apr 16, 2014

@kjeet I did not say each ephemeral node has a separate heartbeat.

@bmizerany
Contributor

TTLs and ephemeral nodes are convenient for a certain class of problems. Determining when to release locks is NOT one of them. It's like slamming the door in a friend's face while screaming "Too late!" and leaving them behind in the cold to starve and fend for themselves. What's worse is when that friend takes revenge by corrupting your data for being so callous. Nobody wins.

To determine liveness, you need many distributed probes performing meaningful diagnostic checks. Once these probes can prove a peer is down and no longer doing the work it acquired the lock for, they can then release the lock on behalf of that peer.

I used the word meaningful above for a reason: there is no one-size-fits-all way to do this. There are already monitoring systems that make building these checks easier, but you must still ensure that when you tell them something is not OK, it isn't.

@philips
Contributor

philips commented Apr 17, 2014

@bmizerany Yes, the current primitives we have aren't enough.

For most simple systems it all comes down to timeouts. The trick is having the primitives to detect when your friend has been out of the conversation for too long to have the context to join back in and say anything useful.

A way to solve the issue of etcd having TTLs is a mechanism for guarding the keyspace against actions once the lock is lost. The concept of a key index predicate, which we discussed to solve the fleet version-upgrade problem, can be used to solve the TTL lock problem too. Essentially you would say: update this key only if this other key (the lock) hasn't changed its index from X.
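
As a sketch in v2 API terms: the first request is the same-key compare-and-swap etcd already has; the second shows the proposed cross-key predicate with invented parameter names (guardKey/guardIndex are not real etcd flags):

# real today: update a key only if its own modifiedIndex is still 7
curl -L http://127.0.0.1:4001/v2/keys/job -XPUT -d value=run -d prevIndex=7

# hypothetical: guard the write on the lock key's index instead
curl -L http://127.0.0.1:4001/v2/keys/job -XPUT -d value=run -d guardKey=/locks/leader -d guardIndex=7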

@bmizerany
Contributor

@philips You're right. That solution works well for preventing a client from writing to a key it no longer holds the lock for. It doesn't work when the holder of the lock is writing to another database. For example: a Postgres leader goes down, or is thought to be down, because its monitor hasn't responded in a while. It would be very unfortunate if clients are still writing to this leader when a dumb TTL kicks in to say "Too late!", deletes the lock, and another node picks it up and claims to be the new leader. That results in split-brain, and it is the data corruption I was referencing. I'm sorry for not being clearer.

The above class of problems comes down to "it depends" when thinking through solutions. TTLs may have a role in one's solution, but usually there are more reliable ways of determining liveness.

@jolestar
Contributor

+1

@philips
Contributor

philips commented Sep 23, 2015

The etcd v3 API will add the concept of a lease. When the lease TTL expires, all of the keys attached to it can be deleted together. See https://github.com/coreos/etcd/blob/master/Documentation/rfc/v3api.md

We have an early preview of the v3 API in the latest etcd v2.2 release but leases haven't been implemented yet.
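
For illustration, a rough sketch of the intended flow, with illustrative command shapes (leases aren't implemented yet, so these commands and the <leaseID> placeholder are hypothetical):

# grant a lease with a 10-second TTL; the server returns a lease ID
etcdctl lease grant 10
# attach any number of keys to that lease
etcdctl put --lease=<leaseID> /locks/leader node-1
etcdctl put --lease=<leaseID> /alive/node-1 up
# one keep-alive stream refreshes the lease for all attached keys;
# if it stops, every attached key is deleted together
etcdctl lease keep-alive <leaseID>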

Please let us know if leases solve your use case @jolestar @m1ch1 @derekchiang @banks @kjeet
