This repository has been archived by the owner on Oct 7, 2023. It is now read-only.

correctly account for ephemeral node expiration in parent znode stats #90

Open
heyitsanthony opened this issue Nov 9, 2017 · 0 comments

Spun off of #88.

zetcd uses the CVersion key's revision and version to compute the znode's Pzxid and CVersion respectively. When a child changes (e.g., creation, deletion), zetcd touches the CVersion key to bump these values. Ephemeral keys, however, expire through etcd lease expiration, which deletes the key without touching CVersion, so the parent's Pzxid and CVersion do not reflect the expiry.
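
To make the gap concrete, here is a minimal in-memory sketch in Go. It does not use the etcd client or zetcd's real key layout; `store`, `kv`, `txn`, `parentStat`, and the key names are hypothetical and only imitate the behavior described above: each txn takes one revision, each written key gets its version bumped, and the parent's Pzxid and CVersion are read off the CVersion key.

```go
package main

import "fmt"

// kv mimics the two pieces of key metadata used here: the revision of
// the last modification and the per-key version counter.
type kv struct {
	modRev  int64
	version int64
}

// store is a toy stand-in for etcd (an assumption, not the client API).
type store struct {
	rev  int64
	keys map[string]*kv
}

func newStore() *store { return &store{keys: make(map[string]*kv)} }

// txn applies puts and deletes in a single new revision.
func (s *store) txn(puts, dels []string) {
	s.rev++
	for _, k := range puts {
		e, ok := s.keys[k]
		if !ok {
			e = &kv{}
			s.keys[k] = e
		}
		e.modRev = s.rev
		e.version++
	}
	for _, k := range dels {
		delete(s.keys, k)
	}
}

// parentStat derives (Pzxid, CVersion) from the parent's CVersion key.
func (s *store) parentStat(cverKey string) (pzxid, cversion int64) {
	e := s.keys[cverKey]
	return e.modRev, e.version
}

func main() {
	s := newStore()
	s.txn([]string{"cver/app"}, nil)               // create parent znode
	s.txn([]string{"key/app/e1", "cver/app"}, nil) // client creates ephemeral child and touches CVersion
	p1, c1 := s.parentStat("cver/app")

	s.txn(nil, []string{"key/app/e1"}) // lease expiry: only the child key is deleted
	p2, c2 := s.parentStat("cver/app")

	fmt.Println(p1, c1) // 2 2
	fmt.Println(p2, c2) // 2 2 (unchanged although revision 3 removed a child)
}
```

In this model the store is at revision 3 after the expiry, but the parent still reports the values from revision 2, which is the discrepancy this issue tracks.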

One possible solution involves extending etcd to associate a transaction with a lease (cf. etcd-io/etcd#8842). Ideally, each ephemeral key would have a lease transaction that would touch its parent's CVersion key. This is probably expecting too much since it is too invasive on the etcd side; the txn logic would have to permit multiple updates to a key in the same revision and likely require deep mvcc changes. Alternatively, new "deleted ephemeral" keys could be created in the lease txn to mark tombstones for each expired key; the tombstones would then be used for reconciling the fields. Tombstones avoid multi-updates, but would need STM extensions for ranges (a feature request made a few times in the past, but only possible in 3.3+).

An approach with reconciliation but without lease txns: maintain a per-znode list of ephemeral children (elist), a per-ephemeral-node key with a matching ephemeral owner (ekey), and a global revision offset key:

  • When creating an ephemeral key, add its name to the elist and create the ekey if the key does not exist. If the name is already in the elist, wait on reconciliation.
  • When computing Stat, fetch the elist and compare with the child keys to detect expiry and wait for reconciliation.
  • A reconciliation goroutine watches for ekey deletion events. For each set of deleted ekeys under the same znode, set CVersion's count to the count-1, its zxid to the deletion event zxid and the current revision offset version, remove the keys from the elist, and touch the revision offset key. Notify waiters.
  • The revision offset is subtracted from the current zxid to compensate for the extra revisions from reconciliation txns.
  • Record the current revision offset in the mtime and ctime keys for computing mzxid and czxid. Compute each as etcdrev - offset.
  • Record a count and the current revision offset in CVersion.
  • Compute CVersion by adding the stored count value to the key version.
  • Compute PZxid by using the stored CVersion zxid if there have been no changes since the last expiry.
  • Will need some way to handle losing the reconciliation watch due to compaction.
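
The read-side arithmetic in the list above can be sketched in Go as follows. This is only an illustration under stated assumptions: `expiredChildren`, `zxid`, and `cversion` are hypothetical helper names, the inputs are plain values rather than etcd responses, and the reconciliation txn itself (updating the count, the elist, and the offset key) is not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// expiredChildren returns names recorded in the elist that no longer
// have a child key, i.e. ephemeral nodes that expired but have not been
// reconciled yet. A Stat call would wait on reconciliation if this is
// non-empty.
func expiredChildren(elist, children []string) []string {
	live := make(map[string]struct{}, len(children))
	for _, c := range children {
		live[c] = struct{}{}
	}
	var out []string
	for _, name := range elist {
		if _, ok := live[name]; !ok {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// zxid maps an etcd revision to a zxid by subtracting the revision
// offset that was current when the revision was recorded.
func zxid(etcdRev, offset int64) int64 { return etcdRev - offset }

// cversion adds the count stored in the CVersion key to the key's version.
func cversion(storedCount, keyVersion int64) int64 { return storedCount + keyVersion }

func main() {
	elist := []string{"e1", "e2", "e3"}
	children := []string{"e2", "persistent"}

	fmt.Println(expiredChildren(elist, children)) // [e1 e3]
	fmt.Println(zxid(12, 2))                      // 10
	fmt.Println(cversion(3, 4))                   // 7
}
```

Non-ephemeral children such as `persistent` never appear in the elist, so they do not affect expiry detection.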