lease: Persist remainingTTL to prevent indefinite auto-renewal of long lived leases #9924
Fixes #9888 by introducing a "lease checkpointing" mechanism.
The basic idea is that for all leases with TTLs greater than 5 minutes, the remaining TTL is checkpointed every 5 minutes, so that if a new leader is elected, the leases are not auto-renewed to their full TTL but only to the remaining TTL from the last checkpoint. A checkpoint is an entry persisted to the RAFT consensus log that records the remainingTTL as determined by the leader when the checkpoint occurred.
If keep-alive is called on a lease that has been checkpointed, the remaining TTL is cleared by a checkpoint entry in the RAFT consensus log with remainingTTL=0, indicating that it is unset and that the original TTL should be used.
All checkpointing is scheduled and performed by the leader, and when a new leader is elected, it takes over checkpointing as part of
An advantage of this approach is that leases where keep-alive is called often will still write at most two entries to the RAFT consensus log every 5 minutes: the checkpoint itself and one entry that clears it. Only the first keep-alive after a checkpoint must be recorded to the RAFT consensus log; all other keep-alives can be ignored.
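The bound of two log entries per checkpoint interval can be sketched as follows. This is an illustrative model, not the code from this PR: `Leader`, `LogEntry`, `Checkpoint`, and `KeepAlive` are hypothetical names, and the log is modeled as an in-memory slice.

```go
package main

// LogEntry is a hypothetical consensus log entry recording a lease's remaining TTL.
type LogEntry struct {
	LeaseID      int64
	RemainingTTL int64 // 0 means "unset, use the original TTL"
}

// Lease is a hypothetical, simplified lease record.
type Lease struct {
	ID           int64
	TTL          int64
	RemainingTTL int64
}

// Leader models the leader's view of the log as a slice of entries.
type Leader struct {
	Log []LogEntry
}

// Checkpoint records the remaining TTL on the lease and appends a log entry.
func (ld *Leader) Checkpoint(l *Lease, remaining int64) {
	l.RemainingTTL = remaining
	ld.Log = append(ld.Log, LogEntry{LeaseID: l.ID, RemainingTTL: remaining})
}

// KeepAlive clears a checkpointed remaining TTL. Only the first call after a
// checkpoint appends a log entry; later calls find nothing to clear.
func (ld *Leader) KeepAlive(l *Lease) {
	if l.RemainingTTL == 0 {
		return // already unset: no log write needed
	}
	l.RemainingTTL = 0
	ld.Log = append(ld.Log, LogEntry{LeaseID: l.ID, RemainingTTL: 0})
}
```

In this model, one checkpoint followed by any number of keep-alives leaves exactly two entries in `Log`.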
Additionally, to prevent this mechanism from degrading system performance, it is designed to be best effort. There is a limit on how many checkpoints can be persisted per second, and how many pending checkpoint operations can be scheduled. If these limits are reached, checkpoints may not be scheduled or written to the RAFT consensus log to prevent the checkpointing operations from overwhelming the system, which could otherwise occur if large volumes of long lived leases were granted.
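A rough sketch of the best-effort behaviour described above, under stated assumptions: the `Scheduler` type, its `maxPending` cap, and the `Schedule` and `Drain` methods are hypothetical, and the sketch assumes the per-second limit is expressed as a number of batches, each holding a bounded number of checkpoints.

```go
package main

// Checkpoint is a hypothetical record of a lease's remaining TTL.
type Checkpoint struct {
	LeaseID      int64
	RemainingTTL int64
}

// Scheduler is a hypothetical best-effort queue of pending checkpoints.
type Scheduler struct {
	maxPending int
	pending    []Checkpoint
}

// Schedule enqueues cp, or drops it and returns false when the queue is full.
func (s *Scheduler) Schedule(cp Checkpoint) bool {
	if len(s.pending) >= s.maxPending {
		return false // best effort: skip rather than overload the system
	}
	s.pending = append(s.pending, cp)
	return true
}

// Drain removes at most maxBatches batches of at most batchSize checkpoints
// each. The caller would invoke it once per rate-limit interval.
func (s *Scheduler) Drain(batchSize, maxBatches int) [][]Checkpoint {
	if batchSize <= 0 {
		return nil
	}
	var batches [][]Checkpoint
	for len(batches) < maxBatches && len(s.pending) > 0 {
		n := batchSize
		if n > len(s.pending) {
			n = len(s.pending)
		}
		batch := make([]Checkpoint, n)
		copy(batch, s.pending[:n])
		s.pending = s.pending[n:]
		batches = append(batches, batch)
	}
	return batches
}
```

Anything left in the queue after `Drain` waits for the next interval, and anything rejected by `Schedule` is simply not checkpointed this round, which is acceptable because a missed checkpoint only makes the recorded remaining TTL less precise.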
gyuho left a comment
I will have another look next week as well. And just quick question from first pass, if
3 times, most recently
Jul 16, 2018
changed the title
[WIP] lease: Persist remainingTTL to prevent indefinite auto-renewal of long lived leases
Jul 17, 2018
For review, note that the PR now has 4 separate commits:
The main change is c81870f, which is only +243 -45 lines.
@@            Coverage Diff             @@
##           master    #9924      +/-   ##
==========================================
+ Coverage   68.99%   69.03%   +0.03%
==========================================
  Files         386      386
  Lines       35792    35891      +99
==========================================
+ Hits        24695    24776      +81
- Misses       9296     9300       +4
- Partials     1801     1815      +14
Ran two benchmarks:
Checkpoint heap size Benchmark
Checked etcd server heap size up to 10,000,000 live leases.
Checkpoint rate limit Benchmark
Set leases to checkpoint every 1s, created 15k of them, and then checked server performance with
Since 1,000,000 checkpoints per sec seems sufficient, and the limits of maxLeaseCheckpointBatchSize=1000, leaseCheckpointRate=1000 appear to have negligible impact on performance, I've gone with those settings.