Today `ClusterSingleton.start({ typeName, entityProps })` runs the singleton on whichever node membership thinks is oldest. Under a network partition the gossip protocol can converge to two different views of which node is oldest, briefly running the singleton on two nodes simultaneously (the classic split-brain failure mode).

Adding an optional `lease` parameter that wraps the entity spawn in an acquire-and-guard pattern fixes this:

```ts
const lease = new KubernetesLease({
  name: 'app-cron-singleton',
  namespace: 'default',
  owner: process.env.HOSTNAME ?? 'local',
  ttlMs: 30_000,
});

singleton.start({
  typeName: 'cron',
  entityProps: Props.create(() => new CronActor()),
  lease, // ← only the lease holder spawns
});
```

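To illustrate why a lease closes the split-brain window, here is a minimal sketch of lease semantics with a TTL. `SketchLeaseStore` and its methods are hypothetical names invented for this illustration; they are not the API of `KubernetesLease` or `InMemoryLease`, and the clock is passed in explicitly to keep the example deterministic.

```typescript
// Hypothetical, in-memory stand-in for a shared lease record.
// Not the real Lease API from #33; for illustration only.
interface LeaseRecord {
  owner: string;
  expiresAt: number;
}

class SketchLeaseStore {
  private record: LeaseRecord | undefined;
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Succeeds if the lease is free, expired, or already held by `owner`.
  tryAcquire(owner: string, now: number): boolean {
    const r = this.record;
    const free = r === undefined || r.expiresAt <= now || r.owner === owner;
    if (!free) return false;
    this.record = { owner, expiresAt: now + this.ttlMs };
    return true;
  }

  // Only the current holder can release.
  release(owner: string): void {
    if (this.record?.owner === owner) this.record = undefined;
  }
}

// Two nodes on opposite sides of a partition both believe they are oldest.
const store = new SketchLeaseStore(30_000);
const nodeA = store.tryAcquire('node-a', 0);            // first caller wins
const nodeB = store.tryAcquire('node-b', 1_000);        // refused while node-a holds it
const afterExpiry = store.tryAcquire('node-b', 31_000); // allowed once the TTL has elapsed
```

Even when membership disagrees about who is oldest, only the node that wins the lease would spawn the entity; the other waits until the lease is released or expires.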
Behaviour change:

- On the 'I am the elected oldest' notification, the manager calls `lease.acquire()` before spawning. If `acquire` returns false, another node currently holds the lease, so the manager leaves the slot empty and retries on a configurable interval.
- The manager subscribes to `lease.onLost(reason)` and stops the entity if the lease is revoked mid-flight.
- On graceful handoff (an older node becoming reachable), the current holder calls `lease.release()` before the membership transition completes.

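The three behaviours above could be sketched roughly as follows. This is a simplified illustration, not the actual `ClusterSingletonManager` code: the `Lease` interface shape, `guardSpawn`, `GuardOptions`, and `retryIntervalMs` are assumptions made for the example, and the real interface from #33 may differ.

```typescript
// Assumed shape of the lease; the real interface from #33 may differ.
interface Lease {
  acquire(): Promise<boolean>;
  release(): Promise<void>;
  onLost(listener: (reason: string) => void): void;
}

// Hypothetical options bag for this sketch.
interface GuardOptions {
  lease?: Lease;
  retryIntervalMs: number; // assumed name for the "configurable interval"
  spawn: () => void;       // starts the singleton entity
  stop: () => void;        // stops the singleton entity
}

// Called when this node is notified that it is the elected oldest.
// Returns a function to call on graceful handoff or shutdown.
function guardSpawn(opts: GuardOptions): () => Promise<void> {
  const { lease, retryIntervalMs, spawn, stop } = opts;
  let running = false;
  let cancelled = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const stopEntity = (): void => {
    if (running) {
      running = false;
      stop();
    }
  };

  const attempt = async (): Promise<void> => {
    if (cancelled) return;
    // Without a lease, behave as today; treat acquire errors as "not held".
    const held = lease ? await lease.acquire().catch(() => false) : true;
    if (cancelled) {
      // Handed off while acquire was in flight: give the lease back.
      if (held && lease) await lease.release();
      return;
    }
    if (!held) {
      // Another node holds the lease: leave the slot empty and retry later.
      timer = setTimeout(() => void attempt(), retryIntervalMs);
      return;
    }
    running = true;
    spawn();
  };

  // Stop the entity if the lease is revoked mid-flight.
  lease?.onLost(() => stopEntity());
  void attempt();

  return async (): Promise<void> => {
    cancelled = true;
    if (timer !== undefined) clearTimeout(timer);
    const wasRunning = running;
    stopEntity();
    // Release only if we still held the lease when handing off.
    if (wasRunning && lease) await lease.release();
  };
}
```

In this sketch the entity is not restarted after `onLost` fires; whether the manager should re-enter the acquire loop at that point is a design decision left to the implementation.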
Out of scope:

- The `Lease` object itself: it is already provided by #33 (`KubernetesLease`, just landed) and the existing `InMemoryLease`.
- Lease integration on `ShardCoordinator` (the cluster's other singleton-shaped component); that would warrant its own issue if needed.

Components:

| File | Task |
| --- | --- |
| `src/cluster/singleton/ClusterSingleton.ts` | accept `lease?: Lease` in start options |
| `src/cluster/singleton/ClusterSingletonManager.ts` | wrap entity spawn in `lease.acquire()`; subscribe to `onLost`; release on shutdown |
| `tests/unit/cluster/singleton/ClusterSingletonLease.test.ts` (new) | acquire-blocks-spawn, onLost-stops-entity, release-on-leave |
| `examples/coordination/k8s-lease-singleton.ts` | rewrite using the new option (much shorter than the manual acquire-and-guard loop the example currently shows) |

Estimate: 1-2 days. Surfaced as a follow-up to #33 — the example for KubernetesLease today reaches into a manual acquire/onLost/release loop because this option doesn't exist yet.