LeaseMajority split-brain resolver — external Lease as tiebreaker #51

@pathosDev

We ship four split-brain resolver strategies today: `StaticQuorum`, `KeepOldest`, `KeepReferee`, `KeepMajority`. Akka adds a fifth: `LeaseMajority` — when two equal-sized partitions form, the side that successfully acquires an external `Lease` survives; the other side downs itself.

This is the strongest option for equal-size splits because:

  • The Lease is held by an external service (K8s API, Postgres advisory lock, …) that sits outside the partitioned cluster, so it is not itself split by the partition and can grant the Lease to at most one side.
  • Equal-size partitions, where membership properties (oldest, role) give neither side a deterministic win, get a clean external arbiter.

We already have the pieces: `KubernetesLease` (#33) + `InMemoryLease` + the `Lease` interface. This issue wires them into a downing strategy.

API:

```ts
await Cluster.join(system, {
  ...,
  downing: new LeaseMajority({
    lease: new KubernetesLease({ name: 'cluster-arbitration', ... }),
    acquireTimeoutMs: 5_000,
    role: 'leader-eligible', // optional: restrict who tries to acquire
  }),
});
```
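For discussion, here is a rough sketch of what the option shape and the `role` restriction could look like. The field names come from the example above; the generic lease parameter and the `mayAcquire` helper are hypothetical and only illustrate one reading of "restrict who tries to acquire".

```ts
// Hypothetical option shape inferred from the example above.
// `L` stands in for whatever the project's real `Lease` type is.
interface LeaseMajorityOptions<L> {
  lease: L;
  acquireTimeoutMs: number;
  role?: string; // optional: only members carrying this role attempt the acquire
}

// Hypothetical helper: a member may attempt the acquire if no role is
// configured, or if the member carries the configured role.
function mayAcquire(
  memberRoles: ReadonlySet<string>,
  opts: Pick<LeaseMajorityOptions<unknown>, 'role'>,
): boolean {
  return opts.role === undefined || memberRoles.has(opts.role);
}
```

What a side should do when none of its reachable members is eligible is not covered here and would need to be decided in the implementation.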

When a node detects an unreachable partition, each side runs the same algorithm:

  1. Compute "would I be the majority?" using the existing `KeepMajority` math.
  2. If yes → unconditionally survive (no Lease needed).
  3. If this side is the strict minority → down itself (implied by the `KeepMajority` math).
  4. If equal-size partition → try `lease.acquire()` with the timeout. The side that acquires survives; the other side downs itself.
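The decision flow above could be sketched roughly as follows. This is a minimal sketch, not the proposed implementation: `LeaseLike`, `PartitionView`, `Decision`, `withTimeout`, and `decide` are all hypothetical names, and the project's real `Lease` interface may differ. Two assumptions to flag: a strict minority downs itself (carried over from `KeepMajority`), and `acquire()` resolving to `false` means another holder owns the Lease, while a rejection or a timeout means the Lease backend is unavailable.

```ts
// Hypothetical minimal lease shape; the project's real `Lease` interface may differ.
interface LeaseLike {
  acquire(): Promise<boolean>;
}

type Decision = 'survive' | 'down-self' | 'stay-unreachable';

interface PartitionView {
  reachable: number;   // members this side can still see, including itself
  unreachable: number; // members on the other side of the partition
}

// Rejects if `p` has not settled within `ms`; clears the timer otherwise.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('lease acquire timed out')), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function decide(
  view: PartitionView,
  lease: LeaseLike,
  acquireTimeoutMs: number,
): Promise<Decision> {
  // Strict majority: survive without touching the Lease.
  if (view.reachable > view.unreachable) return 'survive';
  // Strict minority: assumed to down itself, as with KeepMajority.
  if (view.reachable < view.unreachable) return 'down-self';
  // Equal-size split: the external Lease is the tiebreaker.
  try {
    const acquired = await withTimeout(lease.acquire(), acquireTimeoutMs);
    return acquired ? 'survive' : 'down-self';
  } catch {
    // Assumption: errors and timeouts mean the Lease backend is unreachable,
    // so stay in unreachable mode rather than risk both sides surviving.
    return 'stay-unreachable';
  }
}
```

Treating a timeout as "unavailable" rather than "lost the race" is the conservative choice here; if the real `Lease` reports contention differently, the `catch` branch is the part to revisit.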

Components:

| File | Task |
| --- | --- |
| `src/cluster/downing/LeaseMajority.ts` (new) | Strategy implementation. |
| `src/cluster/downing/index.ts` | Re-export. |
| `tests/unit/cluster/downing/DowningStrategies.test.ts` | Add LeaseMajority cases. |
| `tests/multi-node/lease-majority.test.ts` (new) | 4-node split (2/2) where both sides try to acquire; only one wins. |

Estimate: 2-3 days.

Verification:

  • Equal-size partition: the side whose `acquire` succeeds wins (normally whichever reaches the Lease first); the other side downs itself within `acquireTimeoutMs` + the downing-detection window.
  • Lease unavailable (network problem reaching the K8s API): both sides fall back to staying in unreachable mode rather than risk both surviving.
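The "only one wins" property that the 2/2 test needs to assert can be illustrated with a toy lease. This is a sketch only: `ToyLease` and `raceTwoSides` are hypothetical stand-ins written for this example, not the project's `InMemoryLease`, and the per-owner `acquire(owner)` signature is an assumption.

```ts
// Toy stand-in for illustration only; NOT the project's `InMemoryLease`.
class ToyLease {
  private holder: string | null = null;

  async acquire(owner: string): Promise<boolean> {
    // Yield once to mimic an asynchronous round-trip to the lease backend.
    await Promise.resolve();
    // Check-and-set runs with no await in between, so it cannot interleave.
    if (this.holder === null) this.holder = owner;
    return this.holder === owner;
  }

  currentHolder(): string | null {
    return this.holder;
  }
}

// Both sides of a 2/2 split try to acquire the same lease concurrently.
async function raceTwoSides(): Promise<{ a: boolean; b: boolean; holder: string | null }> {
  const lease = new ToyLease();
  const [a, b] = await Promise.all([
    lease.acquire('side-a'),
    lease.acquire('side-b'),
  ]);
  return { a, b, holder: lease.currentHolder() };
}
```

A multi-node test would presumably assert the same invariant against the real lease: exactly one side reports a successful acquire, and the recorded holder is that side.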

Out of scope:

  • Lease implementation — already exists (`KubernetesLease`, `InMemoryLease`).

Labels: enhancement, priority: high
