Follow-up to #35. Today the ShardCoordinator rebuilds its allocation map (shardHome, regions, entitiesPerShard) entirely from RegisterRegion gossip on every LeaderChanged event. For small clusters this is fine — a few hundred shards re-register in well under a second. For clusters with thousands of shards plus rememberEntities, leader failover triggers a brief reallocation storm: every region re-registers, the new leader's coordinator runs tryAllocate for every shard from scratch, and entities are briefly unreachable.
Strategy: persist the coordinator's authoritative state in DistributedData (which now exists, see #37) under a well-known key. On leader-change the new leader reads the snapshot, applies pending re-registrations as deltas, and skips full re-allocation. When DD is unavailable (single-node dev, no extension started) we fall back to the current rebuild-from-gossip path.
Components:
| File |
Task |
| `src/cluster/sharding/ShardCoordinator.ts` |
Persist `{regions, shardHome, entitiesPerShard}` to DD on every meaningful state transition; load on `onLeaderChanged` if `isLeader`. |
| `src/cluster/sharding/CoordinatorState.ts` (new) |
Serialisable shape + a CRDT-friendly merge (per-shard LWW on `shardHome`, per-region LWW on `regions`). |
| `tests/multi-node/sharding-coordinator-recovery.test.ts` (new) |
3-node test: leader crashes mid-allocation, new leader recovers state in < 500 ms, entities reachable without rebalance. |
Estimate: 3-5 days.
Verification:
- Existing sharding tests stay green.
- New multi-node test confirms allocations survive a leader crash without re-running `tryAllocate` for every shard.
Out of scope:
- Cross-cluster federation (DD is intra-cluster only).
- Coordinator state for ShardedDaemonProcess specifically (covered by the existing rememberEntities path; DD-backing comes for free).
Follow-up to #35. Today the
ShardCoordinatorrebuilds its allocation map (shardHome,regions,entitiesPerShard) entirely fromRegisterRegiongossip on everyLeaderChangedevent. For small clusters this is fine — a few hundred shards re-register in well under a second. For clusters with thousands of shards plusrememberEntities, leader failover triggers a brief reallocation storm: every region re-registers, the new leader's coordinator runstryAllocatefor every shard from scratch, and entities are briefly unreachable.Strategy: persist the coordinator's authoritative state in
DistributedData(which now exists, see #37) under a well-known key. On leader-change the new leader reads the snapshot, applies pending re-registrations as deltas, and skips full re-allocation. When DD is unavailable (single-node dev, no extension started) we fall back to the current rebuild-from-gossip path.Components:
Estimate: 3-5 days.
Verification:
Out of scope: