Follow-up to #37. `DistributedData` is in-memory: every replica's CRDT state is reconstructed via gossip on join. In a stable cluster this is fine, but a full cluster restart (deploy, outage) loses every CRDT — the cluster comes back up empty even though no individual replica crashed in isolation.
Strategy: plug a per-key durable backend behind the existing API, reusing the `DurableStateStore` interface (already supports SQLite, S3, Cassandra, filesystem) so users get persistence on whatever they already configured for actor state.
Behaviour:
- `extension(DistributedDataId).start(cluster, { durableStore })` opts in. Without the option DD stays in-memory (default).
- On every successful `update(key, ...)` or merge, write the resulting CRDT state to the durable store keyed by `(replicaId, key)`.
- On `preStart`, load every key the durable store knows for this replica and merge into the in-memory view.
- Listeners fire on the merge of in-memory + persisted state (so subscribers see the recovered value immediately on startup).
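The behaviour above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `SketchStore`, `InMemorySketchStore`, and `DurableGSets` are hypothetical names, the store is synchronous for brevity (the real `DurableStateStore` interface may differ and is likely asynchronous), and a grow-only set stands in for whichever CRDT types DD supports.

```typescript
// Hypothetical store shape for this sketch; the real DurableStateStore
// interface may differ (and is likely asynchronous).
interface SketchStore {
  loadAll(replicaId: string): Array<[string, string]>; // [key, serialised state]
  save(replicaId: string, key: string, json: string): void;
}

class InMemorySketchStore implements SketchStore {
  private readonly rows = new Map<string, Map<string, string>>();

  loadAll(replicaId: string): Array<[string, string]> {
    const forReplica = this.rows.get(replicaId);
    return forReplica ? Array.from(forReplica.entries()) : [];
  }

  save(replicaId: string, key: string, json: string): void {
    let forReplica = this.rows.get(replicaId);
    if (!forReplica) {
      forReplica = new Map<string, string>();
      this.rows.set(replicaId, forReplica);
    }
    forReplica.set(key, json);
  }
}

// Grow-only set: merge is set union, so it is idempotent and order-independent.
function mergeGSet(a: string[], b: string[]): string[] {
  return Array.from(new Set(a.concat(b))).sort();
}

type Listener = (key: string, value: string[]) => void;

class DurableGSets {
  private readonly view = new Map<string, string[]>();
  private readonly listeners: Listener[] = [];
  private readonly replicaId: string;
  private readonly store: SketchStore | undefined;

  // Omitting `store` keeps the in-memory-only default.
  constructor(replicaId: string, store?: SketchStore) {
    this.replicaId = replicaId;
    this.store = store;
  }

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Load-on-start: merge every persisted key into the view and notify listeners.
  preStart(): void {
    if (!this.store) return;
    for (const [key, json] of this.store.loadAll(this.replicaId)) {
      this.merge(key, JSON.parse(json) as string[], false);
    }
  }

  add(key: string, element: string): void {
    this.merge(key, [element], true);
  }

  // Merge local or gossiped state; write the result through when a store is set.
  merge(key: string, incoming: string[], persist = true): void {
    const merged = mergeGSet(this.view.get(key) ?? [], incoming);
    this.view.set(key, merged);
    if (persist && this.store) {
      this.store.save(this.replicaId, key, JSON.stringify(merged));
    }
    for (const listener of this.listeners) listener(key, merged);
  }

  get(key: string): string[] {
    return this.view.get(key) ?? [];
  }
}

// Usage: a "restart" is modelled as a fresh instance over the same store.
const store = new InMemorySketchStore();
const beforeRestart = new DurableGSets("replica-a", store);
beforeRestart.add("tags", "blue");
beforeRestart.add("tags", "green");

const afterRestart = new DurableGSets("replica-a", store);
const notified: string[] = [];
afterRestart.subscribe((key, value) => {
  notified.push(`${key}=${value.join(",")}`);
});
afterRestart.preStart();
console.log(afterRestart.get("tags"), notified);
```

In this sketch the recovered state is not written back during `preStart`, since it already came from the store; whether the real implementation should skip that write is an open design choice.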
Components:
| File | Task |
| --- | --- |
| `src/crdt/DistributedData.ts` | Accept optional `durableStore: DurableStateStore` setting; wire load-on-start + write-on-merge. |
| `src/crdt/DurableDistributedDataStore.ts` (new) | Thin wrapper that handles serialisation + version tracking per (replica, key). |
| `tests/unit/crdt/DurableDistributedData.test.ts` (new) | Round-trip, restart-persistence, two-replicas-converge-after-restart. |
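One possible shape for the thin wrapper, as a sketch under assumptions: `RawStore`, `Envelope`, and `DurableDDStoreSketch` are hypothetical names, JSON is assumed as the serialisation format, and "version tracking" is interpreted here as a per-(replica, key) counter that increases by one on each write. The real `DurableStateStore` interface and the intended versioning scheme may differ.

```typescript
// Hypothetical backing-store shape; the real DurableStateStore may differ.
interface RawStore {
  get(id: string): string | undefined;
  put(id: string, value: string): void;
}

// Envelope written per (replica, key): a version counter plus the CRDT state.
interface Envelope<T> {
  version: number;
  state: T;
}

class DurableDDStoreSketch<T> {
  private readonly raw: RawStore;
  private readonly replicaId: string;

  constructor(raw: RawStore, replicaId: string) {
    this.raw = raw;
    this.replicaId = replicaId;
  }

  // Encoding the pair as a JSON array keeps (replicaId, key) unambiguous
  // even if either part contains a separator character.
  private id(key: string): string {
    return JSON.stringify([this.replicaId, key]);
  }

  read(key: string): Envelope<T> | undefined {
    const json = this.raw.get(this.id(key));
    return json === undefined ? undefined : (JSON.parse(json) as Envelope<T>);
  }

  // Stores the state with version = previous version + 1; returns the new version.
  write(key: string, state: T): number {
    const version = (this.read(key)?.version ?? 0) + 1;
    const envelope: Envelope<T> = { version, state };
    this.raw.put(this.id(key), JSON.stringify(envelope));
    return version;
  }
}

// Usage with a Map standing in for the configured backend.
const rows = new Map<string, string>();
const raw: RawStore = {
  get: (id) => rows.get(id),
  put: (id, value) => {
    rows.set(id, value);
  },
};

const storeA = new DurableDDStoreSketch<string[]>(raw, "replica-a");
const storeB = new DurableDDStoreSketch<string[]>(raw, "replica-b");
const v1 = storeA.write("tags", ["blue"]);
const v2 = storeA.write("tags", ["blue", "green"]);
console.log(v1, v2, storeA.read("tags"), storeB.read("tags"));
```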
Estimate: 2-3 days.
Verification:
- Restart a node: its DD view is restored to the last-saved state.
- Two-replica test: both replicas write to a key, both restart, and the merged value after the restart matches the pre-restart merged value.
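The two-replica check could look roughly like this. It is a sketch, not the planned test file: `ReplicaSketch` is a hypothetical stand-in for a DD instance, each replica's durable store is modelled as a private `Map`, gossip is simulated by passing state by hand, and a grow-only counter is assumed as the CRDT.

```typescript
// Grow-only counter: one entry per replica, merged by per-entry maximum.
type GCounter = Record<string, number>;

function mergeCounters(x: GCounter, y: GCounter): GCounter {
  const out: GCounter = { ...x };
  for (const [replica, n] of Object.entries(y)) {
    out[replica] = Math.max(out[replica] ?? 0, n);
  }
  return out;
}

function total(c: GCounter): number {
  return Object.values(c).reduce((sum, n) => sum + n, 0);
}

// Hypothetical stand-in: each replica owns a private Map as its durable store.
class ReplicaSketch {
  private readonly view = new Map<string, GCounter>();
  private readonly id: string;
  private readonly disk: Map<string, string>;

  constructor(id: string, disk: Map<string, string>) {
    this.id = id;
    this.disk = disk;
    // Load-on-start from this replica's own store only.
    for (const [key, json] of disk) {
      this.view.set(key, JSON.parse(json) as GCounter);
    }
  }

  increment(key: string, by: number): void {
    const mine = this.read(key)[this.id] ?? 0;
    this.apply(key, { [this.id]: mine + by });
  }

  // Used for local updates and gossiped state; every merge is written through.
  apply(key: string, incoming: GCounter): void {
    const merged = mergeCounters(this.read(key), incoming);
    this.view.set(key, merged);
    this.disk.set(key, JSON.stringify(merged));
  }

  read(key: string): GCounter {
    return this.view.get(key) ?? {};
  }
}

const diskA = new Map<string, string>();
const diskB = new Map<string, string>();
let a = new ReplicaSketch("a", diskA);
let b = new ReplicaSketch("b", diskB);

a.increment("hits", 2);
b.increment("hits", 5);
// One simulated gossip exchange in each direction.
a.apply("hits", b.read("hits"));
b.apply("hits", a.read("hits"));
const mergedBefore = total(a.read("hits"));

// Full cluster restart: both in-memory views are dropped and rebuilt from disk.
a = new ReplicaSketch("a", diskA);
b = new ReplicaSketch("b", diskB);
const mergedAfterA = total(a.read("hits"));
const mergedAfterB = total(b.read("hits"));
console.log(mergedBefore, mergedAfterA, mergedAfterB);
```

Because each replica persisted the merged state it held, both recover the same total without needing the other's store.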
Out of scope:
- Cross-replica replication of the durable store itself (each replica has its own). The CRDT semantics handle convergence across replicas via gossip — durability is per-replica.