Persistent Remember-Entities for Cluster Sharding #49

@pathosDev

Today `ShardCoordinator` keeps the remembered-entities list in memory: `entitiesPerShard` is rebuilt from `EntityStarted` / `EntityStopped` notifications during the cluster's lifetime. That works while the cluster runs, but a full cluster restart (deploy, outage) loses the entire entity registry — `rememberEntities: true` becomes a no-op until messages re-arrive.

Strategy: persist `entitiesPerShard` to the existing journal under a well-known persistenceId (`sharding-coordinator-{typeName}`). In the coordinator's `preStart`, replay the journal to rebuild the registry before processing the first message. This builds on issue #39 (Persistent ShardCoordinator state) but is more narrowly scoped: only the entity registry, not the full allocation map.
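The replay step could look roughly like the sketch below. It is not the actual `ShardCoordinator` code: the `EntityDelta` shape and the helper names (`coordinatorPersistenceId`, `applyDelta`, `replay`) are assumptions made for illustration, and the journal read is stood in for by a plain array of deltas.

```typescript
// Hypothetical delta events; the issue only names EntityStarted / EntityStopped,
// the field layout here is an assumption.
type EntityDelta =
  | { kind: "EntityStarted"; shardId: string; entityId: string }
  | { kind: "EntityStopped"; shardId: string; entityId: string };

type EntitiesPerShard = Map<string, Set<string>>;

// PersistenceId format as given in the strategy paragraph.
function coordinatorPersistenceId(typeName: string): string {
  return `sharding-coordinator-${typeName}`;
}

// Applies one delta to the registry in place and returns the same registry.
function applyDelta(registry: EntitiesPerShard, delta: EntityDelta): EntitiesPerShard {
  const entities = registry.get(delta.shardId) ?? new Set<string>();
  if (delta.kind === "EntityStarted") {
    entities.add(delta.entityId);
  } else {
    entities.delete(delta.entityId);
  }
  // Drop shards that no longer hold any remembered entity.
  if (entities.size > 0) {
    registry.set(delta.shardId, entities);
  } else {
    registry.delete(delta.shardId);
  }
  return registry;
}

// Rebuilds the registry from the journaled deltas, in journal order.
function replay(deltas: readonly EntityDelta[]): EntitiesPerShard {
  const registry: EntitiesPerShard = new Map();
  for (const delta of deltas) {
    applyDelta(registry, delta);
  }
  return registry;
}
```

Because the same `applyDelta` would run both on live notifications and during replay, the registry after a restart should match the one before it, provided every delta was journaled before being applied.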

Components:

| File | Task |
| --- | --- |
| `src/cluster/sharding/ShardCoordinator.ts` | On `handleEntityStarted` / `handleEntityStopped`, persist a delta to the journal. |
| `src/cluster/sharding/RememberEntitiesStore.ts` (new) | Tiny abstraction so users can plug in a custom backend (default: same Journal as `PersistentActor`). |
| `tests/multi-node/sharding-remember-entities.test.ts` (new) | Multi-node test: spawn entities, full cluster restart, entities recreated on the new coordinator without user messages. |
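One possible shape for the `RememberEntitiesStore` abstraction is sketched below. The method names (`addEntity`, `removeEntity`, `loadAll`) and the `InMemoryRememberEntitiesStore` class are hypothetical and not part of the existing codebase. The in-memory variant only illustrates the contract, for example in unit tests; it would not survive a restart, so the default backend would still need to write to the Journal.

```typescript
type EntitiesPerShard = Map<string, Set<string>>;

// Hypothetical pluggable backend contract; names are assumptions.
interface RememberEntitiesStore {
  addEntity(shardId: string, entityId: string): Promise<void>;
  removeEntity(shardId: string, entityId: string): Promise<void>;
  loadAll(): Promise<EntitiesPerShard>;
}

// Non-durable reference implementation, useful only to exercise the contract.
class InMemoryRememberEntitiesStore implements RememberEntitiesStore {
  private readonly entities: EntitiesPerShard = new Map();

  async addEntity(shardId: string, entityId: string): Promise<void> {
    const ids = this.entities.get(shardId) ?? new Set<string>();
    ids.add(entityId);
    this.entities.set(shardId, ids);
  }

  async removeEntity(shardId: string, entityId: string): Promise<void> {
    const ids = this.entities.get(shardId);
    if (!ids) return;
    ids.delete(entityId);
    // Drop the shard entry once its last entity is gone.
    if (ids.size === 0) this.entities.delete(shardId);
  }

  async loadAll(): Promise<EntitiesPerShard> {
    // Return a copy so callers cannot mutate the store's internal state.
    const copy: EntitiesPerShard = new Map();
    this.entities.forEach((ids, shardId) => copy.set(shardId, new Set(ids)));
    return copy;
  }
}
```

Keeping the interface asynchronous leaves room for backends that do I/O, while the coordinator would only depend on these three operations.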

Estimate: 3-4 days.

Verification:

  • Restart the entire cluster: the new coordinator loads the persisted entity registry and re-issues `RememberedEntities` to the regions, which respawn entities.
  • Stress test: 10k entities across 3 nodes, restart cluster, every entity is reachable in < 10 s.

Out of scope:
