@nshkrdotcom
Contributor

What

  • Recover monitors and stale PIDs after ActorOwner restart.
  • Clean up ActorTable entries and terminate actors when shard ownership is lost.
  • Isolate ActorTable keys by capability and module to prevent collisions.
  • Serialize sync_shards/0 and gate capability propagation to avoid transient races.
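
As an illustration of the key-isolation point, here is a minimal sketch of composite keys in an ETS table. It assumes ActorTable is ETS-backed, which this description does not confirm; the module `KeySketch`, the table name, and the function names are hypothetical and are not taken from the project.

```elixir
defmodule KeySketch do
  # Hypothetical stand-in for ActorTable; the real module and its storage may differ.
  @table :key_sketch_actor_table

  # Creates a public named table. Call once per VM; a second call raises.
  def init do
    :ets.new(@table, [:named_table, :set, :public])
  end

  # Stores the pid under a composite key, so the same id can exist
  # under different capabilities or modules without overwriting.
  def put(capability, module, id, pid) do
    :ets.insert(@table, {{capability, module, id}, pid})
  end

  # Looks up a pid by the full composite key.
  def get(capability, module, id) do
    case :ets.lookup(@table, {capability, module, id}) do
      [{_key, pid}] -> {:ok, pid}
      [] -> :not_found
    end
  end
end
```

With keys shaped this way, registering id "42" under `:game` and again under `:chat` yields two independent entries, whereas a table keyed by id alone would let the second write replace the first.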

Why

These fixes close lifecycle gaps that can leave stale registry entries and orphaned actors behind or cause cross-capability collisions. They also make shard sync deterministic under concurrent updates.
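
One common way to serialize an operation such as a shard sync is to route every call through a single GenServer, whose mailbox processes requests one at a time. The sketch below shows that general pattern only; it is not necessarily how this PR implements it. `SyncSketch`, its `sync_shards/1` (arity 1, unlike the project's `sync_shards/0`), and the `runs` counter are hypothetical.

```elixir
defmodule SyncSketch do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Callers block until their own sync has run; the server handles
  # one call at a time, so two syncs never interleave.
  def sync_shards(server), do: GenServer.call(server, :sync_shards)

  @impl true
  def init(:ok), do: {:ok, %{runs: 0}}

  @impl true
  def handle_call(:sync_shards, _from, state) do
    # Placeholder for the real sync work (assumed, not from the PR).
    runs = state.runs + 1
    {:reply, {:ok, runs}, %{state | runs: runs}}
  end
end
```

If several processes call `SyncSketch.sync_shards/1` concurrently, each receives a distinct run number, which shows the calls were handled strictly in sequence.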

Fixes #2

Tests

  • mix test test/mesh/shard_lifecycle_test.exs --trace
  • mix test test/mesh/shard_rebalancing_test.exs --trace

- Recover monitors and stale PIDs after ActorOwner restart.
- Terminate actors and purge ActorTable on shard ownership loss.
- Key ActorTable entries by {capability, module, id} to prevent collisions.
- Serialize sync_shards and gate capability propagation to avoid races.
- Update test infra for distributed rebalancing.

Fixes eigr#2
@sleipnir
Member

Hi @nshkrdotcom, this is beautiful, thank you for contributing!

@sleipnir sleipnir merged commit 3fb589b into eigr:main Jan 10, 2026

Development

Successfully merging this pull request may close these issues.

Shard lifecycle gaps cause stale ActorTable entries, orphaned actors, and capability collisions