You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We already expose a basic set of management endpoints (`/cluster/members`, `/cluster/leader`, `/health`, `/ready`, `/cluster/leave` — see `src/management/Management.ts`). A few more would round it out for production K8s deployments where operators want fine-grained control + observability without dropping into the codebase:
To add:
`GET /cluster/shards/{typeName}` — current shard-to-node allocations for a sharded type. Backed by `ShardCoordinator` state on the leader (proxied if request lands on a follower).
`GET /cluster/shards/{typeName}/{shardId}` — shard's current owner + pending handoff state if any.
`POST /cluster/leave-and-shutdown` — graceful `cluster.leave()` + `system.terminate()` in one call. Used by `preStop` hooks on K8s pods.
`POST /cluster/down/{address}` — operator-initiated downing for a stuck-unreachable node (already exists in some Akka versions; we'd need to expose it through the existing downing path).
`GET /metrics` — Prometheus exposition. Builds on Metrics export (Prometheus / OpenMetrics) #11 (Prometheus metrics) — mounted on the same HTTP backend used for the other endpoints.
Components:
File
Task
`src/management/Management.ts`
Add the four new routes; gate `/cluster/down` + `/cluster/leave-and-shutdown` behind opt-in flags so they don't surprise operators.
`src/management/ShardManagement.ts` (new)
Helpers that ask the local `ShardCoordinator` for shard state.
`tests/unit/management/Management.test.ts`
New cases for each endpoint.
Estimate: 2 days.
Verification:
`GET /cluster/shards/entity` returns the same allocations a direct coordinator query would.
`POST /cluster/leave-and-shutdown` cleanly terminates the system in < 5 s.
We already expose a basic set of management endpoints (`/cluster/members`, `/cluster/leader`, `/health`, `/ready`, `/cluster/leave` — see `src/management/Management.ts`). A few more would round it out for production K8s deployments where operators want fine-grained control + observability without dropping into the codebase:
To add:
Components:
Estimate: 2 days.
Verification:
Out of scope: