Cluster-management HTTP endpoints — extended (shards, leave-and-shutdown, metrics) #56

@pathosDev

We already expose a basic set of management endpoints (`/cluster/members`, `/cluster/leader`, `/health`, `/ready`, `/cluster/leave`; see `src/management/Management.ts`). A few more would round it out for production K8s deployments, where operators want fine-grained control and observability without dropping into the codebase.

To add:

  • `GET /cluster/shards/{typeName}` — current shard-to-node allocations for a sharded type. Backed by `ShardCoordinator` state on the leader (proxied if request lands on a follower).
  • `GET /cluster/shards/{typeName}/{shardId}` — shard's current owner + pending handoff state if any.
  • `POST /cluster/leave-and-shutdown` — graceful `cluster.leave()` + `system.terminate()` in one call. Used by `preStop` hooks on K8s pods.
  • `POST /cluster/down/{address}` — operator-initiated downing for a stuck-unreachable node (already exists in some Akka versions; we'd need to expose it through the existing downing path).
  • `GET /metrics` — Prometheus exposition. Builds on #11, Metrics export (Prometheus / OpenMetrics), and is mounted on the same HTTP backend used for the other endpoints.
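A rough sketch of how the five routes above could be matched, with the two destructive ones gated behind opt-in flags. Everything here is hypothetical (`matchRoute`, `ManagementFlags`, `enableDown`, `enableLeaveAndShutdown`, and the `RouteMatch` shape are illustration names, not the existing `Management.ts` API), and it assumes node addresses arrive URL-encoded in the path:

```typescript
// Hypothetical names throughout; not the existing Management.ts API.
type RouteMatch =
  | { kind: "shards"; typeName: string }
  | { kind: "shard"; typeName: string; shardId: string }
  | { kind: "leaveAndShutdown" }
  | { kind: "down"; address: string }
  | { kind: "metrics" }
  | { kind: "disabled"; route: string }
  | { kind: "notFound" };

// Assumed opt-in flags; both destructive routes are off unless enabled.
interface ManagementFlags {
  enableDown: boolean;
  enableLeaveAndShutdown: boolean;
}

function matchRoute(method: string, path: string, flags: ManagementFlags): RouteMatch {
  let segments: string[];
  try {
    // Assumes addresses containing ":" or "/" are percent-encoded by the caller.
    segments = path.split("/").filter((s) => s.length > 0).map((s) => decodeURIComponent(s));
  } catch {
    return { kind: "notFound" }; // malformed percent-encoding
  }
  const [head, second, third, fourth] = segments;

  if (method === "GET" && head === "metrics" && segments.length === 1) {
    return { kind: "metrics" };
  }
  if (head !== "cluster") return { kind: "notFound" };

  if (method === "GET" && second === "shards" && third !== undefined) {
    if (segments.length === 3) return { kind: "shards", typeName: third };
    if (segments.length === 4 && fourth !== undefined) {
      return { kind: "shard", typeName: third, shardId: fourth };
    }
  }
  if (method === "POST" && second === "leave-and-shutdown" && segments.length === 2) {
    return flags.enableLeaveAndShutdown
      ? { kind: "leaveAndShutdown" }
      : { kind: "disabled", route: "/cluster/leave-and-shutdown" };
  }
  if (method === "POST" && second === "down" && third !== undefined && segments.length === 3) {
    return flags.enableDown
      ? { kind: "down", address: third }
      : { kind: "disabled", route: "/cluster/down" };
  }
  return { kind: "notFound" };
}
```

Returning a distinct `disabled` result (rather than `notFound`) would let the handler answer with something like a 403 and a hint about the flag, so an operator can tell "route exists but is off" apart from a typo.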

Components:

| File | Task |
| --- | --- |
| `src/management/Management.ts` | Add the five new routes; gate `/cluster/down` and `/cluster/leave-and-shutdown` behind opt-in flags so they don't surprise operators. |
| `src/management/ShardManagement.ts` (new) | Helpers that ask the local `ShardCoordinator` for shard state. |
| `tests/unit/management/Management.test.ts` | New cases for each endpoint. |
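For the `ShardManagement.ts` helpers, something along these lines might be enough. This is only a sketch: `CoordinatorView`, `ShardAllocation`, `shardsByNode`, and `shardStatus` are made-up names, and the real `ShardCoordinator` state may be shaped differently.

```typescript
// Hypothetical shapes; the real ShardCoordinator state may look different.
interface ShardAllocation {
  shardId: string;
  owner: string;      // address of the node currently hosting the shard
  handoffTo?: string; // present only while a handoff is pending
}

interface CoordinatorView {
  // Returns undefined when typeName is not a known sharded type.
  allocations(typeName: string): readonly ShardAllocation[] | undefined;
}

// Possible body for GET /cluster/shards/{typeName}: shard ids grouped by owner.
function shardsByNode(
  view: CoordinatorView,
  typeName: string,
): Record<string, string[]> | undefined {
  const allocations = view.allocations(typeName);
  if (allocations === undefined) return undefined;
  const byNode: Record<string, string[]> = {};
  for (const { shardId, owner } of allocations) {
    const list = byNode[owner] ?? [];
    list.push(shardId);
    byNode[owner] = list;
  }
  return byNode;
}

// Possible body for GET /cluster/shards/{typeName}/{shardId}.
function shardStatus(
  view: CoordinatorView,
  typeName: string,
  shardId: string,
): { owner: string; pendingHandoffTo: string | null } | undefined {
  const found = view.allocations(typeName)?.find((a) => a.shardId === shardId);
  if (found === undefined) return undefined;
  return { owner: found.owner, pendingHandoffTo: found.handoffTo ?? null };
}
```

Keeping the helpers behind a small read-only interface means the unit tests can pass an in-memory view instead of standing up a coordinator; an `undefined` result would map naturally to a 404.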

Estimate: 2 days.

Verification:

  • `GET /cluster/shards/entity` returns the same allocations a direct coordinator query would.
  • `POST /cluster/leave-and-shutdown` cleanly terminates the system in < 5 s.
  • Metrics endpoint emits valid Prometheus exposition format (covered by the tests in #11, Metrics export (Prometheus / OpenMetrics), once both land).
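To make the "< 5 s" check for `POST /cluster/leave-and-shutdown` testable, the handler could run leave and terminate under an explicit deadline. A minimal sketch, assuming `cluster.leave()` and `system.terminate()` both return promises; the `Leavable` / `Terminable` interfaces, `ShutdownResult`, and the `deadlineMs` parameter are hypothetical:

```typescript
// Hypothetical minimal interfaces; the real cluster/system types may differ.
interface Leavable { leave(): Promise<void>; }
interface Terminable { terminate(): Promise<void>; }

type ShutdownResult =
  | { ok: true; elapsedMs: number }
  | { ok: false; elapsedMs: number; reason: string };

async function leaveAndShutdown(
  cluster: Leavable,
  system: Terminable,
  deadlineMs: number = 5000, // assumed default, taken from the "< 5 s" target
): Promise<ShutdownResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), deadlineMs);
  });
  const work = (async (): Promise<"done"> => {
    await cluster.leave();    // leave first so peers observe a graceful exit
    await system.terminate(); // then stop the local system
    return "done";
  })();
  try {
    // Whichever settles first wins; a late failure in `work` is ignored.
    const outcome = await Promise.race([work, deadline]);
    const elapsedMs = Date.now() - started;
    return outcome === "done"
      ? { ok: true, elapsedMs }
      : { ok: false, elapsedMs, reason: `deadline of ${deadlineMs} ms exceeded` };
  } catch (err) {
    return { ok: false, elapsedMs: Date.now() - started, reason: String(err) };
  } finally {
    clearTimeout(timer); // avoid keeping the process alive after we are done
  }
}
```

A unit test can then pass fake `cluster` / `system` objects and assert on `ok` and `elapsedMs`. Note that the sketch only reports a missed deadline; it does not cancel a hung `leave()`, so the `preStop` hook's own grace period remains the hard stop.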

Out of scope:

  • A web UI for this data — operators consume via curl / Prom / Grafana. UI is a separate, larger effort.

Labels: enhancement (New feature or request), priority: low (Nice-to-have / niche / demand-driven)
