api, node: add runtime aggregator role toggle #636
tcoratger merged 1 commit into leanEthereum:main
tcoratger
left a comment
Can we have test vectors as well for the client teams to consume?
Like this https://github.com/leanEthereum/leanSpec/tree/main/tests/consensus/devnet/api
@tcoratger good call. Pushed 364c8cc, which adds […]. To support a stateful endpoint, I extended the […]
Six new fixtures land under […]
Regenerate locally with […]. I deliberately kept error paths (400 on missing/non-bool […]
Force-pushed from 7b21e2a to 97f327d.
Adds a new admin endpoint /lean/v0/admin/aggregator that allows
operators to activate or deactivate a node's aggregator role at
runtime, enabling rotation of aggregation duties across nodes
without restarting when an active aggregator becomes unhealthy.
- GET /lean/v0/admin/aggregator returns the current role.
- POST /lean/v0/admin/aggregator with {"enabled": bool} toggles
the role and returns the previous value.
A new AggregatorController serializes toggles under an asyncio
lock and keeps the sync and network services consistent. The Node
wires the controller into ApiServer when the API is enabled.
When no controller is wired, the endpoints return 503.
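
The controller described above can be sketched roughly as follows. This is a
minimal illustration, not the code from this commit: StubService is a
hypothetical stand-in for the real sync and network services, and the
is_enabled / set_enabled method names are assumptions.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubService:
    """Hypothetical stand-in for SyncService / NetworkService."""

    is_aggregator: bool = False


@dataclass
class AggregatorController:
    """Sketch: serialize toggles and keep both services' flags aligned."""

    sync_service: StubService
    network_service: StubService
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock, init=False, repr=False)

    def is_enabled(self) -> bool:
        # Both flags are always flipped together, so reading one is enough.
        return self.sync_service.is_aggregator

    async def set_enabled(self, enabled: bool) -> bool:
        """Flip both flags under the lock and return the previous value."""
        async with self._lock:
            previous = self.sync_service.is_aggregator
            self.sync_service.is_aggregator = enabled
            self.network_service.is_aggregator = enabled
            return previous


async def demo():
    # Build the controller inside the running event loop.
    controller = AggregatorController(StubService(), StubService())
    # Concurrent toggles are applied one at a time under the lock.
    previous_values = await asyncio.gather(
        controller.set_enabled(True),
        controller.set_enabled(False),
        controller.set_enabled(True),
    )
    return previous_values, controller
```

Running asyncio.run(demo()) returns the previous value seen by each toggle
together with the controller, whose two service flags end up equal.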
Includes consensus test vectors for the aggregator admin endpoint
so client teams can validate their implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 97f327d to 48a4132.
## Summary

Ports [leanSpec PR #636](leanEthereum/leanSpec#636). Adds two admin endpoints that let operators rotate aggregation duties across nodes at runtime without a restart.

- `GET /lean/v0/admin/aggregator` returns `{"is_aggregator": bool}`
- `POST /lean/v0/admin/aggregator` with body `{"enabled": bool}` returns `{"is_aggregator": <new>, "previous": <old>}`

Returns `503` when the controller is not wired, `400` on malformed body or non-boolean `enabled` (JSON ints `0`/`1` are explicitly rejected, matching the spec).

## Design

The CLI flag `--is-aggregator` now seeds a shared `AggregatorController` (a thin `Arc<AtomicBool>` in `ethlambda-types`) instead of a plain bool. The blockchain actor reads the flag fresh on every tick and every gossip attestation, so runtime toggles take effect from the next tick with no protocol-message churn.

`axum::Extension` is used to thread the controller into admin handlers so existing store-backed routes don't need to change their signatures.

The `lean_is_aggregator` metric is updated from the actor's tick read (with a cached `last_seen_aggregator`) rather than from the HTTP handler, so the gauge reflects what the actor actually acted on.

## Scope limitations (intentional, matching spec)

| Scope | Status |
|---|---|
| Persistence across restart | ❌ in-process only |
| Gossip subnet re-subscription | ❌ frozen at startup |
| ENR advertisement | ❌ no ENR lifecycle |
| Authentication | ❌ consistent with rest of HTTP API |

**Operational model**: standby aggregators must boot with `--is-aggregator=true` so gossip subscriptions are in place, then use the admin endpoint to rotate duties (hot-standby model). A node booted with `--is-aggregator=false` and toggled on later will have no extra subnets to aggregate. This is documented in the CLI flag's docstring and the `CLAUDE.md` gotcha section.
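
The rule that `enabled` must be a real JSON boolean, with `0`/`1` rejected, is easy to get wrong in Python because `bool` is a subclass of `int`. The sketch below shows one way to express that check. `parse_enabled` is a hypothetical helper for illustration, not a function from either codebase, and it returns a bare status code instead of a full HTTP response.

```python
import json
from typing import Optional, Tuple


def parse_enabled(raw_body: bytes) -> Tuple[int, Optional[bool]]:
    """Return (status, enabled): 200 with the flag, or 400 with None."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Covers an empty body, malformed JSON, and undecodable bytes.
        return 400, None
    if not isinstance(payload, dict) or "enabled" not in payload:
        return 400, None
    enabled = payload["enabled"]
    # JSON true/false decode to bool, while 0/1 decode to int.
    # Checking for bool explicitly rejects the integers.
    if not isinstance(enabled, bool):
        return 400, None
    return 200, enabled
```

Under these assumptions, `parse_enabled(b'{"enabled": true}')` yields `(200, True)`, while `parse_enabled(b'{"enabled": 1}')` yields `(400, None)`.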
## Summary

Adds a new admin endpoint `/lean/v0/admin/aggregator` that allows operators to activate or deactivate a node's aggregator role at runtime. This enables rotating aggregation duties across nodes without restarting when an active aggregator becomes unhealthy or unrecoverable.

## Endpoints
- `GET /lean/v0/admin/aggregator` returns `{"is_aggregator": bool}`, reflecting the current role.
- `POST /lean/v0/admin/aggregator` with body `{"enabled": bool}` toggles the role and returns `{"is_aggregator": <new>, "previous": <old>}`.

Both endpoints return `503 Service Unavailable` when no controller is wired (e.g. in tests that stand up a bare `ApiServer`). The POST handler returns `400 Bad Request` for a missing body, a missing `enabled` field, or non-boolean values.

## Implementation
- `AggregatorController` (in `src/lean_spec/subspecs/api/aggregator_controller.py`) holds references to `SyncService` and `NetworkService` and serializes toggles under an `asyncio.Lock`. Both flags are flipped together so observers see a consistent view.
- `routes.py` gains an `ADMIN_ROUTES` list for non-GET verbs.
- `server.py` registers both GET and admin routes via `web.route(method, path, handler)`.
- `ApiServer` grows an optional `aggregator_controller` field, mirroring the existing `store_getter` pattern.
- `Node.from_genesis` builds the controller from `sync_service` and `network_service` and passes it to the API server when the API is enabled.
- `NodeConfig.is_aggregator` updated to remove the aspirational claim that the ENR advertises aggregator capability (no code path writes that key today) and to document the new runtime-toggle seeding semantics.

## Why this works at runtime
Both `SyncService.on_gossip_attestation` / `on_gossip_aggregated_attestation` and `Store.tick_interval` read the `is_aggregator` flag fresh on every event or tick, so toggling takes effect from the next tick: no restart, no gossip re-subscribe, no handshake churn.

## Scope and known limitations
This PR deliberately keeps the change small. It is explicitly not:
- Persistent across restarts: the `--is-aggregator` flag seeds the initial value on each start. Runtime toggles are in-process only.
- A gossip subnet re-subscription mechanism: subscriptions are decided at startup in `run_node()` based on the CLI flag. Toggling on at runtime only imports gossip for subnets the node is already subscribed to. Standby aggregators should boot with `--is-aggregator` (so subscriptions are in place) and be left OFF via this API until needed (hot-standby model).
- An ENR advertisement mechanism: `DiscoveryService` exists as a library but is not wired into the runtime node path. `ENR.is_aggregator` is currently read-only, and no writer exists. Even if re-publication were added, no peer-side logic in `src/lean_spec` currently reads a remote peer's `is_aggregator` ENR bit to make decisions (peer selection, gossip routing, req/resp policy), so advertisement is decorative until a consumer is added. Tracked as a follow-up.

## Test plan
- `uv run pytest --no-cov`: 3240 passed
- `uvx tox -e all-checks`: ruff + ty + codespell + mdformat clean
- Unit tests in `tests/lean_spec/subspecs/api/test_aggregator_controller.py` cover read, activate, deactivate, idempotence, and concurrent toggles.
- Tests in `tests/lean_spec/subspecs/api/test_server.py` (`TestAggregatorAdminEndpoint`) exercise 503, GET status, POST activate/deactivate, bad-request paths, and concurrent POSTs.
- Tests in `tests/api/endpoints/test_aggregator.py` verify baseline 503 behavior and 405 for unsupported verbs against the default conformance server.

## Open questions for reviewers
- The `/lean/v0/admin/` path prefix (alternatives: `/lean/v0/validator/aggregator`, `/lean/v0/node/aggregator`).
- Whether `aggregate_subnet_ids` should also be mutable at runtime (requires gossip subscribe/unsubscribe plumbing, scoped out here).
- Follow-up ENR work (an `ENR.mutate`/re-sign helper, a handshake refresh hook, and a peer-side consumer of the `is_aggregator` bit) so advertisement would actually influence peer behavior.
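
The hot-standby model from the scope notes above can be illustrated with a toy model. This is only a sketch under simplifying assumptions. `ToyNode`, its fields, and the `tick` signature are hypothetical and do not correspond to the real `Node`, `SyncService`, or `Store` APIs. It also assumes, for simplicity, that a node booted without the flag subscribes to no aggregation subnets at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyNode:
    """Hypothetical model: subscriptions fixed at boot, role flag mutable."""

    booted_as_aggregator: bool
    aggregate_subnets: Set[int]
    is_aggregator: bool = field(init=False)
    subscribed: Set[int] = field(init=False)

    def __post_init__(self) -> None:
        # Subscriptions are decided once, from the boot-time flag,
        # and are never revisited by a runtime toggle.
        self.is_aggregator = self.booted_as_aggregator
        self.subscribed = (
            set(self.aggregate_subnets) if self.booted_as_aggregator else set()
        )

    def tick(self, gossip: Dict[int, List[str]]) -> List[str]:
        """Read the flag fresh and collect only from subscribed subnets."""
        if not self.is_aggregator:
            return []
        return [
            attestation
            for subnet in sorted(self.subscribed)
            for attestation in gossip.get(subnet, [])
        ]


gossip = {0: ["att-a"], 1: ["att-b"]}

# Hot standby: booted with the flag, then left OFF until needed.
standby = ToyNode(booted_as_aggregator=True, aggregate_subnets={0, 1})
standby.is_aggregator = False
idle = standby.tick(gossip)
standby.is_aggregator = True  # runtime toggle, picked up on the next tick
active = standby.tick(gossip)

# Cold node: booted without the flag, toggled on later.
cold = ToyNode(booted_as_aggregator=False, aggregate_subnets={0, 1})
cold.is_aggregator = True
cold_result = cold.tick(gossip)
```

In this model the standby node aggregates from both subnets on the first tick after the toggle, while the cold node has nothing to aggregate because it never subscribed.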