docs: translator series final state (PR 16/16) #724
Open
QuentinBisson wants to merge 8 commits into
Documents the muster-in-front-of-agentgateway intermediate topology
and ships the BDD scenarios that prove the regression + feature
surface the pivot preserves.
- docs/operations/installation.md: filesystem-mode subprocess auto-spawn
(binary resolver precedence, combined agentgateway.yaml location,
listener defaults), cluster-mode CRD emission (AgentgatewayBackend +
HTTPRoute + AgentgatewayPolicy per MCPServer with OwnerReference
cascade), spec.suspended pause/resume across both modes, and the
deprecation note on core_service_list / core_service_status.
- helm/muster/templates/rbac.yaml: grant the muster ServiceAccount full
CRUD on agentgateway.dev/v1alpha1 AgentgatewayBackend +
AgentgatewayPolicy and gateway.networking.k8s.io/v1 HTTPRoute so the
cluster-mode reconciler can emit the agentgateway config stack.
- internal/testing/scenarios:
- mcpserver-required-audiences-config.yaml (regression):
spec.auth.requiredAudiences round-trips through create + get +
update, including the multi-value case.
- mcpserver-token-exchange-config.yaml (regression): RFC 8693
standalone tokenExchange (non-Teleport) CRUD + the
mutually-exclusive-with-authorizationServer rejection.
- mcpserver-authorization-server-override.yaml (extended): the #599
override survives core_mcpserver_update.
- mcpserver-reconnect.yaml (feature): core_mcpserver_reconnect
force-reconnect with tool-call survival before and after.
- agentgateway-subprocess-stdio-target.yaml (feature): filesystem
stdio MCPServer spawned by agentgateway via mcp.targets[].stdio
and proxied through the subprocess data plane.
…-core-service' into docs/translator-agw-final-state
28b94e8 to
41d62f5
docs/operations/installation.md described the BDD port-override and
SIGTERM drain in terms of internal Go symbols (aggregator.AgentgatewayPort,
yamlapply.WithListenerPort). aggregator.AgentgatewayPort isn't even a
real config field — the BDD harness sets an internal AggregatorConfig
struct field, not anything user-visible. Replace the prose with a
user-facing description: muster picks an unused loopback port at
startup so parallel instances coexist; the SIGTERM drain waits up to
ten seconds.
The deprecation paragraph below it still framed core_service_start /
core_service_stop / core_service_restart as removed-but-noted; lead
with what users should do (spec.suspended via core_mcpserver_update,
plus core_mcpserver_reconnect for the force-reconnect verb).
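The verbs users are pointed at can be pictured with a hypothetical in-memory registry (a sketch, not muster's aggregator): flipping `spec.suspended` deregisters or re-registers a server's tools, and a force-reconnect deregisters and then re-registers them so tool calls work again afterwards. Keeping a suspended server deregistered across a reconnect is an assumption of this sketch.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for the aggregator's view of MCPServers.
type registry struct {
	catalog   map[string][]string // tools each server exposes when connected
	tools     map[string][]string // tools currently registered
	suspended map[string]bool     // spec.suspended per server
}

func newRegistry() *registry {
	return &registry{
		catalog:   map[string][]string{},
		tools:     map[string][]string{},
		suspended: map[string]bool{},
	}
}

func (r *registry) add(name string, tools []string) {
	r.catalog[name] = tools
	r.register(name)
}

func (r *registry) register(name string) {
	r.tools[name] = append([]string(nil), r.catalog[name]...)
}

func (r *registry) deregister(name string) { delete(r.tools, name) }

func (r *registry) lookup(name string) error {
	if _, ok := r.catalog[name]; !ok {
		return fmt.Errorf("MCPServer %q not found", name)
	}
	return nil
}

// setSuspended models flipping spec.suspended through core_mcpserver_update.
func (r *registry) setSuspended(name string, v bool) error {
	if err := r.lookup(name); err != nil {
		return err
	}
	r.suspended[name] = v
	if v {
		r.deregister(name)
	} else {
		r.register(name)
	}
	return nil
}

// reconnect models core_mcpserver_reconnect: deregister, then re-register.
// A suspended server stays deregistered (an assumption of this sketch).
func (r *registry) reconnect(name string) error {
	if err := r.lookup(name); err != nil {
		return err
	}
	r.deregister(name)
	if !r.suspended[name] {
		r.register(name)
	}
	return nil
}

func main() {
	r := newRegistry()
	r.add("github", []string{"search_issues"})
	_ = r.setSuspended("github", true)
	fmt.Println(len(r.tools["github"])) // 0
	_ = r.setSuspended("github", false)
	_ = r.reconnect("github")
	fmt.Println(r.tools["github"])             // [search_issues]
	fmt.Println(r.reconnect("no-such-server")) // MCPServer "no-such-server" not found
}
```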
mcpserver-reconnect.yaml:
- Use core_mcpserver_get (canonical) instead of core_service_status
(deprecated alias). State assertions move to the capitalised CRD
values ("Connected"/"Disconnected").
- reconnect-non-existent-fails asserted on "Failed to reconnect", an
English error-prefix string carried by HandleErrorWithPrefix. A
refactor of that helper would break the scenario without breaking
any unit test. Assert on the input name ("no-such-server") instead,
which the underlying NotFoundError formats into its message.
mcpserver-token-exchange-config.yaml and
mcpserver-required-audiences-config.yaml: tighten scope claims in the
description and header comment so it's clear these scenarios lock the
schema field round-trip, with on-wire RFC 8693 / cross-cluster SSO
covered by oauth-sso-token-exchange-basic.yaml against the mock Dex +
mock backend topology. requiredAudiences' aggregation helper is
unit-tested in internal/api/handlers_mcpserver_test.go::TestCollectRequiredAudiences.
…s/translator-agw-final-state
Step 16/16 of the agentgateway adapter series. Plan: `/home/quentin/.claude/plans/declarative-forging-lynx.md`.

Summary
Ships the documented end state of the muster-in-front-of-agentgateway intermediate topology and the BDD scenarios that prove the regression + feature surface the pivot preserves. Two artefacts in one PR: docs and scenarios are both forms of executable specification, and landing them together keeps the docs from drifting from runtime reality.
Stacked on #723 (PR 15/16). PRs 11–15 must merge first; once they do, this PR's diff base shifts to `main`.

Notes on scope drift vs the plan
- RBAC for `agentgateway.dev/v1alpha1` and `gateway.networking.k8s.io/v1` ships here. It arguably belongs with PR 3 (#692, cluster-mode `k8s.Applier`) or PR 11 (#720, cluster-mode wiring): without these grants the muster ServiceAccount gets a 403 on every emit, so cluster mode was effectively undeployable across the prior stack. Bundling it into this docs+test PR was the pragmatic choice given how late the gap was found; future series should land RBAC with the first PR that emits the resources.
- Scenario names (`agentgateway-subprocess-data-plane`, `mcpserver-mixed-transports`, `mcpserver-suspended-resumed`, `session-multi-user-tool-isolation`, etc.) already describe what they exercise under muster-in-front. No renames were required.

Documentation
- `docs/operations/installation.md`: new "Data plane: muster + agentgateway" section with "Filesystem mode", "Cluster mode — one MCPServer CRD suffices", and "Pause / resume an MCPServer" subsections. Covers binary resolver precedence (`MUSTER_AGW_BINARY` env var → `~/.config/muster/bin` cache → pinned GitHub release verified by SHA-256), the combined `<configPath>/agentgateway/agentgateway.yaml`, listener defaults (`muster` on `8080`), `MUSTER_AGW_UPSTREAM_URL` and the Helm value `muster.agentgateway.upstreamURL`, the per-MCPServer emitted resources table, the `NotSupportedInCluster` stdio condition, and three equivalent ways to flip `spec.suspended` (`kubectl patch`, YAML edit, `core_mcpserver_update`). Closes with the deprecation note on `core_service_list` / `core_service_status` and points readers at `core_mcpserver_list` / `_get` / `_reconnect`.
- `docs/runbooks/deploy-muster-with-agentgateway.md`: not shipped, because `docs/runbooks/` does not exist on this branch (the prompt scoped the runbook as conditional).

Helm RBAC
- `helm/muster/templates/rbac.yaml`: grants the muster ServiceAccount full CRUD on `agentgateway.dev/v1alpha1` `AgentgatewayBackend` + `AgentgatewayPolicy` and `gateway.networking.k8s.io/v1` `HTTPRoute`. Without this, the cluster-mode `k8s.Applier` would get a 403 on every emit. Also drops the stale `teleport identity` mention from the secrets-rule comment (Teleport was removed in #687).

BDD scenarios (`internal/testing/scenarios/`)

Regression
- `mcpserver-required-audiences-config.yaml` (new): `spec.auth.requiredAudiences` round-trips through create + get + update; multi-value preserved.
- `mcpserver-token-exchange-config.yaml` (new): RFC 8693 standalone `tokenExchange` CRUD + the mutually-exclusive-with-`authorizationServer` rejection.
- `mcpserver-authorization-server-override.yaml` (extended): the #599 override survives `core_mcpserver_update`.
- `session-multi-user-tool-isolation.yaml` (existing)
- `mcpserver-suspended-resumed.yaml` (from #723): `spec.suspended` pause / resume cycle.

Feature
- `agentgateway-subprocess-data-plane.yaml` (existing)
- `agentgateway-subprocess-stdio-target.yaml` (new): filesystem stdio MCPServer spawned by agentgateway via `mcp.targets[].stdio`.
- `mcpserver-auth-required-through-agentgateway.yaml` (existing): `WWW-Authenticate` + `auth_required` state.
- `mcpserver-reconnect.yaml` (new): `core_mcpserver_reconnect` deregister + re-register round trip; tool calls survive.
- `mcpserver-mixed-transports.yaml` (existing): `spec.suspended` toggle.

End-to-end
- `mcpserver-crud.yaml` + `mcpserver-delete.yaml` + `mcpserver-lifecycle.yaml` (existing)
- `internal/testing/integration/agw_subprocess_integration_test.go` (existing)
- `internal/agentgateway/subprocess/manager_test.go`: `TestManager_Restart_OnCrash_WithBackoff` (existing)
- `internal/agentgateway/binary/resolver_test.go` (existing)
- `internal/reconciler/agentgateway/k8s/applier_test.go` (existing)

Out of scope
- `spec.family` regression scenarios: the field lands on `main` via #670 + #705, but this stack forked from `main` before #670; the family scenarios on `main` reappear when this stack rebases.
- `NotSupportedInCluster` condition and per-MCPServer CRD emission: the BDD harness runs filesystem mode only (no kind / glean bootstrap). Coverage at the unit-test layer is already comprehensive via `internal/reconciler/agentgateway/k8s/`.
- `/proc` primitive: the integration test in `internal/testing/integration/` covers it.