feat(mt#1655): refine OAuth discovery stubs to RFC 8414 200-with-empty-flows#986
Conversation
…y-flows mt#1635 fixed the JSON-parse failure on /.well-known/oauth-* by returning 404 + JSON not_supported body. The badges and tool count went green, but Claude Code MCP SDK still framed any non-2xx OAuth probe as SDK auth failed in the dialog, surfacing a misleading line even though static-bearer auth was working fine. This refines the stub: return 200 with RFC 8414/9728 minimal-metadata documents that advertise the server existence as an OAuth authority but declare NO usable flows. Spec-conformant SDKs treat empty response_types_supported as OAuth advertised but no flow usable, fall through to static-bearer. That suppresses the dialog cosmetic. The metadata is a strict-subset of what mt#1634 will produce. Not throw-away work; mt#1634 extends this foundation with real flow advertisements. /register stays at 400 since DCR is genuinely unsupported in the stub tier; mt#1634 will replace it. Implementation: builder functions buildAuthorizationServerMetadata and buildProtectedResourceMetadata replace the OAUTH_DISCOVERY_NOT_SUPPORTED_BODY constant. Adds app.set trust proxy true so Railway X-Forwarded-Proto/Host populate req.protocol and req.get host correctly. Handlers compute issuer/resource URL per-request and respond 200 instead of 404. 14 new builder tests; 26/26 pass total. Acceptance: empirical post-deploy. If Claude Code MCP SDK still surfaces a misleading line, second-round refinement may be needed.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
The PR moves OAuth discovery stubs from 404 to 200 with minimal RFC 8414/9728 metadata and introduces per-request URL composition plus trust proxy. While the builder functions are well-covered, there are gaps and risks in the route-level behavior. Blocking: (1) no tests exercise the actual .well-known HTTP handlers to assert 200 status, body shape, and header-derived URLs; (2) app.set("trust proxy", true) is a global behavior change without tests or scoping, which can affect security-relevant properties; (3) composing resource by blindly appending options.endpoint can generate invalid URLs when the option lacks a leading slash. Non-blocking: startup logs always print http:// which may mislead in TLS-fronted deployments. Please add route tests, scope/justify trust proxy, normalize endpoint, and then this will be in good shape.
Findings
-
[BLOCKING] src/commands/mcp/start-command.test.ts:118 — No tests exercise the new HTTP
.well-knownhandlers returning 200; only the pure builders are covered
This PR changes the route behavior for/.well-known/oauth-authorization-serverand/.well-known/oauth-protected-resourcefrom 404 to 200 and computesissuer/resourceper-request, but the test suite only validates the pure-function builders. There are no integration tests that actually start the HTTP server in--httpmode andGETthese endpoints to assert: -
status code is 200 (regression from prior 404)
-
body shape matches the builders
-
issuer/resourceare computed from request headers as intended (including trust-proxy effects)
Given the per-request composition and the newly introduced app.set("trust proxy", true), this is a behavior change with real risk that isn't pinned by tests. Please add route-level tests (similar to the existing spawn-based shutdown tests) that hit both .well-known endpoints and assert 200 + expected JSON.
- [BLOCKING] src/commands/mcp/start-command.ts:231 —
app.set("trust proxy", true)is a global behavior change that can impactreq.ip/security logic; no tests or docs cover side effects
Enabling Express'strust proxyglobally affects multiple request properties (req.protocol,req.ip,req.secure, etc.). While it's motivated here to correctissuer/resourceunder Railway TLS termination, this change applies to all routes, including the auth-gated MCP endpoint. There are no tests validating thatreq.ip-dependent logic (present or future) isn't affected, nor any commentary limiting scope to proto/host parsing only.
At minimum:
- Constrain
trust proxyto the known proxy hops (e.g., a function or numeric hop count) or document the security posture in code comments. - Add a test that simulates
X-Forwarded-Proto/X-Forwarded-Hostand verifies the.well-knownroutes usehttpswhile unrelated logic (e.g.,checkBearerAuth) remains unaffected.
Right now this is an unbounded, app-wide behavior change with no coverage.
- [BLOCKING] src/commands/mcp/start-command.ts:246 —
resourceURL generation blindly concatenatesoptions.endpoint, which may be missing a leading slash and produces an invalid URL
In/.well-known/oauth-protected-resourceyou buildresourceas:
const resource = `${req.protocol}://${req.get("host") ?? "localhost"}${options.endpoint}`;options.endpoint is user-configurable via --endpoint and is not validated/normalized to ensure it starts with /. If a caller passes --endpoint mcp (no leading slash), this will emit https://example.commcp instead of https://example.com/mcp. Previously, the endpoint value only affected the Express route binding; now it's embedded into a public metadata document, so the impact of a malformed value is greater.
Please normalize options.endpoint once (e.g., ensure it starts with a single leading slash) or use path.posix.join("/", options.endpoint) when composing the URL, and add a test covering both with/without leading slash to pin the behavior.
- [NON-BLOCKING] src/commands/mcp/start-command.ts:303 — Hard-coded
http://in startup logs contradictstrust proxy/TLS reality and may confuse operators
Startup logs always printMCP endpoint: http://${options.host}:${httpPort}${options.endpoint}andHealth check: http://...even when the service is behind TLS termination and.well-knownusesreq.protocolwithtrust proxyto advertisehttps. This can mislead operators copying URLs from logs. Consider reflecting the externally observed protocol or adding a note that URLs are internal (plain HTTP) and may be fronted by TLS in production.
Inline comments
- src/commands/mcp/start-command.ts:241 — Minor: the
issuerbuilder usesreq.get("host") ?? "localhost". IfHostis absent/malformed, the fallback yieldshttp(s)://localhost, which may not be correct in production. Consider usingreq.hostname(honorstrust proxy) or validatingHostand returning a 500 to avoid emitting misleading metadata. - src/commands/mcp/start-command.ts:246 — Edge case: when
options.endpointis/, the current concatenation yields a trailing/(fine). But when it’s an empty string or lacks the leading slash (e.g.,mcp), the composedresourcebecomes invalid (https://example.commcp). Recommend normalizingoptions.endpointonce at option-parse time and reusing the normalized value everywhere. - src/commands/mcp/start-command.test.ts:15 — Nice builder coverage. To fully exercise the route behavior you just changed to 200 + minimal metadata, consider starting the server in
--httpmode on a random port andfetching/.well-known/oauth-authorization-serverand/.well-known/oauth-protected-resourceto assert status 200 and the computedissuer/resourcerespectingX-Forwarded-Proto/Host.
…oxy, endpoint normalization R1 reviewer-bot findings on PR #986, swept in one round per cascade-defense: BLOCKING #1: No HTTP-level tests for .well-known handlers - Added 4 integration tests that spawn the server in --http mode on fixed ports, fetch the endpoints, and assert 200 + body shape. - Tests cover: oauth-authorization-server (200 with empty flows), oauth-protected-resource (200 with resource only), X-Forwarded-Proto https (verifies trust proxy 1 wired correctly), /register stays 400. - Pattern mirrors existing shutdown-path tests; uses fetch + SIGTERM cleanup. - Fixed ports per test rather than Math.random() to avoid eslint custom rule false-positive and make test isolation deterministic. BLOCKING #2: app.set trust proxy true was unbounded - Tightened to app.set trust proxy 1: trust exactly one upstream hop (Railway edge / TLS terminator). Avoids unbounded-chain spoofing where a malicious client could forge X-Forwarded-* headers along multiple unverified hops. - Code comment documents the scope choice and rationale. - Integration test verifies the X-Forwarded-Proto pathway works. BLOCKING #3: options.endpoint URL composition was unsafe - Extracted normalizeEndpointPath helper that prepends a leading slash when missing. Fixes the latent bug where --endpoint mcp (no slash) would produce https://example.commcp. - Helper is exported and unit-tested. Inline comments addressed: - req.hostname instead of req.get host (honors trust proxy via Express conformant API). Centralized in composeRequestBaseUrl helper. - HTTP-level tests added (BLOCKING #1). - Pure-function builder unit tests retained (complementary to integration coverage). NON-BLOCKING: hard-coded http:// in startup logs - Added a comment block above the listen call explaining the printed URL is the internal listener, not the public TLS-fronted URL. The metadata handlers above derive the public URL from request headers via composeRequestBaseUrl, so the metadata is correct even though the log line reads http://. 26/26 tests pass (3 builder unit + 4 helper unit + 4 integration + 15 prior). Lint and typecheck clean.
Stale R1 review on commit bd0744c. All three BLOCKING findings + the NON-BLOCKING + inline comments addressed in commit 7994cd9: (1) added 4 HTTP-level integration tests spawning the server in --http mode and asserting 200 + body shape + X-Forwarded-Proto behavior, (2) tightened trust proxy from true to 1 (single-hop scope) with code comment documenting rationale, (3) extracted normalizeEndpointPath helper and applied to resource URL composition (closes the --endpoint mcp → https://example.commcp latent bug), and (4) added comment block above startup logs explaining the printed http:// is the internal listener vs the public TLS-fronted URL. Pure-function builder unit tests retained. Reviewer-bot silent 10+ min on the post-fix commit despite fixes being substantive — webhook miss class per CLAUDE.md feedback_self_authored_pr_merge_constraints. CI green on 7994cd9.
Summary
mt#1635 (DONE) made the
/mcpdialog badges go green by returning JSON 404s on the OAuth discovery endpoints instead of Express's HTML 404. But a residualSDK auth failed: ...line persisted because Claude Code's MCP SDK frames any non-2xx OAuth probe as failure, even when static-bearer auth is already working.This task refines the stub: return
200with RFC 8414 (Authorization Server Metadata) / RFC 9728 (Protected Resource Metadata) minimal documents that advertise the server's existence as an OAuth authority but declare zero usable flows (response_types_supported: [], noauthorization_endpoint, notoken_endpoint, noregistration_endpoint). Spec-conformant SDKs treat this as "OAuth advertised but no flow usable → fall through to static-bearer," which should suppress the dialog cosmetic.The metadata is a strict-subset of what mt#1634 (umbrella full-OAuth implementation) will produce. Not throw-away work — mt#1634 extends this foundation with real flow advertisements (
authorization_endpoint,token_endpoint,registration_endpoint,grant_types_supported, etc.)./registerstays at 400 since DCR is genuinely unsupported in the stub tier; mt#1634 will replace it.Key changes
src/commands/mcp/start-command.ts:buildAuthorizationServerMetadata(issuer)andbuildProtectedResourceMetadata(resource)returningObject.freeze'd minimal-metadata documents.OAUTH_DISCOVERY_NOT_SUPPORTED_BODY(the 404+error-body constant from mt#1635).OAUTH_REGISTER_NOT_SUPPORTED_BODYunchanged.app.set("trust proxy", true)so Railway'sX-Forwarded-Proto/X-Forwarded-Hostpopulatereq.protocolandreq.get("host")correctly. Without this,req.protocolreturns"http"(Railway terminates TLS at the edge), producing a wrongissuerURL the SDK would reject..well-knownhandlers now compute issuer/resource URL per-request from request headers, call the builders, and respond200(not404).src/commands/mcp/start-command.test.ts:OAUTH_REGISTER_NOT_SUPPORTED_BODY.authorization_endpoint,token_endpoint,registration_endpoint,grant_types_supported,scopes_supported(the whole point of mt#1655).Why a 200 with empty-flows works (when 404 didn't)
JSON parse errorshown literally; dialog badges rednot_supported(mt#1635)SDK auth failed: <our error_description>Acceptance test coverage
/.well-known/oauth-authorization-serverreturns 200 withissuer+response_types_supported: []start-command.test.ts) — builder shape pinnedtoBeUndefinedassertions for each excluded field/.well-known/oauth-protected-resourcereturns 200 withresourceonlyPOST /registercontinues to return 400curl https://...up.railway.app/.well-known/oauth-authorization-serverreturns 200 with the expected metadata/mcpdialog shows NOSDK auth failedlineEmpirical-verification dependency
Criterion #7 is an SDK behavioral assertion — verifiable only by deploying and observing the dialog. If a 200-with-empty-flows still produces a misleading line, this task may need a second round to tweak the body shape (e.g., omit
response_types_supportedentirely, add a stubservice_documentationfield, etc.). The unit tests pin the JSON shape; the dialog observation is the load-bearing acceptance test.Out of scope
/register's 400 response (DCR is genuinely unsupported until mt#1634).Concurrency analysis
(N/A — no check-then-act pattern introduced. The builders are pure functions; the handlers are stateless responders that compute URL from per-request headers and return constant-shape JSON.)
Verification artifact
(N/A per implement-task §7a — pure-function builders with full unit-test coverage. The empirical SDK-behavior verification is operator-driven post-deploy: probe the live endpoints + observe the dialog.)
Test plan
bun test src/commands/mcp/start-command.test.ts— 26/26 passmcp__minsky__validate_typecheck— 0 errorsmcp__minsky__validate_lint— 0 errors, 0 warningscurl https://minsky-mcp-production.up.railway.app/.well-known/oauth-authorization-serverreturns 200 + RFC 8414 metadatacurl https://minsky-mcp-production.up.railway.app/.well-known/oauth-protected-resourcereturns 200 + RFC 9728 metadata/mcp;minsky-hostedshows noSDK auth failedlineOrigin
Discovered 2026-05-08, post-deploy of mt#1635: the dialog status went from red-failed to green-authenticated, but a residual
SDK auth failed: Dynamic Client Registration is not implemented; ...line persisted. The user is implementing full OAuth in mt#1634 soon, so option B (refine to RFC 8414 200-with-empty-flows, which converges with mt#1634's foundation) was chosen over option A (accept residual line).