feat(070): registry easy upstream-add — keystone core op + CLI MVP (US2)#555
Conversation
… spec Close the search-to-add loop + reach surface parity across Web UI, MCP, CLI (all share the unified AddUpstreamServer path, quarantine-by-default). Real gaps from research: CLI has no registry/search/add commands; Web UI Repositories searches but can't one-click-add; registry list hardcoded/rebuild-only; key-less registries error. Adds unified add-from-registry-result, CLI parity, Web UI loop, MCP convenience, config-driven registry list + cache refresh, and a cross-surface consistency regression. Roadmap PILLAR A. Paperclip goal da399902.
…easy upstream-add Phase 0/1/2 speckit design artifacts for spec 070. Research found the search->add loop is already partially closed (Web UI Add button, CLI search/list) but via three divergent normalizations; keystone is one backend AddServerFromRegistry core op all surfaces call, guarded by a cross-surface consistency regression (CN-004/FR-010). No implementation yet — submitted for the per-spec design gate (Gate 2). Related #746
Single backend op that turns a registry reference (registryId + serverId +
overrides) into a validated, quarantined upstream server. Every surface
(REST/MCP/CLI) will funnel through this one normalization (CN-001); identical
input yields an identical persisted config (CN-004). The client sends a
reference, never a config blob — the server re-derives it (security D1), so a
buggy/compromised client cannot smuggle command/args or disable quarantine.
- registries.ServerEntry gains RequiredInputs ([]RequiredInput)
- registries.FindServerByID + sentinel errors (registry/server not found)
- registries.DetectRequiredInputs: explicit + ${VAR}/$VAR heuristic scan
- server.AddServerFromRegistry orchestrator + pure buildServerConfigFromEntry
- quarantine-by-default (CN-002, never client-overridable); refusals on
no_install_info / missing_required_input / duplicate_name
- RegistryAddErrorCode maps errors to stable cross-surface codes
Phase 1+2 of MCP-746. Tests: go test ./internal/server/ -run TestAddFromRegistry
-race (green); registries input/lookup unit tests; build + linter clean.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Repoint the registry "Add to MCP" button at the backend keystone instead of parsing install_cmd client-side (CN-001 / issue #483 class of bug): - T015: new api.addServerFromRegistry(registryId, serverId, {name,env,enabled}) POSTs to /api/v1/registries/{id}/servers/{serverId}/add and surfaces the structured cross-surface error (code + missing_inputs). Removes the old addServerFromRepository install_cmd.split path. - T016: required-input prompt dialog in Repositories.vue — opens on a missing_required_input response, renders one field per missing input (rich decl from search required_inputs[] when present, else bare names), blocks Add until all are filled, resubmits as env. data-test hooks added across the search → add flow. - T017: e2e/playwright/registry-add.spec.ts (search→Add→quarantined + prompt path); pending the live REST route to run green. - types: RequiredInput + RepositoryServer.required_inputs[]. Verified: 5 vitest unit tests green, vue-tsc clean, vite build green. Co-Authored-By: Paperclip <noreply@paperclip.ing>
Phase 3 of spec 070 — close the discover→add loop on the CLI, the genuine
net-new gap. After this a user can:
mcpproxy registry list → registry search <q> -r <reg> → registry add <reg> <id>
and the server appears quarantined in `mcpproxy upstream list`.
- cmd/mcpproxy/registry_cmd.go: `registry list|search|add` Cobra group.
list/search are daemon-first with an in-process fallback; add requires the
daemon because the keystone op (config derivation + quarantine) is
server-side (CN-001 / decision D1). Legacy `search-servers` retained.
- internal/cliclient: ListRegistries, SearchRegistry, AddFromRegistry +
client-side RegistryAddError carrying the stable code + missing inputs.
- internal/httpapi: POST /api/v1/registries/{id}/servers/{serverId}/add wired
to the keystone via a new AddServerFromRegistryRef controller method;
code→HTTP status mapping (404 not-found, 400 derivation/dup/missing-input).
- internal/server: AddServerFromRegistryRef adapter projecting failures onto
the cross-surface contracts.RegistryAddError (names missing --env keys).
- internal/contracts: AddFromRegistryRequest/Data/AddedServerSummary +
RegistryAddError.
Tests: cliclient httptest units; server adapter units; CLI e2e against a
running daemon + mock registry (list→search→add→quarantined) and the
missing-required-input actionable-error path.
Note: the REST add route was pulled forward from T014 because the CLI is a
thin REST client and the e2e ("add via running daemon") cannot pass without a
server endpoint; the frontend wiring (T015/T016) remains for US1.
Related #746
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Deploying mcpproxy-docs with
|
| Latest commit: |
d24c577
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://47dd4952.mcpproxy-docs.pages.dev |
| Branch Preview URL: | https://070-registry-easy-upstream-a.mcpproxy-docs.pages.dev |
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
…est registry T017 follow-up after running the e2e green against a live server: - nav to /ui/repositories (history-mode base /ui/), not a hash route - default the no-input happy path to docker-mcp-catalog (reliably addable) - target the required-input server (default fleur/stripe) independently of the no-input registry, so both tests run from one invocation Verified: both tests pass — search→Add→quarantined and search→Add→prompt→quarantined — against a fresh throwaway core. Co-Authored-By: Paperclip <noreply@paperclip.ing>
Adds operation=add_from_registry to the upstream_servers MCP tool so agents can add an upstream from a registry reference (registry + id) without hand-constructing command/args/url. Dispatches to the keystone AddServerFromRegistryRef op: the server re-derives the runnable config from the registry entry and persists it quarantined (CN-001 / CN-002 / security decision D1). New params: registry, id (env_json/name/enabled reuse the existing fields). Success returns the slim AddedServerSummary; failure returns a structured error (isError=true) carrying the same stable Code as REST/CLI plus missing_inputs for missing_required_input (FR-003), so the same failure reads identically across every surface. T012/T013 of spec 070 Phase 4. Tests: internal/server/mcp_add_from_registry_test.go (happy path + missing_required_input structured error).
The prior commit added only the upstream_servers enum/param surface; the dispatch case and handler were missing and the test referenced undefined helpers, so the package did not compile. This wires it up end-to-end: - handleAddServerFromRegistry dispatches to the keystone mainServer.AddServerFromRegistryRef; success -> AddedServerSummary, failure -> structured error (code + missing_inputs, isError=true) matching the REST/CLI surfaces (CN-001). - add_from_registry added to the dispatch switch and the agent-token block list (write-op parity). - Tests use a real runtime-backed mainServer plus an httptest registry (SetRegistriesFromConfig): happy path asserts a quarantined stdio config, missing-input path asserts the structured missing_required_input error with the offending input names. Verified: go test ./internal/server (incl. the two new tests), go build ./..., golangci-lint run ./internal/server/... all pass.
📦 Build ArtifactsWorkflow Run: View Run Available Artifacts
How to DownloadOption 1: GitHub Web UI (easiest)
Option 2: GitHub CLI gh run download 26742088591 --repo smart-mcp-proxy/mcpproxy-go
|
…-006) SetRegistriesFromConfig previously replaced the registry list wholesale, so a user who added one custom registry in their config dropped all built-in defaults. Change it to merge defaults union config keyed by ID: defaults first, custom appended, colliding IDs override in place. Extract the five built-in registries into config.DefaultRegistries() as the single source of truth shared by DefaultConfig() and the merge. Add RequiresKey to both RegistryEntry types (plumbing for FR-008). Related #765
A registry marked RequiresKey is skipped via ErrRegistryKeyMissing when no API key is configured (env MCPPROXY_REGISTRY_<ID>_API_KEY), instead of running a doomed fetch. SearchServers short-circuits before the network call; FindServerByID inherits the behaviour. Calling surfaces map the sentinel to an 'unavailable' marker so the overall search still succeeds. Related #765
…ger (US4 FR-007) Primitives for registry cache freshness: Invalidate drops a key, Refresh aliases it (lazy re-fetch), InvalidatePrefix drops all keys under a stable prefix (one refresh clears every tag/query variant of a registry), and Peek returns a record without evicting it even when expired so callers can serve stale data while flagging its age. All stats-accounted. Related #765
…bsent (US4 FR-007/FR-008)
Wire the cache primitives into the registry search path: SearchRegistryServers
now caches each (registry,tag,query,limit) result and serves it while flagging
age/staleness via contracts.RegistryCacheInfo. Add RefreshRegistryCache +
POST /api/v1/registries/{id}/refresh to drop a registry's cached lists on
demand. A key-requiring registry with no key configured surfaces as an
'unavailable' marker (HTTP 200) on both the REST search endpoint and the MCP
search_servers tool instead of failing the surface (FR-008). Regenerate the
OpenAPI spec for the new endpoint and response fields.
Related #765
Related #765
AddServer mutated the live immutable config snapshot returned by runtime.Config() in place (currentConfig.Servers = append(...)) before publishing via UpdateConfig. Background goroutines that range cfg.Servers concurrently — LoadConfiguredServers during startup and the 2s-delayed DiscoverAndIndexTools — raced that write, which the -race detector flagged on TestHandleUpstreamServers_AddFromRegistry_HappyPath in CI. Clone the config and its server slice, append to the clone, then publish the new snapshot atomically. Behaviour is unchanged; readers holding the prior snapshot are now isolated, as the immutable-snapshot contract intends. Related #555
DiscoverTools snapshotted client.Config directly while only holding the Manager's m.mu, but the reconcile add path (AddServerConfig) releases m.mu before calling managed.Client.SetConfig, which guards mc.Config with the client's own mc.mu. The two accesses therefore used different mutexes, producing a data race the -race detector flagged intermittently in CI (PR #555, run 26719908592). Fix the read side: read config via the mutex-guarded GetConfig() accessor in DiscoverTools and in the API-facing status readers (GetStats, GetTotalToolCount, ListServers) that share the identical hazard — they read client.Config after releasing m.mu and run concurrently with reconcile. mc.mu is a leaf lock (only Get/SetConfig take it, never while holding m.mu), so this is deadlock-safe. Add TestDiscoverTools_ConfigRace: drives AddServerConfig (SetConfig) against the readers concurrently; trips -race before the fix, green after. Related: MCP-770
The first MCP-770 fix routed Manager-side reads (DiscoverTools, GetStats, GetTotalToolCount, ListServers) through GetConfig(), but the underlying hazard is broader: managed.Client.Config is a pointer field that SetConfig swaps off m.mu (reconcile add path), while it is read concurrently from many sites that do NOT hold the client's mc.mu — including Client.Connect's unlocked phase (client.go:197) and detached state-change callback goroutines (onStateChange, spawned by StateManager.SetError). The CI -race detector flagged the Connect-vs-SetConfig variant on PR #555's macOS unit job after the first fix was cherry-picked. A mutex-guarded accessor can't cover this: most internal readers run while mc.mu is already held, so routing them through an RLock GetConfig() would deadlock (sync.RWMutex is not reentrant). Instead make the field itself lock-free and data-race-free: - Replace the exported `Config *config.ServerConfig` field with an unexported `cfg atomic.Pointer[config.ServerConfig]`. - GetConfig() returns cfg.Load(); SetConfig() does cfg.Store(). Both are lock-free and safe whether or not mc.mu is held. - Route every reader through GetConfig() across the managed client, the upstream Manager, and the supervisor actor pool. Add TestConnect_ConfigRace: drives AddServerConfig (SetConfig) against Client.Connect concurrently; trips -race on the old field, green after. Verified: go test -race ./internal/upstream/... ./internal/runtime/... and the original CI victims TestHandleUpstreamServers_AddFromRegistry_HappyPath / _AdminAllowed_WriteOps pass x5 under -race; build + linter clean. Related: MCP-770
The first MCP-770 fix routed Manager-side reads (DiscoverTools, GetStats, GetTotalToolCount, ListServers) through GetConfig(), but the underlying hazard is broader: managed.Client.Config is a pointer field that SetConfig swaps off m.mu (reconcile add path), while it is read concurrently from many sites that do NOT hold the client's mc.mu — including Client.Connect's unlocked phase (client.go:197) and detached state-change callback goroutines (onStateChange, spawned by StateManager.SetError). The CI -race detector flagged the Connect-vs-SetConfig variant on PR #555's macOS unit job after the first fix was cherry-picked. A mutex-guarded accessor can't cover this: most internal readers run while mc.mu is already held, so routing them through an RLock GetConfig() would deadlock (sync.RWMutex is not reentrant). Instead make the field itself lock-free and data-race-free: - Replace the exported `Config *config.ServerConfig` field with an unexported `cfg atomic.Pointer[config.ServerConfig]`. - GetConfig() returns cfg.Load(); SetConfig() does cfg.Store(). Both are lock-free and safe whether or not mc.mu is held. - Route every reader through GetConfig() across the managed client, the upstream Manager, and the supervisor actor pool. Add TestConnect_ConfigRace: drives AddServerConfig (SetConfig) against Client.Connect concurrently; trips -race on the old field, green after. Verified: go test -race ./internal/upstream/... ./internal/runtime/... and the original CI victims TestHandleUpstreamServers_AddFromRegistry_HappyPath / _AdminAllowed_WriteOps pass x5 under -race; build + linter clean. Related: MCP-770
newAddFromRegistryTestServer opened a real Server (and its BBolt config.db under t.TempDir()) but never closed it. On Windows an open file cannot be unlinked, so t.TempDir's RemoveAll failed the AddFromRegistry tests with "config.db ... being used by another process". Unix unlinks open files, so ubuntu/macOS passed and hid it; the failure only surfaced once the MCP-770 race fix stopped aborting the test binary before cleanup ran. Register t.Cleanup(mainSrv.Shutdown) after t.TempDir() so LIFO cleanup closes the runtime/storage (releasing the handle) before the temp dir is removed. Shutdown is nil-httpServer-safe and idempotent. Related: MCP-770, spec 070
StartStderrMonitoring (from connectStdio during a reconcile-driven Connect) and StopStderrMonitoring (from Disconnect during Manager.ShutdownAll) accessed the monitoring lifecycle fields — stderrMonitoringCtx/Cancel/WaitGroup and the process-monitoring equivalents — with no synchronization. When a connect and a shutdown overlap on the same client, the -race detector flags WG.Add (Start) vs WG.Wait (Stop). CI caught this on PR #555's ubuntu unit job once the earlier MCP-770 fixes (atomic Config) + the Windows temp-dir cleanup made the test's shutdown path actually run concurrently with reconcile. Add a dedicated monitoringMu that makes the four Start*/Stop*Monitoring methods mutually exclusive, so Add never overlaps Wait. The mutex is a leaf (monitor goroutines never acquire it; it is never held across c.mu), so Stop's bounded WG.Wait under the lock cannot deadlock. Add TestStderrMonitoring_StartStopRace: hammers Start/Stop concurrently; trips -race on the unsynchronized fields, green with monitoringMu. Related: MCP-770, spec 070
StartStderrMonitoring (from connectStdio during a reconcile-driven Connect) and StopStderrMonitoring (from Disconnect during Manager.ShutdownAll) accessed the monitoring lifecycle fields — stderrMonitoringCtx/Cancel/WaitGroup and the process-monitoring equivalents — with no synchronization. When a connect and a shutdown overlap on the same client, the -race detector flags WG.Add (Start) vs WG.Wait (Stop). CI caught this on PR #555's ubuntu unit job once the earlier MCP-770 fixes (atomic Config) + the Windows temp-dir cleanup made the test's shutdown path actually run concurrently with reconcile. Add a dedicated monitoringMu that makes the four Start*/Stop*Monitoring methods mutually exclusive, so Add never overlaps Wait. The mutex is a leaf (monitor goroutines never acquire it; it is never held across c.mu), so Stop's bounded WG.Wait under the lock cannot deadlock. Add TestStderrMonitoring_StartStopRace: hammers Start/Stop concurrently; trips -race on the unsynchronized fields, green with monitoringMu. Related: MCP-770, spec 070
The monitoringMu serialization (prev commit) was incomplete: StopStderr/ StopProcessMonitoring ran WG.Wait() in a DETACHED goroutine guarded by a 500ms timeout. When a real child keeps its stderr open, monitorStderr blocks in scanner.Scan(), the wait times out, Stop returns and releases monitoringMu — but the detached WG.Wait goroutine lives on and races the next connect's WG.Add. CI caught this on #555 (DiscoverTools→Connect StartStderrMonitoring WG.Add vs the lingering StopStderrMonitoring WG.Wait). Replace the reused sync.WaitGroup with a per-cycle `done chan struct{}` the monitor goroutine closes, and pass the context + done as locals so an abandoned (timed-out) monitor never reads the shared ctx/state a later Start overwrites. Stop now waits on done directly under monitoringMu (no detached waiter) and nils the per-cycle fields. monitorStderr/monitorProcess take ctx as a parameter. Strengthen TestStderrMonitoring with an AbandonedMonitorNoRace case: a blocking pipe keeps the monitor alive so Stop hits the timeout/abandon path — the exact escape the WaitGroup version raced on. AddFromRegistry passes 12x under -race. Related: MCP-770, spec 070
The monitoringMu serialization (prev commit) was incomplete: StopStderr/ StopProcessMonitoring ran WG.Wait() in a DETACHED goroutine guarded by a 500ms timeout. When a real child keeps its stderr open, monitorStderr blocks in scanner.Scan(), the wait times out, Stop returns and releases monitoringMu — but the detached WG.Wait goroutine lives on and races the next connect's WG.Add. CI caught this on #555 (DiscoverTools→Connect StartStderrMonitoring WG.Add vs the lingering StopStderrMonitoring WG.Wait). Replace the reused sync.WaitGroup with a per-cycle `done chan struct{}` the monitor goroutine closes, and pass the context + done as locals so an abandoned (timed-out) monitor never reads the shared ctx/state a later Start overwrites. Stop now waits on done directly under monitoringMu (no detached waiter) and nils the per-cycle fields. monitorStderr/monitorProcess take ctx as a parameter. Strengthen TestStderrMonitoring with an AbandonedMonitorNoRace case: a blocking pipe keeps the monitor alive so Stop hits the timeout/abandon path — the exact escape the WaitGroup version raced on. AddFromRegistry passes 12x under -race. Related: MCP-770, spec 070
Kimi's dual-AI review of #555 returned REQUEST_CHANGES on the missing docs per ENG-9. Add docs for the new registry surfaces: - docs/features/registry-add.md — cohesive feature guide covering all three surfaces (CLI `registry list/search/add`, REST add/refresh/browse endpoints, MCP `upstream_servers` add_from_registry op), the CN-001 re-derivation security model, quarantine-by-default, and the cross-surface missing_required_input error contract. - docs/api/rest-api.md — new Registries section listing the four endpoints (list, search, add, refresh) with request/response shapes, linking the feature guide. Docs only; no code change. Clears Kimi's REQUEST_CHANGES for the Gate-3 merge. Related: MCP-788, MCP-785, MCP-761, spec 070
DiscoverTools snapshotted client.Config directly while only holding the Manager's m.mu, but the reconcile add path (AddServerConfig) releases m.mu before calling managed.Client.SetConfig, which guards mc.Config with the client's own mc.mu. The two accesses therefore used different mutexes, producing a data race the -race detector flagged intermittently in CI (PR #555, run 26719908592). Fix the read side: read config via the mutex-guarded GetConfig() accessor in DiscoverTools and in the API-facing status readers (GetStats, GetTotalToolCount, ListServers) that share the identical hazard — they read client.Config after releasing m.mu and run concurrently with reconcile. mc.mu is a leaf lock (only Get/SetConfig take it, never while holding m.mu), so this is deadlock-safe. Add TestDiscoverTools_ConfigRace: drives AddServerConfig (SetConfig) against the readers concurrently; trips -race before the fix, green after. Related: MCP-770
The first MCP-770 fix routed Manager-side reads (DiscoverTools, GetStats, GetTotalToolCount, ListServers) through GetConfig(), but the underlying hazard is broader: managed.Client.Config is a pointer field that SetConfig swaps off m.mu (reconcile add path), while it is read concurrently from many sites that do NOT hold the client's mc.mu — including Client.Connect's unlocked phase (client.go:197) and detached state-change callback goroutines (onStateChange, spawned by StateManager.SetError). The CI -race detector flagged the Connect-vs-SetConfig variant on PR #555's macOS unit job after the first fix was cherry-picked. A mutex-guarded accessor can't cover this: most internal readers run while mc.mu is already held, so routing them through an RLock GetConfig() would deadlock (sync.RWMutex is not reentrant). Instead make the field itself lock-free and data-race-free: - Replace the exported `Config *config.ServerConfig` field with an unexported `cfg atomic.Pointer[config.ServerConfig]`. - GetConfig() returns cfg.Load(); SetConfig() does cfg.Store(). Both are lock-free and safe whether or not mc.mu is held. - Route every reader through GetConfig() across the managed client, the upstream Manager, and the supervisor actor pool. Add TestConnect_ConfigRace: drives AddServerConfig (SetConfig) against Client.Connect concurrently; trips -race on the old field, green after. Verified: go test -race ./internal/upstream/... ./internal/runtime/... and the original CI victims TestHandleUpstreamServers_AddFromRegistry_HappyPath / _AdminAllowed_WriteOps pass x5 under -race; build + linter clean. Related: MCP-770
StartStderrMonitoring (from connectStdio during a reconcile-driven Connect) and StopStderrMonitoring (from Disconnect during Manager.ShutdownAll) accessed the monitoring lifecycle fields — stderrMonitoringCtx/Cancel/WaitGroup and the process-monitoring equivalents — with no synchronization. When a connect and a shutdown overlap on the same client, the -race detector flags WG.Add (Start) vs WG.Wait (Stop). CI caught this on PR #555's ubuntu unit job once the earlier MCP-770 fixes (atomic Config) + the Windows temp-dir cleanup made the test's shutdown path actually run concurrently with reconcile. Add a dedicated monitoringMu that makes the four Start*/Stop*Monitoring methods mutually exclusive, so Add never overlaps Wait. The mutex is a leaf (monitor goroutines never acquire it; it is never held across c.mu), so Stop's bounded WG.Wait under the lock cannot deadlock. Add TestStderrMonitoring_StartStopRace: hammers Start/Stop concurrently; trips -race on the unsynchronized fields, green with monitoringMu. Related: MCP-770, spec 070
The monitoringMu serialization (prev commit) was incomplete: StopStderr/ StopProcessMonitoring ran WG.Wait() in a DETACHED goroutine guarded by a 500ms timeout. When a real child keeps its stderr open, monitorStderr blocks in scanner.Scan(), the wait times out, Stop returns and releases monitoringMu — but the detached WG.Wait goroutine lives on and races the next connect's WG.Add. CI caught this on #555 (DiscoverTools→Connect StartStderrMonitoring WG.Add vs the lingering StopStderrMonitoring WG.Wait). Replace the reused sync.WaitGroup with a per-cycle `done chan struct{}` the monitor goroutine closes, and pass the context + done as locals so an abandoned (timed-out) monitor never reads the shared ctx/state a later Start overwrites. Stop now waits on done directly under monitoringMu (no detached waiter) and nils the per-cycle fields. monitorStderr/monitorProcess take ctx as a parameter. Strengthen TestStderrMonitoring with an AbandonedMonitorNoRace case: a blocking pipe keeps the monitor alive so Stop hits the timeout/abandon path — the exact escape the WaitGroup version raced on. AddFromRegistry passes 12x under -race. Related: MCP-770, spec 070
…etConfig (MCP-770) (#556) * fix(upstream): route concurrent Config reads through GetConfig (MCP-770) DiscoverTools snapshotted client.Config directly while only holding the Manager's m.mu, but the reconcile add path (AddServerConfig) releases m.mu before calling managed.Client.SetConfig, which guards mc.Config with the client's own mc.mu. The two accesses therefore used different mutexes, producing a data race the -race detector flagged intermittently in CI (PR #555, run 26719908592). Fix the read side: read config via the mutex-guarded GetConfig() accessor in DiscoverTools and in the API-facing status readers (GetStats, GetTotalToolCount, ListServers) that share the identical hazard — they read client.Config after releasing m.mu and run concurrently with reconcile. mc.mu is a leaf lock (only Get/SetConfig take it, never while holding m.mu), so this is deadlock-safe. Add TestDiscoverTools_ConfigRace: drives AddServerConfig (SetConfig) against the readers concurrently; trips -race before the fix, green after. Related: MCP-770 * fix(upstream): make managed.Client config access atomic (MCP-770) The first MCP-770 fix routed Manager-side reads (DiscoverTools, GetStats, GetTotalToolCount, ListServers) through GetConfig(), but the underlying hazard is broader: managed.Client.Config is a pointer field that SetConfig swaps off m.mu (reconcile add path), while it is read concurrently from many sites that do NOT hold the client's mc.mu — including Client.Connect's unlocked phase (client.go:197) and detached state-change callback goroutines (onStateChange, spawned by StateManager.SetError). The CI -race detector flagged the Connect-vs-SetConfig variant on PR #555's macOS unit job after the first fix was cherry-picked. A mutex-guarded accessor can't cover this: most internal readers run while mc.mu is already held, so routing them through an RLock GetConfig() would deadlock (sync.RWMutex is not reentrant). Instead make the field itself lock-free and data-race-free: - Replace the exported `Config *config.ServerConfig` field with an unexported `cfg atomic.Pointer[config.ServerConfig]`. - GetConfig() returns cfg.Load(); SetConfig() does cfg.Store(). Both are lock-free and safe whether or not mc.mu is held. - Route every reader through GetConfig() across the managed client, the upstream Manager, and the supervisor actor pool. Add TestConnect_ConfigRace: drives AddServerConfig (SetConfig) against Client.Connect concurrently; trips -race on the old field, green after. Verified: go test -race ./internal/upstream/... ./internal/runtime/... and the original CI victims TestHandleUpstreamServers_AddFromRegistry_HappyPath / _AdminAllowed_WriteOps pass x5 under -race; build + linter clean. Related: MCP-770 * fix(core): serialize stderr/process monitoring lifecycle (MCP-770) StartStderrMonitoring (from connectStdio during a reconcile-driven Connect) and StopStderrMonitoring (from Disconnect during Manager.ShutdownAll) accessed the monitoring lifecycle fields — stderrMonitoringCtx/Cancel/WaitGroup and the process-monitoring equivalents — with no synchronization. When a connect and a shutdown overlap on the same client, the -race detector flags WG.Add (Start) vs WG.Wait (Stop). CI caught this on PR #555's ubuntu unit job once the earlier MCP-770 fixes (atomic Config) + the Windows temp-dir cleanup made the test's shutdown path actually run concurrently with reconcile. Add a dedicated monitoringMu that makes the four Start*/Stop*Monitoring methods mutually exclusive, so Add never overlaps Wait. The mutex is a leaf (monitor goroutines never acquire it; it is never held across c.mu), so Stop's bounded WG.Wait under the lock cannot deadlock. Add TestStderrMonitoring_StartStopRace: hammers Start/Stop concurrently; trips -race on the unsynchronized fields, green with monitoringMu. Related: MCP-770, spec 070 * fix(core): per-cycle done channel for monitoring stop (MCP-770) The monitoringMu serialization (prev commit) was incomplete: StopStderr/ StopProcessMonitoring ran WG.Wait() in a DETACHED goroutine guarded by a 500ms timeout. When a real child keeps its stderr open, monitorStderr blocks in scanner.Scan(), the wait times out, Stop returns and releases monitoringMu — but the detached WG.Wait goroutine lives on and races the next connect's WG.Add. CI caught this on #555 (DiscoverTools→Connect StartStderrMonitoring WG.Add vs the lingering StopStderrMonitoring WG.Wait). Replace the reused sync.WaitGroup with a per-cycle `done chan struct{}` the monitor goroutine closes, and pass the context + done as locals so an abandoned (timed-out) monitor never reads the shared ctx/state a later Start overwrites. Stop now waits on done directly under monitoringMu (no detached waiter) and nils the per-cycle fields. monitorStderr/monitorProcess take ctx as a parameter. Strengthen TestStderrMonitoring with an AbandonedMonitorNoRace case: a blocking pipe keeps the monitor alive so Stop hits the timeout/abandon path — the exact escape the WaitGroup version raced on. AddFromRegistry passes 12x under -race. Related: MCP-770, spec 070
* docs(064): Glass Cockpit — transparent & steerable agent cockpit spec Spec, plan, and design artifacts for making the existing Paperclip "MCPProxy" cockpit (spec 045) transparent and steerable: invert the default from "proceed" to "checkpoint at every design-decision boundary" via three human gates (plan-of-attack, per-spec design, pre-merge) mapped to Paperclip native primitives (executionPolicy approval stages, request_confirmation/suggest_tasks interactions, issue_tree_holds), with reasoning visible before each gate and a single "waiting on you" view. Phased rollout: A) config + agent-instruction only (the dry-run target), B) a Paperclip plugin for the fused transparency UI, C) a fork only if A/B fall short. SynapBus is log/wiki only, never on the critical path. Includes rewritten gate-aware agent instructions, consumed-API + executionPolicy + agent-instruction contracts, data model, research, and operator quickstart. Supersedes spec 062's fresh-dev-instance approach; extends spec 045. * docs(064): amend gate model — human-merge → dual-AI auto-merge + human veto Session 2 amendment to FR-005 + US3: replace the mandatory human-merge gate (throughput bottleneck) with draft-PR + dual-AI-review consensus auto-merge. Two reviewers on different model families (Gemini 2.5-pro Critic + Codex), never the implementer; tests-green + both-accept → auto-merge; human is an optional 3rd reviewer with veto (request-changes/hold freezes auto-merge). Prerequisite flagged: a bot GitHub identity (agents currently = author's gh, and GitHub forbids self-approval) — interim fallback is 2-AI-review-as- required-check with the human merging. codex-local Paperclip adapter exists. * docs(064): dual-AI-review auto-merge — reviewer doctrine + engineer draft-PR + setup Adds the Session-2 gate-model deliverables: reviewer/REVIEWER.md (shared RV-1..RV-6 dual-review doctrine), codex-reviewer/AGENTS.md (2nd reviewer on codex-local), amends engineer ENG-4 to open DRAFT PRs + request 2 AI reviewers (no self-merge), and auto-merge-setup.md (GitHub branch-protection config + the bot-identity prerequisite + interim human-merge fallback + open items). * docs(064): reviewers on subscription auth only; Gemini quota-exhausted, Codex gpt-5.5 ready Per user directive: both AI reviewers use paid SUBSCRIPTION logins, not API keys. - Gemini Critic: subscription/OAuth, pin gemini-2.5-pro (3.5/3 UNVERIFIED — quota exhausted on every probe today; switch if confirmed). TWO blockers: quota + empty-prompt adapter bug → cannot accept yet. - Codex reviewer: ChatGPT subscription, gpt-5.5 (codex-cli 0.46.0) — READY now. - Live two-reviewer set today = Codex + human (FR-005f) until Gemini recovers. * docs(064): Critic GEMINI.md — subscription-only, quota-blocked; Kimi+Codex as live pair Gemini settings pin gemini-3.1-pro-preview (subscription/OAuth) but quota is exhausted (no reset hint) + empty-prompt adapter bug → Critic can't accept yet. Live 2-AI reviewer pair = Codex gpt-5.5 + Kimi-K2.5 (opencode_local, Gcore key present); Gemini rejoins as 3rd reviewer when quota returns. * docs(064): stand up Codex+Kimi reviewer agents; codex config fix Live dual-AI reviewer pair created in the running Paperclip cockpit and verified responding (2026-05-31): - CodexReviewer — codex_local / gpt-5-codex (5b94562c-…) - KimiReviewer — opencode_local / Kimi-K2.5 (fdaa1d4c-…) Both carry managed instruction bundles (shared doctrine + RV-1..RV-6 + role notes), report to CEO, idle, heartbeat off (woken by review-stage). Docs: - add canonical kimi-reviewer/AGENTS.md (the design lacked it) - correct codex-reviewer/AGENTS.md model facts (gpt-5.5 -> gpt-5-codex) - auto-merge-setup.md: live pair is Codex+Kimi; Gemini Critic becomes the 3rd reviewer when its subscription quota recovers codex config fix (~/.codex/config.toml, not in repo): model_reasoning_effort xhigh->high and model gpt-5.5->gpt-5-codex. On codex-cli 0.46.0 + ChatGPT subscription auth, gpt-5.5 needs a newer CLI and gpt-5.4/5.3-codex/5.2 are auth-restricted; gpt-5-codex/gpt-5 are the working models. Backup at ~/.codex/config.toml.bak.pre-reviewer-fix.* * docs(064): engineers drive PRs green + bundle docs/ updates (ENG-8/9) engineer/AGENTS.md: - ENG-8: drive every required check to green before review — run local verification before push, watch `gh pr checks --watch`, push fixes until all green; never leave/hand off a red PR, never --no-verify or weaken a check. Green CI is the engineer's job, not the reviewer's. - ENG-9: when a change touches CLI/REST/MCP API/config/defaults/security or anything under docs/, the SAME PR must update docs/ (+ CLAUDE.md/ oas/swagger.yaml/README where mirrored). Docs-only changes exempt from TDD. - ENG-5 reworked to dual-AI merge-readiness (Codex+Kimi accept + all CI green). reviewer/REVIEWER.md RV-3: red/pending check = automatic request_changes; missing docs when the change warrants them = request_changes. Applied to the live Paperclip brains: 3 engineers (Backend/Frontend/MacOS) re-flattened from canonical; Codex+Kimi reviewer brains refreshed. * docs(064): record applied CI-context branch protection on main Phase-1 gate live on main (no bot identity needed): required_status_checks strict=false with 8 always-run, non-path-conditional contexts (Lint, Unit Tests ubuntu, Build ubuntu/macos/windows, Build Frontend, Validate PR title, Verify OpenAPI Artifacts). Existing 1-review + enforce_admins=false kept. Verified: green PR #553 satisfies all 8 (blocked only by review); in-flight PR #555 blocked on pending required checks. Documents the deliberately- excluded checks and the Go-version-pinned context-name fragility. * docs(064): ENG-3 — branch from origin/main, never from a feature branch Forking a work branch from another feature branch drags its unmerged commits into the PR (root cause of spec-064 docs leaking into the MCP-770 race fix #556). ENG-3 now mandates fetch + branch from origin/main explicitly, in both the engineer bundle and the contract.
…defaults in ListRegistries Address the two Codex Gate-3 review findings on PR #555 (spec 070). Finding 1 (security): the add_from_registry MCP op persisted a new upstream without the AllowServerAdd check that the ordinary `add` op enforces, so a user who set allow_server_add=false could still add a server by registry reference. Gate add_from_registry behind the same check. Finding 2 (FR-006 consistency): Runtime.ListRegistries returned a hard-coded legacy Smithery entry for an empty config, or only cfg.Registries when set — diverging from the merged defaults+custom source that search/add use. Route it through registries.SetRegistriesFromConfig so the daemon/API list always shows built-in defaults merged with custom entries. Tests-first for both (ENG-1). Related #555
Re-review verdict: APPROVED ✅ (MCP-803)Both Gate-3 findings from MCP-800 are correctly addressed in commit Finding 1 — SECURITY:
|
…lifecycle The retrieval-d1 job failed: `mcpproxy serve` exited immediately with "data_dir: directory does not exist" (serve refuses to create a missing data_dir), and the server was backgrounded in a separate step from the readiness poll (a process backgrounded in one step is reaped when that step's shell exits). Fix: mkdir -p the data_dir, and boot + readiness-poll + run the scorer in ONE step (shared shell) with a trap that stops the server however the step ends; also fail fast if the server process dies during startup. D2 gate unaffected. Verified: mcpproxy boots and serves /api/v1/status locally with the created data_dir; actionlint clean. Related #555 datasets; MCP-742. Co-Authored-By: Paperclip <noreply@paperclip.ing>
…n + O3 amendment (Related MCP-746) (#559) * test(070): cross-surface registry-add consistency regression (T021, CN-004/SC-004) Guards that all three add surfaces (Web UI, CLI, MCP) route through the single AddServerFromRegistry keystone op with quarantine-by-default preserved. The production code landed via PR #555; this is the Phase-7 regression that was cut after #555's branch point. Related MCP-746 * docs(070): O3 spec amendment — record stale US1/US2 gap premise The US1/US2 gaps recorded 2026-05-31 were partly stale: Web UI already had an 'Add to MCP' path and the CLI already listed/searched registries. Real work was de-duplicating every surface onto the single AddServerFromRegistry keystone op (PR #555 / 354580f). Related MCP-746 * fix(upstream): resolve connectStdio↔monitorStderr data race on c.stderr TestCrossSurfaceConsistency_RegistryAdd is the first unit test to drive the real supervisor connect loop under -race and it surfaced a genuine production data race: connectStdio reassigns the shared c.stderr field on every (re)connect (connection_stdio.go:217) while the monitorStderr goroutine read that same field to build its scanner (monitoring.go:170). A reconnect's write raced a lingering monitor's read. Capture the stderr reader as a local under monitoringMu in StartStderrMonitoring and pass it to monitorStderr as an argument, completing the existing "abandoned goroutine touches only its locals" design — the monitor no longer reads the shared field, so the write can never race it. Add TestStderrMonitoring_ReassignDuringMonitorNoRace, which reproduces the exact write@217/read@170 pair under -race (red before this change, green after). Related #559 * fix(configsvc): guard Subscribe init-send against concurrent Close (send-on-closed) The registry-add unit tests added in this PR drive a full Runtime + configsvc under -race and surfaced a second production data race (distinct from the connectStdio↔monitorStderr fix): Subscribe spawns a goroutine that sends the initial snapshot on the subscriber channel (service.go:206) holding no lock, while Close (service.go:261) and Unsubscribe close that same channel under subMu. The send racing the close both data-races and panics with "send on closed channel". Deliver the initial snapshot under subMu(R) after confirming the channel is still a live subscriber (isSubscribedLocked), so the close paths — which run under the subMu write lock — are mutually exclusive with the send. The send is non-blocking (the cap-10 buffer is empty at subscribe time) so the lock is never held across a blocking send. Preserve the ctx-canceled early-out. Add TestService_SubscribeCloseRace, which reproduces the panic/race under -race (red before this change, green after). Related #559
…h envelope
The retrieval-d1 readiness poll never passed: mcpproxy booted and indexed all 7
servers (45 tools), but the probe parsed the index/search response at the top
level while results are nested under the `{"success":true,"data":{"results":[…]}}`
envelope, so it read 0 every attempt and timed out.
Fix: parse `data.results`. Verified locally — index returns 5 results for q=file
within ~6s of boot. actionlint clean.
Related #555 datasets; MCP-742.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The retrieval scorer was running against a partially-indexed instance: the readiness probe passed at the first indexed tool (>=1 search result), so scoring started before all 7 reference servers connected -> Recall@5 measured 0.387 vs baseline threshold 0.631 (false regression). Fix: poll /api/v1/tools until the catalog reaches the near-full count (~45 tools across the 7 servers) and add a short settle for the index build, then score. Verified locally end-to-end on a fully-indexed instance: Recall@1/3/5/10 = 0.418/0.560/0.681/0.791, Gate(recall_at_5) PASS (0.681 vs 0.631) — the baseline is exactly reproducible. actionlint clean. Related #555 datasets; MCP-742. Co-Authored-By: Paperclip <noreply@paperclip.ing>
…eval (MCP-742) (#561) * ci(065): eval.yml regression gate — D2 security (blocking) + D1 retrieval (MCP-742) Spec 065 / C1 (FR-009, US3/P2). Add `.github/workflows/eval.yml` running both Spec-065 evaluations as a regression gate over the frozen datasets: - security-d2 (blocking): provenance/license guard (FR-007/CN-005) → cmd/scan-eval ×3 → mcp-eval SecurityScorer. Thresholds --fpr-ceiling 0.10 --recall-floor 0.05 (the sensitive-data detector measures recall ≈0.10 on this corpus; scorer defaults of 0.80 would always fail). Sourced in one place pending the MCP-815 baseline `security.gate` block so gate and baseline never drift. - retrieval-d1: boots mcpproxy over snapshot-servers.config.json, waits for index readiness, runs the RetrievalScorer with baseline+tolerance. Report-only on PRs (npx/uvx fetch flake), blocking on the nightly schedule. Shared D2 logic in scripts/eval-ci-smoke.sh (CI == local). Reports upload as artifacts, never committed (CN-003, guarded). mcp-eval checked out at a pinned public ref. Verified locally: full D2 gate PASS (P=0.667 R=0.100 FPR=0.043), actionlint clean. Related #555 datasets; implements MCP-742 (Gate-2 plan rev 2 accepted). Co-Authored-By: Paperclip <noreply@paperclip.ing> * ci(065): fix D1 retrieval job — create data_dir + single-step server lifecycle The retrieval-d1 job failed: `mcpproxy serve` exited immediately with "data_dir: directory does not exist" (serve refuses to create a missing data_dir), and the server was backgrounded in a separate step from the readiness poll (a process backgrounded in one step is reaped when that step's shell exits). Fix: mkdir -p the data_dir, and boot + readiness-poll + run the scorer in ONE step (shared shell) with a trap that stops the server however the step ends; also fail fast if the server process dies during startup. D2 gate unaffected. Verified: mcpproxy boots and serves /api/v1/status locally with the created data_dir; actionlint clean. Related #555 datasets; MCP-742. Co-Authored-By: Paperclip <noreply@paperclip.ing> * ci(065): fix D1 readiness probe — parse data.results from index/search envelope The retrieval-d1 readiness poll never passed: mcpproxy booted and indexed all 7 servers (45 tools), but the probe parsed the index/search response at the top level while results are nested under the `{"success":true,"data":{"results":[…]}}` envelope, so it read 0 every attempt and timed out. Fix: parse `data.results`. Verified locally — index returns 5 results for q=file within ~6s of boot. actionlint clean. Related #555 datasets; MCP-742. Co-Authored-By: Paperclip <noreply@paperclip.ing> * ci(065): D1 readiness waits for full tool catalog before scoring The retrieval scorer was running against a partially-indexed instance: the readiness probe passed at the first indexed tool (>=1 search result), so scoring started before all 7 reference servers connected -> Recall@5 measured 0.387 vs baseline threshold 0.631 (false regression). Fix: poll /api/v1/tools until the catalog reaches the near-full count (~45 tools across the 7 servers) and add a short settle for the index build, then score. Verified locally end-to-end on a fully-indexed instance: Recall@1/3/5/10 = 0.418/0.560/0.681/0.791, Gate(recall_at_5) PASS (0.681 vs 0.631) — the baseline is exactly reproducible. actionlint clean. Related #555 datasets; MCP-742. Co-Authored-By: Paperclip <noreply@paperclip.ing> --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
* docs(065): Evaluation Foundation spec (D1+D2) — measure security & discovery First implementation epic of the H2-2026 roadmap. Move from asserting to measuring: D1 tool-retrieval golden set (Recall@k/MRR/nDCG over a frozen corpus, the prerequisite + GEPA fitness function) and D2 security regression corpus (per-detector precision/recall/F1 + false-positive-rate, the quiet- security metric). Both extend the existing mcp-eval harness; gated in CI. D3/D4/D5/D6 are follow-on specs that compose on these. * docs(065): plan + research + data-model + contracts + quickstart for Evaluation Foundation Phase 0/1 design for D1 (tool-retrieval golden set, Recall@k/MRR/nDCG via REST /api/v1/index/search) and D2 (security regression corpus, per-detector precision/recall/F1/FPR via a cmd/scan-eval JSON bridge). Extends the mcp-eval harness; datasets frozen + versioned; 3 JSON-schema contracts; quickstart. Constitution PASS. D3/D4/D5/D6 remain follow-on specs. * feat(065): add cmd/scan-eval D2 detector bridge (B1) (#550) Bridge the Spec 065 / D2 security corpus to mcpproxy's production sensitive-data detector and emit per-entry, per-detector verdict JSON for the Python SecurityScorer (B3). Offline, deterministic test tooling only — no runtime or REST surface (Security-by-Default, R-03). - cmd/scan-eval: reads a security-corpus.schema.json-conforming file, runs each entry.description through security.NewDetector(nil).Scan, echoes ground-truth id/label/category, emits scan-verdict.schema.json. - Flags: --corpus (required), --out (default stdout), --detectors=sensitive-data (default), --scanners (reserved opt-in extension point for the deferred Docker bundled-scanner pass). - Exit codes: 0 ok, 4 bad/missing corpus or flags, 1 write failure. - contracts/scan-verdict.schema.json: the verdict output contract B3 consumes to derive per-detector TP/FP/TN/FN -> P/R/F1/FPR. - Test-first: TP (embedded AWS key), TN, missing/empty corpus, and deterministic-output coverage; committed minimal corpus fixture. The fixture demonstrates honest measurement (INV-3): the detector is a true positive on a credential-exfil description, a false negative on pure prompt-injection text, and a visible false positive on a benign doc referencing ~/.aws/credentials — i.e. it measures real coverage rather than trivially passing. Co-authored-by: Paperclip <noreply@paperclip.ing> * feat(065): security_corpus_v1.json (D2 security regression dataset) (#551) Add the D2 labeled security corpus the detection scorer measures against, plus a co-located validator test enforcing the contract and cross-entity invariants (INV-3, INV-4 / SC-004). - 43 entries: 20 malicious (tool_poisoning/prompt_injection/shadowing/rug_pull), 15 clean benign, 8 attack-resembling hard negatives (2 per attack category). - Every entry carries label + category + provenance.{source,license} (FR-007). - Sources limited to self-authored + DVMCP (MIT); MCPTox/MCP-AttackBench and the unconfirmed-license mcp-injection-experiments are referenced externally only, never vendored (CN-005, R-07, R-A). The validator fails the build on any non-redistributable license. - corpus_test.go validates against security-corpus.schema.json (santhosh-tekuri jsonschema/v6) and asserts attack coverage + >=1 hard negative per category. Related #739 Co-authored-by: Paperclip <noreply@paperclip.ing> * feat(065): D1 retrieval datasets (renamed for clarity) + merged D1/D2 README (#554) * feat(065): D1 retrieval datasets — frozen corpus, golden set, baseline (A2) Generate and commit the Spec 065 D1 retrieval evaluation artifacts via A1's mcp-eval datasets/retrieval CLI (cb37f84): - corpus_v1.json: frozen 45-tool snapshot over 7 no-auth reference MCP servers (filesystem, git, memory, sqlite, fetch, time, sequential-thinking), via GET /api/v1/tools. Immutable (CN-002); refresh = corpus_v2 (FR-012). - corpus_v1.source.json: secret-free, reproducible mcpproxy source config. - retrieval_golden_v1.json: 47 graded queries (relevance 0|1|2), 11 cross-server hard-negatives (FR-001), R-C compliant (queries never name the tool). Passes schema + INV-1 validation. - baseline_v1.json: reference Recall@k/MRR/nDCG@10/MAP + Recall@5 tolerance 0.05, the CI regression anchor (FR-009). Retrieval metrics top-level (scorer reads them directly); empty security section reserved for D2 (CN-004). - README.md: documented, repeatable regeneration procedure (FR-012). Verified end-to-end against a live BM25 index: validate OK, gate PASS (Recall@5=0.681 >= baseline-0.05). Score reports stay local (CN-003). Related #MCP-740 Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix(065): rename D1 dataset files for clarity + merge D1/D2 README Addresses pre-merge review on #552: corpus_v1.source.json (valid config) vs corpus_v1.json (scored snapshot, NOT a config) invited 'serve --config corpus_v1.json' which fails. Renamed: - corpus_v1.source.json -> snapshot-servers.config.json (the servable config) - corpus_v1.json -> corpus_v1.tools.json (the scored snapshot) Updated baseline_v1.json source_config + corpus note refs; merged the D1 and D2 dataset READMEs into one with an explicit servable-vs-dataset cheat sheet. --------- Co-authored-by: Paperclip <noreply@paperclip.ing> * ci(065): eval.yml regression gate — D2 security (blocking) + D1 retrieval (MCP-742) (#561) * ci(065): eval.yml regression gate — D2 security (blocking) + D1 retrieval (MCP-742) Spec 065 / C1 (FR-009, US3/P2). Add `.github/workflows/eval.yml` running both Spec-065 evaluations as a regression gate over the frozen datasets: - security-d2 (blocking): provenance/license guard (FR-007/CN-005) → cmd/scan-eval ×3 → mcp-eval SecurityScorer. Thresholds --fpr-ceiling 0.10 --recall-floor 0.05 (the sensitive-data detector measures recall ≈0.10 on this corpus; scorer defaults of 0.80 would always fail). Sourced in one place pending the MCP-815 baseline `security.gate` block so gate and baseline never drift. - retrieval-d1: boots mcpproxy over snapshot-servers.config.json, waits for index readiness, runs the RetrievalScorer with baseline+tolerance. Report-only on PRs (npx/uvx fetch flake), blocking on the nightly schedule. Shared D2 logic in scripts/eval-ci-smoke.sh (CI == local). Reports upload as artifacts, never committed (CN-003, guarded). mcp-eval checked out at a pinned public ref. Verified locally: full D2 gate PASS (P=0.667 R=0.100 FPR=0.043), actionlint clean. Related #555 datasets; implements MCP-742 (Gate-2 plan rev 2 accepted). Co-Authored-By: Paperclip <noreply@paperclip.ing> * ci(065): fix D1 retrieval job — create data_dir + single-step server lifecycle The retrieval-d1 job failed: `mcpproxy serve` exited immediately with "data_dir: directory does not exist" (serve refuses to create a missing data_dir), and the server was backgrounded in a separate step from the readiness poll (a process backgrounded in one step is reaped when that step's shell exits). Fix: mkdir -p the data_dir, and boot + readiness-poll + run the scorer in ONE step (shared shell) with a trap that stops the server however the step ends; also fail fast if the server process dies during startup. D2 gate unaffected. Verified: mcpproxy boots and serves /api/v1/status locally with the created data_dir; actionlint clean. Related #555 datasets; MCP-742. Co-Authored-By: Paperclip <noreply@paperclip.ing> * ci(065): fix D1 readiness probe — parse data.results from index/search envelope The retrieval-d1 readiness poll never passed: mcpproxy booted and indexed all 7 servers (45 tools), but the probe parsed the index/search response at the top level while results are nested under the `{"success":true,"data":{"results":[…]}}` envelope, so it read 0 every attempt and timed out. Fix: parse `data.results`. Verified locally — index returns 5 results for q=file within ~6s of boot. actionlint clean. Related #555 datasets; MCP-742. Co-Authored-By: Paperclip <noreply@paperclip.ing> * ci(065): D1 readiness waits for full tool catalog before scoring The retrieval scorer was running against a partially-indexed instance: the readiness probe passed at the first indexed tool (>=1 search result), so scoring started before all 7 reference servers connected -> Recall@5 measured 0.387 vs baseline threshold 0.631 (false regression). Fix: poll /api/v1/tools until the catalog reaches the near-full count (~45 tools across the 7 servers) and add a short settle for the index build, then score. Verified locally end-to-end on a fully-indexed instance: Recall@1/3/5/10 = 0.418/0.560/0.681/0.791, Gate(recall_at_5) PASS (0.681 vs 0.631) — the baseline is exactly reproducible. actionlint clean. Related #555 datasets; MCP-742. Co-Authored-By: Paperclip <noreply@paperclip.ing> --------- Co-authored-by: Paperclip <noreply@paperclip.ing> * ci(065): trigger eval gate on D1 retrieval system-under-test paths (#563) The eval.yml pull_request.paths filter only matched the D2 security surface (internal/security, cmd/scan-eval) and the harness/datasets, so changes to the D1 retrieval system it gates — BM25 index, MCP tool- discovery routing, the REST search envelope, server/CLI boot — never triggered the workflow. Spec 065 requires CI to catch discovery regressions when search/index/tool-discovery behavior changes. Add internal/index/**, internal/server/**, internal/httpapi/**, and cmd/mcpproxy/** to the trigger paths. Job logic is unchanged, so D2 security and D1 retrieval stay green. Related MCP-833 --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
Spec 070 — Registry: make discovery actual & easy to add to upstream
Two phases of spec 070, opened for review under the Glass Cockpit gates (agents open PRs, the board merges — do not self-merge).
Phase 2 — Keystone core op (foundational, already on branch)
server.AddServerFromRegistry— the single backend op every surface funnels through: resolve a registry reference (registryId+serverId+overrides) → re-derive a validatedconfig.ServerConfigserver-side → persist quarantined. The client never sends a config blob, so a compromised/buggy client cannot smuggle command/args or disable quarantine (CN-001 / CN-002 / decision D1). Stable cross-surface error codes:registry_not_found,server_not_found,no_install_info,missing_required_input,duplicate_name.Phase 3 — CLI MVP (US2) 🎯 — this change
Closes the genuine net-new gap: discover→add on the CLI.
cmd/mcpproxy/registry_cmd.go—registry list|search|addCobra group.list/searchare daemon-first with an in-process fallback (work with or without a running daemon);addrequires the daemon (keystone op is server-side). Legacysearch-serversretained unchanged as a back-compat alias.internal/cliclient—ListRegistries,SearchRegistry,AddFromRegistry+ client-sideRegistryAddError(carries the stable code + missing-input names).internal/httpapi—POST /api/v1/registries/{id}/servers/{serverId}/add→ newAddServerFromRegistryRefcontroller method; code→HTTP status mapping (404 / 400). OpenAPI regenerated.internal/server—AddServerFromRegistryRefadapter projecting failures ontocontracts.RegistryAddError.internal/contracts—AddFromRegistryRequest/AddFromRegistryData/AddedServerSummary/RegistryAddError.Tests / verification
list → search → add → assert quarantined in upstream list, plus themissing_required_inputactionable-error path (names the--envkey).go buildpersonal + server editions;golangci-lint0 issues;go test ./internal/server -shortgreen.Decisions worth a reviewer's eye
protocolinheritscustom/pulsefrom the default registry list via viper's index-based slice merge, so its servers get parsed with the wrong parser. Test configs set an explicit protocol to avoid it. T018 (merge defaults∪config by ID) should fix the root cause.Related #746