Move CheckCreateServer policy check before image download #4734
The policy gate check was buried deep inside Runner.Run(), after image download and auth/middleware initialization. Users had to wait for the full image pull before learning their server was rejected by policy.

Split GetMCPServer into ResolveMCPServer (fast registry lookup) and PullMCPServerImage (slow download), then check the policy gate between the two. All three entry points (CLI, API, MCP handler) now share EnforcePolicyAndPullImage, which enforces the gate, skips the pull for K8s and remote servers, and delegates the actual pull to an injectable ImagePuller. The existing check in Runner.Run stays as defense-in-depth.

Additional hardening from review:
- Remove fragile hoisted nil imageCtx; EnforcePolicyAndPullImage now accepts a pullTimeout and creates its own child context
- Remove dead GetMCPServer wrapper (no production callers)
- Defer ImageManager creation to protocol-scheme branch only, avoiding an expensive Docker daemon ping for registry lookups
- Add nil-runConfig test case for EnforcePolicyAndPullImage
- Document ActivePolicyGate export intent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report

❌ Patch coverage is

```diff
@@            Coverage Diff             @@
##             main    #4734      +/-   ##
==========================================
- Coverage   68.82%   68.74%   -0.09%
==========================================
  Files         516      516
  Lines       54153    54180      +27
==========================================
- Hits        37271    37245      -26
- Misses      14019    14092      +73
+ Partials     2863     2843      -20
```
For protocol-scheme images (npx://, uvx://, go://), ResolveMCPServer already creates an ImageManager to build the image. The subsequent call to PullMCPServerImage was creating a second ImageManager (and Docker daemon connection) only to confirm the image already exists locally.

Add a locallyBuilt parameter to EnforcePolicyAndPullImage so callers can signal when the image was built from a protocol scheme, skipping the unnecessary pull.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
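On the caller side, the locallyBuilt flag could be derived from the server reference itself. The sketch below assumes the three scheme prefixes named in this commit (npx://, uvx://, go://) are the only ones that trigger a local build; `isLocallyBuilt` and `shouldPull` are hypothetical helpers for illustration, not functions from the PR. `shouldPull` mirrors the skip condition used in `EnforcePolicyAndPullImage`: no pull for a locally built image or on a Kubernetes runtime.

```go
package main

import (
	"fmt"
	"strings"
)

// protocolSchemes lists the prefixes named in the commit message.
// Assumption: the real retriever may detect schemes differently.
var protocolSchemes = []string{"npx://", "uvx://", "go://"}

// isLocallyBuilt (hypothetical) reports whether a server reference uses a
// protocol scheme, meaning the image was already built during resolution.
func isLocallyBuilt(ref string) bool {
	for _, scheme := range protocolSchemes {
		if strings.HasPrefix(ref, scheme) {
			return true
		}
	}
	return false
}

// shouldPull (hypothetical) is the negation of the skip condition:
// pull only registry images on a non-Kubernetes runtime.
func shouldPull(locallyBuilt, kubernetes bool) bool {
	return !locallyBuilt && !kubernetes
}

func main() {
	for _, ref := range []string{"npx://some-package", "ghcr.io/example/server:latest"} {
		fmt.Println(ref, shouldPull(isLocallyBuilt(ref), false))
	}
}
```

Passing the flag explicitly, rather than having the pull step re-inspect the image, is what avoids creating a second ImageManager just to confirm the image exists.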
@claude check if your comment was addressed

Claude finished @aponcedeleonch's task in 46s.

Review comment check
Yes, the comment was addressed. My original concern was that protocol-scheme images (npx://, uvx://, go://) would incur a second Docker daemon ping: once in ResolveMCPServer, which builds the image, and again in PullMCPServerImage. The fix introduces a locallyBuilt parameter on EnforcePolicyAndPullImage:

```go
// Skip pull when the image was already built locally (protocol-scheme)
// or when running on Kubernetes (the kubelet pulls its own image).
if locallyBuilt || containerRuntime.IsKubernetesRuntime() {
	return nil
}
```

All three call sites correctly pass

For protocol-scheme URIs,
Summary
Policy gate checks ran too late: deep inside `Runner.Run()`, after the image had already been downloaded and auth/middleware had been initialized. Users had to wait for the full image pull (potentially minutes for large images) before learning their server was rejected by policy.
This PR restructures the flow so the policy check runs before the
image pull across all three entry points (CLI, API, MCP handler):
- Split `GetMCPServer` into `ResolveMCPServer` (fast registry lookup) and `PullMCPServerImage` (slow download)
- Add `EnforcePolicyAndPullImage` as the shared enforcement point that checks the gate, skips the pull for K8s/remote servers, and delegates the pull to an injectable `ImagePuller`
- Keep the existing check in `Runner.Run` as defense-in-depth

Additional hardening:

- Remove the fragile hoisted nil `imageCtx`; `EnforcePolicyAndPullImage` now accepts a `pullTimeout` and creates its own child context
- Remove the dead `GetMCPServer` wrapper (no production callers remained)
- Defer `ImageManager` creation to the protocol-scheme branch only, avoiding an expensive Docker daemon ping for registry lookups
- Add a nil-`runConfig` test case for `EnforcePolicyAndPullImage`
- Document the intent behind exporting `ActivePolicyGate`

Type of change
Test plan
- `task test`
- `task lint-fix`

Changes
| File | Change |
|---|---|
| `pkg/runner/retriever/retriever.go` | `GetMCPServer` → `ResolveMCPServer` + `PullMCPServerImage`; add `EnforcePolicyAndPullImage`; defer `ImageManager` creation to protocol-scheme branch |
| `pkg/runner/retriever/retriever_test.go` | Call `ResolveMCPServer` directly; add nil-`runConfig` and `pullTimeout` test cases |
| `cmd/thv/app/run_flags.go` | Call `ResolveMCPServer`, then `EnforcePolicyAndPullImage` before returning config |
| `pkg/api/v1/workload_service.go` | Remove hoisted `imageCtx`; call `EnforcePolicyAndPullImage` with parent `ctx` + timeout |
| `pkg/api/v1/workloads_test.go` | Add `imagePuller` to test service structs |
| `pkg/mcp/server/run_server.go` | Call `EnforcePolicyAndPullImage` between resolve and run |
| `pkg/runner/policy_gate.go` | Export `ActivePolicyGate` with intent documentation |
| `pkg/runner/policy_gate_test.go` | Tests for `ActivePolicyGate` |
| `pkg/runner/runner.go` | Use `ActivePolicyGate` (defense-in-depth check stays) |

Does this introduce a user-facing change?
Servers blocked by policy now fail immediately instead of after a potentially
long image download. No API or CLI interface changes.
Special notes for reviewers
- `GetMCPServer` had no remaining production callers after the refactor, so it was removed entirely. Tests now call `ResolveMCPServer` directly.
- The `EnforcePolicyAndPullImage` timeout parameter (`pullTimeout`) is only used by the API path (10-minute budget); the CLI and MCP handler pass `0` (no timeout).
Generated with Claude Code