Skip to content

Move CheckCreateServer policy check before image download#4734

Merged
reyortiz3 merged 2 commits intomainfrom
early-policy-check
Apr 10, 2026
Merged

Move CheckCreateServer policy check before image download#4734
reyortiz3 merged 2 commits intomainfrom
early-policy-check

Conversation

@aponcedeleonch
Copy link
Copy Markdown
Member

Summary

Policy gate checks ran too late — deep inside Runner.Run(), after the
image had already been downloaded and auth/middleware had been initialized.
Users had to wait for the full image pull (potentially minutes for large
images) before learning their server was rejected by policy.

This PR restructures the flow so the policy check runs before the
image pull across all three entry points (CLI, API, MCP handler):

  • Split GetMCPServer into ResolveMCPServer (fast registry lookup) and
    PullMCPServerImage (slow download)
  • Introduce EnforcePolicyAndPullImage as the shared enforcement point
    that checks the gate, skips pull for K8s/remote servers, and delegates
    the pull to an injectable ImagePuller
  • The existing check inside Runner.Run stays as defense-in-depth

Additional hardening:

  • Remove fragile hoisted nil imageCtx; EnforcePolicyAndPullImage now
    accepts a pullTimeout and creates its own child context
  • Remove dead GetMCPServer wrapper (no production callers remained)
  • Defer ImageManager creation to the protocol-scheme branch only,
    avoiding an expensive Docker daemon ping for registry lookups
  • Add nil-runConfig test case for EnforcePolicyAndPullImage
  • Document ActivePolicyGate export intent

Type of change

  • Refactoring (no behavior change)

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Changes

File Change
pkg/runner/retriever/retriever.go Split GetMCPServerResolveMCPServer + PullMCPServerImage; add EnforcePolicyAndPullImage; defer ImageManager creation to protocol-scheme branch
pkg/runner/retriever/retriever_test.go Rename tests to match ResolveMCPServer; add nil-runConfig and pullTimeout test cases
cmd/thv/app/run_flags.go Use ResolveMCPServer then EnforcePolicyAndPullImage before returning config
pkg/api/v1/workload_service.go Remove hoisted nil imageCtx; call EnforcePolicyAndPullImage with parent ctx + timeout
pkg/api/v1/workloads_test.go Supply no-op imagePuller to test service structs
pkg/mcp/server/run_server.go Insert EnforcePolicyAndPullImage between resolve and run
pkg/runner/policy_gate.go Export ActivePolicyGate with intent documentation
pkg/runner/policy_gate_test.go Update to use exported ActivePolicyGate
pkg/runner/runner.go Update to use exported ActivePolicyGate (defense-in-depth check stays)

Does this introduce a user-facing change?

Servers blocked by policy now fail immediately instead of after a potentially
long image download. No API or CLI interface changes.

Special notes for reviewers

  • GetMCPServer had no remaining production callers after the refactor, so
    it was removed entirely. Tests now call ResolveMCPServer directly.
  • The EnforcePolicyAndPullImage timeout parameter (pullTimeout) is only
    used by the API path (10-minute budget); CLI and MCP handler pass 0
    (no timeout).

Generated with Claude Code

The policy gate check was buried deep inside Runner.Run(), after image
download and auth/middleware initialization. Users had to wait for the
full image pull before learning their server was rejected by policy.

Split GetMCPServer into ResolveMCPServer (fast registry lookup) and
PullMCPServerImage (slow download), then check the policy gate between
the two. All three entry points (CLI, API, MCP handler) now share
EnforcePolicyAndPullImage which enforces the gate, skips pull for K8s
and remote servers, and delegates the actual pull to an injectable
ImagePuller. The existing check in Runner.Run stays as defense-in-depth.

Additional hardening from review:
- Remove fragile hoisted nil imageCtx; EnforcePolicyAndPullImage now
  accepts a pullTimeout and creates its own child context
- Remove dead GetMCPServer wrapper (no production callers)
- Defer ImageManager creation to protocol-scheme branch only, avoiding
  an expensive Docker daemon ping for registry lookups
- Add nil-runConfig test case for EnforcePolicyAndPullImage
- Document ActivePolicyGate export intent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/M Medium PR: 300-599 lines changed label Apr 10, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 48.71795% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.74%. Comparing base (ecd7872) to head (45eea10).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
pkg/runner/retriever/retriever.go 54.54% 10 Missing ⚠️
pkg/mcp/server/run_server.go 0.00% 7 Missing ⚠️
pkg/api/v1/workload_service.go 75.00% 1 Missing and 1 partial ⚠️
pkg/runner/runner.go 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4734      +/-   ##
==========================================
- Coverage   68.82%   68.74%   -0.09%     
==========================================
  Files         516      516              
  Lines       54153    54180      +27     
==========================================
- Hits        37271    37245      -26     
- Misses      14019    14092      +73     
+ Partials     2863     2843      -20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

For protocol-scheme images (npx://, uvx://, go://), ResolveMCPServer
already creates an ImageManager to build the image. The subsequent call
to PullMCPServerImage was creating a second ImageManager (and Docker
daemon connection) only to confirm the image already exists locally.

Add a locallyBuilt parameter to EnforcePolicyAndPullImage so callers
can signal when the image was built from a protocol scheme, skipping
the unnecessary pull.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aponcedeleonch
Copy link
Copy Markdown
Member Author

@claude check if your comment was addressed

@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Apr 10, 2026
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 10, 2026

Claude finished @aponcedeleonch's task in 46s —— View job


Review comment check

  • Gather context — read retriever.go and all three call sites
  • Verify the fix addresses the double Docker ping issue

Yes, the comment was addressed.

My original concern was that protocol-scheme images (npx://, uvx://, go://) would incur a second Docker daemon ping: once in ResolveMCPServer to build the image, and again in PullMCPServerImage to confirm it exists locally.

The fix introduces a locallyBuilt bool parameter to EnforcePolicyAndPullImage (line 175 of retriever.go). When true, the function skips the puller() call entirely:

// Skip pull when the image was already built locally (protocol-scheme)
// or when running on Kubernetes (the kubelet pulls its own image).
if locallyBuilt || containerRuntime.IsKubernetesRuntime() {
    return nil
}

All three call sites correctly pass runner.IsImageProtocolScheme(...) as the locallyBuilt argument:

Call site Argument
cmd/thv/app/run_flags.go:367 runner.IsImageProtocolScheme(serverOrImage)
pkg/api/v1/workload_service.go:352 runner.IsImageProtocolScheme(req.Image)
pkg/mcp/server/run_server.go:74 runner.IsImageProtocolScheme(args.Server)

For protocol-scheme URIs, locallyBuilt is true → the puller is skipped → no second Docker daemon connection. The fix correctly eliminates the redundant ping.

@reyortiz3 reyortiz3 merged commit 0021b0b into main Apr 10, 2026
54 of 55 checks passed
@reyortiz3 reyortiz3 deleted the early-policy-check branch April 10, 2026 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Medium PR: 300-599 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants