feat(cloud): scaffold the read-only cloud-context MCP package and safety harness#48

Merged
sourcehawk merged 9 commits into feature/cloud-context-mcp from feature/cloud-context-mcp--scaffold
May 30, 2026

Description

Towards #45

Lays the foundation for the read-only cloud-context MCP (pkg/mcp/cloud/): the provider-agnostic interface, the bypass-resistant command harness, the shared identity probe, the four tools, and the serve.go wiring. This is the scaffold every downstream PR builds on: the GCP provider (#43), the AWS provider (#46), and the launcher integration all consume the contracts that land here. No real cloud provider ships in this PR; a fakeProvider drives the package tests, and --provider=gcp|aws reports "not built yet" until the provider PRs (#43, #46) wire the implementations.

The security model is the heart of the feature and is implemented by construction here: run_cli never touches a shell, argv is a typed token array (no in-house splitter to fool), a positive allowlist gates the subcommand path, and a hardcoded deny floor the config can never re-enable covers credential-reading subcommands, identity/endpoint flags, and local-file/SSRF argument prefixes.

Changes

  • cloud.Provider interface plus projection structs (Inventory, IdentityStatus, CLIResult) and a fakeProvider test double: the contract that #43 (Implement the GCP provider for the cloud-context MCP) and #46 (Implement the AWS provider for the cloud-context MCP) implement.
  • Command allowlist loader (LoadCommandAllowlist) mirroring the k8s LoadAllowlist pattern: embedded default or profile override, always filtered through a hardcoded denyFloor plus provider additions. A too-broad override can never re-enable a floored command.
  • Argv validation: exact-match the positional subcommand path (so a surplus metacharacter token can't ride through on an allowed prefix), reject deny-floored flags and arg-prefixes, validate --project/region against the scope allowlist.
  • No-shell exec core (execCLI): direct execve, explicit minimal env, closed stdin, output truncation. Shell metacharacters handed in as argv tokens are inert.
  • Shared identity probe (cloud.Probe) returning IdentityStatus — the single struct session_status, the connections panel, and preflight all render from. Degrades (Valid:false + Hint), never errors, on a stale credential.
  • Four tools (list_inventory, session_status, run_cli, list_allowed_commands) with ToolSpecs() and a wire test that fails if registration drifts from the catalog.
  • triagent-mcp serve --kind=cloud --provider=<gcp|aws> wiring, the TRIAGENT_CLOUD_PROVIDER / TRIAGENT_CLOUD_ALLOWLIST_PATH / TRIAGENT_CLOUD_SCOPE env-name constants, and cloud.ToolSpecs() folded into the launcher tool catalog.
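The no-shell exec core described in the list above can be sketched roughly as follows. This is a minimal sketch, not the PR's execCLI: `runNoShell`, `capWriter`, and the fixed timeout are hypothetical names and choices, and the example assumes an `echo` binary is on PATH. It relies on documented os/exec behaviour: a nil `Cmd.Env` inherits the parent environment (so the sketch always sets a non-nil slice), and a nil `Cmd.Stdin` connects the child to the null device.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runTimeout is an assumed upper bound on a single CLI invocation.
const runTimeout = 10 * time.Second

// capWriter keeps at most limit bytes and records whether anything was dropped.
type capWriter struct {
	buf       bytes.Buffer
	limit     int
	truncated bool
}

func (w *capWriter) Write(p []byte) (int, error) {
	room := w.limit - w.buf.Len()
	if room < len(p) {
		w.truncated = true
		if room > 0 {
			w.buf.Write(p[:room])
		}
		// Report the full length so the child is not broken by a short write.
		return len(p), nil
	}
	return w.buf.Write(p)
}

// runNoShell executes bin directly with argv as discrete tokens. No shell is
// involved, so tokens such as ";" or "$(id)" are passed through as plain text.
func runNoShell(bin string, argv, env []string, limit int) (string, bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), runTimeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, bin, argv...)
	if env == nil {
		env = []string{} // non-nil: do not inherit the parent environment
	}
	cmd.Env = env
	cmd.Stdin = nil // nil Stdin reads from the null device

	out := &capWriter{limit: limit}
	cmd.Stdout = out
	cmd.Stderr = out
	err := cmd.Run()
	return out.buf.String(), out.truncated, err
}

func main() {
	out, truncated, err := runNoShell("echo", []string{"list", ";", "rm", "$(id)"}, nil, 1024)
	fmt.Printf("%q truncated=%v err=%v\n", out, truncated, err)
}
```

In this sketch the metacharacter tokens reach `echo` as ordinary arguments, which is the behaviour the security tests are described as asserting.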

Challenges

The metacharacter-rejection guarantee and the allowlist match are the same mechanism: making Allows match the positional subcommand path exactly (rather than as a prefix) is what makes ["compute","instances","list",";","rm"] fail — the surplus ; and rm tokens make the positional path no longer equal any allowlisted entry. That keeps the no-shell guarantee structural rather than relying on string sanitization. The security tests assert both halves: a source-level scan that no "-c" / sh -c construction exists, and a behavioural check that an argv full of metacharacters prints them verbatim and spawns no second process.
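The exact-match reasoning can be illustrated with a small sketch. It assumes, for simplicity, that flags are written as single `--name=value` tokens so that every token not starting with `-` is positional; the `positionalPath` and `allows` helpers are hypothetical stand-ins for the PR's Allows, whose real signature is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// positionalPath returns the tokens that are not flags.
// Assumption: flags are single tokens of the form --name=value.
func positionalPath(argv []string) []string {
	var path []string
	for _, tok := range argv {
		if strings.HasPrefix(tok, "-") {
			continue
		}
		path = append(path, tok)
	}
	return path
}

// equalTokens reports whether two token slices are identical.
func equalTokens(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// allows permits argv only when its positional path equals an allowlisted
// entry exactly. A prefix match is deliberately not enough.
func allows(allowlist [][]string, argv []string) bool {
	path := positionalPath(argv)
	for _, entry := range allowlist {
		if equalTokens(path, entry) {
			return true
		}
	}
	return false
}

func main() {
	allowlist := [][]string{{"compute", "instances", "list"}}

	fmt.Println(allows(allowlist, []string{"compute", "instances", "list", "--project=demo"}))
	fmt.Println(allows(allowlist, []string{"compute", "instances", "list", ";", "rm"}))
}
```

The second call is rejected because the surplus `;` and `rm` tokens lengthen the positional path, so it no longer equals any entry; no metacharacter-specific check is needed.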

Testing

make test-go is race-clean and green across all 45 packages; make lint reports 0 issues. Every task was built TDD-first (failing test watched fail for the right reason, then implemented). The load-bearing security tests in harness_security_test.go cover the no-shell guarantee (source scan + behavioural metacharacter-inert + minimal-env + truncation); validate_test.go tables every deny-floor flag, arg-prefix, scope pivot, and metacharacter token; tools_wire_test.go guards registration-vs-catalog drift. Reviewers may want to poke at the deny-floor coverage in allowlist.go/validate.go and confirm the Allows exact-match reasoning above holds for their threat model.

🤖 Generated with Claude Code

sourcehawk and others added 9 commits May 30, 2026 04:53
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#45)

Exact-match the positional subcommand path so a surplus token (a shell
metacharacter, an extra argument) cannot ride through on an allowed prefix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mmands (#45)

Wire the four tools onto the server, load the command allowlist through the
deny floor at construction, and bind run_cli + the providers to a single
validated no-shell run core. list_allowed_commands reads the same allowlist
run_cli enforces, so advertised equals permitted. Add the TRIAGENT_CLOUD_*
env-name constants the launcher injects through the subprocess env.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parse --provider, decode the frozen scope and allowlist override from the
subprocess env, and construct the server behind cloud.Provider. The gcp/aws
implementations land in their own PRs; until then a known provider reports it
is not yet built and an unknown one is named in the error. Also fold
cloud.ToolSpecs() into the launcher tool catalog so the four tools surface in
the MCP catalog view alongside every other server.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iting the parent (#45)

s.run passed nil to execCLI, which makes Go's exec set cmd.Env = nil and
inherit the entire parent process environment — violating the spec's
minimal-env guarantee and harness.go's own no-leak doc comment. The
existing TestExecCLIMinimalEnv passed only because it called execCLI
directly with an explicit env; the real caller bypassed that.

Add Provider.EnvPassthrough() so each provider declares the env var names
its CLI needs forwarded, and build the subprocess env from os.Environ()
filtered to the base set plus those names via Server.subprocessEnv. A new
test exercises the server-built env: a parent-env canary is dropped while
a declared passthrough var survives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert the scaffold's tests from bare t.Fatal/t.Fatalf/t.Errorf to
testify, the repo standard: require for preconditions a failure must stop
at (a non-nil error before a dereference, setup that must succeed), assert
for independent checks that should keep running. Assertion intent is
preserved exactly; no security assertion is weakened, and the
harness_security_test source-scan logic (reading harness.go bytes) stays
intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sourcehawk sourcehawk force-pushed the feature/cloud-context-mcp--scaffold branch from 7d2b921 to 43390a1 Compare May 30, 2026 02:58
@sourcehawk sourcehawk merged commit 2a0d6d4 into feature/cloud-context-mcp May 30, 2026
4 checks passed
@sourcehawk sourcehawk deleted the feature/cloud-context-mcp--scaffold branch May 30, 2026 03:02
sourcehawk added a commit that referenced this pull request May 30, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>