Skip to content

fix(cloud): address review findings in the harness, allowlist, and providers#58

Merged
sourcehawk merged 8 commits into
feature/cloud-context-mcpfrom
feature/cloud-context-mcp--fix-harness
May 30, 2026
Merged

fix(cloud): address review findings in the harness, allowlist, and providers#58
sourcehawk merged 8 commits into
feature/cloud-context-mcpfrom
feature/cloud-context-mcp--fix-harness

Conversation

@sourcehawk
Copy link
Copy Markdown
Owner

Description

Towards #44

Addresses verified review findings in the read-only cloud-context MCP's parent package, GCP/AWS providers, and serve wiring. The headline correctness issue: most describe/get commands were advertised but unusable because the allowlist exact-matched the verb chain and rejected the resource operand, and a non-zero CLI exit surfaced an empty result with no error context. This PR makes the harness report stderr, the allowlist accept resource operands while rejecting shell-control tokens, the identity probes validate the actually-pinned identity, and the scope parser fail closed.

Changes

  • Capture stderr in CLIResult so a non-zero run_cli carries the provider's error context instead of an empty stdout. Stderr is capped at the same byte limit as stdout (json:"stderr,omitempty").
  • Allowlist matches the verb chain as a token-wise prefix, so compute instances describe my-vm passes (the operand is an inert argument — there is no shell). Defense in depth: validateArgv now rejects any argv token that is or contains a shell-control sequence (; | & backtick $( > < newline).
  • GCP identity validates impersonation instead of comparing the operator's base account to the target SA. It confirms the in-process impersonation env is pinned to the expected SA, then runs a minimal impersonated read (gcloud auth print-access-token) to prove the pin took effect.
  • Exit-code checks before JSON-parse in GCP inventory and identity, so a failed gcloud projects list returns the exit code + stderr, not a misleading parse error.
  • AWS inventory falls back to the single caller account only on an Organizations-unavailable condition (AccessDenied / not-a-member / AWSOrganizationsNotInUseException); any other failure (throttling, transport) is surfaced, not masked.
  • AWS assumed-role ARN parser handles all partitions (aws, aws-us-gov, aws-cn) and roles carrying IAM paths (assumed-role/path/to/Role/session).
  • parseCloudScope fails closed: a malformed TRIAGENT_CLOUD_SCOPE aborts cloud-server startup instead of silently returning an unconstrained scope.
  • Doc fix: ScopeAllowlist no longer claims account-axis argv enforcement it doesn't perform.

Challenges

The GCP impersonation probe is the one change that can't be fully proven in unit tests: gcloud auth list reports the base account, not the impersonated SA, so the prior active == expected check marked correct impersonation invalid. The new probe reads CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT, confirms it equals the expected target, then mints a token to prove the grant is live. The exact print-access-token shape is flagged in a code comment as needing verification against a live gcloud.

Testing

make test-go (race, -count=1) green across the whole module; make lint reports 0 issues. Each finding landed test-first: stderr capture and truncation, prefix + metacharacter rejection (including embedded $(), GCP impersonation (no-target / mismatched-pin / impersonated-read success / failure), exit-code-before-parse for both providers, AWS Organizations-unavailable vs. throttling vs. transport fallback behavior, ARN parsing across partitions and IAM paths, and the malformed-scope fail-closed path in runCloud. frontend/ untouched.

🤖 Generated with Claude Code

sourcehawk and others added 8 commits May 30, 2026 18:13
execCLI used cmd.Output(), discarding the child's stderr where gcloud
and aws write their error context. A non-zero run_cli returned an empty
stdout and no explanation. Capture stderr into a capped buffer, surface
it as CLIResult.Stderr (json "stderr,omitempty"), and truncate it at the
same byte limit as stdout. The no-shell, minimal-env, closed-stdin
guarantees and the "non-zero exit is a normal result" contract are
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…character tokens

Allows exact-matched the positional subcommand path, so an allowlisted
"compute instances describe" rejected "compute instances describe my-vm"
— the resource operand made the path unequal, leaving most describe/get
commands advertised but unusable. Match the allowlisted path as a
token-wise prefix of argv's leading positionals so trailing resource
operands ride through. There is no shell, so a trailing token is an inert
argument.

As defense in depth, validateArgv now rejects any argv token that is or
contains a shell-control sequence (`;`, `|`, `&`, backtick, `$(`, `>`,
`<`, newline). A metacharacter token like ["...","list",";","rm"] is
refused by this check rather than by the allowlist; a literal resource
name or a key=value filter passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…account

`gcloud auth list` reports the operator's base active account, not the SA
selected by CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT, so the old
active==expected check marked correctly-configured impersonation invalid.
Validity now means "impersonation is pinned to the expected SA and the pin
works": read the in-process impersonation env, confirm it equals the
expected target, then run a minimal impersonated read
(gcloud auth print-access-token) to prove the grant is active. AssumedIdentity
is the SA on success; failures degrade through Valid/Hint with the captured
stderr.

NOTE: needs verification against a live gcloud before relying on the exact
print-access-token shape (flagged in a code comment).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
execCLI returns a non-zero exit as CLIResult{ExitCode:n} with err==nil, so
a failed `gcloud projects list` was JSON-parsed and surfaced as a
misleading parse error. Check ExitCode before unmarshalling and return the
exit code plus captured stderr, mirroring the AWS provider. (The gcp
identity probe added in the prior commit already checks ExitCode.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…vailable

Inventory fell back to the single-account projection on ANY non-zero exit
or transport error, masking throttling, network faults, and other real
failures. Now that stderr is captured, fall back only when the stderr
names an Organizations-unavailable condition (AccessDenied, "not a member
of an organization", or AWSOrganizationsNotInUseException). Any other
non-zero exit returns the exit code plus stderr; a transport error is
surfaced rather than silently degrading.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
assumedRoleARN hardcoded arn:aws:sts:: and Cut at the first slash, so
GovCloud (arn:aws-us-gov:) / China (arn:aws-cn:) ARNs and roles carrying
an IAM path (assumed-role/path/to/Role/session) misparsed. Accept any
partition, and split so the role path-and-name is everything between
assumed-role/ and the final /<session>, rebuilding the IAM role ARN under
the same partition.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The doc comment said the agent "cannot pivot to an un-allowlisted
project, account, or region", but allowedFor only maps --project and
--region/--zone — Accounts is not argv-enforced. State it accurately:
project and region/zone are enforced against argv; account reach is
constrained by the pinned identity/role and the deny-floored --account /
--profile flags, not validated here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parseCloudScope swallowed a malformed TRIAGENT_CLOUD_SCOPE and returned an
empty (unconstrained) ScopeAllowlist, failing OPEN and silently widening
run_cli. It now returns an error, and runCloud parses the scope before
resolving the provider and aborts startup on a malformed value, so a
misconfigured scope can never silently drop the deployment's restrictions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sourcehawk sourcehawk merged commit 435219e into feature/cloud-context-mcp May 30, 2026
4 checks passed
@sourcehawk sourcehawk deleted the feature/cloud-context-mcp--fix-harness branch May 30, 2026 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant