feat(cloud): read-only cloud-context MCP for GCP and AWS #53
Open
sourcehawk wants to merge 35 commits into
One pkg/mcp/cloud/ package bound by --provider, thin typed tools (list_inventory, session_status) plus a gated read-only run_cli escape hatch over a profile-overridable command allowlist with a hardcoded deny floor. Pinned read-only identity via operator-ambient impersonation (env-injected), Workload Identity, or a deferred static-key connection; visible degrade and a shared whoami probe across the connections panel and preflight. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tate Four-PR breakdown on feature/cloud-context-mcp: scaffold+harness (#45) produces the Provider interface, identity probe, and env contracts; GCP (#43), AWS (#46), and launcher integration (#47) consume them in parallel. Includes the contracts table, conventions, and the resumable state file.
…ify bubble-ups
…ety harness (#48)

* feat(cloud): provider interface and server skeleton (#45)
* feat(cloud): command allowlist with hardcoded deny floor (#45)
* feat(cloud): argv validation against allowlist, deny floor, and scope (#45)
  Exact-match the positional subcommand path so a surplus token (a shell metacharacter, an extra argument) cannot ride through on an allowed prefix.
* feat(cloud): no-shell argv exec core with output truncation (#45)
* feat(cloud): shared identity probe (#45)
* feat(cloud): list_inventory, session_status, run_cli, list_allowed_commands (#45)
  Wire the four tools onto the server, load the command allowlist through the deny floor at construction, and bind run_cli and the providers to a single validated no-shell run core. list_allowed_commands reads the same allowlist run_cli enforces, so advertised equals permitted. Add the TRIAGENT_CLOUD_* env-name constants the launcher injects through the subprocess env.
* feat(cloud): register --kind=cloud --provider in serve.go (#45)
  Parse --provider, decode the frozen scope and allowlist override from the subprocess env, and construct the server behind cloud.Provider. The gcp/aws implementations land in their own PRs; until then a known provider reports it is not yet built and an unknown one is named in the error. Also fold cloud.ToolSpecs() into the launcher tool catalog so the four tools surface in the MCP catalog view alongside every other server.
* fix(cloud): build an explicit minimal subprocess env instead of inheriting the parent (#45)
  s.run passed nil to execCLI, which leaves cmd.Env nil so Go's exec inherits the entire parent process environment. That violated the spec's minimal-env guarantee and harness.go's own no-leak doc comment. The existing TestExecCLIMinimalEnv passed only because it called execCLI directly with an explicit env; the real caller bypassed that. Add Provider.EnvPassthrough() so each provider declares the env var names its CLI needs forwarded, and build the subprocess env from os.Environ() filtered to the base set plus those names via Server.subprocessEnv. A new test exercises the server-built env: a parent-env canary is dropped while a declared passthrough var survives.
* test(cloud): assert with testify across the cloud package (#45)
  Convert the scaffold's tests from bare t.Fatal/t.Fatalf/t.Errorf to testify, the repo standard: require for preconditions a failure must stop at (a non-nil error before a dereference, setup that must succeed), assert for independent checks that should keep running. Assertion intent is preserved exactly; no security assertion is weakened, and the harness_security_test source-scan logic (reading harness.go bytes) stays intact.
…nflict
…ovider-factory contract

#47's preflight/connections probe constructs cloud.Provider values to call cloud.Probe, so it imports the provider packages; it cannot compile until both #43 and #46 land. Correct the plan's parallel claim and add a shared provider factory (pkg/mcp/cloud/providers) as #47's first task.
* feat(cloud/gcp): provider skeleton, default allowlist, deny-floor additions (#43)
* feat(cloud/gcp): identity probe over impersonation (#43)
* feat(cloud/gcp): inventory projection (#43)
* feat(cloud): wire gcp provider into serve.go (#43)
…-env, and #46 binary findings
* feat(cloud/aws): provider skeleton, default allowlist, deny-floor additions (#46)
* feat(cloud/aws): identity probe over assumed role (#46)
* feat(cloud/aws): inventory projection with single-account fallback (#46)
* feat(cloud): wire aws provider into serve.go (#46)
* chore(cloud/aws): lowercase fixture error string for staticcheck (#46)
* fix(cloud/aws): error on missing aws binary instead of falling back to a relative path (#46)
  New() now resolves aws to an absolute path via exec.LookPath and errors when it is absent, matching the gcp provider. The relative "aws" fallback defeated the startup-resolution guarantee: a poisoned PATH could substitute a different binary at exec time. A missing-binary deployment is handled by the launcher (#47) marking the cloud source unavailable, not by a fallback inside the provider. Adds the newWithBinary seam (mirroring gcp) so tests inject a fixed path and stay hermetic on a CI box without the aws CLI installed.
…-env)
… the parent (#51)

Probe built its RunFunc with a nil cmd.Env, which makes the gcloud/aws whoami subprocess inherit the entire launcher environment, leaking ambient secrets into the identity probe used by session_status, preflight, and connections. This contradicted the spec's "explicit minimal cmd.Env" requirement and diverged from Server.run, which already filters the env. Extract a package-level minimalEnv helper (os.Environ filtered to the base passthrough plus the provider-declared names) so both the run_cli harness and the probe build their subprocess env through a single helper. Server.subprocessEnv now delegates to it, and Probe forwards minimalEnv(p.EnvPassthrough()) instead of nil; the whoami still gets the credential/impersonation env it needs, nothing more.
* feat(cloud): shared provider factory; serve.go delegates construction (#47)
  Introduce pkg/mcp/cloud/providers.New(name), the single construction site for a cloud.Provider. It imports the concrete gcp and aws packages (which the cloud package itself cannot, without a cycle) and mirrors how the launcher builds an auth.Provider from pkg/auth/teleport and pkg/auth/kubeconfig. serve.go's newCloudProvider is removed in favour of delegating to the factory, so preflight and connections can obtain a provider the same way the serve arm does.
* feat(profile): cloud source config block (#47)
  Add Profile.Cloud []CloudSource so a deployment can declare read-only cloud connections in the profile. Each source carries the alias, provider, pinned AssumedIdentity, optional aws Profile selector, scope allowlist, and an optional command-allowlist override path. applyBase inherits cloud sources with the same replace-on-presence rule as linked_repos. AssumedIdentity is the canonical pinned identity (SA email for gcp, role ARN for aws); Profile is the aws-only AWS_PROFILE selector, ignored by gcp. Scope reuses cloud.ScopeAllowlist so the launcher can JSON-encode it into the cloud MCP subprocess env unchanged.
* feat(preflight): wire triagent-cloud-<alias> servers with pinned-identity env (#47)
  Emit a triagent-cloud-<alias> MCP server per profile cloud source, with args ["serve","--kind=cloud","--provider=<p>"] and env carrying the provider selector, the optional allowlist-override path, the JSON-encoded scope, and the per-provider pinned-identity env. The cloud loop mirrors the per-repo git loop.
  Per-provider identity env, by mechanism: gcp impersonates the assumed identity directly via CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT (one env is both the impersonation target and the expected identity); aws selects an assume-role profile via AWS_PROFILE and checks the role ARN via TRIAGENT_CLOUD_AWS_EXPECTED_ROLE_ARN. mcpconfig references the env-name constants from the provider packages, never raw literals, so the gcp impersonation const and the aws profile/expected-role consts are exported.
* feat(preflight): cloud identity probe with visible degrade (#47)
  Run the read-only identity probe for each profile cloud source after the kubeconfig freeze, recording the outcome in Result.CloudSources. The probe degrades, never blocks: a failed probe (or a provider construction error, e.g. a missing CLI binary) marks that source unavailable with a hint, and the session still starts. The existing k8s block-on-failure behaviour is unchanged. The shared providers.ProbeSource constructs the source's provider via the factory and pins the per-provider expected-identity env around the whoami (serialized, then restored) so each probe validates against its own pinned identity. Preflight exposes a CloudProbe seam for tests; nil uses the real prober.
* feat(connections): read-only cloud identity status in /api/connections (#47)
  GET /api/connections grows a cloud array of {provider, assumed_identity, valid, hint}, built from the profile's cloud sources probed at request time. The fields mirror cloud.IdentityStatus so the panel renders directly from the probe. Read-only: no PUT/DELETE route for cloud, since a cloud connection is configured in the profile, not entered in the panel.
  The cloud array is profile-sourced (the connections wallet holds only stored tokens, which cloud has none of), so it lives on the response builder beside slack_channel_prefix rather than in the connections package. cloudProbe is an injectable seam; nil uses the real providers.ProbeSource.
* feat(web): read-only cloud identity pills in connections panel (#47)
  Render the profile-configured cloud connections as read-only pills in the manage-connections modal: the assumed identity with a checkmark when the request-time probe is valid, the reauth hint when not. Cloud is configured in the deployment profile, never entered in the panel, so the pills carry no edit affordance. The section is omitted when no cloud sources exist. ConnectionStatus grows an optional cloud[] of {provider, assumed_identity, valid, hint}, mirroring the /api/connections response.
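The `/api/connections` cloud entry described above can be modelled as a plain struct with JSON tags. Only the four field names come from the commit; the Go type and function names are hypothetical, and a later commit in this PR adds an `alias` field to the same DTO.

```go
package main

import "encoding/json"

// cloudConnection mirrors the cloud.IdentityStatus fields surfaced by
// GET /api/connections (type name hypothetical).
type cloudConnection struct {
	Provider        string `json:"provider"`
	AssumedIdentity string `json:"assumed_identity"`
	Valid           bool   `json:"valid"`
	Hint            string `json:"hint"`
}

// connectionsResponse shows only the cloud array; the other response fields are omitted.
type connectionsResponse struct {
	Cloud []cloudConnection `json:"cloud"`
}

// encodeCloud renders the probe results the way the panel would receive them.
func encodeCloud(statuses []cloudConnection) (string, error) {
	b, err := json.Marshal(connectionsResponse{Cloud: statuses})
	return string(b), err
}
```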
Pull request overview
Adds a read-only cloud-context MCP that lets triagent expose GCP/AWS investigation context through typed tools and a gated no-shell CLI, with launcher/profile integration for pinned cloud identities and pre-session visibility.
Changes:
- Introduces `pkg/mcp/cloud` with provider abstraction, allowlist/deny-floor validation, minimal-env exec harness, tools, and MCP server wiring.
- Adds GCP and AWS providers over `gcloud`/`aws`, including embedded read-only command allowlists and identity/inventory projections.
- Wires cloud sources into profiles, preflight, MCP config generation, `/api/connections`, and the frontend connections panel.
Reviewed changes
Copilot reviewed 60 out of 60 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `pkg/mcp/cloud/validate.go` | Adds argv validation, deny-floor flag checks, and scope checks. |
| `pkg/mcp/cloud/validate_test.go` | Covers validation behavior for allowlist, deny floor, and scope. |
| `pkg/mcp/cloud/tools_wire_test.go` | Verifies registered cloud tools match the tool catalog. |
| `pkg/mcp/cloud/tools_test.go` | Adds handler tests for cloud tools. |
| `pkg/mcp/cloud/tools_status.go` | Implements session_status. |
| `pkg/mcp/cloud/tools_inventory.go` | Implements list_inventory. |
| `pkg/mcp/cloud/tools_cli.go` | Implements run_cli and list_allowed_commands. |
| `pkg/mcp/cloud/specs.go` | Adds cloud tool specs for meta catalog integration. |
| `pkg/mcp/cloud/server.go` | Adds cloud MCP server construction, env filtering, and tool registration. |
| `pkg/mcp/cloud/server_test.go` | Tests server construction and subprocess env filtering. |
| `pkg/mcp/cloud/providers/registry.go` | Adds shared provider factory for GCP/AWS. |
| `pkg/mcp/cloud/providers/registry_test.go` | Tests provider factory behavior. |
| `pkg/mcp/cloud/providers/probe.go` | Adds launcher-side source probing with pinned env. |
| `pkg/mcp/cloud/providers/gcp/provider.go` | Adds GCP provider skeleton and env contract. |
| `pkg/mcp/cloud/providers/gcp/provider_test.go` | Tests GCP provider metadata and allowlist. |
| `pkg/mcp/cloud/providers/gcp/inventory.go` | Adds GCP project inventory projection. |
| `pkg/mcp/cloud/providers/gcp/inventory_test.go` | Tests GCP inventory parsing and argv. |
| `pkg/mcp/cloud/providers/gcp/identity.go` | Adds GCP identity probe. |
| `pkg/mcp/cloud/providers/gcp/identity_test.go` | Tests GCP identity states. |
| `pkg/mcp/cloud/providers/gcp/default_commands.json` | Adds default GCP read-only command allowlist. |
| `pkg/mcp/cloud/providers/aws/provider.go` | Adds AWS provider skeleton and env contract. |
| `pkg/mcp/cloud/providers/aws/provider_test.go` | Tests AWS provider metadata and allowlist. |
| `pkg/mcp/cloud/providers/aws/inventory.go` | Adds AWS org-account inventory with fallback. |
| `pkg/mcp/cloud/providers/aws/inventory_test.go` | Tests AWS inventory parsing and fallback. |
| `pkg/mcp/cloud/providers/aws/identity.go` | Adds AWS caller identity validation. |
| `pkg/mcp/cloud/providers/aws/identity_test.go` | Tests AWS identity validation modes. |
| `pkg/mcp/cloud/providers/aws/default_commands.json` | Adds default AWS read-only command allowlist. |
| `pkg/mcp/cloud/provider.go` | Defines cloud provider interface and shared structs. |
| `pkg/mcp/cloud/probe.go` | Adds shared identity probe. |
| `pkg/mcp/cloud/probe_test.go` | Tests probe behavior and minimal env. |
| `pkg/mcp/cloud/harness.go` | Adds no-shell CLI execution harness. |
| `pkg/mcp/cloud/harness_test.go` | Tests non-zero exit handling. |
| `pkg/mcp/cloud/harness_security_test.go` | Tests no-shell, inert metacharacters, truncation, and env isolation. |
| `pkg/mcp/cloud/fake_test.go` | Adds fake provider for package tests. |
| `pkg/mcp/cloud/env.go` | Adds cloud MCP env var constants. |
| `pkg/mcp/cloud/default_commands.json` | Adds empty parent default allowlist. |
| `pkg/mcp/cloud/allowlist.go` | Adds command allowlist loading/filtering and deny floor. |
| `pkg/mcp/cloud/allowlist_test.go` | Tests allowlist filtering and matching. |
| `internal/server/meta.go` | Adds cloud tools to meta catalog. |
| `internal/server/handlers.go` | Adds cloud probe dependency seam. |
| `internal/server/handlers_connections.go` | Adds read-only cloud status to connections response. |
| `internal/server/handlers_connections_test.go` | Tests cloud array in `/api/connections`. |
| `internal/profile/profile.go` | Adds the profile `cloud:` source model. |
| `internal/profile/profile_test.go` | Tests parsing cloud profile block. |
| `internal/profile/embed.go` | Adds base-profile merge behavior for cloud sources. |
| `internal/profile/cloud_base_test.go` | Tests cloud base inheritance/override behavior. |
| `internal/preflight/preflight.go` | Adds cloud source preflight probing with visible degrade. |
| `internal/preflight/preflight_test.go` | Tests cloud probe degrade behavior. |
| `internal/preflight/mcpconfig.go` | Adds `triagent-cloud-<alias>` MCP config generation and env injection. |
| `internal/preflight/mcpconfig_test.go` | Tests cloud MCP config/env generation. |
| `frontend/lib/api.ts` | Adds frontend cloud connection types. |
| `frontend/components/Icons.tsx` | Exports cloud icon. |
| `frontend/components/ConnectionsPanel.tsx` | Renders read-only cloud connection pills. |
| `frontend/components/ConnectionsPanel.test.tsx` | Tests cloud pill rendering. |
| `docs/superpowers/states/2026-05-30-cloud-context-mcp-state.md` | Adds feature orchestration state. |
| `docs/superpowers/specs/2026-05-30-cloud-context-mcp-design.md` | Adds cloud-context MCP design spec. |
| `docs/superpowers/plans/2026-05-30-cloud-context-mcp.md` | Adds implementation plan. |
| `cmd/triagent-mcp/serve.go` | Adds `--kind=cloud` serve wiring and scope/env parsing. |
| `cmd/triagent-mcp/serve_cloud_test.go` | Tests cloud serve flag/error behavior. |
| `CLAUDE.md` | Documents testify assertion convention. |
The plan and orchestration state are scratch artifacts for the feature's development; the durable design spec stays. Removed as the final commit now that the integration PR's CI is green.
* docs(cloud): add cloud-providers page and register it in docs nav
  New public docs page covering the read-only GCP/AWS cloud-context MCP: what it gives the agent, the pinned-identity model, per-provider setup (GCP serviceAccountTokenCreator impersonation, AWS assume-role profile), the full cloud: profile block, scope and command allowlists, and visible degrade. Registered the section in both docs/site/lib/sections.ts and frontend/components/DocsView.tsx, placed next to Connections.
* docs(cloud): document the cloud block in profiles and connections
  Add a "Cloud sources" section and the cloud: block to the profiles page (anatomy YAML plus a prose reference pointing at the cloud-providers page), and a "Cloud (read-only)" subsection to connections explaining that cloud identities are profile-configured, read-only, and validated by the identity probe.
* docs(profile): add commented cloud example to the default profile
  Operators forking the default see the cloud: block shape (a gcp and an aws source) with every field explained. Kept commented so it does not activate a cloud source on the runnable default.
* docs(web): point the connections cloud note at the SA/assume-role config
  Enhance the read-only cloud section's note to say identities are pinned in the profile's cloud: block (not entered here), point at the Cloud providers docs for the impersonation / assume-role setup, and name the operator's own re-auth commands. No edit affordance added.
…process env (#56)

* refactor(cloud): thread expected identity and env through Probe/Identity
  Provider.Identity and cloud.Probe now take the pinned identity and the subprocess env explicitly instead of reading process-global env. The gcp and aws Identity implementations validate the resolved identity against the threaded expected value, dropping their os.Getenv reads. cloud.Server carries ExpectedIdentity (read once from TRIAGENT_CLOUD_EXPECTED_IDENTITY in the serve subprocess) and builds the probe env via subprocessEnv().
* refactor(cloud): drop os.Setenv pinning in ProbeSource
  ProbeSource no longer mutates the launcher's process env to pin the per-provider expected identity, and drops the serializing mutex that guarded that mutation. It now builds the subprocess credential env explicitly (base PATH/HOME plus the provider's declared config-dir passthrough names carried from os.Environ, with the source credential var overlaid) and threads the pinned identity into cloud.Probe. A test pins the no-mutation guarantee.
* refactor(preflight): uniform expected-identity env in mcpconfig and serve
  cloudSourceEnv now sets the uniform TRIAGENT_CLOUD_EXPECTED_IDENTITY for both providers in addition to the per-provider credential env the CLI authenticates with (gcp impersonation target, aws assume-role profile), replacing the aws-only expected-role-ARN env. runCloud reads the uniform env once and threads it into cloud.Options.ExpectedIdentity.
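A sketch of how `cloudSourceEnv` could assemble one source's subprocess env after this change. The env names `TRIAGENT_CLOUD_EXPECTED_IDENTITY`, `TRIAGENT_CLOUD_SCOPE`, `CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT`, and `AWS_PROFILE` come from the commits; the Go constant names, the `CloudSource`/`ScopeAllowlist` field layout and JSON keys, and leaving out the provider-selector and allowlist-override entries are assumptions made to keep it short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Env-name constants, referenced instead of raw literals (Go names hypothetical).
const (
	envExpectedIdentity = "TRIAGENT_CLOUD_EXPECTED_IDENTITY"
	envScope            = "TRIAGENT_CLOUD_SCOPE"
	envGCPImpersonate   = "CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT"
	envAWSProfile       = "AWS_PROFILE"
)

// ScopeAllowlist is a stand-in for cloud.ScopeAllowlist (fields hypothetical).
type ScopeAllowlist struct {
	Projects []string `json:"projects,omitempty"`
	Regions  []string `json:"regions,omitempty"`
}

// CloudSource is a stand-in for one entry of the profile's cloud: block.
type CloudSource struct {
	Alias           string
	Provider        string // "gcp" or "aws"
	AssumedIdentity string // SA email (gcp) or role ARN (aws)
	Profile         string // aws-only AWS_PROFILE selector
	Scope           ScopeAllowlist
}

func cloudSourceEnv(src CloudSource) (map[string]string, error) {
	scope, err := json.Marshal(src.Scope)
	if err != nil {
		return nil, err
	}
	env := map[string]string{
		envExpectedIdentity: src.AssumedIdentity, // uniform for both providers
		envScope:            string(scope),
	}
	switch src.Provider {
	case "gcp":
		// gcloud authenticates by impersonating the pinned SA directly.
		env[envGCPImpersonate] = src.AssumedIdentity
	case "aws":
		// aws selects an assume-role profile; the role ARN is checked via the uniform env.
		env[envAWSProfile] = src.Profile
	default:
		return nil, fmt.Errorf("cloud source %q: unknown provider %q", src.Alias, src.Provider)
	}
	return env, nil
}
```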
Pull request overview
Copilot reviewed 65 out of 65 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
pkg/mcp/cloud/providers/aws/identity.go:100
- The assumed-role parser only accepts the commercial `aws` partition and truncates role paths at the first slash. As a result, valid GovCloud/China callers (for example `arn:aws-us-gov:sts::...`) and roles with IAM paths are marked invalid even when they match the pinned role ARN. Parse the ARN partition and split the assumed-role resource at the last slash so the role path is preserved.
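The fix this comment asks for can be sketched as a small parser. The name matches `assumedRoleARN` from the follow-up commit, but the body is an illustration, not the PR's code. It keeps the caller's partition and splits the assumed-role resource at the last slash; STS role session names cannot contain `/`, so everything before that slash is the role part, with or without a path segment.

```go
package main

import (
	"fmt"
	"strings"
)

// assumedRoleARN maps an STS caller ARN of the form
//
//	arn:<partition>:sts::<account>:assumed-role/<role-path-and-name>/<session>
//
// to the IAM role ARN under the same partition:
//
//	arn:<partition>:iam::<account>:role/<role-path-and-name>
func assumedRoleARN(caller string) (string, error) {
	parts := strings.SplitN(caller, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" || parts[1] == "" || parts[2] != "sts" {
		return "", fmt.Errorf("not an STS ARN: %q", caller)
	}
	partition, account, resource := parts[1], parts[4], parts[5]
	const prefix = "assumed-role/"
	if !strings.HasPrefix(resource, prefix) {
		return "", fmt.Errorf("not an assumed-role ARN: %q", caller)
	}
	rest := strings.TrimPrefix(resource, prefix)
	// Split at the LAST slash: the session name never contains "/".
	i := strings.LastIndex(rest, "/")
	if i <= 0 || i == len(rest)-1 {
		return "", fmt.Errorf("missing role or session in %q", caller)
	}
	return fmt.Sprintf("arn:%s:iam::%s:role/%s", partition, account, rest[:i]), nil
}
```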
)

* fix(preflight): omit degraded cloud sources from the session MCP config
  Probe cloud sources before writing the MCP config and wire only the sources whose probe is Valid. A failed probe now disables the source (absent from mcp.json) instead of merely reporting it, honoring the visible-degrade contract. All sources, valid and degraded, remain in Result.CloudSources so the status surface still shows the degraded ones with their hint. The probe still degrades, never blocks the session.
* fix(connections): carry the cloud source alias through /api/connections
  The cloud DTO exposed provider, identity, valid, and hint but not the alias, so two sources sharing a provider and identity but differing in scope were indistinguishable even though the MCP is keyed triagent-cloud-<alias>. Add alias to the DTO and the frontend CloudConnection type, and surface it as the pill heading so each source is identifiable.
* docs(cloud): correct the account-scope enforcement claim
  The Scope allowlist section implied run_cli enforces scope.accounts as an account allowlist. It does not: only --project and --region/--zone are argv-validated. AWS account reach is bounded by the pinned assume-role profile, not by scope.accounts. State that project and region/zone are enforced on argv, while account reach is governed by the pinned role, and mark scope.accounts as informational and reserved so operators do not rely on an allowlist the harness does not enforce.
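The probe-before-write ordering can be sketched as one pass that builds two lists: only valid sources become `triagent-cloud-<alias>` servers, while every status, degraded or not, is kept for the status surface. `IdentityStatus` and `splitByProbe` here are hypothetical stand-ins for the preflight types.

```go
package main

// IdentityStatus is a stand-in for cloud.IdentityStatus.
type IdentityStatus struct {
	Alias           string
	AssumedIdentity string
	Valid           bool
	Hint            string
}

// splitByProbe returns the server names to write into mcp.json (valid probes
// only) and the full status list, degraded entries included.
func splitByProbe(aliases []string, probe func(alias string) IdentityStatus) (wired []string, all []IdentityStatus) {
	for _, a := range aliases {
		st := probe(a)
		all = append(all, st)
		if st.Valid {
			wired = append(wired, "triagent-cloud-"+a)
		}
	}
	return wired, all
}
```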
…oviders (#58)

* fix(cloud): capture stderr in CLIResult
  execCLI used cmd.Output(), discarding the child's stderr where gcloud and aws write their error context. A non-zero run_cli returned an empty stdout and no explanation. Capture stderr into a capped buffer, surface it as CLIResult.Stderr (json "stderr,omitempty"), and truncate it at the same byte limit as stdout. The no-shell, minimal-env, closed-stdin guarantees and the "non-zero exit is a normal result" contract are unchanged.
* fix(cloud): match allowlisted verb chains as a prefix and reject metacharacter tokens
  Allows exact-matched the positional subcommand path, so an allowlisted "compute instances describe" rejected "compute instances describe my-vm": the resource operand made the path unequal, leaving most describe/get commands advertised but unusable. Match the allowlisted path as a token-wise prefix of argv's leading positionals so trailing resource operands ride through. There is no shell, so a trailing token is an inert argument. As defense in depth, validateArgv now rejects any argv token that is or contains a shell-control sequence (`;`, `|`, `&`, backtick, `$(`, `>`, `<`, newline). A metacharacter token like ["...","list",";","rm"] is refused by this check rather than by the allowlist; a literal resource name or a key=value filter passes.
* fix(cloud/gcp): validate impersonation instead of comparing the base account
  `gcloud auth list` reports the operator's base active account, not the SA selected by CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT, so the old active==expected check marked correctly-configured impersonation invalid. Validity now means "impersonation is pinned to the expected SA and the pin works": read the in-process impersonation env, confirm it equals the expected target, then run a minimal impersonated read (gcloud auth print-access-token) to prove the grant is active. AssumedIdentity is the SA on success; failures degrade through Valid/Hint with the captured stderr. NOTE: needs verification against a live gcloud before relying on the exact print-access-token shape (flagged in a code comment).
* fix(cloud/gcp): check exit code before parsing projects list
  execCLI returns a non-zero exit as CLIResult{ExitCode:n} with err==nil, so a failed `gcloud projects list` was JSON-parsed and surfaced as a misleading parse error. Check ExitCode before unmarshalling and return the exit code plus captured stderr, mirroring the AWS provider. (The gcp identity probe added in the prior commit already checks ExitCode.)
* fix(cloud/aws): fall back to caller account only on Organizations-unavailable
  Inventory fell back to the single-account projection on ANY non-zero exit or transport error, masking throttling, network faults, and other real failures. Now that stderr is captured, fall back only when the stderr names an Organizations-unavailable condition (AccessDenied, "not a member of an organization", or AWSOrganizationsNotInUseException). Any other non-zero exit returns the exit code plus stderr; a transport error is surfaced rather than silently degrading.
* fix(cloud/aws): parse assumed-role ARNs across partitions and IAM paths
  assumedRoleARN hardcoded arn:aws:sts:: and Cut at the first slash, so GovCloud (arn:aws-us-gov:) / China (arn:aws-cn:) ARNs and roles carrying an IAM path (assumed-role/path/to/Role/session) misparsed. Accept any partition, and split so the role path-and-name is everything between assumed-role/ and the final /<session>, rebuilding the IAM role ARN under the same partition.
* docs(cloud): correct ScopeAllowlist account-scope claim
  The doc comment said the agent "cannot pivot to an un-allowlisted project, account, or region", but allowedFor only maps --project and --region/--zone; Accounts is not argv-enforced. State it accurately: project and region/zone are enforced against argv; account reach is constrained by the pinned identity/role and the deny-floored --account / --profile flags, not validated here.
* fix(cloud): fail closed on a malformed cloud scope
  parseCloudScope swallowed a malformed TRIAGENT_CLOUD_SCOPE and returned an empty (unconstrained) ScopeAllowlist, failing OPEN and silently widening run_cli. It now returns an error, and runCloud parses the scope before resolving the provider and aborts startup on a malformed value, so a misconfigured scope can never silently drop the deployment's restrictions.
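The prefix match and the metacharacter check from this batch can be sketched together. `validateArgv` and the list of shell-control sequences come from the commit; treating "leading positionals" as every token before the first `-`-prefixed flag is an assumption, as are the helper names. The metacharacter scan runs over every token first, so a `;` token is refused by that check rather than by the allowlist.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// shellControl holds the sequences named in the commit (backtick and newline included).
var shellControl = []string{";", "|", "&", "`", "$(", ">", "<", "\n"}

// leadingPositionals returns argv up to the first flag (assumed rule).
func leadingPositionals(argv []string) []string {
	for i, t := range argv {
		if strings.HasPrefix(t, "-") {
			return argv[:i]
		}
	}
	return argv
}

// hasTokenPrefix reports whether prefix is a token-wise prefix of tokens.
func hasTokenPrefix(tokens, prefix []string) bool {
	if len(prefix) == 0 || len(prefix) > len(tokens) {
		return false
	}
	for i := range prefix {
		if tokens[i] != prefix[i] {
			return false
		}
	}
	return true
}

func validateArgv(argv []string, allowlist [][]string) error {
	for _, t := range argv {
		for _, s := range shellControl {
			if strings.Contains(t, s) {
				return fmt.Errorf("token %q contains shell-control sequence %q", t, s)
			}
		}
	}
	pos := leadingPositionals(argv)
	for _, entry := range allowlist {
		if hasTokenPrefix(pos, entry) {
			return nil // trailing resource operands ride through as inert arguments
		}
	}
	return errors.New("command not in allowlist")
}
```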
…be (#64)

* fix(cloud): drop allowlist entries that are a prefix of a denied path
  filterAllowlist dropped an override only when a deny-floor subcommand was a token-prefix of the entry (the entry sat UNDER a denied path). An override that was a PREFIX OF a denied path survived, yet Allows then re-admitted the floored nested command via its prefix match: a bare "s3" entry made Allows(["s3","cp", ...]) true again, re-enabling the floored "s3 cp". validateArgv never re-checks floored subcommands at runtime, so the load-time filter was the sole gate. Fold both directions into DenyFloor.blocks: drop an entry when it is prefix-comparable to any floor subcommand either way. Entries that share a leading token but diverge deeper (compute instances list vs floored compute ssh) are prefix-comparable to neither and stay allowed; no shipped default_commands.json entry is dropped by the stricter filter.
* fix(cloud): bound the identity probe with a timeout so a hung CLI degrades
  ProbeSource ran the provider whoami under the caller's context with no bound, so a stale SSO flow, a slow network, or a wedged gcloud/aws blocked /api/connections and session preflight indefinitely, breaking the "degrade, never block" contract. Wrap the probe in a 15s timeout (comfortably above a normal 1-3s whoami, well below anything that would stall a request): the deadline kills the CLI exec, the provider surfaces the context error, and cloud.Probe degrades it to a Valid:false status with a hint rather than hanging. Extract probeProvider so the bound is observable without a real CLI; a blocking fake provider plus a shortened probeTimeout proves the deadline propagates.
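The both-directions rule folded into `DenyFloor.blocks` reduces to a symmetric prefix test. This sketch uses hypothetical `prefixComparable` and `filterAllowlist` helpers over token slices: a bare `s3` entry is dropped because it is a prefix of the floored `s3 cp`, while `compute instances list` survives next to a floored `compute ssh`.

```go
package main

// prefixComparable reports whether a is a token-wise prefix of b, or b of a.
func prefixComparable(a, b []string) bool {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// filterAllowlist drops any entry prefix-comparable to a floored subcommand,
// in either direction, so no surviving entry can re-admit a floored path.
func filterAllowlist(entries, floor [][]string) [][]string {
	var kept [][]string
	for _, e := range entries {
		blocked := false
		for _, f := range floor {
			if prefixComparable(e, f) {
				blocked = true
				break
			}
		}
		if !blocked {
			kept = append(kept, e)
		}
	}
	return kept
}
```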
…IAM floor (#65) Addresses two review findings: scope only constrains explicit --project/--region values (omission falls back to the CLI default, so hard project confinement is the per-project IAM grant), and allowlist entries must be leaf read-verbs (an intermediate override would admit mutating siblings via prefix match; the no-write guarantee is the read-only IAM grant, not the allowlist alone). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
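A hedged sketch of the scope behavior this commit documents: only explicit `--project` and `--region`/`--zone` values are checked, and an omitted flag passes. The `scope` type and `checkScope` function are hypothetical stand-ins for the PR's `ScopeAllowlist`/`allowedFor`, and lumping regions and zones into one `Locations` list is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// scope is a hypothetical stand-in for ScopeAllowlist. An empty list leaves
// that axis unconstrained.
type scope struct {
	Projects  []string
	Locations []string // regions or zones accepted for --region / --zone
}

func contains(list []string, v string) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// checkScope validates explicit --project and --region/--zone values, in both
// "--flag value" and "--flag=value" spellings. An omitted flag passes because
// the CLI then uses its configured default; that gap is why hard project
// confinement has to come from the per-project IAM grant.
func checkScope(argv []string, s scope) error {
	for i := 0; i < len(argv); i++ {
		name, val, hasEq := strings.Cut(argv[i], "=")
		var allowed []string
		switch name {
		case "--project":
			allowed = s.Projects
		case "--region", "--zone":
			allowed = s.Locations
		default:
			continue
		}
		if !hasEq {
			if i+1 >= len(argv) {
				return fmt.Errorf("%s has no value", name)
			}
			i++
			val = argv[i]
		}
		if len(allowed) > 0 && !contains(allowed, val) {
			return fmt.Errorf("%s %q is outside the allowlisted scope", name, val)
		}
	}
	return nil
}

func main() {
	s := scope{Projects: []string{"prod-a"}}
	fmt.Println(checkScope([]string{"compute", "instances", "list", "--project=prod-b"}, s))
	fmt.Println(checkScope([]string{"compute", "instances", "list"}, s)) // omitted: <nil>
}
```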
…ing, absolute binary, doc/field cleanup (#66)

* fix(cloud): cap harness output during the run instead of buffering unbounded

execCLI buffered the full stdout/stderr in memory and truncated only after the process returned, so a command emitting a very large response could consume unbounded memory despite defaultOutputLimit. Capture stdout/stderr through a bounded limitedWriter that retains at most limit bytes each and records overflow, so the cap is effective during the run. Every existing guarantee is preserved: no shell, explicit minimal env, closed stdin, Truncated set on overflow, stderr captured and capped, non-zero exit as a normal CLIResult, real start/exec failure as a Go error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(cloud): report the pinned identity on a degraded probe

When the provider failed to resolve an identity, Probe returned Valid:false with an empty AssumedIdentity even though the caller passed the pinned identity in expected, so session_status no longer named which pinned identity was degraded. Fall back to expected whenever the resulting status has an empty AssumedIdentity, on both the degraded and valid paths, so the displayed identity is always the pinned one. Degrade-never-error semantics are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(cloud): keep the pinned identity when provider construction fails

A provider construction failure (e.g. a missing gcloud/aws binary) returned IdentityStatus{Provider, Valid:false, Hint} with no AssumedIdentity, so preflight and connections reported the degraded source without the identity the operator must fix. Carry src.AssumedIdentity through the construction-error status, mirroring the probe-path fallback so both ProbeSource exits name the pinned identity.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve the provider CLI to an absolute path

The harness relies on a fixed absolute binary path so a later subprocess env/PATH change cannot redirect what executes, but exec.LookPath returns a relative path (flagged with exec.ErrDot) when PATH carries relative entries. Pass the LookPath result through filepath.Abs in each provider's New(), recovering the relative path on ErrDot and erroring if it still cannot be made absolute. Applied identically to gcp and aws.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(cloud): correct CLIResult doc to say raw truncated output

The comment claimed output was shaped/redacted, but run_cli returns the provider CLI's raw stdout/stderr, only truncated. State that CLIResult carries the raw CLI stdout (and stderr), capped at the output limit with Truncated set when exceeded, so callers do not assume shaping or redaction beyond truncation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(cloud): drop the unused Command.Redact field

Command.Redact was advertised in the allowlist schema and documented as marking output for secret-scrubbing, but nothing read it before returning run_cli output, so it promised protection that did not exist. No shipped default_commands.json sets it. Remove the field and its doc; run_cli is the gated escape hatch returning raw (truncated) output by design, and typed tools are where projection lives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
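The bounded capture from the #66 commit can be sketched as an `io.Writer` that keeps at most `limit` bytes. `limitedWriter` matches the commit's name, but its fields and the `runCapped` helper are assumptions for illustration. One detail worth keeping in any real version: `exec.Cmd` treats a nil `Env` as "inherit the parent environment", so the sketch forces a non-nil slice.

```go
package main

import (
	"fmt"
	"os/exec"
)

// limitedWriter retains at most limit bytes and records overflow. It reports
// every write as fully consumed, so the child never sees a short-write error;
// bytes past the cap are simply discarded.
type limitedWriter struct {
	buf       []byte
	limit     int
	truncated bool
}

func (w *limitedWriter) Write(p []byte) (int, error) {
	room := w.limit - len(w.buf)
	switch {
	case len(p) <= room:
		w.buf = append(w.buf, p...)
	case room > 0:
		w.buf = append(w.buf, p[:room]...)
		w.truncated = true
	default:
		w.truncated = true
	}
	return len(p), nil
}

// runCapped (hypothetical helper) execs an absolute binary path directly, with
// no shell, an explicit env, the null device as stdin, and capped
// stdout/stderr, so memory stays bounded while the process is still running.
func runCapped(path string, args, env []string, limit int) (stdout, stderr *limitedWriter, err error) {
	stdout = &limitedWriter{limit: limit}
	stderr = &limitedWriter{limit: limit}
	if env == nil {
		env = []string{} // a nil Env would inherit the parent's environment
	}
	cmd := exec.Command(path, args...)
	cmd.Env = env
	cmd.Stdout, cmd.Stderr = stdout, stderr
	// cmd.Stdin stays nil, so the child reads from the null device.
	err = cmd.Run() // a non-zero exit surfaces as *exec.ExitError
	return stdout, stderr, err
}

func main() {
	w := &limitedWriter{limit: 5}
	fmt.Fprint(w, "hello, ")
	fmt.Fprint(w, "world")
	fmt.Println(string(w.buf), w.truncated) // hello true
}
```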
…d-context MCP Amends the identity model from a single pinned target to a deployment-pinned SET the agent may select among (never beyond): a new set_active_target tool, applied as an MCP-controlled per-exec env var (CLOUDSDK_CORE_PROJECT for gcp, AWS_PROFILE for aws), with an AWS accounts list + generated profiles for multi-account. Records the rejected runtime-AssumeRole broker. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +33 to +36
- `list_inventory` — projects / accounts and the accessible resources within an allowlisted scope, so the agent can orient.
- `session_status` — the read-only whoami: which pinned identity is active, in which target, and whether it is valid.
- `set_active_target` — choose which project (GCP) or account (AWS) subsequent `run_cli` commands run against, from the deployment-pinned set surfaced by `list_inventory`. The MCP applies the choice as a controlled environment variable; the agent never names an arbitrary target.
- `run_cli` — a gated, read-only `gcloud` / `aws` invocation for everything else, with argument tokens supplied as an array.
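The selection rule for `set_active_target` and the `run_cli` gate can be sketched as a small guarded state holder. The `Target` shape, the `activeTarget` type, and the choice to auto-activate a lone pinned target are assumptions, not the PR's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Target is a hypothetical shape for one deployment-pinned selectable target.
type Target struct {
	ID    string // GCP project ID or AWS account ID
	Label string
}

// activeTarget holds the agent's choice, which can only ever be a member of
// the set the deployment pinned when the server was constructed.
type activeTarget struct {
	mu      sync.Mutex
	allowed map[string]bool
	active  string
}

func newActiveTarget(pinned []Target) *activeTarget {
	a := &activeTarget{allowed: make(map[string]bool, len(pinned))}
	for _, t := range pinned {
		a.allowed[t.ID] = true
	}
	if len(pinned) == 1 { // assumption: a lone pinned target needs no selection
		a.active = pinned[0].ID
	}
	return a
}

// Set is the set_active_target check: IDs outside the pinned set are refused,
// so the agent chooses among targets but never names an arbitrary one.
func (a *activeTarget) Set(id string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if !a.allowed[id] {
		return fmt.Errorf("target %q is not in the deployment-pinned set", id)
	}
	a.active = id
	return nil
}

// Require is the run_cli gate: with several targets configured, nothing runs
// until one has been chosen.
func (a *activeTarget) Require() (string, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.active == "" {
		if len(a.allowed) == 0 {
			return "", errors.New("no targets are configured")
		}
		return "", errors.New("several targets are configured; call set_active_target first")
	}
	return a.active, nil
}

func main() {
	a := newActiveTarget([]Target{{ID: "prod-a"}, {ID: "prod-b"}})
	_, err := a.Require()
	fmt.Println(err)
	fmt.Println(a.Set("someone-elses-project"))
	fmt.Println(a.Set("prod-b"))
}
```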
Comment on lines +69 to +72
func (a *apiHandlers) cloudConnections(ctx context.Context) []cloudConnection {
	if a.prof == nil || len(a.prof.Cloud) == 0 {
		return nil
	}
…n gets its own spec Reverts the set_active_target amendment to 2026-05-30-cloud-context-mcp-design.md. That doc is the durable ADR for the shipped base MCP; folding an unbuilt feature into it blurs what exists vs what is new. The active-target-selection design moves to its own ADR that references this one. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ext MCP) A separate ADR for the new capability: a set_active_target tool letting the agent choose a project (GCP) / account (AWS) from a deployment-pinned set, applied as an MCP-controlled per-exec env var, with AWS multi-account via generated profiles. References the base cloud-context spec, which ships unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR A (provider-agnostic core: Target, the two Provider methods, server active-target state + env apply, set_active_target tool, run_cli gating, session_status) and PR B (gcp/aws impls, aws accounts config + generated profiles, inventory honesty, launcher wiring, docs). Folds into #53. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +51 to +55
	Flags: []string{
		"--impersonate-service-account", "--account", "--profile",
		"--endpoint-url", "--cli-input-json", "--cli-input-yaml", "--configuration",
		"--flags-file", "--access-token-file",
	},
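A floor like the `Flags` list above has to catch both `--flag value` and `--flag=value` spellings, plus the `file://`, `@`, and `http(s)://` argument prefixes the PR description mentions. A minimal, hypothetical sketch (`checkDenyFloor` is not the PR's function):

```go
package main

import (
	"fmt"
	"strings"
)

// deniedFlags mirrors the floor's Flags list quoted above.
var deniedFlags = map[string]bool{
	"--impersonate-service-account": true, "--account": true, "--profile": true,
	"--endpoint-url": true, "--cli-input-json": true, "--cli-input-yaml": true,
	"--configuration": true, "--flags-file": true, "--access-token-file": true,
}

// deniedArgPrefixes make a CLI read a local file or fetch a URL instead of
// taking the argument literally.
var deniedArgPrefixes = []string{"file://", "@", "http://", "https://"}

// checkDenyFloor rejects a floored flag in either spelling, and any token that
// carries a floored prefix, whether it stands alone or follows "=".
func checkDenyFloor(argv []string) error {
	for _, tok := range argv {
		name, val, hasEq := strings.Cut(tok, "=")
		if deniedFlags[name] {
			return fmt.Errorf("flag %s is on the deny floor", name)
		}
		candidates := []string{tok}
		if hasEq {
			candidates = append(candidates, val)
		}
		for _, c := range candidates {
			for _, p := range deniedArgPrefixes {
				if strings.HasPrefix(c, p) {
					return fmt.Errorf("argument %q uses denied prefix %q", tok, p)
				}
			}
		}
	}
	return nil
}

func main() {
	for _, argv := range [][]string{
		{"compute", "instances", "list", "--project=prod-a"},
		{"compute", "instances", "list", "--account=other@example.com"},
		{"logs", "filter-log-events", "--cli-input-json", "file://req.json"},
	} {
		fmt.Println(checkDenyFloor(argv))
	}
}
```

Checking the value after `=` matters: a check that only looked at whole tokens would let `--policy=file://p.json` slip past.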
* feat(cloud): Target type and active-target provider methods (#47-followup)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cloud): server active-target state, selectable set, and env application

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cloud): set_active_target tool and spec

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cloud): run_cli requires an active target when several are configured

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cloud): session_status reports the active target

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cloud): real providers satisfy the active-target interface

The Provider interface gained ConfiguredTargets and ActiveTargetEnv, so the gcp and aws realizations and the providers-package probe double must implement them to keep the tree compiling. gcp pins CLOUDSDK_CORE_PROJECT and returns no configured set (its set is scope/inventory); aws pins AWS_PROFILE. The deployment-configured aws accounts list arrives with the AWS accounts config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(profile): aws cloud accounts list + source_profile

A multi-account aws cloud source carries a source_profile (the operator's SSO base) and an accounts list (one read-only role per account). Validation requires source_profile and at least one account with non-empty, source-unique account ids and role_arns when accounts is set; the single-assumed_identity profile form stays valid and is mutually exclusive with accounts.

Towards #44

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cloud/aws): configured accounts, generated profiles, active-target env

aws.New takes the source alias, source_profile, and account set. ConfiguredTargets surfaces the accounts as the agent's selectable targets; ActiveTargetEnv pins AWS_PROFILE to each account's generated profile name. profiles.go writes a delimited, idempotent managed block per alias into ~/.aws/config (or $AWS_CONFIG_FILE), one assume-role profile per account layering its role_arn over the operator's source_profile. The write is tmp-file-then-rename and replaces only the alias's own block, so operator-authored profiles and other aliases survive. New generates the block at construction, so the profiles exist before any probe or run_cli on both the serve subprocess and launcher-side paths.

Towards #44

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(cloud/aws): inventory reflects the configured accounts, not the whole org

When the source carries a configured accounts list, Inventory returns exactly those accounts as the reachable set and shells nothing — each account is its own read-only role, so an org-wide list-accounts would advertise accounts run_cli cannot enter. The single-account form keeps the organizations list-accounts + caller-account fallback.
Towards #44

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cloud): wire aws accounts + source_profile through serve and mcpconfig

Adds cloud.EnvAWSAccounts (JSON), cloud.EnvAWSSourceProfile, and cloud.EnvAWSAlias. cloudSourceEnv emits them for a multi-account aws source (and no static AWS_PROFILE, since the server pins it per-exec from the active target); the single-account form is unchanged. runCloud decodes them and builds the provider through the factory. The factory (providers.New) and ProbeSource gain an Options/Source path carrying the alias, source_profile, and accounts, so the launcher-side probe builds the aws provider with its profile map — generating the same ~/.aws/config block the serve subprocess does, before any whoami. The launcher probe targets the default (first) account's generated profile; per-account validity is out of scope for v1. Interface changes beyond the plan: providers.New gained a variadic Options arg and providers.Source gained Alias/SourceProfile/Accounts, both required so the launcher-side provider has the profile map the plan called out as under-specified; cloud.EnvAWSAlias was added so serve and the launcher namespace generated profiles identically; profile.CloudAccount gained snake_case json tags to fix the env wire shape.

Towards #44

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(cloud): document multi-account/project active-target selection

Adds the set_active_target tool, the AWS accounts + source_profile multi-account config (with a generated-profiles explanation and example), the GCP-one-identity-many-projects vs AWS-one-account-per-role model, and the run_cli-requires-an-active-target rule. Reconciles the pinned-identity, scope-by-omission, and cloud-block sections with the new bounded-selection behavior.

Towards #44

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +107 to +113
func defaultCloudProbe(ctx context.Context, src profile.CloudSource) cloud.IdentityStatus {
	return providers.ProbeSource(ctx, providers.Source{
		Provider:        src.Provider,
		AssumedIdentity: src.AssumedIdentity,
		Profile:         src.Profile,
	})
}
}
return out
}
inv, err := s.provider.Inventory(ctx, s.run)
Comment on lines +51 to +55
	Flags: []string{
		"--impersonate-service-account", "--account", "--profile",
		"--endpoint-url", "--cli-input-json", "--cli-input-yaml", "--configuration",
		"--flags-file", "--access-token-file",
	},
Comment on lines +188 to +190
func (p *Provider) ActiveTargetEnv(id string) []string {
	if len(p.accounts) == 0 {
		return []string{EnvProfile + "=" + id}
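One reading of the snippet above, extended to the multi-account form: with no configured accounts the target ID is used as the profile name directly, otherwise the account ID maps to its generated profile. The `Provider` fields, `NewProvider`, and the `triagent-<alias>-<account>` naming are hypothetical, not the PR's code.

```go
package main

import "fmt"

// EnvProfile is the variable the AWS CLI reads to select a named profile.
const EnvProfile = "AWS_PROFILE"

// Provider is a trimmed, hypothetical stand-in for the aws provider.
type Provider struct {
	alias    string
	accounts map[string]string // account ID -> generated profile name
}

// generatedProfileName is a hypothetical naming scheme for managed profiles.
func generatedProfileName(alias, accountID string) string {
	return "triagent-" + alias + "-" + accountID
}

func NewProvider(alias string, accountIDs []string) *Provider {
	p := &Provider{alias: alias, accounts: map[string]string{}}
	for _, id := range accountIDs {
		p.accounts[id] = generatedProfileName(alias, id)
	}
	return p
}

// ActiveTargetEnv returns the per-exec env pinning the chosen target.
func (p *Provider) ActiveTargetEnv(id string) []string {
	if len(p.accounts) == 0 {
		return []string{EnvProfile + "=" + id} // single-account form
	}
	profile, ok := p.accounts[id]
	if !ok {
		return nil // unknown account: pin nothing; the caller must refuse to run
	}
	return []string{EnvProfile + "=" + profile}
}

func main() {
	p := NewProvider("prod", []string{"111122223333"})
	fmt.Println(p.ActiveTargetEnv("111122223333")) // [AWS_PROFILE=triagent-prod-111122223333]
}
```

Returning nil for an unknown account only stays safe if the caller treats an empty result as a refusal; otherwise the CLI would fall back to whatever default profile the environment resolves.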
Description
Closes #44
Adds a read-only cloud-context MCP that lets the operator agent investigate the GCP/AWS layer beneath a Kubernetes incident — reachability, IAM, managed-cluster config, logs, and audit trail — without a human leaving the loop. One package serves both clouds behind a Provider interface selected by --provider, aliased triagent-cloud-<alias>; the agent reaches the long tail through a gated, no-shell CLI and orients through two typed tools. Read-only is enforced by construction (no shell, positive allowlist, hardcoded deny floor, scope check) and guaranteed by the deployment-pinned read-only IAM identity, which the agent cannot escalate; set_active_target lets it choose only among the deployment-pinned targets, never beyond them. Cloud auth readiness is visible before a session starts, and a stale cloud credential degrades that source visibly rather than blocking Kubernetes triage.
Changes
- pkg/mcp/cloud/ scaffold + safety harness — the Provider interface, the no-shell execCLI core, the command allowlist with a hardcoded deny floor that config can never re-enable, argv validation (allowlist + deny floor + scope), the shared identity probe, the four tools (list_inventory, session_status, run_cli, list_allowed_commands), and serve --kind=cloud --provider=<gcp|aws> wiring.
- GCP provider (providers/gcp) over gcloud and AWS provider (providers/aws) over aws — each implements the Provider interface with an embedded read-only allowlist across the investigative axes, provider-specific deny-floor additions, an identity whoami (GCP impersonation target / AWS assumed-role), and inventory projection (GCP projects / AWS org-accounts with single-account fallback).
- Provider factory (pkg/mcp/cloud/providers.New) — the single construction site importing both providers, used by serve.go, preflight, and connections.
- The profile cloud: block, per-session triagent-cloud-<alias> servers with pinned-identity env injection, a preflight identity probe that degrades rather than blocks, a read-only cloud array in GET /api/connections, and read-only cloud pills in the connections panel.
- The run_cli harness and the identity probe forward only PATH/HOME plus each provider's declared credential/impersonation env, never the launcher's full environment.
Challenges
Read-only by construction, not by string-filtering. run_cli never touches a shell: input is a typed argv array, exec is a direct execve against a binary resolved to an absolute path once at startup, and shell metacharacters handed in as tokens are inert (a surplus token simply fails the exact-match allowlist). A hardcoded deny floor (credential-reading subcommands, identity/endpoint flags, and file://, @, and http(s):// arg prefixes) sits below the configurable allowlist and can never be re-enabled, mirroring the k8s Secret filter. The pinned read-only IAM identity is the outermost backstop.
Identity pinning differs by cloud, so the model isn't symmetric. GCP impersonates the assumed identity directly (CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT is both the mechanism and the expected identity); AWS selects an assume-role profile (AWS_PROFILE) and checks the expected role ARN (TRIAGENT_CLOUD_AWS_EXPECTED_ROLE_ARN). CloudSource keeps AssumedIdentity as the canonical displayed identity and adds an aws-only Profile selector; mcpconfig.go injects the right env per provider via provider-package constants.
The launcher must construct a provider to probe it. cloud.Probe needs a concrete Provider, which the cloud package can't build without an import cycle — so a neutral factory package imports gcp+aws, mirroring how the launcher already builds auth.Provider from pkg/auth/teleport and pkg/auth/kubeconfig. The launcher-side probe pins per-provider identity env around the whoami (mutex-serialized). Known v1 limitation: that pin mutates process-global env for the probe window; a future cleanup would thread the expected identity explicitly rather than via env.
Related
Shipped as five sub-PRs into this branch: #48 (scaffold + harness), #49 (GCP provider), #50 (AWS provider), #51 (probe minimal-env fix), #52 (launcher integration). Design spec: docs/superpowers/specs/2026-05-30-cloud-context-mcp-design.md.
Testing
TDD throughout (each task's test was first watched failing for the right reason). The security boundary is covered by harness_security_test.go (no sh -c source scan; metacharacter argv proven inert; deny floor, scope, and truncation enforced) and the argv-validation table. Providers are table-tested against captured CLI-output fixtures (no live cloud). Verified on the integrated branch: make test-go race-clean and green, make lint with 0 issues, frontend typecheck clean and vitest 195/195 passing (incl. the new connections-panel cloud-pill spec), and make build produces both binaries with a fresh embedded bundle. Reviewers may want to poke at the deny-floor coverage in each provider's default_commands.json and the exact-match allowlist reasoning, and confirm the visible-degrade path (a failed cloud probe must never block a Kubernetes session).
🤖 Generated with Claude Code