From bfcc71d389d3cd7d8e612d0ac3925d0569980006 Mon Sep 17 00:00:00 2001 From: David Cramer Date: Fri, 8 May 2026 22:19:47 -0700 Subject: [PATCH] feat(datadog): Use Pup CLI for telemetry workflows Switch the Datadog plugin from the hosted MCP server to Datadog's Pup CLI so it can use deployment-managed API and application keys through host-managed header transforms. Add plugin command-env support for non-secret CLI placeholders, update the Datadog skill/docs/specs, and cover the packaged Datadog manifest in unit tests. Co-Authored-By: GPT-5 Codex --- README.md | 20 +- .../src/content/docs/extend/datadog-plugin.md | 79 ++++---- .../docs/src/content/docs/extend/index.md | 35 +++- packages/junior-datadog/README.md | 41 ++-- packages/junior-datadog/plugin.yaml | 75 ++++--- .../junior-datadog/skills/datadog/SKILL.md | 48 ++--- .../skills/datadog/references/api-surface.md | 88 ++++----- .../datadog/references/common-use-cases.md | 75 +++---- .../skills/datadog/references/query-syntax.md | 90 +++++---- .../references/troubleshooting-workarounds.md | 33 ++-- .../junior/src/chat/capabilities/factory.ts | 6 + .../src/chat/credentials/test-broker.ts | 9 +- .../chat/plugins/auth/api-headers-broker.ts | 2 +- .../chat/plugins/auth/github-app-broker.ts | 4 +- .../chat/plugins/auth/oauth-bearer-broker.ts | 5 +- packages/junior/src/chat/plugins/manifest.ts | 64 ++++++ packages/junior/src/chat/plugins/types.ts | 1 + packages/junior/src/chat/skills.ts | 3 +- .../capabilities/capability-factory.test.ts | 7 +- .../unit/plugins/api-headers-broker.test.ts | 16 ++ .../plugin-manifest-api-headers.test.ts | 85 ++++++++ .../tests/unit/plugins/test-broker.test.ts | 7 + .../junior/tests/unit/skills/skills.test.ts | 2 +- specs/plugin-spec.md | 184 +++++++++++------- specs/security-policy.md | 7 +- specs/skill-capabilities-spec.md | 12 +- 26 files changed, 668 insertions(+), 330 deletions(-) diff --git a/README.md b/README.md index ae43cd6e..7b214de2 100644 --- a/README.md +++ b/README.md @@ -17,13 +17,13 @@ Start here: ## Packages -| Package | Purpose | -| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `@sentry/junior` | Core Slack bot runtime | -| `@sentry/junior-agent-browser` | Agent Browser plugin package for browser automation | -| `@sentry/junior-datadog` | Datadog plugin package for observability workflows (**non-functional**: Datadog has DCR locked down, see the [package README](./packages/junior-datadog/README.md)) | -| `@sentry/junior-github` | GitHub plugin package for issue workflows | -| `@sentry/junior-hex` | Hex plugin package for data warehouse query workflows | -| `@sentry/junior-linear` | Linear plugin package for issue workflows | -| `@sentry/junior-notion` | Notion plugin package for page search workflows | -| `@sentry/junior-sentry` | Sentry plugin package for issue workflows | +| Package | Purpose | +| ------------------------------ | ---------------------------------------------------------------------------- | +| `@sentry/junior` | Core Slack bot runtime | +| `@sentry/junior-agent-browser` | Agent Browser plugin package for browser automation | +| `@sentry/junior-datadog` | Datadog plugin package for observability workflows through Datadog's Pup CLI | +| `@sentry/junior-github` | GitHub plugin package for issue workflows | +| `@sentry/junior-hex` | Hex plugin package for data warehouse query workflows | +| `@sentry/junior-linear` | Linear plugin package for issue workflows | +| `@sentry/junior-notion` | Notion plugin package for page search workflows | +| `@sentry/junior-sentry` | Sentry plugin package for issue workflows | diff --git a/packages/docs/src/content/docs/extend/datadog-plugin.md b/packages/docs/src/content/docs/extend/datadog-plugin.md index b62d2c8f..de6fcf06 100644 --- a/packages/docs/src/content/docs/extend/datadog-plugin.md +++ b/packages/docs/src/content/docs/extend/datadog-plugin.md @@ -1,6 +1,6 @@ --- title: Datadog Plugin -description: Configure the hosted Datadog MCP server for read-only observability workflows (logs, metrics, traces, monitors, incidents, dashboards). +description: Configure Datadog's Pup CLI for read-only observability workflows (logs, metrics, traces, monitors, incidents, dashboards). type: tutorial prerequisites: - /extend/ @@ -9,17 +9,11 @@ related: - /operate/security-hardening/ --- -:::danger[This plugin does not currently work] -Datadog's hosted MCP server requires OAuth Dynamic Client Registration ([DCR, RFC 7591](https://www.rfc-editor.org/rfc/rfc7591)) for third-party clients like Junior, and **DCR is locked down on Datadog's side**. Until Datadog exposes DCR (or an equivalent registration path) on `mcp.datadoghq.com`, Junior cannot complete the OAuth handshake and every Datadog tool call will fail at connect time. +The Datadog plugin installs Datadog's Pup CLI so Slack users can query Datadog telemetry from Junior: logs, metrics, APM traces/spans, monitors, incidents, dashboards, hosts, services, and RUM. -The `@sentry/junior-datadog` package is kept in-tree so the integration is ready to ship the moment Datadog unblocks DCR. **Do not add it to a production deployment in the meantime.** The rest of this page documents how the plugin will behave once Datadog unblocks DCR. -::: +Junior intentionally keeps this plugin read-only. The packaged manifest sets Pup read-only mode and the bundled skill instructs Junior to run `pup --read-only --agent` commands. It is for search, fetch, and analytics workflows, not Datadog mutations. -The Datadog plugin uses Datadog's hosted MCP server so Slack users can query their own Datadog account context — logs, metrics, APM traces, monitors, incidents, dashboards, and RUM — without sharing a workspace API key. - -Junior intentionally keeps this plugin read-only. The packaged manifest exposes only search-, fetch-, and analytics-oriented Datadog MCP tools. It does not expose notebook, monitor, SLO, or incident mutations, even though Datadog's MCP server supports some of them. - -The packaged plugin defaults to Datadog's US1 endpoint and enables the `core`, `apm`, and `error-tracking` toolsets. A Junior deployment points at exactly one Datadog org (which is pinned to exactly one region), so the site is a deployment-level setting, not a per-user or per-channel one. Operators on other sites select their site with the `DATADOG_SITE` env var — see [Non-US1 sites](#non-us1-sites) below. +The packaged plugin defaults to Datadog's US1 endpoint. A Junior deployment points at one Datadog org/site, so the site is a deployment-level setting, not a per-user or per-channel one. Operators on other sites select their site with the `DATADOG_SITE` env var; see [Non-US1 sites](#non-us1-sites). ## Install @@ -39,6 +33,18 @@ juniorNitro({ }); ``` +Set Datadog credentials in your Junior deployment environment: + +```bash +DATADOG_API_KEY=... +DATADOG_APP_KEY=... +DATADOG_SITE=datadoghq.com # optional; defaults to US1 +``` + +Use `DATADOG_API_KEY`, `DATADOG_APP_KEY`, and `DATADOG_SITE` in the Junior deployment environment. The plugin maps those host-side `DATADOG_*` values to Datadog API headers and Pup's sandbox `DD_*` env values. + +Use a Datadog application key with the smallest read scopes/role that covers the telemetry users need. + ## Optional channel defaults If a Slack channel usually investigates the same Datadog environment or service, store that as a conversation-scoped default: @@ -52,35 +58,37 @@ These defaults are optional fallbacks. If a user names a different env or servic ## Auth model -- No `DD_API_KEY`, `DD_APP_KEY`, or shared workspace integration secret is required. -- Each user completes OAuth the first time Junior calls a Datadog MCP tool on their behalf. -- Junior sends the authorization link privately, then resumes the same thread automatically after the user authorizes. -- Datadog MCP requires user-based OAuth (OAuth 2.1 + PKCE) and does not accept shared bearer tokens here, so this plugin is not suitable for fully headless automation. +- The plugin uses deployment-level Datadog API and application keys, not per-user OAuth. +- Junior keeps the real `DATADOG_API_KEY` and `DATADOG_APP_KEY` values host-side. +- Matching Datadog API requests from Pup receive host-managed `DD-API-KEY` and `DD-APPLICATION-KEY` headers. +- The sandbox receives only non-secret placeholder env values so Pup can perform its normal credential checks before making requests. +- Users do not connect or disconnect individual Datadog accounts from Junior App Home for this plugin. ## What users can do -- Search logs, events, RUM sessions, spans, and hosts scoped by env/service/time window. -- Run SQL-style log analytics (counts, top-N, group-bys) with `analyze_datadog_logs`. +- Search raw logs and aggregate log counts/top-N buckets. +- Search spans and aggregate latency/error buckets. +- Query metrics, find metric names, and inspect metric metadata/tag dimensions. - Inspect monitors and incidents to answer "is this alerting?" and "what is INC-123?". -- Fetch a trace or a notebook by ID. -- List services and their upstream/downstream dependencies from the Software Catalog. -- Query a metric by name and inspect its available tag dimensions before querying. -- Disconnect their account later from Junior App Home with `Unlink`. +- List APM services and service dependencies. +- List hosts and inspect host details. +- Fetch dashboards and notebooks by ID. +- Query RUM events, sessions, and frontend aggregates. ## Non-US1 sites -Datadog customers are region-pinned. The packaged manifest declares `DATADOG_SITE` in its `env-vars` block with a default of `datadoghq.com` (US1) and references it from `mcp.url`: +Datadog customers are region-pinned. The packaged manifest declares `DATADOG_SITE` with a default of `datadoghq.com` (US1), then exposes it to Pup as `DD_SITE`: ```yaml env-vars: DATADOG_SITE: default: datadoghq.com -mcp: - url: https://mcp.${DATADOG_SITE}/api/unstable/mcp-server/mcp?toolsets=core,apm,error-tracking +command-env: + DD_SITE: ${DATADOG_SITE} ``` -Set `DATADOG_SITE` in your Junior deployment env (e.g. Vercel project settings) to the hostname portion of your Datadog site: +Set `DATADOG_SITE` in your Junior deployment env (for example Vercel project settings) to the hostname portion of your Datadog site: | Datadog site | `DATADOG_SITE` value | | ------------ | ------------------------------------ | @@ -92,27 +100,26 @@ Set `DATADOG_SITE` in your Junior deployment env (e.g. Vercel project settings) | AP2 | `ap2.datadoghq.com` | | GovCloud | `ddog-gov.com` | -No code changes, no app-local plugin copy, no rebuild. Junior reads the variable at plugin-discovery time and hits the correct regional MCP endpoint. +The packaged API allowlist covers those standard Datadog sites. Custom or staging Datadog domains require a manifest change so Junior is allowed to inject headers for that host. ## Verify -Confirm a real user can connect and query successfully: +Confirm Junior can query Datadog successfully: 1. Ask Junior a Datadog question in a channel, for example: `What monitors are alerting for service checkout in prod right now?` -2. Complete the private OAuth flow when Junior prompts for it. -3. Confirm the thread resumes automatically with the monitor state (or incident / log / trace detail) and a Datadog deep link. -4. Open Junior App Home and confirm Datadog appears under `Connected accounts`. +2. Confirm the thread returns monitor state, incident/log/trace detail, or a clear "no results" answer. +3. Confirm the answer includes the query/time window used and a Datadog deep link when one is available. ## Failure modes -- No auth prompt or no resume: the user still needs to complete the OAuth flow. Retry the request and finish the private authorization flow when prompted. -- `401` mid-session: the Datadog OAuth token expired or was revoked; the runtime will resurface the authorization flow. Finish it and retry. -- `403 Forbidden` or `permission denied`: the user's Datadog role cannot read the requested resource. Verify their Datadog team/role assignments. -- `429 Too Many Requests`: the Datadog MCP endpoint is throttling. Junior retries once. If it still fails, the user should retry again shortly. +- `DATADOG_API_KEY` or `DATADOG_APP_KEY` missing: add both env vars to the Junior deployment and redeploy. +- `401 Unauthorized`: the API key or application key is invalid, revoked, or not being injected for the selected Datadog site. +- `403 Forbidden` or `permission denied`: the Datadog application key cannot read the requested resource. Verify its scopes/role. +- `429 Too Many Requests`: Datadog is throttling. Retry the request later or narrow the query. - Empty query results: env/service tag values are case-sensitive. Confirm the tag values exist and try a wider time window before widening the filter. -- Truncated trace response: very large traces are reported as truncated; the displayed spans are not the full trace. -- Mutation requests (create notebook, edit monitor, resolve incident): the plugin intentionally does not expose write tools. The skill will decline these. -- Wrong Datadog site: the packaged manifest defaults to US1. Operators on other sites must set `DATADOG_SITE` in the deployment env (see [Non-US1 sites](#non-us1-sites)). +- Partial span/trace output: Pup exposes span search; a trace ID search may not prove that every span in the trace was returned. +- Mutation requests (create notebook, edit monitor, submit metric, resolve incident): the plugin is read-only and the skill will decline these. +- Wrong Datadog site: set `DATADOG_SITE` in the deployment env (see [Non-US1 sites](#non-us1-sites)). ## Next step diff --git a/packages/docs/src/content/docs/extend/index.md b/packages/docs/src/content/docs/extend/index.md index 01279b7a..a50b8714 100644 --- a/packages/docs/src/content/docs/extend/index.md +++ b/packages/docs/src/content/docs/extend/index.md @@ -159,27 +159,28 @@ runtime-postinstall: - `capabilities`: actions the plugin’s skills may request, qualified as `.` - `config-keys`: provider-specific configuration keys, qualified as `.` - `api-domains` and `api-headers`: optional host-managed HTTP headers injected for matching sandbox requests +- `command-env`: optional non-secret sandbox env vars injected when provider credentials or API headers are enabled; use it for CLI placeholders and deployment defaults - `credentials`: how token auth is delivered to tools; current types are `oauth-bearer` and `github-app` - `oauth`: user OAuth setup; use it with `credentials.type: oauth-bearer` - `target`: optional credential target scope tied to a declared config key - `runtime-dependencies`: sandbox dependencies required by the plugin’s tools - `runtime-postinstall`: commands that run after dependency install and before snapshot capture - `mcp`: optional MCP server configuration for provider-scoped tool sources; `mcp.url` implies hosted HTTP transport, so `mcp.transport: http` is optional -- `env-vars`: optional map of deployment env vars the manifest may reference from `mcp.url` or `api-headers`. Each key names an env var (uppercase, `[A-Z_][A-Z0-9_]*`) and may declare a `default` for `mcp.url`; API header references cannot use defaults. +- `env-vars`: optional map of deployment env vars the manifest may reference from `mcp.url`, `api-headers`, or `command-env`. Each key names an env var (uppercase, `[A-Z_][A-Z0-9_]*`) and may declare a `default` for `mcp.url` and `command-env`; API header references cannot use defaults. - `mcp.url`: supports `${VAR}` placeholders that must be declared in `env-vars`. This lets region-pinned providers pick the right host at deploy time without a manifest fork. - `mcp.allowed-tools`: optional raw MCP tool-name allowlist when a plugin should expose only part of a provider's tool surface ### Env-var expansion in `mcp.url` -Some providers (Datadog, Sentry self-hosted, GitHub Enterprise, Linear EU, ...) have different hostnames per region or deployment. The packaged plugin manifest keeps a single `mcp.url` and declares the deployment-level env vars it may read in an `env-vars` block. Defaults live in the declaration, not inline in the URL: +Some providers (Sentry self-hosted, GitHub Enterprise, Linear EU, ...) have different hostnames per region or deployment. The packaged plugin manifest keeps a single `mcp.url` and declares the deployment-level env vars it may read in an `env-vars` block. Defaults live in the declaration, not inline in the URL: ```yaml env-vars: - DATADOG_SITE: - default: datadoghq.com + EXAMPLE_SITE: + default: example.com mcp: - url: https://mcp.${DATADOG_SITE}/api/unstable/mcp-server/mcp?toolsets=core,apm,error-tracking + url: https://mcp.${EXAMPLE_SITE}/mcp ``` The only supported placeholder form is `${NAME}` — replaced with `process.env[NAME]`, falling back to the declared `default`. Plugin discovery fails loudly at load time if `NAME` is not listed in `env-vars`, or if it is listed without a default and the env var is unset. @@ -212,6 +213,30 @@ api-headers: X-Api-Version: "2026-01-01" ``` +### Command env + +Use top-level `command-env` when a sandbox CLI needs non-secret env vars. This is commonly used for placeholder auth env vars so the CLI proceeds to make HTTP requests while Junior injects the real credentials as host-managed headers. + +`command-env` values may be literals or `${NAME}` placeholders declared in `env-vars`. Referenced env vars must declare defaults, because command env values are visible inside the sandbox and must not depend on secret deployment env vars. + +Manifests with `command-env` must also declare `credentials` or `api-headers`, since command env is delivered with the provider credential lease. + +```yaml +env-vars: + EXAMPLE_AUTH_HEADER: + EXAMPLE_SITE: + default: example.com + +api-domains: + - api.example.com +api-headers: + Authorization: ${EXAMPLE_AUTH_HEADER} + +command-env: + EXAMPLE_API_KEY: host_managed_credential + EXAMPLE_SITE: ${EXAMPLE_SITE} +``` + ### Add skills to the plugin Put at least one skill under `skills//SKILL.md`. Provider config keys belong in `plugin.yaml`, not in skill frontmatter. diff --git a/packages/junior-datadog/README.md b/packages/junior-datadog/README.md index 6d97e41d..d0476cf9 100644 --- a/packages/junior-datadog/README.md +++ b/packages/junior-datadog/README.md @@ -1,11 +1,6 @@ # @sentry/junior-datadog -> [!WARNING] -> **This plugin does not currently work.** Datadog's hosted MCP server requires OAuth Dynamic Client Registration (DCR, [RFC 7591](https://www.rfc-editor.org/rfc/rfc7591)) for third-party clients like Junior, and DCR is locked down on Datadog's side. Until Datadog exposes DCR (or an equivalent registration path) on `mcp.datadoghq.com`, Junior cannot complete the OAuth handshake and every Datadog tool call will fail. -> -> The package is kept in-tree so the integration is ready to ship the moment Datadog unblocks DCR. Do not add it to a production deployment in the meantime. - -`@sentry/junior-datadog` adds read-only Datadog telemetry workflows to Junior through Datadog's hosted MCP server. +`@sentry/junior-datadog` adds read-only Datadog telemetry workflows to Junior through Datadog's Pup CLI. Install it alongside `@sentry/junior`: @@ -21,13 +16,35 @@ juniorNitro({ }); ``` -This package does not use `DD_API_KEY`, `DD_APP_KEY`, or a shared workspace integration. Each user connects their own Datadog account the first time Junior calls a Datadog MCP tool. Junior sends the OAuth link privately and resumes the thread automatically after the user authorizes. +Set Datadog credentials in the Junior deployment environment: + +```bash +DATADOG_API_KEY=... +DATADOG_APP_KEY=... +DATADOG_SITE=datadoghq.com # optional; defaults to US1 +``` + +Use `DATADOG_API_KEY`, `DATADOG_APP_KEY`, and `DATADOG_SITE` in the Junior deployment environment. The plugin maps those host-side `DATADOG_*` values to Datadog API headers and Pup's sandbox `DD_*` env values. + +The real API and application keys stay host-side. Junior injects them into matching Datadog API requests as `DD-API-KEY` and `DD-APPLICATION-KEY` headers; the sandbox only receives non-secret placeholder values so Pup can perform its normal auth checks. -Junior intentionally keeps this package read-only by limiting the MCP tool surface to search, fetch, and log analytics tools. The plugin does not expose notebook writes, monitor edits, or other mutating Datadog tools. +Junior keeps this package read-only by setting Pup's read-only mode and by guiding the skill to use `pup --read-only --agent` commands. The plugin is intended for searches, fetches, and analytics across logs, metrics, traces/spans, monitors, incidents, dashboards, hosts, services, and RUM. ## Datadog site -The packaged manifest defaults to the US1 endpoint (`mcp.datadoghq.com`) and enables the `core`, `apm`, and `error-tracking` toolsets. Teams on other Datadog sites (US3, US5, EU, AP1, AP2, GovCloud) set `DATADOG_SITE` in their Junior deployment env to their site host (e.g. `us5.datadoghq.com`, `datadoghq.eu`, `ddog-gov.com`). No code changes or plugin copy needed. See the [Datadog plugin docs](https://junior.sentry.dev/extend/datadog-plugin/) for the full site table. +The packaged manifest defaults to the US1 API endpoint. Teams on other Datadog sites set `DATADOG_SITE` in their Junior deployment env to their site host. Setting deployment `DD_SITE` alone has no effect. + +| Datadog site | `DATADOG_SITE` value | +| ------------ | ------------------------------------ | +| US1 | _unset_ (default) or `datadoghq.com` | +| US3 | `us3.datadoghq.com` | +| US5 | `us5.datadoghq.com` | +| EU | `datadoghq.eu` | +| AP1 | `ap1.datadoghq.com` | +| AP2 | `ap2.datadoghq.com` | +| GovCloud | `ddog-gov.com` | + +The packaged API allowlist covers those standard Datadog sites. Custom or staging Datadog domains require a manifest change so the sandbox network header transform is allowed for that host. ## Optional channel defaults @@ -42,8 +59,8 @@ These defaults are optional fallbacks. If a user names a different env or servic ## Auth model -- Datadog MCP requires user-based OAuth (OAuth 2.1 + PKCE) and does not accept shared bearer tokens here. -- This package is not suitable for fully headless or unattended automation. -- Users can disconnect from Junior App Home with `Unlink`, or by asking Junior to disconnect Datadog. +- This package uses deployment-level Datadog API and application keys, not per-user OAuth. +- Use a Datadog application key with the smallest read scopes/role that covers the telemetry users need. +- Real key values never enter the sandbox env, files, or command arguments. Full setup guide: https://junior.sentry.dev/extend/datadog-plugin/ diff --git a/packages/junior-datadog/plugin.yaml b/packages/junior-datadog/plugin.yaml index 90e54dcf..0b06627c 100644 --- a/packages/junior-datadog/plugin.yaml +++ b/packages/junior-datadog/plugin.yaml @@ -1,36 +1,59 @@ name: datadog -description: Query Datadog telemetry (logs, metrics, traces, monitors, incidents, dashboards) via Datadog's hosted MCP server +description: Query Datadog telemetry (logs, metrics, traces, monitors, incidents, dashboards) with Datadog's Pup CLI config-keys: - env - service -# Datadog orgs are region-pinned. The MCP hostname must match the customer's -# Datadog site. Non-US1 operators set DATADOG_SITE to their site host (e.g. -# `us5.datadoghq.com`, `datadoghq.eu`, `ap1.datadoghq.com`, `ddog-gov.com`). -# US1 operators can leave DATADOG_SITE unset and the default applies. +capabilities: + - api + +# Datadog orgs are region-pinned. Pup routes requests to api.${DATADOG_SITE}. +# Deployment env vars use DATADOG_* names; Pup receives DD_* command env. +# Non-US1 operators set DATADOG_SITE to their site host (e.g. us5.datadoghq.com, +# datadoghq.eu, ap1.datadoghq.com, ddog-gov.com). US1 operators can leave +# DATADOG_SITE unset and the default applies. env-vars: + DATADOG_API_KEY: + DATADOG_APP_KEY: DATADOG_SITE: default: datadoghq.com -mcp: - url: https://mcp.${DATADOG_SITE}/api/unstable/mcp-server/mcp?toolsets=core,apm,error-tracking - allowed-tools: - - analyze_datadog_logs - - get_datadog_incident - - get_datadog_metric - - get_datadog_metric_context - - get_datadog_notebook - - get_datadog_trace - - search_datadog_dashboards - - search_datadog_events - - search_datadog_hosts - - search_datadog_incidents - - search_datadog_logs - - search_datadog_metrics - - search_datadog_monitors - - search_datadog_notebooks - - search_datadog_rum_events - - search_datadog_service_dependencies - - search_datadog_services - - search_datadog_spans +api-domains: + - api.datadoghq.com + - api.us3.datadoghq.com + - api.us5.datadoghq.com + - api.ap1.datadoghq.com + - api.ap2.datadoghq.com + - api.datadoghq.eu + - api.ddog-gov.com + +api-headers: + DD-API-KEY: ${DATADOG_API_KEY} + DD-APPLICATION-KEY: ${DATADOG_APP_KEY} + +command-env: + DD_API_KEY: host_managed_credential + DD_APP_KEY: host_managed_credential + DD_SITE: ${DATADOG_SITE} + DD_READ_ONLY: "1" + FORCE_AGENT_MODE: "1" + +runtime-postinstall: + - cmd: bash + args: + - -lc + - | + set -euo pipefail + version=0.58.5 + archive="pup_${version}_Linux_x86_64.tar.gz" + url="https://github.com/DataDog/pup/releases/download/v${version}/${archive}" + sha256="9543d968a6bd3b00da7ef20053717494beba7962e6cea01368d82857c8ea926b" + tmp="$(mktemp -d)" + trap 'rm -rf "$tmp"' EXIT + curl -fsSL "$url" -o "$tmp/$archive" + echo "${sha256} $tmp/$archive" | sha256sum -c - + tar -xzf "$tmp/$archive" -C "$tmp" + mkdir -p /vercel/sandbox/.junior/bin + install -m 0755 "$tmp/pup" /vercel/sandbox/.junior/bin/pup + pup --version diff --git a/packages/junior-datadog/skills/datadog/SKILL.md b/packages/junior-datadog/skills/datadog/SKILL.md index cbced9e1..c16a7cb3 100644 --- a/packages/junior-datadog/skills/datadog/SKILL.md +++ b/packages/junior-datadog/skills/datadog/SKILL.md @@ -1,11 +1,11 @@ --- name: datadog -description: Query live Datadog telemetry (logs, metrics, traces, spans, monitors, incidents, dashboards, services, hosts) through Datadog's hosted MCP server. Use when users ask to investigate production behavior in Datadog — searching logs, checking monitor status, inspecting traces or spans, looking up incidents, finding services, or correlating metrics. Do not use it for Sentry issues, repository/source-code work, or ticketing. +description: Query live Datadog telemetry (logs, metrics, traces, spans, monitors, incidents, dashboards, services, hosts) through Datadog's Pup CLI. Use when users ask to investigate production behavior in Datadog, including searching logs, checking monitor status, inspecting traces or spans, looking up incidents, finding services, or correlating metrics. Do not use it for Sentry issues, repository/source-code work, or ticketing. --- # Datadog Operations -Use this skill for Datadog observability investigations. +Use this skill for read-only Datadog observability investigations. ## Reference loading @@ -25,41 +25,43 @@ Load references conditionally based on the request: - Prefer explicit env, service, host, monitor/incident IDs, trace IDs, or Datadog URLs when the user provides them. - When the user did not specify a scope, treat `datadog.env` and `datadog.service` conversation config as optional defaults. Explicit user input always wins over config. - Only set or change `datadog.env` and `datadog.service` when the user explicitly asks to store a default for this conversation or channel. -- If the request refers to an earlier telemetry item indirectly (an incident, trace, or monitor already mentioned in the thread), inspect the current thread for the existing ID or URL before asking the user to restate it. +- If the request refers to an earlier telemetry item indirectly, inspect the current thread for the existing ID or URL before asking the user to restate it. - Ask one concise follow-up only when a search is genuinely under-specified, for example when the user asks about "errors" with no env, service, or time window hint and the thread has no prior context. -2. Use the active Datadog tools: - -- Start narrow: pick the single most direct tool for the request before reaching for broader search. - - Known incident ID → `get_datadog_incident` - - Known trace ID → `get_datadog_trace` - - Known notebook ID → `get_datadog_notebook` - - Known metric name → `get_datadog_metric` (and `get_datadog_metric_context` when the user wants available tags or dimensions) -- For exploratory questions, prefer one `search_datadog_*` call with a tight query, then one follow-up fetch if needed. -- For "what is the current error rate / log volume / top offenders" style questions, prefer `analyze_datadog_logs` (SQL-style aggregation) over pulling raw log pages back through `search_datadog_logs`. -- For service-topology questions ("what calls checkout?", "what does the payment API depend on?"), prefer `search_datadog_service_dependencies` over manually stitching spans together. -- Use `search_datadog_monitors` for "is this alerting?" or "what is monitor X doing?"; use `search_datadog_incidents` / `get_datadog_incident` for incident context. -- Use `search_datadog_rum_events` only when the user asks about real-user / browser telemetry, not for backend issues. +2. Use Pup: + +- Run Datadog commands with `pup --read-only --agent ...`. The plugin also sets read-only/agent env vars, but include the flags so command transcripts show the intended mode. +- If you are unsure about a command or flag, inspect Pup's schema with `pup --read-only --agent agent schema --compact` or the relevant `pup --read-only --agent --help` output before guessing. +- Start narrow: pick the single most direct command for the request before broader search. + - Known incident ID: `pup --read-only --agent incidents get ` + - Known monitor ID: `pup --read-only --agent monitors get ` + - Known notebook ID: `pup --read-only --agent notebooks get ` + - Known metric name: `pup --read-only --agent metrics query --query="avg:{...}" --from="15m" --to="now"`; use `metrics metadata get` or `metrics tags list` when the user wants available tags or dimensions. +- For exploratory questions, prefer one focused Pup search/list/aggregate command, then one follow-up fetch if needed. +- For "current error rate / log volume / top offenders" questions, prefer `pup logs aggregate` over pulling raw log pages back through `pup logs search`. +- For service-topology questions ("what calls checkout?", "what does the payment API depend on?"), prefer `pup apm dependencies list` or `pup apm flow-map` over stitching spans together manually. +- Use `pup monitors search` or `pup monitors list` for "is this alerting?" and `pup incidents list` / `pup incidents get` for incident context. +- Use RUM commands only when the user asks about real-user / browser telemetry, not for backend issues. 3. Bound every query: - Always constrain time windows. Default to the last 15 minutes for "right now" questions and the last 24 hours for retrospective questions; otherwise use the window the user named. - Always include `env:` when `datadog.env` is set or the user named an env. -- Always include `service:` when the user named a service or `datadog.service` is set and the tool is service-scoped. -- Cap result size. Prefer the default or small page sizes; do not page through thousands of logs when an aggregate tool answers the question. +- Always include `service:` when the user named a service or `datadog.service` is set and the command is service-scoped. +- Cap result size. Prefer the default or small page sizes; do not page through thousands of logs when an aggregate command answers the question. 4. Report the result: - Return the concrete answer first (counts, status, incident severity, trace timing, top offenders), then a short evidence block. -- Include Datadog deep links (e.g. `https://app.datadoghq.com/logs?query=...`, `https://app.datadoghq.com/apm/trace/`, `https://app.datadoghq.com/incidents/`) so Slack users can click through. -- Preserve interesting spans, log lines, or metric values inline only when they are the evidence for the answer. Do not dump raw tool output. -- Keep routine tool chatter silent. Do not narrate each MCP search or fetch step. +- Include Datadog deep links when Pup returns them or when you can construct a stable app link from an ID. Do not fabricate links from incomplete identifiers. +- Preserve interesting spans, log lines, or metric values inline only when they are evidence for the answer. Do not dump raw command output. +- Keep routine tool chatter silent. Do not narrate every Pup search or fetch step. ## Guardrails -- Read-only only in this skill. Do not create, edit, mute, or resolve monitors, incidents, notebooks, dashboards, SLOs, or feature flags — the plugin intentionally does not expose those tools. +- Read-only only in this skill. Do not create, edit, mute, delete, import, submit, or resolve monitors, incidents, notebooks, dashboards, SLOs, metrics, API keys, RUM resources, or other Datadog objects. - Log, RUM, APM, and incident payloads can contain PII or sensitive customer data. Quote only the minimum needed to answer the question. Do not paste full raw log bodies or span payloads when a summary plus a deep link is enough. -- If a Datadog tool returns a generic `403`, `permission denied`, or similar, stop and tell the user the current Datadog connection could not access the requested resource. Do not guess at missing RBAC scopes. +- If Pup returns `403`, `permission denied`, or similar, stop and tell the user the Datadog credentials could not access the requested resource. Do not guess at missing RBAC scopes. - If Datadog responds with `429 Too Many Requests`, wait briefly and retry the same query once. If it still fails, report the throttle and stop. -- For large traces that the server marks as truncated, report that fact; do not pretend the shown spans are complete. +- For large traces or span responses that are incomplete, report that fact; do not pretend the shown spans are complete. - Do not use this skill for Sentry issues, Linear/GitHub ticketing, or source-code investigation. Hand those off to the matching skill. diff --git a/packages/junior-datadog/skills/datadog/references/api-surface.md b/packages/junior-datadog/skills/datadog/references/api-surface.md index 026bea6c..3e1c7b9a 100644 --- a/packages/junior-datadog/skills/datadog/references/api-surface.md +++ b/packages/junior-datadog/skills/datadog/references/api-surface.md @@ -4,56 +4,54 @@ Use this reference for any Datadog operation. ## Provider surface -The packaged plugin points at Datadog's hosted remote MCP server and enables the `core`, `apm`, and `error-tracking` toolsets. Tool exposure is intentionally limited to the read-oriented surface below. - -### Tools exposed in this skill - -| Tool | Intent | -| ------------------------------------- | ----------------------------------------------------------------------------------- | -| `search_datadog_logs` | Search raw log events by filter (service, host, env, status, query, time window). | -| `analyze_datadog_logs` | SQL-style aggregation over logs for counts, group-bys, top-N, and numeric analysis. | -| `search_datadog_events` | Datadog Events API: deployments, infra changes, alerts, status events. | -| `search_datadog_metrics` | List available metrics by name pattern, tag, or service. | -| `get_datadog_metric` | Query a specific metric time series over a time window. | -| `get_datadog_metric_context` | Fetch metadata and available tag dimensions for a metric. | -| `search_datadog_spans` | Search APM spans by service, operation, tags, time, error state. | -| `get_datadog_trace` | Fetch a full trace by trace ID. | -| `search_datadog_services` | List services from the Software Catalog with ownership and tag metadata. | -| `search_datadog_service_dependencies` | Upstream/downstream service map for a service, or services owned by a team. | -| `search_datadog_hosts` | List monitored hosts with tags and health state. | -| `search_datadog_monitors` | List monitors, their statuses, and alert conditions. | -| `search_datadog_incidents` | List incidents with severity, state, and metadata. | -| `get_datadog_incident` | Retrieve a specific incident by ID (timeline detail may be absent). | -| `search_datadog_dashboards` | List available dashboards. | -| `search_datadog_notebooks` | List Datadog notebooks by author, tag, or content. | -| `get_datadog_notebook` | Fetch a notebook by ID. | -| `search_datadog_rum_events` | Search Datadog RUM (Real User Monitoring) events for browser / frontend issues. | - -### Tools intentionally not exposed - -- Notebook mutations (`create_datadog_notebook`, `edit_datadog_notebook`). -- Monitor, SLO, or incident mutations. -- Feature-flag, DBM, and security toolsets (the packaged URL does not request them). +The packaged plugin installs Datadog's `pup` CLI and configures it for agent-mode, read-only Datadog API access. Pup defaults to JSON output, which is the right format for analysis. + +Run commands as `pup --read-only --agent ...`. If a command surface is unclear, inspect `pup --read-only --agent agent schema --compact` or `pup --read-only --agent --help` before guessing. + +### Read-oriented commands + +| Need | Pup command pattern | +| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | +| Raw logs | `pup --read-only --agent logs search --query="service:checkout env:prod status:error" --from="15m" --limit=20` | +| Log aggregation | `pup --read-only --agent logs aggregate --query="service:checkout env:prod" --compute=count --group-by=status` | +| Metrics | `pup --read-only --agent metrics list`, `metrics search`, `metrics query`, `metrics metadata get`, `metrics tags list` | +| Spans / traces | `pup --read-only --agent traces search --query="service:checkout status:error" --from="15m" --limit=20` | +| Span aggregation | `pup --read-only --agent traces aggregate --query="service:checkout" --compute="percentile(@duration, 95)" --group-by=resource_name` | +| APM services | `pup --read-only --agent apm services list --env prod`, `apm services stats --env prod` | +| Service dependencies | `pup --read-only --agent apm dependencies list --env prod` or `apm flow-map --query="service:checkout"` | +| Monitors | `pup --read-only --agent monitors search --query="service:checkout"`, `monitors list --tags=service:checkout`, `monitors get ` | +| Incidents | `pup --read-only --agent incidents list --query="state:active" --limit=20`, `incidents get ` | +| Hosts | `pup --read-only --agent infrastructure hosts list --filter="env:prod" --count=50`, `infrastructure hosts get ` | +| Dashboards | `pup --read-only --agent dashboards list`, `dashboards get `, `dashboards url ` | +| Notebooks | `pup --read-only --agent notebooks list`, `notebooks get ` | +| RUM events and sessions | `pup --read-only --agent rum events --query='@type:error'`, `rum aggregate`, `rum sessions search` | + +### Commands to avoid + +Do not run write commands, even with `--read-only` present: + +- `create`, `update`, `delete`, `import`, `submit`, `cancel`, `mute`, `resolve`, or any command that writes a JSON file to Datadog. +- API key, app key, user, org policy, security, SLO, dashboard, monitor, incident, notebook, RUM metric, retention filter, playlist, or workflow mutations. If a user asks for a mutation, stop and explain that this skill is read-only. ## Operation patterns -| Intent | Minimum tool pattern | -| ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------- | -| "Why is service X failing right now?" | `search_datadog_monitors` + `analyze_datadog_logs` (top error counts by status or message) + optionally `get_datadog_trace` for one failing trace. | -| "Show me errors for service X in the last hour." | `analyze_datadog_logs` for counts/top-N first; only fall back to `search_datadog_logs` if the user asked for specific log lines. | -| "What is the status of monitor X?" | `search_datadog_monitors` with the monitor name/tag, then cite state + last transition time. | -| "Tell me about incident INC-123." | `get_datadog_incident` directly. Only fall back to `search_datadog_incidents` if no ID is known. | -| "What depends on the checkout service?" | `search_datadog_service_dependencies` scoped to that service. | -| "How did this trace spend its time?" | `get_datadog_trace` by ID; cite the slowest spans. | -| "What tag values are valid for this metric?" | `get_datadog_metric_context` before `get_datadog_metric`. | -| "Which hosts are unhealthy?" | `search_datadog_hosts` filtered by health/tags. | -| "Find slow page loads." | `search_datadog_rum_events` with a page/speed filter. | +| Intent | Minimum command pattern | +| ------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------- | +| "Why is service X failing right now?" | `monitors search/list` + `logs aggregate` for top errors + optionally `traces search` for representative failing spans. | +| "Show me errors for service X in the last hour." | `logs aggregate` for counts/top-N first; only use `logs search` if the user asked for specific log lines. | +| "What is the status of monitor X?" | `monitors search --query=...` or `monitors get `, then cite state and last transition if present. | +| "Tell me about incident INC-123." | `incidents get ` directly. Only fall back to `incidents list --query=...` if no ID is known. | +| "What depends on checkout?" | `apm dependencies list --env ` or `apm flow-map --query="service:checkout" --env `. | +| "How did this trace spend its time?" | `traces search --query="trace_id:"`; cite slowest/error spans. Pup exposes span search, not a guaranteed full tree. | +| "What tag values are valid for this metric?" | `metrics metadata get ` and `metrics tags list --from=... --to=...` before `metrics query`. | +| "Which hosts are unhealthy?" | `infrastructure hosts list --filter=...` with env/service/role filters. | +| "Find slow page loads." | `rum aggregate` or `rum events` with RUM facets and a bounded time window. | ## Content expectations -- Translate Slack-thread wording into stable observability language (env, service, status, span, monitor, incident, host). -- Preserve material URLs present in the conversation (Sentry, GitHub, dashboards, prior Datadog links) when they add evidence. -- Include Datadog deep links (`https://app.datadoghq.com/...`) with the answer so users can click through. -- Label assumptions clearly when the thread leaves important details uncertain (chosen env, chosen time window, chosen service). +- Translate Slack-thread wording into stable observability language: env, service, status, span, monitor, incident, host. +- Preserve material URLs present in the conversation when they add evidence. +- Include Datadog deep links when Pup returns them or when a stable ID-specific link is obvious. +- Label assumptions clearly when the thread leaves important details uncertain: chosen env, chosen time window, chosen service. diff --git a/packages/junior-datadog/skills/datadog/references/common-use-cases.md b/packages/junior-datadog/skills/datadog/references/common-use-cases.md index 086c2a7a..b4140740 100644 --- a/packages/junior-datadog/skills/datadog/references/common-use-cases.md +++ b/packages/junior-datadog/skills/datadog/references/common-use-cases.md @@ -5,77 +5,82 @@ Use these patterns to shape concrete Datadog requests. ## 1. Triage "service X is failing right now" - Default the time window to the last 15 minutes unless the user gave a different one. -- Constrain by `service:` and `env:` (explicit user input wins; fall back to `datadog.service` / `datadog.env`). -- `search_datadog_monitors` for `service:` first — a firing monitor usually names the failure mode. -- Then `analyze_datadog_logs` to aggregate by status/level/message to find the top error shape. -- If the user asks "why", fetch one representative failing trace with `get_datadog_trace` or `search_datadog_spans` filtered to `service: status:error`. -- Report monitor state, top error, and one failing trace link — not a dump. +- Constrain by `service:` and `env:`. Explicit user input wins; fall back to `datadog.service` / `datadog.env`. +- Run `pup --read-only --agent monitors search --query="service:"` or `monitors list --tags=service:,env:` first; a firing monitor usually names the failure mode. +- Then run `pup --read-only --agent logs aggregate --query="service: env:" --from="15m" --to="now" --compute=count --group-by=status` or group by an error facet such as `@error.kind`. +- If the user asks "why", search representative failing spans with `pup --read-only --agent traces search --query="service: env: status:error" --from="15m" --limit=20`. +- Report monitor state, top error, and one representative trace/span link when available. ## 2. "Is this monitor alerting?" -- Use `search_datadog_monitors` with the monitor name, tag, or ID. -- Report state (`OK`, `Warn`, `Alert`, `No Data`), last transition, and the monitor link. -- If the monitor is in `No Data`, note that explicitly — it is not the same as healthy. +- If the user gave a monitor ID, run `pup --read-only --agent monitors get `. +- Otherwise run `pup --read-only --agent monitors search --query=""` or `monitors list --name="" --tags=...`. +- Report state (`OK`, `Warn`, `Alert`, `No Data`), last transition if present, and the monitor link. +- If the monitor is in `No Data`, note that explicitly; it is not the same as healthy. ## 3. "Tell me about incident INC-123" or "What is the status of the Redis incident?" -- If the user named the incident ID, go straight to `get_datadog_incident`. -- If only a topic was named, use `search_datadog_incidents` filtered by active/severity and scan for a match in the thread's time window. -- Report severity, state, owner, and link to the incident. -- Note that incident timeline detail may be absent from the MCP response; do not fabricate timeline entries. +- If the user named the incident ID, run `pup --read-only --agent incidents get `. +- If only a topic was named, run `pup --read-only --agent incidents list --query="state:active " --limit=20` and scan for a match in the thread's time window. +- Report severity, state, owner/team if present, and link to the incident. +- Do not fabricate timeline entries if Pup does not return them. ## 4. Log search with a specific query -- Default to `search_datadog_logs` only when the user explicitly wants raw log lines. -- Constrain with `service:`, `env:`, `status:`, `host:`, or `@:` as appropriate (see `query-syntax.md`). +- Use `pup --read-only --agent logs search` only when the user explicitly wants raw log lines. +- Constrain with `service:`, `env:`, `status:`, `host:`, or `@:` as appropriate. - Cap page size and time window to avoid huge responses. -- Report a short summary plus a Datadog logs deep link. Quote only the minimum log content. +- Report a short summary plus a Datadog logs deep link when available. Quote only the minimum log content. ## 5. "What are the top errors for service X right now?" -- Prefer `analyze_datadog_logs` with a SQL-style `GROUP BY status` or `GROUP BY @http.status_code` / `GROUP BY @error.kind`. +- Prefer `pup --read-only --agent logs aggregate --query="service: env: status:error" --compute=count --group-by=@error.kind --limit=10`. +- Use `--group-by=@http.status_code`, `status`, `service`, `host`, or another facet when it better matches the question. - Report the top 3-5 buckets with counts, not an exhaustive table. -- Include the aggregated query link so the user can open the same view in Datadog. ## 6. Trace inspection by ID -- Use `get_datadog_trace` with the trace ID. -- Cite the top 3 slowest or error-tagged spans (service, operation, duration, error state). -- If the server marks the trace as truncated, say so — some spans are not present. +- Pup exposes span search. Use `pup --read-only --agent traces search --query="trace_id:" --from= --to= --limit=100`. +- Cite the top 3 slowest or error-tagged spans: service, resource/operation, duration, error state. +- If the returned spans look partial, say so. Do not claim a complete trace tree unless the output proves it. ## 7. Span search for a known error pattern -- Use `search_datadog_spans` with explicit filters like `service: status:error resource_name:"..."` and a bounded time window. -- Report span counts plus the most illustrative span's trace link. +- Use `pup --read-only --agent traces search --query='service: env: status:error resource_name:"..."' --from=... --to=...`. +- For counts or latency buckets, use `pup --read-only --agent traces aggregate --query="service: env:" --compute=count --group-by=resource_name`. +- Report counts plus the most illustrative span's trace link when available. ## 8. Service topology lookup -- Use `search_datadog_service_dependencies` to answer "what calls X?" or "what does X depend on?" or "what does team Y own?". -- Return the dependency list with service names and link back to the Service Catalog page. +- Use `pup --read-only --agent apm dependencies list --env --from=... --to=...` to answer dependency questions. +- Use `pup --read-only --agent apm flow-map --query="service:" --env --from=... --to=...` when the question is centered on one service. +- Return the dependency list with service names and a Service Catalog/APM link when available. ## 9. Metric lookup -- Use `search_datadog_metrics` when the user is unsure of the metric name. -- Once the metric name is known, use `get_datadog_metric` with the time window and tag filters. -- Use `get_datadog_metric_context` before querying if the user wants to know which tags (`env`, `service`, `host`, ...) are usable. -- Report headline numbers (current, peak, delta) plus a metric explorer link. +- Use `pup --read-only --agent metrics search --query=""` or `metrics list --filter=""` when the user is unsure of the metric name. +- Once the metric name is known, use `pup --read-only --agent metrics query --query="avg:{env:,service:}" --from=... --to=...`. +- Use `pup --read-only --agent metrics metadata get ` and `metrics tags list --from=... --to=...` before querying if the user wants valid tags. +- Report headline numbers: current, peak, delta, or bucketed values as appropriate. ## 10. Host health -- Use `search_datadog_hosts` filtered by tag, role, or `down:true`. -- Return counts, the list of unhealthy hosts (names + tags), and a host map link. +- Use `pup --read-only --agent infrastructure hosts list --filter="env: " --count=50`. +- Use `pup --read-only --agent infrastructure hosts get ` for a specific host. +- Return counts, unhealthy host names/tags, and a host map link when available. ## 11. RUM / frontend slowness -- Use `search_datadog_rum_events` only when the user asked about end-user / browser experience. +- Use `pup --read-only --agent rum aggregate` for top views/errors and `rum events` only when the user needs example events. +- Use `pup --read-only --agent rum sessions search` for session questions. - Constrain to `@type:error`, slow page loads, or specific views; bound the time window. -- Do not use RUM for backend errors — those live in logs/APM. +- Do not use RUM for backend errors; those live in logs/APM. ## 12. Dashboards and notebooks -- `search_datadog_dashboards` to list dashboards by topic, team, or tag — useful for "do we already have a dashboard for X?". -- `search_datadog_notebooks` + `get_datadog_notebook` for reading existing investigation notebooks. -- This skill does not create or edit dashboards or notebooks. If the user asks, stop and say so. +- `pup --read-only --agent dashboards list` and `dashboards get ` are useful for "do we already have a dashboard for X?". +- `pup --read-only --agent notebooks list` and `notebooks get ` are for reading investigation notebooks. +- This skill does not create or edit dashboards or notebooks. ## 13. Storing channel defaults diff --git a/packages/junior-datadog/skills/datadog/references/query-syntax.md b/packages/junior-datadog/skills/datadog/references/query-syntax.md index 6ee2c6f7..ba961bbf 100644 --- a/packages/junior-datadog/skills/datadog/references/query-syntax.md +++ b/packages/junior-datadog/skills/datadog/references/query-syntax.md @@ -1,22 +1,22 @@ # Query Syntax -Use this reference when forming Datadog log queries, span queries, and log analytics (`analyze_datadog_logs`) SQL. +Use this reference when forming Datadog log queries, span queries, RUM queries, and Pup aggregate commands. ## Log search query syntax Datadog log search queries are tag-and-facet based. Core building blocks: -| Form | Meaning | -| ------------------ | -------------------------------------------------------------------- | -| `service:` | Reserved attribute — service emitting the log. | -| `env:` | Reserved attribute — deployment environment tag. | -| `host:` | Reserved attribute — emitting host. | -| `status:` | Log level: `error`, `warn`, `info`, `debug`, etc. | -| `source:` | Log source integration (e.g. `nginx`, `python`). | -| `@:` | Faceted attribute (custom JSON field), e.g. `@http.status_code:500`. | -| `"some phrase"` | Free-text phrase search. | -| `AND`, `OR`, `-` | Boolean ops; `-` negates. Default operator between terms is `AND`. | -| `(a OR b) AND c` | Parenthesized boolean expression. | +| Form | Meaning | +| ------------------ | ------------------------------------------------------------------- | +| `service:` | Reserved attribute: service emitting the log. | +| `env:` | Reserved attribute: deployment environment tag. | +| `host:` | Reserved attribute: emitting host. | +| `status:` | Log level: `error`, `warn`, `info`, `debug`, etc. | +| `source:` | Log source integration, for example `nginx` or `python`. | +| `@:` | Faceted attribute: custom JSON field, e.g. `@http.status_code:500`. | +| `"some phrase"` | Free-text phrase search. | +| `AND`, `OR`, `-` | Boolean ops; `-` negates. Default operator between terms is `AND`. | +| `(a OR b) AND c` | Parenthesized boolean expression. | Common examples: @@ -31,47 +31,65 @@ Tips: - `status` and `@http.status_code` are different. `status` is the log level; `@http.status_code` is the HTTP response code. - Reserved attributes (`service`, `env`, `host`, `status`, `source`) do not take the `@` prefix. Custom fields do. +## Pup log commands + +- Raw logs: `pup --read-only --agent logs search --query="service:checkout env:prod status:error" --from="15m" --to="now" --limit=20` +- Alternate v2 listing: `pup --read-only --agent logs list --query="service:checkout env:prod" --from="1h" --limit=20` +- Aggregation: `pup --read-only --agent logs aggregate --query="service:checkout env:prod status:error" --compute=count --group-by=@error.kind --limit=10` + +`logs aggregate` options to prefer for analytics: + +- `--compute=count` for volume. +- `--compute="avg(@duration)"`, `sum(...)`, `min(...)`, `max(...)`, or `percentile(@duration, 95)` for numeric fields. +- `--group-by=status`, `service`, `host`, `@http.status_code`, `@error.kind`, or another facet. +- `--limit=10` unless the user needs more. + ## Span / APM search APM span search shares the same query language, plus a few APM-specific attributes: -| Attribute | Meaning | -| ------------------ | ------------------------------------------ | -| `service:` | Service emitting the span. | -| `env:` | Deployment environment tag. | -| `operation_name:X` | Span operation name (e.g. `http.request`). | -| `resource_name:X` | Endpoint or handler. | -| `status:error` | Span is marked as an error. | -| `duration:>500ms` | Range filter on span duration. | +| Attribute | Meaning | +| ------------------ | ----------------------------------------- | +| `service:` | Service emitting the span. | +| `env:` | Deployment environment tag. | +| `operation_name:X` | Span operation name, e.g. `http.request`. | +| `resource_name:X` | Endpoint or handler. | +| `status:error` | Span is marked as an error. | +| `@duration:>...` | Duration filter in nanoseconds. | + +Commands: + +- `pup --read-only --agent traces search --query="service:checkout env:prod status:error" --from="15m" --limit=20` +- `pup --read-only --agent traces aggregate --query="service:checkout env:prod" --compute="percentile(@duration, 95)" --group-by=resource_name` +- For a trace ID, use `traces search --query="trace_id:"` with a window that brackets the trace. Pup returns matching spans; do not assume it returned a complete tree unless the output proves it. + +## RUM queries -## `analyze_datadog_logs` SQL +Use RUM only for browser/user-experience questions: -`analyze_datadog_logs` takes SQL-like aggregations over the same log data. Prefer it for counts, top-N, group-bys, and time-bucketed analytics instead of paging raw logs. +- `pup --read-only --agent rum events --query='@type:error @application.name:"Web"' --from="1h" --limit=20` +- `pup --read-only --agent rum aggregate --query='@type:view' --compute="percentile(@view.loading_time, 95)" --group-by=@view.name` +- `pup --read-only --agent rum sessions search --query='@session.type:user' --from="1h" --limit=20` -Conventions: +## Metric queries -- Wrap log query filters in a `WHERE` clause using the same log-search query syntax (quoted as a string). -- Use `COUNT(*)` for volume, `COUNT(DISTINCT )` for unique cardinality. -- `GROUP BY` faceted fields (without `@` in the SQL form — the tool's schema specifies how to reference them; follow the tool's input schema exactly). -- Cap with `ORDER BY ... DESC LIMIT N` — top 5-10 is usually enough. +Datadog metric query strings follow the usual metric explorer shape: -Example intents (shape — not a literal string; call the tool with the input schema it advertises): +- `avg:system.cpu.user{env:prod,service:checkout}` +- `sum:trace.http.request.errors{env:prod,service:checkout}.as_count()` +- `p95:trace.http.request.duration{env:prod,service:checkout}` -- Top 10 services by error count in the last hour. -- HTTP 5xx count by status code in the last 15 minutes, grouped by `@http.status_code`. -- Log volume by `host` over the last hour to spot a noisy emitter. +Use `metrics search` or `metrics list` to find names, `metrics metadata get` for metadata, and `metrics tags list` for tag dimensions before querying when needed. ## Time windows - For "right now" questions, default to the last 15 minutes. - For "what happened earlier today" questions, default to the last 24 hours. - For incident-linked questions, prefer a window that brackets the incident `created` time. -- Always include a time window — unbounded queries are slow and easy to misinterpret. +- Always include a time window. Unbounded queries are slow and easy to misinterpret. ## What to cite back -- The exact query string used (`service:checkout env:prod status:error`) — users often want to click through. -- A Datadog deep link that encodes the same filter: - - `https://app.datadoghq.com/logs?query=&from_ts=&to_ts=` - - `https://app.datadoghq.com/apm/traces?query=` +- The exact query string used, for example `service:checkout env:prod status:error`. - The time window you used. +- A Datadog deep link when Pup returns one or when a stable ID-specific app link is available. diff --git a/packages/junior-datadog/skills/datadog/references/troubleshooting-workarounds.md b/packages/junior-datadog/skills/datadog/references/troubleshooting-workarounds.md index 913b488d..f0e6f80b 100644 --- a/packages/junior-datadog/skills/datadog/references/troubleshooting-workarounds.md +++ b/packages/junior-datadog/skills/datadog/references/troubleshooting-workarounds.md @@ -1,17 +1,23 @@ # Troubleshooting and Workarounds -Use this reference when Datadog MCP calls fail or return unexpected results. +Use this reference when Pup commands fail or return unexpected results. ## Permission and scope errors -- A Datadog API returning `403 Forbidden` or `permission denied` means the user's Datadog role cannot read that resource (metrics, APM, incidents, RUM, etc.). -- Stop and tell the user the current Datadog connection could not access the requested data. Suggest they verify their Datadog role/team. +- A `403 Forbidden` or `permission denied` response means the configured Datadog API/application keys cannot read that resource: metrics, APM, incidents, RUM, and so on. +- Stop and tell the user the current Datadog integration could not access the requested data. Suggest the operator verify the Datadog application key scopes/role. - Do not guess specific missing permission names unless Datadog explicitly named one in the error. - Do not loop retrying a 403. +## Authentication errors + +- A `401 Unauthorized`, `missing API key`, or `missing application key` error usually means `DATADOG_API_KEY` or `DATADOG_APP_KEY` is missing from the Junior deployment env, or the key was revoked. +- Pup receives placeholder env values in the sandbox so it will make HTTP requests; the host injects the real `DD-API-KEY` and `DD-APPLICATION-KEY` headers for Datadog API domains. +- Do not ask the user to paste keys into Slack or the sandbox. Tell the operator to fix the deployment env and retry. + ## Rate limits -- Datadog throttles the unstable MCP endpoint. A `429 Too Many Requests` response is expected under load. +- Datadog API endpoints can return `429 Too Many Requests`. - Retry the same query once after a short wait. - If it fails again, report the throttle and stop. Do not fall back to larger scans that will throttle harder. @@ -19,21 +25,24 @@ Use this reference when Datadog MCP calls fail or return unexpected results. - Double-check that `env:` and `service:` match real values. Datadog tag values are case-sensitive. - Widen the time window before widening the filter. Many "no results" cases are just too narrow a window. -- If searching logs with `@:value`, confirm the field exists as a facet; custom log attributes must be facetized in Datadog to be searchable. -- If an expected monitor or incident is missing, the user's account may not have access to that workspace or team. +- If searching logs or RUM with `@:value`, confirm the field exists as a facet. +- If an expected monitor or incident is missing, the application key may not have access to that team/resource. ## Too many results / large payloads -- Prefer `analyze_datadog_logs` with `GROUP BY` + `LIMIT` over paging raw logs. -- For traces marked truncated by the server, say so in the reply. Do not pretend the shown spans are complete. +- Prefer `pup --read-only --agent logs aggregate` or `traces aggregate` with `--group-by` + `--limit` over paging raw events. +- For span/trace responses that look partial, say so in the reply. Do not pretend the shown spans are complete. - Quote only the minimum log / span / metric content needed as evidence. Link to Datadog for the rest. ## Multiple Datadog sites -- The packaged plugin defaults to the US1 endpoint (`mcp.datadoghq.com`). The manifest declares `DATADOG_SITE` in its `env-vars` block with a default of `datadoghq.com` and references it from `mcp.url` as `${DATADOG_SITE}`, so non-US1 operators (US3, US5, EU, AP1, AP2, GovCloud) set `DATADOG_SITE` in their Junior deployment env to their site host (e.g. `us5.datadoghq.com`, `datadoghq.eu`, `ddog-gov.com`). Users hitting auth failures against the wrong regional endpoint should have the operator confirm `DATADOG_SITE` is set correctly. -- If the user's Datadog account lives on a different site than the deployment is configured for, advise the operator to update the `DATADOG_SITE` environment variable. Do not try to work around this silently inside a turn. +- The packaged plugin defaults to US1 (`datadoghq.com`) and sets Pup's `DD_SITE` from the manifest `DATADOG_SITE` env var. +- Non-US1 operators set `DATADOG_SITE` in their Junior deployment env to their site host, for example `us5.datadoghq.com`, `datadoghq.eu`, or `ddog-gov.com`. +- Setting deployment `DD_SITE` alone has no effect; the plugin owns Pup's sandbox `DD_SITE` through `DATADOG_SITE`. +- The packaged plugin allows the standard Datadog API hosts for US1, US3, US5, EU, AP1, AP2, and GovCloud. A custom or staging Datadog domain needs a manifest change so the API domain allowlist matches. +- If the user's Datadog account lives on a different site than the deployment is configured for, advise the operator to update `DATADOG_SITE`. Do not try to work around this silently inside a turn. ## Read-only scope -- This skill intentionally exposes only read-oriented Datadog tools. -- If the user asks to create a notebook, edit a monitor, mute an alert, or resolve an incident, stop and tell them those actions are not in scope. Do not attempt to approximate the mutation from read tools. +- This skill intentionally uses only read-oriented Pup commands. +- If the user asks to create a notebook, edit a monitor, mute an alert, submit a metric, or resolve an incident, stop and tell them those actions are not in scope. diff --git a/packages/junior/src/chat/capabilities/factory.ts b/packages/junior/src/chat/capabilities/factory.ts index 2d599ee0..6a69cf73 100644 --- a/packages/junior/src/chat/capabilities/factory.ts +++ b/packages/junior/src/chat/capabilities/factory.ts @@ -67,6 +67,9 @@ export function createSkillCapabilityRuntime( provider: name, headerTransforms: () => resolveTestApiHeaderTransforms(plugin.manifest), + ...(plugin.manifest.commandEnv + ? { env: plugin.manifest.commandEnv } + : {}), }) : createPluginBroker(name, { userTokenStore }); continue; @@ -86,6 +89,9 @@ export function createSkillCapabilityRuntime( resolveTestApiHeaderTransforms(plugin.manifest), } : {}), + ...(plugin.manifest.commandEnv + ? { env: plugin.manifest.commandEnv } + : {}), envKey: credentials.authTokenEnv, placeholder, }) diff --git a/packages/junior/src/chat/credentials/test-broker.ts b/packages/junior/src/chat/credentials/test-broker.ts index 95625d5d..1dd84f97 100644 --- a/packages/junior/src/chat/credentials/test-broker.ts +++ b/packages/junior/src/chat/credentials/test-broker.ts @@ -11,6 +11,7 @@ interface TestBrokerConfig { domains?: string[]; apiHeaders?: Record; headerTransforms?: () => CredentialHeaderTransform[]; + env?: Record; envKey?: string; placeholder?: string; } @@ -27,10 +28,12 @@ export class TestCredentialBroker implements CredentialBroker { const token = process.env.EVAL_TEST_CREDENTIAL_TOKEN?.trim() || "eval-test-token"; const expiresAt = new Date(Date.now() + 5 * 60 * 1000).toISOString(); - const env = - this.config.envKey && this.config.placeholder + const env = { + ...(this.config.env ?? {}), + ...(this.config.envKey && this.config.placeholder ? { [this.config.envKey]: this.config.placeholder } - : {}; + : {}), + }; const tokenTransforms = this.config.domains?.map((domain) => ({ domain, diff --git a/packages/junior/src/chat/plugins/auth/api-headers-broker.ts b/packages/junior/src/chat/plugins/auth/api-headers-broker.ts index ef35550e..8c659fef 100644 --- a/packages/junior/src/chat/plugins/auth/api-headers-broker.ts +++ b/packages/junior/src/chat/plugins/auth/api-headers-broker.ts @@ -60,7 +60,7 @@ export function createApiHeadersBroker( return { id: randomUUID(), provider, - env: {}, + env: { ...(manifest.commandEnv ?? {}) }, headerTransforms, expiresAt: new Date(Date.now() + MAX_LEASE_MS).toISOString(), metadata: { diff --git a/packages/junior/src/chat/plugins/auth/github-app-broker.ts b/packages/junior/src/chat/plugins/auth/github-app-broker.ts index eddf765e..697661b3 100644 --- a/packages/junior/src/chat/plugins/auth/github-app-broker.ts +++ b/packages/junior/src/chat/plugins/auth/github-app-broker.ts @@ -284,7 +284,7 @@ export function createGitHubAppBroker( return { id: randomUUID(), provider, - env: { [authTokenEnv]: placeholder }, + env: { ...(manifest.commandEnv ?? {}), [authTokenEnv]: placeholder }, headerTransforms: mergeHeaderTransforms([ ...pluginHeaderTransforms(), ...leaseDomains.map((domain) => ({ @@ -338,7 +338,7 @@ export function createGitHubAppBroker( return { id: randomUUID(), provider, - env: { [authTokenEnv]: placeholder }, + env: { ...(manifest.commandEnv ?? {}), [authTokenEnv]: placeholder }, headerTransforms: mergeHeaderTransforms([ ...pluginHeaderTransforms(), ...leaseDomains.map((domain) => ({ diff --git a/packages/junior/src/chat/plugins/auth/oauth-bearer-broker.ts b/packages/junior/src/chat/plugins/auth/oauth-bearer-broker.ts index 8da97783..449bc5bb 100644 --- a/packages/junior/src/chat/plugins/auth/oauth-bearer-broker.ts +++ b/packages/junior/src/chat/plugins/auth/oauth-bearer-broker.ts @@ -84,7 +84,10 @@ export function createOAuthBearerBroker( return { id: randomUUID(), provider, - env: { [authTokenEnv]: authTokenPlaceholder }, + env: { + ...(manifest.commandEnv ?? {}), + [authTokenEnv]: authTokenPlaceholder, + }, headerTransforms: mergeHeaderTransforms([ ...pluginHeaderTransforms(), ...apiDomains.map((domain) => ({ diff --git a/packages/junior/src/chat/plugins/manifest.ts b/packages/junior/src/chat/plugins/manifest.ts index 3bc572c4..979ca67d 100644 --- a/packages/junior/src/chat/plugins/manifest.ts +++ b/packages/junior/src/chat/plugins/manifest.ts @@ -268,6 +268,7 @@ const manifestSourceSchema = z .optional(), "api-domains": apiDomainsSchema.optional(), "api-headers": stringMapSchema.optional(), + "command-env": stringMapSchema.optional(), credentials: z .record(z.string(), z.unknown(), { error: "must be an object when provided", @@ -383,6 +384,51 @@ function normalizeRequiredApiHeaders( return apiHeaders; } +function assertCommandEnvReferencesArePublic( + value: string, + envVars: Record, + context: string, +): void { + for (const match of value.matchAll(ENV_PLACEHOLDER_RE)) { + const name = match[1] as string; + if (!Object.prototype.hasOwnProperty.call(envVars, name)) { + throw new Error( + `${context} references env var ${name} which is not declared in env-vars`, + ); + } + if (envVars[name]?.default === undefined) { + throw new Error( + `${context} references env var ${name}, but command-env env vars must declare defaults`, + ); + } + } +} + +function normalizeCommandEnv( + value: Record, + prefix: string, + envVars: Record, +): Record { + const env = normalizeStringMap(value, prefix); + if (!env) { + throw new Error(`${prefix} must contain at least one env var`); + } + + for (const [key, envValue] of Object.entries(env)) { + if (!ENV_VAR_NAME_RE.test(key)) { + throw new Error(`${prefix}.${key} must be an uppercase env var name`); + } + assertCommandEnvReferencesArePublic(envValue, envVars, `${prefix}.${key}`); + } + + return Object.fromEntries( + Object.entries(env).map(([key, envValue]) => [ + key, + expandEnvPlaceholders(envValue, envVars, `${prefix}.${key}`), + ]), + ); +} + function normalizeCredentials( data: Record, name: string, @@ -759,6 +805,11 @@ export function parsePluginManifest(raw: string, dir: string): PluginManifest { `Plugin ${(parsedYaml as { name?: string }).name ?? "unknown"} api-headers must be an object when provided`, ); } + if (path === "command-env") { + throw new Error( + `Plugin ${(parsedYaml as { name?: string }).name ?? "unknown"} command-env must be an object when provided`, + ); + } if (path === "credentials") { throw new Error( `Plugin ${(parsedYaml as { name?: string }).name ?? "unknown"} credentials must be an object when provided`, @@ -830,10 +881,22 @@ export function parsePluginManifest(raw: string, dir: string): PluginManifest { if (data["api-domains"] && !apiHeaders) { throw new Error(`Plugin ${data.name} api-domains requires api-headers`); } + const commandEnv = data["command-env"] + ? normalizeCommandEnv( + data["command-env"], + `Plugin ${data.name} command-env`, + envVars, + ) + : undefined; const credentials = data.credentials ? normalizeCredentials(data.credentials, data.name) : undefined; + if (commandEnv && !credentials && !apiHeaders) { + throw new Error( + `Plugin ${data.name} command-env requires credentials or api-headers`, + ); + } const runtimeDependencies = data["runtime-dependencies"] ? normalizeRuntimeDependencies(data["runtime-dependencies"], data.name) : undefined; @@ -849,6 +912,7 @@ export function parsePluginManifest(raw: string, dir: string): PluginManifest { configKeys, ...(data["api-domains"] ? { apiDomains: data["api-domains"] } : {}), ...(apiHeaders ? { apiHeaders } : {}), + ...(commandEnv ? { commandEnv } : {}), ...(Object.keys(envVars).length > 0 ? { envVars } : {}), ...(credentials ? { credentials } : {}), ...(runtimeDependencies ? { runtimeDependencies } : {}), diff --git a/packages/junior/src/chat/plugins/types.ts b/packages/junior/src/chat/plugins/types.ts index f376a19a..f5c8a5c3 100644 --- a/packages/junior/src/chat/plugins/types.ts +++ b/packages/junior/src/chat/plugins/types.ts @@ -84,6 +84,7 @@ export interface PluginManifest { configKeys: string[]; apiDomains?: string[]; apiHeaders?: Record; + commandEnv?: Record; envVars?: Record; credentials?: PluginCredentials; runtimeDependencies?: PluginRuntimeDependency[]; diff --git a/packages/junior/src/chat/skills.ts b/packages/junior/src/chat/skills.ts index 2f3361c8..196cbb36 100644 --- a/packages/junior/src/chat/skills.ts +++ b/packages/junior/src/chat/skills.ts @@ -288,6 +288,7 @@ function formatManifestSurface(manifest: PluginManifest): string { if (manifest.runtimePostinstall?.length) surface.push("postinstall steps"); if (manifest.mcp) surface.push("MCP tools"); if (manifest.credentials) surface.push("credentials"); + if (manifest.commandEnv) surface.push("command env"); if (manifest.oauth) surface.push("OAuth"); if (manifest.configKeys.length > 0) surface.push("config keys"); @@ -300,7 +301,7 @@ function buildPluginRuntimeBoundary(manifest: PluginManifest): string { "", `The ${manifest.name} plugin manifest, not this skill's prose, controls runtime setup.`, `Manifest-owned surface: ${formatManifestSurface(manifest)}.`, - "Do not install provider runtime packages, run installer scripts, configure API keys, create OAuth clients, or set up MCP servers because this skill says to.", + "Do not install provider runtime packages, run installer scripts, configure API keys or command env, create OAuth clients, or set up MCP servers because this skill says to.", `If that surface is unavailable, report a ${manifest.name} plugin runtime setup failure instead of repairing setup from the skill workflow.`, ].join("\n"); } diff --git a/packages/junior/tests/unit/capabilities/capability-factory.test.ts b/packages/junior/tests/unit/capabilities/capability-factory.test.ts index ba4f62f5..8f0f8bd5 100644 --- a/packages/junior/tests/unit/capabilities/capability-factory.test.ts +++ b/packages/junior/tests/unit/capabilities/capability-factory.test.ts @@ -59,6 +59,9 @@ describe("capability runtime factory", () => { Authorization: "Bearer ${EXAMPLE_API_HEADER}", "X-Api-Version": "2026-01-01", }, + commandEnv: { + EXAMPLE_API_KEY: "host_managed_credential", + }, }, dir: "/tmp/example", skillsDir: "/tmp/example/skills", @@ -77,7 +80,9 @@ describe("capability runtime factory", () => { ).resolves.toMatchObject({ reused: false }); expect(createPluginBrokerMock).not.toHaveBeenCalled(); - expect(runtime.getTurnEnv()).toBeUndefined(); + expect(runtime.getTurnEnv()).toEqual({ + EXAMPLE_API_KEY: "host_managed_credential", + }); expect(runtime.getTurnHeaderTransforms()).toEqual([ { domain: "api.example.com", diff --git a/packages/junior/tests/unit/plugins/api-headers-broker.test.ts b/packages/junior/tests/unit/plugins/api-headers-broker.test.ts index 7c4143bd..892cf93d 100644 --- a/packages/junior/tests/unit/plugins/api-headers-broker.test.ts +++ b/packages/junior/tests/unit/plugins/api-headers-broker.test.ts @@ -41,6 +41,22 @@ describe("API headers broker", () => { ]); }); + it("includes plugin command env in issued leases", async () => { + process.env.EXAMPLE_AUTH_HEADER = "Basic abc123"; + + const broker = createApiHeadersBroker({ + ...MANIFEST, + commandEnv: { + EXAMPLE_API_KEY: "host_managed_credential", + }, + }); + const lease = await broker.issue({ reason: "test:command-env" }); + + expect(lease.env).toEqual({ + EXAMPLE_API_KEY: "host_managed_credential", + }); + }); + it("throws when an env-backed header references a missing env var", async () => { delete process.env.EXAMPLE_AUTH_HEADER; diff --git a/packages/junior/tests/unit/plugins/plugin-manifest-api-headers.test.ts b/packages/junior/tests/unit/plugins/plugin-manifest-api-headers.test.ts index 37b56672..e0a02b7e 100644 --- a/packages/junior/tests/unit/plugins/plugin-manifest-api-headers.test.ts +++ b/packages/junior/tests/unit/plugins/plugin-manifest-api-headers.test.ts @@ -1,3 +1,5 @@ +import { readFileSync } from "node:fs"; +import path from "node:path"; import { describe, expect, it } from "vitest"; import { parsePluginManifest } from "@/chat/plugins/manifest"; @@ -26,6 +28,89 @@ describe("plugin manifest API headers", () => { }); }); + it("parses command env with literals and default-backed env references", () => { + const manifest = parsePluginManifest( + [ + "name: example", + "description: Example API access", + "env-vars:", + " EXAMPLE_AUTH_HEADER:", + " EXAMPLE_SITE:", + " default: example.com", + "api-domains:", + " - api.example.com", + "api-headers:", + ' Authorization: "${EXAMPLE_AUTH_HEADER}"', + "command-env:", + " EXAMPLE_API_KEY: host_managed_credential", + ' EXAMPLE_SITE: "${EXAMPLE_SITE}"', + ].join("\n"), + "/tmp/example", + ); + + expect(manifest.commandEnv).toEqual({ + EXAMPLE_API_KEY: "host_managed_credential", + EXAMPLE_SITE: "example.com", + }); + }); + + it("parses the packaged Datadog manifest", () => { + const manifestPath = path.resolve( + process.cwd(), + "../junior-datadog/plugin.yaml", + ); + const manifest = parsePluginManifest( + readFileSync(manifestPath, "utf8"), + path.dirname(manifestPath), + ); + + expect(manifest.name).toBe("datadog"); + expect(manifest.apiHeaders).toEqual({ + "DD-API-KEY": "${DATADOG_API_KEY}", + "DD-APPLICATION-KEY": "${DATADOG_APP_KEY}", + }); + expect(manifest.commandEnv).toEqual({ + DD_API_KEY: "host_managed_credential", + DD_APP_KEY: "host_managed_credential", + DD_SITE: "datadoghq.com", + DD_READ_ONLY: "1", + FORCE_AGENT_MODE: "1", + }); + expect(manifest.runtimePostinstall).toHaveLength(1); + }); + + it("rejects command env references without defaults", () => { + expect(() => + parsePluginManifest( + [ + "name: example", + "description: Example API access", + "env-vars:", + " EXAMPLE_SECRET:", + "command-env:", + ' EXAMPLE_TOKEN: "${EXAMPLE_SECRET}"', + ].join("\n"), + "/tmp/example", + ), + ).toThrow( + "Plugin example command-env.EXAMPLE_TOKEN references env var EXAMPLE_SECRET, but command-env env vars must declare defaults", + ); + }); + + it("rejects command env without credentials or API headers", () => { + expect(() => + parsePluginManifest( + [ + "name: example", + "description: Example CLI access", + "command-env:", + " EXAMPLE_TOKEN: host_managed_credential", + ].join("\n"), + "/tmp/example", + ), + ).toThrow("Plugin example command-env requires credentials or api-headers"); + }); + it("rejects API headers without API domains", () => { expect(() => parsePluginManifest( diff --git a/packages/junior/tests/unit/plugins/test-broker.test.ts b/packages/junior/tests/unit/plugins/test-broker.test.ts index ee4bdecd..cdb837c3 100644 --- a/packages/junior/tests/unit/plugins/test-broker.test.ts +++ b/packages/junior/tests/unit/plugins/test-broker.test.ts @@ -30,12 +30,19 @@ describe("test credential broker", () => { }, }, ], + env: { + EXAMPLE_SITE: "example.com", + }, envKey: "EXAMPLE_TOKEN", placeholder: "host_managed_credential", }); const lease = await broker.issue({ reason: "test:headers" }); + expect(lease.env).toEqual({ + EXAMPLE_SITE: "example.com", + EXAMPLE_TOKEN: "host_managed_credential", + }); expect(lease.headerTransforms).toEqual([ { domain: "uploads.example.com", diff --git a/packages/junior/tests/unit/skills/skills.test.ts b/packages/junior/tests/unit/skills/skills.test.ts index 55828795..4e93384d 100644 --- a/packages/junior/tests/unit/skills/skills.test.ts +++ b/packages/junior/tests/unit/skills/skills.test.ts @@ -321,7 +321,7 @@ describe("skills", () => { "Manifest-owned surface: runtime packages, MCP tools, credentials, config keys.", ); expect(loaded?.body).toContain( - "Do not install provider runtime packages, run installer scripts, configure API keys, create OAuth clients, or set up MCP servers because this skill says to.", + "Do not install provider runtime packages, run installer scripts, configure API keys or command env, create OAuth clients, or set up MCP servers because this skill says to.", ); expect(loaded?.body).toContain( "Run `npm install example-cli` before using this skill.", diff --git a/specs/plugin-spec.md b/specs/plugin-spec.md index cdfb71c3..e459e893 100644 --- a/specs/plugin-spec.md +++ b/specs/plugin-spec.md @@ -3,7 +3,7 @@ ## Metadata - Created: 2026-03-01 -- Last Edited: 2026-05-03 +- Last Edited: 2026-05-08 ## Changelog @@ -22,6 +22,7 @@ - 2026-04-28: Kept MCP execution behind stable `callMcpTool` while disclosing searchable MCP catalogs through `loadSkill`, `searchMcpTools`, and ``. - 2026-04-30: Added install-wide config defaults via `createApp({ configDefaults })` with channel-scoped override precedence. - 2026-05-03: Added plugin-level `api-headers` injection backed by declared deployment env vars. +- 2026-05-08: Added plugin-level `command-env` for non-secret sandbox CLI placeholders and default-backed deployment values. ## Status @@ -59,7 +60,7 @@ Define a plugin model where provider integrations are self-contained directories 6. Plugin-declared MCP tools are host-managed and activated only after a skill from the same plugin is loaded for the turn. 7. Pi sees stable native tools (`loadSkill`, `searchMcpTools`, and `callMcpTool`) at turn start. After a plugin-backed skill is loaded, the runtime activates that plugin's discovered MCP tools for search and execution. 8. `loadSkill` activates the provider catalog and returns provider/count metadata once the MCP server is connected and `listTools` succeeds. If connection/listing needs MCP OAuth, `loadSkill` initiates the MCP auth pause and the resumed turn re-activates the catalog before the model continues. `searchMcpTools` returns focused descriptors, including input/output schema and annotations, for any available active-provider tool before `callMcpTool` executes it. -9. Runtime setup belongs to `plugin.yaml`: CLI packages, system packages, postinstall commands, MCP endpoints/tool allowlists, credential delivery, OAuth, and provider config keys are manifest declarations, not skill instructions. +9. Runtime setup belongs to `plugin.yaml`: CLI packages, system packages, postinstall commands, MCP endpoints/tool allowlists, credential delivery, command env, OAuth, and provider config keys are manifest declarations, not skill instructions. 10. Skills consume the plugin-provided runtime surface. They must not instruct the agent to install packages, bootstrap CLIs, configure MCP servers, create credentials, or repair sandbox package installation as part of normal workflow. ## Plugin directory structure @@ -157,12 +158,18 @@ capabilities: env-vars: BETTER_STACK_AUTH_HEADER: + BETTER_STACK_SITE: + default: betterstack.com api-domains: - api.betterstack.com api-headers: Authorization: ${BETTER_STACK_AUTH_HEADER} Content-Type: application/json + +command-env: + BETTER_STACK_API_KEY: host_managed_credential + BETTER_STACK_SITE: ${BETTER_STACK_SITE} ``` ## Plugin manifest contract @@ -176,50 +183,51 @@ api-headers: ### Optional fields -| Field | Type | Rules | -| ------------------------------------ | ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `capabilities` | `string[]` | Short names (e.g. `issues.read`). Qualified to `.issues.read` by the registry. No qualified capability may appear in more than one plugin. | -| `config-keys` | `string[]` | Short names (e.g. `org`). Qualified to `.org` by the registry. | -| `api-domains` | `string[]` | Optional domains for plugin-level API header injection. Required when `api-headers` is set. | -| `api-headers` | `Record` | Optional headers injected for matching `api-domains`. Values may reference `${NAME}` placeholders declared in `env-vars`; referenced env vars must not declare defaults. | -| `credentials` | `object` | Credential delivery configuration. | -| `credentials.type` | `string` | `"oauth-bearer"` or `"github-app"`. | -| `credentials.api-domains` | `string[]` | Domains for token-backed header transforms. At least one required. | -| `credentials.api-headers` | `Record` | Optional extra headers applied alongside runtime-managed `Authorization` for `oauth-bearer` and `github-app`; `Authorization` itself is reserved for those types. Prefer plugin-level `api-headers` for new manifests. | -| `credentials.auth-token-env` | `string` | Env var name for static token fallback and sandbox placeholder. Required for `oauth-bearer` and `github-app`. | -| `credentials.auth-token-placeholder` | `string` | Optional non-secret placeholder injected into sandbox env for CLI compatibility. Applies to `oauth-bearer` and `github-app`. | -| `credentials.app-id-env` | `string` | Env var name for GitHub App ID. Required when `credentials.type` is `"github-app"`. | -| `credentials.private-key-env` | `string` | Env var name for GitHub App private key (PEM). Required when `credentials.type` is `"github-app"`. | -| `credentials.installation-id-env` | `string` | Env var name for GitHub App installation ID. Required when `credentials.type` is `"github-app"`. | -| `oauth` | `object` | OAuth provider configuration. Requires `credentials.type` = `"oauth-bearer"`. | -| `oauth.client-id-env` | `string` | Env var name for client ID. | -| `oauth.client-secret-env` | `string` | Env var name for client secret. | -| `oauth.authorize-endpoint` | `string` | Valid HTTPS URL. | -| `oauth.token-endpoint` | `string` | Valid HTTPS URL. | -| `oauth.scope` | `string` | Optional OAuth scope string. | -| `oauth.authorize-params` | `Record` | Optional authorize URL params added alongside core params. Reserved OAuth param names may not be overridden. | -| `oauth.token-auth-method` | `string` | Optional token client auth method: `"body"` (default) or `"basic"`. | -| `oauth.token-extra-headers` | `Record` | Optional token request headers. `Authorization` is reserved; `Content-Type` controls token body serialization. | -| `target` | `object` | Capability target for scoped credentials. | -| `target.type` | `string` | Currently only `"repo"`. | -| `target.config-key` | `string` | Must appear in `config-keys`. | -| `runtime-dependencies` | `object[]` | Optional sandbox dependency declarations used to build reusable snapshots. | -| `runtime-dependencies[].type` | `string` | `"npm"` or `"system"`. | -| `runtime-dependencies[].package` | `string` | Package identifier (npm package name or system package name). Required for `npm`; optional for `system` when `url` is used. | -| `runtime-dependencies[].version` | `string` | Optional for `npm` dependencies. When omitted, runtime uses `latest`. Must be omitted for `system` dependencies. | -| `runtime-dependencies[].url` | `string` | HTTPS URL for direct system package install (RPM). Allowed only for `system` dependencies. | -| `runtime-dependencies[].sha256` | `string` | Required with `url`. Lowercase or uppercase hex SHA-256 checksum used for integrity verification before install. | -| `runtime-postinstall` | `object[]` | Optional post-install command declarations executed after dependency install and before snapshot capture. | -| `runtime-postinstall[].cmd` | `string` | Non-empty command name. | -| `runtime-postinstall[].args` | `string[]` | Optional command arguments. | -| `runtime-postinstall[].sudo` | `boolean` | Optional sudo flag for commands requiring elevated privileges. | -| `env-vars` | `Record` | Optional map declaring deployment env vars the manifest may reference from `mcp.url` or plugin-level `api-headers`. Keys must match `[A-Z_][A-Z0-9_]*`. See [MCP URL env-var expansion](#mcp-url-env-var-expansion). | -| `env-vars..default` | `string` | Optional default value used by `mcp.url` when `process.env[NAME]` is unset or empty. Must be omitted for env vars referenced from `api-headers`. | -| `mcp` | `object` | Optional MCP server configuration for host-managed tool discovery. | -| `mcp.transport` | `string` | Optional. When omitted and `mcp.url` is present, Junior infers HTTP. If provided in v1, it must be `"http"`. Stdio/command transports are not supported. | -| `mcp.url` | `string` | HTTPS endpoint for the MCP server. Supports `${NAME}` placeholders declared in `env-vars` — see [MCP URL env-var expansion](#mcp-url-env-var-expansion). Expansion runs before HTTPS validation. | -| `mcp.headers` | `Record` | Optional static non-Authorization headers sent with MCP HTTP requests. `Authorization` is reserved for runtime-managed auth. | -| `mcp.allowed-tools` | `string[]` | Optional non-empty allowlist of raw MCP tool names to expose for this provider. Activation fails if any listed tool is missing from discovery. | +| Field | Type | Rules | +| ------------------------------------ | ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `capabilities` | `string[]` | Short names (e.g. `issues.read`). Qualified to `.issues.read` by the registry. No qualified capability may appear in more than one plugin. | +| `config-keys` | `string[]` | Short names (e.g. `org`). Qualified to `.org` by the registry. | +| `api-domains` | `string[]` | Optional domains for plugin-level API header injection. Required when `api-headers` is set. | +| `api-headers` | `Record` | Optional headers injected for matching `api-domains`. Values may reference `${NAME}` placeholders declared in `env-vars`; referenced env vars must not declare defaults. | +| `command-env` | `Record` | Optional non-secret sandbox env vars injected when provider credentials or API headers are enabled. Requires `credentials` or `api-headers`. Values may reference `${NAME}` placeholders declared in `env-vars`; referenced env vars must declare defaults. | +| `credentials` | `object` | Credential delivery configuration. | +| `credentials.type` | `string` | `"oauth-bearer"` or `"github-app"`. | +| `credentials.api-domains` | `string[]` | Domains for token-backed header transforms. At least one required. | +| `credentials.api-headers` | `Record` | Optional extra headers applied alongside runtime-managed `Authorization` for `oauth-bearer` and `github-app`; `Authorization` itself is reserved for those types. Prefer plugin-level `api-headers` for new manifests. | +| `credentials.auth-token-env` | `string` | Env var name for static token fallback and sandbox placeholder. Required for `oauth-bearer` and `github-app`. | +| `credentials.auth-token-placeholder` | `string` | Optional non-secret placeholder injected into sandbox env for CLI compatibility. Applies to `oauth-bearer` and `github-app`. | +| `credentials.app-id-env` | `string` | Env var name for GitHub App ID. Required when `credentials.type` is `"github-app"`. | +| `credentials.private-key-env` | `string` | Env var name for GitHub App private key (PEM). Required when `credentials.type` is `"github-app"`. | +| `credentials.installation-id-env` | `string` | Env var name for GitHub App installation ID. Required when `credentials.type` is `"github-app"`. | +| `oauth` | `object` | OAuth provider configuration. Requires `credentials.type` = `"oauth-bearer"`. | +| `oauth.client-id-env` | `string` | Env var name for client ID. | +| `oauth.client-secret-env` | `string` | Env var name for client secret. | +| `oauth.authorize-endpoint` | `string` | Valid HTTPS URL. | +| `oauth.token-endpoint` | `string` | Valid HTTPS URL. | +| `oauth.scope` | `string` | Optional OAuth scope string. | +| `oauth.authorize-params` | `Record` | Optional authorize URL params added alongside core params. Reserved OAuth param names may not be overridden. | +| `oauth.token-auth-method` | `string` | Optional token client auth method: `"body"` (default) or `"basic"`. | +| `oauth.token-extra-headers` | `Record` | Optional token request headers. `Authorization` is reserved; `Content-Type` controls token body serialization. | +| `target` | `object` | Capability target for scoped credentials. | +| `target.type` | `string` | Currently only `"repo"`. | +| `target.config-key` | `string` | Must appear in `config-keys`. | +| `runtime-dependencies` | `object[]` | Optional sandbox dependency declarations used to build reusable snapshots. | +| `runtime-dependencies[].type` | `string` | `"npm"` or `"system"`. | +| `runtime-dependencies[].package` | `string` | Package identifier (npm package name or system package name). Required for `npm`; optional for `system` when `url` is used. | +| `runtime-dependencies[].version` | `string` | Optional for `npm` dependencies. When omitted, runtime uses `latest`. Must be omitted for `system` dependencies. | +| `runtime-dependencies[].url` | `string` | HTTPS URL for direct system package install (RPM). Allowed only for `system` dependencies. | +| `runtime-dependencies[].sha256` | `string` | Required with `url`. Lowercase or uppercase hex SHA-256 checksum used for integrity verification before install. | +| `runtime-postinstall` | `object[]` | Optional post-install command declarations executed after dependency install and before snapshot capture. | +| `runtime-postinstall[].cmd` | `string` | Non-empty command name. | +| `runtime-postinstall[].args` | `string[]` | Optional command arguments. | +| `runtime-postinstall[].sudo` | `boolean` | Optional sudo flag for commands requiring elevated privileges. | +| `env-vars` | `Record` | Optional map declaring deployment env vars the manifest may reference from `mcp.url`, plugin-level `api-headers`, or `command-env`. Keys must match `[A-Z_][A-Z0-9_]*`. See [MCP URL env-var expansion](#mcp-url-env-var-expansion). | +| `env-vars..default` | `string` | Optional default value used by `mcp.url` or `command-env` when `process.env[NAME]` is unset or empty. Must be omitted for env vars referenced from `api-headers`. | +| `mcp` | `object` | Optional MCP server configuration for host-managed tool discovery. | +| `mcp.transport` | `string` | Optional. When omitted and `mcp.url` is present, Junior infers HTTP. If provided in v1, it must be `"http"`. Stdio/command transports are not supported. | +| `mcp.url` | `string` | HTTPS endpoint for the MCP server. Supports `${NAME}` placeholders declared in `env-vars` — see [MCP URL env-var expansion](#mcp-url-env-var-expansion). Expansion runs before HTTPS validation. | +| `mcp.headers` | `Record` | Optional static non-Authorization headers sent with MCP HTTP requests. `Authorization` is reserved for runtime-managed auth. | +| `mcp.allowed-tools` | `string[]` | Optional non-empty allowlist of raw MCP tool names to expose for this provider. Activation fails if any listed tool is missing from discovery. | Snapshot build/reuse and invalidation behavior for `runtime-dependencies` is defined in [Sandbox Snapshots Spec](./sandbox-snapshots-spec.md). @@ -239,34 +247,33 @@ without a default and `process.env[NAME]` is unset or empty. not listed in `env-vars` are rejected at load time — this makes the set of env vars a manifest may read explicit and auditable, and prevents a manifest from opportunistically reading ambient host env vars (e.g. -`SLACK_BOT_TOKEN`). Manifest-load expansion applies to `mcp.url`; API -header placeholders are validated at manifest load and resolved only when a -credential lease is issued, so secret header values are not stored in the -parsed manifest. Other manifest fields (credentials envs, OAuth endpoints, -api-domains, etc.) already have dedicated env-ref mechanisms -(`auth-token-env`, `client-id-env`, …) or must remain literal for -validation. +`SLACK_BOT_TOKEN`). Manifest-load expansion applies to `mcp.url` and +default-backed `command-env` references. API header placeholders are +validated at manifest load and resolved only when a credential lease is +issued, so secret header values are not stored in the parsed manifest. Other +manifest fields (credentials envs, OAuth endpoints, api-domains, etc.) +already have dedicated env-ref mechanisms (`auth-token-env`, +`client-id-env`, ...) or must remain literal for validation. Defaults live in the `env-vars` declaration, not inline in the placeholder. There is no `${NAME:-default}` form. -The primary motivation is region-pinned providers (Datadog, Sentry -self-hosted, GitHub Enterprise, Linear EU, …) where the hostname is the only -thing that varies across deployments. Example: +The primary motivation is region-pinned providers (Sentry self-hosted, +GitHub Enterprise, Linear EU, ...) where the hostname is the only thing that +varies across deployments. Example: ```yaml env-vars: - DATADOG_SITE: - default: datadoghq.com + EXAMPLE_SITE: + default: example.com mcp: - url: https://mcp.${DATADOG_SITE}/api/unstable/mcp-server/mcp?toolsets=core,apm,error-tracking + url: https://mcp.${EXAMPLE_SITE}/mcp ``` -US1 operators leave `DATADOG_SITE` unset and get the declared default. -Operators on US3/US5/EU/AP1/AP2/GovCloud set `DATADOG_SITE=us5.datadoghq.com` -(etc.) in their Junior deployment env. No code changes, no app-local plugin -copy. +Operators can leave `EXAMPLE_SITE` unset and get the declared default, or +set it in their Junior deployment env for a different regional host. No code +changes, no app-local plugin copy. ### API header env-var references @@ -275,6 +282,36 @@ declared in `env-vars`. These placeholders are intended for headers that may carry secrets, so their declarations must not include `default`. Missing env values fail when the provider's header transforms are issued. +### Command env + +Plugin-level `command-env` supports non-secret sandbox environment variables +that are injected into provider command leases. It is intended for CLI +compatibility values such as placeholder API keys, read-only mode toggles, or +site defaults needed by the command process. + +Values may be literal strings, or `${NAME}` placeholders declared in +`env-vars`. Placeholder references must declare `default`, because +`command-env` is visible inside the sandbox and must not read secret +deployment env vars. Use `api-headers` for secret-bearing provider values and +`command-env` only for placeholders or defaults safe to expose. + +```yaml +env-vars: + EXAMPLE_AUTH_HEADER: + EXAMPLE_SITE: + default: example.com + +api-domains: + - api.example.com +api-headers: + Authorization: ${EXAMPLE_AUTH_HEADER} + +command-env: + EXAMPLE_API_KEY: host_managed_credential + EXAMPLE_SITE: ${EXAMPLE_SITE} + EXAMPLE_READ_ONLY: "1" +``` + System runtime dependency execution environment: - Sandbox OS is Amazon Linux 2023. @@ -299,6 +336,7 @@ System runtime dependency execution environment: - No two plugins may declare the same capability token. - No two plugins may use the same `name`. - If `target.config-key` is set, it must be listed in `config-keys`. +- If `command-env` is set, the plugin must also declare credentials or API headers so a credential broker exists to deliver it. - If a plugin declares capabilities without credentials or API headers, manifest load succeeds and runtime credential enablement fails with an explicit no-broker error when an authenticated command needs that provider. - `plugin.yaml` remains the enforceable runtime authority. `loadSkill` re-resolves the skill's parent plugin from its path, rejects mismatched plugin metadata, rebuilds metadata from the current skill file, and prepends a host-owned runtime boundary before the skill body. @@ -327,9 +365,9 @@ The plugin registry is initialized at module load time (sync). This means it is The registry provides `createPluginBroker(provider, deps)` which constructs the appropriate broker from manifest config: -- `oauth-bearer`: Creates a generic `OAuthBearerBroker` that handles per-user OAuth tokens, token refresh, static env fallback, and header transforms — all parameterized from the manifest. +- `oauth-bearer`: Creates a generic `OAuthBearerBroker` that handles per-user OAuth tokens, token refresh, static env fallback, command env, and header transforms — all parameterized from the manifest. - `github-app`: Creates a `GitHubAppBroker` that signs JWTs with an RSA private key and exchanges them for short-lived installation tokens via the GitHub App API. No `UserTokenStore` dependency — tokens are per-installation, not per-user. -- plugin-level `api-headers`: Creates an `ApiHeadersBroker` for providers that only need header injection. Token-backed brokers include plugin-level API header transforms alongside their credential transforms; credential headers are applied last and win if both sources set the same header for the same domain. +- plugin-level `api-headers`: Creates an `ApiHeadersBroker` for providers that only need header injection. Token-backed brokers include plugin-level API header transforms and command env alongside their credential transforms; credential headers are applied last and win if both sources set the same header for the same domain. - no-credentials/no-headers plugins: broker creation fails with a provider-scoped no-credentials error. ### Plugin registry exports @@ -383,13 +421,13 @@ All existing functions (`getCapabilityProvider`, `isKnownCapability`, etc.) work ```typescript for (const plugin of getPluginProviders()) { - const { apiHeaders, credentials, name } = plugin.manifest; + const { apiHeaders, commandEnv, credentials, name } = plugin.manifest; if (!credentials && !apiHeaders) continue; brokersByProvider[name] = useTestBroker ? new TestCredentialBroker({ provider: name, // token-backed credentials add domains/env placeholder; header-only - // plugins only add header transforms. + // plugins add header transforms and optional command env. }) : createPluginBroker(name, { userTokenStore }); } @@ -411,7 +449,7 @@ The OAuth callback route uses `getOAuthProviderConfig()` instead of accessing `O ### Test credential override -`TestCredentialBroker` substitution in eval mode works the same — `factory.ts` checks `EVAL_ENABLE_TEST_CREDENTIALS=1` and substitutes regardless of source. For plugin-level `api-headers`, eval mode injects deterministic dummy header values instead of resolving deployment env vars. +`TestCredentialBroker` substitution in eval mode works the same — `factory.ts` checks `EVAL_ENABLE_TEST_CREDENTIALS=1` and substitutes regardless of source. For plugin-level `api-headers`, eval mode injects deterministic dummy header values instead of resolving deployment env vars. Plugin-level `command-env` is preserved so CLI placeholder behavior matches production without exposing real secrets. ### Install-wide config defaults @@ -438,7 +476,7 @@ Plugin skills use the same `SKILL.md` format and frontmatter contract as existin ### Skill/runtime boundary -Plugin-backed skills may tell the model how to use available commands, MCP tools, config defaults, and provider-specific query syntax. They may include troubleshooting for unavailable runtime surfaces only as diagnosis and escalation, for example “report that the GitHub plugin runtime dependency is unavailable.” +Plugin-backed skills may tell the model how to use available commands, MCP tools, command env, config defaults, and provider-specific query syntax. They may include troubleshooting for unavailable runtime surfaces only as diagnosis and escalation, for example “report that the GitHub plugin runtime dependency is unavailable.” When the runtime loads a plugin-backed skill, it enforces the parent plugin before returning the skill: @@ -447,7 +485,7 @@ When the runtime loads a plugin-backed skill, it enforces the parent plugin befo - rebuild loaded metadata from the current `SKILL.md` frontmatter; - prepend a host-owned runtime boundary derived from the plugin manifest. -That boundary tells the model that provider runtime packages, installer scripts, API keys, OAuth clients, and MCP servers are controlled by `plugin.yaml`, not by arbitrary skill prose. +That boundary tells the model that provider runtime packages, installer scripts, API keys, command env, OAuth clients, and MCP servers are controlled by `plugin.yaml`, not by arbitrary skill prose. Plugin-backed skills must not: @@ -456,7 +494,7 @@ Plugin-backed skills must not: - ask the model to configure API keys, OAuth credentials, tokens, or MCP server endpoints; - ask the model to fix sandbox package installation from within a user workflow. -When a bundled or third-party skill needs a CLI, system package, postinstall step, credential source, config key, or MCP server, the plugin wrapper declares that requirement in `plugin.yaml`. The skill should then rely on the runtime to provide it and fail with a clear plugin-runtime remediation when it is unavailable. +When a bundled or third-party skill needs a CLI, system package, postinstall step, credential source, command env, config key, or MCP server, the plugin wrapper declares that requirement in `plugin.yaml`. The skill should then rely on the runtime to provide it and fail with a clear plugin-runtime remediation when it is unavailable. ### Discovery @@ -479,9 +517,9 @@ Plugin skills are subject to the same frontmatter validation and name-deduplicat All existing security invariants from `security-policy.md` are preserved: - **Host-trusted code.** Plugin manifests are YAML files committed to the repository. No dynamic code loading. -- **Credential delivery via header transforms only.** Token credentials and plugin-level `api-headers` are delivered as host-managed header transforms for declared `api-domains`. Real secret values never enter sandbox env vars, files, or command arguments. +- **Credential delivery via header transforms only.** Token credentials, API keys, and plugin-level `api-headers` are delivered as host-managed header transforms for declared `api-domains`. Real secret values never enter sandbox env vars, files, or command arguments. - **Short-lived leases.** Lease behavior is unchanged. The `CredentialLease` contract enforces expiry timestamps. -- **No env var leakage.** Placeholder values are injected for the `auth-token-env` variable. +- **No env var leakage.** Only non-secret placeholder/default command env values are injected into the sandbox. Secret-bearing provider values are delivered through host-managed header transforms. - **OAuth privacy rules unchanged.** Authorization URLs are delivered privately. The agent never sees token values. - **Plugin manifests are static.** Parsed once at startup, no runtime mutation. diff --git a/specs/security-policy.md b/specs/security-policy.md index fca377c0..9b874703 100644 --- a/specs/security-policy.md +++ b/specs/security-policy.md @@ -54,9 +54,10 @@ This policy applies to: - Loaded skills and their plugin declarations determine which provider credentials may be injected into a turn. - Credential issuance for user-owned provider access must be requester-bound; runtime paths without requester context must fail instead of issuing reusable credentials. - Even for host-managed integrations, credentials are activated only inside the requesting turn and must not carry over to later turns or different message authors. -- Real tokens are delivered exclusively via host-level header transforms — the host proxies `Authorization` headers for matching API domains (e.g. `api.github.com`, `sentry.io`). The sandbox never sees real token values. -- When CLI tools require an auth env var (e.g. `SENTRY_AUTH_TOKEN`), set it to a non-secret placeholder so the tool proceeds to make HTTP requests. Placeholder values may be provider-specific via plugin manifest config. The host authenticates those requests via header transforms. -- Never inject real tokens into sandbox env vars, files, or command arguments. +- Real provider secrets are delivered exclusively via host-level header transforms — the host proxies auth headers for matching API domains (e.g. `Authorization` for `api.github.com`/`sentry.io` or provider-specific API key headers). The sandbox never sees real secret values. +- When CLI tools require tool-native sandbox auth env vars (for example `SENTRY_AUTH_TOKEN`, Pup's `DD_API_KEY`, or Pup's `DD_APP_KEY`), set them to non-secret placeholders so the tool proceeds to make HTTP requests. Placeholder values may be provider-specific via plugin manifest config. The host authenticates those requests via header transforms. +- Plugin-declared command env may include non-secret placeholders and default-backed deployment values needed by the command process. It must not read or expose secret deployment env vars. +- Never inject real provider secrets into sandbox env vars, files, or command arguments. ### GitHub baseline diff --git a/specs/skill-capabilities-spec.md b/specs/skill-capabilities-spec.md index 7047aeb2..b3124179 100644 --- a/specs/skill-capabilities-spec.md +++ b/specs/skill-capabilities-spec.md @@ -3,7 +3,7 @@ ## Metadata - Created: 2026-02-26 -- Last Edited: 2026-04-26 +- Last Edited: 2026-05-08 ## Changelog @@ -12,6 +12,7 @@ - 2026-03-20: Documented prompt exposure of declared capabilities and clarified Sentry OAuth initiation paths. - 2026-04-17: Removed skill-level capability declarations and explicit model-facing auth commands in favor of plugin-owned permission manifests plus runtime-owned implicit auth. - 2026-04-26: Added the plugin-owned runtime setup boundary for packages, MCP endpoints, OAuth, and credentials. +- 2026-05-08: Added plugin-owned `command-env` as a non-secret CLI compatibility surface. ## Status @@ -34,7 +35,7 @@ Define how Junior maps a loaded plugin-backed skill to host-managed credentials 3. After a plugin-backed skill is loaded, the agent runs the real provider command. 4. The runtime resolves the provider from the active skill, issues a provider lease, and injects credentials for the current turn only. 5. If auth is missing or stale, the runtime starts a private OAuth flow and resumes the paused turn after authorization. -6. Plugin manifests own runtime setup. Skills do not instruct the agent to install packages, bootstrap CLIs, configure provider credentials, or set up MCP servers. +6. Plugin manifests own runtime setup. Skills do not instruct the agent to install packages, bootstrap CLIs, configure provider credentials, command env, or MCP servers. ## Plugin contract @@ -42,6 +43,7 @@ Plugins define: - `capabilities`: host-side permission manifest for the provider integration - `credentials`: how runtime leases are delivered to tools +- `command-env`: non-secret env vars or placeholders needed by sandbox commands - `oauth`: optional per-user OAuth configuration - `target`: optional provider-default metadata such as a repo config key @@ -64,7 +66,7 @@ Rules: - `requires-capabilities` is no longer supported. - Skills must never include secret values. - Skills should use provider defaults from the runtime provider catalog so repo/project commands stay deterministic. -- Skills must treat plugin-provided commands and tools as already available. Missing CLIs, missing MCP tools, sandbox package failures, or missing credentials are runtime/plugin setup failures to report or reconnect through runtime-owned flows, not problems for the skill to repair with package-manager or credential setup commands. +- Skills must treat plugin-provided commands, tools, and command env as already available. Missing CLIs, missing MCP tools, sandbox package failures, missing command env, or missing credentials are runtime/plugin setup failures to report or reconnect through runtime-owned flows, not problems for the skill to repair with package-manager or credential setup commands. ## Runtime contract @@ -80,6 +82,7 @@ Rules: - Enablement happens when the authenticated provider command runs, not at skill-load time. - Delivery uses sandbox header transforms for matching domains. - Plugin credentials may define a provider-specific `auth-token-placeholder` for CLI compatibility. +- Plugin manifests may define non-secret `command-env` values for CLI compatibility. These may include placeholder API keys or deployment defaults, but never real secrets. - Do not inject long-lived secrets into sandbox files. ### Runtime setup boundary @@ -89,6 +92,7 @@ Rules: - CLI and system packages belong in `plugin.yaml` `runtime-dependencies`. - Postinstall/bootstrap commands belong in `plugin.yaml` `runtime-postinstall`. - MCP endpoints and allowed tool surfaces belong in `plugin.yaml` `mcp`. +- CLI env placeholders and deployment defaults belong in `plugin.yaml` `command-env`. - OAuth and static credential env names belong in `plugin.yaml` `oauth` and `credentials`. - Skill text may diagnose missing runtime surfaces, but must not tell the agent to install packages, run installer scripts, configure API keys, or repair sandbox package installation from inside a user workflow. @@ -140,7 +144,7 @@ Emit events without secret material: - Skill-level capability allowlists. - Model-visible auth-management commands. - Provider-specific policy engines beyond requester and turn scoping. -- Using arbitrary skill prose as an authority source for runtime package installation, MCP setup, or credential configuration. +- Using arbitrary skill prose as an authority source for runtime package installation, MCP setup, command env, or credential configuration. ## Backward compatibility