Fix shared/apm.md matrix secret-stripping; add empirical verify workflow#1373
Conversation
The pre-fix shared/apm.md embedded the GitHub App private-key inside the apm-prep job's 'matrix' output. GitHub Actions ran every job output through HostContext.SecretMasker.MaskSecrets (see actions/runner JobExtension.cs); when the masked value differed from the original, the runner dropped the output entirely with the warning 'Skip output "matrix" since it may contain secret.' The downstream apm job's matrix: fromJSON(needs.apm-prep.outputs.matrix) then received an empty string, no replicas ran, no APM bundles were packed, and pre-agent-steps failed at 'no apm bundles found'. Every single-app legacy (form 2) and apps[] (form 3) consumer of shared/apm.md was broken end-to-end. The fix moves credentials off the job-output path. apm-prep now emits routing metadata only (id, kind, index, owner, repositories, packages, has-app); the apm job carries the gh-aw-substituted secrets in its own per-replica env, and a new 'Resolve credentials' step picks the row's credentials with jq and relays them through $GITHUB_ENV as ROW_APP_ID / ROW_PRIVATE_KEY. The create-github-app-token step now reads from env.ROW_*. $GITHUB_ENV is per-job and is masked-in-logs but NOT subject to the cross-job output redaction filter. Empirical proof: .github/workflows/verify-shared-apm-matrix.yml exercises both shapes with a synthetic ::add-mask::'d PEM (no real secrets needed). Job set A replicates the pre-fix shape and asserts the matrix output IS stripped (regression sentinel against GH silently changing the rule). Job set B replicates the post-fix shape and asserts the matrix output survives, the matrix-strategy job fans out to two rows (legacy + apps), each row resolves ROW_PRIVATE_KEY from $GITHUB_ENV, and the PEM round-trips intact via SHA-256 comparison. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The loop body '[ -n "$line" ] && echo ...' returns the exit status of the failed test when $line is empty, which under 'set -e' aborts the script. Use an explicit 'if ...; then ...; fi' instead so the loop tolerates the trailing empty line from the heredoc-string read deterministically. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes a regression introduced by PR #982 where the apm-prep job in shared/apm.md embedded the GitHub App private-key inside its cross-job matrix output. The GitHub Actions runner's SecretMasker silently replaces any cross-job output containing a registered secret substring with an empty string, causing the downstream apm matrix strategy to fan out to zero replicas and every consumer (form 2 single-app + form 3 apps[]) to fail at no apm bundles found. The fix keeps the matrix output strictly non-secret (routing metadata only) and resolves per-row credentials inside the apm job via $GITHUB_ENV, which is not subject to the cross-job redaction filter. A new self-contained workflow empirically asserts both the runner's stripping behaviour (regression sentinel) and the new shape's survival, using a synthetic ::add-mask::-registered PEM.
Changes:
- Restructured
apm-prepto emitid/kind/index/owner/repositories/packages/has-apponly; droppedapp-id/private-keyfrom the matrix. - Added a "Resolve credentials" step to the
apmjob that case-switches onmatrix.group.kind, indexes apps bymatrix.group.index, re-masks, and writesROW_APP_ID/ROW_PRIVATE_KEYto$GITHUB_ENV;create-github-app-tokennow readsenv.ROW_*and is gated onhas-app == 'true'. - Added
verify-shared-apm-matrix.ymlwith two job sets (broken-shape sentinel + fixed-shape positive proof with SHA round-trip).
Show a summary per file
| File | Description |
|---|---|
.github/workflows/shared/apm.md |
Splits credentials out of the cross-job matrix output and relays them per row via $GITHUB_ENV in the apm job. |
.github/workflows/verify-shared-apm-matrix.yml |
New empirical proof workflow asserting both the broken pre-fix shape is stripped and the post-fix shape survives end-to-end. |
CHANGELOG.md |
Adds an Unreleased / Fixed entry for the fix. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 1
|
|
||
| ### Fixed | ||
|
|
||
| - `shared/apm.md`: rebuilt the apm-prep -> apm matrix hand-off so the per-row credentials are relayed via `$GITHUB_ENV` on the `apm` job instead of being embedded in the matrix job output. The previous design placed the GitHub App `private-key` value inside the `matrix` job output of `apm-prep`, which the runner stripped via its cross-job secret-redaction filter (`HostContext.SecretMasker.MaskSecrets` in `actions/runner` / `JobExtension.cs`, surfaced as `##[warning]Skip output 'matrix' since it may contain secret.`). Downstream consumers received an empty matrix, zero `apm` replicas ran, and pre-agent-steps failed at `no apm bundles found`. Every form 2 (single-app legacy) and form 3 (`apps[]`) consumer of `shared/apm.md` was broken end-to-end. The fix moves credentials out of the matrix output entirely: `apm-prep` now emits routing metadata only (`id`, `kind`, `index`, `owner`, `repositories`, `packages`, `has-app`); the `apm` job carries the gh-aw-substituted secrets in its own per-replica env, resolves the row credentials via `jq`, and relays them to `create-github-app-token` through `ROW_APP_ID` / `ROW_PRIVATE_KEY` in `$GITHUB_ENV` (which is per-job and never subject to the cross-job redaction filter). A new `.github/workflows/verify-shared-apm-matrix.yml` workflow provides empirical proof: one job set replicates the broken pre-fix shape and asserts the matrix output is stripped (regression sentinel for the runner behaviour), and a second job set replicates the post-fix shape and asserts the matrix output survives, the `apm`-style matrix job fans out, and the resolved `ROW_PRIVATE_KEY` round-trips intact (SHA-256 comparison). |
There was a problem hiding this comment.
Addressed in 65e5868 -- compressed to a single line matching the file's convention and added the (#1373) suffix. The full root-cause exposition lives in the PR description.
The legacy resolution path preserves the PEM's trailing newline (direct env-var expansion), while the apps[] path strips it ($() command substitution around jq -r). Both are equally valid for a real PEM that downstream create-github-app-token consumes, but the bit-exact SHA-256 comparison would diverge by one newline. Trim trailing newlines on both the expected (prep) and got (consume) side before hashing so both rows converge on the same value, without weakening the substring-leak detection (the trim affects only whitespace, not PEM body bytes). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three parallel adversarial expert audits (secret-masker, GitHub Apps, matrix-composition) returned MOSTLY CONFIDENT on the env-relay fix and converged on three concrete hardenings worth applying. Mirrors the production resolver in the verify workflow so the empirical proof exercises the hardened code path. 1) Random heredoc delimiter (matrix-composition expert) Replace the fixed 'APMPK' delimiter with 'APMPK_$(openssl rand -hex 16)'. The official multi-line workflow command docs explicitly warn against fixed delimiters for arbitrary multi-line values; the runner's FileCommandManager does an exact-line ordinal compare, so a PEM whose base64 body happened to contain a line equal to 'APMPK' would prematurely terminate the value. Probability for real PEMs is astronomically low but non-zero; eliminate the class entirely. 2) Trailing-newline normalisation (matrix-composition expert) The legacy path (direct env var) preserves the PEM's trailing newline while the apps[] path (jq via $(...) command substitution) strips it. create-github-app-token v3.1.1 tolerates both via core.getInput().trim(), but normalising once at the env-relay boundary removes the asymmetry at the source instead of leaning on downstream tolerance. Both paths now produce byte-identical ROW_PRIVATE_KEY values. 3) Never-echo SECURITY comment (secret-masker + github-apps experts) ::add-mask:: registers the PEM as a single multi-line substring; the runner's secret masker does not match individual PEM lines printed in isolation. A future debug 'echo $pk' or 'set -x' in this step would leak the key body in clear text. Pure preventive guard. Primary sources cited by the audits: - actions/runner JobExtension.cs:639-647 (the output filter itself) - actions/runner FileCommandManager.cs:378 (heredoc parser - exact compare) - actions/runner ActionCommandManager.cs:426 (::add-mask:: implementation) - docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#multiline-strings - raw.githubusercontent.com/actions/create-github-app-token/v3.1.1/main.js Verify workflow simplification: with the resolver now normalising trailing newlines uniformly, the b-fix-consume SHA assertion no longer needs its own normalisation loop -- both rows arrive at byte-identical PEMs at the $GITHUB_ENV boundary. Drops 4 lines of belt-and-suspenders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…PR ref Per review comment from copilot-pull-request-reviewer on PR #1373: - Add missing (#1373) suffix to match Keep a Changelog format used throughout the file. - Compress the multi-paragraph root-cause exposition to a single line; the deep technical detail (runner internals, JobExtension.cs, verify workflow design) lives in the PR description where it belongs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- One concise line per PR answering 'so what?' for end users - Add 5 missing entries: #1376 (perf resolver), #1373 (shared/apm.md matrix secret-stripping), #1246 (install.ps1 GHES env vars), #1255 (warn missing apm.yml), #1248 (extends:org unmanaged_files) - Drop internal/CI/test-infra entries (#1270, #1271, #1272, #1274, #1276, #1291, #1360 refactor) - Consolidate three #605 lines and four #1317 lines into one entry per PR where appropriate - Promote MCP Registry v0.1 to a dedicated Breaking section Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: cut 0.14.0 Renames the [Unreleased] block in CHANGELOG.md to [0.14.0] - 2026-05-18 and bumps the package version from 0.13.0 to 0.14.0 in pyproject.toml (and uv.lock by regeneration). 0.14.0 ships the producer-experience epic (#1348) on the CLI side -- notably: - apm pack --check-versions / --check-clean (#1365), the release gates consumed by apm-action mode: release. - apm plugin init (#1370), the noun-verb successor to apm init --plugin. - apm pack multi-format outputs (--marketplace, --marketplace-path, --json, marketplace.outputs map form) (#1317). - New producer docs corpus (repo-shapes / releasing-from-any-ci / versioning-strategies) (#1370). - Breaking: MCP registry client adopts the official v0.1 spec; self- hosted registries must serve /v0.1/ paths (#1337). Plus the deprecations and fixes already listed in the moved block. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(changelog): tighten v0.14.0 entries; add post-cut PRs - One concise line per PR answering 'so what?' for end users - Add 5 missing entries: #1376 (perf resolver), #1373 (shared/apm.md matrix secret-stripping), #1246 (install.ps1 GHES env vars), #1255 (warn missing apm.yml), #1248 (extends:org unmanaged_files) - Drop internal/CI/test-infra entries (#1270, #1271, #1272, #1274, #1276, #1291, #1360 refactor) - Consolidate three #605 lines and four #1317 lines into one entry per PR where appropriate - Promote MCP Registry v0.1 to a dedicated Breaking section Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(changelog): add #1377 Bitbucket DC tilde fix Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Meppiel <copilot-rework@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TL;DR
shared/apm.mdhas been broken for every form 2 (single-app legacy) and form 3 (apps[]) consumer since PR #982 added theapm-prep->apmmatrix split. Theapm-prepjob emits the GitHub Appprivate-keyPEM inside itsmatrixjob output, GitHub Actions silently strips the output as a secret leak, the downstream matrix evaluates to empty, no APM bundles are produced, and the agent fails atno apm bundles found.This PR fixes the canonical
shared/apm.mdby moving credentials off the cross-job output path entirely, and adds.github/workflows/verify-shared-apm-matrix.ymlwhich empirically asserts both the failure mode and the fix using a synthetic::add-mask::-registered PEM (no real secrets required).Empirical proof (green): Verify shared/apm.md matrix secret-stripping fix #26058337658 -- all 6 jobs pass, including the regression sentinel that asserts GH still strips the old shape and the positive matrix-strategy job that asserts the new shape's env-relay round-trips the PEM intact.
Problem (WHY)
Reported by a consumer in the field: after updating to the latest
gh-aw+ canonicalshared/apm.md, every workflow consuming it failed with:...followed by zero matrix replicas and
no apm bundles foundat the agent boundary.Root cause
The GitHub Actions runner runs every job output value through its secret masker before publishing it for consumption by dependent jobs. In
actions/runner(src/Runner.Worker/JobExtension.cs), the relevant guard is:The pre-fix
apm-prepstep built itsmatrixoutput byjq-foldingprivate-keydirectly into the matrix rows. Onceprivate-keywas a registered secret (via repo / orgsecretscontext or::add-mask::), the masker detected the secret substring inside the output value, the runner replaced the output with"", and the downstreamapm.strategy.matrix: ${{ fromJSON(needs.apm-prep.outputs.matrix) }}saw an empty string.This is the runner working as designed. There is no flag to disable it and there is no path-pattern allow-list. Any value that contains any registered-secret substring is dropped, full stop. The only fix is to stop putting secrets into job outputs.
(This regressed silently because the pre-PR-982 design was a single
apmjob -- no matrix, no job-output secret crossing. The split design that PR-982 introduced is what triggered the masker.)Approach (WHAT)
Keep the cross-job
matrixoutput strictly non-secret (routing metadata only). Resolve credentials per matrix row inside theapmjob itself, using gh-aw compile-time-substituted env vars, and relay the resolved row credentials between steps via$GITHUB_ENV-- which is per-job and not subject to the cross-job output-redaction filter (it still gets masked in logs, but it is usable in subsequent steps of the same job).Patched
apm-prepmatrix shape (secret-free){"group":[ {"id":"legacy", "kind":"legacy", "index":0, "owner":"...", "repositories":"...", "packages":[...], "has-app":"true"}, {"id":"acme", "kind":"apps", "index":0, "owner":"...", "repositories":"...", "packages":[...], "has-app":"true"}, {"id":"default", "kind":"default","index":0, "owner":"", "repositories":"", "packages":[...], "has-app":"false"} ]}No
app-id, noprivate-key. The masker has nothing to match on; the output is published verbatim.Patched
apmjobenv:carries the gh-aw compile-time-substituted credential fields (AW_APM_LEGACY_APP_ID,AW_APM_LEGACY_PRIVATE_KEY,AW_APM_APPS). GA evaluates these per replica; secret-bearing values are masked in logs but not stripped (env is not an output).Resolve credentials for this matrix rowreadsmatrix.group.kind+matrix.group.index, picks the rightapp-id/private-keyfrom the job env (orjq-indexesAW_APM_APPSfor the apps[] case), normalises trailing newlines, and writes them to$GITHUB_ENVasROW_APP_ID/ROW_PRIVATE_KEY. It also re-runs::add-mask::on the resolved PEM (defence in depth).actions/create-github-app-token@v3.1.1then receives${{ env.ROW_APP_ID }}/${{ env.ROW_PRIVATE_KEY }}. Sameif: matrix.group.has-app == 'true'guard.Downstream validators (
pre-agent-steps) only ever readidfrom the matrix manifest -- no change needed.Empirical proof (
verify-shared-apm-matrix.yml)The PR ships its own proof. This workflow runs on
pull_requestagainst itself and exercises both shapes:Job set A --
a-bug-prep+a-bug-consume(regression sentinel)Replicates the pre-fix shape literally: registers a synthetic PEM via
::add-mask::, then writes amatrixjob output containing that PEM, and the consumer asserts thatneeds.a-bug-prep.outputs.matrixis the empty string. Passes iff GitHub still strips secret-bearing outputs. If GA ever loosens the rule, Job A will fail and tell us the fix's premise has changed.Job set B --
b-fix-prep+b-fix-assert-shape+b-fix-consume(positive proof)Replicates the post-fix shape: registers the same PEM, emits a secret-free 2-row matrix (
legacyrow + apps[] row atindex: 0), plus a non-secretexpected_sha(SHA-256 of the PEM, which is not a substring of the PEM and therefore not masked). Asserts:needs.b-fix-prep.outputs.matrixis non-empty in the asserter.b-fix-consumejob fans out and runs each row.ROW_PRIVATE_KEYfrom$GITHUB_ENVand the value round-trips intact (sha256sumof the resolved PEM equalsexpected_sha, byte-for-byte, on both the legacy and apps[] resolution paths).No real GitHub App credentials are needed; everything is self-contained.
Adversarial expert audit (3 parallel reviews)
Before finalising, three independent expert reviewers were spawned in parallel (secret-masker / GitHub Apps / matrix-composition), each instructed to use only live verified primary sources -- no invented details. All three returned MOSTLY CONFIDENT on the env-relay design; none blocking. The audit converged on three concrete hardenings, applied to the resolver step in commit
e027211a.Primary-source citations (verified at audit time)
actions/runner--src/Runner.Worker/JobExtension.cs:639-647-- the output filter itself (MaskSecretsround-trip equality check).actions/runner--src/Runner.Worker/FileCommandManager.cs:378--$GITHUB_ENVheredoc parser: exact-line ordinal compare for the terminator.actions/runner--src/Runner.Worker/ActionCommandManager.cs:426--::add-mask::registers the value as a single substring; does not auto-decompose multi-line secrets.docs.github.com-- workflow-commands -- multi-line strings -- explicit warning against fixed heredoc delimiters for arbitrary multi-line values.docs.github.com-- contexts --env.*context availability for step-levelwith:/env:.actions/create-github-app-token@v3.1.1--main.js-- usescore.getInput()which calls.trim()on the PEM by default; tolerates either presence or absence of trailing newline.Findings applied (commit
e027211a)APMPKviolates the official "arbitrary multi-line value" guidance. The runner does an exact-line ordinal compare, so a PEM whose base64 body happened to contain a line equal toAPMPKwould silently truncate the env var. Probability for real PEMs is astronomically low but non-zero; the class is eliminated entirely viadelim="APMPK_$(openssl rand -hex 16)".$(...)command substitution) strips it.create-github-app-tokenv3.1.1 tolerates both, but normalising once at the env-relay boundary (pk="${pk%$'\n'}") eliminates the asymmetry at the source instead of leaning on downstream tolerance. Both paths now produce byte-identicalROW_PRIVATE_KEY.::add-mask::registers the PEM as a single multi-line substring; the runner's secret masker does not match individual PEM lines printed in isolation. A future debugecho "$pk"orset -xin this step would leak the key body in clear text. Pure preventive guard on the resolver step.Findings deferred (out of scope for this PR)
actions/create-github-app-tokenby SHA -- repo-wide concern, separate PR.app-id->client-id-- the underlying action emits a deprecation hint; separate PR.shared/apm.md, unchanged by this fix.Verdict
All three reviewers independently arrived at MOSTLY CONFIDENT (not TOTAL) because secret masking is fundamentally a runtime promise from GitHub Actions, not a static guarantee any PR can prove offline. The verify workflow's regression sentinel (Job A) closes this gap empirically: if GitHub ever loosens the cross-job output filter, Job A will start failing and we get an automatic signal that the fix's premise has changed.
Validation evidence
uv run --extra dev ruff check src/ tests/)uv run --extra dev ruff format --check src/ tests/)verify-shared-apm-matrix.yml[OK]/[!])verify-shared-apm-matrix.yml##[warning]Skip output 'matrix' since it may contain secret.annotation comes from Job A by design (regression sentinel). A fresh run will be triggered by the hardening push and must remain green.How to test
verify-shared-apm-matrix.ymlworkflow on this PR is green (run 26058337658) -- both job sets pass.workflow_dispatchon any branch that contains both file changes.shared/apm.mdviaimports:against this branch's commit SHA should now produce a non-emptymatrixand fan out APM bundle replicas. The previously-failingno apm bundles foundvalidator should pass.Trade-offs
jqstep at the start of eachapmreplica. Necessary because the matrix can no longer carry credentials directly..github/workflows/shared/apm.mdinmicrosoft/apmis the canonical source of truth and the one consumers import; the vendored copy inmicrosoft/gh-awis downstream and tracked viaredirect:frontmatter -- it will sync naturally on the next gh-aw release. Out of scope for this PR.[map[a:1] map[b:2]]for object arrays in${{ github.aw.import-inputs.* }}, whichjqcannot parse. The verify workflow proves the receiving side works -- the actual unblock for live apps[] usage requires gh-aw to expose a JSON-encoding helper. Tracked separately.Files
.github/workflows/shared/apm.md-- the canonical fix + 3 audit hardenings..github/workflows/verify-shared-apm-matrix.yml-- new empirical proof (now mirrors the hardened resolver).CHANGELOG.md-- Unreleased / Fixed entry.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com