Skip to content

feat(secrets): add --all-repos and --source via Pipeline Preview discovery#624

Open
jamesadevine wants to merge 5 commits into
mainfrom
devinejames/secret-setting
Open

feat(secrets): add --all-repos and --source via Pipeline Preview discovery#624
jamesadevine wants to merge 5 commits into
mainfrom
devinejames/secret-setting

Conversation

@jamesadevine
Copy link
Copy Markdown
Collaborator

Summary

Adds project-scope token management to ado-aw via two new flags on the secrets subcommands:

  • --all-repos — operate on every ado-aw pipeline ADO knows about in the project (direct ado-aw definitions and consumer pipelines that include ado-aw templates), regardless of which repo their root YAML lives in.
  • --source <path> — filter to consumers of one specific template, e.g. agents/security-scan.md.

Composing these solves the user-visible pain that motivated the work:

# Rotate GITHUB_TOKEN on every consumer of one template, anywhere in the project:
ado-aw secrets set GITHUB_TOKEN --source agents/security-scan.md --all-repos

The legacy lexical local-fixture matcher remains the default; --definition-ids remains the explicit-ID escape hatch. enable / disable / remove keep their existing source-scoped safety semantics and are intentionally not changed.

How it works

Two pieces of new infrastructure make the above possible:

  1. Always-on marker step (src/compile/extensions/ado_aw_marker.rs). Every compiled pipeline now carries a # ado-aw-metadata: {schema,source,version,target} JSON line inside a Setup-job bash step.

    Why a bash step (and not a top-of-file comment): ADO's Pipeline Preview API strips top-of-document comments during YAML expansion — verified empirically against live def 2434 in msazuresphere/4x4 — but preserves comments inside step bodies (80 such lines survived the spike). The marker has to live inside a step body to survive Preview-driven discovery.

    Uniform across all four targets (standalone / 1es / job / stage); no per-target placement special-casing. Implemented via the existing CompilerExtension::setup_steps hook so it cleanly slots in alongside GitHubExtension and SafeOutputsExtension in collect_extensions.

  2. Preview-driven discovery (src/ado/discovery.rs). New discover_ado_aw_pipelines(scope) enumerates project definitions, calls POST /_apis/pipelines/{id}/preview per definition (8-permit semaphore, env-tunable via ADO_AW_PREVIEW_CONCURRENCY), and scans the response's finalYaml for marker steps via the existing crate::detect::parse_marker_step. Definitions whose process.yamlFilename matches a local lock file take a fast path that parses the local header directly and skips the Preview call.

    DiscoveryStatus classifies each definition as Direct / Consumer / UnknownRequiredParams / UnknownForbidden / PreviewFailed(_) / NotAdoAw. Pure scope filtering and classification logic are factored out for unit-testing without HTTP.

Files

  • src/compile/extensions/ado_aw_marker.rs (new) — the always-on marker extension.
  • src/ado/discovery.rs (new) — Preview client, DiscoveryScope, DiscoveryStatus, discover_ado_aw_pipelines, adapters into MatchedDefinition.
  • src/detect.rs — adds MarkerMetadata + parse_marker_step. Existing # @ado-aw parser unchanged.
  • src/compile/extensions/mod.rs — registers AdoAwMarkerExtension as always-on; CompileContext gains input_path.
  • src/compile/{standalone,onees,common}.rs — pass input_path to CompileContext::new; 1ES generate_setup_job now routes extension setup_steps (was previously the only target that didn't).
  • src/compile/types.rsCompileTarget::as_str().
  • src/compile/common.rs — extracted normalize_source_path helper shared by header comment and marker extension.
  • src/ado/mod.rsDefinitionSummary gains repository + revision; new Repository struct; new MatchMethod::Discovery.
  • src/secrets.rs, src/main.rs--all-repos / --source flag plumbing; resolve_for_command chooses between legacy and discovery paths.
  • docs/cli.md — documents the new flags and the project-scope-discovery section.
  • Test helper updates in src/enable.rs / src/list.rs (extending the def(...) helpers for the new DefinitionSummary fields).

Empirical grounding

Used az-acquired bearer auth to run a live spike against msazuresphere/4x4 def 2434 (OS Release Readiness) before settling on the marker design. The spike confirmed:

  • POST /_apis/pipelines/2434/preview?api-version=7.1-preview.1 works end-to-end and returns 56 KB of expanded YAML in finalYaml.
  • The source file's leading # This file is auto-generated by ado-aw… and # @ado-aw source="…" version=… are not in the expanded YAML (header is stripped during expansion).
  • 80 # comment lines inside step bodies are preserved verbatim.

That finding mandated moving the marker from a top-of-file comment into a step body, which this PR implements.

Test plan

  • cargo test: 1739 passed, 0 failed.
  • cargo clippy --all-targets --all-features: zero new warnings (only pre-existing baseline warnings remain).
  • 6 unit tests for parse_marker_step (single / multiple / malformed JSON / forward-compat unknown fields / no-match / prefix-only-without-JSON).
  • 4 integration tests in tests/compiler_tests.rs assert the marker step appears with correct source / version / target / schema for every target.
  • 12 unit tests in src/ado/discovery.rs for scope filtering, Direct/Consumer classification, and the local-lock fast-path lookup table.
  • Live secrets list --definition-ids 2434 against msazuresphere/4x4 confirms the legacy path still works.
  • Live secrets list --all-repos --source agents/release-readiness.md --debug confirms the discovery path correctly fans out POST /_apis/pipelines/{id}/preview calls to every definition in the project. (Returns no match for def 2434 because that definition was compiled before this PR; resolves once the dogfood recompile follow-up lands.)

Follow-up

Pre-marker definitions remain findable only via the process.yamlFilename fast-path until they're recompiled with this PR's ado-aw. A separate PR will run ado-aw compile against every dogfood pipeline and push the regenerated lock files. Tracked as a known follow-up; no flag day required because the legacy match path stays as a fallback.

…overy

Adds project-scope token management via two new flags on `secrets set`,
`secrets list`, and `secrets delete`:

- `--all-repos` — operate on every ado-aw pipeline ADO knows about in
  the project (direct ado-aw definitions *and* consumer pipelines that
  include ado-aw templates), regardless of which repo their root YAML
  lives in.
- `--source <path>` — filter to consumers of one specific template.

Both flags activate a new Preview-driven discovery path that calls
`POST /_apis/pipelines/{id}/preview` per definition and scans the
expanded YAML for an `# ado-aw-metadata: {…}` JSON marker. The legacy
lexical local-fixture matcher remains the default; `--definition-ids`
remains the explicit-ID escape hatch.

To make discovery work, every compiled pipeline now carries a marker
via a new always-on `AdoAwMarkerExtension`. The marker lives inside a
bash Setup-job step because ADO's Preview API strips top-of-document
comments during YAML expansion (verified empirically against live def
2434 in `msazuresphere/4x4`) but preserves comments inside step
bodies. Uniform across all four targets (standalone / 1es / job /
stage); no per-target placement special-casing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid overall — the three-layer design (marker extension → parse_marker_step → discovery) is clean and well-tested. A few concrete issues worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:441-444is_direct_match logic inversion

    if markers.len() != 1 {
        return markers.is_empty(); // returns true for 0 markers!
    }

    When called with 0 markers this returns true, which would classify a non-ado-aw definition as Direct. The current callsite in classify_definition guards against it, but this is a latent bug waiting for the next caller. Should be return false — 0 markers means "not ado-aw", not "direct match".

  • src/ado/discovery.rs:502-513 — doc comment contradicts implementation for UnknownRequiredParams
    The doc comment says:

    UnknownRequiredParams is propagated with a marker-less MatchedDefinition because the user explicitly opted in and may want to act on it.

    But the code returns None for UnknownRequiredParams (same arm as NotAdoAw, UnknownForbidden, PreviewFailed). Either the behavior needs to match the comment (propagate it) or the comment is stale. Silently dropping UnknownRequiredParams definitions means a user running secrets set --all-repos gets no indication that some pipelines were skipped because they require parameters — which is surprising.

  • src/ado/discovery.rs:563-575repo_url() missing percent-encoding

    Some(format!("{}/{}/_git/{}", self.org_url.trim_end_matches('/'), self.project, self.repo_name))

    ADO project names can contain spaces (e.g. "My Project"). The API returns repository.url as a URL where spaces are percent-encoded (My%20Project), but self.project stores the decoded string. After normalize_repo_url lowercases both sides, "my project" still won't equal "my%20project". DiscoveryScope::CurrentRepo (which is the scope when --source is used alone) would silently return no results for any project with spaces in its name. The rest of the codebase uses percent_encoding::utf8_percent_encode(&ctx.project, PATH_SEGMENT) for URL construction — same fix needed here.

🔒 Security Concerns

  • src/compile/extensions/ado_aw_marker.rs:966-975 — single-quote injection in generated echo step
    The echo line uses single-quoting around the user-controlled source value:
    echo 'ado-aw metadata: source={source} version={version} target={target}'
    If a source path contains a ' (e.g. agents/foo's-agent.md), the generated bash is syntactically broken. normalize_source_path strips " but not '. Since this is a compiler-generated step for arbitrary user-provided paths, normalize_source_path should also strip or escape single quotes — or the echo should use double-quoting with the JSON-safe form.

⚠️ Suggestions

  • src/detect.rs:1420,1428,1459 — stale #[allow(dead_code)] annotations
    MARKER_STEP_PREFIX, MarkerMetadata, and parse_marker_step are all actively consumed by src/ado/discovery.rs which this PR wires in. The "upcoming Preview-driven discovery" comments are no longer accurate. These attrs can be removed.

  • src/secrets.rs:1762 — fast-path bypassed in discovery
    resolve_definitions_via_discovery is called with local_lock_paths: None, meaning even definitions with a matching local lock file will spend a Preview API round-trip. If the caller has a local path available (e.g. via --source pointing to a local file), threading it through would save calls for direct pipelines. Not a correctness issue, but could matter in large projects.

✅ What Looks Good

  • The parse_marker_step design (line-by-line scan, skip-malformed, forward-compatible via #[serde(deny_unknown_fields)]-free deserialization) is exactly right for an embedded-in-YAML marker format.
  • The 8-permit semaphore with ADO_AW_PREVIEW_CONCURRENCY override is thoughtful — big project operators have an escape hatch without a code change.
  • The build_lock_path_map / normalize_ado_yaml_path key alignment is correct and tested.
  • discovered_to_matched keeping the rest of the CLI commands unchanged is a clean adapter pattern.
  • Test coverage is thorough — 6 unit tests for the parser, 4 integration tests across all 4 targets, 12 unit tests for scope/classification logic.

Generated by Rust PR Reviewer for issue #624 · ● 1.1M ·

- `is_direct_match`: return `false` (not `markers.is_empty()`) for the
  marker-less case. The previous code returned `true` for 0 markers,
  which would misclassify a non-ado-aw definition as `Direct` if any
  future caller hit that path. Belt-and-braces — the current callsite
  in `classify_definition` already guards against it.

- `discovered_to_matched`: doc-comment claimed `UnknownRequiredParams`
  was propagated; implementation drops it. Keep the safer drop
  behaviour (we can't act on a marker-less definition) but surface a
  `warn!` summary from `resolve_definitions_via_discovery` so
  `secrets set --all-repos` operators can see when pipelines were
  skipped because of required-template-parameters / 403 / other
  preview failures. Doc comment updated to match.

- `AdoContext::repo_url`: percent-encode `project` and `repo_name` so
  `DiscoveryScope::CurrentRepo` works for projects whose names contain
  reserved characters (e.g. spaces). The lowercase normalize step
  can't reconcile a decoded local name with the encoded form ADO
  returns in `repository.url`.

- `AdoAwMarkerExtension`: bash-quote-escape the source path embedded
  in the runtime `echo` line. A markdown filename containing `'`
  (e.g. `agents/foo's.md`) would otherwise produce syntactically
  broken bash. New `bash_single_quote_escape` helper applies the
  canonical `'\''` idiom; the JSON marker line keeps the raw value
  because JSON has no quoting concern with `'`. Two new tests cover
  the idiom and a `foo's-agent.md` path.

- `src/detect.rs`: drop the now-stale `#[allow(dead_code)]` attrs on
  `MARKER_STEP_PREFIX`, `MarkerMetadata`, and `parse_marker_step`.
  All three are actively consumed by `src/ado/discovery.rs`.

- `resolve_for_command`: thread local-lock-file paths into discovery
  so the `process.yamlFilename` fast-path can skip Preview calls for
  locally-compiled pipelines. Best-effort scan — failures fall back
  to Preview-for-everything cleanly.

All 1741 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid, well-tested feature addition — two real issues worth addressing before merge.

Findings

🔒 Security Concerns

  • src/compile/extensions/ado_aw_marker.rs:1021-1029 — The echo line in the generated bash step embeds the user-controlled source path inside a single-quoted string: echo 'ado-aw metadata: source={echo_source} ...'. The bash_single_quote_escape correctly handles ' quoting, but if a file is named something like agents/##vso[task.setvariable variable=FOO]value.md, the ##vso[ sequence survives into stdout verbatim, and the ADO agent's logging-command scanner would process it as a task command. normalize_source_path strips " and \n/\r but not ##vso[. Suggest adding a ##vso[ strip/replace in normalize_source_path (or directly before embedding in the echo) — consistent with the sanitisation already applied to other user-facing YAML surfaces.

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:465-492 (resolve_definitions_via_discovery) — The skip-counters (skipped_required_params, skipped_forbidden, skipped_failed) are incremented inside the .filter() closure before the source_filter check. When --source agents/foo.md is active, definitions with UnknownRequiredParams / Forbidden / PreviewFailed are counted even if they have no markers and are wholly unrelated to the requested template. The resulting warnings ("Discovery skipped N definitions whose Pipeline Preview requires templateParameters...") will mislead users into thinking they might be missing consumers of agents/foo.md when those N definitions were unrelated pipelines that simply failed. Fix: only count a definition toward skip totals when either source_filter is None, or when the definition does reference the target source (i.e., filter first, count second — or two-pass).

⚠️ Suggestions

  • src/ado/discovery.rs:38#![allow(dead_code)] with the comment "Wired into CLI commands in workstream S" is stale: resolve_definitions_via_discovery and discovered_to_matched are both called from secrets.rs in this PR. The file-level suppression silently hides any future dead-code regressions. Remove it (or narrow to only the pub async fn preview_pipeline if that's genuinely only used in tests for now).

  • src/main.rs:98-108 / src/secrets.rs:197-198--source alone (without --all-repos) silently activates DiscoveryScope::CurrentRepo. The CLI help text says "Activates the discovery code path; pairs with --all-repos..." but doesn't explain that without --all-repos it scopes only to the current repo's definitions. A user who runs secrets set GITHUB_TOKEN --source agents/foo.md and gets zero results because they're in a different repo will be confused. Consider adding [only searches definitions in the current repository; add --all-repos to search the full project] to the help string.

✅ What Looks Good

  • The empirical grounding (live spike against def 2434) and its impact on the architectural decision (marker in step body, not top-of-file comment) is excellent engineering rigour — well documented in both the PR description and the module doc comments.
  • bash_single_quote_escape + its dedicated test covering the '\'' idiom is exactly the right defensive approach; good test for single-quote in a path.
  • parse_marker_step is robust: trims leading whitespace, skips non-JSON hits, ignores unknown future fields via serde defaults. The six unit tests cover the edge cases well.
  • Fast-path short-circuit (local lock file → skip Preview call) with graceful fall-through when the file doesn't parse is a clean performance/resilience trade-off.
  • discovered_to_matched correctly gates writes to only Direct | Consumer statuses, preventing accidental action on ambiguous (UnknownRequiredParams, Forbidden) definitions.

Generated by Rust PR Reviewer for issue #624 · ● 771.9K ·

- `AdoAwMarkerExtension`: neutralise `##vso[` and `##[` logging-command
  prefixes in the source path before embedding it in the marker step's
  runtime `echo` line. Without this, a markdown filename like
  `agents/##vso[task.setvariable variable=FOO]value.md` would echo a
  literal `##vso[...]` sequence that the ADO build agent's stdout
  scanner treats as a task command — a logging-command-injection
  primitive any attacker controlling a filename could trigger. New
  `sanitize_for_vso_logging` helper mirrors the existing convention in
  `crate::agent_stats::sanitize_for_markdown` (`[vso-filtered][` /
  `[filtered][`). The `# ado-aw-metadata:` JSON line keeps the raw
  value (it's a YAML comment, not echoed to stdout). Two new tests:
  the sanitiser unit test and an end-to-end attack-payload roundtrip
  asserting the echo line is neutralised.

- `resolve_definitions_via_discovery`: the previous skip-counter
  implementation counted `UnknownRequiredParams` / `Forbidden` /
  `PreviewFailed` failures *before* applying `source_filter`, so under
  `--source agents/foo.md` the warnings would tell the user "N
  definitions skipped requiring template parameters" for definitions
  that had nothing to do with `agents/foo.md`. Split the counting:

  * without `--source`: per-status counts are honest (we're operating
    on every ado-aw pipeline) and the existing three warnings stand;
  * with `--source`: a single conservative `uninspectable` counter,
    surfaced as one warning that explicitly acknowledges we can't tell
    whether any of those skipped definitions would have been consumers
    of the filtered template.

- `src/ado/discovery.rs`: drop the file-level `#![allow(dead_code)]`.
  `resolve_definitions_via_discovery` and `discovered_to_matched` are
  now wired into `secrets.rs`; the suppression was hiding future
  dead-code regressions. Build is clean without it.

- `src/main.rs` (`SecretsCmd`): clarified `--source` help text — calls
  out that **without `--all-repos`, only the current repository is
  searched**. Saves the user-confusion case "I ran `secrets set
  GITHUB_TOKEN --source agents/foo.md` and got zero results" when
  they're in a different repo than the consumer pipelines.

All 1743 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid design with good test coverage — a few targeted fixes needed before merge.

Findings

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:551source_filter is not normalized before comparison

    The marker's source field is stored in normalized form (via normalize_source_path: strips ./, converts \ to /, strips newlines). But the value from --source is passed through as-is:

    Some(src) => d.markers.iter().any(|m| m.source == src),

    A user passing --source ./agents/foo.md or --source agents\foo.md (Windows) silently matches nothing instead of finding the expected pipeline. The fix is to call normalize_source_path(Path::new(src)) on the filter value before the comparison, matching the exact normalization applied at compile time.

  • src/ado/mod.rs:222 — misleading revision doc comment

    The field comment says "Used as a cache key by Preview-driven discovery so an unchanged definition is only previewed once per session", but discover_ado_aw_pipelines never reads revision and no caching logic exists anywhere in the PR. The comment will mislead future contributors looking for the caching logic or trying to add it. Either remove the caching claim or add a // TODO: implement revision-keyed caching note.

  • src/ado/discovery.rs:69DiscoveryScope::Explicit is dead code

    The variant is defined, handled in apply_scope_filter, and exercised in unit tests, but no callsite ever constructs it. The --definition-ids path is handled entirely by the legacy resolve_definitions function before DiscoveryScope is ever created. Clippy may not catch this because the unit tests construct it — but it should at least get a comment (// Reserved for future use) or a #[cfg(test)] annotation on the test-only construction to make the intent explicit.

⚠️ Suggestions

  • src/ado/discovery.rs:373PreviewError::NotFound maps to PreviewFailed

    A 404 from the Preview endpoint almost certainly means the definition was deleted in the window between list_definitions and preview_pipeline. It'll be emitted as a skipped_failed warning under --all-repos, confusing operators who'll see counts go up and then wonder why. Silently dropping 404s (or mapping them to a distinct DiscoveryStatus::NotFound that's excluded from the warning) would be more signal-correct.

  • src/secrets.rs:41--source without --all-repos with no git remote gives a generic error

    When a user runs ado-aw secrets set KEY --source agents/foo.md outside a git repo, repo_url() returns None, apply_scope_filter returns an empty vec, and the user gets "No ado-aw pipelines found via Preview-driven discovery." — with no hint that the scoping culprit is the missing git remote. A targeted message like "Could not determine current repo URL; try adding --all-repos to search the full project" would be much friendlier.

✅ What Looks Good

  • The marker-step approach is well-motivated and the empirical grounding (verifying ADO's Preview API strips top-of-file comments but preserves step body comments) is exactly the right level of rigor for a design decision like this.
  • Security handling on the echo line is thorough: sanitize_for_vso_logging covers ##vso[ and ##[, bash_single_quote_escape uses the canonical '\'' idiom, and both have dedicated unit tests including an injection attempt.
  • The parse_marker_step forward-compat behavior (unknown JSON fields silently ignored, malformed entries logged at debug and skipped rather than panicking) is the right default for a discovery surface that'll outlive compiler versions.
  • The fast-path (local lock file → skip Preview HTTP call) is a nice latency win for the common case and the fallback-to-Preview design is sound.
  • The 8-permit semaphore with ADO_AW_PREVIEW_CONCURRENCY env override is thoughtful — large projects won't accidentally hammer the ADO rate limiter.

Generated by Rust PR Reviewer for issue #624 · ● 736.3K ·

- `resolve_definitions_via_discovery`: normalize the user-supplied
  `--source` value through `normalize_source_path` before comparing
  against `marker.source`. The marker stores the canonical form
  (`agents/foo.md`), so without normalization a user typing
  `--source ./agents/foo.md` or `--source agents\foo.md` (Windows)
  silently matched nothing. Re-export `normalize_source_path` from
  `crate::compile` so callers outside the compile module tree can
  reach it cleanly. New test asserts the three common variants
  (canonical, leading-`./`, backslash) all produce the same normalized
  string.

- `classify_definition` / `DiscoveryStatus::NotFound`: 404 from the
  Preview endpoint almost certainly means the definition was deleted
  between `list_definitions` and `preview_pipeline` (TOCTOU race).
  Previously routed through `PreviewFailed`, which inflated
  `skipped_failed` counts and confused operators. New
  `DiscoveryStatus::NotFound` variant is excluded from skip-warning
  counters in `resolve_definitions_via_discovery` and dropped by
  `discovered_to_matched` like the other non-actionable statuses.
  Debug-logged with the definition id+name so `--debug` users can
  still see what happened.

- `DefinitionSummary::revision`: doc comment claimed Preview-driven
  discovery uses it as a cache key, but no caching is implemented.
  Rewrote to say it's deserialised for a future cache and there is
  *no* caching yet, with a clear "see the discovery module for current
  behaviour" pointer.

- `DiscoveryScope::Explicit`: clarified the docstring to call out
  that no production callsite constructs this variant — `--definition-ids`
  uses the legacy `resolve_definitions` path before discovery ever
  runs. Variant is kept (not removed and not `#[cfg(test)]`-gated)
  because direct API consumers may want to feed pre-filtered IDs into
  discovery; the existing unit-test construction stays.

- `secrets::resolve_for_command`: bail early with a targeted error
  when `--source` is used without `--all-repos` outside a recognised
  ADO repo. The previous behaviour was a generic "No ado-aw pipelines
  found via Preview-driven discovery" message that gave no hint that
  the empty result was caused by the missing git remote. New error
  spells out the cause and suggests `--all-repos`.

All 1746 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid implementation — the security work is careful and the architecture is well-thought-out. A few items worth discussing before merge.

Findings

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:354normalize_repo_url doesn't normalize percent-encoding vs literal spaces. repo_url() percent-encodes project/repo names before the comparison, but if ADO's REST API ever returns a repository.url with spaces decoded (e.g. my project instead of my%20project), CurrentRepo filtering silently drops everything and --source without --all-repos returns zero results with no helpful error. The comment claims this was verified against one live endpoint; it's a latent assumption worth documenting as such, or adding a decode→reencode round-trip to make the comparison encoding-independent.

  • src/ado/discovery.rs:565–571 — Consumer yaml_path is silently first-wins. discovered_to_matched sets yaml_path: d.markers.first().map(|m| m.source.clone()).unwrap_or_default(). For a Consumer that includes two templates (agents/a.md and agents/b.md), ado-aw list will consistently print → agents/a.md and silently omit agents/b.md. For secrets set, this means the displayed source reference in the summary may mislead the operator into thinking the variable was set only for one template when it was actually set on a pipeline that consumes both. The PR description acknowledges a richer surface is deferred, but the truncation isn't called out in the printed summary.

🔒 Security Concerns

  • src/compile/extensions/ado_aw_marker.rs:1183 + src/ado/discovery.rs:565##vso[ sanitization stops at the echo line. The echo correctly applies sanitize_for_vso_logging, but the raw source path flows unsanitized from the JSON comment → parse_marker_stepMarkerMeta(redacted) → MatchedDefinition::yaml_pathprint_matched_summary(which writes to stdout). Ifado-aw secrets set --all-reposis ever called from an ADO pipeline step (e.g., in an operator automation pipeline), a crafted filename like##vso[task.setvariable variable=MY_SECRET]x.md` would pass through the JSON comment (bash comment, harmless there) but then surface in the CLI's own stdout, where the ADO agent would scan it. The threat is non-zero whenever the CLI is run inside a pipeline. Given the current operator-facing usage this is low risk, but it's architecturally inconsistent — the echo sanitizes but the downstream data flow doesn't.

⚠️ Suggestions

  • src/ado/discovery.rs:282–286ADO_AW_PREVIEW_CONCURRENCY=0 is silently clamped. The .max(1) guard is correct and prevents a deadlock, but there's no warning when the user's env value parses to 0. A warn! or debug! log when the env value parses to a u64 but clamps to 1 would save future head-scratching.

  • src/ado/discovery.rs — no test for --source with CurrentRepo scope when repository.url is absent. test_scope_current_repo_excludes_definitions_without_repository exists, but there's no test verifying that resolve_for_command bails with the helpful "no ADO git remote detected" message when repo_name is empty and source_filter is set. Currently that bail lives in secrets.rs and the discovery module only sees the scope enum — the integration path is untested.

✅ What Looks Good

  • Security layering in the marker step is solid: sanitize_for_vso_logging + bash_single_quote_escape on the echo line, serde_json::json! for the JSON comment (properly encodes ", \, and special chars), and normalize_source_path strips \n/\r before anything is emitted. The attack paths for VSO command injection and bash quoting are well-covered where it matters most (the runtime surface).
  • The DiscoveryStatus::NotFound variant correctly handles the TOCTOU race (definition deleted between list and preview) without surfacing a noisy error or inflating the skip counters.
  • parse_marker_step is defensively written: trims leading whitespace (handles indented heredoc content), skips malformed JSON silently, accepts unknown fields via serde defaults, and is exercised by all the right edge cases.
  • Semaphore permits.max(1) prevents a deadlock if a user sets ADO_AW_PREVIEW_CONCURRENCY=0.
  • Fast-path lock-file lookup correctly normalizes paths (trim_start_matches('/') aligns with normalize_ado_yaml_path) so the optimization fires correctly in practice.
  • CompileContext::new signature change (takes input_path instead of input_dir) is a nice cleanup — less footgun surface, and the parent().unwrap_or(".") fallback is now encapsulated in one place.

Generated by Rust PR Reviewer for issue #624 · ● 1.6M ·

- `normalize_repo_url`: percent-decode before comparing, so a project
  named "My Project" matches whether ADO returns the encoded form
  (`My%20Project`) or the decoded form. The previous implementation
  assumed ADO always returns percent-encoded URLs; that assumption is
  documented in code now and the comparison is encoding-independent.
  New unit tests cover the encoded/decoded equivalence and the
  case-insensitive/trailing-slash behaviour.

- `discovered_to_matched`: stop silently truncating consumers that
  include multiple ado-aw templates. The `yaml_path` field used by
  `print_matched_summary` now joins every marker source with `, ` so
  e.g. `agents/a.md, agents/b.md` shows up honestly in the CLI
  summary. New unit test asserts both sources are surfaced.

- `##vso[` defence-in-depth: the marker step's runtime echo already
  neutralises `##vso[` and `##[` prefixes, but the same raw source
  string was flowing through `MarkerMetadata` -> `MatchedDefinition::yaml_path`
  -> `print_matched_summary` (which writes to stdout). When the CLI
  is invoked from inside an ADO pipeline step, the agent's stdout
  scanner would still pick up an attacker-controlled `##vso[...]`
  payload. New `sanitize_for_vso_logging` helper in the discovery
  module applies the same convention (`##vso[` -> `[vso-filtered][`,
  `##[` -> `[filtered][`) when building the `yaml_path`. New unit test
  asserts the sanitisation.

- `ADO_AW_PREVIEW_CONCURRENCY=0` now emits a `warn!` before clamping
  to 1, instead of silently masking the typo. Operators who set `=0`
  will see the warning and can correct the env value rather than
  wondering why their concurrency tuning had no effect.

- New unit test for the `--source` + no-git-remote bail in
  `secrets::resolve_for_command`: previously the helpful "no Azure
  DevOps git remote was detected; try --all-repos" error path was
  untested. Now asserted via a `tokio::test` that constructs an empty
  AdoContext and verifies the error message contains both the cause
  and the suggested mitigation.

All 1753 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — solid architecture, excellent test coverage, and careful attention to security. A few maintainability notes worth addressing before merge.

Findings

⚠️ Suggestions

  • [src/compile/extensions/ado_aw_marker.rs + src/ado/discovery.rs] sanitize_for_vso_logging is duplicated

    The function is defined privately and identically in both new files:

    fn sanitize_for_vso_logging(s: &str) -> String {
        s.replace("##vso[", "[vso-filtered][")
            .replace("##[", "[filtered][")
    }

    src/sanitize.rs already handles VSO neutralisation (for markdown context), and src/agent_stats.rs has a sanitize_for_markdown with similar intent. If the set of dangerous prefixes ever needs to expand, this must be updated in two independent places with no compiler enforcement. Worth extracting to src/sanitize.rs as a pub function and importing from both call sites.

  • [src/ado/discovery.rs:648-709] Side-effecting .filter() closure is confusing

    In resolve_definitions_via_discovery, the .filter() closure simultaneously decides inclusion and mutates four skip counters as a side effect. In the unfiltered case (normalized_filter.is_none()), it always returns true, so PreviewFailed/UnknownForbidden/UnknownRequiredParams items all land in kept and are only discarded later by discovered_to_matched. The variable name kept then misleads — it holds items that are about to be dropped. A two-pass approach (classify then warn then convert) or a manual for loop with explicit continue/push would be clearer and make it obvious that the warning counts and the final filtered set are derived from the same pass.

  • [src/ado/discovery.rs:474-508] is_direct_match silently misclassifies sources with non-.md suffixes

    let Some(stem) = marker.source.strip_suffix(".md").map(|s| format!("{s}.lock.yml")) else {
        return false;
    };

    If marker.source doesn't end in .md (e.g. a future extension allows .yaml-sourced agents), the function returns false and the definition is classified as Consumer instead of Direct. The test fixture names document the convention, but there's no defensive log or comment explaining why a non-.md source is a legitimate case for returning false vs a marker corruption. Worth a // non-standard source extension; treat conservatively as Consumer comment at minimum.

✅ What Looks Good

  • Security handling is thorough: both bash_single_quote_escape and sanitize_for_vso_logging are applied to the echo line before it reaches the generated bash. The JSON marker itself uses serde_json serialisation, so ", \n, and other special characters in source paths are correctly escaped. Injection tests exist for both vectors.
  • Forward-compat serde on MarkerMetadata: #[serde(default)] on all fields means a schema-2 marker with new fields won't break the current parser, and a schema-1 parser encountering schema-2 output degrades gracefully.
  • Fast-path for local lock files is a smart optimisation — skipping a network round-trip per already-compiled definition in the common case, while falling through to Preview as a defensive measure if the local parse fails.
  • ADO_AW_PREVIEW_CONCURRENCY=0 warning (clamp-with-warn instead of silent clamp) is the right UX choice.
  • Test coverage: 6 unit tests on the parser, 12 on discovery scope/classification logic, 4 integration tests for the marker across all four compile targets — comprehensive for a new infra feature.

Generated by Rust PR Reviewer for issue #624 · ● 1.4M ·

Copilot AI pushed a commit that referenced this pull request May 19, 2026
- `is_direct_match`: return `false` (not `markers.is_empty()`) for the
  marker-less case. The previous code returned `true` for 0 markers,
  which would misclassify a non-ado-aw definition as `Direct` if any
  future caller hit that path. Belt-and-braces — the current callsite
  in `classify_definition` already guards against it.

- `discovered_to_matched`: doc-comment claimed `UnknownRequiredParams`
  was propagated; implementation drops it. Keep the safer drop
  behaviour (we can't act on a marker-less definition) but surface a
  `warn!` summary from `resolve_definitions_via_discovery` so
  `secrets set --all-repos` operators can see when pipelines were
  skipped because of required-template-parameters / 403 / other
  preview failures. Doc comment updated to match.

- `AdoContext::repo_url`: percent-encode `project` and `repo_name` so
  `DiscoveryScope::CurrentRepo` works for projects whose names contain
  reserved characters (e.g. spaces). The lowercase normalize step
  can't reconcile a decoded local name with the encoded form ADO
  returns in `repository.url`.

- `AdoAwMarkerExtension`: bash-quote-escape the source path embedded
  in the runtime `echo` line. A markdown filename containing `'`
  (e.g. `agents/foo's.md`) would otherwise produce syntactically
  broken bash. New `bash_single_quote_escape` helper applies the
  canonical `'\''` idiom; the JSON marker line keeps the raw value
  because JSON has no quoting concern with `'`. Two new tests cover
  the idiom and a `foo's-agent.md` path.

- `src/detect.rs`: drop the now-stale `#[allow(dead_code)]` attrs on
  `MARKER_STEP_PREFIX`, `MarkerMetadata`, and `parse_marker_step`.
  All three are actively consumed by `src/ado/discovery.rs`.

- `resolve_for_command`: thread local-lock-file paths into discovery
  so the `process.yamlFilename` fast-path can skip Preview calls for
  locally-compiled pipelines. Best-effort scan — failures fall back
  to Preview-for-everything cleanly.

All 1741 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Copilot AI pushed a commit that referenced this pull request May 19, 2026
- `AdoAwMarkerExtension`: neutralise `##vso[` and `##[` logging-command
  prefixes in the source path before embedding it in the marker step's
  runtime `echo` line. Without this, a markdown filename like
  `agents/##vso[task.setvariable variable=FOO]value.md` would echo a
  literal `##vso[...]` sequence that the ADO build agent's stdout
  scanner treats as a task command — a logging-command-injection
  primitive any attacker controlling a filename could trigger. New
  `sanitize_for_vso_logging` helper mirrors the existing convention in
  `crate::agent_stats::sanitize_for_markdown` (`[vso-filtered][` /
  `[filtered][`). The `# ado-aw-metadata:` JSON line keeps the raw
  value (it's a YAML comment, not echoed to stdout). Two new tests:
  the sanitiser unit test and an end-to-end attack-payload roundtrip
  asserting the echo line is neutralised.

- `resolve_definitions_via_discovery`: the previous skip-counter
  implementation counted `UnknownRequiredParams` / `Forbidden` /
  `PreviewFailed` failures *before* applying `source_filter`, so under
  `--source agents/foo.md` the warnings would tell the user "N
  definitions skipped requiring template parameters" for definitions
  that had nothing to do with `agents/foo.md`. Split the counting:

  * without `--source`: per-status counts are honest (we're operating
    on every ado-aw pipeline) and the existing three warnings stand;
  * with `--source`: a single conservative `uninspectable` counter,
    surfaced as one warning that explicitly acknowledges we can't tell
    whether any of those skipped definitions would have been consumers
    of the filtered template.

- `src/ado/discovery.rs`: drop the file-level `#![allow(dead_code)]`.
  `resolve_definitions_via_discovery` and `discovered_to_matched` are
  now wired into `secrets.rs`; the suppression was hiding future
  dead-code regressions. Build is clean without it.

- `src/main.rs` (`SecretsCmd`): clarified `--source` help text — calls
  out that **without `--all-repos`, only the current repository is
  searched**. Saves the user-confusion case "I ran `secrets set
  GITHUB_TOKEN --source agents/foo.md` and got zero results" when
  they're in a different repo than the consumer pipelines.

All 1743 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Copilot AI pushed a commit that referenced this pull request May 19, 2026
- `resolve_definitions_via_discovery`: normalize the user-supplied
  `--source` value through `normalize_source_path` before comparing
  against `marker.source`. The marker stores the canonical form
  (`agents/foo.md`), so without normalization a user typing
  `--source ./agents/foo.md` or `--source agents\foo.md` (Windows)
  silently matched nothing. Re-export `normalize_source_path` from
  `crate::compile` so callers outside the compile module tree can
  reach it cleanly. New test asserts the three common variants
  (canonical, leading-`./`, backslash) all produce the same normalized
  string.

- `classify_definition` / `DiscoveryStatus::NotFound`: 404 from the
  Preview endpoint almost certainly means the definition was deleted
  between `list_definitions` and `preview_pipeline` (TOCTOU race).
  Previously routed through `PreviewFailed`, which inflated
  `skipped_failed` counts and confused operators. New
  `DiscoveryStatus::NotFound` variant is excluded from skip-warning
  counters in `resolve_definitions_via_discovery` and dropped by
  `discovered_to_matched` like the other non-actionable statuses.
  Debug-logged with the definition id+name so `--debug` users can
  still see what happened.

- `DefinitionSummary::revision`: doc comment claimed Preview-driven
  discovery uses it as a cache key, but no caching is implemented.
  Rewrote to say it's deserialised for a future cache and there is
  *no* caching yet, with a clear "see the discovery module for current
  behaviour" pointer.

- `DiscoveryScope::Explicit`: clarified the docstring to call out
  that no production callsite constructs this variant — `--definition-ids`
  uses the legacy `resolve_definitions` path before discovery ever
  runs. Variant is kept (not removed and not `#[cfg(test)]`-gated)
  because direct API consumers may want to feed pre-filtered IDs into
  discovery; the existing unit-test construction stays.

- `secrets::resolve_for_command`: bail early with a targeted error
  when `--source` is used without `--all-repos` outside a recognised
  ADO repo. The previous behaviour was a generic "No ado-aw pipelines
  found via Preview-driven discovery" message that gave no hint that
  the empty result was caused by the missing git remote. New error
  spells out the cause and suggests `--all-repos`.

All 1746 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Copilot AI pushed a commit that referenced this pull request May 19, 2026
- `normalize_repo_url`: percent-decode before comparing, so a project
  named "My Project" matches whether ADO returns the encoded form
  (`My%20Project`) or the decoded form. The previous implementation
  assumed ADO always returns percent-encoded URLs; that assumption is
  documented in code now and the comparison is encoding-independent.
  New unit tests cover the encoded/decoded equivalence and the
  case-insensitive/trailing-slash behaviour.

- `discovered_to_matched`: stop silently truncating consumers that
  include multiple ado-aw templates. The `yaml_path` field used by
  `print_matched_summary` now joins every marker source with `, ` so
  e.g. `agents/a.md, agents/b.md` shows up honestly in the CLI
  summary. New unit test asserts both sources are surfaced.

- `##vso[` defence-in-depth: the marker step's runtime echo already
  neutralises `##vso[` and `##[` prefixes, but the same raw source
  string was flowing through `MarkerMetadata` -> `MatchedDefinition::yaml_path`
  -> `print_matched_summary` (which writes to stdout). When the CLI
  is invoked from inside an ADO pipeline step, the agent's stdout
  scanner would still pick up an attacker-controlled `##vso[...]`
  payload. New `sanitize_for_vso_logging` helper in the discovery
  module applies the same convention (`##vso[` -> `[vso-filtered][`,
  `##[` -> `[filtered][`) when building the `yaml_path`. New unit test
  asserts the sanitisation.

- `ADO_AW_PREVIEW_CONCURRENCY=0` now emits a `warn!` before clamping
  to 1, instead of silently masking the typo. Operators who set `=0`
  will see the warning and can correct the env value rather than
  wondering why their concurrency tuning had no effect.

- New unit test for the `--source` + no-git-remote bail in
  `secrets::resolve_for_command`: previously the helpful "no Azure
  DevOps git remote was detected; try --all-repos" error path was
  untested. Now asserted via a `tokio::test` that constructs an empty
  AdoContext and verifies the error message contains both the cause
  and the suggested mitigation.

All 1753 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant