Validate published Aspire CLI builds end-to-end from AzDO + GH #17532

Draft
radical wants to merge 9 commits into microsoft:main from radical:radical/staging-cli-smoke-workflow

@radical radical commented May 27, 2026

Member

What this adds

A new post-publish validation path that exercises a just-published Aspire CLI build with the full CLI E2E suite and catches stale-channel-pointer failures, wired into both AzDO pipelines that produce a publish.

Three observable changes for operators and CI:

  1. GitHub Actions workflow Validate Published Build (.github/workflows/validate-published-build.yml) — workflow_dispatch with quality (dev/staging/release) + version. Runs the full Aspire.Cli.EndToEnd.Tests class-split matrix. Installs via the requested channel (so the same channel-resolution path real users hit gets exercised) and asserts post-install that aspire --version matches the supplied version (so a stale aka.ms/dotnet/.../{quality} pointer fails the run loudly).

  2. AzDO main build pipeline (eng/pipelines/azure-pipelines.yml) — new fire-and-forget dispatch_validate_published_build stage that runs once build succeeds and _PackagesPublished=true. Quality + dispatched ref are derived from the source branch: main → quality=dev, ref=main; release/X.Y → quality=staging, ref=release/X.Y; internal/release/X.Y → quality=staging, ref=release/X.Y (the internal/ prefix is stripped so the workflow YAML is loaded from the public mirror branch).

  3. AzDO release pipeline (eng/pipelines/release-publish-nuget.yml) — new ValidatePublishedBuild stage after GitHubTasks dispatches with quality=release, version from the derived release version, and workflow ref from the source build's branch. New SkipValidatePublishedBuild + ValidatePublishedBuildWorkflowRef advanced parameters for partial-failure re-runs and ref overrides.
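
The branch-to-dispatch mapping in item 2 can be sketched as a small pure function. This is an illustrative Python model only; the real derivation lives in the AzDO YAML, and the names `DispatchTarget` and `derive_dispatch_target`, plus the handling of a `refs/heads/` prefix, are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class DispatchTarget(NamedTuple):
    quality: str  # channel the dispatched workflow installs from
    ref: str      # branch the workflow YAML is loaded from


def _strip_prefix(value: str, prefix: str) -> str:
    """Return value without prefix when it starts with it, else unchanged."""
    return value[len(prefix):] if value.startswith(prefix) else value


def derive_dispatch_target(source_branch: str) -> Optional[DispatchTarget]:
    """Map a source branch to (quality, ref); None means no dispatch."""
    # Assumption: the branch may arrive in full-ref form (refs/heads/...).
    branch = _strip_prefix(source_branch, "refs/heads/")
    if branch == "main":
        return DispatchTarget("dev", "main")
    # Strip internal/ so the ref names the public mirror branch.
    public = _strip_prefix(branch, "internal/")
    if public.startswith("release/") and len(public) > len("release/"):
        return DispatchTarget("staging", public)
    return None
```

Under these assumptions, `derive_dispatch_target("internal/release/13.4")` yields quality `staging` with ref `release/13.4`, and any other branch shape yields `None`.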

Both AzDO dispatches are fire-and-forget — the CLI E2E suite is informational signal, not a release gate. Blocking the release on a CLI E2E run that's susceptible to transient GH Actions/Docker/test flakiness would punish releases for noise unrelated to the release itself. The dispatched run URL is printed in the AzDO log; re-runs go through the GitHub Actions UI.

Why this matters

Before this PR there was no automated verification that a publish actually reached the public install path. The previous in-pipeline installer validation (prepare_installers Full mode) downloads from the versioned https://ci.dot.net/public/aspire/{ver}/aspire-cli-{rid}-{ver}.zip URL, which by construction serves the version it names — so a publish that succeeded for assets but failed to flip a channel pointer would still pass installer validation and ship undetected.

The new workflow exercises aka.ms/dotnet/9/aspire/{ga|rc|daily}/daily directly, which is the URL real users hit. Combined with the post-install aspire --version assertion, the path now catches both "channel URL is reachable" and "channel points at the expected build."
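
As a rough illustration of the post-install assertion, the check reduces to comparing the reported version with the expected one. The sketch below is Python rather than the C# test infra, and it assumes `aspire --version` prints a bare version string; `CliVersionMismatch` and `verify_cli_version` are hypothetical names.

```python
class CliVersionMismatch(Exception):
    """Raised when the installed CLI is not the build that was expected."""


def verify_cli_version(expected: str, version_output: str) -> None:
    """Fail loudly when the channel resolved to a different build.

    Assumption: version_output is the raw stdout of `aspire --version`
    and contains only the version plus surrounding whitespace.
    """
    actual = version_output.strip()
    if actual != expected:
        raise CliVersionMismatch(
            f"CLI_VERSION_MISMATCH:expected={expected} actual={actual}"
        )
```

A versioned download URL can never trip this check, since it serves the version it names; only an install through a channel pointer can report a different build.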

Why depend on build (not prepare_installers)

Arcade v3 publishing (enablePublishUsingPipelines: true on the build job) uploads native archives to ci.dot.net during the build stage itself, not in a separate post-build stage. prepare_installers' own Full validation works for exactly that reason and likewise only depends on build. Dispatching after build (in parallel with prepare_installers) removes ~20–30 minutes of latency on the validation signal at no correctness cost.

Shared plumbing

  • eng/pipelines/scripts/dispatch-github-workflow.ps1 — renamed from dispatch-release-github-tasks.ps1 (the body was already generic). Added -NoWait switch so callers can fire-and-forget without the run-id-resolve + poll path. Inputs are passed as a hashtable.
  • eng/pipelines/templates/dispatch-github-workflow-steps.yml — generic dispatch step template. Inputs flow as an object parameter that's serialized via AzDO's convertToJson() at template expansion and parsed by the pwsh body from an env var — avoids the AzDO restriction on template directives inside string scalars and dodges PowerShell-quoting concerns for input values.
  • eng/pipelines/templates/dispatch-validate-published-build-job.yml — single-job template specific to validate-published-build.yml. Both pipelines invoke it with just quality + version + ref (default main).
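
The JSON-via-env-var hand-off described for the steps template can be illustrated with a Python stand-in for the pwsh body. The env var name `DISPATCH_INPUTS_JSON` and the function `roundtrip_inputs` are hypothetical; the point is only that values travel as one JSON string and are parsed on the other side, so no input value is ever spliced into script text.

```python
import json
import os
import subprocess
import sys

# Hypothetical env var name, chosen for this example only.
INPUTS_ENV_VAR = "DISPATCH_INPUTS_JSON"

# Stand-in for the script body: it reads the env var and parses JSON.
CHILD_SCRIPT = (
    "import json, os; "
    "inputs = json.loads(os.environ['DISPATCH_INPUTS_JSON']); "
    "print(json.dumps(inputs, sort_keys=True))"
)


def roundtrip_inputs(inputs: dict) -> dict:
    """Serialize inputs once, pass them through the environment, parse them back."""
    env = dict(os.environ)
    env[INPUTS_ENV_VAR] = json.dumps(inputs)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Values containing quotes, dollar signs, or backticks survive unchanged because the child never interprets them as shell or script syntax.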

Channel-pointer assertion in test infra

  • tests/Shared/CliInstallStrategy.cs — new WithExpectedVersion(string) builder and a new ASPIRE_E2E_EXPECTED_VERSION env var applied as an override at the tail of Detect(). When set on a strategy that doesn't already carry a deterministic ExpectedVersion (which LocalArchive and DotnetTool with explicit version do), the override populates ExpectedVersion and the existing VerifyAspireCliVersionAsync path runs aspire --version post-install and fails with CLI_VERSION_MISMATCH:expected=X actual=Y on a stale channel.
  • tests/Aspire.Cli.EndToEnd.Tests/Helpers/CliInstallStrategyTests.cs — 6 new tests cover the override-apply path on quality + version strategies, the deterministic-ExpectedVersion no-op, empty-string handling, and the builder including its ArgumentException on empty input.
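
The override rule at the tail of Detect() can be modelled in a few lines. This is a simplified Python sketch, not the C# implementation: `InstallStrategy` and `apply_expected_version_override` are stand-in names, and whether the real env path also validates the version format is not something this sketch assumes.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional

ENV_VAR = "ASPIRE_E2E_EXPECTED_VERSION"


@dataclass(frozen=True)
class InstallStrategy:
    kind: str                               # e.g. "quality", "local-archive"
    expected_version: Optional[str] = None  # set by deterministic strategies


def apply_expected_version_override(
    strategy: InstallStrategy,
    env: Mapping[str, str] = os.environ,
) -> InstallStrategy:
    """Populate expected_version from the env var unless one is already set."""
    if strategy.expected_version is not None:
        return strategy  # deterministic strategies keep their own value
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return strategy  # unset or blank input: no assertion is armed
    return replace(strategy, expected_version=raw)
```

Passing the environment as a parameter keeps the rule testable without mutating process state.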

Daily smoke convenience

.github/workflows/tests-daily-smoke.yml gains an optional expectedVersion workflow_dispatch input that flows into ASPIRE_E2E_EXPECTED_VERSION. Scheduled runs leave it empty (current behavior); operators can dispatch with a specific value to verify channel resolution on demand during outage triage.

Validation

Exercised end-to-end against live infra before shipping this PR:

  • GH-side — gh workflow run "Daily CLI Smoke Tests" --ref ankj/staging-cli-smoke-workflow -f quality=staging -f expectedVersion=13.4.0 (run) — proved the channel install + ASPIRE_E2E_EXPECTED_VERSION assertion path on actual GitHub Actions infra.
  • AzDO-side — a temporary feature-branch dispatch stage (since reverted) confirmed the AzDO templates compile against the live AzDO engine and successfully fire workflow_dispatch against the GitHub API as the aspire-repo-bot GitHub App.
  • Unit tests — full CliInstallStrategyTests class passes (62 / 62) including the 6 new tests.

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes (test-infra changes covered; AzDO + GH wiring exercised end-to-end as described above)
  • Did you add public API?
    • No
  • Does the change make any security assumptions or guarantees?
    • No

@github-actions

github-actions Bot commented May 27, 2026

Contributor

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17532

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17532"

@github-actions

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 2 jobs were identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

Matched test failure patterns (1 test)
  • Aspire.Cli.EndToEnd.Tests.KubernetesDeployWithValkeyTests.DeployK8sWithValkey — Unable to access container registry during publish

@radical radical changed the title Add workflow to validate published Aspire CLI builds Validate published Aspire CLI builds end-to-end from AzDO + GH May 30, 2026
@github-actions

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

@github-actions

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

radical and others added 7 commits May 31, 2026 02:54
…Strategy

Channel-based installs (quality=dev / staging / release) go through aka.ms
aliases whose targets can be stale, so the install itself can't catch
"channel pointer didn't get updated after publish". Add an optional
post-install version assertion: ExpectedVersion on a strategy causes
VerifyAspireCliVersionAsync to run `aspire --version` after install and
fail with CLI_VERSION_MISMATCH if it doesn't match.

The selector layer exposes this as:

  - WithExpectedVersion(version) builder for callers constructing a
    strategy directly. Validates the value against the same regex
    FromVersion uses — the value is interpolated unquoted into a bash
    equality check, so the regex doubles as a shell-safety guard.

  - ASPIRE_E2E_EXPECTED_VERSION env var applied as an override in
    Detect() when the chosen strategy doesn't already carry a
    deterministic ExpectedVersion (LocalArchive from nupkg, DotnetTool
    with explicit version still win). Whitespace is trimmed and
    treated as unset so a blank workflow_dispatch input doesn't trip
    the assertion.
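
To illustrate why a strict format check doubles as a shell-safety guard, here is a Python sketch of the builder-side validation. The regex below is a hypothetical stand-in (the actual FromVersion pattern is not shown in this PR), and the bash fragment is only an assumed shape for the equality check.

```python
import re

# Hypothetical pattern: digits, dots, and an optional prerelease suffix.
# It admits no whitespace, quotes, or shell metacharacters.
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+(?:-[0-9A-Za-z.]+)?")


def validate_expected_version(version: str) -> str:
    """Reject empty, whitespace-only, or malformed versions."""
    if not version or not version.strip():
        raise ValueError("expected version must not be empty")
    # fullmatch (not match) so trailing characters cannot sneak through.
    if VERSION_PATTERN.fullmatch(version) is None:
        raise ValueError(f"invalid version format: {version!r}")
    return version


def build_version_check(version: str) -> str:
    """Build an assumed bash check with the version interpolated unquoted."""
    safe = validate_expected_version(version)
    return f'[ "$(aspire --version)" = {safe} ]'
```

Because validation runs before interpolation, an input such as `13.4.0; rm -rf /` raises instead of reaching the shell.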

Tests cover the builder, both whitespace and invalid-format rejection,
the env override on quality + version strategies, the no-op for
deterministic strategies, and trim/whitespace handling on the env path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The release-publish-nuget pipeline dispatched release-github-tasks.yml
via an inline pwsh block + a per-script-call ASPIRE_BOT_APP_ID/
PRIVATE_KEY env mapping. Add a second AzDO dispatcher (for the upcoming
validate-published-build workflow) and they'd both repeat the same
JSON-encode + secret-mapping ceremony.

Extract:

  - dispatch-github-workflow-steps.yml — reusable template covering the
    pwsh dispatch step, JSON-via-env input passing (avoids embedding a
    JSON literal in the script body), and the post-step that surfaces
    the dispatched run URL. Takes workflowFile, workflowRef, inputs,
    and a noWait switch (fire-and-forget vs wait-for-completion).

  - dispatch-github-workflow.ps1 — renamed from
    dispatch-release-github-tasks.ps1 (the body was already generic).
    Adds a -NoWait switch so fire-and-forget callers skip the run-id
    resolution + polling path.
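
The difference between the two modes can be sketched with injected callables. This Python model is an assumption-laden stand-in for the PowerShell script: `dispatch_workflow`, its parameters, and the polling limits are hypothetical, and the callables replace real GitHub API calls.

```python
import time
from typing import Callable, Optional


def dispatch_workflow(
    trigger: Callable[[], None],
    resolve_run_id: Callable[[], int],
    get_conclusion: Callable[[int], Optional[str]],
    no_wait: bool = False,
    poll_interval_seconds: float = 30.0,
    max_polls: int = 120,
) -> Optional[str]:
    """Trigger a workflow; optionally wait for its conclusion."""
    trigger()
    if no_wait:
        # Fire-and-forget: skip run-id resolution and polling entirely.
        return None
    run_id = resolve_run_id()
    for _ in range(max_polls):
        conclusion = get_conclusion(run_id)
        if conclusion is not None:
            return conclusion
        time.sleep(poll_interval_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

With `no_wait=True` the caller's outcome depends only on whether the trigger call itself succeeded, which matches the informational (non-gating) role described for the validation dispatch.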

Refactor DispatchGitHubTasksJob to invoke the new template. As part of
that, lift the source-build release-branch derivation up to PrepareJob's
deriveReleaseVersion step (releaseBranchEffective) so it can be reused
by future consumers — DispatchGitHubTasksJob now reads it via
stageDependencies rather than recomputing inline. Drop ReleaseBranchDerived.

No behavior change to release-github-tasks.yml's dispatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Shared Node helper for workflows that auto-create a tracking issue
when they fail, modelled on the existing inline pattern in
tests-daily-smoke.yml and deployment-tests.yml but using the search
API for exact-title dedup (no listForRepo-window churn, no substring
collision between e.g. "13.4" and "13.4.1").

Callers compose their own title/body/labels (the parts that genuinely
differ — artifact parsing, prose, dedup key). The helper owns only
search → exact-title-match → comment-or-create.

Loaded from workflows via the established
require(${GITHUB_WORKSPACE}/.github/workflows/...) pattern used by
create-failing-test-issue.js and workflow-command-helpers.js.

Tests in tests/Infrastructure.Tests/WorkflowScripts/ drive the helper
against a fake Octokit, covering:
- buildDedupQuery shape with JSON-escaped phrase quoting
- create path when no existing issue matches
- comment path when an exact-title match exists
- exact-title-match rejection of substring hits (e.g. "13.4" search
  returning a "13.4.1" issue must NOT collapse)
- create path on search failure (don't lose the failure report)
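
The search, exact-title-match, comment-or-create flow covered by these tests can be sketched as follows. This is a Python illustration, not the Node helper: the client interface, `FakeIssueClient`, `report_failure`, and the specific search qualifiers in the query are assumptions for the example.

```python
import json
from typing import Dict, List, Tuple


def build_dedup_query(repo: str, title: str) -> str:
    """Build a search query with the title as a JSON-escaped quoted phrase."""
    return f"repo:{repo} is:issue is:open in:title {json.dumps(title)}"


class FakeIssueClient:
    """In-memory stand-in for an issue API (hypothetical interface)."""

    def __init__(self, issues: List[Dict], search_fails: bool = False):
        self.issues = list(issues)
        self.comments: List[Tuple[int, str]] = []
        self.search_fails = search_fails

    def search_issues(self, query: str) -> List[Dict]:
        if self.search_fails:
            raise RuntimeError("search unavailable")
        # Deliberately loose: returns substring hits too, like a real search.
        return list(self.issues)

    def add_comment(self, number: int, body: str) -> None:
        self.comments.append((number, body))

    def create_issue(self, title: str, body: str) -> int:
        number = max((i["number"] for i in self.issues), default=0) + 1
        self.issues.append({"number": number, "title": title})
        return number


def report_failure(client, repo: str, title: str, body: str) -> Tuple[str, int]:
    """Comment on the exact-title issue if one exists, else create one."""
    try:
        hits = client.search_issues(build_dedup_query(repo, title))
    except Exception:
        hits = []  # search failed: fall through to create so the report is kept
    for issue in hits:
        if issue["title"] == title:  # exact match only; substring hits are ignored
            client.add_comment(issue["number"], body)
            return ("commented", issue["number"])
    return ("created", client.create_issue(title, body))
```

The exact-title comparison after the search is what keeps a "13.4" report from collapsing into an existing "13.4.1" issue.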

No caller yet; introduced separately so the workflow that consumes
it stays focused.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds a workflow that runs the full Aspire.Cli.EndToEnd.Tests suite
against a CLI installed via a published channel (quality + version
inputs). Splits every test class into its own matrix job so the shape
matches PR validation.

Quality drives the channel install; the version input flows into
ASPIRE_E2E_EXPECTED_VERSION so `aspire --version` post-install asserts
the channel actually resolved to the requested build — catches "publish
succeeded but channel pointer is stale" failures that a versioned
install path would silently mask.

On failure or cancellation (per-leg timeout, GHA infra cancel), opens
or comments on a (version, quality) tracking issue via the shared
create-failure-tracking-issue helper. validate-published-build is
dispatched fire-and-forget from AzDO, so without this nobody sees a
failure unless they're watching the Actions UI.

Also adds a thin job template, dispatch-validate-published-build-job.yml,
so the AzDO callers added in a later commit can share the dispatch
shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Wires both AzDO pipelines to dispatch validate-published-build.yml
(fire-and-forget) so each production-branch build and release exercises
the public install path end-to-end:

  - azure-pipelines.yml gains a dispatch_validate_published_build stage
    that fires when _PackagesPublished=true (main, release/*,
    internal/release/*). Quality derives from the source branch — main
    publishes to 'daily' (quality=dev) per build_sign_native.yml,
    release branches publish to 'staging'. Workflow ref tracks the
    source branch so the dispatched workflow YAML matches the test
    source for that channel.

  - release-publish-nuget.yml gains a ValidatePublishedBuild stage that
    dispatches with quality=release and the just-published version. It
    depends on Release (the stage that publishes NuGet packages and
    promotes the channel pointer), not GitHubTasks, because the channel
    pointer — not the GitHub release — is what aka.ms/dotnet/.../release
    resolves to. PrepareArtifacts is required to have succeeded so the
    version macro is populated; Release.result == 'Skipped' is permitted
    so operator-driven partial reruns work. Two new advanced parameters
    cover skip and ref override.
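
The gating rule for the ValidatePublishedBuild stage reads naturally as a predicate. The Python sketch below only models the rule as described in this commit message; the actual condition is an AzDO stage expression, and `should_dispatch_validation` is a hypothetical name.

```python
def should_dispatch_validation(
    prepare_artifacts_result: str,
    release_result: str,
    skip_validate_published_build: bool = False,
) -> bool:
    """Decide whether the validation dispatch stage should run.

    Result strings follow AzDO's stage result names ("Succeeded",
    "Skipped", "Failed", ...).
    """
    if skip_validate_published_build:
        return False  # operator opted out via the advanced parameter
    if prepare_artifacts_result != "Succeeded":
        return False  # the version value would not be populated
    # Skipped is allowed so operator-driven partial reruns still dispatch.
    return release_result in ("Succeeded", "Skipped")
```

Under this model a failed Release blocks the dispatch, while a skipped Release on a partial rerun does not.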

Both callers use the dispatch-github-workflow-steps template added in
the previous commit, plus a thin
dispatch-validate-published-build-job.yml that pins the workflow
filename and matrix shape.

Dispatched fire-and-forget because validate-published-build is
informational signal, not a release gate — blocking on CLI E2E
flakiness would punish releases for noise unrelated to the release
itself. The dispatched workflow opens a tracking issue on failure so
the signal isn't silently lost.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ssertion

Optional workflow_dispatch input flows into ASPIRE_E2E_EXPECTED_VERSION
so an operator triaging a channel outage can dispatch the smoke run
with the version the channel should currently be pointing at and have
`aspire --version` assert it post-install. Scheduled runs leave it
empty (current behavior).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds the new stage to the release-process overview, documents the two
new advanced parameters (SkipValidatePublishedBuild,
ValidatePublishedBuildWorkflowRef), and updates the Step 5 monitoring
narrative — GitHubTasks is no longer the final stage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radical radical force-pushed the radical/staging-cli-smoke-workflow branch from d4d8b89 to a9687ea Compare May 31, 2026 06:57
@github-actions

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

@radical radical added the area-engineering-systems infrastructure helix infra engineering repo stuff label May 31, 2026
…moke-workflow

# Conflicts:
#	docs/release-process.md
#	eng/pipelines/azure-pipelines.yml
#	eng/pipelines/release-publish-nuget.yml
@radical radical mentioned this pull request Jun 9, 2026
12 tasks

Labels

area-engineering-systems infrastructure helix infra engineering repo stuff


1 participant