
perf(ci): cut microsoft-aspire pipeline wall-clock from 121min to ~57min#17760

Merged
radical merged 7 commits into
microsoft:mainfrom
radical:radical/azdo-pipeline-perf
Jun 9, 2026


@radical radical commented May 31, 2026

Member

Cuts the internal AzDO microsoft-aspire pipeline wall-clock by roughly 50% — from ~121 min on main to ~57 min after this restructure. No content changes to what ships; only build-graph + agent choices.

Wall-clock measurements

| Build | Branch state | Wall-clock |
| --- | --- | --- |
| 2987287 | main baseline | 121 min |
| 2988127 | this restructure | 56.6 min |

Both builds report partiallySucceeded only because of the standard "Secure Supply Chain Analysis" warning; neither has any errors.

Stage graph

flowchart LR
    bsn["build_sign_native"]
    bex["build_extension"]
    bld["build"]
    asm["assemble<br/>sign npm tgz<br/>unified -publish<br/>BAR push"]
    tt["template_tests"]
    pi["prepare_installers<br/>winget / homebrew<br/>npm install validation"]
    bsn --> asm
    bex --> asm
    bld --> asm
    bld --> tt
    bsn --> pi

Stages with dependsOn: [] start in parallel. assemble waits for the three build stages and owns the unified -publish that emits the BAR AssetManifest covering managed nupkgs, native CLI archives, dashboard runtime zips, and the signed VS Code extension VSIX. prepare_installers reads its version inputs from a ComputeVars step on the build_sign_native linux-x64 job (not from assemble), so installer prep runs in parallel with asset-manifest emission.
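The dependency edges in the graph above can be checked mechanically. The following is a minimal sketch, not part of the PR: it transcribes the edges into a dictionary and groups the stages into idealized "waves" with Python's standard `graphlib`. The function name `start_waves` is hypothetical.

```python
from graphlib import TopologicalSorter

# Stage -> stages it depends on, transcribed from the stage graph above.
DEPENDS_ON = {
    "build_sign_native": [],
    "build_extension": [],
    "build": [],
    "assemble": ["build_sign_native", "build_extension", "build"],
    "template_tests": ["build"],
    "prepare_installers": ["build_sign_native"],
}


def start_waves(depends_on):
    """Group stages into waves; all stages in one wave are unblocked together."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = start_waves(DEPENDS_ON)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {', '.join(wave)}")
```

Waves are a simplification: a real scheduler starts `prepare_installers` as soon as `build_sign_native` finishes, without waiting for the other first-wave stages.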

What changed

Parallel build + assemble. build no longer dependsOn: build_sign_native; the managed compile/sign/pack doesn't need native CLI binaries until -publish. New assemble stage depends on build_sign_native + build + build_extension and runs the unified -publish. New managed_packages_shipping + managed_dashboard_artifacts pipeline-artifact handoffs.

Template tests stage. Aspire.Templates.Tests moved to a template_tests stage depending only on build, so assemble + prepare_installers no longer wait for it.

VS Code extension as its own stage (build_extension.yml). The extension's inputs are only extension/ + signVsix.proj — no managed nupkg or native CLI archive coupling — so it runs in parallel with the managed compile and native CLI builds and uploads aspire-vscode-extension for Assemble to download.

prepare_installers parallel with assemble. Previously read three version variables (aspireVersion / aspireArtifactVersion / installerChannel) from the assemble job, forcing serial execution. A new ComputeVars step on the build_sign_native linux-x64 job evaluates those variables directly from MSBuild and exposes them as job outputs, so installer prep depends only on build_sign_native.

Assemble on Windows + npm tgz signing. Assemble runs on Windows because MicroBuild Authenticode signing of aspire.js via the MicrosoftDotNet500 cert is rejected by Linux ESRP ("This file format cannot be signed because it is not recognized"). Per-RID native jobs can't sign the npm tgz files — PackDotnetTool (in eng/AfterSigning.targets) runs after Arcade's Sign target, so the tgz files only exist after Sign has completed — so a second -sign pass runs in Assemble. The npm tgz sign happens before downloading managed packages into Shipping, to keep SignToolTask from re-submitting Aspire.Hosting.Orchestration.<rid>.nupkg's nested manifest.cat to ESRP. eng/Publishing.props' inline Compress-Archive call switched from powershell (Windows PowerShell 5.1 only) to pwsh.

Signature validation extracted and tested (eng/scripts/validate-npm-package-signatures.ps1). The post-signing PGP marker check that was inlined in BuildAndTest.yml is now a script the Assemble stage calls, with 12 behavioral test executions covering missing sidecars, short sidecars, no-PGP-marker, ASCII-armored and binary OpenPGP packet tags (0x88..0x8B old format, 0xC2 new format per RFC 9580 §4.3/§5.2), and combined-failure accumulation. The script is gated by a new validateNpmPackageSignatures template parameter so callers that don't pack the npm tgz files into their Shipping directory (the parallelized build stage) can opt out.
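For readers who want the shape of the check without opening the PowerShell script, here is a rough Python sketch of the rules described above (minimum size, ASCII-armor header or binary packet tag, accumulate every failure). It is an illustration only: the exact armor string the real script matches, the failure wording, and the function names are assumptions.

```python
from pathlib import Path

# Assumption: the armored marker is the standard detached-signature header.
ARMOR_HEADER = b"-----BEGIN PGP SIGNATURE-----"
OLD_FORMAT_SIG_TAGS = range(0x88, 0x8C)  # 0x88..0x8B inclusive
NEW_FORMAT_SIG_TAG = 0xC2
MIN_SIDECAR_BYTES = 64


def sidecar_problem(data: bytes):
    """Return a description of what is wrong with a sidecar, or None if plausible."""
    if len(data) < MIN_SIDECAR_BYTES:
        return f"too short ({len(data)} bytes)"
    if data.startswith(ARMOR_HEADER):
        return None
    if data[0] in OLD_FORMAT_SIG_TAGS or data[0] == NEW_FORMAT_SIG_TAG:
        return None
    return "no PGP marker"


def validate_shipping_dir(shipping_dir):
    """Collect every missing or invalid sidecar instead of stopping at the first."""
    failures = []
    for tgz in sorted(Path(shipping_dir).glob("microsoft-aspire-cli*.tgz")):
        sig = tgz.with_name(tgz.name + ".sig")
        if not sig.is_file():
            failures.append(f"{tgz.name}: missing sidecar")
            continue
        problem = sidecar_problem(sig.read_bytes())
        if problem:
            failures.append(f"{sig.name}: {problem}")
    return failures
```

Returning a list rather than raising on the first problem mirrors the "combined-failure accumulation" behavior: a caller can print every entry and then exit non-zero once.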

Narrow restores. Both the native preStep restore and the assemble publish restore walked the full ProjectToBuild graph (~100-393 csprojs) just to bootstrap the SDK. Adding SkipManagedBuild / SkipTestProjects / SkipPlaygroundProjects / TargetRids on the preStep and -projects src/Aspire.Hosting/Aspire.Hosting.csproj on the publish collapses both. Tools.proj still restores unconditionally in InitializeCustomToolset. The official Windows Build had the same hole — SkipPlaygroundProjects=true was missing and was walking 173 playground csprojs through restore + build + pack.

Parallelize native-archive downloads (eng/scripts/download-native-archives.ps1). Two DownloadPipelineArtifact@2 tasks were serializing 7 artifacts AND downloading each twice (once per pattern slice). The new script lists artifacts via the AzDO REST API and fans out via Start-ThreadJob, fetching each Container artifact once over its downloadUrl. Covered by 6 tests in tests/Infrastructure.Tests/PowerShellScripts/DownloadNativeArchivesTests.cs, including a zip-slip guard. Bearer-token error paths surface HTTP status + exception message only — not the full ErrorRecord — to avoid leaking the request's Authorization header on some failure modes.
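The fan-out pattern itself is small. As a hedged illustration in Python (the real script uses `Start-ThreadJob` in PowerShell), the sketch below fetches each artifact exactly once in parallel and reports partial failures together. `fetch_all` and the `fetch_one` callback are hypothetical names; the actual HTTP call is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(artifact_names, fetch_one, max_workers=8):
    """Fetch every artifact once, in parallel; return (results, failures)."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fetch_one, name) for name in artifact_names}
    # Leaving the with-block waits for every submitted download to finish.
    for name, future in futures.items():
        error = future.exception()
        if error is None:
            results[name] = future.result()
        else:
            # Keep only the message, so one bad artifact does not hide the others.
            failures[name] = str(error)
    return results, failures
```

A caller would typically fail the step when `failures` is non-empty, after printing all of its entries.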

BuildAndTest.yml parameterized. New runPublish / runTemplateTests / buildExtension / validateNpmPackageSignatures parameters, all defaulting to false (or true where the safe default is to keep the inlined behavior). The official build stage leaves the new opts off (work is owned by sibling stages); azure-pipelines-unofficial.yml (monolithic, no sibling stages) sets runPublish: true + runTemplateTests: true so a manual /azp run of the unofficial pipeline still exercises -publish + Aspire.Templates.Tests inline.

Branch-gated variable groups in common-variables.yml. Publish-Build-Assets / DotNet-HelixApi-Access / SDL_Settings have branch-restricted ACLs in AzDO; consuming any at pipeline scope causes 1ES to inject a per-stage Branch control check that blocks every stage on contributor branches before any work can start. Only the BAR publish job and helix telemetry need those tokens, neither of which runs as a real publish on a contributor branch. The groups now only load on main / release/* / internal/release/*.
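The branch set is easy to express as a predicate. This is a sketch of the gate's logic in Python, not the YAML condition from `common-variables.yml`; the `refs/heads/` prefix handling and the function name are assumptions.

```python
from fnmatch import fnmatchcase

# The three branch shapes named above.
PRODUCTION_BRANCH_PATTERNS = ("main", "release/*", "internal/release/*")


def is_production_branch(source_branch: str) -> bool:
    """True when the restricted variable groups would be loaded."""
    name = source_branch.removeprefix("refs/heads/")
    return any(fnmatchcase(name, pattern) for pattern in PRODUCTION_BRANCH_PATTERNS)
```

Note that `fnmatchcase` lets `*` match `/`, so nested names such as `release/13.5/servicing` also count as production in this sketch.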

Sanity check. A post-publish step asserts the resulting AssetManifest has ≥50 items, so if Arcade's Publish target ever silently fires with an empty input set the build fails loudly instead of shipping nothing.
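A minimal sketch of such a floor check, in Python for illustration. It assumes the manifest is an XML document whose root has one `Package` or `Blob` child per asset; that layout, the function names, and the error wording are assumptions, and the PR's actual step may count items differently.

```python
import xml.etree.ElementTree as ET

MIN_ASSET_COUNT = 50  # threshold stated in the PR description


def count_manifest_items(manifest_xml: str) -> int:
    """Count asset entries directly under the manifest root (assumed layout)."""
    root = ET.fromstring(manifest_xml)
    return sum(1 for child in root if child.tag in ("Package", "Blob"))


def assert_manifest_not_empty(manifest_xml: str, minimum: int = MIN_ASSET_COUNT) -> int:
    """Fail loudly when the manifest has fewer items than expected."""
    count = count_manifest_items(manifest_xml)
    if count < minimum:
        raise RuntimeError(f"AssetManifest has {count} items, expected at least {minimum}")
    return count
```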

Considered but not done

Merging Publish Assets into Assemble would save another ~5-6 min — Publish Assets is 7.3m for ~1 min of real work, the rest is per-job 1ES PT overhead. But it's auto-injected by Arcade's eng/common/core-templates/job/publish-build-assets.yml, which is a job-level template; collapsing would require duplicating ~30 lines of wrapper YAML. Parked rather than replicating Arcade YAML in this PR.

Call-outs

  • The AzDO official pipeline this PR restructures does not run on PRs — only on scheduled triggers, manual /azp run aspire-tests, and pushes to main / release/*. PR validation cannot exercise this end-to-end; verify with a manual /azp run aspire-tests against this branch before merging.
  • Public PR pipeline (public-pipeline-template.yml) is unchanged; playgrounds + template tests still validated there.
  • SkipPlaygroundProjects=true is added to the managed Build's invocation. Playgrounds are validated by the GH Actions Playground test job (tests.yml → run-tests.yml → Aspire.Playground.Tests, which transitively builds the playgrounds it references via ProjectReference). The ~50 platform-specific playgrounds not referenced by Aspire.Playground.Tests (Azure/Foundry/GitHubModels/Go/Java/Python/TypeScript/etc.) used to compile in AzDO and now don't compile in CI at all; treating these as samples not gated by CI.
  • eng/Publishing.props now calls pwsh. Developers running Publish locally on Windows will need PowerShell 7+ installed.
  • Branch-gated variable groups (above): non-prod branches running this pipeline will no longer load Publish-Build-Assets / DotNet-HelixApi-Access / SDL_Settings. If a future stage requires them on a non-prod branch, the gate in common-variables.yml must be expanded.
  • Drive-by: 15 near-identical private FindRepoRoot() methods in Infrastructure.Tests consolidated into a single tests/Infrastructure.Tests/Shared/RepoRoot.cs helper.

@github-actions

github-actions Bot commented May 31, 2026

Contributor

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17760

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17760"

@radical radical added the area-engineering-systems infrastructure helix infra engineering repo stuff label May 31, 2026
@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

radical and others added 4 commits June 6, 2026 02:29
…nloader

Add two PowerShell helpers and the behavioral xUnit coverage that exercises
them. Used by the parallel-pipeline restructure that follows.

eng/scripts/download-native-archives.ps1 — fetches per-RID build artifacts
in parallel via Start-ThreadJob, opens each downloaded zip in-memory, and
writes only the entries matching the aspire-cli archive / Aspire.Cli nupkg
/ microsoft-aspire-cli npm tgz shapes to the configured staging dirs.
Replaces the dual DownloadPipelineArtifact@2 pattern used by the previous
assemble flow, which serialized each artifact's bytes once per
itemPattern slice (~100s wasted on the Assemble critical path for 7
native_archives_<rid> artifacts).

  Handles both ThreadJob module names (PS 7.0-7.3 bare `ThreadJob` and PS
  7.4+ `Microsoft.PowerShell.ThreadJob`) with a PSGallery install fallback,
  so the script works across the mixed-vintage Windows/Linux/macOS images
  AzDO can route to. Includes a zip-slip guard — entries whose
  canonicalized destination escapes <target>/<artifactName>/ are rejected
  before extraction.
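A zip-slip guard of this kind can be sketched as follows. This is a Python illustration of the technique, not a port of the script: it takes the archive as bytes for brevity, extracts every entry (the real script filters by file shape), and the function name is hypothetical.

```python
import io
import os
import zipfile


def safe_extract(zip_bytes: bytes, target_dir: str, artifact_name: str):
    """Extract an archive under <target>/<artifactName>/, rejecting escaping entries."""
    root = os.path.realpath(os.path.join(target_dir, artifact_name))
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        entries = archive.infolist()
        # Validate every entry before writing anything to disk.
        for info in entries:
            dest = os.path.realpath(os.path.join(root, info.filename))
            if os.path.commonpath([root, dest]) != root:
                raise ValueError(f"zip-slip entry rejected: {info.filename}")
        for info in entries:
            if info.is_dir():
                continue
            dest = os.path.realpath(os.path.join(root, info.filename))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with archive.open(info) as src, open(dest, "wb") as out:
                out.write(src.read())
            written.append(dest)
    return written
```

Checking all entries first means a malicious archive leaves no partial output behind.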

eng/scripts/validate-npm-package-signatures.ps1 — post-signing validation
that catches the most likely silent failure mode in Arcade/ESRP signing:
the sidecar gets emitted (so a file-existence check passes) but the
content is empty or garbage. Confirms each microsoft-aspire-cli*.tgz.sig
is at least 64 bytes and starts with the ASCII-armored PGP header
(RFC 9580 §6) or an OpenPGP binary signature packet tag (old-format
0x88..0x8B or new-format 0xC2, RFC 9580 §4.3 / §5.2). Accumulates both
missing-sidecar and invalid-sidecar failures before exiting so operators
diagnosing a real signing outage see every problem in one CI run instead
of fixing one and rediscovering the next.

Behavioral coverage:

tests/Infrastructure.Tests/PowerShellScripts/DownloadNativeArchivesTests.cs
  drives the downloader against an in-process HttpListener that mimics
  AzDO's /_apis/build/builds/{id}/artifacts endpoint and per-artifact
  downloadUrl. Six tests: missing-token failure, no-matching-artifacts
  failure (with available-artifact diagnostic), wrong-type rejection,
  happy-path extraction + layout preservation, partial-failure reporting,
  and zip-slip rejection.

tests/Infrastructure.Tests/PowerShellScripts/ValidateNpmPackageSignaturesTests.cs
  drives the validator against mocked sidecars in a temp Shipping
  directory. Twelve tests covering missing-dir, no-tarballs-found,
  missing sidecar, short sidecar, no-PGP-marker, ASCII-armored sidecar,
  binary OpenPGP packet tags (theory: 0x88, 0x89, 0x8A, 0x8B, 0xC2), and
  the combined "report all failures in one pass" case.

ReleasePublishNugetPipelineTests.cs change keeps the existing
NpmSignatureSidecarsAreContentSanityChecked test pointed at
release-publish-nuget.yml, which still inlines the same PGP byte-tag check
verbatim (the script extraction here covers only BuildAndTest.yml's copy
of the logic, not the release pipeline's). When release-publish-nuget.yml
is refactored to call the shared script, that test can move to
ValidateNpmPackageSignaturesTests too.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ages

Template-layer primitives consumed by the pipeline restructure that
follows. Additive on its own — no existing caller breaks.

eng/pipelines/templates/build_extension.yml (NEW) — standalone job that
builds, signs (MicroBuild + VSCodePublisher cert), verifies (vsce
verify-signature on the .signature.p7s sidecar), and publishes the VS
Code extension VSIX as `aspire-vscode-extension`. Extracted from the
managed Windows build job so it can run in parallel with managed compile
and native CLI builds; the extension's inputs are only extension/ +
signVsix.proj, so it has no real coupling to the other halves of the
build.

eng/pipelines/templates/build_sign_native.yml — adds a `ComputeVars` step
gated by a new `computeVarsRid` parameter. When set, the matching per-RID
native job evaluates Aspire.Hosting.AppHost PackageVersion +
Aspire.Dashboard PackageVersion (with SuppressFinalPackageVersion=true) +
DotNetFinalVersionKind / StabilizePackageVersion and exposes the three
installer-pipeline variables (aspireVersion / aspireArtifactVersion /
installerChannel) as job outputs. Lets the downstream prepare_installers
stage depend on the shorter build_sign_native stage rather than waiting
for assemble. Includes the safety gate that refuses to publish prerelease
installer artifacts under a stable release-branch name, $LASTEXITCODE
checks on every msbuild invocation (so an evaluation failure surfaces
instead of falling through to channel='prerelease'), and an $officialArgs
splat that adds zero positional args on PR / test-signed builds where
_OfficialBuildIdArgs is empty.

eng/pipelines/templates/npm-cli-install-validation-steps.yml — refactored
to pull microsoft-aspire-cli*.tgz directly from the per-RID
native_archives_<rid> artifact (which already contains it) via a new
required `nativeArchiveArtifactName` parameter, instead of from the
downstream BlobArtifacts. Lets the prepare_installers stage start as soon
as the producing native job completes. Required (no default) because AzDO
disallows template expressions in parameter defaults — callers with a
literal rid use `native_archives_${{ replace(rid, '-', '_') }}`; the
macOS caller resolves the arch in a preStep and threads the name in as a
runtime variable.
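The naming rule is simple enough to state in a few lines. The Python sketch below mirrors the `replace(rid, '-', '_')` expression and the macOS arch resolution described above; the function names and the accepted architecture strings (`ARM64`, `X64`) are assumptions for illustration.

```python
def native_archive_artifact_name(rid: str) -> str:
    """Mirror native_archives_${{ replace(rid, '-', '_') }} for a literal rid."""
    return "native_archives_" + rid.replace("-", "_")


def macos_rid(os_architecture: str) -> str:
    """Map an agent architecture string to the macOS rid (assumed value set)."""
    arch = os_architecture.strip().lower()
    if arch == "arm64":
        return "osx-arm64"
    if arch == "x64":
        return "osx-x64"
    raise ValueError(f"unsupported macOS architecture: {os_architecture}")
```

For example, `native_archive_artifact_name(macos_rid("ARM64"))` yields `native_archives_osx_arm64`, matching the `native_archives_osx_<arch>` shape.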

eng/pipelines/common-variables.yml — gates the Publish-Build-Assets /
DotNet-HelixApi-Access / SDL_Settings variable groups behind a branch
check (main / release/* / internal/release/* only). These groups have
branch-restricted ACLs in AzDO; consuming any at pipeline scope causes
1ES to inject a per-stage Branch control check that blocks every stage
on contributor branches before any work can start. Only the BAR publish
job and helix telemetry actually need those tokens, and neither runs as
a real publish on a contributor branch. Adds `_IsProductionBranch` as a
single source of truth for the same branch set so downstream stages can
gate consistently. Adds NPM_VALIDATION_SUMMARY_* artifact names and
COREPACK_ENABLE_DOWNLOAD_PROMPT=0 to keep corepack from hanging on
stdin in CI.

eng/Publishing.props — replace `powershell -NoProfile ...` with
`pwsh -NoProfile ...` for the per-VSIX zip wrapper invocation. The
restructured pipeline runs Assemble on agents where Windows PowerShell
5.1 is not installed (azurelinux containers, pool migrations); using pwsh
matches every other PowerShell invocation in the build.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… assemble / installers

Restructure the AzDO official pipeline (`azure-pipelines.yml`) and the
shared `BuildAndTest.yml` template so the managed build, the native CLI
build, the VS Code extension build, and Aspire.Templates.Tests run in
parallel instead of serialized inside one Windows job. Cuts the official
build wall-clock by ~30 minutes on a typical run.

Stage graph after this change:

    build_sign_native ─┐
    build_extension ───┼──► assemble (signs + unified -publish + BAR push)
    build ─────────────┤
                       └──► template_tests
    build_sign_native ────► prepare_installers (winget / brew / npm)

Stages with `dependsOn: []` start in parallel. `assemble` waits for the
three build stages and owns the unified -publish that emits the BAR
AssetManifest covering managed nupkgs, native CLI archives, dashboard
runtime zips, and the signed VSIX. `prepare_installers` reads its three
version inputs from the build_sign_native (linux-x64) ComputeVars step,
not from `assemble`, so installer prep runs in parallel with
asset-manifest emission.

assemble runs on Windows (required for MicroBuild Authenticode signing
of aspire.js via the MicrosoftDotNet500 cert — Linux ESRP rejects that
file format). It signs the npm tgz files before downloading the managed
packages into Shipping/ — Arcade SignTool walks every container under
ArtifactsShippingPackagesDir at sign time, and the default
ItemsToSign=`**\*.nupkg` would otherwise re-submit
Aspire.Hosting.Orchestration.<rid>.nupkg's nested manifest.cat to ESRP.

assemble also includes an AssetManifest sanity check that fails the
build if the unified -publish emits fewer than 50 items, guarding
against a future change to the narrowed `-restore -projects <small>`
publish-call scope that silently produces an empty manifest and ships
nothing to BAR.

BuildAndTest.yml is now parameterized so the same template serves both
the parallel official pipeline (where assemble owns -publish, the
template_tests stage owns templates, build_extension owns the VSIX, and
the Assemble stage owns npm sig validation) and the monolithic
unofficial pipeline (where everything still runs inline in the Windows
job). New parameters: runPublish, runTemplateTests, buildExtension,
validateNpmPackageSignatures — all default to false and the unofficial
caller opts back in to preserve its pre-restructure behavior on manual
/azp run validations.

azure-pipelines-unofficial.yml wires the new parameters (runPublish:
true, runTemplateTests: true) so the unofficial pipeline keeps running
Publishing.props validations and Aspire.Templates.Tests inline.

Other notable changes inside this commit:

- BAR publish (`enablePublishBuildAssets` / `publishAssetsImmediately`)
  is gated to main / release/* / internal/release/* in assemble. On
  contributor branches the auto-injected Asset_Registry_Publish job
  would otherwise fail at runtime for a missing MaestroAccessToken
  (the Publish-Build-Assets variable group has a branch ACL).
- prepare_installers' macOS npm-install validation resolves osx-arm64
  vs osx-x64 from Agent.OSArchitecture in a preStep and threads the
  computed `native_archives_osx_<arch>` artifact name in as a runtime
  variable, since AzDO ${{ }} template substitution can't see runtime
  values and would otherwise resolve to the literal
  `native_archives_$(NpmValidationRid)`.
- The managed Windows Build job uploads `managed_packages_shipping`
  and `managed_dashboard_artifacts` pipeline artifacts so the Assemble
  and template_tests stages can stage them back into the canonical
  Shipping / DashboardArtifacts paths their downstream tooling expects.
- Native CLI archive + per-RID nupkg downloads moved out of the
  Windows Build job — they only matter at -publish time, which now
  lives in Assemble.
- /p:SkipPlaygroundProjects=true added to the managed Build's
  invocation. Playgrounds are validated by the GH Actions Playground
  test job (`tests.yml` → `run-tests.yml` → `Aspire.Playground.Tests`,
  which transitively builds the playgrounds it references); the
  AzDO build's compile of every playground was redundant with that.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace 15 near-identical private FindRepoRoot() methods (and 20 call
sites) across Infrastructure.Tests with a single Shared/RepoRoot.cs
helper exposing RepoRoot.Path. All copies walked the test assembly's
base directory looking for Aspire.slnx; they only differed in variable
name, exception type, and error wording. AGENTS.md asks tests to reuse
shared helpers in tests/Shared or per-project Helpers/ rather than
recreating the same pattern per file.
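The shared pattern is a walk up the directory tree until a marker file appears. A minimal Python sketch of that idea follows (the actual helper is C# in Shared/RepoRoot.cs); the function name and error text are hypothetical.

```python
from pathlib import Path


def find_repo_root(start, marker: str = "Aspire.slnx") -> Path:
    """Walk from start toward the filesystem root until marker is found."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"could not find {marker} above {current}")
```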

Two fixture classes (DownloadFailingJobLogsFixture,
CreateFailingTestIssueFixture) exposed a public `RepoRoot` property
that cached the result. The grep for external consumers turned up none
— the property was only used internally to compute neighbouring path
properties. Drop both, inline RepoRoot.Path at the use sites.

CreateFailingTestIssueWorkflowTests' previous copy walked up looking
for a .git directory or file rather than Aspire.slnx. Both markers
coincide at the Aspire repo root, including in worktrees (`.git` as a
file), so the consolidation is behaviour-equivalent for any consumer
running tests from within an Aspire checkout.

Net: -257 / +20 LOC.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Gate validate-npm-package-signatures.ps1 in BuildAndTest.yml on the
   new validateNpmPackageSignatures template parameter that the
   parallelized `build` stage was already passing as false. Previously
   the parameter was declared but never referenced, so on internal
   main/release (_SignType=real) the step ran against a Shipping
   directory with no microsoft-aspire-cli*.tgz and exited 1. PR builds
   don't trip it because they use _SignType=test, so the bug was
   latent until the first real-signed run.

2. Remove the in-stage compute version / artifact-version / channel
   steps from the assemble job. prepare_installers reads those vars
   from build_sign_native.BuildNative_linux_x64.outputs (the
   ComputeVars step). Leaving the assemble copies in place was dead
   code and a duplicate `addbuildtag release-version - X` emission
   that could diverge from build_sign_native's if MSBuild evaluation
   ever differed between the two locations.

3. Avoid splatting $_ in the Invoke-RestMethod / Invoke-WebRequest
   catch blocks in download-native-archives.ps1. The ErrorRecord's
   string representation can include the request's Authorization
   header on some failure modes; AzDO scrubs $(System.AccessToken)
   but the script's -AccessToken parameter is also designed for
   non-AzDO use where nothing scrubs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radical radical force-pushed the radical/azdo-pipeline-perf branch from b6d8f9e to 6a0f073 Compare June 6, 2026 16:52
@radical radical marked this pull request as ready for review June 6, 2026 17:03
@radical radical requested a review from adamint June 6, 2026 17:04

Copilot AI left a comment

Contributor


Pull request overview

This PR restructures the internal AzDO microsoft-aspire pipeline to reduce wall-clock time by increasing parallelism (separating managed build, native build/sign, VS Code extension build, assemble/publish, and template tests), while keeping shipping outputs unchanged. It also extracts/extends supporting infra scripts and tests to validate signing behavior and speed up artifact staging.

Changes:

  • Refactors azure-pipelines.yml into parallel stages (build_sign_native, build_extension, build, assemble, template_tests, prepare_installers) and adds artifact handoffs for managed shipping packages and dashboard artifacts.
  • Extracts post-sign npm .tgz.sig validation into eng/scripts/validate-npm-package-signatures.ps1 and adds behavioral tests for it.
  • Replaces serial DownloadPipelineArtifact@2 native-archive downloads with eng/scripts/download-native-archives.ps1 (parallel + zip-slip guard) and adds tests; also consolidates repo-root discovery via tests/Infrastructure.Tests/Shared/RepoRoot.cs.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/Infrastructure.Tests/WorkflowScripts/CreateFailingTestIssueWorkflowTests.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/WorkflowScripts/AutoRerunTransientCiFailuresTests.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/Shared/RepoRoot.cs | Adds a centralized repo-root locator for Infrastructure.Tests. |
| tests/Infrastructure.Tests/PowerShellScripts/ValidateNpmPackageSignaturesTests.cs | Adds behavioral tests for npm signature-sidecar validation script. |
| tests/Infrastructure.Tests/PowerShellScripts/DownloadNativeArchivesTests.cs | Adds behavioral tests for parallel native-archive downloader (incl. zip-slip guard). |
| tests/Infrastructure.Tests/PowerShellScripts/StageNativeCliToolPackagesTests.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/PowerShellScripts/SplitTestProjectsTests.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/PowerShellScripts/SplitTestMatrixByDepsTests.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/PowerShellScripts/ExpandTestMatrixGitHubTests.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/PowerShellScripts/BuildTestMatrixTests.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/Pipelines/ReleasePublishNugetPipelineTests.cs | Updates test expectations to align with extracted signature validation logic. |
| tests/Infrastructure.Tests/Pipelines/NpmCliPackageTests.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/GenerateTestSummary/GenerateTestSummaryFixture.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/ExtractTestPartitions/ExtractTestPartitionsFixture.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/DownloadFailingJobLogs/DownloadFailingJobLogsFixture.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| tests/Infrastructure.Tests/CreateFailingTestIssue/CreateFailingTestIssueFixture.cs | Switches repo-root discovery to shared RepoRoot.Path. |
| eng/scripts/validate-npm-package-signatures.ps1 | New script to validate presence/plausibility of detached PGP .sig sidecars for npm tarballs. |
| eng/scripts/download-native-archives.ps1 | New script to enumerate/download/extract native artifacts in parallel (with zip-slip defense). |
| eng/Publishing.props | Switches VSIX zipping to pwsh (PowerShell 7). |
| eng/pipelines/templates/npm-cli-install-validation-steps.yml | Downloads npm tgz from per-RID native_archives_* instead of assemble outputs. |
| eng/pipelines/templates/BuildAndTest.yml | Adds parameters to optionally run publish/template tests/extension build and gates signature validation. |
| eng/pipelines/templates/build_sign_native.yml | Narrows restores and adds optional ComputeVars job-output producer for installer pipeline variables. |
| eng/pipelines/templates/build_extension.yml | New template: builds/signs/verifies/publishes VS Code extension in its own parallel stage. |
| eng/pipelines/common-variables.yml | Branch-gates restricted variable groups to avoid branch-control blocking on contributor branches. |
| eng/pipelines/azure-pipelines.yml | Primary pipeline restructure to parallel stages + new assemble/publish stage ownership. |
| eng/pipelines/azure-pipelines-unofficial.yml | Keeps monolithic behavior by enabling inline publish + template tests via new parameters. |

Comment thread eng/scripts/download-native-archives.ps1
Comment thread eng/scripts/download-native-archives.ps1 Outdated
Comment thread eng/scripts/validate-npm-package-signatures.ps1 Outdated
radical and others added 2 commits June 7, 2026 18:05
Three small fixes surfaced by Copilot's PR review:

* download-native-archives.ps1: replace the runtime
  `Install-Module -Repository PSGallery` fallback for the ThreadJob
  module with a hard `throw` listing the two accepted module names and
  the active `$PSVersionTable.PSVersion`. Both
  `Microsoft.PowerShell.ThreadJob` (PS 7.4+) and `ThreadJob` (PS
  7.0-7.3) ship in-box with PowerShell 7 on every AzDO image we use,
  so missing both signals a broken / locked-down image rather than
  something a runtime install would fix — and on 1ES agents without
  internet egress the install itself would fail with a more confusing
  error.

* download-native-archives.ps1: synopsis previously claimed the script
  "opens the zip in-memory", but the implementation downloads to a
  temp file and opens it via `ZipFile::OpenRead` from disk. Updated
  the wording to match what actually happens so future perf/memory
  investigators aren't misled.

* validate-npm-package-signatures.ps1: tighten the `ShippingDir`
  guard to `Test-Path -LiteralPath -PathType Container`. Pointing
  `-ShippingDir` at a file used to slip past `Test-Path` and fail
  later inside `Get-ChildItem` with a less actionable message; now it
  fails up front with one consistent "not found (or not a directory)"
  error.

Added a `FailsWhenShippingDirectoryIsAFile` test covering the new
non-container rejection path so a future regression of the Test-Path
guard would go red.
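The difference between "path exists" and "path is a directory" is the whole fix. A small Python analogue of the tightened guard, for illustration only (the real check is `Test-Path -LiteralPath -PathType Container` in PowerShell; the function name here is hypothetical):

```python
import os


def require_directory(shipping_dir: str) -> str:
    """Fail up front when the path is missing or is a file rather than a directory."""
    if not os.path.isdir(shipping_dir):
        raise NotADirectoryError(
            f"ShippingDir not found (or not a directory): {shipping_dir}"
        )
    return shipping_dir
```

An existence-only check such as `os.path.exists` would accept a file path here, which is the analogue of the hole the commit closes.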

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radical radical mentioned this pull request Jun 9, 2026
12 tasks

@davidfowl davidfowl left a comment

Contributor


Don't break the build

@radical radical merged commit eecbdc2 into microsoft:main Jun 9, 2026
622 of 625 checks passed
@radical radical deleted the radical/azdo-pipeline-perf branch June 9, 2026 05:16
@microsoft-github-policy-service microsoft-github-policy-service Bot added this to the 13.5 milestone Jun 9, 2026
@radical

radical commented Jun 9, 2026

Member Author

PR Testing Report — #17760

Title: perf(ci): cut microsoft-aspire pipeline wall-clock from 121min to ~57min
Head commit: 7fb2856bdd48b744b90ad59e07a87a552462dcfc
Tested: 2026-06-08 (local) + AzDO build 2995557 / GitHub run 27182124477
Change type: GitHub/CI infra-only (AzDO pipeline restructure + 2 new eng/scripts helpers). CLI dogfood / template scenarios intentionally skipped per the github-infra-testing playbook.

Changes analyzed

| Category | Files | Validation |
| --- | --- | --- |
| AzDO pipelines (don't run on GitHub PRs) | azure-pipelines.yml, azure-pipelines-unofficial.yml, common-variables.yml, BuildAndTest.yml, build_extension.yml, build_sign_native.yml, npm-cli-install-validation-steps.yml, eng/Publishing.props | YAML parse + static + real AzDO run |
| New PowerShell helpers (unit-tested) | download-native-archives.ps1, validate-npm-package-signatures.ps1 | DownloadNativeArchivesTests, ValidateNpmPackageSignaturesTests + parser lint |
| Pipeline contract tests | ReleasePublishNugetPipelineTests.cs, NpmCliPackageTests.cs | ran the classes |
| Test-infra refactor | new Shared/RepoRoot.cs + ~10 fixtures/test classes losing private repo-root helpers | compiled + run by PR CI |

Trigger analysis

eng/pipelines/** is a CI-skip glob in eng/testing/github-ci-trigger-patterns.txt, but this PR also touches eng/scripts/*.ps1, eng/Publishing.props, and tests/** — none skippable — so ci.yml runs the full build/test on the PR (confirmed: PR run 27182124477 includes the Tests / … jobs). The new script test classes are therefore covered by PR CI, not just locally.
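
The rule applied above is "skip CI only if every changed file matches a skip glob". A minimal Python sketch of that rule follows, assuming simple shell-style matching where `*` also crosses `/`; the repository's real matcher may behave differently, and `ci_can_be_skipped` plus the full file paths are illustrative.

```python
from fnmatch import fnmatchcase


def ci_can_be_skipped(changed_files, skip_globs):
    """True only when every changed file matches at least one skip glob.

    Hypothetical helper: a single non-matching file forces the full run.
    """
    return all(
        any(fnmatchcase(path, glob) for glob in skip_globs)
        for path in changed_files
    )


skip_globs = ["eng/pipelines/**"]

# A change confined to pipeline YAML would be skippable...
pipeline_only = ["eng/pipelines/azure-pipelines.yml"]
# ...but adding a script or props file makes the whole change non-skippable.
mixed = pipeline_only + [
    "eng/scripts/download-native-archives.ps1",
    "eng/Publishing.props",
]

print(ci_can_be_skipped(pipeline_only, skip_globs))
print(ci_can_be_skipped(mixed, skip_globs))
```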

Test results

Local (macOS, pwsh 7.5.4, dotnet local SDK)

| Check | Result |
| --- | --- |
| DownloadNativeArchivesTests + ValidateNpmPackageSignaturesTests + ReleasePublishNugetPipelineTests + NpmCliPackageTests | 51 passed / 0 failed (5.5s) |
| pwsh [Parser]::ParseFile on both new .ps1 | OK |
| YAML parse (7 changed pipeline files) | all OK |
| git diff --check | clean |

GitHub PR CI — run 27182124477

311 passed, 2 skipped, 0 failed. Includes the full Infrastructure.Tests suite.

AzDO — build 2995557 (20260608.4) at 7fb2856bd

Headline partiallySucceeded, but the timeline has zero failed records → green for practical purposes (the succeededWithIssues marks are the standard SDL / Secure-Supply-Chain warning, zero real errors).

Wall-clock: 55.8 min (03:38:23 → 04:34:11 UTC) — confirms the PR's ~57min target vs the 121min main baseline.
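
The wall-clock figure and the resulting speedup can be re-derived from the two timestamps above. A small Python check, where `minutes_between` is a hypothetical helper and the 121 min baseline is taken from the PR description:

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


BASELINE_MIN = 121  # main baseline quoted in the PR description

elapsed = minutes_between("03:38:23", "04:34:11")
speedup = BASELINE_MIN / elapsed

# prints "55.8 min, 2.17x faster than the 121 min baseline"
print(f"{elapsed:.1f} min, {speedup:.2f}x faster than the {BASELINE_MIN} min baseline")
```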

Stage timing confirms the intended parallel graph:

| Stage | Window (UTC) | Note |
| --- | --- | --- |
| Build+Sign native packages | 03:38–04:08 | from start |
| Build | 03:47–04:11 | parallel with native (no longer dependsOn it) |
| Build VS Code Extension | 03:43–03:58 | parallel, own stage |
| Prepare Installers | 04:09–04:23 | starts when native finishes; depends on build_sign_native, not assemble |
| Assemble + Publish | 04:20–04:34 | waits for the 3 build stages |
| Template Tests | 04:20–04:33 | parallel with Assemble |

Prepare Installers and Template Tests both overlap Assemble — exactly the parallelization the PR describes.
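
The overlap and ordering claims can be checked mechanically against the stage windows reported above. A short Python sketch using minute-resolution times from the report; the helper names are illustrative.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an HH:MM string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def overlaps(a, b) -> bool:
    """True when two (start, end) windows share a non-empty span of time."""
    return max(a[0], b[0]) < min(a[1], b[1])


# Stage windows (UTC) as reported for AzDO build 2995557.
windows = {
    name: (to_minutes(start), to_minutes(end))
    for name, (start, end) in {
        "Build+Sign native packages": ("03:38", "04:08"),
        "Build": ("03:47", "04:11"),
        "Build VS Code Extension": ("03:43", "03:58"),
        "Prepare Installers": ("04:09", "04:23"),
        "Assemble + Publish": ("04:20", "04:34"),
        "Template Tests": ("04:20", "04:33"),
    }.items()
}

assemble = windows["Assemble + Publish"]
build_stages = ["Build+Sign native packages", "Build", "Build VS Code Extension"]

# Both downstream stages run concurrently with Assemble.
installers_overlap = overlaps(windows["Prepare Installers"], assemble)
templates_overlap = overlaps(windows["Template Tests"], assemble)

# Assemble starts only after all three build stages have finished.
assemble_after_builds = all(windows[s][1] <= assemble[0] for s in build_stages)

# Prepare Installers starts after native finishes and before Assemble begins.
native_end = windows["Build+Sign native packages"][1]
installers_start = windows["Prepare Installers"][0]
installers_gated_on_native = native_end <= installers_start < assemble[0]

print(installers_overlap, templates_overlap,
      assemble_after_builds, installers_gated_on_native)
```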

Failure-mode scan (github-infra-testing playbook)

| Gotcha checked | Outcome |
| --- | --- |
| Token leak on bearer-auth error paths (#2 Auth) | ✅ Safe: error messages built from `$_.Exception.Message` + status code only, never the full ErrorRecord (download-native-archives.ps1:142–151, 204–209) |
| Zip-slip on archive extraction | ✅ Safe: separator-terminated root prefix avoids /foo vs /foobar collision (:246–250); dedicated unit test |
| PowerShell portability (#5) | ✅ Safe: Start-ThreadJob module resolved by trying both Microsoft.PowerShell.ThreadJob (PS 7.4+) and ThreadJob (PS 7.0–7.3); Publishing.props `powershell` → `pwsh` switch is consistent with assemble running on Windows pwsh 7.x |
| Signature validation accumulates both failure categories | ✅ Safe: uses `##[error]` (not terminating Write-Error) so missing + invalid sidecars both surface in one run, then exit 1; >=64-byte guard precedes the `[0..63]` slice |
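
The zip-slip row depends on comparing against a root prefix that ends in a path separator. A Python sketch of that technique follows; it is an analogue for illustration, not the code in download-native-archives.ps1, and `is_inside` is a hypothetical name.

```python
import os


def is_inside(root: str, entry_name: str) -> bool:
    """True when an archive entry would be extracted strictly under `root`."""
    root_abs = os.path.abspath(root)
    # Terminate the root with a separator so that "/foo" cannot
    # prefix-match a sibling directory such as "/foobar".
    root_prefix = root_abs if root_abs.endswith(os.sep) else root_abs + os.sep
    # abspath collapses ".." segments, exposing where the entry really lands.
    target = os.path.abspath(os.path.join(root_abs, entry_name))
    return target.startswith(root_prefix)


root = os.path.join(os.sep, "foo")

print(is_inside(root, os.path.join("bin", "aspire")))              # stays under root
print(is_inside(root, os.path.join("..", "foobar", "evil.txt")))   # sibling dir
```

Without the trailing separator, the second entry resolves to a path that still starts with the root string and would be wrongly accepted.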

Coverage-loss audit (Step I-1b)

The restructure relocates work into parallel stages — confirmed nothing stopped gating:

  • npm signature validation still runs — relocated to the assemble stage (azure-pipelines.yml:589); the build-stage validateNpmPackageSignatures: false is de-duplication (build no longer packs the tgz), not a gap.
  • Template Tests still run — now their own parallel stage (azure-pipelines.yml:373); official pipeline leaves the inline runTemplateTests off to avoid duplication, unofficial pipeline keeps it inline.
  • VS Code extension still built/signed — own build_extension stage.
  • download-native-archives.ps1 wired into assemble (azure-pipelines.yml:530).

Summary

| Surface | Result |
| --- | --- |
| Local unit tests | ✅ 51/51 |
| GitHub PR CI | ✅ 311 pass / 2 skip / 0 fail |
| AzDO end-to-end | ✅ green (0 failed records), 55.8 min |
| Failure-mode scan | ✅ all clear |
| Coverage-loss audit | ✅ no gating lost |

Overall result: ✅ PR VERIFIED

The restructure delivers the claimed ~2x wall-clock improvement (55.8 min vs 121 min) with no failed stages on a real AzDO run, full GitHub CI green, and no loss of coverage. The two new PowerShell helpers are correct, defensively hardened (token-leak + zip-slip), portable across the PS 7.x versions the agents run, and unit-tested.

Labels

area-engineering-systems infrastructure helix infra engineering repo stuff
