Add Aspire CLI npm package release integration #17297

Open
adamint wants to merge 34 commits into microsoft:main from adamint:dev/adamint/npm-cli-package-followups

adamint commented May 20, 2026

Description

This completes the production path for distributing the Aspire CLI through npm while keeping publishing under the existing Microsoft release process.

The PR now:

  • Defines the npm package layout: @microsoft/aspire-cli as the pointer package plus RID-specific packages such as @microsoft/aspire-cli-linux-x64, @microsoft/aspire-cli-win-x64, and @microsoft/aspire-cli-osx-arm64.
  • Packages npm tarballs from the same signed native CLI archive payloads used by the existing native CLI artifacts, and verifies RID package binaries byte-for-byte against the native archive before upload.
  • Adds npm install/update detection so aspire update can point npm-installed users at npm update -g @microsoft/aspire-cli.
  • Requires npm tarballs in the native CLI CI staging path and publishes them as flat shipping blob artifacts for release consumption.
  • Moves npm publishing into the Azure DevOps release pipeline using MicroBuild/ESRP: RID packages are submitted first, the pipeline waits for registry propagation, then the pointer package is submitted.
  • Documents package layout, version/update behavior, publishing flow, recovery flags, and product/security tradeoffs in docs/specs/npm-cli-package.md and docs/release-process.md.

Security considerations

The .tgz container is not separately signed. Integrity relies on the signed native CLI payloads where platform signing applies, CI verification that npm package binaries match the signed native archives, SBOM-covered release artifacts, ESRP/MicroBuild submission controls, and npm registry integrity. Publishing does not use repository-scoped npm tokens or GitHub Actions Trusted Publisher; release operators must provide ESRP owners/approvers through the release pipeline parameters.

Fixes #17045

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
      • If yes, did you have an API Review for it?
        • Yes
        • No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • Yes
        • No
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
      • If yes, have you done a threat model and had a security review?
        • Yes
        • No
    • No
  • Does the change require an update in our Aspire docs?

Validation:

  • ./restore.sh
  • YAML parse of eng/pipelines/release-publish-nuget.yml, eng/pipelines/azure-pipelines.yml, and eng/pipelines/azure-pipelines-unofficial.yml
  • XML parse of eng/Publishing.props
  • dotnet test --project tests/Infrastructure.Tests/Infrastructure.Tests.csproj --no-launch-profile -- --filter-class "*.StageNativeCliToolPackagesTests" --filter-not-trait "quarantined=true" --filter-not-trait "outerloop=true" (12 passed)
  • git diff --check

davidfowl and others added 7 commits May 7, 2026 20:28
Create pointer and RID-specific npm packages from the native CLI archives and wire npm packaging, verification, and staging into the existing native CLI package build.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a design spec for the npm package POC and targeted comments explaining the launcher cache, generated package map, npm metadata, and package verification assumptions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document the npm and native package examples that shaped the Aspire CLI npm package POC, including optional platform packages, libc-specific packages, and writable-cache guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Create .github/workflows/publish-npm.yml for manual npm package publishing
- Support workflow_dispatch with inputs for version, run_id, dist_tag, pr_number, only_rid, skip_meta, and dry_run
- Enable npm provenance via id-token: write permission for Trusted Publisher OIDC
- Require admin/maintain permission for non-dry-run publishes
- Download artifacts from specified GitHub Actions run_id
- Publish RID packages before meta package with fail-fast: false matrix
- Wait for RID packages to propagate on npm before publishing meta package
- Support fallback to NPM_TOKEN secret until Trusted Publisher OIDC is configured
- Include recovery options via only_rid and skip_meta inputs
- Generate workflow summary with job status and next steps
- Update docs/specs/npm-cli-package.md with publishing section and prerequisites

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion

The MISSING_PACKAGES counter was incremented inside a pipeline subshell
and never propagated to the parent shell, causing the verification check
to always see 0 and never fail on missing tarballs.

Changed from pipeline (echo | jq | while) to process substitution
(while < <(echo | jq)) so the while loop runs in the main shell and
variable updates are visible to the later check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When dry_run=true, wait-for-packages is skipped, which previously
allowed publish-meta-package to start without waiting for
publish-rid-packages to complete. This violated the spec requirement
that RID packages must be published/validated before the meta package.

Changes:
- Add publish-rid-packages to publish-meta-package job needs
- Update if condition to require publish-rid-packages.result == 'success'
- Preserve existing behavior: wait-for-packages can be skipped (dry run)
  but only after RID packages complete successfully

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eout

Address code review findings:

1. Exact version matching for tarballs:
   - Replace broad patterns (microsoft-aspire-cli-$RID-*.tgz) with exact
     version patterns (microsoft-aspire-cli-$RID-$VERSION.tgz)
   - Apply to download verification, publish-rid-packages, and
     publish-meta-package steps
   - Improve error messages to show exact expected filename when missing

2. Configurable propagation timeout:
   - Add propagation_timeout_seconds workflow input (default 900s = 15min)
   - Validate input is positive integer in validate job
   - Compute MAX_ATTEMPTS from timeout/sleep interval, rounding up
   - Include timeout in workflow summary and parameter output
   - Preserve bounded polling and dry-run skip behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
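
The rounding-up of MAX_ATTEMPTS described in item 2 above is plain ceiling division. The workflow computes it in shell; the JavaScript below is only an illustrative sketch, and the function name and the 30-second sleep interval are assumptions rather than values taken from the workflow.

```javascript
// Illustrative sketch only: computeMaxAttempts and the 30-second interval
// are hypothetical, not copied from publish-npm.yml.
function computeMaxAttempts(timeoutSeconds, sleepSeconds) {
  const inputs = [["timeoutSeconds", timeoutSeconds], ["sleepSeconds", sleepSeconds]];
  for (const [name, value] of inputs) {
    if (!Number.isInteger(value) || value <= 0) {
      throw new RangeError(`${name} must be a positive integer, got ${value}`);
    }
  }
  // Round up so a timeout that is not a multiple of the interval still gets
  // a final attempt instead of being cut short.
  return Math.ceil(timeoutSeconds / sleepSeconds);
}

console.log(computeMaxAttempts(900, 30)); // 30 attempts for the default 900s timeout
console.log(computeMaxAttempts(100, 30)); // 4, not 3
```

Validating the inputs before dividing mirrors the "positive integer" check the commit adds to the validate job.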
adamint and others added 3 commits May 28, 2026 15:08
Publishing the @microsoft/aspire-cli scoped npm packages now happens via the
AzDO release pipeline (eng/pipelines/release-publish-nuget.yml) using the
MicroBuild ESRP publish template, instead of the previous GitHub Actions
publish-npm.yml workflow. The release pipeline extends from
1ES.Official.Publish.yml@MicroBuildTemplate so it has access to the
DevDivEsrpAzDoSrvConn service connection.

Per-platform artifacts and the pointer package are produced and verified
during the source build (azure-pipelines.yml + build_sign_native.yml), then
flat-shipped via BlobArtifacts (eng/Publishing.props). The release pipeline
splits them into RID and pointer pipeline-artifact folders, attaches SBOMs,
and submits two MicroBuild.Publish.yml invocations (RID packages first, then
pointer) with a configurable propagation delay so the pointer never resolves
to a missing optional dependency.

Also fixed a cross-platform path bug in eng/clipack/Common.projitems where
$(RepoRoot)eng\\scripts\\pack-cli-npm-package.ps1 mixed separators in a
way that breaks pwsh resolution on Linux/macOS, and added the actions/setup-node
step missing from .github/workflows/build-cli-native-archives.yml now that
PackDotnetTool depends on PackNpmPackage (which calls npm pack).

Note: the npm contentType for MicroBuild.Publish.yml is documented in the
ESRP onboarding doc but the task itself is not yet shipped for npm. The
release pipeline is wired against the documented parameter shape so it can
go live the moment the task is available; today's release flow can run with
SkipNpmPublish=true.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Brings the npm package path to parity with the brew cask / winget manifest
flows:

* Add Aspire.Cli.Utils.NpmInstallDetection so 'aspire update --self' and
  update notifications detect global npm installs via the
  ASPIRE_NPM_PACKAGE/_VERSION/_RID env vars the launcher already sets, and
  print 'npm install -g @microsoft/aspire-cli@latest' instead of running the
  GitHub-binary downloader against npm-owned files. Wired into UpdateCommand
  (--self path and post-project-update prompt) and CliUpdateNotifier.

* Tighten launcher cache freshness in eng/clipack/npm/aspire.js: compare
  both size and mtime so a stale cache from a prior same-version install
  cannot shadow a freshly extracted native binary.

* Add eng/pipelines/templates/prepare-npm-cli-packages.yml that runs a real
  'npm install -g <rid>.tgz && npm install -g --omit=optional <pointer>.tgz'
  against the just-built tarballs on a scratch npm prefix, asserts
  'aspire --version' matches the build version, verifies the launcher's
  cache layout, uninstalls, and emits validation-summary.json. Wired into
  the Prepare Installers stage in azure-pipelines.yml alongside Homebrew
  and WinGet.

* Gate release-publish-nuget.yml on the validation summary before invoking
  MicroBuild.Publish for npm. Download the summary from the source build,
  re-publish with SBOM in stage 1, then refuse to submit unless
  validatedByPreparePipeline is true and every required check passed.

* Fix a pre-existing here-string parse bug in the 'Prepare npm Artifacts
  for Publishing' step. PowerShell requires the here-string's closing
  terminator to start at column 0, but a YAML block scalar requires every
  line to be indented at least as far as the block's indentation, so the
  two constraints cannot both hold. Compose the error message from an
  array joined with [Environment]::NewLine instead.

* Drop POC framing from docs/specs/npm-cli-package.md and document the
  Sigstore-provenance tradeoff honestly (ESRP's npm publish path does not
  currently emit Sigstore attestations; integrity is anchored at the
  signed binary and the Microsoft1ES maintainer identity).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
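
The size-and-mtime freshness check mentioned in the launcher bullet above could look roughly like the sketch below. This is not the code in eng/clipack/npm/aspire.js: the exact comparison rule (sizes differ, or the cached copy is older than the packaged binary) is an assumption made for illustration.

```javascript
const fs = require("node:fs");

// Hypothetical freshness check; the real logic in eng/clipack/npm/aspire.js
// may compare these fields differently.
function needsCopy(sourcePath, targetPath) {
  const source = fs.statSync(sourcePath);
  let target;
  try {
    target = fs.statSync(targetPath);
  } catch (err) {
    if (err.code === "ENOENT") {
      return true; // nothing cached yet
    }
    throw err;
  }
  // Assumption: the cache is stale when the sizes differ or the cached copy
  // is older than the packaged binary.
  return source.size !== target.size || target.mtimeMs < source.mtimeMs;
}
```

Checking only the size would miss a same-size rebuild of the same version, which is the shadowing case the commit describes.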
Two adversarial-review findings from local repro of the install-test, plus one defense-in-depth hardening:

1) prepare-npm-cli-packages: initialize NpmCheck* task variables to
   'failed' at the start of the install-test step. If a subsequent
   bash command exits via 'set -e' before the matching pass marker
   runs, the summary JSON now records 'failed' instead of an
   unexpanded '$(NpmCheckXxx)' AzDO token. The release-side gate
   already rejects anything that is not 'passed', but 'failed' is a
   much higher-signal diagnostic.

2) prepare-npm-cli-packages: tighten 'aspire --version' parsing. Use a
   semver-shaped regex against the full output and print the raw
   output for diagnostics, instead of blindly trusting 'tail -n 1'.
   System.CommandLine's VersionOption normally just prints the
   version and exits, but defending against a stray warn/info line
   makes failures self-explanatory.

3) release-publish-nuget: defense-in-depth — the gate now explicitly
   rejects status values that still look like an unexpanded
   '$(SomeVar)' token, in addition to the '!= passed' check. This
   catches future schema drift in the prepare template too.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
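
As a rough illustration of the release-side gate described above, the sketch below rejects a summary unless validatedByPreparePipeline is true and every check is 'passed', and reports an unexpanded '$(SomeVar)' token separately from an ordinary failure. The pipeline implements this in its own scripts; the "checks" map, its keys, and the function name here are assumptions.

```javascript
// Hypothetical summary shape: only validatedByPreparePipeline is named in the
// commits; the "checks" map and its keys are assumptions for illustration.
function evaluateGate(summary) {
  const errors = [];
  if (summary.validatedByPreparePipeline !== true) {
    errors.push("validatedByPreparePipeline is not true");
  }
  for (const [name, status] of Object.entries(summary.checks ?? {})) {
    if (typeof status === "string" && /^\$\(.+\)$/.test(status)) {
      // The variable was never set, so the AzDO token was written verbatim.
      errors.push(`${name}: unexpanded pipeline token ${status}`);
    } else if (status !== "passed") {
      errors.push(`${name}: expected 'passed' but got '${status}'`);
    }
  }
  return errors; // an empty array means the gate may proceed
}

console.log(evaluateGate({
  validatedByPreparePipeline: true,
  checks: { install: "passed", version: "$(NpmCheckVersion)", uninstall: "failed" },
}));
```

The token check is redundant with the 'passed' comparison for pass/fail purposes; its value is the clearer diagnostic.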

adamint commented May 28, 2026

@copilot review

adamint marked this pull request as ready for review May 28, 2026 21:32
adamint marked this pull request as draft May 28, 2026 21:32
adamint changed the base branch from davidfowl/npm-cli-package to main May 28, 2026 21:33
…ackage-followups

# Conflicts:
#	docs/release-process.md
#	eng/pipelines/release-publish-nuget.yml
#	src/Aspire.Cli/Commands/UpdateCommand.cs
#	src/Aspire.Cli/Resources/UpdateCommandStrings.Designer.cs
#	src/Aspire.Cli/Resources/UpdateCommandStrings.resx
#	src/Aspire.Cli/Utils/CliUpdateNotifier.cs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot commented May 28, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17297

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17297"

Comment thread .github/workflows/build-cli-native-archives.yml Outdated
adamint and others added 3 commits May 28, 2026 17:49
…i-package-followups

# Conflicts:
#	docs/release-process.md
#	eng/pipelines/azure-pipelines.yml
#	eng/pipelines/release-publish-nuget.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Block prerelease npm publishes until non-latest dist-tags are supported, validate npm install summaries across Windows/Linux/macOS, and make launcher cache replacement avoid deleting the previous executable before rename.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use Node 22 for native CLI npm packaging and install-validation CI paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adamint changed the title from "Add npm publishing workflow for Aspire CLI packages" to "Add Aspire CLI npm package release integration" May 29, 2026
adamint marked this pull request as ready for review May 29, 2026 03:43
Copilot AI review requested due to automatic review settings May 29, 2026 03:43
adamint and others added 5 commits May 29, 2026 13:21
Fail npm publish preflight validation before any irreversible NuGet publishing step so release runs cannot partially publish NuGet packages and then fail on npm ESRP/prerelease prerequisites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Release dry-runs do not upload GitHub release assets, so LiveRelease Homebrew validation cannot pass for a new version without mutating the release. Switch Homebrew validation to LiveArchives when DryRun=true while preserving LiveRelease for non-dry releases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ESRP publish-template and owner invariant coverage, live npm registry smoke validation before channel promotion, a pointer-publish skip for safe reruns, and npm-installed CLI update tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Applies the dedicated Corepack pinning and Yarn cache seeding diff from microsoft#17630 instead of keeping a local source-build workaround on this branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adamint force-pushed the dev/adamint/npm-cli-package-followups branch from 23f527f to 0d27405 May 29, 2026 21:22
adamint left a comment

PR review while validating internal CI / release dry-run end-to-end. 3 concrete issues to address:

  1. eng/clipack/npm/aspire.js — launcher comment vs. code mismatch; concurrent first-runs can fail on Windows.
  2. eng/pipelines/release-publish-nuget.yml — registry smoke-test finally block can mask a successful run on Windows.
  3. eng/clipack/npm/aspire.js — unbounded per-version cache growth; no eviction or documented cleanup.

No style / nit comments. Internal CI (build 2987420) and release pipeline 1600 dry-run still in progress — I'll comment separately with the dry-run report.

Comment thread eng/clipack/npm/aspire.js
Comment thread eng/pipelines/release-publish-nuget.yml
Comment thread eng/clipack/npm/aspire.js
- aspire.js: when the atomic rename of a freshly copied native binary
  fails (e.g. concurrent first-runs racing on Windows where the cached
  executable is already loaded), check whether the existing target is
  already a valid copy of the source via needsCopy(). If it is, the
  other process won the race and our tmp file can be discarded without
  failing the launcher. Only unexpected errors propagate.

- release-publish-nuget.yml: wrap the post-smoke Remove-Item cleanup in
  the finally block in a try/catch with -ErrorAction Stop and a Write-
  Warning fallback so a transient Windows file-handle lock during temp-
  directory teardown does not mask a successful npm registry smoke test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
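
The rename-race handling described in the aspire.js bullet above can be sketched as follows. This is a minimal illustration rather than the launcher's code: installBinary, the temporary-file naming, and passing needsCopy in as a parameter are all assumptions made to keep the example self-contained.

```javascript
const fs = require("node:fs");

// Hypothetical sketch: installBinary and its needsCopy parameter are
// illustrative; only the needsCopy() re-check itself comes from the commit.
function installBinary(sourcePath, targetPath, needsCopy) {
  const tmpPath = `${targetPath}.${process.pid}.tmp`;
  fs.copyFileSync(sourcePath, tmpPath);
  try {
    fs.renameSync(tmpPath, targetPath);
    return targetPath;
  } catch (err) {
    // Whatever happened, our temporary copy is no longer useful.
    fs.rmSync(tmpPath, { force: true });
    if (!needsCopy(sourcePath, targetPath)) {
      // Another process already put a valid copy in place; use it.
      return targetPath;
    }
    throw err; // unexpected failure: let it propagate
  }
}
```

Including the process ID in the temporary name keeps concurrent first runs from writing to the same temporary file.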

adamint commented May 29, 2026

Addressed the adversarial review findings from review #4392976259 in b6350b4:

| # | Finding | Fix |
|---|---------|-----|
| 1 | aspire.js rename-failure path always rethrew, so concurrent first-runs on Windows could fail spuriously even when the existing cached binary was already valid. | Catch now re-checks needsCopy(sourcePath, targetPath); if the target is already a valid copy, the tmp file is discarded and targetPath is returned. Only unexpected errors propagate. |
| 2 | Release pipeline finally { Remove-Item -Recurse -Force } could rethrow on Windows (file lock during cleanup) and mask a successful aspire --version smoke. | finally block now uses Remove-Item -ErrorAction Stop inside a try/catch that downgrades cleanup failures to Write-Warning. |
| 3 | Per-version cache grows without bound. | Acknowledged as follow-up, left as-is for this PR; freshness check already prevents stale binaries within a version, multi-version retention policy can ship separately. |

Pushed to dnceng/internal and source-built; CI is running off the same commit.

adamint commented May 29, 2026

GitHub Actions failures are pre-existing main breakage, not from this PR

The 4 GitHub Actions failures in run 26663931880 (Tests / Setup for tests, Tests / Final Test Results, Stabilization Check, Final Results) all root-cause to the same compile error, and none are caused by changes in this PR:

tests/Aspire.Cli.EndToEnd.Tests/TypeScriptEmptyAppHostTemplateTests.cs(79,24): error CS1061:
  'Hex1bTerminalAutomator' does not contain a definition for 'RunCommandFailFastAsync'

Root cause on main

main HEAD (6436994f5f) is broken for the CLI E2E test project. The breakage is masked on push-to-main by .github/workflows/tests.yml:31:

buildArgs: '/p:IncludeTemplateTests=true /p:IncludeCliE2ETests=${{ github.event_name == ''pull_request'' }}'

IncludeCliE2ETests=false on push skips the project. On pull_request events it builds → compile fails → all downstream Tests jobs collapse. Every open PR sees the same failure right now.

Fixes already in flight (not from this PR)

Once either merges to main, PRs (including this one) will recover automatically; nothing in this branch needs to change. Merging main into this branch would only inherit the same broken state, so I'm deliberately not doing that. Internal CI (AzDO def 1602) is unaffected and currently green / in-progress on this same commit (b6350b465).

The macOS AzDO runner executes Bash@3 tasks with /bin/bash which is
still Bash 3.2 on every shipping macOS release. The 'Locate pointer
and RID tarballs' step in 'npm install validation (macOS native RID)'
was failing with:

  /Users/runner/work/_temp/<id>.sh: line 10: shopt: globstar: invalid shell option name
  Bash exited with code '1'.

Two constructs in the script require Bash 4+:
  - 'shopt -s globstar' (not in 3.2; we never used '**' anyway since
    we enumerate with 'find')
  - 'mapfile -t' (not in 3.2)

Replace both with a portable 'find | while IFS= read' loop that works
on Bash 3.2 and survives filenames containing spaces. Verified locally
under GNU bash 3.2.57.

Add a regression test in Infrastructure.Tests that fails the build if
'shopt -s globstar', 'mapfile', 'readarray', or 'declare -A' are
reintroduced into the npm install validation template.

Dry-run AzDO build 2987449 (Build def 1602) caught this on the macOS
'Prepare Installers' phase; linux-x64 and win-x64 are unaffected
because their bash is already 4+.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

adamint commented May 29, 2026

Dry-run progress update

Pushed 727fe3ca9d — a Bash 3.2 portability fix for the npm install validation template.

Why: dry-run build 2987449 progressed through native build ✅, Build (Windows extension/sign/publish) ✅, Publish Assets ✅, and then failed in Prepare Installers → npm install validation (macOS native RID) at the 🟣Locate pointer and RID tarballs step with:

/Users/runner/work/_temp/<id>.sh: line 10: shopt: globstar: invalid shell option name
Bash exited with code '1'.

Root cause: macOS still ships /bin/bash 3.2 (last GPLv2 version), and the AzDO Bash@3 task uses /bin/bash. Two constructs in eng/pipelines/templates/prepare-npm-cli-packages.yml required Bash 4+:

  • shopt -s globstar
  • mapfile -t

Fix: replaced both with a portable find … | while IFS= read loop. Linux and Windows runners (Git Bash 5) were unaffected — the failure was platform-specific to macOS. Verified locally against GNU bash 3.2.57.

Added regression test in Infrastructure.Tests that fails the build if any of shopt -s globstar, mapfile, readarray, or declare -A are reintroduced into this template.

Queued dry-run build 2987514 off 727fe3ca9d to verify the fix.

GitHub Actions failures on this PR are unrelated to these changes — see the earlier comment about main being broken on PR events (open fix PRs #17701 and #17702).

adamint and others added 2 commits May 29, 2026 19:59
Pulls in microsoft#17701 (Fix TypeScript deadlock repro E2E test) so PR
GitHub Actions can complete. Main has been broken for PR events
since microsoft#17575 added a test calling RunCommandFailFastAsync which
microsoft#17588 renamed to RunCommandAsync. microsoft#17701 updates the call sites.

No infra changes from this branch are affected by this merge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Dry-run AzDO build 2987514 progressed past the macOS Bash 3.2 fix and
then failed the install/verify/uninstall smoke on macOS with:

  Raw output: 13.5.0-preview.1.26279.34+727fe3ca9dcecbcc6d10d8b4373ae6f5779b25b4
  Reported version:
  ##[error]aspire --version reported '' but expected '13.5.0-preview.1.26279.34'

The CLI's --version prints the InformationalVersion which is full SemVer 2.0
(MAJOR.MINOR.PATCH-PRE+BUILDMETA where BUILDMETA is the source commit SHA).
The previous regex required the line to end at the pre-release segment so
the +<sha> suffix made the whole match fail and actualVersion became empty.

Extend the regex to optionally accept '+<buildmeta>', then strip it with the
portable POSIX expansion '${var%+*}' before comparing to the npm package
version (npm SemVer intentionally ignores build metadata for equality per
https://semver.org/#spec-item-10).

The PowerShell post-publish smoke in release-publish-nuget.yml already
handles this case (line ~1298, regex ends with '(\\+.*)?$') so no change
needed there.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
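
The regex change above can be illustrated outside the pipeline. The validation step itself is Bash; this JavaScript sketch only mirrors the idea (accept an optional +<buildmeta> suffix, then drop it before comparing), and the pattern and the extractVersion name are assumptions, not the template's actual regex.

```javascript
// Hypothetical pattern: MAJOR.MINOR.PATCH, optional -prerelease, optional +buildmeta.
// Capture group 1 deliberately excludes the build metadata.
const SEMVER_LINE = /^(\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?)(?:\+[0-9A-Za-z.-]+)?$/;

// Scan every output line so a stray warning line does not hide the version.
function extractVersion(output) {
  for (const line of output.split(/\r?\n/)) {
    const match = SEMVER_LINE.exec(line.trim());
    if (match) {
      return match[1];
    }
  }
  return ""; // no SemVer-shaped line found
}

const raw = "13.5.0-preview.1.26279.34+727fe3ca9dcecbcc6d10d8b4373ae6f5779b25b4";
console.log(extractVersion(raw)); // 13.5.0-preview.1.26279.34
```

Returning the version without its build metadata makes the later comparison against the npm package version a plain string equality check.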

adamint commented May 30, 2026

Dry-run progress update (2)

Pushed 91de8c9d20 — SemVer 2.0 build-metadata acceptance fix in npm install validation.

Why: dry-run build 2987514 progressed past the Bash 3.2 fix and got further (Build/Sign ✅, Build/Windows ✅, Publish Assets ✅, Homebrew Cask ✅, native build ✅) before failing in macOS 🟣Install, verify, and uninstall @microsoft/aspire-cli with:

Raw output: 13.5.0-preview.1.26279.34+727fe3ca9dcecbcc6d10d8b4373ae6f5779b25b4
Reported version:
##[error]aspire --version reported '' but expected '13.5.0-preview.1.26279.34'

Root cause: The CLI's --version prints the InformationalVersion, which is full SemVer 2.0 (MAJOR.MINOR.PATCH-PRE+BUILDMETA where BUILDMETA is the source commit SHA). My previous regex required the line to end at the pre-release segment so the +<sha> suffix made the whole match fail and the extracted version was empty.

Fix: extend the regex to optionally accept +<buildmeta>, then strip it with ${var%+*} (POSIX, Bash 3.2 compatible) before comparing to the npm package version. The PowerShell post-publish smoke in release-publish-nuget.yml already handles this case at line ~1298.

Linux x64 and Windows x64 npm install validation phases were in progress and would have hit the same regex bug; I cancelled the build to save runner time.

Queued dry-run build 2987581 off 91de8c9d20.

GitHub Actions: merged in upstream main (PR #17701 fixed the E2E test breakage that was failing the PR build).

…ol hang

The pointer package declares every supported RID as an optionalDependency
pinned to the just-built version. Even with --omit=optional, npm still
resolves optional dep metadata from the registry while building the
dependency tree, and in network-isolated 1ES Linux/Windows pools each of
the 7 lookups burns the full fetch-timeout. Dry-run build 2987581 hit a
9-minute hang on the pointer install step for that reason while macOS
(unrestricted egress, fast 404) completed in 3 seconds.

Pair --omit=optional with --offline so npm never touches the network for
this validation: optional deps are skipped without a resolution attempt
and the local tarball installs straight from disk. A short
--fetch-timeout=15000 is set as belt-and-suspenders. NPM_CONFIG_CACHE
already points at a fresh empty directory so --offline cannot reuse a
poisoned cache. Local verification against a synthetic 8-dep pointer
shows installation completes in 121ms with optional deps marked UNMET
(not fetched, not installed); uninstall completes in 88ms.

Apply the same args to the uninstall step so an audit/funding call
cannot hang the cleanup the way it almost hung the install.

Add a PrepareNpmCliPackagesScriptInstallsOfflineWithTimeout regression
test to prevent a refactor from quietly removing the flags.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

adamint commented May 30, 2026

Status update on the npm install validation hang:

Diagnosis: npm install -g --omit=optional <pointer>.tgz still resolves optional-dep metadata from the npm registry while building the dep tree. The pointer declares all 7 supported RIDs as optionalDependencies pinned to the just-built version, which doesn't exist in the public registry during validation. On the 1ES macOS pool that lookup gets a fast 404 (3s total install); on the 1ES Linux/Windows pools the registry call is blackholed by network isolation, so each of the 7 lookups burns the full fetch-timeout — that's the 9-minute hang on Linux/Windows in build 2987581.

Fix (67bc6b3453):

  • Add --offline --fetch-timeout=15000 to both npm install and npm uninstall. With --offline npm never touches the network, optional deps are skipped without resolution attempts, and the local pointer/RID tarballs install straight from disk.
  • NPM_CONFIG_CACHE already points to a fresh empty directory so --offline cannot reuse a poisoned cache.
  • Local verification on a synthetic 8-dep pointer: install in 121ms (vs 9m), uninstall in 88ms, launcher executes correctly.

Regression test: PrepareNpmCliPackagesScriptInstallsOfflineWithTimeout asserts --offline and --fetch-timeout= remain in the script.

Build 2987581 cancelled (Windows still hung on the same step). Fresh dry-run build 2987640 queued at 67bc6b3453.

adamint and others added 5 commits May 30, 2026 00:18
Re-imports the latest Corepack install template + pinned version + Yarn
preparation script from the sibling PR. Required to share the same
Corepack pin between the npm-publishing release pipeline and the
extension build, and to keep Bash 3.2-safe wiring in place.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* loadRidPackageNames() now runs lazily from main() and surfaces a
  friendly 'installation is corrupted ... Reinstall' error rather than
  a raw Node stack trace when the JSON map is missing or malformed.
  Previously the read happened at module top-level, bypassing the
  launcher's outer try/catch.

* Detect Linux musl on arm64 and throw 'Unsupported platform' rather
  than silently falling through to the glibc-linked linux-arm64 binary
  (which crashes at exec with a dynamic-linker error).

* Forward SIGINT, SIGTERM, SIGHUP and SIGQUIT to the child process.
  Previously `kill <wrapper-pid>` orphaned the native CLI process,
  which broke programmatic shutdown of long-running commands like
  'aspire run' that keep an AppHost alive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
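
A minimal sketch of the signal-forwarding behavior described in the last bullet above, assuming a wrapper that spawns the native binary with inherited stdio. The function name and structure are hypothetical; only the list of signals comes from the commit message. Signal semantics differ on Windows, which this sketch does not try to address.

```javascript
const { spawn } = require("node:child_process");

const FORWARDED_SIGNALS = ["SIGINT", "SIGTERM", "SIGHUP", "SIGQUIT"];

// Hypothetical sketch; runForwardingSignals is not the launcher's real function.
function runForwardingSignals(command, args) {
  const child = spawn(command, args, { stdio: "inherit" });
  const handlers = FORWARDED_SIGNALS.map((signal) => {
    const handler = () => child.kill(signal); // relay the signal to the child
    process.once(signal, handler);
    return [signal, handler];
  });
  return new Promise((resolve, reject) => {
    // Remove our listeners once the child is gone so the wrapper can exit normally.
    const cleanup = () => handlers.forEach(([s, h]) => process.removeListener(s, h));
    child.once("error", (err) => { cleanup(); reject(err); });
    child.once("exit", (code, signal) => { cleanup(); resolve({ code, signal }); });
  });
}
```

Without the handlers, a signal sent to the wrapper's PID would end the wrapper while leaving the child running, which is the orphaning problem the commit describes.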
The previous expandable here-string (@"..."@) interpreted both the
markdown code-span backticks AND $Rid / $PackageName as PowerShell
escape sequences and interpolations. Result: the shipped README on
npmjs.org rendered as 'Native Aspire CLI binary for $Rid.' with no
backticks visible.

Switch to a non-expanding here-string (@'...'@) plus -replace for
__RID__, __PACKAGE_NAME__ and __RID_PACKAGE_NAME__ placeholders so the
markdown code spans survive verbatim.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rsion

* Add 'Verify npm RID Packages Present Before Pointer Publish' step
  that extracts the pointer tgz, walks its optionalDependencies, and
  runs 'npm view <dep>@<version>' for each. If any RID dep is missing
  on the registry, fail the publish before ESRP submits the pointer.
  This closes a window where SkipNpmRidPublish=true, an operator
  partial run, or an ESRP RID-publish failure could ship a pointer
  package that resolves on install but throws 'native package was not
  installed' on first 'aspire' invocation.

* The preflight reads RIDs from the pointer's own optionalDependencies
  so it does not drift if RIDs are added or removed.

* In the post-publish version smoke, explicitly reject the case where
  'aspire --version' exits 0 with empty stdout. The previous
  @(...) -notmatch sequence produced an empty array on no output,
  which is falsy in PowerShell and let the bad install slip past the
  version regex check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
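
The preflight idea above (derive the RID list from the pointer's own optionalDependencies, then confirm each one exists on the registry) can be sketched as below. The pipeline step is not written in JavaScript; the function names are hypothetical, and the probe assumes that a missing package either makes 'npm view' exit non-zero or print nothing.

```javascript
const { execFileSync } = require("node:child_process");

// Turn the pointer manifest's optionalDependencies into "name@version" specs.
function ridPackageSpecs(pointerManifest) {
  return Object.entries(pointerManifest.optionalDependencies ?? {})
    .map(([name, version]) => `${name}@${version}`);
}

// Hypothetical registry probe. Assumes 'npm' is on PATH and directly
// executable (on Windows it is a .cmd shim and would need a shell).
function existsOnRegistry(spec) {
  try {
    const out = execFileSync("npm", ["view", spec, "version"], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
    return out.trim().length > 0;
  } catch {
    return false;
  }
}

// Returns the specs that the probe could not find; publish only if empty.
function findMissingRidPackages(pointerManifest, probe = existsOnRegistry) {
  return ridPackageSpecs(pointerManifest).filter((spec) => !probe(spec));
}

// Usage with a stub probe and made-up versions, so no network access is needed.
const manifest = {
  optionalDependencies: {
    "@microsoft/aspire-cli-linux-x64": "1.2.3",
    "@microsoft/aspire-cli-win-x64": "1.2.3",
  },
};
console.log(findMissingRidPackages(manifest, (spec) => spec.includes("linux-x64")));
```

Reading the dependency list from the manifest rather than a hard-coded RID list is what keeps the check from drifting when RIDs are added or removed.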
Six new asserts pinning the fixes from the multi-model code review:
* RID preflight is wired before the pointer publish step
* Empty 'aspire --version' stdout is rejected by the post-publish smoke
* Launcher throws on Linux musl arm64
* Launcher forwards SIGINT/SIGTERM/SIGHUP/SIGQUIT to the child process
* Launcher loads the RID package map lazily and emits a friendly
  'installation is corrupted ... Reinstall' error
* pack-cli-npm-package.ps1 uses a literal here-string so markdown
  backticks survive in the published RID README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

adamint commented May 30, 2026

🔁 Dry-run status update — applied multi-model code-review fixes + resynced with #17630 head 0e2136ef.

New commits (5):

  1. 7190d84519 Resync Corepack changes with latest PR #17630 "Pin Corepack explicitly for the VS Code extension build" (head 0e2136e)
  2. 02d11d80ff Fix npm launcher: lazy package map load, musl arm64, signal forwarding
  3. 2e174cfa70 Fix RID README rendering: use literal here-string for PowerShell
  4. d8966ed117 Verify RID packages on npm before publishing pointer; reject empty version
  5. 70d60261d2 Add regression tests for npm launcher, README rendering, and preflight

Issues found / fixed:

| Source | Severity | Issue | Fix |
| --- | --- | --- | --- |
| Opus 4.7 | HIGH | SkipNpmRidPublish=true could ship pointer pkg without its RID deps | Added "Verify npm RID Packages Present Before Pointer Publish" step that walks pointer optionalDependencies and runs npm view on each before submission |
| Opus 4.7 | MEDIUM | Alpine arm64 (musl) silently fell through to glibc binary | Added if (arch === 'arm64' && musl) throw Unsupported platform |
| Opus 4.7 | MEDIUM | aspire --version with empty stdout passed post-publish smoke | Added explicit $versionLine.Count -eq 0 guard |
| Opus 4.7 | MEDIUM | loadRidPackageNames() ran at module top-level, bypassing top-level catch | Made lazy; wrapped read/parse in try/catch surfacing friendly "installation is corrupted" |
| Opus 4.8 | LOW | Expandable PowerShell here-string ate backticks AND $Rid interpolation | Switched RID README to literal @'...'@ + -replace placeholders |
| GPT-5.5 | MEDIUM | Wrapper never forwarded SIGINT/SIGTERM to native child | Added process.once registration for SIGINT/SIGTERM/SIGHUP/SIGQUIT |
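
The musl arm64 fix listed above might look roughly like the following. This is a sketch under assumptions: `resolveRid` and `isMusl` are hypothetical names, `linux-musl-x64` and `linux-arm64` are assumed RID suffixes (only `linux-x64`, `win-x64`, and `osx-arm64` appear in the PR description), and the glibc check is one common heuristic rather than the launcher's confirmed method.

```javascript
// Hypothetical RID resolver covering a subset of platforms.
function resolveRid({ platform, arch, musl }) {
  if (platform === 'linux') {
    if (arch === 'arm64' && musl) {
      // No musl arm64 package: fail loudly instead of falling through to glibc.
      throw new Error('Unsupported platform: linux musl arm64');
    }
    if (arch === 'x64') return musl ? 'linux-musl-x64' : 'linux-x64';
    if (arch === 'arm64') return 'linux-arm64';
  }
  if (platform === 'win32' && arch === 'x64') return 'win-x64';
  if (platform === 'darwin' && arch === 'arm64') return 'osx-arm64';
  throw new Error(`Unsupported platform: ${platform} ${arch}`);
}

// Heuristic: glibc builds of Node report glibcVersionRuntime in the diagnostic
// report header; its absence on Linux is treated here as musl.
function isMusl() {
  if (process.platform !== 'linux') return false;
  const header = process.report?.getReport?.().header;
  return !header?.glibcVersionRuntime;
}

console.log(resolveRid({ platform: 'linux', arch: 'x64', musl: false }));
```

A launcher would call it as `resolveRid({ platform: process.platform, arch: process.arch, musl: isMusl() })`.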

Regression test coverage:

  • PointerPublishPreflightsRidPackagesAreOnRegistry
  • PostPublishSmokeRejectsEmptyAspireVersionOutput
  • LauncherDetectsMuslArm64AndThrowsUnsupported
  • LauncherForwardsTerminatingSignalsToChild
  • LauncherLoadsRidPackageMapInsideErrorHandler
  • PackScriptUsesLiteralHereStringForRidReadme

All 12 ReleasePublishNugetPipelineTests + NpmCliPackageTests pass locally.

Dry-run pipeline status:

  • Internal CI 1602: queued build 2987671 at 70d60261d2 (ETA ~1h50)
  • Release dry-run 1600: will queue on CI success

Cancelled stale build 2987640.

adamint and others added 3 commits May 30, 2026 02:39
Use 'MicroBuild.1ES.Official.Publish.yml@MicroBuildTemplate' (composite)
instead of '1ES.Official.Publish.yml@MicroBuildTemplate' (plain).

Without the 'MicroBuild.' prefix, the MicroBuildAuthorizePublishPlugin
task is auto-injected without credential context, causing a 401 against
devdiv.pkgs.visualstudio.com/_packaging/MicroBuildToolset.

Pattern verified against microsoft/vscode-azuretools, microsoft/pyright,
microsoft/vscode-python-environments, microsoft/vscode-deviceid, and
microsoft/vscode-common-python-lsp release pipelines — all of which use
the composite template for ESRP-based npm publishing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s template

The MicroBuild. prefix is required so the auto-injected
MicroBuildAuthorizePublishPlugin task inherits credential context for the
devdiv MicroBuildToolset feed. Without it, the task fails with a 401 in the
PrepareArtifacts stage before any publishing can begin.

Verified against microsoft/vscode-azuretools, microsoft/pyright, and
microsoft/vscode-python-environments — all use the MicroBuild. prefix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The MicroBuild.1ES.Official.Publish.yml@MicroBuildTemplate extends template auto-injects
two tasks into every job, both of which were failing in release 2987726:

1. MicroBuildAuthorizePublishPlugin@0 (start of every job)
   - Defaulted to fetching its nuget package from devdiv.pkgs.visualstudio.com/_packaging/MicroBuildToolset
   - This pipeline runs in the dnceng collection which does not have devdiv feed credentials
   - Result: HTTP 401 fails the stage before any customer step runs

2. MicroBuildCleanup@1 (end of every job, displayed as 'MicroBuild Telemetry')
   - Hard-requires a pipeline variable literally named TeamName
   - We had _TeamName (Arcade convention) but not TeamName

Fixes (both surfaced from MicroBuildTemplate Jobs/PublishJob.yml + Jobs/Job.yml):

* Add 'TeamName: dotnet-aspire' at pipeline-scope variables so MicroBuildCleanup@1
  succeeds on every job.

* On the only job that actually performs an ESRP publish (ReleaseJob via
  1ES.PublishNuget@1 / MicroBuild.Publish.yml), set
  templateContext.mb.publish.feedSource to the dnceng MicroBuildToolset mirror
  (https://pkgs.dev.azure.com/dnceng/_packaging/MicroBuildToolset/nuget/v3/index.json).
  This is the same pattern dotnet/roslyn uses. The dnceng feed is accessible
  to builds in this collection automatically.

* On every other job (PrepareJob, WinGetJob, DispatchGitHubTasksJob,
  PublishReleaseAssetsJob, HomebrewValidateJob), set
  templateContext.mb.publish.enabled: false. These jobs only download
  artifacts, push to GitHub via app tokens, or run validation; none of them
  publish via ESRP, so the publish-authorize plugin should not run at all.

Adds two regression tests in ReleasePublishNugetPipelineTests so a future edit
that removes TeamName or removes the publish auth overrides fails before the
pipeline is queued again.

Verified against MicroBuild template source pulled from
dev.azure.com/devdiv/MicroBuildTemplates/MicroBuildTemplates (Stages/PublishStage.yml,
Jobs/PublishJob.yml, Jobs/Job.yml).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sanity, Node 20, CRLF strip

Multi-model code review (Aspire arch, Opus 4.8, Opus 4.7) surfaced eight items.
The highest-severity issues fixed here are:

1. CRLF regression in Windows install validation (opus-4.7 — blocker)
   - prepare-npm-cli-packages.yml bash captures 'aspire --version' on Win
     runners under Git Bash. System.CommandLine 2.x writes CRLF on Windows,
     bash $() strips LF but NOT CR, anchored semver regex fails on \r.
   - Fix: pipe through 'tr -d "\r"' before capture. This regressed in
     commit debf4eb when 'tr -d [:space:]' was replaced with 'grep -Eo'.
     Dry run 2987740 did NOT exercise this path because SkipNpmPublish=true
     skips the consumer; Monday's real publish would have failed.

2. Preflight registry pin (opus-4.8)
   - 'npm view $spec version' in pointer preflight relied on the agent's
     ambient registry. 1ES images may have internal mirrors via .npmrc /
     npm_config_registry which (a) could spuriously fail after a successful
     public publish or (b) pass against a stale mirror and ship a broken
     pointer to npmjs.
   - Fix: add explicit '--registry=https://registry.npmjs.org/' pin.

3. Preflight retry loop (opus-4.8)
   - Single-shot preflight could fail closed AFTER 7 RID packages are
     already published, forcing manual SkipNpmRidPublish=true re-run.
   - Fix: wrap in 10x30s retry, matching post-publish smoke.

4. Strict semver-shape filter on 'npm view' output (opus-4.8)
   - 'npm view --loglevel=warn 2>&1' merges deprecation / peer-dep /
     EBADENGINE warnings onto stdout. 'Select-Object -First 1' could latch
     a warning as the version.
   - Fix: filter to lines matching strict semver regex, and switch metadata
     'npm view' to 2>$null.

5. PGP .sig content sanity check (opus-4.8)
   - Earlier validation only checked .tgz.sig EXISTS. If Arcade SignTool
     silently produced an empty/garbage sidecar (signing service hiccup,
     plugin misconfig), the release would publish unverifiable sidecars.
   - Fix: assert each sig >=64 bytes AND contains an OpenPGP marker
     (ASCII-armored '-----BEGIN PGP SIGNATURE-----' OR binary packet
     tag 2 per RFC 9580 — old-format 0x88-0x8B / new-format 0xC2).
     Added to BOTH source build (BuildAndTest.yml) AND release pipeline
     so failures surface at PR-time, not just release-time.

6. Node >=20 minimum (Aspire arch review)
   - aspire.js wrapper uses '{ cause: error }' (Node 16.9+) and rid->arch
     map covers musl-libc selector (npm >=10.7). 'engines.node = >=16'
     still allowed installs on Node 16.0-16.8, where the 'cause' option
     is not supported.
   - Fix: bump to '>=20', which covers the 'cause' option, the libc
     selector, and ESM-safe subprocess. Node 18 reached EOL on
     2025-04-30.
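
Items 1 and 4 both come down to normalizing text before an anchored semver match. The pipeline does this in bash and PowerShell; the JavaScript below only mirrors the logic. The function names, the simplified regex, and the sample version `13.4.0` are illustrative assumptions.

```javascript
// Anchored, simplified semver shape (core version plus optional prerelease/build).
const SEMVER_LINE = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Mimics shell command substitution, which drops trailing LF but keeps CR.
function stripTrailingNewlines(text) {
  return text.replace(/\n+$/, '');
}

// Item 1: remove every CR before matching, like `tr -d "\r"`.
function normalizeVersionOutput(text) {
  return stripTrailingNewlines(text.replace(/\r/g, ''));
}

// Item 4: skip warning lines and return the first semver-shaped line, or null.
function pickVersionLine(output) {
  const lines = output.split(/\r?\n/).map((line) => line.trim());
  return lines.find((line) => SEMVER_LINE.test(line)) ?? null;
}

const windowsOutput = '13.4.0\r\n'; // placeholder version with a CRLF ending
console.log(SEMVER_LINE.test(stripTrailingNewlines(windowsOutput))); // false: the \r survives
console.log(SEMVER_LINE.test(normalizeVersionOutput(windowsOutput))); // true

const noisy = 'npm warn EBADENGINE Unsupported engine\n13.4.0\n';
console.log(pickVersionLine(noisy)); // 13.4.0
```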

Validation:
- All 21 Infrastructure.Tests pass locally (was 14, added 7 regression tests).
- Dry-run on release pipeline 2987740 already validated MicroBuild paths;
  this batch hardens against issues that would only have surfaced when
  SkipNpmPublish=false on Monday.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adamint
Member Author

adamint commented May 30, 2026

✅ Monday release dry-run: HIGH confidence

Completed comprehensive end-to-end dry-run validation against the internal microsoft-aspire-Release-To-NuGet pipeline (1600) on dnceng — nothing was published.

Validation runs

| Build | Pipeline | Result | Purpose |
| --- | --- | --- | --- |
| 2987671 | CI 1602 | partiallySucceeded¹ | Source-artifact producer (real ESRP-signed .tgz + .sig) |
| 2987740 | Release 1600 | partiallySucceeded¹ | All-Skip-true baseline: proves zero-publish path |
| 2987776 | Release 1600 | partiallySucceeded¹ | Comprehensive: SkipNpmPublish=false + sig sanity + registry reach |

¹ Only succeededWithIssues cause: 1ES PT non-blocking "branch validation failed" because dry-runs are off a users/* branch, not main. These warnings will not appear on Monday.

What was exercised end-to-end

  • ✅ Pipeline resource pinning (proper API: POST /_apis/pipelines/1600/runs with resources.pipelines.aspire-build.version)
  • ✅ Download all 4 source-artifact groups (PackageArtifacts + 3 npm-package platforms) + SBOM/CodeSign validation
  • OpenPGP signature content sanity check against real ESRP-signed .sig sidecars (size ≥64 bytes + ASCII-armor or binary OpenPGP marker)
  • npm ping --registry=https://registry.npmjs.org/ registry reachability
  • ✅ MicroBuild Telemetry (TeamName), Network Isolation, dnceng feedSource — all pass
  • ✅ Skip-flag wiring routes to no-op tasks correctly
  • ✅ Zero failed tasks anywhere
  • Zero external publication: no nuget push, no npm publish, no gh release upload, no winget commit, no homebrew dispatch
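
The signature content sanity check mentioned in the list above can be approximated as follows. The pipeline's real check is not shown on this page, so the sketch makes assumptions: `looksLikeOpenPgpSignature` is a hypothetical name, and the binary branch inspects only the first byte for a signature packet header (old-format 0x88-0x8B or new-format 0xC2).

```javascript
const ARMOR_HEADER = '-----BEGIN PGP SIGNATURE-----';

// Returns true when the sidecar is at least 64 bytes and either contains the
// ASCII-armor header or starts with a signature packet (tag 2) header byte.
function looksLikeOpenPgpSignature(bytes) {
  if (bytes.length < 64) return false;
  if (bytes.includes(ARMOR_HEADER)) return true;
  const first = bytes[0];
  const oldFormatTag2 = first >= 0x88 && first <= 0x8b;
  const newFormatTag2 = first === 0xc2;
  return oldFormatTag2 || newFormatTag2;
}

// Synthetic inputs for the demo; none of these are real signatures.
const armored = Buffer.from(
  `${ARMOR_HEADER}\n${'A'.repeat(64)}\n-----END PGP SIGNATURE-----\n`
);
const binary = Buffer.concat([Buffer.from([0x89]), Buffer.alloc(80)]);
const garbage = Buffer.alloc(100); // long enough, but all zero bytes
const empty = Buffer.alloc(0);

for (const sample of [armored, binary, garbage, empty]) {
  console.log(looksLikeOpenPgpSignature(sample));
}
```

This only rules out empty or obviously malformed sidecars. It does not verify the signature cryptographically.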

Hardening landed (from multi-model review — Opus 4.7/4.8, GPT 5.5, Aspire-arch)

  1. engines.node >= "20.0.0" on the pointer package
  2. Explicit --registry=https://registry.npmjs.org/ pin on every npm operation (no agent-config dependency)
  3. Preflight retry loop (10×30s) for npm registry propagation
  4. Strict semver regex on parsed npm view output
  5. PGP .sig content sanity check (in both source pipeline BuildAndTest.yml and release pipeline release-publish-nuget.yml)
  6. Windows CRLF strip on aspire --version capture (defense-in-depth — current Windows runner happens to not need it, but Bash 3.2/PowerShell roundtripping is fragile)
  7. +7 regression tests (14 → 21 total in Infrastructure.Tests, all pass)
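
The retry shape in item 3 (10 attempts, 30 seconds apart) can be sketched generically. `retry`, its options, and the injectable `sleep` are hypothetical and exist only so the example runs instantly. The pipeline's own loop is a script step, not this function.

```javascript
// Run `check` up to `attempts` times, sleeping between failed attempts.
async function retry(check, { attempts = 10, delayMs = 30_000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await check(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw new Error(`check failed after ${attempts} attempts`, { cause: lastError });
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Demo: the check fails twice, then succeeds; sleep is stubbed to return at once.
let calls = 0;
const demo = retry(
  async () => {
    calls += 1;
    if (calls < 3) throw new Error('not on the registry yet');
    return 'visible';
  },
  { sleep: async () => {} }
);
demo.then((result) => console.log(result, calls));
```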

Source-build pinning gotcha (carry forward)

The classic POST /_apis/build/builds API silently ignores resources.pipelines.{alias}.version and auto-selects the latest CI build from any branch. Two release runs (2987764, 2987773) failed for this reason. The correct API is POST /_apis/pipelines/{id}/runs, which honors the resource pin. The AzDO UI uses the correct API by default — Monday's queue-from-UI flow will pick the latest main CI build correctly.
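
For reference, a request to the Runs API with the resource pin could be assembled like this. Treat it as a sketch: `buildPinnedRunRequest` is hypothetical, the project name and build version are placeholders, and `api-version=7.1` is an assumption. Only the endpoint path, the `aspire-build` alias, and the `resources.pipelines.<alias>.version` field come from the notes above.

```javascript
// Build the URL and JSON body for a pinned pipeline run.
// `organization`, `project`, and `version` are supplied by the caller.
function buildPinnedRunRequest({ organization, project, pipelineId, alias, version, templateParameters = {} }) {
  const url =
    `https://dev.azure.com/${organization}/${project}` +
    `/_apis/pipelines/${pipelineId}/runs?api-version=7.1`;
  const body = {
    // The pin lives under resources.pipelines.<alias>.version.
    resources: { pipelines: { [alias]: { version } } },
    templateParameters,
  };
  return { method: 'POST', url, body };
}

const request = buildPinnedRunRequest({
  organization: 'dnceng',
  project: 'example-project', // placeholder project name
  pipelineId: 1600,
  alias: 'aspire-build',
  version: '<ci-build-number>', // placeholder: the CI run to pin
  templateParameters: { DryRun: true },
});
console.log(JSON.stringify(request, null, 2));
```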

Monday checklist

  • Pipeline YAML verified correct (this dry-run)
  • All hardening + tests landed in 4ee6d2ceb1
  • Admin (not testable from pipeline): 1ES signing approval for pipeline 1600, DevDivEsrpAzDoSrvConn grant

Constraint compliance

DryRun=true and SkipNpmPublish=true/SkipNuGetPublish=true (where applicable) were enforced on every queue. Static-audited the YAML: MicroBuild.Publish.yml@MicroBuildTemplate (the actual npm publish call) is gated by and(DryRun=false, SkipNpmPublish=false, IsPrerelease=false) — physically unreachable in any dry-run path.


Development

Successfully merging this pull request may close these issues.

Ship Aspire CLI as an npm package

3 participants