Improve aspire start readiness diagnostics#17141

Merged
danegsta merged 18 commits into main from danegsta/aspire-start-readiness on May 18, 2026

Conversation

@danegsta
Member

@danegsta danegsta commented May 15, 2026

Description

aspire start could detach from the background AppHost process before the AppHost had actually reached a stable startup state. That made early startup failures, such as TypeScript syntax errors in a polyglot AppHost or C# compile errors in a .NET AppHost, hard to diagnose because the parent command could exit before surfacing the child output.

This change makes detached startup wait for AppHost startup readiness when the AppHost supports the new auxiliary backchannel readiness RPC. The child aspire run process notifies readiness only after it has passed the same early startup gate used by foreground aspire run. On failure, the parent shows a curated, bounded startup excerpt in the terminal and replays relevant child AppHost diagnostics into the parent log under DetachedAppHost/... categories so the linked parent log is self-contained without confusing child CLI status noise.

For older AppHosts that do not support the readiness RPC, aspire start uses the V2 resource snapshot RPC as a short compatibility probe before detaching. If the snapshot probe succeeds, the parent detaches immediately; if the child exits before that probe succeeds, the parent reports the failure and shows the collected child output. AppHosts that do not support the V2 probe still use the short stability wait to preserve compatibility.
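
The compatibility tiers described above can be summarized as a small decision function. This is only a sketch: the capability flags `supports_readiness_rpc` and `supports_v2_snapshot` and the strategy names are hypothetical labels chosen for illustration, not identifiers from the Aspire CLI.

```python
from enum import Enum


class WaitStrategy(Enum):
    READINESS_RPC = "readiness-rpc"      # AppHost supports the new readiness RPC
    V2_SNAPSHOT_PROBE = "v2-snapshot"    # older AppHost with the V2 resource snapshot RPC
    STABILITY_WAIT = "stability-wait"    # oldest AppHosts: short stability wait only


def choose_wait_strategy(supports_readiness_rpc: bool, supports_v2_snapshot: bool) -> WaitStrategy:
    """Pick the most precise startup signal the AppHost offers (hypothetical flags)."""
    if supports_readiness_rpc:
        return WaitStrategy.READINESS_RPC
    if supports_v2_snapshot:
        return WaitStrategy.V2_SNAPSHOT_PROBE
    return WaitStrategy.STABILITY_WAIT
```

Checking the most precise signal first means the weaker fallbacks are used only when nothing better is available.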

Sending Ctrl+C while aspire start is waiting for the child AppHost to start will terminate startup.

User-facing output

For a TypeScript guest AppHost syntax error, foreground aspire run remains concise:

apphost.ts(1,15): error TS1109: Expression expected.
❌ An unexpected error occurred: The TypeScript (Node.js) apphost failed.

Detached aspire start now stays attached long enough to report the same relevant compiler output without replaying successful setup noise such as npm install, audit/funding output, or Executing: debug lines:

Starting Aspire AppHost in the background...
❌ Failed to start the AppHost.
ℹ️ Recent AppHost startup output:
apphost.ts(1,15): error TS1109: Expression expected.
❌ AppHost process exited with code 2.
📄 See logs at ...
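
One way to think about a curated, bounded excerpt is as a relevance filter followed by a cap on the number of lines. The sketch below assumes a single hypothetical relevance rule (keep compiler-style diagnostics) and a hypothetical `max_lines` limit. The actual filtering in this PR is not shown in the conversation and may differ.

```python
import re
from collections import deque

# Hypothetical relevance rule: keep compiler-style diagnostics such as
# "apphost.ts(1,15): error TS1109: ..." or "Program.cs(3,41): error CS1002: ...".
DIAGNOSTIC = re.compile(r"\(\d+,\d+\): (error|warning) [A-Z]+\d+:")


def startup_excerpt(lines, max_lines=20):
    """Return at most max_lines relevant lines, keeping the most recent ones."""
    recent = deque(maxlen=max_lines)  # older entries fall off once the cap is reached
    for line in lines:
        if DIAGNOSTIC.search(line):
            recent.append(line)
    return list(recent)
```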

For a .NET AppHost compile error, detached aspire start includes the recent build output, including compiler diagnostics:

Starting Aspire AppHost in the background...
❌ Failed to start the AppHost.
ℹ️ Recent AppHost startup output:
  Determining projects to restore...
  All projects are up-to-date for restore.
/work/BrokenAppHost/Program.cs(3,41): error CS1002: ; expected [/work/BrokenAppHost/BrokenAppHost.csproj]
Build FAILED.
/work/BrokenAppHost/Program.cs(3,41): error CS1002: ; expected [/work/BrokenAppHost/BrokenAppHost.csproj]
    0 Warning(s)
    1 Error(s)
Time Elapsed 00:00:00.66
❌ AppHost failed to build.
📄 See logs at ...

The parent log also includes the replayed child excerpt with explicit source separation:

[INFO] [AppHostLauncher] Begin detached AppHost startup log excerpt from child process 64142.
[INFO] [DetachedAppHost/AppHost] apphost.ts(1,15): error TS1109: Expression expected.
[FAIL] [DetachedAppHost/GuestAppHostProject] TypeScript (Node.js) apphost exited with code 2
[INFO] [AppHostLauncher] End detached AppHost startup log excerpt. Child log: ...
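
The source separation shown above amounts to re-emitting each child line under a prefixed category. A minimal sketch, assuming child log lines have the same `[LEVEL] [Category] message` shape as the excerpt (the real file format and parser are not shown here, and `replay_line` is a hypothetical helper name):

```python
import re

# Assumed line shape, inferred from the excerpt above: "[LEVEL] [Category] message".
LOG_LINE = re.compile(r"^\[(?P<level>[A-Z]+)\] \[(?P<category>[^\]]+)\] (?P<message>.*)$")


def replay_line(child_line, prefix="DetachedAppHost"):
    """Re-emit a child log line under a prefixed category, or None if it does not parse."""
    match = LOG_LINE.match(child_line)
    if match is None:
        return None
    return f"[{match['level']}] [{prefix}/{match['category']}] {match['message']}"
```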

Testing

  • dotnet test --project tests/Aspire.Cli.Tests/Aspire.Cli.Tests.csproj --no-launch-profile -- --filter-class "*.AppHostLauncherTests" --filter-not-trait "quarantined=true" --filter-not-trait "outerloop=true"
  • dotnet build tests/Aspire.Cli.EndToEnd.Tests/Aspire.Cli.EndToEnd.Tests.csproj --no-restore
  • git diff --check
  • Manual smoke comparison of aspire run and aspire start with TypeScript guest and .NET AppHosts containing syntax errors.

Fixes #16918

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
      • If yes, did you have an API Review for it?
        • Yes
        • No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • Yes
        • No
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
      • If yes, have you done a threat model and had a security review?
        • Yes
        • No
    • No

Wait for detached AppHost startup readiness before reporting success, surface curated startup output on failure, and replay child AppHost diagnostics into the parent start log under detached categories.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

github-actions Bot commented May 15, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17141

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17141"

danegsta and others added 5 commits May 15, 2026 13:06
Avoid killing detached child AppHost processes immediately after they report startup failure so they can finish flushing their logs. Use the shared process signaling helper for timeout and cancellation cleanup, matching the graceful-then-force shutdown shape used by stop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolve conflicts in guest process launch logging and detached child environment tests while preserving main's profiling telemetry changes and this branch's startup readiness diagnostics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Stop rewriting detached guest command log lines to shell-style output. Display the original Executing: prefix while keeping the same relevance filtering for startup diagnostics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the detached AppHost replay header in the footer so both boundaries identify the child process and child log file that produced the excerpt.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the detached AppHost replay header concise and report the full child log path only on the footer boundary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
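
The graceful-then-force shutdown shape mentioned in the first commit message above can be illustrated with Python's `subprocess` module. This is an analogy rather than the CLI's own helper: the function name is hypothetical, and the 10-second default is borrowed from the window that the review summary in this conversation attributes to the new `ProcessSignaler` helper.

```python
import subprocess


def stop_gracefully_then_force(process: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Ask the process to exit, then kill it if it is still running after the grace period."""
    if process.poll() is None:
        # SIGTERM on POSIX. On Windows, terminate() is not graceful; it ends the process at once.
        process.terminate()
        try:
            process.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            process.kill()
            process.wait()  # reap the process so returncode is populated
    return process.returncode
```

A process that has already exited is left alone, which matches the intent of letting a failed child finish flushing its logs rather than killing it immediately.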
@danegsta
Member Author

PR testing results

Tested this PR with the dogfood CLI from PR artifacts.

CLI version verification

  • PR head: fb5c1fe23b65afb2470993f0d2eb5a2d527b4c06
  • Installed CLI: 13.4.0-pr.17141.gfb5c1fe2
  • Result: ✅ installed CLI matches the PR head short SHA
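
The version check above can be expressed as a small comparison. The sketch assumes, based only on the single example shown, that a PR build's version ends in `.g<short SHA>`. The helper name is hypothetical.

```python
import re


def cli_matches_head(cli_version: str, head_sha: str) -> bool:
    """True when the version's trailing "g<sha>" suffix is a prefix of head_sha."""
    match = re.search(r"\.g([0-9a-f]+)$", cli_version)
    return match is not None and head_sha.startswith(match.group(1))
```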

Scenarios

| Scenario | Result | Notes |
| --- | --- | --- |
| Dogfood CLI install/version check | ✅ Passed | Installed and verified PR CLI from artifacts. |
| TypeScript AppHost syntax failure diagnostics | ✅ Passed | aspire start stayed attached, exited non-zero, printed Recent AppHost startup output, preserved Executing: ..., included TS1109, and linked the parent log. |
| Parent log replay shape | ✅ Passed | Replay used DetachedAppHost/...; header stayed concise; footer included Child log: .... |
| Successful TypeScript empty AppHost start/stop | ✅ Passed | aspire start --apphost ... reached success and aspire stop --apphost ... stopped it. |

Observed output shape

Terminal failure output included:

❌ Failed to start the AppHost.
ℹ️ Recent AppHost startup output:
Executing: /usr/bin/npm install
...
Executing: /usr/bin/npx --no-install tsc --noEmit -p tsconfig.apphost.json
apphost.ts(8,22): error TS1109: Expression expected.
GuestAppHostProject: TypeScript (Node.js) apphost exited with code 2

Parent log replay included:

[INFO] [AppHostLauncher] Begin detached AppHost startup log excerpt from child process 3911.
[DBUG] [DetachedAppHost/GuestAppHostProject] Executing: /usr/bin/npm install
[INFO] [DetachedAppHost/AppHost] apphost.ts(8,22): error TS1109: Expression expected.
[FAIL] [DetachedAppHost/GuestAppHostProject] TypeScript (Node.js) apphost exited with code 2
[INFO] [AppHostLauncher] End detached AppHost startup log excerpt. Child log: /workspace/.aspire/logs/cli_..._detach-child_....log

Caveat

The default repo container runner on this machine is Linux arm64, but the selected workflow run did not yet have cli-native-archives-linux-arm64. I reran the repo-local container runner as linux/amd64, where the Linux x64 artifact was available, and completed testing there.

@danegsta danegsta marked this pull request as ready for review May 15, 2026 22:10
Copilot AI review requested due to automatic review settings May 15, 2026 22:10
@danegsta danegsta requested a review from davidfowl as a code owner May 15, 2026 22:10
Contributor

Copilot AI left a comment

Pull request overview

Makes aspire start wait for the detached child aspire run process to report readiness (or failure) via a JSON status file before returning, so early startup failures (e.g. TypeScript syntax errors in a polyglot AppHost) surface in the parent terminal and parent log instead of being lost. A new CliLogFormat utility centralizes the file/console log token format so the parent can parse the child's log file and replay relevant lines under DetachedAppHost/... categories.

Changes:

  • New cross-process readiness handshake via ASPIRE_CLI_START_READY_FILE (atomic-rename JSON file) used by RunCommand (child) and AppHostLauncher (parent); child waits a 2 s stability window before declaring success.
  • New Aspire.Cli.Diagnostics.CliLogFormat consolidates level/category tokens and a TryParseFileLogLine parser; callers across CLI now use the shared constants instead of literals.
  • On detached-start failure the parent renders a bounded, filtered tail of the child log to the terminal and replays a richer tail into the parent's file log; ProcessSignaler gains a graceful-then-force-kill helper used on timeout / cancellation.
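
The atomic-rename pattern named in the first bullet can be sketched as follows. The function names and the JSON keys used in the usage note are hypothetical. Only the general technique (write to a temporary file in the same directory, then rename it over the target) is taken from the description above.

```python
import json
import os
import tempfile


def write_status_atomically(path: str, status: dict) -> None:
    """Write JSON to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(status, handle)
        os.replace(temp_path, path)  # readers see either no file or the complete file
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


def read_status(path: str):
    """Return the parsed status, or None if the writer has not published it yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

A writer might call `write_status_atomically(path, {"ready": True, "exitCode": None})` while the reader polls `read_status(path)`. Because the rename happens within one directory, the reader should not observe a half-written file.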

Reviewed changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/Aspire.Cli/Commands/AppHostLauncher.cs | Major rewrite of detached launch wait loop: reads child status file, replays/displays child log tail, force-kills on timeout/cancel; new helpers for status file IO and log-tail parsing. |
| src/Aspire.Cli/Commands/RunCommand.cs | Child side: writes DetachedStartupStatus (ready/failure/exit code), observes early exits within a 2 s window, new JSON-serializable record. |
| src/Aspire.Cli/Commands/DashboardRunCommand.cs | Uses CliLogFormat.Categories.Dashboard constant. |
| src/Aspire.Cli/Commands/TelemetryCommandHelpers.cs | Uses shared FileLevelTokens constants instead of literal strings. |
| src/Aspire.Cli/Diagnostics/CliLogFormat.cs | New shared log format helpers (tokens, categories, parser, detached-category prefix). |
| src/Aspire.Cli/Diagnostics/FileLoggerProvider.cs | Delegates level/category formatting to CliLogFormat. |
| src/Aspire.Cli/Interaction/SpectreConsoleLoggerProvider.cs | Delegates to CliLogFormat helpers. |
| src/Aspire.Cli/Interaction/ConsoleInteractionService.cs | Uses CliLogFormat.Categories.Stdout/Stderr. |
| src/Aspire.Cli/Projects/DotNetAppHostProject.cs | Replaces literal collector categories with CliLogFormat.Categories. |
| src/Aspire.Cli/Projects/ProcessGuestLauncher.cs | Uses shared Executing: prefix constant + Categories.AppHost. |
| src/Aspire.Cli/Utils/OutputCollector.cs | Default category sourced from CliLogFormat.Categories.AppHost. |
| src/Aspire.Cli/Resources/RunCommandStrings.resx + Designer + xlf×14 | Adds RecentAppHostStartupOutput localized string. |
| src/Shared/KnownConfigNames.cs | Adds ASPIRE_CLI_START_READY_FILE. |
| src/Shared/ProcessSignaler.cs | Adds RequestGracefulShutdownThenForceKillAsync with 10 s window. |
| tests/Aspire.Cli.Tests/Diagnostics/CliLogFormatTests.cs | Unit tests for the new log format parser/token mappings. |
| tests/Aspire.Cli.Tests/Commands/RunCommandTests.cs | Adds tests for status-file round-trip, child log tail/replay, and env var propagation. |
Files not reviewed (1)
  • src/Aspire.Cli/Resources/RunCommandStrings.Designer.cs: Language not supported

Comment thread src/Aspire.Cli/Commands/AppHostLauncher.cs Outdated
Comment thread src/Aspire.Cli/Resources/RunCommandStrings.Designer.cs Outdated
Comment thread src/Aspire.Cli/Commands/RunCommand.cs Outdated
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danegsta danegsta requested a review from karolz-ms as a code owner May 15, 2026 23:31
Comment thread src/Aspire.Cli/Backchannel/AppHostAuxiliaryBackchannel.cs Outdated
@JamesNK
Member

JamesNK commented May 16, 2026

Add a CLI E2E test that verifies this.

I think it would be good to have 4 tests:

  • aspire run + invalid .NET app host
  • aspire run + invalid guest app host (e.g. TypeScript)
  • aspire start + invalid .NET app host
  • aspire start + invalid guest app host (TS)

Should be able to have two core test methods. One for .NET and one for TS. Then have parameters for whether it is run vs start. Verify the experience is the same.

Comment thread src/Aspire.Cli/Commands/AppHostLauncher.cs Outdated
Comment thread src/Aspire.Cli/Commands/AppHostLauncher.cs Outdated
Comment thread tests/Aspire.Cli.Tests/Commands/RunCommandTests.cs Outdated
Comment thread tests/Aspire.Cli.Tests/Commands/RunCommandTests.cs Outdated
Comment thread tests/Aspire.Cli.Tests/Commands/RunCommandTests.cs
@davidfowl
Contributor

What happens on an older apphost?

danegsta and others added 3 commits May 16, 2026 11:43
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danegsta
Member Author

What happens on an older apphost?

If the apphost backchannel supports the V2 contracts it'll poll for a command that will only succeed after the apphost has started successfully. For even older apphosts there's a short stabilization delay to make sure the child stays alive after the backchannel starts.
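
The polling behavior described in this reply can be sketched as a loop that stops on the first of three outcomes. The callables `probe` and `child_exited`, the return labels, and the timeout and interval values are all hypothetical. The real CLI's timings and RPC calls are not shown in this conversation.

```python
import time


def wait_for_startup(probe, child_exited, timeout_seconds=30.0, interval_seconds=0.25):
    """Poll probe() until it succeeds, the child exits, or the timeout elapses.

    Returns "ready", "exited", or "timeout".
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if child_exited():
            return "exited"  # report the failure instead of waiting out the timeout
        if probe():
            return "ready"
        time.sleep(interval_seconds)
    return "timeout"
```

Checking for child exit on every iteration is what lets an early failure surface promptly rather than only after the timeout.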

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
danegsta and others added 5 commits May 16, 2026 15:22
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

Matched test failure patterns (1 test)
  • Aspire.Cli.EndToEnd.Tests.KubernetesDeployWithGarnetTests.DeployK8sWithGarnet — Unable to access container registry during publish

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

CLI E2E Tests unknown — 90 passed, 0 failed, 1 unknown (commit f6b2a75)

Status Test Recording
AddPackageInteractiveWhileAppHostRunningDetached ▶️ View recording
AddPackageWhileAppHostRunningDetached ▶️ View recording
AgentCommands_AllHelpOutputs_AreCorrect ▶️ View recording
AgentInitCommand_DefaultSelection_InstallsDefaultSkills ▶️ View recording
AgentInitCommand_MigratesDeprecatedConfig ▶️ View recording
AspireAddPackageVersionToDirectoryPackagesProps ▶️ View recording
AspireInitSingleFileAppHostRunsViaDotnetRunAppHost ▶️ View recording
AspireInitWithExistingAppHostDirRecreatesMissingNuGetConfigAndPreservesFiles ▶️ View recording
AspireInitWithSolutionFileGeneratesAppHostThatBuildsAgainstChannelHive ▶️ View recording
AspireUpdateRemovesAppHostPackageVersionFromDirectoryPackagesProps ▶️ View recording
AspireUpdateRemovesOrphanAppHostPackageVersionWhenSdkAlreadyCurrent ▶️ View recording
Banner_DisplayedOnFirstRun ▶️ View recording
Banner_DisplayedWithExplicitFlag ▶️ View recording
Banner_NotDisplayedWithNoLogoFlag ▶️ View recording
CertificatesClean_RemovesCertificates ▶️ View recording
CertificatesTrust_WithNoCert_CreatesAndTrustsCertificate ▶️ View recording
CertificatesTrust_WithUntrustedCert_TrustsCertificate ▶️ View recording
ConfigSetGet_CreatesNestedJsonFormat ▶️ View recording
CreateAndRunAspireStarterProject ▶️ View recording
CreateAndRunAspireStarterProjectWithBundle ▶️ View recording
CreateAndRunEmptyAppHostProject ▶️ View recording
CreateAndRunJavaEmptyAppHostProject ▶️ View recording
CreateAndRunJsReactProject ▶️ View recording
CreateAndRunPythonReactProject ▶️ View recording
CreateAndRunTypeScriptEmptyAppHostProject ▶️ View recording
CreateAndRunTypeScriptStarterProject ▶️ View recording
CreateJavaAppHostWithViteApp ▶️ View recording
CreateTypeScriptAppHostWithViteApp_UsesConfiguredToolchain ▶️ View recording
DashboardRunWithOtelTracesReturnsNoTraces ▶️ View recording
DeployK8sBasicApiService ▶️ View recording
DeployK8sWithExternalHelmChart ▶️ View recording
DeployK8sWithGarnet ▶️ View recording
DeployK8sWithMongoDB ▶️ View recording
DeployK8sWithMySql ▶️ View recording
DeployK8sWithPostgres ▶️ View recording
DeployK8sWithRabbitMQ ▶️ View recording
DeployK8sWithRedis ▶️ View recording
DeployK8sWithSqlServer ▶️ View recording
DeployK8sWithValkey ▶️ View recording
DeployTypeScriptAppToKubernetes ▶️ View recording
DescribeCommandResolvesReplicaNames ▶️ View recording
DescribeCommandShowsRunningResources ▶️ View recording
DetachFormatJsonProducesValidJson ▶️ View recording
DetachFormatJsonProducesValidJsonWhenRestartingExistingInstance ▶️ View recording
DoListStepsShowsPipelineSteps ▶️ View recording
DocsCommand_RendersInteractiveMarkdownFromLocalSource ▶️ View recording
DoctorCommand_DetectsDeprecatedAgentConfig ▶️ View recording
DoctorCommand_TypeScriptAppHostReportsMissingConfiguredToolchain ▶️ View recording
DoctorCommand_WithSslCertDir_ShowsTrusted ▶️ View recording
DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted ▶️ View recording
GlobalMigration_HandlesCommentsAndTrailingCommas ▶️ View recording
GlobalMigration_HandlesMalformedLegacyJson ▶️ View recording
GlobalMigration_PreservesAllValueTypes ▶️ View recording
GlobalMigration_SkipsWhenNewConfigExists ▶️ View recording
GlobalSettings_MigratedFromLegacyFormat ▶️ View recording
InitTypeScriptAppHost_AugmentsExistingViteRepoAtRoot ▶️ View recording
InteractiveCSharpInitCreatesExpectedFiles ▶️ View recording
InvalidAppHostPathWithComments_IsHealedOnRun ▶️ View recording
LatestCliCanStartStableChannelAppHost ▶️ View recording
LatestCliCanStartStableChannelTypeScriptAppHost ▶️ View recording
LegacySettingsMigration_AdjustsRelativeAppHostPath ▶️ View recording
LogLevelTrace_ProducesTraceEntriesInCliLogFile ▶️ View recording
LogsCommandShowsResourceLogs ▶️ View recording
OtelLogsReturnsStructuredLogsFromStarterApp ▶️ View recording
OtelLogsReturnsStructuredLogsFromStarterAppIsolated ▶️ View recording
PsCommandListsRunningAppHost ▶️ View recording
PsFormatJsonOutputsOnlyJsonToStdout ▶️ View recording
PublishWithConfigureEnvFileUpdatesEnvOutput ▶️ View recording
PublishWithDockerComposeServiceCallbackSucceeds ▶️ View recording
PublishWithoutOutputPathUsesAppHostDirectoryDefault ▶️ View recording
ResourceCommand_FailedExecution_DisplaysAppHostLogPathAndLogContainsEntries ▶️ View recording
ResourceCommand_FailsWhenInteractionServiceIsRequired ▶️ View recording
ResourceCommand_SetAndDeleteParameterUpdatesDescribeOutput ▶️ View recording
RestoreGeneratesSdkFiles ▶️ View recording
RestoreGeneratesSdkFiles_WithConfiguredToolchain ▶️ View recording
RestoreRefreshesGeneratedSdkAfterAddingIntegration ▶️ View recording
RestoreSupportsConfigOnlyHelperPackageAndCrossPackageTypes ▶️ View recording
RunFromParentDirectory_UsesExistingConfigNearAppHost ▶️ View recording
RunReportsSyntaxErrorsForDotNetAppHost ▶️ View recording
RunReportsSyntaxErrorsForTypeScriptAppHost ▶️ View recording
SecretCrudOnDotNetAppHost ▶️ View recording
SecretCrudOnTypeScriptAppHost ▶️ View recording
StagingChannel_ConfigureAndVerifySettings_ThenSwitchChannels ▶️ View recording
StartAndWaitForTypeScriptSqlServerAppHostWithNativeAssets ▶️ View recording
StartReportsSyntaxErrorsForDotNetAppHost ▶️ View recording
StartReportsSyntaxErrorsForTypeScriptAppHost ▶️ View recording
StopAllAppHostsFromAppHostDirectory ▶️ View recording
StopNonInteractiveSingleAppHost ▶️ View recording
StopWithNoRunningAppHostExitsSuccessfully ▶️ View recording
UnAwaitedChainsCompileWithAutoResolvePromises ▶️ View recording
UpdateProjectChannelToStable_TypeScript_PicksUpStablePackages ▶️ View recording

📹 Recordings uploaded automatically from CI run #26049821187

@danegsta
Member Author

Verified on Mac, Windows, and Linux (via the CLI E2E tests).

@danegsta danegsta merged commit 3955424 into main May 18, 2026
298 checks passed
@github-actions github-actions Bot added this to the 13.4 milestone May 18, 2026
aspire-repo-bot Bot added a commit to microsoft/aspire.dev that referenced this pull request May 18, 2026
Documents the new startup readiness behavior introduced in microsoft/aspire#17141:
- aspire start now waits for AppHost to reach a stable running state before detaching
- Early startup failures (TypeScript/C# compile errors) are surfaced in the terminal
- Curated startup excerpts filter noise and highlight relevant error messages
- Ctrl+C cancels startup while waiting for readiness

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@aspire-repo-bot
Contributor

Pull request created: #1002

Generated by PR Documentation Check

@aspire-repo-bot
Contributor

📝 Documentation has been drafted in microsoft/aspire.dev#1002 targeting release/13.4.

Updated src/frontend/src/content/docs/reference/cli/commands/aspire-start.mdx to document the new startup readiness and failure diagnostics behavior. The description now states that aspire start waits for the AppHost to reach a stable running state before detaching. A new Startup readiness and failure diagnostics subsection was added explaining how early startup failures (TypeScript and C# compile errors) are surfaced in the terminal with curated excerpts, how the parent log captures replayed output under DetachedAppHost/... categories, and that Ctrl+C terminates startup while waiting.

Note

This draft PR needs human review before merging.

Development

Successfully merging this pull request may close these issues.

Surface TypeScript compiler diagnostics when TypeScript AppHost preflight fails

4 participants