[C#DK v2 Aspire integration] Expose apphost query API and aspire resource management API from aspire extension #17705

Open
LittleLittleCloud wants to merge 14 commits into microsoft:main from LittleLittleCloud:u/xiaoyun/aspireC#DK

Conversation

LittleLittleCloud commented May 29, 2026

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
      • If yes, did you have an API Review for it?
        • Yes
        • No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • Yes
        • No
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
      • If yes, have you done a threat model and had a security review?
        • Yes
        • No
    • No

C#DK integration: https://devdiv.visualstudio.com/DevDiv/_git/vs-green/pullrequest/741646

github-actions Bot commented May 29, 2026

Contributor

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17705

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17705"

}

/**
* Performs a one-shot `aspire ps --resources` call and returns the results directly.

Contributor

This API is gone.

Author

Replace with aspire ps --format json then aspire describe --follow --apphost <path2apphost> --format json?

adamint left a comment

Member

Reviewed (as a WIP/draft) by three parallel agents focused on public-API design, CLI integration correctness, and event/dispose lifecycle. There's strong cross-agent convergence on three structural issues that need to be settled before this is non-draft: the API is built on the removed aspire ps --resources flag, the onDidChangeAppHosts contract the JSDoc promises is not what the implementation delivers, and the public methods silently swallow several distinct failure modes (CLI missing, wrong appHostPath, transient CLI error). All of these matter once C# Dev Kit codes against the surface — once shipped, breaking changes get painful fast.

Blockers / High

  1. aspire ps --resources was removed from main in PR #17479 → fetchAppHostsOnce will silently return [] for every consumer after rebase. This is what @davidfowl's "This API is gone." comment refers to. Verified: upstream/main src/Aspire.Cli/Commands/PsCommand.cs no longer has s_resourcesOption, and AppHostDisplayInfo.Resources has been removed. After rebase the CLI will emit Unrecognized command or argument '--resources' and exit non-zero; fetchAppHostsOnce swallows that into resolve([]) so getRunningAppHosts() returns empty for every caller. Not a flag-rename fix: post-#17479 main fans resources in via _globalDescribeStreams (one aspire describe --follow --apphost <path> per AppHost merged client-side). A correct one-shot needs either (a) aspire ps --format json for the AppHost list + per-AppHost bounded describe, or (b) a new aspire describe --snapshot mode on the CLI side. (Inline comment.)

  2. getRunningAppHosts() silently swallows every failure mode → C# Dev Kit cannot tell aspire-cli-missing from no-apphosts-running. fetchAppHostsOnce (AppHostDataRepository.ts:256–276) funnels non-zero exit, JSON parse failure, AND spawn ENOENT (when the CLI is not installed) all into resolve([]). Note: getAspireCliExecutablePath() does not throw on missing CLI — resolveCliPath returns { cliPath: 'aspire', available: false } and the literal 'aspire' propagates through, so spawn fails with ENOENT and the consumer sees a clean empty array. The polling code (_runPsCommand) correctly surfaces these via _setError; the public API regresses that behavior. C# Dev Kit needs to be able to surface "Install the Aspire CLI" diagnostics, not just render an empty list. (Inline comment.)

  3. onDidChangeAppHosts only fires while the Aspire UI is active — broken contract for any consumer that doesn't open the Aspire panel. The JSDoc on api.ts:28–31 says "Event fired when AppHost or resource state changes." Both data sources backing the event (describe-follow and ps polling) are gated on _dataActive = _panelVisible || _appHostFileOpen (AppHostDataRepository.ts:297–331). For a C# Dev Kit consumer that activates with the Aspire panel hidden and no AppHost file open, the consumer subscribes, the user runs a session, resources move Starting → Running → Failed → Restarting over 60 seconds, and the consumer receives zero events because nothing is polling. Fix is one of: (a) start an API-owned polling source when ≥1 external listener is subscribed (forces _shouldPoll to be true regardless of UI state), or (b) update the JSDoc to say "best-effort; only fires while the Aspire view is active" and tell consumers to poll. Option (a) matches the JSDoc as written. (Inline comment.)

  4. stopResource('foo', '') silently triggers an interactive AppHost picker via the extension RPC channel. Both stopResource and startResource declare appHostPath: string as required, then use if (appHostPath) args.push('--apphost', ...) so empty string falls through. The CLI then calls _connectionResolver.ResolveConnectionAsync(null, ...) → PromptForAppHostSelectionAsync when multiple AppHosts are running. Critically, these two methods do NOT pass noExtensionVariables: true (unlike fetchAppHostsOnce), so ASPIRE_EXTENSION_PROMPT_ENABLED=true plus the RPC endpoint/token are set in the child env. The CLI then issues the prompt over the extension's RPC channel → a QuickPick appears to the user out of nowhere, and the promise hangs until the user picks. C# Dev Kit's reasonable test of await api.stopResource('myapi', '') becomes "unexpected UI prompt to the user." Fix: validate appHostPath is non-empty and absolute at the API boundary (reject otherwise), AND pass noExtensionVariables: true to all three CLI invocations so the CLI cannot prompt through the extension at all. (Inline comment.)

  5. stopResource/startResource rejection drops stderr — consumer gets exited with code N with no diagnostic. Neither method sets stdoutCallback or stderrCallback, so the CLI's actual error text ("resource 'foo' not found", "command 'stop' is not available for resource type 'Container'", "AppHost not running") is lost. Combined with overlapping exit codes from CliExitCodes.FailedToFindProject, CliExitCodes.InvalidCommand, and the inner ExecuteResourceCommandAsync, the consumer cannot tell what went wrong. Fix: buffer stderr/stdout and include them in the rejection. (Inline comment.)
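
Blockers 2 and 5 both come down to the caller losing the reason a CLI invocation failed. The following is a minimal sketch of one way to keep that reason, not the PR's implementation. `AspireCliError`, `CliFailureKind`, `CliOutcome`, and `parseCliJson` are hypothetical names introduced here for illustration; the sketch assumes the caller has already buffered stdout/stderr and captured the spawn error code.

```typescript
// Hypothetical failure kinds; these names are illustrative, not the PR's API.
type CliFailureKind = 'cli-not-found' | 'spawn-failed' | 'non-zero-exit' | 'invalid-json';

class AspireCliError extends Error {
    readonly kind: CliFailureKind;
    readonly exitCode: number | undefined;
    readonly stderr: string;

    constructor(kind: CliFailureKind, message: string, exitCode?: number, stderr = '') {
        super(message);
        this.name = 'AspireCliError';
        this.kind = kind;
        this.exitCode = exitCode;
        this.stderr = stderr;
    }
}

// What a caller would collect from the child process before interpreting it.
interface CliOutcome {
    spawnErrorCode?: string; // e.g. 'ENOENT' when the executable cannot be found
    exitCode?: number;
    stdout: string;
    stderr: string;
}

// Parses stdout as JSON and throws a typed error instead of returning [].
function parseCliJson<T>(outcome: CliOutcome): T {
    if (outcome.spawnErrorCode === 'ENOENT') {
        throw new AspireCliError('cli-not-found', 'The Aspire CLI executable was not found.');
    }
    if (outcome.spawnErrorCode !== undefined) {
        throw new AspireCliError('spawn-failed', `Failed to start the CLI (${outcome.spawnErrorCode}).`);
    }
    if (outcome.exitCode !== 0) {
        // Keep the CLI's own text so the consumer can show a real diagnostic.
        const detail = outcome.stderr.trim() || outcome.stdout.trim();
        throw new AspireCliError(
            'non-zero-exit',
            `CLI exited with code ${outcome.exitCode}: ${detail}`,
            outcome.exitCode,
            outcome.stderr,
        );
    }
    try {
        return JSON.parse(outcome.stdout) as T;
    } catch {
        throw new AspireCliError('invalid-json', 'CLI output was not valid JSON.', outcome.exitCode, outcome.stderr);
    }
}
```

With a shape like this, a consumer could branch on `error.kind === 'cli-not-found'` to show an install prompt and treat an empty array as a genuine "no AppHosts running" result.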

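The boundary validation suggested in Blocker 4 could look roughly like the sketch below. `buildResourceCommandArgs` is a hypothetical helper, not a function from the PR; the argument order follows the `aspire resource <name> start|stop --apphost <path>` syntax quoted in this review, and the only library call used is Node's `path.isAbsolute`.

```typescript
import * as path from 'path';

type ResourceCommand = 'start' | 'stop';

// Builds argv for `aspire resource <name> start|stop --apphost <path>`.
// Hypothetical helper: rejects empty or relative paths up front so the CLI
// never has to resolve (and possibly prompt for) an AppHost on its own.
function buildResourceCommandArgs(
    resourceName: string,
    command: ResourceCommand,
    appHostPath: string,
): string[] {
    if (resourceName.trim() === '') {
        throw new TypeError('resourceName must be a non-empty string.');
    }
    if (appHostPath === '' || !path.isAbsolute(appHostPath)) {
        throw new TypeError(`appHostPath must be a non-empty absolute path, got "${appHostPath}".`);
    }
    // Always pass --apphost explicitly; there is no "falls through" branch.
    return ['resource', resourceName, command, '--apphost', appHostPath];
}
```

Because the check throws before any process is spawned, a call such as `stopResource('myapi', '')` would reject immediately rather than surface a picker. It complements, and does not replace, passing `noExtensionVariables: true`.
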
Medium

  1. noExtensionVariables: true inconsistency. fetchAppHostsOnce correctly passes it. stopResource/startResource do not, so the CLI inherits ASPIRE_EXTENSION_ENDPOINT, ASPIRE_EXTENSION_TOKEN, ASPIRE_EXTENSION_CERT, ASPIRE_EXTENSION_PROMPT_ENABLED=true, DEBUG_SESSION_PORT, DEBUG_SESSION_TOKEN, DEBUG_SESSION_SERVER_CERTIFICATE. These RPC channels were established for the run-debug flow that owns them; reusing them outside that flow is wrong; see Blocker #4 for the user-visible symptom. Pass noExtensionVariables: true for all three CLI invocations.

  2. onDidChangeAppHosts over-fires per NDJSON line and on UI-state transitions → thundering CLI invocations. _handleDescribeLine fires _onDidChangeData for every snapshot (a single resource emits ~10 snapshots in 5 seconds: Building → Starting → progress ticks → Running → probing → Healthy). With 10 resources, that's ~80 fires in 5 seconds. The emitter is also fired from setViewMode, _clearWorkspaceAppHost, _setError, and view-mode toggles — intentionally over-fired because the tree-view re-render is cheap. Each fire causes a programmatic consumer to call getRunningAppHosts(), which spawns a fresh aspire ps process. 80 process spawns + 80 backchannel scans in 5 seconds — measurable CPU and CLI-startup latency while the AppHost itself is launching. Either coalesce/debounce (e.g., 250 ms), or expose a narrower event scoped to actual snapshot deltas (vscode.Event<{ appHostPath: string; changedResourceNames: readonly string[] }> matches VS Code Debug API conventions).

  3. No CancellationToken / AbortSignal / timeout on any method. A hung CLI (slow backchannel scan, wedged resource command) leaves the Promise pending forever. The polling code tracks processes in _psProcesses and dispose() terminates them; promises returned from the public API have no such escape valve, and a consumer cancellation cannot tear down the child. Add an optional cancellationToken?: vscode.CancellationToken per method; terminate the child on cancel and reject with vscode.CancellationError. Add a default ~30 s timeout with a typed timeout error.

  4. dcpServerInfo is newly exported from activate(), leaking the DCP TLS token + certificate to any extension. Previously only rpcServerInfo was exposed. DcpServerConnectionInfo is { address, token, certificate } — any extension can now vscode.extensions.getExtension('aspire-vscode').exports.dcpServerInfo and impersonate DCP against the running session. If C# Dev Kit needs it, scope and document the trust boundary; otherwise drop it.

  5. Spreading ...api into the activation return collides with the existing/future *ServerInfo keys. return { rpcServerInfo, dcpServerInfo, ...api } puts API methods in the same namespace as RPC bootstrap data, prevents semver versioning, and risks accidental property collisions. Prefer the Azure Functions extension's pattern (see extension/src/debugger/languages/azureFunctions.ts:25): getApi(version: string): AspireExtensionApi | undefined, or at minimum nest under api: {...}.

  6. ResourceInfo is a lossy subset — silently drops isInternal, health, exit code, commands. mapResource filters !u.isInternal unconditionally and exposes only { name, url } for endpoints. Consumers doing service discovery cannot opt back in. healthStatus, healthReports, exitCode, commands, stateStyle from ResourceJson are all dropped — exactly the data a "resource management API" consumer needs to render status. Either expose isInternal and let consumers filter, or take an options object (includeInternal?: boolean); surface healthStatus/exitCode if resource management is the value prop.

  7. state: string | null is opaque with no typed value set, no forward-compat story. Consumers will write if (r.state === 'Running'). The CLI/hosting layer adds states over time; every new state silently breaks consumer logic. Export a union type with the (string & {}) escape hatch ('Running' | 'Stopped' | 'Starting' | 'Stopping' | 'Failed' | 'Exited' | 'Finished' | 'Building' | (string & {})) and document that consumers must default-branch unknown values.

  8. No restartResource and no generic executeResourceCommand. The extension already implements aspire-vscode.restartResource and aspire-vscode.executeResourceCommand (see extension/src/extension.ts:115, 117), and PR #17698 lands a custom-commands tree. Shipping start/stop now and adding the rest later is either two partial release cycles or a churning interface. Decide whether executeResourceCommand(resourceName, commandName, args, appHostPath?) is the right primitive and have start/stop call it, or commit to the full triad now.

  9. No per-AppHost filter on getRunningAppHosts(). Every call forks a child process and scans all AppHost backchannels. C# Dev Kit polling a single solution pays this cost on every tick. Add getRunningAppHosts(appHostPath?: string) and pass through as aspire ps --apphost <path>.

  10. Positional args lock out future options without breaking changes. stopResource(resourceName, appHostPath) cannot grow force, timeout, signal, cancellationToken without overloads or a breaking signature change. Once C# Dev Kit codes against this, every option addition is breaking. Use an options bag: stopResource(resourceName: string, options?: { appHostPath?: string; cancellationToken?: vscode.CancellationToken; ... }).

  11. fetchAppHostsOnce desynchronizes from the polling cache → suppresses real change events. It runs an independent aspire ps --resources and never updates _appHosts. The polling change-detection at AppHostDataRepository.ts:668 uses JSON.stringify(parsed) !== JSON.stringify(this._appHosts) — out-of-band readers leave the cached baseline inconsistent, so transient transitions (A: Running → A: Failed → A: Running between the consumer's one-shot call and the next poll) compare equal to cache and fire no event; the consumer was told nothing changed when in fact it oscillated.

  12. No event-ordering guarantee for stop/startResource. Promise resolves when the CLI process exits, but onDidChangeAppHosts only fires on the next polling tick (default 30 s — see _getPollingIntervalMs) or the next describe-stream snapshot. Consumer pattern await api.stopResource(...); const apps = await api.getRunningAppHosts(); may still see Running. Either kick off a fresh fetch after a successful resource command (within a bounded window) or document the pattern.
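
The typed `state` suggestion in the list above can be sketched as follows. The literal values are the ones named in the review; `categorizeState` and the `StateCategory` buckets are hypothetical and only illustrate the "default-branch unknown values" rule a consumer would be asked to follow.

```typescript
// Known values taken from the review's suggested union. `(string & {})` keeps
// editor completion for the literals while still accepting states added later.
type ResourceState =
    | 'Running' | 'Stopped' | 'Starting' | 'Stopping'
    | 'Failed' | 'Exited' | 'Finished' | 'Building'
    | (string & {});

// Hypothetical consumer-side grouping; not part of the PR.
type StateCategory = 'active' | 'transitioning' | 'ended' | 'unknown';

function categorizeState(state: ResourceState | null): StateCategory {
    switch (state) {
        case 'Running':
            return 'active';
        case 'Starting':
        case 'Stopping':
        case 'Building':
            return 'transitioning';
        case 'Stopped':
        case 'Failed':
        case 'Exited':
        case 'Finished':
            return 'ended';
        default:
            // null and any state the hosting layer introduces in the future
            return 'unknown';
    }
}
```

A consumer written this way keeps compiling and keeps behaving sensibly when a new state string shows up, instead of silently falling out of an `if (r.state === 'Running')` check.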

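For the over-firing item in the list above, one possible shape for coalescing is a small buffer that collects changes and emits at most one event per AppHost per window. This is a sketch under assumptions: `ChangeCoalescer` is a hypothetical class, the payload follows the `{ appHostPath, changedResourceNames }` shape proposed in the review, and the 250 ms default is the review's example value. It uses a fixed window that starts at the first notification rather than a timer that resets on every call.

```typescript
// Hypothetical event payload, following the shape suggested in the review.
interface AppHostChange {
    appHostPath: string;
    changedResourceNames: readonly string[];
}

class ChangeCoalescer {
    private readonly pending = new Map<string, Set<string>>();
    private timer: ReturnType<typeof setTimeout> | undefined;
    private readonly fire: (change: AppHostChange) => void;
    private readonly delayMs: number;

    constructor(fire: (change: AppHostChange) => void, delayMs = 250) {
        this.fire = fire;
        this.delayMs = delayMs;
    }

    // Records a change; the first call in a window schedules a single flush.
    notify(appHostPath: string, resourceName: string): void {
        let names = this.pending.get(appHostPath);
        if (names === undefined) {
            names = new Set<string>();
            this.pending.set(appHostPath, names);
        }
        names.add(resourceName);
        if (this.timer === undefined) {
            this.timer = setTimeout(() => this.flush(), this.delayMs);
        }
    }

    // Emits one event per AppHost with de-duplicated resource names.
    flush(): void {
        if (this.timer !== undefined) {
            clearTimeout(this.timer);
            this.timer = undefined;
        }
        const batch = Array.from(this.pending.entries());
        this.pending.clear();
        for (const [appHostPath, names] of batch) {
            this.fire({ appHostPath, changedResourceNames: Array.from(names) });
        }
    }

    // Drops anything buffered and cancels the pending timer.
    dispose(): void {
        if (this.timer !== undefined) {
            clearTimeout(this.timer);
            this.timer = undefined;
        }
        this.pending.clear();
    }
}
```

Ten snapshots for the same resource inside one window would then reach the consumer as a single event, so a consumer that re-queries on every event spawns one CLI process rather than ten.
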
Low

  1. No lifecycle guard. fetchAppHostsOnce doesn't check _disposed; stopResource/startResource don't track a disposed flag at all. After deactivation, stale handles continue spawning CLI processes. onDidChangeAppHosts is the now-disposed EventEmitter (dataRepository.dispose() calls this._onDidChangeData.dispose()), so post-dispose event(handler) may throw. Add an _disposed flag the API closes over; reject calls with a typed error post-dispose.

  2. No concurrency dedupe → N parallel callers spawn N processes. The polling path coalesces via _fetchInProgress; the public API does not. Cache the in-flight Promise.

  3. Promise rejection semantics not documented. "Resolves when the CLI command completes" doesn't say what happens when the resource doesn't exist, is already in the target state, the apphost isn't running, the CLI is not installed, or two simultaneous stopResource calls race. C# Dev Kit will discover these empirically. Spell them out per method now before consumers code against undocumented behavior.
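
The "cache the in-flight Promise" suggestion from the concurrency item above can be sketched in a few lines. `dedupeInFlight` is a hypothetical wrapper, not code from the PR; it assumes the wrapped function takes no arguments, so all concurrent callers are asking for the same thing.

```typescript
// Wraps an async fetch so that callers arriving while a request is pending
// share that request instead of starting another one.
function dedupeInFlight<T>(fetch: () => Promise<T>): () => Promise<T> {
    let inFlight: Promise<T> | undefined;
    return () => {
        if (inFlight !== undefined) {
            return inFlight;
        }
        // Clear the cache once the request settles (success or failure),
        // so the next call after completion starts a fresh request.
        const started = fetch().finally(() => {
            inFlight = undefined;
        });
        inFlight = started;
        return started;
    };
}
```

Wrapping the one-shot fetch this way would mean N parallel `getRunningAppHosts()` calls spawn one CLI process and all resolve (or reject) with the same result.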


Verified safe (not flagged)

  • Argument injection via resourceName: spawnCliProcess uses child_process.spawn(command, args, { shell: false }) (extension/src/debugger/languages/cli.ts:29–33) — shell metacharacters not interpreted. A resourceName starting with - is rejected by ResourceCommand's IsOptionLikeToken validator (ResourceCommand.cs:408–411) with a clear error.
  • appHostPath containing spaces/quotes: Safe — passed as a single args element with shell: false, no shell-quoting needed.
  • aspire resource <name> start|stop --apphost <path> syntax: Verified to exist in both the PR branch and upstream/main (src/Aspire.Cli/Commands/ResourceCommand.cs; start/stop are in s_wellKnownCommands). Unlike aspire ps --resources, this command will not break after rebase.
  • JSON.parse(stdout) stdout/stderr mixing: PsCommand uses _interactionService.DisplayRawText(json, ConsoleOutput.Standard) for the JSON payload and routes progress through ShowStatusAsync (suppressed in non-interactive contexts). With noExtensionVariables: true, no RPC backchannel is established and Spectre's spinner is suppressed. Stdout should remain clean JSON.
  • mapResource correctness: appHostPath propagated from parent (not re-extracted), projectPath extracted via properties?.['project.path'] matches the canonical KnownProperties.Path (src/Shared/Model/KnownProperties.cs:56). Optional-chain plus nullish-coalescing handles all of properties === null, properties === undefined, and missing/null value.
  • resolved guard in fetchAppHostsOnce (lines 250–275) correctly prevents double-resolution if exitCallback and errorCallback both fire.
  • onDidChangeAppHosts: dataRepository.onDidChangeData is the public surface of a vscode.EventEmitter, so consumer-side .dispose() of the subscription works correctly. The lifecycle issue is in Blocker #3, Medium #2, and Low #1 (#3/#7/#18 when the findings are numbered consecutively), not in event-disposal semantics.

adamint and others added 10 commits June 1, 2026 17:09
# Conflicts:
#	extension/src/extension.ts
Enable VS Code test hosts to acquire a scoped Aspire DCP lease for test-launched AppHosts. Lease-backed requests use per-session tokens, scoped DCP prefixes, awaitable cleanup, and DCP notification tracking without requiring a parent Aspire debug session.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Run lease notification cleanup again after stopping owned debug sessions so async termination notifications cannot remain queued after the lease token is revoked.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LittleLittleCloud marked this pull request as ready for review June 3, 2026 17:18
Copilot AI review requested due to automatic review settings June 3, 2026 17:18
LittleLittleCloud changed the title from "[WIP][C#DK v2 Aspire integration] Expose apphost query API and aspire resource management API from aspire extension" to "[C#DK v2 Aspire integration] Expose apphost query API and aspire resource management API from aspire extension" Jun 3, 2026

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a public Aspire extension API and test-run DCP session leasing, plus “one-shot” CLI queries to fetch running AppHosts/resources outside of the normal polling/view lifecycle.

Changes:

  • Introduces a new exported API (createAspireExtensionApi) with AppHost/resource operations and test-run session acquire/release.
  • Adds one-shot CLI execution helpers (with new CLI error types) and fetchAppHostsOnce() in AppHostDataRepository.
  • Extends the DCP server to support lease-based authorization and lifecycle management for test-run sessions, with accompanying unit/e2e tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| extension/src/views/AppHostDataRepository.ts | Adds one-shot CLI JSON runner, one-shot resource describe, and new CLI error types. |
| extension/src/api.ts | Introduces the public extension API surface for other extensions (and tests). |
| extension/src/extension.ts | Switches exported API to the new public API and includes RPC server info in exports. |
| extension/src/dcp/TestRunSessionManager.ts | Implements lease acquisition, expiration, and request authorization for test runs. |
| extension/src/dcp/AspireDcpServer.ts | Adds lease-based auth path, run cleanup for leases, and debug-session creation without an Aspire debug session. |
| extension/src/debugger/languages/dotnet.ts | Makes debug session optional and changes logging fallback behavior. |
| extension/src/dcp/types.ts | Makes debug session optional and allows async stopSession(). |
| extension/src/test/appHostDataRepository.test.ts | Adds tests for one-shot fetching and CLI failure/timeout behavior. |
| extension/src/test/api.test.ts | Adds tests for the new exported API behaviors (resource commands and event exposure). |
| extension/src/test/testRunSessionManager.test.ts | Adds unit tests for lease acquisition/release/expiry and authorization behavior. |
| extension/src/test/rpc/e2eServerTests.test.ts | Adds e2e test verifying exported test-run session API availability. |
| extension/src/test/aspireDcpServer.test.ts | Adds end-to-end-ish server tests for lease-backed run sessions and websocket behavior. |

LittleLittleCloud and others added 3 commits June 3, 2026 10:40
- Updated activate() to merge the E2E test API (createExtensionApi) with
  the public API (createAspireExtensionApi) so apiVersion, dcpServerInfo,
  logDirectory, __testOnlyRpcServerInfo, and other E2E fields are exposed.
- Updated e2eServerTests assertion to check sensitive fields aren't leaked
  rather than asserting dcpServerInfo doesn't exist (which contradicted
  extensionApi.test.ts).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nConnections

- Wrap stopSession() calls in try/finally so runsBySession and
  testRunSessionLeaseIdByRunId are always cleaned up even if
  stopSession() throws.
- Remove the early closeLeaseNotificationConnections call in
  releaseTestRunSession, keeping only the one after stop/cleanup
  completes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions Bot commented Jun 3, 2026

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

adamint self-assigned this Jun 4, 2026

adamint left a comment

Member

High-confidence finding from review:

stopRunSession now uses try/finally, which guarantees the maps are cleared, but the loop still awaits each debugSession.stopSession() sequentially without isolating failures. If one session rejects, the loop exits, the finally deletes the run bookkeeping, and the remaining debug sessions in the array are never stopped. Please catch/log per session, or collect failures after attempting all stops, so one bad debug session cannot orphan the rest.
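
The "collect failures after attempting all stops" option could be sketched like this. It is an illustration under assumptions, not the PR's code: `StoppableSession`, `stopAllSessions`, and `stopRun` are hypothetical names, and the `Map` stands in for whatever run bookkeeping the server keeps.

```typescript
// Hypothetical minimal shape; the PR's types allow stopSession() to be async.
interface StoppableSession {
    stopSession(): void | Promise<void>;
}

// Attempts to stop every session and returns the failures instead of
// aborting the loop on the first rejection.
async function stopAllSessions(sessions: readonly StoppableSession[]): Promise<unknown[]> {
    const failures: unknown[] = [];
    for (const session of sessions) {
        try {
            await session.stopSession();
        } catch (error) {
            // One bad session is recorded; the remaining ones are still stopped.
            failures.push(error);
        }
    }
    return failures;
}

// Bookkeeping is still cleared in finally, but only after every stop was tried.
async function stopRun(runId: string, runs: Map<string, StoppableSession[]>): Promise<unknown[]> {
    try {
        return await stopAllSessions(runs.get(runId) ?? []);
    } finally {
        runs.delete(runId);
    }
}
```

The caller can then log the returned failures or rethrow an aggregate, while the guarantee that the maps are cleared stays the same as with the current try/finally.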
