Extension: implement VS Code telemetry signals (#17721)#17723
Conversation
…board signals Implements the VS-Code-side asks from microsoft#17721 (CLI signals tracked separately): Foundation - Singleton telemetry reporter accessor + sendTelemetryErrorEvent helper - withCommandTelemetry wrapper records outcome (success/error/canceled) and duration_ms - Common-properties helpers publish apphost_languages + apphost_present once known - Test seams (__setReporterForTests / __resetCommonPropertiesForTests) for unit tests Command coverage (S3) - tryExecuteCommand routed through withCommandTelemetry - registerInstrumentedCommand wraps the bypass paths (runAppHost / debugAppHost editor and tree commands, code lens commands, walkthrough commands, restore, etc.) - Each event carries a source dimension (command_palette / editor / tree / codelens / walkthrough) for downstream cohort analysis Meaningful engagement (S1) - MeaningfulEngagementReporter fires engagement/active at most once per session on the first of: AppHost present in workspace, any command run, any debug session - Publishes apphost_languages summary via summarizeAppHostLanguages classifier Running AppHosts view (S2) - AppHostsViewTelemetry subscribes to tree-view visibility with a 1s debounce - Emits runningAppHostsView/shown with view_mode + running_apphosts / total_resources Debug lifecycle (S4) - debug/runSession/start on PUT /run_session in AspireDcpServer - debug/runSession/end from sessionTerminated (and the launch-failure catch block) with exit_code_bucket + duration_ms - debug/appHost/start emitted from AspireDebugSession on the launch DAP request - debug/appHost/end emitted from AspireDebugSession.dispose() with aggregated stats (total_child_sessions, distinct_resource_type_count + names, ended_with_error, duration_ms). DCP server tracks aggregates via takeDebugSessionAggregateStats. Dashboard passthrough (S5) - Replaces the hardcoded GET /telemetry/enabled stub with the full dashboard telemetry contract (start/startOperation/endOperation/startUserTask/endUserTask/ operation/userTask/fault/asset/property/recurringProperty/commandLineFlags) - Property flattening of AspireTelemetryProperty into TelemetryReporter's flat key/value model, documented encoding rules - Bearer-only auth middleware for routes the dashboard reaches (it does not send the DCP instance-id header) - Per-operation correlation Map with a 1h TTL for abandoned operations - /telemetry/enabled now honors the global VS Code telemetry setting README data-collection notice documents every category of event the extension now emits and confirms we do not report source, paths, env values, or resource names. Tests: - 13 new unit tests for telemetry + appHostLanguage helpers - Full extension test suite passes (635 passing) Refs microsoft#17721 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17723Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17723" |
Fixes from a parallel pass by Opus 4.7, Opus 4.8, and GPT 5.5 reviewers: - AspireDcpServer: extract emitRunSessionFailureEnd helper so the three early-return failure paths in PUT /run_session (unsupported launch config, missing debug session, debugger did not start) emit debug/runSession/end and update the AppHost aggregate. Previously a debug/runSession/start without a matching /end was emitted and anyNonZeroExit was not flipped, so debug/appHost/end under-reported failures. - AspireDcpServer: route nonzero debug/runSession/end through sendTelemetryErrorEvent (matches the catch-block path) so faults get the reporter's stricter scrubbing. - DashboardTelemetryPassthrough: dispatch flattenProperties on PropertyType.Metric (the C# dashboard sends metric values as invariant-culture strings) instead of typeof === 'number'; drop PropertyType.Pii properties defensively; stringify other numbers instead of promoting to measurements. - AspireDebugSession.dispose: snapshot start-event metadata, run disposables before emitting telemetry, and defer debug/appHost/end via setTimeout(500ms) so child sessionTerminated notifications have a chance to flow through the adapter tracker before the aggregate is taken. - README: scope the absolute privacy claim to events the extension originates, and call out the Pii-drop behavior on dashboard passthrough. - Add dashboardTelemetryPassthrough.test.ts (8 tests) covering Metric-string parsing, Pii drop, numeric non-promotion, and complex object stringification. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nsion
Wire-format / correctness:
- Dashboard sends JSON via HttpClient.PostAsJsonAsync which since .NET 9
uses JsonSerializerOptions.Web (camelCase property naming, numeric enum
serialization). The TS interfaces in DashboardTelemetryPassthrough were
PascalCase, so every dashboard request bound to undefined fields:
- flattenProperties dropped all dashboard properties (Value/PropertyType
vs value/propertyType).
- PostOperationRequest.Result was undefined, so failure routing to
sendTelemetryErrorEvent never fired.
- All other route handlers (operation, userTask, fault, asset, property,
recurringProperty, commandLineFlags) read undefined fields.
Rewrite the interface definitions to camelCase and add numeric-enum
label mappers (telemetryResultLabel, faultSeverityLabel, isFailureResult).
- Route Failure/UserFault results from PostOperation/PostUserTask/endOperation
through sendTelemetryErrorEvent so they participate in the reporter's
stricter scrubbing pass.
Privacy:
- Add scrubFreeformDiagnosticText to truncate forwarded dashboard exception
messages and fault descriptions to 1024 chars (defense-in-depth on top of
the reporter's pattern-based PII scrubbing).
- Clamp resource_type / distinct_resource_types telemetry to the supported
resource-type set; unsupported launch-configuration types are now reported
as 'unsupported' instead of leaking the raw CLI-supplied type string.
- Validate the launch.json command against the AspireCommandType allowlist
('run' | 'deploy' | 'publish' | 'do') before emitting it; unknown values
collapse to 'other'.
- Tighten README to be explicit that forwarded dashboard exception messages
may contain user-controlled fragments and document the new resource_type
clamping.
Performance:
- Fix _handleStart abandonment-timer closure leak: project eventName out
of payload before capturing it in the 1h setTimeout closure so the
arbitrarily large payload.settings.startEventProperties bag is not
retained for the entire TTL.
AppHost classification:
- Add classifyAppHostDirectory which synchronously enumerates marker files
(apphost.{ts,mts,cts,js,mjs,cjs}, AppHost.cs, *.csproj). Previously,
every directory-style AppHost was classified as 'unknown', which would
miscategorize the entire TypeScript AppHost cohort once it landed.
- AspireDebugSession now uses classifyAppHostDirectory when appHostPath
refers to a directory.
Security:
- Fix loose bearer-prefix check in requireHeaders/requireBearerOnly.
auth.split('Bearer ').length === 2 accepted other schemes that happened
to contain 'Bearer ' as a substring (e.g. 'X-Bearer <token>'). Replace
with startsWith('Bearer ') + slice and factor the two middlewares onto
a shared validateBearerToken helper.
Tests:
- Update dashboardTelemetryPassthrough.test.ts fixtures from PascalCase to
camelCase and add coverage for enum label mappers and scrub helper.
- Add classifyAppHostDirectory tests covering csharp / typescript /
unknown / mixed cases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
|
e4745da to
6e2a01a
Compare
6e2a01a to
22af314
Compare
The Aspire dashboard sends arbitrary event names and arbitrary property keys over HTTP. Previously the passthrough forwarded those names verbatim, which meant every new dashboard event or property silently added rows to the (EntityName, PropertyName) classification catalog and required someone to chase down the new entries to classify them. This change: - Introduces src/utils/telemetryRegistry.ts: a per-event schema mapping every event name to its allowed properties + measurements. The schema mirrors the catalog's pair-based model so every "row" the extension is allowed to produce is enumerable in one file. - Makes sendTelemetryEvent / sendTelemetryErrorEvent generic over the registered event names, so TypeScript rejects calls with unregistered event names or unregistered property/measurement keys at compile time. - Restricts setCommonTelemetryProperties to the registered common-property keys, since every common property duplicates into a row per event. - Replaces the dashboard passthrough's per-route emit-with-payload-name pattern with a fixed set of extension event names (dashboard/operation, dashboard/userTask, dashboard/fault, dashboard/asset, dashboard/scope/start, dashboard/scope/end, dashboard/property/set, dashboard/property/recurring, dashboard/commandLineFlags). The dashboard's original event name now lives in 'dashboard_event_name'. - Replaces flattenProperties with bundleDashboardData, which JSON-encodes the per-event property/metric bag into stable 'dashboard_properties' / 'dashboard_measurements' string fields, with size guardrails (100 entries / 8 KB per bundle, with a __truncated__ marker on overflow). - Drops the getTelemetryReporter export to remove a bypass path that let callers skip the typed wrapper layer. - Updates the README to describe the new bounded surface and points reviewers at the registry. Tests: 664 passing (was 658 — 6 new bundleDashboardData / formatCorrelations tests cover Pii drops, truncation by count, truncation by size, independent property/measurement budgets, and correlation formatting). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
22af314 to
a5f505e
Compare
Adversarial multi-model review of the VS Code dashboard telemetry
passthrough surfaced several ways user/workspace content or attacker
input could reach the telemetry channel despite the README's "no
resource names or workspace contents" guarantee.
Privacy hardening — drop free-form, dashboard-composed strings instead
of merely truncating them (truncation bounds volume, not sensitivity):
- bundleDashboardData now drops the Basic-tagged exception message and
stack trace keys (Aspire.Dashboard.Exception.Message/StackTrace) which
embed resource names and home-directory paths. Structured
Exception.Type/RuntimeVersion are retained.
- /telemetry/fault no longer forwards `description` (the dashboard builds
it as `${type}: ${ex.Message}`).
- /telemetry/endOperation no longer forwards `error_message` (the only
producer passes a caught `ex.Message`, see DashboardCommandExecutor).
- /telemetry/operation and /telemetry/userTask no longer forward
`result_summary` (latent free-form field at the network boundary).
- Removed the now-unused description/error_message/result_summary members
from the telemetry registry so they can't be reintroduced accidentally.
Additional boundary hardening:
- formatFlagPrefixes strips flag values (keeps only the name before the
first `=`, `:`, or whitespace) so `--token=secret` can't leak the value.
- formatCorrelations clamps each eventType/id (previously only count-capped).
- scrubFreeformDiagnosticText accepts `unknown` and coerces non-strings to
'' so a malformed JSON body can't throw a TypeError -> Express 500.
- Telemetry `mode` is clamped to Debug/NoDebug/Unknown/other instead of
forwarding the CLI-controlled value verbatim.
Updated the README data/telemetry section and added unit + route tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Final multi-model privacy/security review of the telemetry passthrough surfaced several JSON-boundary hardening gaps. Fixes: - Property routes now bundle the value under the real property name so the free-form drop-list and key clamping apply (they were bypassed by the synthetic 'value' wrapper key). - asset_event_version, result/severity labels, and the Pii discriminator now coerce numeric input instead of interpolating/forwarding arbitrary strings or arrays verbatim. - Truncation markers are reserved inside the cap so clamped values never exceed their documented maximum. - formatCorrelations now bounds total serialized size, not just per-element and count caps. - Dropped the console-logs resource-name key as defense-in-depth. - Removed a dead isExtensionTelemetryEnabled import. Added unit tests for each behavior; full extension suite passes (708). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Implements the next round of telemetry instrumentation in the Aspire VS Code extension, including meaningful engagement, view visibility, command outcome/duration tracking, debug lifecycle signals, and a dashboard telemetry passthrough bridge (with an explicit privacy notice in the extension README).
Changes:
- Adds a typed telemetry event/property registry and expands the telemetry utilities to support common properties plus command outcome/duration tracking.
- Instruments extension commands, view visibility, meaningful engagement triggers, and debug session lifecycle events.
- Implements the
/telemetry/*HTTP contract for dashboard telemetry passthrough with bundling/scrubbing guardrails and adds extension unit/route-level tests.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| extension/src/views/AppHostsViewTelemetry.ts | Emits runningAppHostsView/shown on debounced hidden→visible transitions with counts/measurements. |
| extension/src/utils/telemetryRegistry.ts | Central registry of allowed event names and per-event property/measurement keys for classification control. |
| extension/src/utils/telemetry.ts | Adds common-property merging, enablement gating, error event helper, and withCommandTelemetry outcome/duration wrapper. |
| extension/src/utils/meaningfulEngagement.ts | Adds once-per-session engagement/active emission gated on AppHost detection/command/debug-session triggers. |
| extension/src/utils/appHostLanguage.ts | Adds AppHost language-family classifiers for telemetry (workspace and debug-path variants). |
| extension/src/test/telemetry.test.ts | Unit tests for telemetry utilities (common props merge, command outcome/duration, cancellation detection). |
| extension/src/test/dashboardTelemetryRoutes.test.ts | Route-level integration tests for dashboard→extension passthrough normalization and privacy guarantees. |
| extension/src/test/dashboardTelemetryPassthrough.test.ts | Unit tests for bundling/scrubbing/clamping and defensive parsing in passthrough helpers. |
| extension/src/test/appHostLanguage.test.ts | Unit tests for AppHost language summary and path/directory classification. |
| extension/src/extension.ts | Wires engagement reporter, adds command instrumentation wrapper, instruments many commands, and routes tryExecuteCommand through telemetry. |
| extension/src/debugger/AspireDebugSession.ts | Emits debug/appHost/start and delayed debug/appHost/end summary using DCP aggregate stats. |
| extension/src/dcp/DashboardTelemetryPassthrough.ts | Implements the full /telemetry/* dashboard contract, normalization, and privacy/scrubbing guardrails. |
| extension/src/dcp/AspireDcpServer.ts | Adds dashboard telemetry routes, tightens bearer parsing, emits debug/runSession/*, and hardens IDs with crypto.randomBytes. |
| extension/README.md | Documents telemetry collection categories and privacy posture/guarantees. |
- tryExecuteCommand: when the CLI is unavailable and the user is redirected to install it, throw a cancellation so withCommandTelemetry records the invocation as 'canceled' instead of a false 'success'. The existing catch suppresses the error toast since the redirect already informed the user. - requireBearerOnly (/telemetry/*): the 'missing' bearer failure now returns an Authorization-only message. Those routes intentionally do not require the DCP instance-id header, so the combined 'Authorization and ...DCP-Instance-ID headers are required' message was misleading. This branch is only reachable from requireBearerOnly (requireHeaders handles missing headers inline). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address final review findings in the VS Code extension telemetry implementation by tightening dashboard passthrough privacy, binding DCP to localhost, preserving AppHost failure telemetry, and classifying user cancellations correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Require dashboard telemetry requests to carry a scoped DCP instance id, propagate that id from AppHost configuration into the dashboard sender, and register/prefix the extension telemetry catalog for VS Code visibility. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Brings 43 release-branch commits forward onto main now that 13.4.0 has shipped. This PR replaces the original automated merge (microsoft#17804) which had to be closed so that conflict resolution and post-merge cleanups could be made on a non-protected branch. Conflict resolution summary (33 files): * Equivalent backports (took main's commit identity): ChannelUpdateWorkflowTests, LoggingHelpersTests, the four extension test files, AspireEditorCommandProvider, appHostDiscovery. * Release-only forwards (preserved): microsoft#17732 / microsoft#17756 Foundry hosted-agent protocol selection and cross-compute-environment endpoint references, microsoft#17573 stabilize PrebuiltAppHostServer staging globalPackagesFolder path, microsoft#17743 staging-identity CLI darc feed routing. * Main-only forwards (preserved): microsoft#17506 Show discovered AppHosts in Aspire pane, microsoft#17547 Localize Aspire skills metadata errors, microsoft#17801 VS Code v1.12.0, microsoft#17297 Aspire CLI npm package release integration, microsoft#17576 TerminalRun IAsyncDisposable, microsoft#17721 / microsoft#17723 VS Code telemetry, microsoft#17671 ATS baseline fix (re-applied manually on top of Foundry source taken from release). * Hybrid (manually spliced): docs/contributing.md - kept main's restructured layout and inserted release's staging-validation paragraph; HostedAgentBuilder- Extension - took release base then re-applied microsoft#17671 asHostedAgent rename; UpdateCommandTests - took main and injected microsoft#17743's OverrideCliInformationalVersionConfigKey block. Post-merge cleanups included in this PR: * eng/Versions.props: revert StabilizePackageVersion to false (was flipped to true on release/13.4 by microsoft#17520 for shipping 13.4.0; main must stay in preview mode). * .github/workflows/generate-api-diffs.yml: retarget back to main (was pointed at release/13.4 by microsoft#17696 release prep). * .github/workflows/backmerge-release.yml: update from release/13.3 to release/13.4 (was stale - missed the 13.4 release-time bump). * .github/workflows/milestone-assignment.yml: audited - already correct (main -> 13.5, release/13.4 -> 13.4.x); no change. This merge commit must be preserved - do not squash on merge. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Implements the VS Code extension telemetry signals from #17721. Adds five categories of events plus a foundation for outcome/duration tracking, with a documented data-collection notice in the extension README. CLI-side telemetry signals listed in #17721 are intentionally out of scope for this PR and tracked separately.
Fixes #17721
What's included
engagement/active— once per session on first AppHost-detected / command / debug session trigger; common propertyapphost_languagesis published once knownrunningAppHostsView/shown— 1s-debounced hidden→visible transitions;view_modeproperty +running_apphosts/total_resourcesmeasurementscommand/invoked— bothtryExecuteCommandand bypass paths (~30 commands across editor / tree / code lens / walkthrough) routed through the newregisterInstrumentedCommandhelper withsource,outcome(success / error / canceled), andduration_msdebug/runSession/{start,end}(per child resource) anddebug/appHost/{start,end}(per AppHost). End event aggregatestotal_child_sessions,distinct_resource_type_count+ names,ended_with_error, andduration_msso we can see multi-language debugging at a glance/telemetry/*contract bridged to the extension's reporter, replacing the hardcoded/telemetry/enabled = falsestub. Honors the global VS Code telemetry setting. Bearer-only auth middleware for routes the dashboard hits (it does not send the DCP instance-id header). Per-operation correlation Map with a 1h abandoned-op TTL.Foundation work
sendTelemetryErrorEventhelperwithCommandTelemetrywrapper records outcome and duration_ms; cancellation-aware so cancel errors don't show error toastsapphost_languages,apphost_present) attached automatically to every eventsummarizeAppHostLanguages(candidates) → csharp | typescript | polyglot | unknown | noneclassifierclassifyAppHostPath(path) → csharp | typescript | unknownfor the debug-session path__setReporterForTests,__resetCommonPropertiesForTests) for unit testsPrivacy
The README's new "Data and telemetry" section enumerates every event category and explicitly documents that source code, file paths, env values, and resource names are not reported. All events are gated on VS Code's global
telemetry.telemetryLevelsetting via@vscode/extension-telemetry.Checklist
yarn run test) passes with 635 tests<remarks />and<code />elements on your triple slash comments?Security notes
requireBearerOnlymiddleware on/telemetry/*routes still requires the DCP bearer token; it only relaxes the DCP-instance-id header requirement because the dashboard's HttpClient does not send it. The token is the same shared secret already used to gate every other DCP route.@vscode/extension-telemetry, which respects the global VS Codetelemetry.telemetryLevelsetting. The extension never logs or persists payloads outside the existing extension log output channel.JSON.stringify, but no per-event allow-list is enforced. Future work could add one if we observe the dashboard sending unexpectedly large or sensitive properties; current dashboard surface area is limited to its own structuredAspireTelemetryPropertyshape.