Harden Language Model Tool telemetry against PII leaks by wenytang-ms · Pull Request #1644 · microsoft/vscode-java-debug

wenytang-ms · 2026-05-20T07:50:36Z

Centralise all LMT telemetry through src/lmToolTelemetry.ts so user-supplied strings (target, expression, sessionName, file paths, class names, JVM stack traces, etc.) can no longer reach the telemetry pipeline. The new module exposes a typed sanitizedSend choke point that only accepts enums, booleans, numbers and opaque session IDs.

Telemetry changes:

- Drop sendError(error) on debug_java_application failure (stack trace leaked user class / method names).
- Strip PII fields from every existing event: target, sessionName, currentFile, currentLine, simpleClassName, detectedClassName, error: String(error), input.reason.
- Replace bare String(error) propagation with classifyError() -> ErrorCategory enum (mainClassMissing, classpathUnresolved, buildFailure, projectNotDetected, sessionAlreadyRunning, timeout, lsNotReady, noActiveSession, noSuspendedThread, noStackFrame, cancelled, other).
- Add per-invoke recording for all 10 tools with outcome, errorCategory, durationMs, and a tool-specific enum (targetType / breakpointKind / stepKind / scopeType / evalContext / removeScope). The previous build only emitted telemetry on the launch tool and the session-info tool.
- Add chatActivationSnapshot one-shot at registration time so we can measure adoption of the chat surfaces without per-turn cost (counts only).
- evaluate_debug_expression: the expression text is NEVER logged. Only the evalContext enum and outcome are emitted.
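
To illustrate the classifyError() -> ErrorCategory idea from the list above, here is a minimal sketch. The category names are taken from the PR description; the matching rules, the `RULES` table, and the function body are assumptions made for illustration and are not the code in src/lmToolTelemetry.ts. The point is that only a fixed enum value leaves the function, never the raw message.

```typescript
// Hypothetical sketch. Category names come from the PR description;
// the regex rules are illustrative assumptions, not the real classifier.
type ErrorCategory =
    | "mainClassMissing" | "classpathUnresolved" | "buildFailure"
    | "projectNotDetected" | "sessionAlreadyRunning" | "timeout"
    | "lsNotReady" | "noActiveSession" | "noSuspendedThread"
    | "noStackFrame" | "cancelled" | "other";

// Ordered rules: the first matching pattern wins.
const RULES: ReadonlyArray<readonly [RegExp, ErrorCategory]> = [
    [/cancel/i, "cancelled"],
    [/timed? ?out/i, "timeout"],
    [/main class/i, "mainClassMissing"],
    [/classpath/i, "classpathUnresolved"],
    [/build|compil/i, "buildFailure"],
    [/no active (debug )?session/i, "noActiveSession"],
];

function classifyError(error: unknown): ErrorCategory {
    // The raw text is inspected locally and then discarded.
    const text = error instanceof Error ? error.message : String(error);
    for (const [pattern, category] of RULES) {
        if (pattern.test(text)) {
            return category;
        }
    }
    // Anything unrecognised collapses to a single bucket, so class names
    // or paths embedded in the message cannot leak through.
    return "other";
}
```

With this shape, an error such as `new Error("NullPointerException at com.acme.Secret.run")` would be reported only as `other`.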

Policy:

- src/lmToolTelemetry.ts is now the only file in the LMT code path allowed to call sendInfo. The top-of-file policy comment is the single source of truth for what may be logged.
- The recorder is typed against ToolInvocationRecord so excess raw strings are rejected at compile time.
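
A rough sketch of how a typed recorder can reject raw strings at compile time. The names ToolInvocationRecord and recordToolInvocation appear in this PR, but the fields, the union members, and the in-memory `sent` sink below are assumptions for illustration; the real module forwards to the telemetry pipeline instead.

```typescript
// Hypothetical sketch of a typed telemetry choke point.
type Outcome = "success" | "failure" | "cancelled";
type StepKind = "into" | "over" | "out" | "unknown";

interface ToolInvocationRecord {
    toolName: "debug_step_operation" | "evaluate_debug_expression";
    outcome: Outcome;
    durationMs: number;
    stepKind?: StepKind;
}

// Stand-in sink (assumption) so the sketch is self-contained.
const sent: Array<Record<string, string | number | boolean>> = [];

function recordToolInvocation(record: ToolInvocationRecord): void {
    // Copy only allow-listed keys. TypeScript's excess property check
    // applies to object literals only, so this also guards against
    // extra keys smuggled in through a variable.
    const payload: Record<string, string | number | boolean> = {
        toolName: record.toolName,
        outcome: record.outcome,
        durationMs: Math.round(record.durationMs),
    };
    if (record.stepKind !== undefined) {
        payload.stepKind = record.stepKind;
    }
    sent.push(payload);
}

recordToolInvocation({
    toolName: "debug_step_operation",
    outcome: "success",
    durationMs: 12.4,
    stepKind: "over",
});

// The following call would fail to compile, because 'expression' is not
// a known property of ToolInvocationRecord:
// recordToolInvocation({
//     toolName: "evaluate_debug_expression",
//     outcome: "success",
//     durationMs: 3,
//     expression: "user.password",
// });
```

The compile-time check catches the common case (a literal passed at the call site), while the allow-list copy covers objects that reach the recorder through a wider-typed variable.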

Validated with: npm run tslint, npm run compile.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Centralizes Language Model Tool (LMT) telemetry behind a new sanitized sender to reduce risk of PII leaking into telemetry, while expanding per-tool invocation metrics (outcome/error category/duration and tool-specific enums).

Changes:

Added src/lmToolTelemetry.ts with classification helpers and a recordToolInvocation “choke point” for telemetry.
Replaced ad-hoc sendInfo/sendError usage in LMT tools with sanitized recording and error classification.
Added a one-shot chat activation snapshot during extension activation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File	Description
src/lmToolTelemetry.ts	Introduces telemetry policy, classifiers, and sanitized recording APIs.
src/languageModelTool.ts	Routes tool telemetry through `recordToolInvocation` and removes raw string properties from events.
src/extension.ts	Emits a one-shot “chat activation snapshot” telemetry event with counts only.

- classifyStep: unknown step operations now report 'unknown' instead of being silently mislabeled as 'over'. Also adds a runtime guard in debug_step_operation so an unknown operation no longer reaches commandMap[op]/executeCommand(undefined) or session.customRequest with an arbitrary string.
- recordToolInvocation: introduces a private normalizeToolInvocationRecord that keeps 'outcome' and 'errorCategory' in lock-step for the six shared terminal values (cancelled / timeout / lsNotReady / noActiveSession / noSuspendedThread / noStackFrame). Fixes the case where debug_java_application returns {success:false,message:'Operation cancelled by user'} but outcome was 'failure' while errorCategory was 'cancelled'.
- get_debug_stack_trace: empty-stack-frame early return now sets errorCategory='noStackFrame' alongside outcome (was only setting outcome).
- recordLaunchInternal: signature is now a discriminated union (LaunchInternalEvent) instead of (operationName: string, properties: Record<string, SafeValue>). Unknown event names and unexpected property keys are now rejected at compile time. Updated all 8 call sites.
- elapsedTime (string from .toFixed) split from elapsedMs (number) so the telemetry value is numeric and aggregable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
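
The lock-step normalisation and the 'unknown' step fallback described in this commit could look roughly like the sketch below. Only the six shared terminal values and the function names come from the commit message; the precedence rule (errorCategory wins over outcome), the operation names `stepIn` / `stepOver` / `stepOut`, and the trimmed-down type definitions are assumptions.

```typescript
// Hypothetical sketch; the actual implementation may resolve conflicts differently.
const SHARED_TERMINALS = [
    "cancelled", "timeout", "lsNotReady",
    "noActiveSession", "noSuspendedThread", "noStackFrame",
] as const;
type SharedTerminal = typeof SHARED_TERMINALS[number];
type Outcome = "success" | "failure" | SharedTerminal;
type ErrorCategory = SharedTerminal | "buildFailure" | "other";

interface ToolInvocationRecord {
    outcome: Outcome;
    errorCategory?: ErrorCategory;
}

function isSharedTerminal(value: string | undefined): value is SharedTerminal {
    return value !== undefined
        && (SHARED_TERMINALS as readonly string[]).indexOf(value) !== -1;
}

function normalizeToolInvocationRecord(record: ToolInvocationRecord): ToolInvocationRecord {
    // Assumed precedence: a shared errorCategory overrides the outcome,
    // e.g. { failure, cancelled } becomes { cancelled, cancelled }.
    if (isSharedTerminal(record.errorCategory)) {
        return { ...record, outcome: record.errorCategory };
    }
    // Otherwise a shared outcome is mirrored into errorCategory.
    if (isSharedTerminal(record.outcome)) {
        return { ...record, errorCategory: record.outcome };
    }
    return record;
}

type StepKind = "into" | "over" | "out" | "unknown";

function classifyStep(operation: string): StepKind {
    // Operation names are assumed; anything unrecognised is reported
    // as 'unknown' rather than being folded into 'over'.
    switch (operation) {
        case "stepIn": return "into";
        case "stepOver": return "over";
        case "stepOut": return "out";
        default: return "unknown";
    }
}
```

Normalising inside the recorder, rather than at each call site, means a tool that sets only one of the two fields still produces a consistent event.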

Copilot AI and others added 2 commits May 20, 2026 15:50

Merge branch 'main' into feat/lmt-telemetry-privacy-hardening

516ac8f

wenytang-ms marked this pull request as ready for review May 20, 2026 08:41

wenytang-ms requested review from chagong, jdneo and testforstephen as code owners May 20, 2026 08:41

wenytang-ms requested a review from Copilot May 20, 2026 08:43

Copilot AI reviewed May 20, 2026

Comment thread src/lmToolTelemetry.ts

Comment thread src/lmToolTelemetry.ts Outdated

Comment thread src/lmToolTelemetry.ts

Comment thread src/languageModelTool.ts

Comment thread src/languageModelTool.ts

Comment thread src/languageModelTool.ts

Comment thread src/languageModelTool.ts

chagong approved these changes May 21, 2026

wenytang-ms merged commit dcbb93f into main May 21, 2026
8 checks passed

wenytang-ms deleted the feat/lmt-telemetry-privacy-hardening branch May 21, 2026 01:52

wenytang-ms merged 3 commits into main from feat/lmt-telemetry-privacy-hardening

4 participants
