Skip to content

feat(scratchpad): persistent worker thread for scratchpad execution (Step 6.2)#540

Merged
tnaum-ms merged 26 commits intofeature/shell-integrationfrom
dev/worker-for-scratchpad
Mar 27, 2026
Merged

feat(scratchpad): persistent worker thread for scratchpad execution (Step 6.2)#540
tnaum-ms merged 26 commits intofeature/shell-integrationfrom
dev/worker-for-scratchpad

Conversation

@tnaum-ms
Copy link
Copy Markdown
Collaborator

@tnaum-ms tnaum-ms commented Mar 26, 2026

Summary

Replaces the in-process scratchpad evaluator with a persistent worker thread (Option F from the implementation plan). The worker owns its own database client, providing:

  • Infinite loop safetyworker.terminate() actually stops while(true){} / for(;;){} (the old Promise.race timeout couldn't preempt a blocked event loop)
  • Client isolation — the scratchpad and Collection View never share a database client. A disconnect/reconnect from the scratchpad cannot affect Collection View browsing
  • Zero re-auth overhead — the worker stays alive between runs. Only the first Run pays the connection + authentication cost

Architecture

Extension Host (Main Thread)              Worker Thread (lazy, persistent)
├── ScratchpadEvaluator                   ├── Own database client (auth once)
│   ├── evaluate(connection, code)        ├── ShellInstanceState
│   ├── ensureWorker() → lazy spawn       ├── ShellEvaluator
│   ├── shutdown() → graceful close       └── vm.runInContext(code)
│   └── killWorker() → force terminate
├── ScratchpadService (unchanged)
├── CodeLens, Commands (unchanged)
└── SchemaStore (receives results)

What Changed

New Files

  • src/documentdb/scratchpad/workerTypes.ts — Typed IPC message protocol (discriminated union)
  • src/documentdb/scratchpad/scratchpadWorker.ts — Worker thread entry point
  • src/commands/schemaStore/showSchemaStoreStats.ts — Diagnostics command

Modified Files

  • src/documentdb/scratchpad/ScratchpadEvaluator.tsFull rewrite: worker lifecycle, IPC routing, credential passthrough, Entra ID token delegation, phased progress notifications, session tracking for telemetry
  • src/commands/scratchpad/executeScratchpadCode.ts — Cancellable progress, schema doc cap (100), cursor unwrapping, telemetry via callWithTelemetryAndErrorHandling
  • src/commands/scratchpad/runAll.ts / runSelected.ts — Pass runMode for telemetry
  • src/documentdb/scratchpad/resultFormatter.ts — Cursor result unwrapping, authoritative type headers
  • src/documentdb/ClustersExtension.ts — Worker disposal, connection sync, editor close detection
  • src/documentdb/utils/getClusterMetadata.ts — Extracted addDomainInfoToProperties() for reuse
  • webpack.config.ext.js — Added scratchpadWorker entry point
  • package.json — Added 'Show Schema Store Stats' command

Key Decisions

  • No feature flag — direct migration, old in-process eval path removed
  • Canonical EJSON (relaxed: false) for IPC — preserves Int32/Long/Double type fidelity for schema analysis
  • Custom IPC with discriminated union types (not tRPC — simpler for ~6 message types)
  • Entra ID auth via IPC callback — worker requests tokens from main thread, VS Code session reused (no re-authorization prompt)
  • Init failure recovery — try/catch in spawnWorker() calls terminateWorker() on failure, returning to respawnable state
  • Standard telemetry patterncallWithTelemetryAndErrorHandling wraps execution (framework tracks duration/result/error automatically)

Telemetry (scratchpad.execute event)

Property Type Purpose
sessionId property UUID correlating runs within one worker lifecycle
sessionEvalCount property Nth eval in this session (usage intensity)
authMethod property NativeAuth or MicrosoftEntraID
runMode property runAll or runSelected
resultType property mongosh result type (Cursor, Document, etc.)
domainInfo_* properties Hashed host/platform data (reused from connection metadata)
initDurationMs measurement Worker spawn + auth time (0 if reused)
codeLineCount measurement Code size (not content)
duration measurement (framework) Total end-to-end time
result property (framework) Succeeded / Failed / Canceled
error / errorMessage property (framework) Auto-captured from thrown errors

Testing

  • All 30 existing scratchpad tests pass
  • Manual testing: SCRAM auth, infinite loop recovery, cancel, editor close shutdown, schema stats
  • Build verified: main.js (~6.7 MB) + scratchpadWorker.js (~6.8 MB)

Related

  • Implementation plan: docs/plan/06.2-persistent-worker-eval.md (local, not committed)
  • Review analysis: docs/analysis/pr-540-review-step-6.2.md (local, not committed)
  • Parent feature PR: [WIP] feature: shell integration 💻 #508

tnaum-ms added 15 commits March 25, 2026 16:08
- Create workerTypes.ts with typed IPC message protocol (MainToWorkerMessage, WorkerToMainMessage)
- Create scratchpadWorker.ts with init/eval/shutdown/tokenRequest handlers
- Add scratchpadWorker webpack entry point to webpack.config.ext.js
- Worker lazy-imports @mongosh/* packages (same pattern as current evaluator)
- Supports both SCRAM and Entra ID auth (OIDC via IPC token callback)
- Worker logs lifecycle events to main thread via 'log' messages

Step 6.2 WI-1, Phase 1 of 3.
Rewrite ScratchpadEvaluator to route all execution through the worker thread:

- ScratchpadEvaluator now manages worker lifecycle (spawn, kill, shutdown, dispose)
- Worker state machine: idle → spawning → ready → executing → ready (or terminated)
- Request/response correlation via requestId UUID map
- Timeout enforced via worker.terminate() — actually stops infinite loops
- Help command stays in main thread (static text, no @MongoSH needed)
- Cluster switch detection: kills worker and respawns with new credentials
- Entra ID OIDC token requests delegated via IPC to main thread
- Worker logging routed to ext.outputChannel
- executeScratchpadCode.ts: cancellable progress notification

In-process eval path is fully replaced — no feature flag.

Step 6.2 WI-1, Phase 2 of 3.
- Export disposeEvaluator() from executeScratchpadCode.ts for clean worker shutdown
- Wire evaluator disposal into extension deactivation via ext.context.subscriptions
- Worker thread is properly terminated when the extension deactivates

Completes the SCRAM auth + kill/respawn wiring (credential passthrough was
already implemented in Phase 2's buildInitMessage).

Step 6.2 WI-1, Phase 3 of 3.
SchemaStore integration:
- Cap schema feeding at 100 documents (randomly sampled via Fisher-Yates)
- Prevents unbounded IPC/memory usage for large result sets

Connection state synchronization:
- Shutdown worker when scratchpad connection is cleared (disconnect)
- Shutdown worker when the last .documentdb editor tab closes
- Worker respawns lazily on next Run

Export shutdownEvaluator() for graceful worker cleanup.

Step 6.2 WI-2, Phase 5.
…lysis

Replace JSON.parse with EJSON.parse when deserializing worker eval results.
This preserves BSON types (ObjectId, Date, Decimal128) so that SchemaAnalyzer
correctly identifies field types for autocompletion.

With JSON.parse, BSON types became plain objects ({'$oid': '...'}) causing
SchemaAnalyzer to create wrong field paths (e.g. '_id.$oid' instead of '_id')
and wrong types (object instead of objectid).

Benchmarked on 100 documents with ~100 fields each:
- JSON.parse:  2.2ms parse, broken types (wrong paths + types)
- EJSON.parse: 9.9ms parse, correct types (only Int32/Long→Double)
- Both are negligible vs actual query time (100-5000ms)

Int32 and Long are still collapsed to Double (JavaScript number) — this is
a fundamental EJSON limitation, not a serialization bug.
Show distinct progress messages during scratchpad execution:
- 'Initializing scratchpad runtime…' — worker thread being created
- 'Authenticating with {clusterName}…' — MongoClient connecting + auth
- 'Running query…' — user code being evaluated

On subsequent runs (worker already alive), only 'Running query…' is shown.
The progress notification remains cancellable (Cancel kills worker).

Added onProgress callback parameter to ScratchpadEvaluator.evaluate().
Log levels:
- Worker init/shutdown → debug (lifecycle, not user-visible)
- Eval start/end → trace (verbose diagnostic)
- Errors/uncaught exceptions → error
- MongoClient close failure → warn
- Worker exit, connection clear, editors close → debug
- Route worker IPC log messages to matching LogOutputChannel methods
  (trace/debug/info/warn/error) instead of appendLine for all

Progress UX:
- Title changed to 'DocumentDB Scratchpad' (static bold prefix)
- Phase messages show as: 'Initializing…', 'Authenticating…', 'Running query…'
- Removed cluster name from authenticating phase (redundant in context)

Cancel handling:
- Suppress error notification when user explicitly cancels execution
- Track cancelled state to avoid showing 'Worker terminated' error panel
Fix regression: worker not shutting down when scratchpad editors close.
- Switched from onDidCloseTextDocument to tabGroups.onDidChangeTabs
- onDidCloseTextDocument fires before tab state updates (race condition)
- onDidChangeTabs fires after tabs are removed, state is consistent

Add 'Show Schema Store Stats' diagnostics command:
- Shows collection count, document count, field count in output channel
- Per-collection breakdown with key, doc count, field count
- Available via Command Palette: 'DocumentDB: Show Schema Store Stats'
…e EJSON serialization

CursorIterationResult from @MongoSH is an Array subclass with extra properties
(cursorHasMore, documents). EJSON.serialize treats it as a plain object and
includes those properties, producing:
  { cursorHasMore: true, documents: [...] }
instead of just:
  [doc1, doc2, ...]

This caused:
1. Output showed cursor wrapper object instead of document array
2. resultFormatter showed 'Result: Cursor' instead of 'N documents returned'
3. SchemaStore never received documents (no _id at top level of wrapper object)

Fix: Array.from(shellResult.printable) before EJSON.stringify normalizes
Array subclasses to plain Arrays, preserving correct serialization.
@MongoSH's CursorIterationResult extends ShellApiValueClass (not Array).
Its asPrintable() returns { ...this } which produces:
  { cursorHasMore: true, documents: [doc1, doc2, ...] }

This wrapper object was passed through as-is, causing:
1. Output showed { cursorHasMore, documents: [...] } instead of just [...]
2. Header showed 'Result: Cursor' instead of 'N documents returned'
3. SchemaStore never received documents (wrapper has no _id field)

Fix: Add unwrapCursorResult() helper that extracts the documents array
from the { documents: [...] } wrapper. Applied in both:
- resultFormatter.ts — for display formatting and document count
- executeScratchpadCode.ts — for SchemaStore feeding
Use @MongoSH's result type instead of guessing from array shape:
- Cursor results: 'Result: Cursor (20 documents)' — type + batch count
- Other typed results: 'Result: Document', 'Result: string', etc.
- No type: no header line (plain JS values)

Previous behavior tried to detect 'documents' by checking Array.isArray
which was fragile and didn't communicate what kind of result it was.
…count)

- .toArray() returns type=null with an Array: show 'N results'
- .count() returns type=null with a number: no special header (value shown)
- Cursor: 'Result: Cursor (N documents)' (unchanged)
- Typed results: 'Result: Document', etc. (unchanged)
1. Untyped array results (e.g. .toArray()): 'Result: Array (5 elements)'
2. Worker eval log: include line count '(3 lines, 51 chars, db: demo_data)'
3. Schema stats: show 'db/collection' instead of internal clusterId::db::coll
- Connect instruction dialog is now modal with title/detail separation
- Scratchpad template includes note: 'only the last result is displayed'
- Help text Tips section includes same note
- Wording differs between template and help (not identical)
Replace 'MongoClient' with 'client' or 'database client' in:
- Worker log messages visible in the output channel
- JSDoc comments describing worker behavior
- Code comments in worker and evaluator

Type references (MongoClient, MongoClientOptions) are unchanged —
these are the actual driver API names.

The extension is a DocumentDB tool using the MongoDB API wire protocol.
User-facing text should not reference MongoDB implementation details.
@tnaum-ms tnaum-ms requested a review from a team as a code owner March 26, 2026 13:45
@tnaum-ms tnaum-ms requested a review from Copilot March 26, 2026 13:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements persistent scratchpad execution by moving evaluation into a lazy, long-lived Node.js worker thread that owns its own MongoDB client, improving isolation and enabling hard termination for infinite loops/timeouts.

Changes:

  • Added a bundled scratchpadWorker entry point plus a typed IPC protocol for main↔worker messaging.
  • Rewrote ScratchpadEvaluator to manage worker lifecycle, IPC request correlation, and Entra ID token delegation.
  • Updated scratchpad execution UX (cancellable progress), result formatting (cursor unwrapping/type headers), schema feeding caps, and added a SchemaStore diagnostics command.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
webpack.config.ext.js Adds a new webpack entry for scratchpadWorker so it’s bundled into dist/.
src/documentdb/scratchpad/workerTypes.ts Defines the discriminated-union IPC protocol and serializable payload shapes.
src/documentdb/scratchpad/scratchpadWorker.ts Worker thread entry: initializes its own client and evaluates code via @mongosh + vm.
src/documentdb/scratchpad/resultFormatter.ts Improves output metadata and unwraps cursor printable results for cleaner display.
src/documentdb/scratchpad/ScratchpadEvaluator.ts Replaces in-process eval with persistent worker lifecycle + IPC + token delegation.
src/documentdb/ClustersExtension.ts Hooks evaluator disposal/shutdown into extension deactivation, connection changes, and tab closure.
src/commands/scratchpad/newScratchpad.ts Updates scratchpad header comments to reflect output behavior.
src/commands/scratchpad/executeScratchpadCode.ts Adds cancellable progress UI, evaluator shutdown helpers, cursor unwrapping, and schema doc capping.
src/commands/scratchpad/connectDatabase.ts Switches “no context” guidance to a modal info message with detail text.
src/commands/schemaStore/showSchemaStoreStats.ts Adds a diagnostics command to print SchemaStore stats to the output channel.
package.json Contributes the new “Show Schema Store Stats” command.
l10n/bundle.l10n.json Adds/removes localized strings corresponding to updated scratchpad UI messages.

…state

If buildInitMessage() throws or the worker reports initResult { success: false },
spawnWorker() now calls terminateWorker() before rethrowing. This returns the
evaluator to 'idle' state so the next evaluate() call can respawn a fresh worker
instead of being stuck in the 'spawning' state indefinitely.
…numeric types

Switch from relaxed to canonical EJSON (relaxed: false) for the worker-to-main
IPC payload. Canonical EJSON preserves Int32, Long, Double, and Decimal128 type
wrappers so that EJSON.parse on the main thread reconstructs actual BSON instances.
This allows SchemaAnalyzer.inferType() to correctly distinguish numeric subtypes,
which feeds into type-aware operator ranking in completions.

Also drops the space/indent parameter from EJSON.stringify since the IPC payload
is never displayed to users — reducing transfer size.

Fixes the incorrect comment that claimed Int32/Long collapse was a fundamental
EJSON limitation (it was caused by using relaxed mode).
Mark displayBatchSize in workerTypes.ts and ScratchpadEvaluator.ts with TODO(F11)
comments noting the field is sent but not yet read by the worker. References
future-work.md §F11 for the plan to wire documentDB.mongoShell.batchSize.
…aluator

Wrap the three error strings most likely to reach users in l10n.t():
- 'No credentials found for cluster {0}'
- 'Worker is not running'
- 'Execution timed out after {0} seconds'

These flow through to vscode.window.showErrorMessage via the catch in
executeScratchpadCode.ts, so non-English users now see translated details.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.

…nResult shape

The unwrapCursorResult() check and feedResultToSchemaStore() unwrap now require
both 'cursorHasMore' (boolean) and 'documents' (array) before unwrapping.
Previously, any object with a 'documents' array field would be unwrapped,
which could false-positive on user documents with a 'documents' field.
…quest timeout

The timeout in sendRequest() is used for init, eval, and shutdown. The previous
message 'Execution timed out' was misleading when init hangs. Changed to
'Operation timed out after {0} seconds' which is accurate for all callers.
…oded 0

Capture startTime before evaluate() and compute elapsed time on failure.
The error output panel now shows the real duration instead of 'Executed in 0ms'.
Only perform count (100) swaps instead of shuffling the entire array.
Same output distribution, simpler loop bounds.
@tnaum-ms tnaum-ms merged commit 94acf44 into feature/shell-integration Mar 27, 2026
8 checks passed
@tnaum-ms tnaum-ms deleted the dev/worker-for-scratchpad branch March 27, 2026 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants