fix(wasm): isolate tree-sitter parsing in worker thread (#965) #975
carlos-alm wants to merge 4 commits into main from
Conversation
The WASM grammar can trigger uncatchable V8 fatal errors that kill whichever thread runs it. Running parses in a worker_threads worker means the crash kills only the worker — the pool detects the exit, marks the in-flight file as skipped, respawns the worker, and continues with the rest of the build.

- wasm-worker-entry.ts: worker entry that owns all tree-sitter WASM calls; runs the extractor and the AST visitor walk before tree.delete() so the main thread never holds a live Tree
- wasm-worker-pool.ts: single-worker pool with crash-respawn logic
- wasm-worker-protocol.ts: typed message protocol with a serializable ExtractorOutput (no _tree field)
- parser.ts: routes WASM parses through the worker pool
- tests/parsers/wasm-hardening.test.ts: coverage for skip-and-continue behavior on extractor errors

Docs check acknowledged: internal plumbing only — no new CLI surface, no new language support, no architectural tier changes visible to users.

Impact: 80 functions changed, 30 affected
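A minimal sketch of what the typed message protocol might look like. The message type names come from the sequence diagram below; the exact field shapes are assumptions, not the real contents of wasm-worker-protocol.ts:

```typescript
// Hypothetical sketch of the pool <-> worker message shapes. Field names
// beyond `type` and `id` are illustrative assumptions.
interface WorkerParseRequest {
  type: 'parse';
  id: number;        // correlates a response with its pending job
  filePath: string;
  code: string;
}

interface WorkerParseResponseOk {
  type: 'ok';
  id: number;
  // Serializable result only — no _tree field, so no live WASM Tree
  // handle ever crosses the thread boundary.
  output: { symbols: string[] };
}

interface WorkerParseResponseErr {
  type: 'err';
  id: number;
  message: string;
}

type WorkerResponse = WorkerParseResponseOk | WorkerParseResponseErr;

const req: WorkerParseRequest = {
  type: 'parse',
  id: 1,
  filePath: 'boom.js',
  code: 'function hello() {}',
};
```

Keeping every field structured-clone-safe is what lets the pool hand results back over `postMessage` without the worker leaking a live tree.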
Claude finished @carlos-alm's task

Reviewing PR: WASM Worker Isolation

Root Cause Analysis

Problem Being Solved: WASM grammar parses cause V8 fatal errors on Windows + Node 22, killing the entire build process and leaving users with zero progress.

Why This Approach: Worker thread isolation is the correct solution — it contains fatal V8 crashes to the worker thread, allowing graceful degradation where the triggering file is skipped and the build continues for all other files.

Risk Assessment: Low-risk defensive enhancement. Preserves all existing functionality while adding crash resilience.

Backlog Compliance
Code Quality Analysis

Strengths:
Architecture Quality:
Technical Implementation:
Critical Concerns

None identified. This is a well-engineered defensive enhancement.

Final Recommendation
This PR directly improves AI agent reliability by preventing build crashes that waste tokens and break coding workflows. The worker isolation approach is sound and well-implemented.
Greptile Summary

This PR isolates all tree-sitter WASM calls in a worker_threads worker.

Confidence Score: 4/5

Safe to merge; the single P2 race only manifests when a parse completes at almost exactly the 60-second deadline. All previously raised concerns (dead workerReady field, missing timeout, stale shutdown branch, test mocking the wrong registry) have been fully addressed. The remaining finding is a theoretical race in onTimeout → onExit where the wrong in-flight job could be skipped, but it requires a >60s parse that completes at exactly the timeout boundary — rare in practice and causing only a silent file skip, not data corruption.

src/domain/wasm-worker-pool.ts — the onTimeout/onExit interaction around lines 188–269.

Important Files Changed
Sequence Diagram

sequenceDiagram
participant CLI as CLI
participant P as parser.ts
participant Pool as WasmWorkerPool
participant W as Worker Thread
CLI->>P: parseFilesAuto(files)
P->>Pool: getWasmWorkerPool()
loop per file
P->>Pool: pool.parse(filePath, code, opts)
Pool->>Pool: enqueue job and pump
Pool->>W: postMessage WorkerParseRequest
Pool->>Pool: arm 60s watchdog
alt Normal parse
W-->>Pool: postMessage WorkerParseResponseOk
Pool->>Pool: clear watchdog, resolve result, pump next
else Worker crashes with V8 fatal
W--xPool: exit non-zero
Pool->>Pool: onExit, warn, skip file, resolve null, respawn worker
else Worker hangs past 60s
Pool->>Pool: onTimeout, terminate worker
W--xPool: exit zero
Pool->>Pool: onExit, warn, skip file, resolve null, pump next
end
end
P-->>CLI: Map of relPath to ExtractorOutput
CLI->>P: disposeParsers
P->>Pool: disposeWasmWorkerPool
Pool->>W: worker.terminate
Reviews (3): Last reviewed commit: "fix(wasm): dispose parser pool on CLI sh..."
```typescript
jsEntry.extractor = () => {
  throw new Error('simulated extractor failure');
};

const symbols = await parseFileAuto('boom.js', 'function hello() {}', { engine: 'wasm' });

expect(symbols).toBeNull();

const warnings = stderrSpy.mock.calls
  .map((args) => String(args[0]))
  .filter((s) => s.includes('[codegraph WARN]'));
expect(warnings.some((w) => w.includes('boom.js'))).toBe(true);
expect(warnings.some((w) => w.includes('simulated extractor failure'))).toBe(true);
expect(warnings.some((w) => /skipping/i.test(w))).toBe(true);
```
Test mocks wrong registry — worker path is never exercised
The tests mutate jsEntry.extractor on the LANGUAGE_REGISTRY exported from parser.ts, but after this PR the WASM parse path goes through the worker. The worker (wasm-worker-entry.ts) builds its own independent LANGUAGE_REGISTRY and its own _extToLang map at module load time — they are completely separate objects in a different V8 isolate. Changing the extractor in the main thread has zero effect on the worker.
As a result, all three tests in this file will fail: parseFileAuto('boom.js', ...) will return a non-null result (the worker parses successfully with the real extractor), causing the expect(symbols).toBeNull() assertion to fail, and no warning containing 'simulated extractor failure' will be emitted. The tests do not verify the skip-and-continue behaviour they are titled for.
Fixed. Rewrote tests/parsers/wasm-hardening.test.ts to stop mocking the main-thread extractor (which the worker never sees — it has its own registry in a separate V8 isolate). The new tests use a gated worker-side crash hook (CODEGRAPH_WASM_WORKER_TEST_CRASH=1 + magic token in the source text) that calls process.exit(1) mid-parse, so we verify the actual skip-and-continue behavior: single-file crash + warn, batch with one poisoned file continuing through the rest, and repeated crashes with worker respawn.
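Such a gated crash hook might look roughly like the following sketch. The env-var name comes from the comment above; the token value and helper name are illustrative assumptions:

```typescript
// Hypothetical sketch of a gated worker-side crash hook. The env var
// CODEGRAPH_WASM_WORKER_TEST_CRASH is named in the PR; the magic token
// and helper name are assumptions.
const CRASH_TOKEN = '__CODEGRAPH_TEST_CRASH__';

function shouldSimulateCrash(code: string): boolean {
  // Both gates must be open — the env flag AND the token in the source
  // text — so a production build can never trip the hook by accident.
  return (
    process.env.CODEGRAPH_WASM_WORKER_TEST_CRASH === '1' &&
    code.includes(CRASH_TOKEN)
  );
}

// Inside the worker's parse handler, one might then do:
//   if (shouldSimulateCrash(req.code)) process.exit(1); // mimic a V8 fatal
```

Exiting mid-parse from inside the worker exercises the same pool-side path (non-zero exit, skip, respawn) as a real V8 abort, without needing a reproducible crash in CI.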
```typescript
export class WasmWorkerPool {
  private worker: Worker | null = null;
  private workerReady: Promise<void> | null = null;
```
workerReady is declared and reset to null in onExit (line 170) but is never assigned to an actual Promise and is never awaited or read anywhere. It appears to be scaffolding left over from an earlier design. This is dead code that adds confusion about whether there is an intended start-up gate for the worker.
Fixed. Removed the dead workerReady field and the corresponding no-op reset in onExit.
```typescript
private pump(): void {
  if (this.disposed) return;
  if (this.inFlight) return;
  const next = this.queue.shift();
  if (!next) return;
  this.inFlight = next;
  const worker = this.ensureWorker();
  const req: WorkerRequest = {
    type: 'parse',
    id: next.id,
    filePath: next.filePath,
    code: next.code,
    opts: next.opts,
  };
  worker.postMessage(req);
}
```
The PR description mentions "crash/respawn + timeout handling," but no timeout is implemented. pump() posts a message to the worker and sets this.inFlight, but nothing ever fires if the worker hangs (e.g., WASM enters an infinite loop instead of crashing). In that case the PendingJob.resolve is never called and the parse() Promise hangs forever, stalling the entire build — a worse outcome than the crash the PR is designed to prevent.
Consider adding a per-job timer (e.g. setTimeout) after posting that calls worker.terminate() and resolves the promise with null if the deadline is exceeded.
Fixed. Added a per-job watchdog in WasmWorkerPool — each parse arms a 60s setTimeout before postMessage; if it fires, we call worker.terminate(), blacklist the file in crashedFiles, and onExit then resolves the in-flight job with null and respawns. 60s sits comfortably above the worst-case real-parse time seen in CI (~12s) while guaranteeing a build can no longer hang forever on a stuck WASM grammar.
```typescript
private onExit(code: number): void {
  const crashed = this.inFlight;
  this.worker = null;
  this.workerReady = null;
```
Fixed — removed alongside the workerReady field.
```typescript
if (msg.type === 'shutdown') {
  process.exit(0);
}
```
shutdown message handler is unreachable from the current pool
The worker handles a 'shutdown' message type by calling process.exit(0), but WasmWorkerPool.dispose() never posts this message — it calls worker.terminate() directly. The shutdown branch in the worker is dead code. This is harmless but could mislead future readers into thinking there is a graceful-shutdown handshake.
Fixed. Removed the unreachable shutdown branch from the worker's message handler and dropped the unused WorkerShutdownRequest from the protocol — the pool disposes workers via Worker.terminate() directly, so there's no graceful-shutdown handshake.
…solation

Impact: 3 functions changed, 8 affected
The worker pool previously tried to load the worker entry from src/*.ts under Node's strip-types runtime, but Node's worker_threads loader does not rewrite .js import specifiers to .ts — so the worker exited with "module not found" for every file, breaking all WASM engine tests and the engine-parity CI jobs.

Changes:

- wasm-worker-pool.ts: resolveWorkerEntry now always points at the compiled .js. If the sibling .js exists (dist build), use it; otherwise walk up to dist/domain/wasm-worker-entry.js (tests/dev from src/). Throws a clear error instead of silently crashing the worker.
- wasm-worker-pool.ts: add a per-job timeout (60s) so a hung WASM grammar (infinite loop instead of crash) can no longer stall the whole build; remove the dead `workerReady` field and its no-op reset in onExit.
- wasm-worker-entry.ts: add a gated test-crash hook (CODEGRAPH_WASM_WORKER_TEST_CRASH=1 + magic token in the code) so we can unit-test pool crash recovery without a real V8 abort. Remove the dead `shutdown` message handler — the pool uses Worker.terminate() directly.
- wasm-worker-protocol.ts: drop the unused WorkerShutdownRequest.
- tests/parsers/wasm-hardening.test.ts: rewrite. The old tests mutated jsEntry.extractor on the main-thread registry, but the worker has its own independent registry in a separate V8 isolate, so the mock was never exercised. New tests use the worker-side crash hook to verify single-file skip+warn, batch skip-and-continue, and repeated-crash recovery with worker respawn.

Impact: 10 functions changed, 10 affected
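The resolution order described above — sibling .js first, then walk up to dist/ — might look roughly like this sketch. The directory argument and injectable exists-check are assumptions for testability; the real code presumably uses `fs.existsSync` and the module's own directory directly:

```typescript
import * as path from 'node:path';

// Hypothetical sketch of resolveWorkerEntry. `currentDir` and `exists`
// are injected here for illustration; names and the 5-level walk limit
// are assumptions.
function resolveWorkerEntry(
  currentDir: string,
  exists: (p: string) => boolean,
): string {
  // Dist build: the compiled worker entry sits next to this file.
  const sibling = path.join(currentDir, 'wasm-worker-entry.js');
  if (exists(sibling)) return sibling;

  // Tests/dev running from src/: walk up looking for dist/domain/….js.
  let dir = currentDir;
  for (let i = 0; i < 5; i++) {
    const candidate = path.join(dir, 'dist', 'domain', 'wasm-worker-entry.js');
    if (exists(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) break; // hit filesystem root
    dir = parent;
  }
  throw new Error(
    'wasm-worker-entry.js not found — build dist/ before running the WASM engine',
  );
}
```

Always resolving to compiled .js sidesteps the strip-types limitation entirely: the worker loader never has to rewrite a .js specifier to .ts.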
Impact: 1 function changed, 1 affected
Codegraph Impact Analysis

82 functions changed → 33 callers affected across 11 files
Fixes #965.
Summary
WASM parses run in a worker_threads worker so a V8 fatal error kills only the worker, not the whole build.

Changes
- src/domain/wasm-worker-entry.ts — worker entry that owns all tree-sitter WASM calls.
- src/domain/wasm-worker-pool.ts — single-worker pool with crash/respawn + timeout handling.
- src/domain/wasm-worker-protocol.ts — typed message protocol (serializable ExtractorOutput, no _tree field).
- src/domain/parser.ts — routes WASM parses through the worker pool; preserves single-file + batch APIs.
- tests/parsers/wasm-hardening.test.ts — coverage for the skip-and-continue behavior (single file + batch).

Why
The WASM engine reproducibly kills V8 on Windows + Node 22 when building the codegraph source repo (see #965). A fatal in the main thread leaves the user with zero progress. Isolating parses in a worker gives us graceful degradation: the file that triggers the crash is skipped with a warn, and every other file parses normally.
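The graceful-degradation loop on the main thread can be sketched like this. A sketch only: the null-on-crash contract matches the sequence diagram above, while the function names and warning format are assumptions:

```typescript
// Hypothetical sketch of skip-and-continue: a crashed or hung file
// resolves to null, gets a warning, and the build keeps going.
interface ExtractorOutput { symbols: string[] }
type ParseFn = (filePath: string, code: string) => Promise<ExtractorOutput | null>;

async function parseAll(
  files: Array<{ path: string; code: string }>,
  parse: ParseFn,                  // e.g. (p, c) => pool.parse(p, c, opts)
  warn: (msg: string) => void,
): Promise<Map<string, ExtractorOutput>> {
  const results = new Map<string, ExtractorOutput>();
  for (const f of files) {
    const out = await parse(f.path, f.code);
    if (out === null) {
      // Worker crashed or timed out on this file — skip it, keep building.
      warn(`[codegraph WARN] skipping ${f.path}: WASM parse failed`);
      continue;
    }
    results.set(f.path, out);
  }
  return results;
}
```

One poisoned file therefore costs exactly one warning line, instead of the entire build's progress.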
Test plan
- npx vitest run tests/parsers/wasm-hardening.test.ts
- npx vitest run tests/parsers (no regressions in other parsers)
- codegraph build . --engine wasm on Windows + Node 22 completes without crashing (repro of the original bug, #965: WASM engine crashes V8 reproducibly on Windows + Node 22 when building codegraph source)
- codegraph build . --engine native still works (native path untouched)