fix(wasm): isolate tree-sitter parsing in worker thread (#965) #975
carlos-alm wants to merge 4 commits into main from
Conversation
The WASM grammar can trigger uncatchable V8 fatal errors that kill whichever thread runs it. Running parses in a worker_threads worker means the crash kills only the worker — the pool detects the exit, marks the in-flight file as skipped, respawns the worker, and continues with the rest of the build.

- wasm-worker-entry.ts: worker entry that owns all tree-sitter WASM calls; runs the extractor and the AST visitor walk before tree.delete() so the main thread never holds a live Tree
- wasm-worker-pool.ts: single-worker pool with crash-respawn logic
- wasm-worker-protocol.ts: typed message protocol with a serializable ExtractorOutput (no _tree field)
- parser.ts: routes WASM parses through the worker pool
- tests/parsers/wasm-hardening.test.ts: coverage for skip-and-continue behavior on extractor errors

Docs check acknowledged: internal plumbing only — no new CLI surface, no new language support, no architectural tier changes visible to users.

Impact: 80 functions changed, 30 affected
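A minimal sketch of what the typed message protocol might look like. The message type names come from the sequence diagram below; the exact field shapes are assumptions, not the real contents of wasm-worker-protocol.ts:

```typescript
// Hypothetical sketch of the pool <-> worker message shapes. Field names
// beyond `type` and `id` are illustrative assumptions.
interface WorkerParseRequest {
  type: 'parse';
  id: number;        // correlates a response with its pending job
  filePath: string;
  code: string;
}

interface WorkerParseResponseOk {
  type: 'ok';
  id: number;
  // Serializable result only — no _tree field, so no live WASM Tree
  // handle ever crosses the thread boundary.
  output: { symbols: string[] };
}

interface WorkerParseResponseErr {
  type: 'err';
  id: number;
  message: string;
}

type WorkerResponse = WorkerParseResponseOk | WorkerParseResponseErr;

const req: WorkerParseRequest = {
  type: 'parse',
  id: 1,
  filePath: 'boom.js',
  code: 'function hello() {}',
};
```

Keeping every field structured-clone-safe is what lets the pool hand results back over `postMessage` without the worker leaking a live tree.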
Claude finished @carlos-alm's task

Reviewing PR: WASM Worker Isolation

Root Cause Analysis

Problem Being Solved: WASM grammar parses cause V8 fatal errors on Windows + Node 22, killing the entire build process and leaving users with zero progress.

Why This Approach: Worker thread isolation is the correct solution — it contains fatal V8 crashes to the worker thread, allowing graceful degradation where the triggering file is skipped and the build continues for all other files.

Risk Assessment: Low-risk defensive enhancement. Preserves all existing functionality while adding crash resilience.

Backlog Compliance
Code Quality Analysis

Strengths:
Architecture Quality:
Technical Implementation:
Critical Concerns

None identified. This is a well-engineered defensive enhancement.

Final Recommendation
This PR directly improves AI agent reliability by preventing build crashes that waste tokens and break coding workflows. The worker isolation approach is sound and well-implemented.
Greptile Summary

This PR isolates all tree-sitter WASM calls in a worker_threads worker.

Confidence Score: 4/5

Safe to merge; the single P2 race only manifests when a parse completes at almost exactly the 60-second deadline. All previously raised concerns (dead workerReady field, missing timeout, stale shutdown branch, test mocking the wrong registry) have been fully addressed. The remaining finding is a theoretical race in onTimeout → onExit where the wrong in-flight job could be skipped, but it requires a >60s parse that completes at exactly the timeout boundary — rare in practice and causing only a silent file skip, not data corruption.

src/domain/wasm-worker-pool.ts — the onTimeout/onExit interaction around lines 188–269.

Important Files Changed
Sequence Diagram

sequenceDiagram
participant CLI as CLI
participant P as parser.ts
participant Pool as WasmWorkerPool
participant W as Worker Thread
CLI->>P: parseFilesAuto(files)
P->>Pool: getWasmWorkerPool()
loop per file
P->>Pool: pool.parse(filePath, code, opts)
Pool->>Pool: enqueue job and pump
Pool->>W: postMessage WorkerParseRequest
Pool->>Pool: arm 60s watchdog
alt Normal parse
W-->>Pool: postMessage WorkerParseResponseOk
Pool->>Pool: clear watchdog, resolve result, pump next
else Worker crashes with V8 fatal
W--xPool: exit non-zero
Pool->>Pool: onExit, warn, skip file, resolve null, respawn worker
else Worker hangs past 60s
Pool->>Pool: onTimeout, terminate worker
W--xPool: exit zero
Pool->>Pool: onExit, warn, skip file, resolve null, pump next
end
end
P-->>CLI: Map of relPath to ExtractorOutput
CLI->>P: disposeParsers
P->>Pool: disposeWasmWorkerPool
Pool->>W: worker.terminate
Reviews (3): Last reviewed commit: "fix(wasm): dispose parser pool on CLI sh..."
```typescript
jsEntry.extractor = () => {
  throw new Error('simulated extractor failure');
};

const symbols = await parseFileAuto('boom.js', 'function hello() {}', { engine: 'wasm' });

expect(symbols).toBeNull();

const warnings = stderrSpy.mock.calls
  .map((args) => String(args[0]))
  .filter((s) => s.includes('[codegraph WARN]'));
expect(warnings.some((w) => w.includes('boom.js'))).toBe(true);
expect(warnings.some((w) => w.includes('simulated extractor failure'))).toBe(true);
expect(warnings.some((w) => /skipping/i.test(w))).toBe(true);
```
Test mocks wrong registry — worker path is never exercised
The tests mutate jsEntry.extractor on the LANGUAGE_REGISTRY exported from parser.ts, but after this PR the WASM parse path goes through the worker. The worker (wasm-worker-entry.ts) builds its own independent LANGUAGE_REGISTRY and its own _extToLang map at module load time — they are completely separate objects in a different V8 isolate. Changing the extractor in the main thread has zero effect on the worker.
As a result, all three tests in this file will fail: parseFileAuto('boom.js', ...) will return a non-null result (the worker parses successfully with the real extractor), causing the expect(symbols).toBeNull() assertion to fail, and no warning containing 'simulated extractor failure' will be emitted. The tests do not verify the skip-and-continue behaviour they are titled for.
Fixed. Rewrote tests/parsers/wasm-hardening.test.ts to stop mocking the main-thread extractor (which the worker never sees — it has its own registry in a separate V8 isolate). The new tests use a gated worker-side crash hook (CODEGRAPH_WASM_WORKER_TEST_CRASH=1 + magic token in the source text) that calls process.exit(1) mid-parse, so we verify the actual skip-and-continue behavior: single-file crash + warn, batch with one poisoned file continuing through the rest, and repeated crashes with worker respawn.
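Such a gated crash hook might look roughly like the following sketch. The env-var name comes from the comment above; the token value and helper name are illustrative assumptions:

```typescript
// Hypothetical sketch of a gated worker-side crash hook. The env var
// CODEGRAPH_WASM_WORKER_TEST_CRASH is named in the PR; the magic token
// and helper name are assumptions.
const CRASH_TOKEN = '__CODEGRAPH_TEST_CRASH__';

function shouldSimulateCrash(code: string): boolean {
  // Both gates must be open — the env flag AND the token in the source
  // text — so a production build can never trip the hook by accident.
  return (
    process.env.CODEGRAPH_WASM_WORKER_TEST_CRASH === '1' &&
    code.includes(CRASH_TOKEN)
  );
}

// Inside the worker's parse handler, one might then do:
//   if (shouldSimulateCrash(req.code)) process.exit(1); // mimic a V8 fatal
```

Exiting mid-parse from inside the worker exercises the same pool-side path (non-zero exit, skip, respawn) as a real V8 abort, without needing a reproducible crash in CI.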
```typescript
export class WasmWorkerPool {
  private worker: Worker | null = null;
  private workerReady: Promise<void> | null = null;
```
workerReady is declared and reset to null in onExit (line 170) but is never assigned to an actual Promise and is never awaited or read anywhere. It appears to be scaffolding left over from an earlier design. This is dead code that adds confusion about whether there is an intended start-up gate for the worker.
Fixed. Removed the dead workerReady field and the corresponding no-op reset in onExit.
```typescript
private pump(): void {
  if (this.disposed) return;
  if (this.inFlight) return;
  const next = this.queue.shift();
  if (!next) return;
  this.inFlight = next;
  const worker = this.ensureWorker();
  const req: WorkerRequest = {
    type: 'parse',
    id: next.id,
    filePath: next.filePath,
    code: next.code,
    opts: next.opts,
  };
  worker.postMessage(req);
}
```
The PR description mentions "crash/respawn + timeout handling," but no timeout is implemented. pump() posts a message to the worker and sets this.inFlight, but nothing ever fires if the worker hangs (e.g., WASM enters an infinite loop instead of crashing). In that case the PendingJob.resolve is never called and the parse() Promise hangs forever, stalling the entire build — a worse outcome than the crash the PR is designed to prevent.
Consider adding a per-job timer (e.g. setTimeout) after posting that calls worker.terminate() and resolves the promise with null if the deadline is exceeded.
Fixed. Added a per-job watchdog in WasmWorkerPool — each parse arms a 60s setTimeout before postMessage; if it fires, we call worker.terminate(), blacklist the file in crashedFiles, and onExit then resolves the in-flight job with null and respawns. 60s sits comfortably above the worst-case real-parse time seen in CI (~12s) while guaranteeing a build can no longer hang forever on a stuck WASM grammar.
```typescript
private onExit(code: number): void {
  const crashed = this.inFlight;
  this.worker = null;
  this.workerReady = null;
```
Fixed — removed alongside the workerReady field.
```typescript
if (msg.type === 'shutdown') {
  process.exit(0);
}
```
shutdown message handler is unreachable from the current pool
The worker handles a 'shutdown' message type by calling process.exit(0), but WasmWorkerPool.dispose() never posts this message — it calls worker.terminate() directly. The shutdown branch in the worker is dead code. This is harmless but could mislead future readers into thinking there is a graceful-shutdown handshake.
Fixed. Removed the unreachable shutdown branch from the worker's message handler and dropped the unused WorkerShutdownRequest from the protocol — the pool disposes workers via Worker.terminate() directly, so there's no graceful-shutdown handshake.
…solation

Impact: 3 functions changed, 8 affected
The worker pool previously tried to load the worker entry from src/*.ts under Node's strip-types runtime, but Node's worker_threads loader does not rewrite .js import specifiers to .ts — so the worker exited with "module not found" for every file, breaking all WASM engine tests and the engine-parity CI jobs.

Changes:

- wasm-worker-pool.ts: resolveWorkerEntry now always points at the compiled .js. If the sibling .js exists (dist build), use it; otherwise walk up to dist/domain/wasm-worker-entry.js (tests/dev from src/). Throws a clear error instead of silently crashing the worker.
- wasm-worker-pool.ts: add a per-job timeout (60s) so a hung WASM grammar (infinite loop instead of crash) can no longer stall the whole build; remove the dead `workerReady` field and its no-op reset in onExit.
- wasm-worker-entry.ts: add a gated test-crash hook (CODEGRAPH_WASM_WORKER_TEST_CRASH=1 + magic token in the code) so we can unit-test pool crash recovery without a real V8 abort. Remove the dead `shutdown` message handler — the pool uses Worker.terminate() directly.
- wasm-worker-protocol.ts: drop the unused WorkerShutdownRequest.
- tests/parsers/wasm-hardening.test.ts: rewrite. The old tests mutated jsEntry.extractor on the main-thread registry, but the worker has its own independent registry in a separate V8 isolate, so the mock was never exercised. New tests use the worker-side crash hook to verify single-file skip+warn, batch skip-and-continue, and repeated-crash recovery with worker respawn.

Impact: 10 functions changed, 10 affected
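The resolution order described above — sibling .js first, then walk up to dist/ — might look roughly like this sketch. The directory argument and injectable exists-check are assumptions for testability; the real code presumably uses `fs.existsSync` and the module's own directory directly:

```typescript
import * as path from 'node:path';

// Hypothetical sketch of resolveWorkerEntry. `currentDir` and `exists`
// are injected here for illustration; names and the 5-level walk limit
// are assumptions.
function resolveWorkerEntry(
  currentDir: string,
  exists: (p: string) => boolean,
): string {
  // Dist build: the compiled worker entry sits next to this file.
  const sibling = path.join(currentDir, 'wasm-worker-entry.js');
  if (exists(sibling)) return sibling;

  // Tests/dev running from src/: walk up looking for dist/domain/….js.
  let dir = currentDir;
  for (let i = 0; i < 5; i++) {
    const candidate = path.join(dir, 'dist', 'domain', 'wasm-worker-entry.js');
    if (exists(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) break; // hit filesystem root
    dir = parent;
  }
  throw new Error(
    'wasm-worker-entry.js not found — build dist/ before running the WASM engine',
  );
}
```

Always resolving to compiled .js sidesteps the strip-types limitation entirely: the worker loader never has to rewrite a .js specifier to .ts.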
Impact: 1 function changed, 1 affected
Codegraph Impact Analysis

82 functions changed → 33 callers affected across 11 files
Fixes #965.
Summary
WASM parses run in a worker_threads worker so a V8 fatal error kills only the worker, not the whole build.

Changes
- src/domain/wasm-worker-entry.ts — worker entry that owns all tree-sitter WASM calls.
- src/domain/wasm-worker-pool.ts — single-worker pool with crash/respawn + timeout handling.
- src/domain/wasm-worker-protocol.ts — typed message protocol (serializable ExtractorOutput, no _tree field).
- src/domain/parser.ts — routes WASM parses through the worker pool; preserves single-file + batch APIs.
- tests/parsers/wasm-hardening.test.ts — coverage for the skip-and-continue behavior (single file + batch).

Why
The WASM engine reproducibly kills V8 on Windows + Node 22 when building the codegraph source repo (see #965). A fatal in the main thread leaves the user with zero progress. Isolating parses in a worker gives us graceful degradation: the file that triggers the crash is skipped with a warn, and every other file parses normally.
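The graceful-degradation loop on the main thread can be sketched like this. A sketch only: the null-on-crash contract matches the sequence diagram above, while the function names and warning format are assumptions:

```typescript
// Hypothetical sketch of skip-and-continue: a crashed or hung file
// resolves to null, gets a warning, and the build keeps going.
interface ExtractorOutput { symbols: string[] }
type ParseFn = (filePath: string, code: string) => Promise<ExtractorOutput | null>;

async function parseAll(
  files: Array<{ path: string; code: string }>,
  parse: ParseFn,                  // e.g. (p, c) => pool.parse(p, c, opts)
  warn: (msg: string) => void,
): Promise<Map<string, ExtractorOutput>> {
  const results = new Map<string, ExtractorOutput>();
  for (const f of files) {
    const out = await parse(f.path, f.code);
    if (out === null) {
      // Worker crashed or timed out on this file — skip it, keep building.
      warn(`[codegraph WARN] skipping ${f.path}: WASM parse failed`);
      continue;
    }
    results.set(f.path, out);
  }
  return results;
}
```

One poisoned file therefore costs exactly one warning line, instead of the entire build's progress.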
Test plan
- npx vitest run tests/parsers/wasm-hardening.test.ts
- npx vitest run tests/parsers (no regressions in other parsers)
- codegraph build . --engine wasm on Windows + Node 22 completes without crashing (repro of the original bug, #965: WASM engine crashes V8 reproducibly on Windows + Node 22 when building codegraph source)
- codegraph build . --engine native still works (native path untouched)