Defer page teardown while worker scripts are evaluating by staylor · Pull Request #2398 · lightpanda-io/browser

staylor · 2026-05-08T18:57:55Z

Summary

Fixes a use-after-free segfault in worker script evaluation when a CDP message arrives mid-fetch during importScripts().

Reproduction

Drive lightpanda serve with puppeteer-core's puppeteer.connect({ browserWSEndpoint }) against any URL that loads dedicated workers calling importScripts() during initial eval. The Allbirds product page (https://www.allbirds.com/products/mens-wool-runners) loads ~8 web-pixel workers each calling importScripts(), and reliably triggered the crash within 1–10 sequential connections to the same server.

Stack signature (truncated):

Segmentation fault at address 0x...
std/hash_map.zig:798:33 in capacity            ← self.metadata.? - 1 dereferences freed page
std/hash_map.zig:1148:39 in getOrPutAssumeCapacityAdapted
src/browser/js/Context.zig:276 in addIdentity   ← identity_map.getOrPut on a freed Identity
src/browser/js/Local.zig:229    in mapZigInstanceToJs
src/browser/js/Caller.zig:382   in handleError  ← mapping a Zig error to a JS exception
src/browser/js/bridge.zig:161   in wrap         ← V8 callback into Zig
... v8 frames ...
src/browser/js/Local.zig:194    in compileAndRun
src/browser/webapi/Worker.zig:190 in loadInitialScript
src/browser/webapi/Worker.zig:178 in httpDoneCallback

Root cause

WorkerGlobalScope.importScripts performs a synchronous HTTP request via HttpClient.syncRequest. To stay responsive during a long fetch, syncRequest pumps the CDP socket via cdp.blocking_read while waiting for the HTTP response. Before this fix, if a CDP message such as Target.closeTarget arrived on that socket mid-fetch, the dispatcher tore down the page synchronously:

Worker JS → importScripts → syncRequest → blocking_read
  → CDP dispatch → Target.closeTarget
  → Session.removePage → Page.deinit → Frame.deinit
  → Worker.deinit (frees worker arena + identity_map)

When control unwound back into the worker's eval, the next addIdentity call dereferenced the freed identity_map metadata pointer and segfaulted. The crash sometimes came immediately on the same connection, and sometimes a few connections later, once the arena pool had recycled the freed memory and a different worker's identity had landed on top of the old one.

Session.removePage already had a guard for this exact reentrancy pattern via frame._script_manager.base.is_evaluating, but it never tripped in the worker case because worker scripts don't go through the frame's ScriptManager — they have their own _script_manager on WorkerGlobalScope.
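
To illustrate why the existing guard never tripped, here is a minimal sketch with hypothetical, heavily simplified stand-ins for Frame, Worker and ScriptManager (the field names and the frameOnlyGuard helper are assumptions for illustration, not the real Lightpanda definitions). The frame and each worker carry separate flags, so a check that reads only the frame's flag reports "not evaluating" while a worker is mid-eval.

```zig
const std = @import("std");

// Hypothetical, simplified stand-ins for the real types.
const ScriptManager = struct { is_evaluating: bool = false };
const Worker = struct { script_manager: ScriptManager = .{} };
const Frame = struct {
    script_manager: ScriptManager = .{},
    workers: []const *const Worker = &.{},
};

// Shape of the pre-fix guard: only the frame's own flag is consulted.
fn frameOnlyGuard(frame: *const Frame) bool {
    return frame.script_manager.is_evaluating;
}

test "frame-only guard misses an evaluating worker" {
    // The worker is mid-eval, but the frame's own ScriptManager is idle.
    const worker = Worker{ .script_manager = .{ .is_evaluating = true } };
    const workers = [_]*const Worker{&worker};
    const frame = Frame{ .workers = &workers };

    // The guard reports "safe to tear down" even though a worker is running.
    try std.testing.expect(!frameOnlyGuard(&frame));
}
```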

Fix

Two small changes:

1. Worker.loadInitialScript now flips _worker_scope._script_manager.is_evaluating around the eval, with was_evaluating save/restore so nested worker evals (e.g. one worker's importScripts synchronously triggering another worker's done-callback via the curl pump) compose correctly.
2. New helper Session.anyScriptEvaluating(frame) recursively walks the frame tree (the frame's own ScriptManager + every owned worker's ScriptManager + child frames) and returns true if any is mid-eval. Session.removePage and CDP.disposeBrowserContext use this in place of the frame-only check, so teardown is deferred whenever any script (frame, worker, or subframe) is on the call stack. Final cleanup happens at CDP.deinit on connection close, matching the existing deferred-teardown contract documented in Session.removePage.
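
The two changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the diff: the types are simplified stand-ins, and evalGuarded, probe, root and seen_mid_eval are hypothetical names. Only the is_evaluating / was_evaluating save/restore and the recursive walk over the frame's own flag, its workers and its child frames follow the description above.

```zig
const std = @import("std");

// Hypothetical, simplified stand-ins for the real Lightpanda types.
const ScriptManager = struct { is_evaluating: bool = false };

const Worker = struct {
    script_manager: ScriptManager = .{},

    // Marks the worker as evaluating while `body` runs, then restores the
    // flag to whatever it was on entry.
    fn evalGuarded(self: *Worker, body: *const fn () void) void {
        const sm = &self.script_manager;
        const was_evaluating = sm.is_evaluating;
        sm.is_evaluating = true;
        defer sm.is_evaluating = was_evaluating;
        body();
    }
};

const Frame = struct {
    script_manager: ScriptManager = .{},
    workers: []const *const Worker = &.{},
    children: []const *const Frame = &.{},
};

// True if the frame, any of its workers, or any descendant frame is mid-eval.
fn anyScriptEvaluating(frame: *const Frame) bool {
    if (frame.script_manager.is_evaluating) return true;
    for (frame.workers) |worker| {
        if (worker.script_manager.is_evaluating) return true;
    }
    for (frame.children) |child| {
        if (anyScriptEvaluating(child)) return true;
    }
    return false;
}

var root: Frame = .{};
var seen_mid_eval = false;

// Stands in for the teardown check a reentrant CDP message would reach.
fn probe() void {
    seen_mid_eval = anyScriptEvaluating(&root);
}

test "teardown check sees a worker that is mid-eval in a child frame" {
    var worker = Worker{};
    const workers = [_]*const Worker{&worker};
    const child = Frame{ .workers = &workers };
    const children = [_]*const Frame{&child};
    root = Frame{ .children = &children };

    worker.evalGuarded(&probe);

    try std.testing.expect(seen_mid_eval); // teardown would be deferred mid-eval
    try std.testing.expect(!anyScriptEvaluating(&root)); // flag restored afterwards
}
```

In this sketch a caller such as removePage would return early while anyScriptEvaluating is true and leave the final cleanup to a later point, which mirrors the deferred-teardown contract described above.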

Diff is +38 / -2 across three files: src/browser/Session.zig, src/browser/webapi/Worker.zig, src/cdp/CDP.zig.

Verification

- Repro fixed: 25 consecutive puppeteer-core connect() runs against the Allbirds URL on the same lightpanda serve process. All returned status=200 with the expected <title> and ~922 KB body, server alive throughout. Pre-fix this crashed within 1–10 runs.
- Mixed clients: interleaved Puppeteer and Playwright connectOverCDP runs against the same server, no crashes (Playwright still times out on page.goto due to a separate, unrelated bug in the synthetic STARTUP session, which is out of scope here).
- Unit tests: 521/521 pass (make test).

Notes / out of scope

While reproducing this I noticed that Playwright's chromium.connectOverCDP cannot navigate against lightpanda serve at all: it auto-attaches to the synthetic STARTUP target Lightpanda advertises, sends Page.navigate on that session, and Lightpanda's dispatchStartupCommand blindly replies {} and drops the message — Playwright then waits forever for Page.frameNavigated. Puppeteer's flow (createBrowserContext → createTarget → real session) is unaffected. That's a separate fix; happy to follow up with another PR if useful.

github-actions · 2026-05-08T18:58:13Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

staylor · 2026-05-08T18:58:41Z

I have read the CLA Document and I hereby sign the CLA

karlseguin

Thanks for this. We've landed a number of features lately that have, if not introduced this issue, then certainly exacerbated it. It's something we'd like to fix more holistically, but these fixes are good in the meantime and they buy us time to wrap up some other stuff currently in the pipeline and then put thought into the right design.

karlseguin · 2026-05-09T00:14:42Z

+// have been drained while a Zig->JS->Zig stack (e.g. Worker importScripts
+// -> syncRequest -> blocking_read) is mid-flight. Recursive over child
+// frames so that an evaluating subframe also defers parent teardown.
+pub fn anyScriptEvaluating(frame: *const Frame) bool {


This might be a bit nicer ergonomically as a method on Frame. In CDP.zig, it would change from:

Session.anyScriptEvaluating(&page.frame)

to:

page.frame.anyScriptEvaluating();
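
As a small illustration of the ergonomics point: in Zig, a function declared inside a struct whose first parameter is that struct (or a pointer to it) can be called with dot syntax, so the method form and the namespaced form are the same call. The Frame and Page types below are hypothetical stand-ins with a single flag, not the real definitions.

```zig
const std = @import("std");

const Frame = struct {
    // Hypothetical stand-in for the ScriptManager state.
    is_evaluating: bool = false,

    pub fn anyScriptEvaluating(self: *const Frame) bool {
        return self.is_evaluating;
    }
};

const Page = struct { frame: Frame = .{} };

test "method call and namespaced call are the same function" {
    const page = Page{ .frame = .{ .is_evaluating = true } };

    // Method syntax: the address of page.frame is taken implicitly.
    try std.testing.expect(page.frame.anyScriptEvaluating());

    // Namespaced syntax: the pointer is passed explicitly.
    try std.testing.expect(Frame.anyScriptEvaluating(&page.frame));
}
```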

karlseguin · 2026-05-09T00:16:19Z

+    // arena and identity_map underneath us. Session.removePage walks
+    // every frame's workers and bails out when any is_evaluating, so the
+    // teardown is deferred until the eval unwinds.
+    const sm = &self._worker_scope._script_manager;


I think we should do the same thing in WorkerGlobalScope.importScript.

Worker scripts can call importScripts(), which performs a synchronous HTTP request via HttpClient.syncRequest. To stay responsive during a long fetch, syncRequest pumps the CDP socket (cdp.blocking_read) while waiting. If a CDP message such as Target.closeTarget arrives on that socket mid-fetch, the previous code path tore down the page immediately:

Worker JS -> importScripts -> syncRequest -> blocking_read
  -> CDP dispatch -> Target.closeTarget
  -> Session.removePage -> Page.deinit -> Frame.deinit
  -> Worker.deinit (frees worker arena + identity_map)

When control unwound back into the worker's eval, the next operation that hit ctx.identity.identity_map.getOrPut dereferenced the freed metadata pointer and segfaulted (sometimes immediately, sometimes a few connections later as the arena got recycled).

Reproducer: any URL that loads dedicated workers calling importScripts during initial eval, driven via puppeteer-core's puppeteer.connect. The allbirds.com product page (which loads ~8 web-pixel workers each calling importScripts) reliably triggered it within ~10 connections.

Session.removePage already deferred when the frame's own ScriptManager.is_evaluating was set; that guard never tripped because worker scripts don't go through the frame's ScriptManager.

Fix:

* Worker.loadInitialScript now sets the worker's own _worker_scope._script_manager.is_evaluating around the eval, with save/restore so nested worker evals compose correctly.
* WorkerGlobalScope.importScript also sets its own _script_manager.is_evaluating around the syncRequest + runMacrotasks. The typical caller (Worker.loadInitialScript) already sets this around its outer eval, so the outer guard usually covers us; the inner mark is defense-in-depth for callers that reach importScripts() from a setTimeout / microtask outside the loadInitialScript scope.
* New Frame.anyScriptEvaluating method walks the frame tree (frame ScriptManager + every worker's ScriptManager + child frames) and returns true if any is mid-eval. Session.removePage and CDP.disposeBrowserContext use this in place of the frame-only check, deferring teardown until all evals unwind. Final cleanup happens at CDP.deinit on connection close, matching the existing deferred-teardown contract.

Verified by running the puppeteer-core repro back-to-back against a single Lightpanda serve; all returned 200 with the right title, no UAF crashes (was previously crashing within 1-10 runs). All 521 unit tests still pass.

Note: a separate, pre-existing latent V8 issue surfaces under stress on this same code path. After many iterations a Runtime.evaluate promise tracked by V8's inspector PromiseHandlerTracker is discarded during garbage collection's first-pass weak callbacks; the discard sends a failure response which triggers v8::String::NewFromOneByte, hitting the debug-only assertion AllowHeapAllocation::IsAllowed() in heap-allocator-inl.h:79 (no allocations allowed during weak callbacks). This reproduces on a baseline build of this PR commit and on a baseline build of just the original two-line is_evaluating fix, i.e. it is not introduced by the deferral logic. The deferral makes it more visible because inspector callbacks now live longer before teardown, so they are more likely to be alive during a GC. Tracking this as a follow-up; the fix here still resolves the UAF that was crashing the server immediately.

staylor · 2026-05-09T21:27:35Z

Both review suggestions applied (force-pushed 92607ad7):

- anyScriptEvaluating is now a method on Frame instead of a free function on Session, with the call sites in Session.removePage and CDP.disposeBrowserContext updated to page.frame.anyScriptEvaluating().
- WorkerGlobalScope.importScript now also sets sm.is_evaluating = true (with was_evaluating save/restore) around the syncRequest + runMacrotasks. The typical caller (Worker.loadInitialScript) already covers this via the outer guard, so the inner mark is defense-in-depth for callers that reach importScripts() from a setTimeout / microtask outside the loadInitialScript scope.
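
A minimal sketch of why the inner mark needs save/restore rather than a plain set-then-clear, assuming a hypothetical one-field ScriptManager. innerNaive and innerSaveRestore are illustrative names, not functions from the PR: a guard that clears the flag unconditionally would drop the outer loadInitialScript mark while the outer eval is still on the stack.

```zig
const std = @import("std");

// Hypothetical stand-in for the worker's ScriptManager.
const ScriptManager = struct { is_evaluating: bool = false };

// Naive inner guard: unconditionally clears the flag on exit.
fn innerNaive(sm: *ScriptManager) void {
    sm.is_evaluating = true;
    defer sm.is_evaluating = false;
    // ... syncRequest + runMacrotasks would run here ...
}

// Save/restore inner guard: puts back whatever the caller had set.
fn innerSaveRestore(sm: *ScriptManager) void {
    const was_evaluating = sm.is_evaluating;
    sm.is_evaluating = true;
    defer sm.is_evaluating = was_evaluating;
    // ... syncRequest + runMacrotasks would run here ...
}

test "inner guard must not clear the outer mark" {
    // The outer eval (loadInitialScript in the PR) has already set the flag.
    var sm = ScriptManager{ .is_evaluating = true };

    innerSaveRestore(&sm);
    try std.testing.expect(sm.is_evaluating); // outer mark survives

    innerNaive(&sm);
    try std.testing.expect(!sm.is_evaluating); // outer mark lost too early
}
```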

Diff is now +59 / -2 across Frame.zig, Session.zig, webapi/Worker.zig, webapi/WorkerGlobalScope.zig, cdp/CDP.zig. 521/521 unit tests still pass.

While re-running the back-to-back puppeteer-core stress test against the Allbirds repro (25 sequential connections to one server), I uncovered a separate, pre-existing latent V8 inspector lifetime bug that this PR makes more visible. Filed as #2407. tl;dr: V8's inspector tries to allocate a JS string for a Runtime.evaluate failure response from inside a GC weak-callback phase, which V8 forbids in debug builds, and the server aborts with Fatal error in heap-allocator-inl.h, line 79 - AllowHeapAllocation::IsAllowed(). It reproduces both on this PR's commit and on a baseline build of just the original two-line is_evaluating fix from this PR, so it is not introduced by the deferral logic. The deferral does make it more reachable because pending inspector callbacks now live longer (they would previously have been torn down with the page during the syncRequest reentrancy this PR fixes), but the underlying V8 inspector misuse exists independently. Full stack trace, reproduction recipe, and suggested fix directions are in #2407.

The original reentrancy UAF that this PR fixes is straightforwardly resolved (no SEGV, page state stays valid for the duration of the worker eval); the V8 inspector issue can be tracked and fixed separately.

This was referenced May 8, 2026

Promote synthetic STARTUP session for Playwright connectOverCDP #2399

Open

Child iframe navigation invalidates main frame's executionContextId for CDP drivers #2400

Open

Add LP.setSubframeLoading + --disable-subframes opt-out for iframe loading #2401

Merged

karlseguin reviewed May 9, 2026

staylor force-pushed the fix/worker-importscripts-segfault branch from 1f761af to 92607ad Compare May 9, 2026 21:26

This was referenced May 9, 2026

V8 fatal AllowHeapAllocation::IsAllowed() during GC weak callback under CDP load #2407

Open

inspector: avoid v8::String allocation in fromStringView lightpanda-io/zig-v8-fork#178

Merged

karlseguin merged commit 520d968 into lightpanda-io:main May 10, 2026
34 of 35 checks passed

github-actions Bot locked and limited conversation to collaborators May 10, 2026
