Promote synthetic STARTUP session for Playwright connectOverCDP #2399

Open
staylor wants to merge 1 commit into lightpanda-io:main from staylor:fix/playwright-startup-session
staylor (Contributor) commented May 8, 2026

Summary

Makes Lightpanda's CDP server work with playwright-core's chromium.connectOverCDP. Previously, Playwright would auto-attach to the synthetic STARTUP target that Lightpanda advertises in Target.setAutoAttach, drive it with Page.enable / Page.navigate / etc. on sessionId="STARTUP", and wait for Page.frameNavigated events that never came until page.goto timed out. The cause: the dispatcher silently {}-acked every STARTUP-tagged command except Page.getFrameTree.

Reproduction

import { chromium } from 'playwright-core';
const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
const ctx = browser.contexts()[0] ?? await browser.newContext();
const page = ctx.pages()[0] ?? await ctx.newPage();
await page.goto('https://www.allbirds.com/products/mens-wool-runners', { waitUntil: 'load' });
// → TimeoutError: page.goto: Timeout exceeded

puppeteer-core was unaffected because it uses a different protocol shape: puppeteer.connect() → browser.createBrowserContext() → context.newPage(), which sends Target.createBrowserContext + Target.createTarget (no sessionId) and then drives the new target on a real session_id assigned by doAttachtoTarget. It never sends a STARTUP-tagged work command.
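
For comparison, the Puppeteer flow described above might look like the following sketch. The puppeteer-core module is passed in as a parameter so the snippet stays self-contained; the function name openWithPuppeteer and the endpoint passed to it are assumptions for illustration, not taken from this PR.

```javascript
// Sketch of the Puppeteer flow: connect, create a browser context explicitly,
// then open a page in it. No work command is sent on sessionId="STARTUP".
// `puppeteer` is the puppeteer-core module, injected by the caller.
async function openWithPuppeteer(puppeteer, browserWSEndpoint, url) {
  const browser = await puppeteer.connect({ browserWSEndpoint });
  // Sends Target.createBrowserContext (no sessionId).
  const context = await browser.createBrowserContext();
  // Sends Target.createTarget; the page is driven on the real session it gets.
  const page = await context.newPage();
  const response = await page.goto(url, { waitUntil: 'load' });
  return { browser, page, status: response ? response.status() : null };
}
```

A caller would import puppeteer-core and pass it in along with the server's WebSocket endpoint and the URL to load.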

Root cause

Lightpanda's CDP server is built around the assumption "browser starts empty, driver creates everything." Target.setAutoAttach advertises a synthetic placeholder target with sessionId="STARTUP" so drivers don't block waiting for a real target to appear. The comment in setAutoAttach makes the assumption explicit: "Hopefully, the first thing they'll do is create a real BrowserContext and progress from there."

Playwright's chromium.connectOverCDP does the opposite. It assumes a real Chrome on the other end and auto-attaches to whatever target is advertised, then immediately drives that target. The dispatcher's dispatchStartupCommand:

fn dispatchStartupCommand(command: *Command, method: []const u8) !void {
    if (std.mem.eql(u8, method, "Page.getFrameTree")) {
        return dispatchCommand(command, method);
    }
    return command.sendResult(null, .{});  // ← {}-acks EVERYTHING else
}

silently drops Page.enable, Page.navigate, Network.enable, Runtime.enable, Page.setLifecycleEventsEnabled, Page.addScriptToEvaluateOnNewDocument, Emulation.*, etc. After Page.navigate, no Page.frameNavigated / lifecycleEvent / loadEventFired events ever fire and page.goto times out.
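
A minimal JavaScript model of that behavior may make the failure easier to see. This is not the Zig implementation: oldDispatchStartup, simulateNavigate, and the handler table are hypothetical stand-ins, and the "real" Page.navigate handler is reduced to pushing two event names.

```javascript
// Model of the pre-fix STARTUP path. `handlers` stands in for the normal
// dispatcher; only Page.getFrameTree ever reaches it.
function oldDispatchStartup(method, handlers) {
  if (method === 'Page.getFrameTree') {
    return handlers[method]();
  }
  return {}; // acknowledged, but no handler runs and no event is emitted
}

// Send Page.navigate through `dispatch` and collect whatever events result.
function simulateNavigate(dispatch) {
  const events = [];
  const handlers = {
    'Page.getFrameTree': () => ({ frameTree: {} }),
    // Stand-in for the real handler, which would load the page and emit events.
    'Page.navigate': () => {
      events.push('Page.frameNavigated', 'Page.loadEventFired');
      return { frameId: 'hypothetical-frame' };
    },
  };
  const result = dispatch('Page.navigate', handlers);
  return { result, events };
}
```

With oldDispatchStartup as the dispatcher, simulateNavigate returns an empty result and an empty event list: the driver sees a success-shaped reply and then waits for events that are never sent.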

Fix

Three coordinated changes:

1. Lazy-promote on first STARTUP work command. dispatchStartupCommand now creates a real BrowserContext + Target whose session_id is the literal string "STARTUP" the first time a STARTUP-tagged command needs real page state, then routes through the normal dispatcher. After promotion, isValidSessionId accepts "STARTUP" and every subsequent command flows through the standard handlers.

Target.* and Runtime.runIfWaitingForDebugger are explicitly held out: Puppeteer sends Target.setAutoAttach and Runtime.runIfWaitingForDebugger with sessionId="STARTUP" between puppeteer.connect() and its own Target.createBrowserContext. Promoting on those would steal the bc out from under Puppeteer's createBrowserContext (which would then error with "Cannot have more than one browser context at a time"). Silently {}-acking them, as before, keeps Puppeteer's flow intact.

Once a bc exists with session_id != "STARTUP" (i.e. a real session_id was assigned by createTarget + doAttachtoTarget), further STARTUP-tagged commands are rejected with -32001 "Unknown sessionId" rather than silently no-oped, since they're now stale.

2. Synthetic IDs match the real first frame / context. The synthetic targetId / browserContextId in setAutoAttach were "TID-STARTUP" / "BID-STARTUP". They are now "FID-0000000001" / "BID-1", the same strings the first real Target / BrowserContext will be assigned (Session.nextFrameId returns 1 first; the bc id generator returns BID-1 first). After promotion, every event Lightpanda emits (Page.frameNavigated, Runtime.executionContextCreated, etc.) carries IDs that line up with what the driver already recorded from the synthetic event. Page.getFrameTree's synthetic placeholder was updated to match (LID-STARTUP → LID-0000000001). Without this, Playwright's first getFrameTree response carried a different frame.id than the targetId it had just learned from Target.attachedToTarget, and Playwright marked the original main frame as detached; page.goto then errored synchronously with "Frame has been detached. (after 1ms)".

3. doAttachtoTarget reuses the literal "STARTUP" session_id when the synthetic STARTUP session was already advertised. A new flag cdp.startup_session_advertised is set when setAutoAttach emits the synthetic event, and consumed by doAttachtoTarget the next time it would otherwise generate a fresh session_id. Without this, Puppeteer's flow ended up with two Target.attachedToTarget events for the same targetId (one with sessionId="STARTUP", then another with sessionId="SID-1" after createTarget ran). Puppeteer treats those as two separate sessions and tries to initialize a Page over each; the STARTUP one then fails every command with "Unknown sessionId" because bc.session_id had been overwritten to SID-1.
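
The session_id choice in item 3 can be modeled roughly as follows. This is a JavaScript sketch under stated assumptions, not the Zig code: pickSessionId, the plain cdp / bc objects, and the nextSessionId generator are hypothetical names.

```javascript
// Model of how doAttachtoTarget picks a session_id after this change.
// `cdp` and `bc` are plain objects standing in for the server state;
// `nextSessionId` is a hypothetical generator for fresh ids such as "SID-1".
function pickSessionId(cdp, bc, nextSessionId) {
  if (cdp.startupSessionAdvertised && bc.sessionId === null) {
    cdp.startupSessionAdvertised = false; // the flag is consumed once
    bc.sessionId = 'STARTUP';
    // The driver already received Target.attachedToTarget for STARTUP,
    // so a second event for the same targetId is suppressed.
    return { sessionId: 'STARTUP', emitAttachedToTarget: false };
  }
  bc.sessionId = nextSessionId();
  return { sessionId: bc.sessionId, emitAttachedToTarget: true };
}
```

In this model the first attach after setAutoAttach reuses "STARTUP" without touching the generator, and any later attach falls back to a fresh id with its own Target.attachedToTarget event.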

A new helper promoteStartupSession in domains/target.zig mirrors the bootstrap portion of createTarget but skips the Target.targetCreated / Target.attachedToTarget events (the driver already received Target.attachedToTarget in setAutoAttach and a duplicate would re-trigger Playwright's confusion).
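
Putting the dispatch rules and the promotion together, a rough JavaScript model could look like the sketch below. It is not the Zig implementation: the function names, the state object, and the ID generators are hypothetical, the 10-digit zero padding is inferred from the "FID-0000000001" string, the order of the checks is an assumption, and Page.getFrameTree's placeholder handling is left out.

```javascript
// IDs advertised by the synthetic Target.attachedToTarget event.
const SYNTHETIC = { targetId: 'FID-0000000001', browserContextId: 'BID-1' };

// Hypothetical ID generators; the padding width is inferred, not confirmed.
function makeIdGenerators() {
  let frame = 0;
  let bc = 0;
  return {
    nextFrameId: () => 'FID-' + String(++frame).padStart(10, '0'),
    nextBrowserContextId: () => 'BID-' + (++bc),
  };
}

// Commands that must never trigger promotion (Puppeteer sends them early).
function isHeldOut(method) {
  return method.startsWith('Target.') || method === 'Runtime.runIfWaitingForDebugger';
}

// Decide what to do with a command tagged sessionId="STARTUP".
// `state.bc` is null until a browser context exists.
function dispatchStartup(method, state, ids) {
  if (state.bc !== null && state.bc.sessionId === 'STARTUP') {
    return { action: 'dispatch' }; // already promoted: normal handlers
  }
  if (state.bc !== null) {
    // The context belongs to another session (or to none): STARTUP is stale.
    return { action: 'reject', error: { code: -32001, message: 'Unknown sessionId' } };
  }
  if (isHeldOut(method)) {
    return { action: 'ack', result: {} }; // same silent ack as before the fix
  }
  // First command that needs page state: promote without emitting Target.* events.
  state.bc = {
    id: ids.nextBrowserContextId(),
    targetId: ids.nextFrameId(),
    sessionId: 'STARTUP',
  };
  return { action: 'dispatch' };
}
```

Because promotion draws the first values from the same generators, the promoted context ends up with exactly the IDs in SYNTHETIC, which is why the driver's earlier bookkeeping stays valid.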

Verification

  • playwright-core 1.59.1 chromium.connectOverCDP + page.goto on https://www.allbirds.com/products/mens-wool-runners now returns status=200 and fires framenavigated / domcontentloaded / load events. Server stays alive across 10 sequential runs.
  • puppeteer-core 24.42.0 puppeteer.connect() + createBrowserContext + newPage().goto() still works end-to-end and extracts the expected <title> and ~922 KB body.
  • 523/523 unit tests pass. The existing cdp: STARTUP sessionId test was rewritten to assert the new lazy-promote behavior, and two new tests cover the rejection paths (bc with non-STARTUP session_id, bc with no session_id at all).
  • Combined with #2398, "Defer page teardown while worker scripts are evaluating" (the worker re-entrancy crash fix), 10 sequential mixed Puppeteer / Playwright runs against the same lightpanda serve process gave 9 successes (status=200) and 1 Playwright timeout (network flake on the 10th run); the server stayed alive throughout.

Notes / out of scope

Playwright now navigates successfully but page.title() returns "" and page.evaluate(() => document.title) errors with "Execution context was destroyed, most likely because of a navigation." That's Lightpanda's Page.createIsolatedWorld / Runtime.executionContextCreated flow not re-binding Playwright's utility-world context after navigation — a separate gap, not fixed here. The user-facing page.goto path works.

Independent of #2398. Either order of landing is fine.

Lightpanda's CDP server advertises a synthetic STARTUP target +
session in setAutoAttach so drivers don't block waiting for a target
to appear. The dispatcher then blindly {}-acked everything that
arrived with sessionId="STARTUP" except Page.getFrameTree, on the
assumption the driver would call Target.createBrowserContext +
Target.createTarget itself before sending real work.

That assumption holds for puppeteer-core (it does call
createBrowserContext + createTarget on a real session_id) but breaks
playwright-core: chromium.connectOverCDP auto-attaches to whatever
target is advertised and immediately drives it (Page.enable,
Page.navigate, ...) on sessionId="STARTUP", never calling
createBrowserContext / createTarget itself. With the old dispatcher,
Page.navigate returned {} and no Page.frameNavigated / loadEventFired
events ever fired, so page.goto always ran into its timeout.

Reproducer: drive lightpanda serve with playwright-core's
chromium.connectOverCDP and call page.goto on any URL.

Fix:

  * dispatchStartupCommand now lazily promotes the synthetic STARTUP
    session into a real BrowserContext + Target whose session_id is
    the literal string "STARTUP" the first time a STARTUP-tagged
    command actually requires real page state, then routes through
    dispatchCommand normally. After promotion, isValidSessionId
    accepts "STARTUP" and every subsequent command flows through the
    standard handlers.

    Target.* and Runtime.runIfWaitingForDebugger are explicitly held
    out: Puppeteer sends Target.setAutoAttach and
    Runtime.runIfWaitingForDebugger with sessionId="STARTUP"
    *between* puppeteer.connect() and its own
    Target.createBrowserContext call. Promoting on those would steal
    the bc out from under Puppeteer's createBrowserContext (which
    would then error with "Cannot have more than one browser
    context at a time"). Silently {}-acking them, as before, keeps
    Puppeteer's flow intact.

    Once a bc exists with session_id != "STARTUP" (i.e. a real
    session_id was assigned by createTarget + doAttachtoTarget),
    further STARTUP-tagged commands are rejected with -32001
    "Unknown sessionId" rather than silently no-oped, since they're
    now stale.

  * promoteStartupSession (new in domains/target.zig) mirrors the
    bootstrap portion of createTarget but skips the
    Target.targetCreated / Target.attachedToTarget events: the driver
    already received Target.attachedToTarget in setAutoAttach and a
    duplicate would make Playwright think two sessions exist for one
    target and try to drive both.

  * The synthetic targetId / browserContextId in setAutoAttach were
    "TID-STARTUP" / "BID-STARTUP". They are now "FID-0000000001" /
    "BID-1" — the same strings the first real Target / BrowserContext
    will be assigned (Session.nextFrameId returns 1 first; the bc id
    generator returns BID-1 first). After promotion, every event
    Lightpanda emits (Page.frameNavigated, Runtime.executionContextCreated,
    etc.) carries IDs that line up with what the driver already
    recorded from the synthetic event. Page.getFrameTree's synthetic
    placeholder was updated to match (LID-STARTUP -> LID-0000000001).

  * doAttachtoTarget now reuses the literal string "STARTUP" as
    session_id (and suppresses the duplicate Target.attachedToTarget
    event) when setAutoAttach previously advertised the synthetic
    STARTUP session and bc.session_id is still null. This handles the
    Puppeteer flow: createTarget would otherwise generate a fresh
    SID-1 and emit a second attachedToTarget for the same targetId,
    causing Puppeteer to try to initialize Page state over both
    sessions in parallel; the STARTUP one would then fail every
    command with "Unknown sessionId" because bc.session_id had been
    overwritten to SID-1.

Verified:

  * playwright-core 1.59.1 chromium.connectOverCDP + page.goto on
    https://www.allbirds.com/products/mens-wool-runners now returns
    status=200 and fires framenavigated / domcontentloaded / load
    events. Server stays alive across 10 sequential runs.

  * puppeteer-core 24.42.0 puppeteer.connect() + createBrowserContext
    + newPage().goto() still works end-to-end and extracts the
    expected <title>. (Server then segfaults on disconnect from a
    separate worker re-entrancy bug fixed in lightpanda-io#2398.)

  * 523/523 unit tests pass; the existing "cdp: STARTUP sessionId"
    test was rewritten to assert the new lazy-promote behavior, and
    two new tests cover the rejection paths (bc with non-STARTUP
    session_id, bc with no session_id at all).
staylor commented May 10, 2026

krichprollsch self-requested a review May 11, 2026 09:03
krichprollsch (Member) commented

Hello @staylor, first thank you for the PR.

Do you have a concrete example where the current behavior blocks you, i.e. where you can't create the first browser context yourself?

While I agree w/ your initial analysis and the issue w/ clients trying to use the default browser context/target, I'm not sure about supporting lazy load on auto-attach.

As you mentioned, our idea is to offer an empty browser first and leave it to the client to create the new browser context and the new page itself.

Lazy load is smart, but complex since you have different paths depending on the client (which can change in the future).

And it doesn't really work in the case of a Playwright script using connectOverCDP + creating new browser context. It results in trying to create 2 successive BCs which is not currently possible. That's why e2e tests are currently failing.

So the main problem is we don't have clear paths between Playwright scripts using the default BC and the ones creating a clean new BC.

That's why we have to explain to Stagehand's users to create a new page while the official doc reuses the existing one.

    // Important: in the official documentation, Stagehand uses the default
    // existing page. But Lightpanda requires explicit page creation
    // instead.
    const page = await stagehand.context.newPage();

Even if we could create a default BC, we don't want to penalize all Playwright users by creating two if they don't use the default one.

That's why I'm in the position where I think it's better to ask users to create context manually instead of trying to use the default one.
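
For a Playwright script, the explicit creation recommended here might look like the sketch below. The function name openFreshPage is hypothetical, and playwright-core's chromium object is passed in as a parameter so the snippet stays self-contained; whether this path works depends on the server behavior discussed in this thread.

```javascript
// Sketch: connect over CDP and create a fresh context and page explicitly,
// rather than reusing browser.contexts()[0] / context.pages()[0].
// `chromium` is playwright-core's chromium object, injected by the caller.
async function openFreshPage(chromium, endpoint) {
  const browser = await chromium.connectOverCDP(endpoint);
  const context = await browser.newContext();
  const page = await context.newPage();
  return { browser, context, page };
}
```

This mirrors the Stagehand note above: the script never touches the default context or the existing page.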

But at least 1 thing could be improved: it would be better to return an explicit error than silently accept the command on STARTUP. But we have to choose the commands carefully, I don't want to break current support. Maybe we could safely display server-side warnings.

WDYT?

@krichprollsch krichprollsch self-assigned this May 11, 2026
gh-actions-shared Bot pushed a commit to xf-checkout/ai-panda-browser that referenced this pull request May 12, 2026
Complementary to LP.setSubframeLoading (preceding commit): exposes
the same iframe-skip behavior as a CLI option that applies to all
sessions in the process. Useful for:

  * the 'fetch' subcommand (no CDP driver to call LP.setSubframeLoading)
  * 'serve' deployments where the operator wants iframes off by
    default for every connecting client (the LP method can still
    re-enable per-session if needed)
  * Playwright's chromium.connectOverCDP, which can't reliably issue
    custom CDP methods on Lightpanda today: BrowserContext.newCDPSession
    and Browser.newBrowserCDPSession both attach a new CRSession that
    collides with the STARTUP-session reuse from lightpanda-io#2399, triggering a
    Playwright internal assertion. With --disable-subframes set on the
    server, Playwright doesn't need to issue any custom CDP; every
    session inherits subframes-off and the executionContextId churn
    from lightpanda-io#2400 never trips.

Verified:

  serve --disable-subframes + plain puppeteer-core goto
    [ok] goto status=200 elapsed=6354ms frameAttached=0

  fetch --disable-subframes --dump html https://www.allbirds.com/...
    exit=0
    html bytes: 1021562
    title: <title>Allbirds Wool Runners, Men's | ...</title>
    iframe count in dumped html: 2  (still in DOM, just not loaded)

521/521 unit tests pass.
staylor commented May 12, 2026

this is less important for-me-personally now, since I just shifted to using Puppeteer instead - I'll come back to this if someone needs to confirm Playwright behavior
