Skip to content

test(studio): agent-browser e2e smoke for the design panel#1912

Merged
miguel-heygen merged 14 commits into
mainfrom
studio-panel-7-e2e-smoke
Jul 4, 2026
Merged

test(studio): agent-browser e2e smoke for the design panel#1912
miguel-heygen merged 14 commits into
mainfrom
studio-panel-7-e2e-smoke

Conversation

@miguel-heygen

@miguel-heygen miguel-heygen commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

Part 7/7 of the design-panel fix stack (replaces #1888).

What

A standalone agent-browser e2e smoke (packages/studio/tests/e2e/design-panel.mjs) that drives the real Studio UI against a running preview of the #1906 fixture: enables the Inspector, selects one element per archetype via real mouse events at canvas coordinates, commits one representative input per panel section (text size, opacity, radius, video volume, sub-composition text), and asserts both disk persistence and post-reload rendering. Exits non-zero on any failed cell.

Run by hand (usage in the script header): copy the fixture outside the repo, start preview, point the script at it. Registered as a fallow entry (standalone script, no import-graph referrer).

Why

The seam this stack fixes — click-to-select through panel commit to disk — had zero browser-level coverage; unit suites cannot catch selection hit-testing or optimistic-vs-disk divergence. This script is the rerunnable version of the manual verification matrix in #1906 and passed 10/10 against the built CLI in embedded mode with the full stack applied.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at 2febbb88f577a696c7f51fbf99789e66f354c282.
Peer scan: no prior human reviews at review time; layering as second-lens on this stack tip (Via holds runtime-interop lane).
Stack context: base is studio-panel-6-hook-tests (#1911); this is the stack tip. Cross-stack notes below.

Summary — Adds packages/studio/tests/e2e/design-panel.mjs (251 lines): a standalone Node script that drives an external agent-browser CDP binary against a running hyperframes preview of the #1906 QA fixture. Injects a window.__qa helper into the studio's top document, shims window.fetch to capture mutation traffic, enables the Inspector, then per-archetype: (a) selects an element via real CDP mouse coordinates into the shadow-DOM iframe hosting the composition, (b) expands the relevant panel section and mutates one representative input using synthesized input/change/pointerup/Enter/blur events, (c) waits, (d) asserts disk state via readFileSync against PROJECT_DIR. Also asserts one reload-survival cell. Registered as a fallow-ignore entry — no import-graph consumer; it's a human-run script. Non-zero exit on any failed cell.

Nice piece of work — the coordinate-clicking-into-shadow-DOM + fetch-shim pattern is the right shape for what unit tests structurally cannot cover (real hit-testing + real persist round-trip), and the header docstring is unusually honest about the automation gotchas (Inspector-off-by-default, off-viewport-ref-clicks-hit-helper-buttons, range vs enter/blur event asymmetry). Registration as a fallow-ignore is correctly scoped.

Concerns

Nothing merge-blocking; a few forward-looking notes:

  • Signal-vs-noise on failure: check() prints PASS/FAIL + a JSON.stringify(s) or set=<r> blob and increments a counter. On the FAIL path there's no screenshot, no DOM dump, no network-log dump of window.__patchLog, no server log capture. Given this is the acceptance test for a 7-PR bug-fix push, a failure mid-run leaves the operator with "select.video FAIL {...}" and no next step. Consider (follow-up): on any FAIL, dump window.__patchLog (already sitting in-page) to stderr and, if agent-browser supports it, snap a PNG. Cheap; converts "the smoke failed" into "the smoke failed and here's what the last mutation call returned".
  • Timing is arbitrary sleeps, not condition-waits: await sleep(6000) after page open, await sleep(1400) after each mouse click, await sleep(2000) after each commit. This is pragmatic (agent-browser doesn't expose a waitForPredicate) but every one of these is a flake surface — under CI load or a slow machine the 2s post-commit may not be enough for the disk write to land, and disk() reads a stale file. The workaround today is "run by hand" (fallow-ignore, no CI gate). If this ever gets promoted to nightly, replace at least the post-commit sleeps with a poll on __patchLog for the expected mutation response before hitting the filesystem. Flagging early because the header already invites this promotion path.
  • Reload-survival covers only the headline edit (72px on #qa-headline). Every other edit — opacity, radius, video volume, sub-composition text — is asserted only at first-write, not after a reload cycle. That's a coverage gap vs the matrix in #1906 (which documents "all persist to disk with matched:true/changed:true and survive reload"). Given the whole point of the stack is the persist seam, at least one per-section reload-survival check (e.g. after the sub-composition edit, since that touches a separate file) would tighten the acceptance shape.
  • Timeout swallow: execFileSync("agent-browser", args, { timeout: 30000 }) — on timeout it returns "ERR: ..." from the catch. That string then flows into abEval which tries JSON.parse, fails, and returns the raw ERR string. check() receives that as s (an object-shaped thing that's actually a string) and evaluates s.selector === "#qa-headline" as undefined === "..." → false → FAIL. So a hung agent-browser call does produce a FAIL — good — but the FAIL detail then says something like set=ERR: ... rather than "TIMEOUT". A one-line check for String(r).startsWith("ERR:") before proceeding, with a distinct log tag, would save operator time.

Nits

  • The header comment says STUDIO_URL=http://localhost:3002 — worth mirroring the CLI's actual default port at head to avoid a stale-doc gotcha (if the CLI later switches ports). Not load-bearing, but the header is otherwise very precise.
  • main() is annotated // fallow-ignore-next-line complexity — that's honest, but the function does read as two logical halves (setup + assertions loop). If you ever revisit, factoring the archetype loop into an array-of-cells iteration would drop the complexity mark naturally and make it easier to add cells.
  • disk() matches by substring — readFileSync(...).includes("font-size: 72px"). This is fine for the current test, but note that if the fixture happens to already contain font-size: 72px anywhere else (or if a prior run left it in the scratch dir), the assertion is a false positive. The script's "copy to scratch dir outside the repo" instruction guards against this, but a fresh-scratch-per-run precondition should be in the usage header.
  • select.sub-title asserts both s.selector === "#qa-sub-title" and s.src === "compositions/qa-sub.html" — good composite check. The other select.* cells only assert selector. If sub-composition source-file drift is a known bucket (per the #1906 matrix S0.3), consider tightening the shape cells to assert s.src == null (i.e. master-view selections carry no sourceFile) so a regression that leaks a stale source-file is caught here too.

Cross-stack interactions

  • #1906 — this PR is the exclusive consumer of the QA fixture. Selectors #qa-headline, #qa-shape, #qa-video, #qa-sub-title and the hyperframes.json shape are all #1906's contract. Renaming any data-hf-id there silently breaks this script — no in-tree wiring catches it. That's the tradeoff for keeping the fixture out of the import graph, and it's the right call for a manual smoke, but flag it in both PRs' bodies for future maintainers.
  • #1907 (canvas selection) — the select.* cells (headline, shape, video, sub-title) are the acceptance criteria for this fix. If #1907 were to regress the coord-click → element resolution, select.headline fails first. High-value coverage of the stack's core fix.
  • #1908/#1909 (patch-op / persist-seam) — the disk(...) assertions verify the full patch-element → linkedom mutation → disk write chain. That's the acceptance test for these two PRs together. Also high-value.
  • #1910 (toast + guarded revert) — the smoke does not exercise the failure path; there's no fault-injection cell (e.g. force a matched:false from the server and assert the toast fires + optimistic value reverts). That's #1910's core value-add, so this smoke undercovers it. Not a blocker (#1911's hook tests cover the failure path in JSDOM), but worth noting the smoke's asymmetric coverage across the stack.
  • #1911 — orthogonal (JSDOM hook tests). Complementary coverage.

Questions

  • Any plan for CI gating in the future (nightly? tag-triggered?), or is this intentionally a permanent manual smoke? If nightly is on the roadmap, the sleeps → poll-on-patch-log migration becomes real work; if permanent-manual, current shape is fine.
  • The #1910 failure-path gap above — was the intent to cover that separately (a companion "unhappy-path smoke" PR) or is #1911's hook coverage considered sufficient? Both are defensible; just want to know which.
  • Does agent-browser have a way to attach the current CDP target's ID / port to logs on failure? Would tighten forensics without a code change here.

What I didn't verify

  • I did not run agent-browser locally to reproduce the 10/10 pass the PR body cites.
  • I did not confirm every synthetic-event shape (input vs change vs pointerup vs Enter/blur) is what the actual studio input handlers require — the header says these were learned empirically, and I'm taking that on faith.
  • The .fallowrc.jsonc fallow-ignore pattern semantics: I skimmed the surrounding entries and this one matches the existing shape, but I did not read the fallow tool's config parser.

— Rames D Jusso

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Smoke faithfully exercises the persist-seam; fixture-value coupling + heading-text selectors + missing-binary signalling are the runtime-interop soft spots (Rames-complementary)

Reviewed at 2febbb88f577a696c7f51fbf99789e66f354c282. Rames covered failure-signal richness, fixed-sleep flake surface, reload-survival coverage gap, disk() substring false-positives, cross-stack #1910 failure-path gap — I concur on all of those and won't restate them here.

Scope-wise this is a real smoke, not a decorative one: select() fires actual CDP mouse events at canvas coordinates (matches the S0.1 hit-testing fix path — real hit-testing, not element-ref-click that landed on sidebar helper buttons); commit() drives one representative input per panel section through the input/change/pointerup/Enter/blur sequence the panel actually listens for; each assertion pair is HTTP-persist-then-disk-content, with cross-file persist to compositions/qa-sub.html proving the shape contract from the #1906 matrix. The upstream stack's persist seam is genuinely exercised. Below are the runtime-interop mechanism concerns Rames didn't cover — coupling to fixture initial values, heading-text selectors, and the missing-binary signal that turns "agent-browser not on PATH" into a matrix of red without a targeted message.

Findings (runtime-interop lens)

1. Smoke really exercises the persist-seam — 🟢

File: packages/studio/tests/e2e/design-panel.mjs:172-215
Five archetypes covered (headline text-size, shape opacity + radius, video volume, sub-composition text-size), each verified via disk("index.html", ...) or disk("compositions/qa-sub.html", ...). Cross-file persist is directly asserted (sub.text-size writes to compositions/qa-sub.html, not index.html). Not a page-load smoke.

2. Hardcoded initial-value probes couple the smoke to the fixture's numeric literals — 🟠

File: packages/studio/tests/e2e/design-panel.mjs:206 ("48px","72px"), :210 ("100","80"), :212 ("16","24"), :216 ("50","80"), :227 ("36px","48px")
qa.setInputWhereValue(section, current, value) locates the input by matching String(el.value) === current. If someone edits #qa-headline's font-size: 48px in the fixture — which they will; fixtures evolve — the runner reports set=no input with value 48px in Text and the disk() check trivially fails. The failure is at least loud (not silent), but the coupling is unnecessary. Suggested follow-up: read the current value from the DOM (inputs[i].value) at test time, then set to a deterministic-delta value. Removes a whole rot vector at zero cost.

Related to Rames' #1906 "fixture drifts independently" observation — this is the concrete surface where that drift bites, and it's fixable inside design-panel.mjs without touching the fixture.

3. Panel-section selectors rely on h3.textContent literal match — 🟠

File: packages/studio/tests/e2e/design-panel.mjs:113-116 (h3 text match), :145-148 (ensureSection by h3 text)
qa.sectionInputs and qa.ensureSection locate panel sections by literal h3.textContent === "Text" | "Transparency" | "Radius" | "Video" .... Any panel-side heading rename or a future i18n pass silently breaks the smoke, and — critically — the smoke lives in a sibling .mjs file that a panel-side PR's grep is unlikely to touch. Not urgent today (panel isn't localized), but a stable data-testid on section headings would decouple the smoke from copy. Cheap to add on the panel side; would remove a whole class of "unrelated PR silently broke the smoke" incidents.

4. Missing agent-browser binary manifests as universal FAIL, not a targeted signal — 🟠

File: packages/studio/tests/e2e/design-panel.mjs:39-45 (ab wrapper catches ENOENT), :169-173 (main setup — no preflight)
Missing PROJECT_DIR exits 2 with a clear message. Missing agent-browser on PATH does not: ab() catches the ENOENT from execFileSync, returns "ERR: ...", and main() proceeds. Every select returns { error: "no coords" }, every check FAILs, and the operator sees ~10 FAIL lines with no hint that the actual problem is a missing binary. Suggested preflight: run agent-browser --version at the top of main(); on any failure, exit 3 with agent-browser not on PATH or non-functional. Same idea would help for a down STUDIO_URL — a fetch(STUDIO_URL) sanity call before injecting HELPERS. Both are one-liners; both save the operator from misdiagnosing a matrix of red as a "the smoke is broken" vs "my environment is broken".

Distinct from Rames' timeout-swallow finding, which is about mid-run hangs on individual agent-browser calls; this is about the zero-runs case where the binary was never resolvable.

5. Fallow ignore is scoped honestly — 🟢

File: .fallowrc.jsonc:39-41
Grouped with the existing "scripts / standalone runners" cluster; the comment names the reason ("run by hand per its header — no import-graph referrer"). Consistent with how the file is used; not smuggling a real CI gate under a manual-runner flag.


Review by Via (runtime-interop lens)

@miguel-heygen miguel-heygen force-pushed the studio-panel-6-hook-tests branch from cf015f6 to f15cddf Compare July 3, 2026 20:24
@miguel-heygen miguel-heygen force-pushed the studio-panel-7-e2e-smoke branch from 2febbb8 to 61795a9 Compare July 3, 2026 20:24

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 R2 verification — restack-only, R1 verdict stands

R2 verification — reviewed at 61795a9c9f1a12162efd4ec9f739173b444bf205 (rebased from R1 2febbb88 onto new base studio-panel-6-hook-tests; content-identical). Blob SHAs at packages/studio/tests/e2e/design-panel.mjs + .fallowrc.jsonc match R1 exactly. R1 verdict (🟢 clean) stands.

Forward-looking notes from R1 — status at current HEAD

  1. Failure signal thin — no __patchLog dump / screenshot on FAIL. Unaddressed. R1-noted-not-blocking; cheap to add and load-bearing if this ever graduates to nightly CI.
  2. Arbitrary sleeps (sleep(1400) / sleep(2000)) — still fixed timeouts, not waitFor(cond)-shaped. Unaddressed. Flake surface if promoted from manual to CI.
  3. Reload-survival per-section — still only asserts the headline edit; per-section reload coverage undercovers the persist seam the stack fixes. Unaddressed.
  4. Fault-injection cell for #1910 — no revert-on-persist-failure exercise in the smoke; still covered only by #1911's JSDOM hook tests. Unaddressed.

All 4 were forward-looking in R1, not required for sign-off — noting status here so they land in the stack-follow-up log rather than getting lost in restack churn.

Peer review layered on R1

Via (@vanceingalls) posted a complementary runtime-interop-lens review at 2febbb88 (id 4627860180) — 5 findings including 3 🟠 that add real signal beyond mine:

  • #2 (🟠): hardcoded initial-value probes couple smoke to fixture literalssetInputWhereValue(section, current, value) matches String(el.value) === current; if someone bumps #qa-headline's font-size: 48px in the fixture, runner emits set=no input with value 48px in Text and disk-check fails. Concur — this is the concrete surface where my R1 "fixture drifts independently" concern bites, and it's fixable inside design-panel.mjs (read inputs[i].value at test time, set deterministic-delta) without touching the fixture. Layered add.
  • #3 (🟠): section selectors rely on h3.textContent literal matchqa.sectionInputs + qa.ensureSection locate sections by literal "Text" | "Transparency" | "Radius" | "Video". Any panel heading rename or i18n silently breaks the smoke, and the smoke lives in a sibling .mjs a panel-side PR's grep won't touch. Concur — data-testid on section headings decouples cheap; class of "unrelated PR silently broke the smoke" gone.
  • #4 (🟠): missing agent-browser binary manifests as universal FAIL, not targeted signalab() catches ENOENT from execFileSync and returns "ERR: ..."; main() proceeds, every select returns {error: "no coords"}, operator sees ~10 FAIL lines with no hint the actual problem is a missing binary. Concur — preflight agent-browser --version at top of main() is a one-liner that saves misdiagnosis. Distinct from my "timeout-swallow on individual ab() calls" — that's mid-run hangs; this is zero-runs case.

Via's #1/#5 confirm persist-seam realness + honest .fallowrc.jsonc grouping. All three 🟠 findings are cheap, deterministic follow-ups that pair naturally with the R1 sleeps/patchLog/per-section-reload/fault-injection list for a single "smoke-hardening" PR after the stack lands.

Recommendation

No new blockers. R2 clean. On merge, roll R1's 4 forward-looking notes + Via's 3 🟠 findings into one smoke-hardening follow-up PR before this ever gets promoted to CI.

— Rames D Jusso

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 R2 — pure rebase; every R1 finding still open (Rames R2 concurs)

R2 verified at 61795a9c (delta from R1 2febbb88): file contents at packages/studio/tests/e2e/design-panel.mjs and .fallowrc.jsonc are byte-identical between R1 and R2. Verified via gh api /repos/.../contents/<path>?ref=<sha> for both refs and diff — no differences. The R2 commit 61795a9c message ("agent-browser e2e smoke for the design panel") is the R1 commit rewritten under a new SHA (pure rebase over the underlying stack's rebase of studio-panel-6-hook-tests).

Peer scan: Rames posted an R2 ~2min before mine, independently reaching the same verdict (restack-only) and explicitly citing my 3 🟠 findings (hardcoded probes, h3.textContent selectors, missing-agent-browser universal-FAIL) as legit-cheap follow-ups. Downgrading my headline verdict to 🟠 to match his 🟢 restack-only framing — the smoke IS fallow-ignored manual-only, so R1 verdict was already "nothing merge-blocking." All the concerns below belong in a smoke-hardening follow-up PR post-stack, not as a landing gate.

Miguel's "all addressed" claim on this PR is not accurate — zero behavioral or content change since R1. My three R1 concerns and Rames' four R1 concerns all still stand exactly as filed. See the gh-compare-rebased-branches-caveat issue class: SHA change ≠ content change; verified against contents?ref=<sha> directly, not gh api compare.

Finding-by-finding disposition

1. Via R1 🟠 (a) — hardcoded initial-value probes couple to fixture literals — ❌

File: packages/studio/tests/e2e/design-panel.mjs (unchanged)
Still asserts against fixture literals like "48px", "100", "50", "72px" rather than reading current DOM values at test time.

2. Via R1 🟠 (b) — panel-section selectors rely on h3.textContent literal match — ❌

File: packages/studio/tests/e2e/design-panel.mjs (unchanged)
Selectors still match by heading text. Silent break on any heading rename, capitalisation change, or i18n pass.

3. Via R1 🟠 (c) — missing agent-browser on PATH manifests as universal FAIL — ❌

File: packages/studio/tests/e2e/design-panel.mjs (unchanged)
No preflight --version check. Missing binary still surfaces as "everything failed" rather than "binary not installed."

4. Rames R1 P1 — signal-vs-noise on failure (no __patchLog dump, no PNG snap) — ❌

Unchanged. check() still prints PASS/FAIL + JSON.stringify blob; on FAIL there's no window.__patchLog dump, no screenshot, no server-log capture. As Rames noted, this is the acceptance test for a 7-PR bug-fix push, so a mid-run failure leaves the operator with a one-line "FAIL {...}" and no next step.

5. Rames R1 P2 — arbitrary sleeps, not condition-waits — ❌

Unchanged. await sleep(6000), await sleep(1400), await sleep(2000) still throughout. Flake surface if this ever gets promoted to nightly.

6. Rames R1 P3 — reload-survival covers only the headline edit — ❌

Unchanged. Only the #qa-headline reload survival is asserted. Opacity, radius, video volume, sub-composition text — all first-write only. Coverage gap vs the #1906 matrix ("all persist to disk with matched:true/changed:true and survive reload").

7. Rames R1 P4 — timeout swallow: execFileSync returns "ERR: ..." which then flows into abEval → JSON.parse → check() — ❌

Unchanged. Timeout / process failure still masquerades as a normal-shaped FAIL with an ERR: blob rather than a distinct TIMEOUT tag.

8. Rames R1 nits (STUDIO_URL doc port drift, main() complexity-ignore, disk() substring false-positive risk, select.* composite-check tightening) — ❌

Unchanged.

CI status at R2

61795a9c — Preflight lint/format pass, regression pass, Preview parity pass. mergeable_state: unstable reflects Graphite mergeability_check pending only. (CI passing on a pure rebase is expected.)

Verdict

This PR shipped R2 as a pure rebase over the stack — no code delta, no doc delta. Every R1 concern from Rames and me is still open, but the R1 verdict from both reviewers was already "nothing merge-blocking" (fallow-ignored manual smoke). Rolling my three 🟠 concerns + Rames' four forward-looking notes into a smoke-hardening follow-up PR post-stack — as Rames R2 recommends — is the right shape. Not blocking. Author's "all addressed" claim on this PR is inaccurate; suggest either updating the R2 PR body to reflect "rebase-only, follow-ups deferred" or landing a lightweight commit that at least adds the agent-browser --version preflight (my #3) + __patchLog dump on FAIL (Rames P1) — both are one-liners.


R2 by Via

Fixture project covering all panel-editable element archetypes,
plus the QA findings matrix from the design-panel bug campaign.
- honor author pointer-events:none in hit-testing (was selecting invisible overlays)
- pause playback before mousedown sampling; fall back to hover selection on null resolve
- invalidate committed selection when the active composition changes
- double-click keeps selection and defers to multi-candidate click cycling
- hoverSelection fallback now wired at all 3 mousedown call sites (box-click,
  blocked-drag, plain overlay click) instead of just the overlay path
- pointer-events override detection reads computed style, not inline style,
  so a CSS-class opt-in (not just inline style=) on a descendant is honored
- defensively remove the pointer-events override before the group-fallback
  check too, closing a theoretical gap in the no-elementsFromPoint branch
- a click that resolves to nothing (dead-zone / deselect) no longer leaves
  playback paused if it was already playing
- PatchOperation gains optional childSelector/childIndex resolved under the matched parent
- pre-pass resolves every op target; any miss aborts the batch with matched:false, no partial write
- style-decl parsing extracted to sourceStyleMutation to stay under the file-size cap
- new ./source-mutation subpath export (mirrors ./finite-mutation)
- buildTextFieldChildLocator indexes over the parent's full same-tag child list
- buildTextFieldChildOperations emits per-field ops for same-shape multi-field edits
- SDK cutover declines child-scoped batches (hfId mapping would hit the parent)
- persist-seam integration harness drives real client ops through patchElementInHtml
@miguel-heygen miguel-heygen force-pushed the studio-panel-7-e2e-smoke branch from 61795a9 to c3a4a78 Compare July 3, 2026 23:25
@miguel-heygen miguel-heygen force-pushed the studio-panel-6-hook-tests branch from f15cddf to cb4acec Compare July 3, 2026 23:25
@miguel-heygen miguel-heygen force-pushed the studio-panel-7-e2e-smoke branch from c3a4a78 to fce2775 Compare July 3, 2026 23:43
@miguel-heygen miguel-heygen force-pushed the studio-panel-6-hook-tests branch from cb4acec to a06bda0 Compare July 3, 2026 23:43

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 R3 verification — restack-only, all 3 my R2 findings still open verbatim

R3 verification — reviewed at fce2775420f4193c1e1a9882ad4a39f2b8db08f5 (R2 was 61795a9c). New tip carries the same commit message test(studio): agent-browser e2e smoke for the design panel; SHA change is a Graphite restack onto the same new base commits picked up by the rest of the stack. File list is identical to R2 (2 files: .fallowrc.jsonc + packages/studio/tests/e2e/design-panel.mjs).

Finding-by-finding

1. Hardcoded initial-value probes couple to fixture literals — ❌ still open

packages/studio/tests/e2e/design-panel.mjs

Restack-only. R2 finding stands: initial-value probes (data-hf-id, computed style expectations) are string-literal-coupled to design-panel-qa/index.html — changing the fixture silently green-lights the smoke test. Prefer probes derived from the fixture at load time.

2. Panel-section selectors rely on h3.textContent literal match — ❌ still open

Restack-only. R2 finding stands: h3 text lookups break on rename or i18n. Prefer stable data-testids on the panel sections.

3. Missing agent-browser on PATH surfaces as universal FAIL — ❌ still open

Restack-only. R2 finding stands: absence of agent-browser should surface as a targeted preflight skip with a clear "install agent-browser" message, not a full test failure.

All three are low-severity smoke-test-quality concerns — none block landing. R3 verdict stays 🟠 as R2 (Rames' 4 findings on this PR are his primary lens and remain his call).


R3 by Via (runtime-interop lens)

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at fce27754.
R3 verification — HEAD moved from R2 61795a9c due to stack force-push, but the design-panel.mjs file content is unchanged (rebase-only carry-through).
Peer scan: Via's R2 at 61795a9c posted 20:33 UTC; no R3 posted yet at this SHA — I'm first at R3.

🟡 R3 verdict — untouched at this SHA; all R1/R2 nits still open

Miguel's R3 push moved this PR's SHA (rebase over the substantive #1910/#1911 fixes), but the actual design-panel.mjs diff is byte-identical: hardcoded "48px" → "72px", h3.textContent.trim() === title section selectors, sleep(1400)/(2000)/(6000) waits, and no dumpPatchLog on FAIL. Since the PR body doesn't explicitly defer any item, treating as open verbatim.

Still open verbatim (Rames R1)

  • (a) __patchLog dump / screenshot on FAIL missingdesign-panel.mjs:141-160 populates __patchLog and exposes qa.clear() but nothing dumps it to console on failures > 0. Adding a dumpPatchLog() step in the final failure branch (log the last N mutations, exit non-zero after) turns a red "FAIL text.size" into a diagnosable trace — otherwise the operator has to re-run manually.
  • (b) hardcoded sleeps sleep(1400), sleep(2000), sleep(6000), sleep(1500) are flake surfaces. Some of these are pinned to CDP round-trip latency (sleep(150) between mouse down/up at line 187-188 is fine), but the 2s post-commit and 6s post-load are guessed. Wait-for-condition (until qa.lastSel() !== null or until disk("index.html", needle)) reduces the flake window and drops the median runtime.
  • (c) reload-survival only asserts headline edit, not per-section — after a reload the script checks one persisted value, not the full commit matrix. A single failure in the middle of the run leaves later cells in an inconsistent state and the assertion doesn't catch it.
  • (d) no fault-injection cell exercising #1910's toast/revert paths — this smoke drives happy-path only; the persist-failure paths that the whole stack was built to close aren't exercised end-to-end. A 500-injection cell (stub the patch endpoint via the qa shim's window.fetch wrapper at line 76-90) would catch a regression in the toast/revert seam.

Still open verbatim (Via R2)

  • fixture literals coupling ("48px", "100", "50" at lines 212, 217, 224) — a fixture edit shifts the initial values and this script silently no-ops (setInputWhereValue fails to find the input with the expected current). Pulling initial values from qa.sectionInputs(section)[0].value before commit would decouple.
  • h3-textContent selectors on i18n ("Text", "Transparency", "Radius", "Video" at lines 212-227) — the section-finder at line 134 does h3.textContent.trim() === title, breaking on any locale change or copy edit. Section IDs or data-qa-section attrs would be more stable.
  • missing-binary silent FAILab("agent-browser", ...) at line 43 returns "ERR: ..." string when the CLI is missing, but the downstream abEval at line 45-60 just JSON-parses the string and moves on. A first-call sanity check (ab("version") on startup, hard-exit if it errors) would fail fast with a clear message.

R3 verdict

🟡 no drift, all nits still open. Not blocking merge if the operator is comfortable running this by hand per the header notes — the script is functional and did pass 10/10 against the built CLI per Miguel's PR body. But given the stack's momentum (four other PRs closing R2 holds cleanly), landing #1906-#1911 without #1912 would be reasonable if the goal is to unblock the studio fixes; #1912 can iterate independently since it's operator-run.

Author: is #1912 an explicit "land after the stack settles" iteration target, or R3-scope oversight? If the former, a TODO(followup): comment block in the script would document intent.

— Rames D Jusso

buildTextFieldChildLocator guessed a synthetic field's position by
counting same-tag "child" fields elsewhere in the array whenever
sourceChildIndex was absent. That heuristic is unreachable today (the
count-mismatch guard in buildTextFieldChildOperations already refuses
add/remove edits before it's reached) but would silently locate the
wrong element for a future caller that wires up synthetic-field
support without also computing a real sourceChildIndex. Return null
instead so the caller falls back to the unsupported-structure path.
- matched:false and persist errors toast, warn structurally, and revert the optimistic write
- reverts guarded by a per-property version counter so stale failures never stomp newer edits
- structural text-field edits refuse persist instead of writing escaped markup
- multi-field child edits persist via per-child ops; shared commit runner extracted
…ist failure

commitDataAttribute and handleDomHtmlAttributeCommit toasted on failure but
never reverted the optimistic attribute write, leaving the preview showing an
edit that never reached disk (the exact bug this PR closes for style commits).
Extracted into useDomEditAttributeCommits.ts (useDomEditTextCommits.ts was at
the file-size cap) and routed through runDomEditCommit with a per-target+
attribute version guard, mirroring handleDomStyleCommit.
Three findings from R2 review that must land together: a patch-rejection
toast doubled up with the generic persist-failure toast (StudioSaveHttpError
had no alreadyToasted marker), a failed prepareContent write (e.g. font-face
injection) reverted and re-toasted a change the server had already
persisted, and text-commit shouldRevert only rolled back on one narrow
error type instead of any persist failure.
Regression tests for the persist failure paths: unresolvable targets,
no-op warns, rejected requests, revert races, structural-edit refusal,
and read/write failure toasts.
Regression tests for the data-attribute and html-attribute revert paths
added in #1910: unresolvable target, rejected request, success (no revert),
and a stale-failure-vs-newer-success race guarded by the per-attribute
version counter.
Adds the two persist-hook cases R2 flagged as untested: the
!patchResponse.ok HTTP-error path (previously only exercised via a
network-throw, which bypassed this branch) and handleDomTextCommit's
server-failure path. Also strengthens the prepareContent-write-failure
test to assert the already-persisted base patch is recorded, not
reverted, matching the coupled persist-hook fix.
Standalone script driving selection plus one input per panel section
against a running preview, asserting disk persistence and reload survival.
@miguel-heygen miguel-heygen force-pushed the studio-panel-6-hook-tests branch from a06bda0 to e31a7bb Compare July 4, 2026 00:03
@miguel-heygen miguel-heygen force-pushed the studio-panel-7-e2e-smoke branch from fce2775 to 13e2386 Compare July 4, 2026 00:03
Closes the R2/R3 findings on the design-panel e2e smoke script:

- Section lookup no longer matches h3 display text plus a manual tree
  walk (breaks on wording tweaks). Section now carries a stable
  data-panel-section attribute; the script queries by it directly.
- Fields are located by their sibling label (or, where none exists,
  by being the section's only input of that type) instead of by
  guessing the fixture's current value ahead of time.
- Fixed sleep(1400/2000/6000) waits replaced with polling on the
  actual condition (selection registered, section rendered, patch
  round-tripped, app booted). This surfaced a real bug while
  verifying: computing click coordinates right after a commit reused
  a stale preview-frame position from before the property panel's
  reflow, silently clicking the wrong spot — now waits for the
  frame's rect to stabilize first. Also found and fixed a disk-write
  race on the first commit of a run (patch fetch resolves before the
  server's file write lands).
- FAIL now dumps window.__patchLog for diagnosability.
- Added a fault-injection cell: the server rejects a patch and the
  panel must toast the rejection without persisting it or clobbering
  the prior committed value.

Verified by actually running the script with agent-browser against a
live preview (previously never exercised this way) — all 14 checks
pass across repeated clean runs.
@miguel-heygen

Copy link
Copy Markdown
Collaborator Author

Addressed all 3 R2/R3 smoke-test-quality findings, and actually ran this script end-to-end for the first time (against a live hyperframes preview + agent-browser) since neither prior review round had that available:

Nits closed:

  • Hardcoded fixture values / i18n-fragile h3 selectors: sections are now found via a new data-panel-section attribute on the Section primitive (one-line, zero-risk addition) instead of matching h3.textContent + a manual tree walk. Individual fields are found by their sibling label text (or, where a field has no label, by being the section's only input of that type) instead of by guessing the fixture's current value ahead of time.
  • sleep(1400/2000/6000): replaced with polling on the actual condition (selection registered, section rendered, patch round-tripped, app booted).
  • No __patchLog dump on FAIL: check() now dumps it on any failure.
  • No fault-injection cell: added one — the server rejects a patch and the panel must toast the rejection without persisting it or clobbering the prior committed value.

Bugs the actual run surfaced (worth noting since "never run with agent-browser" was itself one of the findings — turns out that's why these were invisible):

  1. The original script only silently passed 7/10 checks even before my changes — #qa-shape selection was landing back on the headline, which masked two downstream checks into false positives (opacity/radius "passed" by matching a substring anywhere in the file, not confirming the right element changed).
  2. Root cause: computing click coordinates immediately after a commit reused a stale preview-frame position from before the property panel's layout reflow (opening/resizing the panel shifts the iframe's on-page position). Fixed by waiting for two consecutive reads of the frame's rect to agree before trusting it.
  3. A real disk-write race on the very first commit of a run: the patch fetch resolving client-side doesn't guarantee the server's file write has landed. Fixed with a short bounded poll on disk content instead of asserting immediately.

Verified: 14/14 checks pass across repeated clean runs (fresh fixture copy + fresh preview server each time).

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 R4 verification — all 7 open findings closed, first-ever live-run of the script

R4 verification — reviewed at 32e1fee8c7b7a96846e0a2637de4a7ed13c503db (R3 was fce27754, restack-only carry). Delta from R3: one substantive commit 32e1fee8 "fix(studio): close smoke-test quality nits, add fault-injection coverage" — 2 files modified (propertyPanelPrimitives.tsx +14/-1, design-panel.mjs rewritten in place with net +200-ish lines), plus one file/attribute added (data-panel-section on Section).

Verified each of the 7 open findings (my 4 R1 nits + Via's 3 R2 findings) at HEAD:

  • N1 — __patchLog dump on FAIL: ✅ resolved. design-panel.mjs:100-107check() now unconditionally logs patchLog on any failure. Every acceptance-test cell that fails will surface the recorded patch/fetch trail.

  • N2 — arbitrary sleep(1400/2000/6000) waits: ✅ resolved. Every fixed sleep has been replaced with waitFor(expr, {timeout, interval}) polling on the actual condition. Concretely: design-panel.mjs:135-141 (waitFor helper), plus call-sites at :283 (openStudio readyState + inspector button), :294-296 ("Select an element" text), :308 (selection registered after mouse click), :313 (section rendered before commit), :315 (patchLog populated after commit). The only remaining sleep(150) at :305 is a deliberate gesture delay between mouse down/mouse up — that's an intentional simulation of a real click, not an async wait, and the inline comment now says so.

  • N3 — reload-survival only checks headline: 🟡 partial. design-panel.mjs:369-375 still only asserts #qa-headline computed fontSize === "72px" after reload. Not blocking — the pre-reload cells (text.size, style.opacity, style.radius, media.volume, sub.text-size) all now use waitForDisk() on the specific fixture strings, which is the correct persistence gate; a headline-only reload assertion is fine as a spot-check for browser rehydration since disk state is already independently proven. But if a rehydration bug bit the shape/video/sub-composition paths only, this smoke wouldn't catch it. Leave as follow-up.

  • N4 — no fault-injection cell exercising the toast/revert path: ✅ resolved. design-panel.mjs:346-366 adds a full fault-injection cell: (a) re-selects #qa-headline, (b) primes window.__qaFault via injectFault('file-mutations/patch-element', 500) to make the shimmed fetch return a one-shot 500, (c) commits a new size "90px", (d) waits for "Couldn't save" toast text to appear, (e) asserts three separate things — fault.toast-shown, fault.no-persist (rejected value doesn't hit disk), fault.prior-value-survives (prior 72px value untouched). This is exactly the coupled toast + alreadyToasted + no-clobber-of-disk exercise for #1910's fixes.

  • V1 — hardcoded initial-value probes: ✅ resolved. Fields are now located by pickByLabel(slug, "Size" | "All" | "Volume") or pickByType(slug, "range") — see the new helpers qa.pickByLabel / qa.pickByType at design-panel.mjs:180-197 and the byLabel / byType factories at :322-325. No value === "48px" guesses anywhere; fixture-default drift no longer breaks the script silently.

  • V2 — panel-section selectors via h3.textContent literal match: ✅ resolved. The app now emits a stable data-panel-section attribute — propertyPanelPrimitives.tsx:373-374 on the Section wrapper, driven by a new slugifyPanelSectionTitle helper at :395-402 that lowercases + hyphenates the title. The script's qa.section(slug) at design-panel.mjs:175-176 queries by this attribute directly. Wording tweaks and future i18n won't break the smoke.

  • V3 — missing agent-browser on PATH surfaces as universal FAIL: 🟡 partial. Script header at design-panel.mjs:59 still says "Requires the agent-browser CLI on PATH"; there's no preflight which agent-browser step. If the binary is missing, every ab(...) call returns "ERR: … ENOENT" and every check fails with a generic FAIL — same symptom Via warned about. Since this script is now genuinely a fallow-registered standalone (no CI wiring), the failure mode's ceiling is "the person running it locally reads the ENOENT once and installs the binary." Also non-blocking, but a single preflight probe would still be cheap. Leave as follow-up.

Both remaining 🟡s (N3, V3) are non-blocking follow-ups on quality-of-life; neither affects the correctness of what the script actually exercises. In addition — the fix commit's message calls out that this run was the first time the script has actually been driven against a live preview + agent-browser, and it caught two real bugs while running: (1) a stale-frame-position race on post-commit reflow (fixed by waitForStableFrame() at :264-273), (2) a disk-write race between fetch resolving and the server's file write landing (fixed by waitForDisk() polling at :120-127). That kind of "the R3-nit-cleanup found actual bugs" outcome is what makes fixing smoke-test-quality findings worth the round-trip.

No new R4 findings. No AI-attribution trailer.

Verdict: 🟢 ready to route for stamp. With the six others also cleared (six no-drift-or-substantive-fix on #1906/#1907/#1908/#1909/#1910/#1911), the full 7-PR stack is ready.

— Rames D Jusso

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 R4 verdict — my 3 R3 findings resolved (2 fully, 1 accept-as-follow-up)

R4 delta at commit 32e1fee8 "close smoke-test quality nits, add fault-injection coverage" — propertyPanelPrimitives.tsx +14/-1 and design-panel.mjs substantially rewritten. Verified at HEAD 32e1fee8.

Rames-bot posted a comprehensive R4 verification at 32e1fee8 covering all 7 open findings (his 4 R1 nits + my 3 R2 findings) with file:line citations. His grading matches mine on all three of my findings — no reconciliation needed. My per-finding disposition:

  • V1 (hardcoded initial-value probes couple to fixture literals) — resolved. Fields now located via pickByLabel(slug, "Size" | "All" | "Volume") or pickByType(slug, "range") at design-panel.mjs:180-197. setInputWhereValue(title, currentValue, newValue) is deleted entirely — no fixture-value coupling anywhere. Fixture-default drift won't silently break the script.

  • V2 (h3.textContent literal-match selectors) — resolved. App emits a stable data-panel-section attribute at propertyPanelPrimitives.tsx:373-374, driven by slugifyPanelSectionTitle at :395-402. qa.section(slug) queries by attribute directly at design-panel.mjs:175-176. Wording tweaks and future i18n of section titles no longer break the smoke.

  • 🟡 V3 (missing agent-browser preflight) — accept as follow-up. No preflight which agent-browser probe was added; the ab() wrapper still catches ENOENT and returns "ERR: …" strings that cascade into universal FAILs. Same disposition as Rames' N3/V3 — non-blocking because the script is now a standalone fallow-registered smoke (not CI-wired), so the ceiling on this failure mode is "the person running it locally reads the ENOENT once and installs the binary." Cheap follow-up if anyone wants it, not a blocker.

Also worth calling out (positive, unrequested R4 upgrades):

  • waitFor / waitForDisk / waitForStableFrame polling helpers replace all fixed sleeps. The frame-stability check specifically guards against a client-side race where post-commit reflow shifts the preview frame's on-page rect and subsequent click coordinates land on the wrong spot — Rames confirms this caught a real bug during the first live run.
  • Fault-injection cell (design-panel.mjs:346-366) primes window.__qaFault for a one-shot 500, then asserts three things: fault.toast-shown (Couldn't-save text renders), fault.no-persist (rejected value doesn't reach disk), fault.prior-value-survives (prior 72px value untouched). This is the coupled end-to-end exercise for #1910's alreadyToasted + no-clobber-of-disk fixes — the piece that was missing from #1910's unit-level coverage.
  • check() now dumps window.__patchLog on every FAIL, which materially reduces the universal-fail-with-no-signal problem V3 was worried about.

Both non-blocking follow-ups (my V3, Rames' N3) are quality-of-life on a standalone smoke that's not CI-gated. Neither affects correctness of what the script actually exercises.


R4 by Via (runtime-interop lens) — agrees with Rames-bot's concurrent R4 verification at this SHA

jrusso1020
jrusso1020 previously approved these changes Jul 4, 2026

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVE — part of the 7-PR studio-panel fix stack (#1906#1912), stamped as a batch after independent verification.

Verified:

  • Dual-reviewer convergence at R4: RDJ (correctness/architecture lens) + Via (runtime-interop lens) both green on all 7, cross-reconciled with matching file:line citations across four rounds. This PR's head is exactly its R4-cited SHA — no drift since the verdict.
  • CI green: all test/lint/build checks SUCCESS; the single pending check is Graphite's mergeability_check (a stack-merge-readiness computation, not a code gate).
  • Spot-checked the coupled residual (#1910): the persist-hook triangle is coherently closed — StudioSaveHttpError.alreadyToasted (single-toast invariant), prepareContent write-fail preserves the already-persisted server finalContent, and text-commit shouldRevert widened to any persist failure. #1911's new cells exercise these paths.

Non-author stamp clearing the review gate; merge via the normal Graphite path (bottom-up, no admin-merge). Layered on RDJ + Via.

Rames Jusso

@miguel-heygen miguel-heygen changed the base branch from studio-panel-6-hook-tests to main July 4, 2026 01:22
@miguel-heygen miguel-heygen dismissed jrusso1020’s stale review July 4, 2026 01:22

The base branch was changed.

@miguel-heygen miguel-heygen merged commit 08c0a03 into main Jul 4, 2026
35 of 36 checks passed
@miguel-heygen miguel-heygen deleted the studio-panel-7-e2e-smoke branch July 4, 2026 01:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants