test(studio): agent-browser e2e smoke for the design panel#1912
Conversation
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Reviewed at 2febbb88f577a696c7f51fbf99789e66f354c282.
Peer scan: no prior human reviews at review time; layering as second-lens on this stack tip (Via holds runtime-interop lane).
Stack context: base is studio-panel-6-hook-tests (#1911); this is the stack tip. Cross-stack notes below.
Summary — Adds packages/studio/tests/e2e/design-panel.mjs (251 lines): a standalone Node script that drives an external agent-browser CDP binary against a running hyperframes preview of the #1906 QA fixture. Injects a window.__qa helper into the studio's top document, shims window.fetch to capture mutation traffic, enables the Inspector, then per-archetype: (a) selects an element via real CDP mouse coordinates into the shadow-DOM iframe hosting the composition, (b) expands the relevant panel section and mutates one representative input using synthesized input/change/pointerup/Enter/blur events, (c) waits, (d) asserts disk state via readFileSync against PROJECT_DIR. Also asserts one reload-survival cell. Registered as a fallow-ignore entry — no import-graph consumer; it's a human-run script. Non-zero exit on any failed cell.
Nice piece of work — the coordinate-clicking-into-shadow-DOM + fetch-shim pattern is the right shape for what unit tests structurally cannot cover (real hit-testing + real persist round-trip), and the header docstring is unusually honest about the automation gotchas (Inspector-off-by-default, off-viewport-ref-clicks-hit-helper-buttons, range vs enter/blur event asymmetry). Registration as a fallow-ignore is correctly scoped.
Concerns
Nothing merge-blocking; a few forward-looking notes:
- Signal-vs-noise on failure:
check()printsPASS/FAIL+ aJSON.stringify(s)orset=<r>blob and increments a counter. On the FAIL path there's no screenshot, no DOM dump, no network-log dump ofwindow.__patchLog, no server log capture. Given this is the acceptance test for a 7-PR bug-fix push, a failure mid-run leaves the operator with "select.video FAIL {...}" and no next step. Consider (follow-up): on any FAIL, dumpwindow.__patchLog(already sitting in-page) to stderr and, ifagent-browsersupports it, snap a PNG. Cheap; converts "the smoke failed" into "the smoke failed and here's what the last mutation call returned". - Timing is arbitrary sleeps, not condition-waits:
await sleep(6000)after page open,await sleep(1400)after each mouse click,await sleep(2000)after each commit. This is pragmatic (agent-browser doesn't expose awaitForPredicate) but every one of these is a flake surface — under CI load or a slow machine the 2s post-commit may not be enough for the disk write to land, anddisk()reads a stale file. The workaround today is "run by hand" (fallow-ignore, no CI gate). If this ever gets promoted to nightly, replace at least the post-commit sleeps with a poll on__patchLogfor the expected mutation response before hitting the filesystem. Flagging early because the header already invites this promotion path. - Reload-survival covers only the headline edit (
72pxon#qa-headline). Every other edit — opacity, radius, video volume, sub-composition text — is asserted only at first-write, not after a reload cycle. That's a coverage gap vs the matrix in#1906(which documents "all persist to disk with matched:true/changed:true and survive reload"). Given the whole point of the stack is the persist seam, at least one per-section reload-survival check (e.g. after the sub-composition edit, since that touches a separate file) would tighten the acceptance shape. - Timeout swallow:
execFileSync("agent-browser", args, { timeout: 30000 })— on timeout it returns"ERR: ..."from thecatch. That string then flows intoabEvalwhich triesJSON.parse, fails, and returns the raw ERR string.check()receives that ass(an object-shaped thing that's actually a string) and evaluatess.selector === "#qa-headline"asundefined === "..." → false → FAIL. So a hung agent-browser call does produce a FAIL — good — but the FAIL detail then says something likeset=ERR: ...rather than "TIMEOUT". A one-line check forString(r).startsWith("ERR:")before proceeding, with a distinct log tag, would save operator time.
Nits
- The header comment says
STUDIO_URL=http://localhost:3002— worth mirroring the CLI's actual default port at head to avoid a stale-doc gotcha (if the CLI later switches ports). Not load-bearing, but the header is otherwise very precise. main()is annotated// fallow-ignore-next-line complexity— that's honest, but the function does read as two logical halves (setup + assertions loop). If you ever revisit, factoring the archetype loop into an array-of-cells iteration would drop the complexity mark naturally and make it easier to add cells.disk()matches by substring —readFileSync(...).includes("font-size: 72px"). This is fine for the current test, but note that if the fixture happens to already containfont-size: 72pxanywhere else (or if a prior run left it in the scratch dir), the assertion is a false positive. The script's "copy to scratch dir outside the repo" instruction guards against this, but a fresh-scratch-per-run precondition should be in the usage header.select.sub-titleasserts boths.selector === "#qa-sub-title"ands.src === "compositions/qa-sub.html"— good composite check. The otherselect.*cells only assertselector. If sub-composition source-file drift is a known bucket (per the#1906matrix S0.3), consider tightening the shape cells to asserts.src == null(i.e. master-view selections carry nosourceFile) so a regression that leaks a stale source-file is caught here too.
Cross-stack interactions
#1906— this PR is the exclusive consumer of the QA fixture. Selectors#qa-headline,#qa-shape,#qa-video,#qa-sub-titleand thehyperframes.jsonshape are all#1906's contract. Renaming anydata-hf-idthere silently breaks this script — no in-tree wiring catches it. That's the tradeoff for keeping the fixture out of the import graph, and it's the right call for a manual smoke, but flag it in both PRs' bodies for future maintainers.#1907(canvas selection) — theselect.*cells (headline, shape, video, sub-title) are the acceptance criteria for this fix. If#1907were to regress the coord-click → element resolution,select.headlinefails first. High-value coverage of the stack's core fix.#1908/#1909(patch-op / persist-seam) — thedisk(...)assertions verify the fullpatch-element→ linkedom mutation → disk write chain. That's the acceptance test for these two PRs together. Also high-value.#1910(toast + guarded revert) — the smoke does not exercise the failure path; there's no fault-injection cell (e.g. force amatched:falsefrom the server and assert the toast fires + optimistic value reverts). That's #1910's core value-add, so this smoke undercovers it. Not a blocker (#1911's hook tests cover the failure path in JSDOM), but worth noting the smoke's asymmetric coverage across the stack.#1911— orthogonal (JSDOM hook tests). Complementary coverage.
Questions
- Any plan for CI gating in the future (nightly? tag-triggered?), or is this intentionally a permanent manual smoke? If nightly is on the roadmap, the sleeps → poll-on-patch-log migration becomes real work; if permanent-manual, current shape is fine.
- The
#1910failure-path gap above — was the intent to cover that separately (a companion "unhappy-path smoke" PR) or is#1911's hook coverage considered sufficient? Both are defensible; just want to know which. - Does
agent-browserhave a way to attach the current CDP target's ID / port to logs on failure? Would tighten forensics without a code change here.
What I didn't verify
- I did not run
agent-browserlocally to reproduce the 10/10 pass the PR body cites. - I did not confirm every synthetic-event shape (
inputvschangevspointerupvs Enter/blur) is what the actual studio input handlers require — the header says these were learned empirically, and I'm taking that on faith. - The
.fallowrc.jsoncfallow-ignore pattern semantics: I skimmed the surrounding entries and this one matches the existing shape, but I did not read the fallow tool's config parser.
— Rames D Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
🟠 Smoke faithfully exercises the persist-seam; fixture-value coupling + heading-text selectors + missing-binary signalling are the runtime-interop soft spots (Rames-complementary)
Reviewed at 2febbb88f577a696c7f51fbf99789e66f354c282. Rames covered failure-signal richness, fixed-sleep flake surface, reload-survival coverage gap, disk() substring false-positives, cross-stack #1910 failure-path gap — I concur on all of those and won't restate them here.
Scope-wise this is a real smoke, not a decorative one: select() fires actual CDP mouse events at canvas coordinates (matches the S0.1 hit-testing fix path — real hit-testing, not element-ref-click that landed on sidebar helper buttons); commit() drives one representative input per panel section through the input/change/pointerup/Enter/blur sequence the panel actually listens for; each assertion pair is HTTP-persist-then-disk-content, with cross-file persist to compositions/qa-sub.html proving the shape contract from the #1906 matrix. The upstream stack's persist seam is genuinely exercised. Below are the runtime-interop mechanism concerns Rames didn't cover — coupling to fixture initial values, heading-text selectors, and the missing-binary signal that turns "agent-browser not on PATH" into a matrix of red without a targeted message.
Findings (runtime-interop lens)
1. Smoke really exercises the persist-seam — 🟢
File: packages/studio/tests/e2e/design-panel.mjs:172-215
Five archetypes covered (headline text-size, shape opacity + radius, video volume, sub-composition text-size), each verified via disk("index.html", ...) or disk("compositions/qa-sub.html", ...). Cross-file persist is directly asserted (sub.text-size writes to compositions/qa-sub.html, not index.html). Not a page-load smoke.
2. Hardcoded initial-value probes couple the smoke to the fixture's numeric literals — 🟠
File: packages/studio/tests/e2e/design-panel.mjs:206 ("48px","72px"), :210 ("100","80"), :212 ("16","24"), :216 ("50","80"), :227 ("36px","48px")
qa.setInputWhereValue(section, current, value) locates the input by matching String(el.value) === current. If someone edits #qa-headline's font-size: 48px in the fixture — which they will; fixtures evolve — the runner reports set=no input with value 48px in Text and the disk() check trivially fails. The failure is at least loud (not silent), but the coupling is unnecessary. Suggested follow-up: read the current value from the DOM (inputs[i].value) at test time, then set to a deterministic-delta value. Removes a whole rot vector at zero cost.
Related to Rames' #1906 "fixture drifts independently" observation — this is the concrete surface where that drift bites, and it's fixable inside design-panel.mjs without touching the fixture.
3. Panel-section selectors rely on h3.textContent literal match — 🟠
File: packages/studio/tests/e2e/design-panel.mjs:113-116 (h3 text match), :145-148 (ensureSection by h3 text)
qa.sectionInputs and qa.ensureSection locate panel sections by literal h3.textContent === "Text" | "Transparency" | "Radius" | "Video" .... Any panel-side heading rename or a future i18n pass silently breaks the smoke, and — critically — the smoke lives in a sibling .mjs file that a panel-side PR's grep is unlikely to touch. Not urgent today (panel isn't localized), but a stable data-testid on section headings would decouple the smoke from copy. Cheap to add on the panel side; would remove a whole class of "unrelated PR silently broke the smoke" incidents.
4. Missing agent-browser binary manifests as universal FAIL, not a targeted signal — 🟠
File: packages/studio/tests/e2e/design-panel.mjs:39-45 (ab wrapper catches ENOENT), :169-173 (main setup — no preflight)
Missing PROJECT_DIR exits 2 with a clear message. Missing agent-browser on PATH does not: ab() catches the ENOENT from execFileSync, returns "ERR: ...", and main() proceeds. Every select returns { error: "no coords" }, every check FAILs, and the operator sees ~10 FAIL lines with no hint that the actual problem is a missing binary. Suggested preflight: run agent-browser --version at the top of main(); on any failure, exit 3 with agent-browser not on PATH or non-functional. Same idea would help for a down STUDIO_URL — a fetch(STUDIO_URL) sanity call before injecting HELPERS. Both are one-liners; both save the operator from misdiagnosing a matrix of red as a "the smoke is broken" vs "my environment is broken".
Distinct from Rames' timeout-swallow finding, which is about mid-run hangs on individual agent-browser calls; this is about the zero-runs case where the binary was never resolvable.
5. Fallow ignore is scoped honestly — 🟢
File: .fallowrc.jsonc:39-41
Grouped with the existing "scripts / standalone runners" cluster; the comment names the reason ("run by hand per its header — no import-graph referrer"). Consistent with how the file is used; not smuggling a real CI gate under a manual-runner flag.
Review by Via (runtime-interop lens)
cf015f6 to
f15cddf
Compare
2febbb8 to
61795a9
Compare
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
🟢 R2 verification — restack-only, R1 verdict stands
R2 verification — reviewed at 61795a9c9f1a12162efd4ec9f739173b444bf205 (rebased from R1 2febbb88 onto new base studio-panel-6-hook-tests; content-identical). Blob SHAs at packages/studio/tests/e2e/design-panel.mjs + .fallowrc.jsonc match R1 exactly. R1 verdict (🟢 clean) stands.
Forward-looking notes from R1 — status at current HEAD
- Failure signal thin — no
__patchLogdump / screenshot on FAIL. Unaddressed. R1-noted-not-blocking; cheap to add and load-bearing if this ever graduates to nightly CI. - Arbitrary sleeps (
sleep(1400)/sleep(2000)) — still fixed timeouts, notwaitFor(cond)-shaped. Unaddressed. Flake surface if promoted from manual to CI. - Reload-survival per-section — still only asserts the headline edit; per-section reload coverage undercovers the persist seam the stack fixes. Unaddressed.
- Fault-injection cell for #1910 — no revert-on-persist-failure exercise in the smoke; still covered only by #1911's JSDOM hook tests. Unaddressed.
All 4 were forward-looking in R1, not required for sign-off — noting status here so they land in the stack-follow-up log rather than getting lost in restack churn.
Peer review layered on R1
Via (@vanceingalls) posted a complementary runtime-interop-lens review at 2febbb88 (id 4627860180) — 5 findings including 3 🟠 that add real signal beyond mine:
- #2 (🟠): hardcoded initial-value probes couple smoke to fixture literals —
setInputWhereValue(section, current, value)matchesString(el.value) === current; if someone bumps#qa-headline'sfont-size: 48pxin the fixture, runner emitsset=no input with value 48px in Textand disk-check fails. Concur — this is the concrete surface where my R1 "fixture drifts independently" concern bites, and it's fixable insidedesign-panel.mjs(readinputs[i].valueat test time, set deterministic-delta) without touching the fixture. Layered add. - #3 (🟠): section selectors rely on
h3.textContentliteral match —qa.sectionInputs+qa.ensureSectionlocate sections by literal"Text" | "Transparency" | "Radius" | "Video". Any panel heading rename or i18n silently breaks the smoke, and the smoke lives in a sibling.mjsa panel-side PR's grep won't touch. Concur —data-testidon section headings decouples cheap; class of "unrelated PR silently broke the smoke" gone. - #4 (🟠): missing
agent-browserbinary manifests as universal FAIL, not targeted signal —ab()catchesENOENTfromexecFileSyncand returns"ERR: ...";main()proceeds, everyselectreturns{error: "no coords"}, operator sees ~10 FAIL lines with no hint the actual problem is a missing binary. Concur — preflightagent-browser --versionat top ofmain()is a one-liner that saves misdiagnosis. Distinct from my "timeout-swallow on individualab()calls" — that's mid-run hangs; this is zero-runs case.
Via's #1/#5 confirm persist-seam realness + honest .fallowrc.jsonc grouping. All three 🟠 findings are cheap, deterministic follow-ups that pair naturally with the R1 sleeps/patchLog/per-section-reload/fault-injection list for a single "smoke-hardening" PR after the stack lands.
Recommendation
No new blockers. R2 clean. On merge, roll R1's 4 forward-looking notes + Via's 3 🟠 findings into one smoke-hardening follow-up PR before this ever gets promoted to CI.
— Rames D Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
🟠 R2 — pure rebase; every R1 finding still open (Rames R2 concurs)
R2 verified at 61795a9c (delta from R1 2febbb88): file contents at packages/studio/tests/e2e/design-panel.mjs and .fallowrc.jsonc are byte-identical between R1 and R2. Verified via gh api /repos/.../contents/<path>?ref=<sha> for both refs and diff — no differences. The R2 commit 61795a9c message ("agent-browser e2e smoke for the design panel") is the R1 commit rewritten under a new SHA (pure rebase over the underlying stack's rebase of studio-panel-6-hook-tests).
Peer scan: Rames posted an R2 ~2min before mine, independently reaching the same verdict (restack-only) and explicitly citing my 3 🟠 findings (hardcoded probes, h3.textContent selectors, missing-agent-browser universal-FAIL) as legit-cheap follow-ups. Downgrading my headline verdict to 🟠 to match his 🟢 restack-only framing — the smoke IS fallow-ignored manual-only, so R1 verdict was already "nothing merge-blocking." All the concerns below belong in a smoke-hardening follow-up PR post-stack, not as a landing gate.
Miguel's "all addressed" claim on this PR is not accurate — zero behavioral or content change since R1. My three R1 concerns and Rames' four R1 concerns all still stand exactly as filed. See the gh-compare-rebased-branches-caveat issue class: SHA change ≠ content change; verified against contents?ref=<sha> directly, not gh api compare.
Finding-by-finding disposition
1. Via R1 🟠 (a) — hardcoded initial-value probes couple to fixture literals — ❌
File: packages/studio/tests/e2e/design-panel.mjs (unchanged)
Still asserts against fixture literals like "48px", "100", "50", "72px" rather than reading current DOM values at test time.
2. Via R1 🟠 (b) — panel-section selectors rely on h3.textContent literal match — ❌
File: packages/studio/tests/e2e/design-panel.mjs (unchanged)
Selectors still match by heading text. Silent break on any heading rename, capitalisation change, or i18n pass.
3. Via R1 🟠 (c) — missing agent-browser on PATH manifests as universal FAIL — ❌
File: packages/studio/tests/e2e/design-panel.mjs (unchanged)
No preflight --version check. Missing binary still surfaces as "everything failed" rather than "binary not installed."
4. Rames R1 P1 — signal-vs-noise on failure (no __patchLog dump, no PNG snap) — ❌
Unchanged. check() still prints PASS/FAIL + JSON.stringify blob; on FAIL there's no window.__patchLog dump, no screenshot, no server-log capture. As Rames noted, this is the acceptance test for a 7-PR bug-fix push, so a mid-run failure leaves the operator with a one-line "FAIL {...}" and no next step.
5. Rames R1 P2 — arbitrary sleeps, not condition-waits — ❌
Unchanged. await sleep(6000), await sleep(1400), await sleep(2000) still throughout. Flake surface if this ever gets promoted to nightly.
6. Rames R1 P3 — reload-survival covers only the headline edit — ❌
Unchanged. Only the #qa-headline reload survival is asserted. Opacity, radius, video volume, sub-composition text — all first-write only. Coverage gap vs the #1906 matrix ("all persist to disk with matched:true/changed:true and survive reload").
7. Rames R1 P4 — timeout swallow: execFileSync returns "ERR: ..." which then flows into abEval → JSON.parse → check() — ❌
Unchanged. Timeout / process failure still masquerades as a normal-shaped FAIL with an ERR: blob rather than a distinct TIMEOUT tag.
8. Rames R1 nits (STUDIO_URL doc port drift, main() complexity-ignore, disk() substring false-positive risk, select.* composite-check tightening) — ❌
Unchanged.
CI status at R2
61795a9c — Preflight lint/format pass, regression pass, Preview parity pass. mergeable_state: unstable reflects Graphite mergeability_check pending only. (CI passing on a pure rebase is expected.)
Verdict
This PR shipped R2 as a pure rebase over the stack — no code delta, no doc delta. Every R1 concern from Rames and me is still open, but the R1 verdict from both reviewers was already "nothing merge-blocking" (fallow-ignored manual smoke). Rolling my three 🟠 concerns + Rames' four forward-looking notes into a smoke-hardening follow-up PR post-stack — as Rames R2 recommends — is the right shape. Not blocking. Author's "all addressed" claim on this PR is inaccurate; suggest either updating the R2 PR body to reflect "rebase-only, follow-ups deferred" or landing a lightweight commit that at least adds the agent-browser --version preflight (my #3) + __patchLog dump on FAIL (Rames P1) — both are one-liners.
R2 by Via
Fixture project covering all panel-editable element archetypes, plus the QA findings matrix from the design-panel bug campaign.
- honor author pointer-events:none in hit-testing (was selecting invisible overlays) - pause playback before mousedown sampling; fall back to hover selection on null resolve - invalidate committed selection when the active composition changes - double-click keeps selection and defers to multi-candidate click cycling
- hoverSelection fallback now wired at all 3 mousedown call sites (box-click, blocked-drag, plain overlay click) instead of just the overlay path - pointer-events override detection reads computed style, not inline style, so a CSS-class opt-in (not just inline style=) on a descendant is honored - defensively remove the pointer-events override before the group-fallback check too, closing a theoretical gap in the no-elementsFromPoint branch - a click that resolves to nothing (dead-zone / deselect) no longer leaves playback paused if it was already playing
- PatchOperation gains optional childSelector/childIndex resolved under the matched parent - pre-pass resolves every op target; any miss aborts the batch with matched:false, no partial write - style-decl parsing extracted to sourceStyleMutation to stay under the file-size cap - new ./source-mutation subpath export (mirrors ./finite-mutation)
- buildTextFieldChildLocator indexes over the parent's full same-tag child list - buildTextFieldChildOperations emits per-field ops for same-shape multi-field edits - SDK cutover declines child-scoped batches (hfId mapping would hit the parent) - persist-seam integration harness drives real client ops through patchElementInHtml
61795a9 to
c3a4a78
Compare
f15cddf to
cb4acec
Compare
c3a4a78 to
fce2775
Compare
cb4acec to
a06bda0
Compare
vanceingalls
left a comment
There was a problem hiding this comment.
🟠 R3 verification — restack-only, all 3 my R2 findings still open verbatim
R3 verification — reviewed at fce2775420f4193c1e1a9882ad4a39f2b8db08f5 (R2 was 61795a9c). New tip carries the same commit message test(studio): agent-browser e2e smoke for the design panel; SHA change is a Graphite restack onto the same new base commits picked up by the rest of the stack. File list is identical to R2 (2 files: .fallowrc.jsonc + packages/studio/tests/e2e/design-panel.mjs).
Finding-by-finding
1. Hardcoded initial-value probes couple to fixture literals — ❌ still open
packages/studio/tests/e2e/design-panel.mjs
Restack-only. R2 finding stands: initial-value probes (data-hf-id, computed style expectations) are string-literal-coupled to design-panel-qa/index.html — changing the fixture silently green-lights the smoke test. Prefer probes derived from the fixture at load time.
2. Panel-section selectors rely on h3.textContent literal match — ❌ still open
Restack-only. R2 finding stands: h3 text lookups break on rename or i18n. Prefer stable data-testids on the panel sections.
3. Missing agent-browser on PATH surfaces as universal FAIL — ❌ still open
Restack-only. R2 finding stands: absence of agent-browser should surface as a targeted preflight skip with a clear "install agent-browser" message, not a full test failure.
All three are low-severity smoke-test-quality concerns — none block landing. R3 verdict stays 🟠 as R2 (Rames' 4 findings on this PR are his primary lens and remain his call).
R3 by Via (runtime-interop lens)
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Reviewed at fce27754.
R3 verification — HEAD moved from R2 61795a9c due to stack force-push, but the design-panel.mjs file content is unchanged (rebase-only carry-through).
Peer scan: Via's R2 at 61795a9c posted 20:33 UTC; no R3 posted yet at this SHA — I'm first at R3.
🟡 R3 verdict — untouched at this SHA; all R1/R2 nits still open
Miguel's R3 push moved this PR's SHA (rebase over the substantive #1910/#1911 fixes), but the actual design-panel.mjs diff is byte-identical: hardcoded "48px" → "72px", h3.textContent.trim() === title section selectors, sleep(1400)/(2000)/(6000) waits, and no dumpPatchLog on FAIL. Since the PR body doesn't explicitly defer any item, treating as open verbatim.
Still open verbatim (Rames R1)
- (a)
__patchLogdump / screenshot on FAIL missing —design-panel.mjs:141-160populates__patchLogand exposesqa.clear()but nothing dumps it to console onfailures > 0. Adding adumpPatchLog()step in the final failure branch (log the last N mutations, exit non-zero after) turns a red "FAIL text.size" into a diagnosable trace — otherwise the operator has to re-run manually. - (b) hardcoded sleeps
sleep(1400),sleep(2000),sleep(6000),sleep(1500)are flake surfaces. Some of these are pinned to CDP round-trip latency (sleep(150)betweenmouse down/upat line 187-188 is fine), but the 2s post-commit and 6s post-load are guessed. Wait-for-condition (until qa.lastSel() !== nulloruntil disk("index.html", needle)) reduces the flake window and drops the median runtime. - (c) reload-survival only asserts headline edit, not per-section — after a reload the script checks one persisted value, not the full commit matrix. A single failure in the middle of the run leaves later cells in an inconsistent state and the assertion doesn't catch it.
- (d) no fault-injection cell exercising #1910's toast/revert paths — this smoke drives happy-path only; the persist-failure paths that the whole stack was built to close aren't exercised end-to-end. A 500-injection cell (stub the patch endpoint via the qa shim's
window.fetchwrapper at line 76-90) would catch a regression in the toast/revert seam.
Still open verbatim (Via R2)
- fixture literals coupling (
"48px","100","50"at lines 212, 217, 224) — a fixture edit shifts the initial values and this script silently no-ops (setInputWhereValuefails to find the input with the expectedcurrent). Pulling initial values fromqa.sectionInputs(section)[0].valuebefore commit would decouple. - h3-textContent selectors on i18n (
"Text","Transparency","Radius","Video"at lines 212-227) — the section-finder at line 134 doesh3.textContent.trim() === title, breaking on any locale change or copy edit. Section IDs ordata-qa-sectionattrs would be more stable. - missing-binary silent FAIL —
ab("agent-browser", ...)at line 43 returns"ERR: ..."string when the CLI is missing, but the downstreamabEvalat line 45-60 just JSON-parses the string and moves on. A first-call sanity check (ab("version")on startup, hard-exit if it errors) would fail fast with a clear message.
R3 verdict
🟡 no drift, all nits still open. Not blocking merge if the operator is comfortable running this by hand per the header notes — the script is functional and did pass 10/10 against the built CLI per Miguel's PR body. But given the stack's momentum (four other PRs closing R2 holds cleanly), landing #1906-#1911 without #1912 would be reasonable if the goal is to unblock the studio fixes; #1912 can iterate independently since it's operator-run.
Author: is #1912 an explicit "land after the stack settles" iteration target, or R3-scope oversight? If the former, a TODO(followup): comment block in the script would document intent.
— Rames D Jusso
buildTextFieldChildLocator guessed a synthetic field's position by counting same-tag "child" fields elsewhere in the array whenever sourceChildIndex was absent. That heuristic is unreachable today (the count-mismatch guard in buildTextFieldChildOperations already refuses add/remove edits before it's reached) but would silently locate the wrong element for a future caller that wires up synthetic-field support without also computing a real sourceChildIndex. Return null instead so the caller falls back to the unsupported-structure path.
- matched:false and persist errors toast, warn structurally, and revert the optimistic write - reverts guarded by a per-property version counter so stale failures never stomp newer edits - structural text-field edits refuse persist instead of writing escaped markup - multi-field child edits persist via per-child ops; shared commit runner extracted
…ist failure commitDataAttribute and handleDomHtmlAttributeCommit toasted on failure but never reverted the optimistic attribute write, leaving the preview showing an edit that never reached disk (the exact bug this PR closes for style commits). Extracted into useDomEditAttributeCommits.ts (useDomEditTextCommits.ts was at the file-size cap) and routed through runDomEditCommit with a per-target+ attribute version guard, mirroring handleDomStyleCommit.
Three findings from R2 review that must land together: a patch-rejection toast doubled up with the generic persist-failure toast (StudioSaveHttpError had no alreadyToasted marker), a failed prepareContent write (e.g. font-face injection) reverted and re-toasted a change the server had already persisted, and text-commit shouldRevert only rolled back on one narrow error type instead of any persist failure.
Regression tests for the persist failure paths: unresolvable targets, no-op warns, rejected requests, revert races, structural-edit refusal, and read/write failure toasts.
Regression tests for the data-attribute and html-attribute revert paths added in #1910: unresolvable target, rejected request, success (no revert), and a stale-failure-vs-newer-success race guarded by the per-attribute version counter.
Adds the two persist-hook cases R2 flagged as untested: the !patchResponse.ok HTTP-error path (previously only exercised via a network-throw, which bypassed this branch) and handleDomTextCommit's server-failure path. Also strengthens the prepareContent-write-failure test to assert the already-persisted base patch is recorded, not reverted, matching the coupled persist-hook fix.
Standalone script driving selection plus one input per panel section against a running preview, asserting disk persistence and reload survival.
a06bda0 to
e31a7bb
Compare
fce2775 to
13e2386
Compare
Closes the R2/R3 findings on the design-panel e2e smoke script: - Section lookup no longer matches h3 display text plus a manual tree walk (breaks on wording tweaks). Section now carries a stable data-panel-section attribute; the script queries by it directly. - Fields are located by their sibling label (or, where none exists, by being the section's only input of that type) instead of by guessing the fixture's current value ahead of time. - Fixed sleep(1400/2000/6000) waits replaced with polling on the actual condition (selection registered, section rendered, patch round-tripped, app booted). This surfaced a real bug while verifying: computing click coordinates right after a commit reused a stale preview-frame position from before the property panel's reflow, silently clicking the wrong spot — now waits for the frame's rect to stabilize first. Also found and fixed a disk-write race on the first commit of a run (patch fetch resolves before the server's file write lands). - FAIL now dumps window.__patchLog for diagnosability. - Added a fault-injection cell: the server rejects a patch and the panel must toast the rejection without persisting it or clobbering the prior committed value. Verified by actually running the script with agent-browser against a live preview (previously never exercised this way) — all 14 checks pass across repeated clean runs.
|
Addressed all 3 R2/R3 smoke-test-quality findings, and actually ran this script end-to-end for the first time (against a live Nits closed:
Bugs the actual run surfaced (worth noting since "never run with agent-browser" was itself one of the findings — turns out that's why these were invisible):
Verified: 14/14 checks pass across repeated clean runs (fresh fixture copy + fresh preview server each time). |
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
🟢 R4 verification — all 7 open findings closed, first-ever live-run of the script
R4 verification — reviewed at 32e1fee8c7b7a96846e0a2637de4a7ed13c503db (R3 was fce27754, restack-only carry). Delta from R3: one substantive commit 32e1fee8 "fix(studio): close smoke-test quality nits, add fault-injection coverage" — 2 files modified (propertyPanelPrimitives.tsx +14/-1, design-panel.mjs rewritten in place with net +200-ish lines), plus one file/attribute added (data-panel-section on Section).
Verified each of the 7 open findings (my 4 R1 nits + Via's 3 R2 findings) at HEAD:
-
N1 —
__patchLogdump on FAIL: ✅ resolved.design-panel.mjs:100-107—check()now unconditionally logspatchLogon any failure. Every acceptance-test cell that fails will surface the recorded patch/fetch trail. -
N2 — arbitrary
sleep(1400/2000/6000)waits: ✅ resolved. Every fixed sleep has been replaced withwaitFor(expr, {timeout, interval})polling on the actual condition. Concretely:design-panel.mjs:135-141(waitForhelper), plus call-sites at:283(openStudioreadyState + inspector button),:294-296("Select an element" text),:308(selection registered after mouse click),:313(section rendered before commit),:315(patchLog populated after commit). The only remainingsleep(150)at:305is a deliberate gesture delay betweenmouse down/mouse up— that's an intentional simulation of a real click, not an async wait, and the inline comment now says so. -
N3 — reload-survival only checks headline: 🟡 partial.
design-panel.mjs:369-375still only asserts#qa-headlinecomputedfontSize === "72px"after reload. Not blocking — the pre-reload cells (text.size,style.opacity,style.radius,media.volume,sub.text-size) all now usewaitForDisk()on the specific fixture strings, which is the correct persistence gate; a headline-only reload assertion is fine as a spot-check for browser rehydration since disk state is already independently proven. But if a rehydration bug bit the shape/video/sub-composition paths only, this smoke wouldn't catch it. Leave as follow-up. -
N4 — no fault-injection cell exercising the toast/revert path: ✅ resolved.
design-panel.mjs:346-366adds a full fault-injection cell: (a) re-selects#qa-headline, (b) primeswindow.__qaFaultviainjectFault('file-mutations/patch-element', 500)to make the shimmedfetchreturn a one-shot 500, (c) commits a new size"90px", (d) waits for "Couldn't save" toast text to appear, (e) asserts three separate things —fault.toast-shown,fault.no-persist(rejected value doesn't hit disk),fault.prior-value-survives(prior 72px value untouched). This is exactly the coupled toast +alreadyToasted+ no-clobber-of-disk exercise for #1910's fixes. -
V1 — hardcoded initial-value probes: ✅ resolved. Fields are now located by
pickByLabel(slug, "Size" | "All" | "Volume")orpickByType(slug, "range")— see the new helpersqa.pickByLabel/qa.pickByTypeatdesign-panel.mjs:180-197and thebyLabel/byTypefactories at:322-325. Novalue === "48px"guesses anywhere; fixture-default drift no longer breaks the script silently. -
V2 — panel-section selectors via
h3.textContentliteral match: ✅ resolved. The app now emits a stabledata-panel-sectionattribute —propertyPanelPrimitives.tsx:373-374on theSectionwrapper, driven by a newslugifyPanelSectionTitlehelper at:395-402that lowercases + hyphenates the title. The script'sqa.section(slug)atdesign-panel.mjs:175-176queries by this attribute directly. Wording tweaks and future i18n won't break the smoke. -
V3 — missing
agent-browseron PATH surfaces as universal FAIL: 🟡 partial. Script header atdesign-panel.mjs:59still says "Requires the agent-browser CLI on PATH"; there's no preflightwhich agent-browserstep. If the binary is missing, everyab(...)call returns"ERR: … ENOENT"and every check fails with a generic FAIL — same symptom Via warned about. Since this script is now genuinely a fallow-registered standalone (no CI wiring), the failure mode's ceiling is "the person running it locally reads the ENOENT once and installs the binary." Also non-blocking, but a single preflight probe would still be cheap. Leave as follow-up.
Both remaining 🟡s (N3, V3) are non-blocking follow-ups on quality-of-life; neither affects the correctness of what the script actually exercises. In addition — the fix commit's message calls out that this run was the first time the script has actually been driven against a live preview + agent-browser, and it caught two real bugs while running: (1) a stale-frame-position race on post-commit reflow (fixed by waitForStableFrame() at :264-273), (2) a disk-write race between fetch resolving and the server's file write landing (fixed by waitForDisk() polling at :120-127). That kind of "the R3-nit-cleanup found actual bugs" outcome is what makes fixing smoke-test-quality findings worth the round-trip.
No new R4 findings. No AI-attribution trailer.
Verdict: 🟢 ready to route for stamp. With the six others also cleared (six no-drift-or-substantive-fix on #1906/#1907/#1908/#1909/#1910/#1911), the full 7-PR stack is ready.
— Rames D Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
🟢 R4 verdict — my 3 R3 findings resolved (2 fully, 1 accept-as-follow-up)
R4 delta at commit 32e1fee8 "close smoke-test quality nits, add fault-injection coverage" — propertyPanelPrimitives.tsx +14/-1 and design-panel.mjs substantially rewritten. Verified at HEAD 32e1fee8.
Rames-bot posted a comprehensive R4 verification at 32e1fee8 covering all 7 open findings (his 4 R1 nits + my 3 R2 findings) with file:line citations. His grading matches mine on all three of my findings — no reconciliation needed. My per-finding disposition:
-
✅ V1 (hardcoded initial-value probes couple to fixture literals) — resolved. Fields now located via
pickByLabel(slug, "Size" | "All" | "Volume")orpickByType(slug, "range")atdesign-panel.mjs:180-197.setInputWhereValue(title, currentValue, newValue)is deleted entirely — no fixture-value coupling anywhere. Fixture-default drift won't silently break the script. -
✅ V2 (
h3.textContentliteral-match selectors) — resolved. App emits a stabledata-panel-sectionattribute atpropertyPanelPrimitives.tsx:373-374, driven byslugifyPanelSectionTitleat:395-402.qa.section(slug)queries by attribute directly atdesign-panel.mjs:175-176. Wording tweaks and future i18n of section titles no longer break the smoke. -
🟡 V3 (missing
agent-browserpreflight) — accept as follow-up. No preflightwhich agent-browserprobe was added; theab()wrapper still catches ENOENT and returns"ERR: …"strings that cascade into universal FAILs. Same disposition as Rames' N3/V3 — non-blocking because the script is now a standalone fallow-registered smoke (not CI-wired), so the ceiling on this failure mode is "the person running it locally reads the ENOENT once and installs the binary." Cheap follow-up if anyone wants it, not a blocker.
Also worth calling out (positive, unrequested R4 upgrades):
waitFor/waitForDisk/waitForStableFramepolling helpers replace all fixed sleeps. The frame-stability check specifically guards against a client-side race where post-commit reflow shifts the preview frame's on-page rect and subsequent click coordinates land on the wrong spot — Rames confirms this caught a real bug during the first live run.- Fault-injection cell (
design-panel.mjs:346-366) primeswindow.__qaFaultfor a one-shot 500, then asserts three things:fault.toast-shown(Couldn't-save text renders),fault.no-persist(rejected value doesn't reach disk),fault.prior-value-survives(prior 72px value untouched). This is the coupled end-to-end exercise for #1910'salreadyToasted+ no-clobber-of-disk fixes — the piece that was missing from #1910's unit-level coverage. check()now dumpswindow.__patchLogon every FAIL, which materially reduces the universal-fail-with-no-signal problem V3 was worried about.
Both non-blocking follow-ups (my V3, Rames' N3) are quality-of-life on a standalone smoke that's not CI-gated. Neither affects correctness of what the script actually exercises.
R4 by Via (runtime-interop lens) — agrees with Rames-bot's concurrent R4 verification at this SHA
jrusso1020
left a comment
There was a problem hiding this comment.
APPROVE — part of the 7-PR studio-panel fix stack (#1906–#1912), stamped as a batch after independent verification.
Verified:
- Dual-reviewer convergence at R4: RDJ (correctness/architecture lens) + Via (runtime-interop lens) both green on all 7, cross-reconciled with matching file:line citations across four rounds. This PR's head is exactly its R4-cited SHA — no drift since the verdict.
- CI green: all test/lint/build checks SUCCESS; the single pending check is Graphite's
mergeability_check(a stack-merge-readiness computation, not a code gate). - Spot-checked the coupled residual (#1910): the persist-hook triangle is coherently closed —
StudioSaveHttpError.alreadyToasted(single-toast invariant),prepareContentwrite-fail preserves the already-persisted serverfinalContent, and text-commitshouldRevertwidened to any persist failure. #1911's new cells exercise these paths.
Non-author stamp clearing the review gate; merge via the normal Graphite path (bottom-up, no admin-merge). Layered on RDJ + Via.
— Rames Jusso

Part 7/7 of the design-panel fix stack (replaces #1888).
What
A standalone agent-browser e2e smoke (
packages/studio/tests/e2e/design-panel.mjs) that drives the real Studio UI against a running preview of the #1906 fixture: enables the Inspector, selects one element per archetype via real mouse events at canvas coordinates, commits one representative input per panel section (text size, opacity, radius, video volume, sub-composition text), and asserts both disk persistence and post-reload rendering. Exits non-zero on any failed cell.Run by hand (usage in the script header): copy the fixture outside the repo, start
preview, point the script at it. Registered as a fallow entry (standalone script, no import-graph referrer).Why
The seam this stack fixes — click-to-select through panel commit to disk — had zero browser-level coverage; unit suites cannot catch selection hit-testing or optimistic-vs-disk divergence. This script is the rerunnable version of the manual verification matrix in #1906 and passed 10/10 against the built CLI in embedded mode with the full stack applied.