Skip to content

feat(capture): pipeline improvements — contact sheets, design styles, snapshot#988

Open
ukimsanov wants to merge 2 commits into
graphite-base/988from
feat/capture-pipeline-improvements
Open

feat(capture): pipeline improvements — contact sheets, design styles, snapshot#988
ukimsanov wants to merge 2 commits into
graphite-base/988from
feat/capture-pipeline-improvements

Conversation

@ukimsanov
Copy link
Copy Markdown
Collaborator

What

Capture pipeline work that came out of the 11-round website-to-video eval branch. Five clusters of changes touch 12 files in packages/cli/src/capture/ and packages/cli/src/commands/:

  1. Paginated labeled contact sheets (contactSheet.ts, new)
  2. Design-styles extractor (designStyleExtractor.ts, new)
  3. Snapshot tool fixes — HyperShader-aware wait + local-time seek for sub-comps + Gemini per-frame analysis runs by default
  4. Screenshot capture — overlay-scan replaced with TreeWalker + early-exit; cookie-dismiss selectors scoped under cookie/consent/gdpr ancestors
  5. CLI ergonomics — .env auto-load from CWD, transcribe defaults output dir next to the input

2 of 5 in the pipeline-quality stack. Stacked on #987.

Why

The wins that actually moved video quality across the evals were the artifacts agents read (contact sheets, design-styles JSON) and the snapshot-tool visual-verification fixes. Without the contact sheet pagination, sites with 30+ assets showed asset-1 at full size and nothing else. Without local-time seek, every beat after the first showed as black or final-opacity-zero in the snapshot. The Gemini frame-by-frame descriptions.md is the only objective check on what the video actually shows.

How

Contact sheets (contactSheet.ts, new) — paginated labeled grids (3-col screenshots / 4-col raster / 5-col SVG; 9–15 cells per page with filename labels via SVG text overlay). fit: "contain" keeps every asset visible at its real aspect ratio. createSvgContactSheet scans both assets/svgs/ and assets/ root, de-dupes by filename — sites with all-external SVGs (huly.io) now get coverage. escapeXml covers &<>"'.

Design styles (designStyleExtractor.ts, new) — walks the live DOM, reads computed styles, writes extracted/design-styles.json: typography hierarchy with exact metrics, button variants, card/container/nav styles, spacing scale with base unit, border-radius scale, box-shadow values with usage counts. Primary data source for DESIGN.md at Step 1.

Snapshot tool (snapshot.ts) — wait signal is now window.__hf.shaderTransitions[].ready (set after both warm and cold cache paths complete); local-time seek for sub-comps means exit fades read at their own t=0..duration. --describe resolution: runs by default, --describe "custom Q" overrides the prompt, --describe false opts out.

Screenshot capture (screenshotCapture.ts)querySelectorAll('*') + getComputedStyle overlay scan replaced with a TreeWalker that early-exits on cheap rect checks before the expensive style read. Caps at 5000 elements per page. Cookie/consent dismissal selectors scoped under cookie/consent/gdpr ancestors so we don't click "Accept invitation" on unrelated buttons.

Agent prompt (agentPromptGenerator.ts) — auto-discovers contact-sheet page count (matches base name plus paginated -NNN digits only; regex metachars escaped; numeric sort for 10+ pages). inferColorRole classifies extracted hex colors as bg-dark / bg-light / accent / surface / neutral so the agent prompt shows #533AFD (accent). design-styles.json row gated on existsSync.

Other CLI ergonomicscli.ts auto-loads .env from CWD on startup (handles export FOO=bar, quoted values, inline # comments). commands/transcribe.ts default output dir is the input file's directory.

Test plan

  • Unit tests added/updated — no new tests; functionality is end-to-end. Existing CLI tests pass.
  • Manual testing performed — capture run on 7 sites (arc, daylight, framer, huly, mercury, raycast, workos) confirms contact sheets, design-styles.json, fonts-manifest.json, and snapshot/descriptions.md all produced correctly. Typecheck clean. Build succeeds.
  • Documentation updated (if applicable) — skill prose documentation for the new artifacts lands in PR feat(studio): consolidate into single OSS-ready NLE editor #4 of the stack.

Copy link
Copy Markdown
Collaborator Author

ukimsanov commented May 20, 2026

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good capture pipeline work. Issues from the original #984 review addressed in the split:

  • Cookie-dismiss selectors now scoped under cookie/consent/gdpr ancestors (not bare button[id*="accept"])
  • Regex interpolation uses escapedBase — no longer fragile for special chars
  • escapeXml covers all 5 entities including apostrophe
  • --describe running by default is now documented as intentional (opt-out with --describe false)

Contact sheet pagination, design-styles extractor, and TreeWalker overlay scan are all clean.

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve at 29baca1e. Magi covered the cookie-selector ancestor scope, regex escape, and escapeXml 5-entity coverage. Additive notes:

  • --describe resolution — PR body explicitly addresses my prior hf#984 concern: "runs by default, --describe \"custom Q\" overrides the prompt, --describe false opts out." The three-valued semantic is well-thought-out; verify the boolean-parsing layer accepts "false" / "0" / "no" consistently with citty conventions elsewhere in the CLI (snapshot defaults to a numeric flag pattern; the docs and the parser need to agree).
  • .env auto-load from CWD — same loader implementation I reviewed in hf#984. Safe shape (only sets if !(key in process.env), parses quoted values, silent on missing file). Standard dotenv convention. Acceptable.
  • Contact sheet paginationagentPromptGenerator.ts auto-discovers page count by matching <base>-NNN.<ext> digits with regex metachars escaped + numeric sort. Solid; the 10+-pages numeric-sort fix is the kind of off-by-one-character edge case worth pinning by test. Recommend adding a unit test for the discovery function (12 → 9, 10, 11, 12; not lexicographic).
  • TreeWalker overlay scan + 5000-element cap — sensible defense-in-depth. The 5000 cap will silently truncate on extreme pages (e.g., asset-heavy CRM dashboards). Worth a one-line log when the cap is hit so we know the scan was incomplete.
  • Snapshot wait signal via window.__hf.shaderTransitions[].ready — cross-PR consistency: this assumes hyperframes core's runtime exposes that object. Verify the contract is documented in the runtime API surface (or pin it by test on the engine side).

Stack base correctly set to feat/capture-font-extractor. Direction is the bag of wins Ular identified as actually moving quality.

— Rames Jusso

ukimsanov added 2 commits May 20, 2026 17:57
… snapshot

Capture pipeline work that came out of the 11-round website-to-video
eval branch. The wins that actually moved quality were the artifacts
agents read (contact sheets, design-styles) and the snapshot tool
visual-verification fixes; the rest are smaller follow-ons.

**Contact sheets (`contactSheet.ts`, new)**
- Replaces the embedded one-image-per-asset listing with paginated
  labeled grids (3-col screenshots / 4-col raster / 5-col SVG). Each
  page contains 9–15 cells with filename labels baked in via SVG
  text overlay (`escapeXml` covers `&<>"'`).
- `fit: "contain"` keeps every asset visible at its real aspect
  ratio; the old `fit: "cover"` cropped to the first image's box.
- Returns `string[]` (page paths) — single-page captures get one
  file, multi-page produce `contact-sheet-1.jpg`, `contact-sheet-2.jpg`,
  etc.
- `createSvgContactSheet` scans both `assets/svgs/` (inline-extracted
  SVGs) and `assets/` root (external SVGs from `<img src="*.svg">`)
  and de-dupes by filename. Sites with all-external SVGs (huly.io)
  now get coverage they previously didn't.

**Design styles extractor (`designStyleExtractor.ts`, new)**
- Walks the live DOM and reads computed styles to produce
  `extracted/design-styles.json`: typography hierarchy (every text
  role with exact font-size / weight / line-height / letter-spacing),
  button variants (background / padding / radius / shadow), card /
  container / nav styles, spacing scale with base unit, border-radius
  scale, box-shadow values with usage counts.
- Primary data source for DESIGN.md authoring at Step 1. Replaces
  the prior "guess from screenshots" workflow.

**Snapshot tool (`snapshot.ts`)**
- HyperShader pre-rendering used to swallow the entire snapshot
  capture window (every frame after the first showed the loading
  overlay or final-opacity-zero exit fades). Wait signal is now
  `window.__hf.shaderTransitions[].ready` (set after both warm and
  cold cache paths complete); local-time seek for sub-comps means
  exit fades read at their own t=0..duration, not global time.
- Gemini vision per-frame analysis runs by default (`descriptions.md`
  next to the contact sheet). `--describe "custom Q"` overrides the
  prompt; `--describe false` opts out.
- 3-column contact sheet generation for snapshot frames so reviewers
  see all beats at a glance.

**Screenshot capture (`screenshotCapture.ts`)**
- Replaces `querySelectorAll('*') + getComputedStyle` overlay scan
  with a TreeWalker that early-exits on cheap rect checks before
  reaching the expensive style read. Caps at 5000 elements per page.
- Cookie/consent dismissal selectors are scoped under cookie /
  consent / gdpr ancestors so we don't click "Accept invitation" or
  similar unrelated buttons.

**Agent prompt (`agentPromptGenerator.ts`)**
- Auto-discovers contact-sheet page count (matches base name plus
  paginated `-NNN` variants only, with regex escaping on the base
  name and numeric sort for 10+ pages).
- `inferColorRole`: classifies extracted hex colors as bg-dark /
  bg-light / accent / surface / neutral via luminance + saturation,
  so the agent prompt shows `#533AFD (accent)` instead of bare hex.
- `design-styles.json` row is gated on `existsSync` — the upstream
  write is wrapped in try/catch and may skip on failure, so the
  prompt only points to files actually on disk.

**Other CLI ergonomics**
- `cli.ts`: auto-load `.env` from CWD on startup so subcommands like
  `snapshot` don't need explicit `export GEMINI_API_KEY=…`. Handles
  `export FOO=bar`, quoted values, inline `# comments`.
- `commands/transcribe.ts`: default output dir is the input file's
  directory, not CWD. Stops the "wrote transcript.json somewhere
  unexpected" footgun.
- `assetDownloader.ts`: improved asset naming uses catalog context;
  de-duplicates inline SVG filenames.
- `contentExtractor.ts`: captions SVGs via Gemini (code-as-text) and
  integrates them into asset descriptions.
- `tokenExtractor.ts` + `types.ts`: SVG bounding box dimensions and
  new DesignStyles schema added.
Required by the contact-sheet pagination code added on this PR
(uses Sharp APIs that landed in 0.34.5). Originally bumped on
#987 by mistake — moved here per Copilot review.
@ukimsanov ukimsanov force-pushed the feat/capture-pipeline-improvements branch from 29baca1 to eeacacc Compare May 21, 2026 01:13
@ukimsanov ukimsanov requested a review from Copilot May 21, 2026 02:43
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ukimsanov ukimsanov changed the base branch from feat/capture-font-extractor to graphite-base/988 May 21, 2026 03:17
@ukimsanov ukimsanov requested a review from miguel-heygen May 21, 2026 03:21
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after new commit eeacaccc.

Single-line change: bumps sharp floor from ^0.34.0 to ^0.34.5 in packages/cli/package.json. The ^ range already allowed 0.34.5, so this just raises the minimum (likely for a fix in 0.34.1–0.34.4). No other files touched.

Still good. Ship it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants