feat(capture): extract chips/stat-cells/tabs, detect icon fonts, transparent grounds#1827

Merged
xuanruli merged 1 commit into main from feat/capture-component-extraction on Jul 1, 2026

@xuanruli xuanruli commented Jul 1, 2026

What

Extends the capture engine's design-style + font extraction so a wider range of real-world sites produce faithful, usable component/typography tokens.

designStyleExtractor.ts

  • Extracts three more component families beyond buttons/cards/nav: chips (pill/badge/tag), stat/metric cells, and tabs — by class-substring selector plus a shape fallback (small + fully-rounded + short text) so hashed/utility class names (Tailwind, CSS-modules, Next.js) are still caught.
  • Emits a "transparent" sentinel for fully-transparent (rgba(...,0)) grounds instead of collapsing them to #000000, so a transparent chip/tab/stat on a light-ground site no longer reads as solid black.
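
The transparent-ground rule above can be sketched as a small pure function. This is only an illustration under stated assumptions: the name `groundToHex` is hypothetical (the PR's own helper is `rgbToHex`, which lives inside a page.evaluate script and is not reproduced here), and the sketch handles only the legacy comma syntax `rgb(r, g, b)` / `rgba(r, g, b, a)`.

```typescript
// Hypothetical stand-in for the PR's rgbToHex; not the actual implementation.
// Returns "#rrggbb", the "transparent" sentinel, or null for unrecognized input.
type Ground = string;

function toHexByte(value: string | undefined): string {
  // Clamp to the 0..255 channel range before converting to two hex digits.
  const n = Math.max(0, Math.min(255, Math.round(Number(value))));
  const h = n.toString(16);
  return h.length === 1 ? "0" + h : h;
}

function groundToHex(computed: string): Ground | null {
  const s = computed.trim();
  if (s === "transparent") return "transparent";
  // Legacy comma syntax only: rgb(r, g, b) or rgba(r, g, b, a).
  const m = s.match(
    /^rgba?\(\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*([\d.]+)\s*(?:,\s*([\d.]+)\s*)?\)$/
  );
  if (!m) return null; // oklch/hsl and other syntaxes are out of scope here
  const alpha = m[4] === undefined ? 1 : Number(m[4]);
  // A fully transparent ground keeps its meaning instead of collapsing to #000000.
  if (alpha === 0) return "transparent";
  return "#" + toHexByte(m[1]) + toHexByte(m[2]) + toHexByte(m[3]);
}
```

With this shape, `rgba(0, 0, 0, 0)` stays distinguishable from an opaque black ground, which is the data-loss case the bullet describes. Partially transparent grounds drop their alpha in this sketch; whether the real extractor does the same is not stated in the PR.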

fontMetadataExtractor.ts

  • Flags icon fonts (isIcon) by Private-Use-Area glyph ratio (>50%). Icon fonts ship arbitrary names — swiper-icons, a custom hushly, Font Awesome — that no name-list can enumerate; without this they get mistaken for a text family and render headings as tofu/icons. A plain "no Latin letters" test is deliberately avoided: a text font served as a unicode-range subset legitimately lacks A yet is 0% PUA.

types.ts

  • DesignStyles gains optional chips/statCells/tabs; new StatCellStyle; FontFileMetadata gains isIcon.

Why

Validated end-to-end across 7 diverse sites (Stripe, LiveKit, DoorDash, Snowflake, Linear, ElevenLabs, Kuse). Each surfaced a distinct real-world case this PR handles generally (not per-site): oklch/hsl colors, camelCase/hashed font names, icon fonts, transparent grounds, unicode-range subsets. Snowflake's hushly icon font was rendering headings as icon glyphs before the isIcon fix.

Tests

  • New unit tests for isIconCharacterSet (PUA-heavy → icon; Latin / cyrillic-subset / empty → not).
  • fontMetadataExtractor.test.ts: 39 passing. bun run build, oxlint, oxfmt, typecheck all clean on changed files.

🤖 Generated with Claude Code

…sparent grounds

designStyleExtractor now also extracts chip/pill/badge/tag, stat/metric cells, and
tab components — by class-substring selector plus a shape fallback (small + fully
rounded + short text) so hashed/utility class names (Tailwind, CSS-modules) are
still caught. It also emits a "transparent" sentinel for fully-transparent
(rgba(...,0)) grounds instead of collapsing them to #000000, so a transparent
chip/tab/stat on a light-ground site no longer reads as solid black.
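
A minimal sketch of the shape fallback (small + fully rounded + short text), written as a pure predicate over already-measured values so it can run outside the browser. The names `MeasuredBox` and `looksLikeChip` are hypothetical. The 44px/260px bounds and the "has skin" condition come from the review comment on this PR; the 24-character text limit and the "radius is at least half the height" reading of "fully rounded" are assumptions.

```typescript
// Hypothetical input shape: values a page script might read from
// getBoundingClientRect() and getComputedStyle() for one element.
interface MeasuredBox {
  width: number;
  height: number;
  borderRadius: string; // computed value, e.g. "9999px" or "24px 24px 0px 0px"
  text: string;
  hasSkin: boolean; // element has a visible background or border
}

const MAX_HEIGHT = 44; // bound mentioned in the review
const MAX_WIDTH = 260; // bound mentioned in the review
const MAX_TEXT_LENGTH = 24; // assumption: the PR does not state the text limit

function looksLikeChip(box: MeasuredBox): boolean {
  // parseFloat reads only the first value of a multi-value border-radius.
  const radius = parseFloat(box.borderRadius);
  const small =
    box.height > 0 && box.height <= MAX_HEIGHT &&
    box.width > 0 && box.width <= MAX_WIDTH;
  // Assumption: "fully rounded" means the radius covers at least half the height.
  const fullyRounded = isFinite(radius) && radius >= box.height / 2;
  const text = box.text.trim();
  const shortText = text.length > 0 && text.length <= MAX_TEXT_LENGTH;
  return small && fullyRounded && shortText && box.hasSkin;
}
```

Because the predicate never looks at class names, it gives the same answer for `class="chip"` and for a hashed name such as a CSS-modules class.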

fontMetadataExtractor now flags icon fonts (isIcon) by glyph coverage: a font is an
icon font only when it BOTH lacks a real Latin alphabet (<26 of A-Za-z) AND is
mostly (>50%) Private-Use-Area glyphs. The Latin gate matters — some text fonts pack
thousands of PUA glyphs yet are plainly text (Apple SF Pro is ~81% PUA but ships a
full alphabet; Descript's Booton ~50%); flagging by PUA ratio alone would strip a
brand's real typeface. Measured icon fonts: "hushly" 63% PUA / 7 letters, Font
Awesome 95% / 0 letters. Names alone can't identify icon fonts ("hushly",
"swiper-icons"), hence the glyph-based test.
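
The dual gate described above might look roughly like the following. It is a sketch, not the PR's `isIconCharacterSet`: the function name `looksLikeIconFont` and the input shape (a list of supported code points) are assumptions, while the two thresholds (<26 Latin letters, >50% PUA) are taken from the text.

```typescript
// Unicode Private Use Areas: BMP PUA plus supplementary planes 15 and 16.
function isPrivateUse(cp: number): boolean {
  return (
    (cp >= 0xe000 && cp <= 0xf8ff) ||
    (cp >= 0xf0000 && cp <= 0xffffd) ||
    (cp >= 0x100000 && cp <= 0x10fffd)
  );
}

// Basic Latin letters A-Z and a-z (52 in total).
function isLatinLetter(cp: number): boolean {
  return (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a);
}

// Hypothetical name; takes the code points a font file supports.
function looksLikeIconFont(codePoints: readonly number[]): boolean {
  const seen: Record<number, boolean> = {};
  let total = 0;
  let pua = 0;
  let latin = 0;
  for (const cp of codePoints) {
    if (seen[cp]) continue; // count each code point once
    seen[cp] = true;
    total++;
    if (isPrivateUse(cp)) pua++;
    if (isLatinLetter(cp)) latin++;
  }
  if (total === 0) return false; // an empty set is not evidence of an icon font
  const lacksAlphabet = latin < 26; // gate 1: no real Latin alphabet
  const mostlyPua = pua / total > 0.5; // gate 2: more than half PUA glyphs
  return lacksAlphabet && mostlyPua;
}
```

A PUA-heavy text font with a full alphabet fails gate 1, and a unicode-range subset with no Latin letters and no PUA glyphs fails gate 2, so neither is flagged.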

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@xuanruli xuanruli force-pushed the feat/capture-component-extraction branch from f4f61b5 to 6cc8731 on July 1, 2026 03:55

@miga-heygen miga-heygen left a comment

Review: feat(capture): extract chips/stat-cells/tabs, detect icon fonts, transparent grounds

Summary: Extends the capture engine's design-style extraction to three new component families (chips, stat cells, tabs), adds icon-font detection via PUA glyph ratio, and fixes transparent backgrounds being collapsed to #000000. Well-structured, well-tested, validated across diverse sites.

Findings:

| # | Location | Severity | Note |
|---|----------|----------|------|
| 1 | fontMetadataExtractor.ts:206 | concern | font as unknown as { characterSet?: number[] } is a double-cast. Project CLAUDE.md says "Avoid any and as T assertions." The try/catch + Array.isArray guard makes it runtime-safe, but consider a type guard function instead: function hasCharacterSet(f: Font): f is Font & { characterSet: number[] }. |
| 2 | designStyleExtractor.ts:209 | suggestion | parseFloat(st.borderRadius) only reads the first value of shorthand like "24px 24px 0px 0px". An element with only top-rounded corners would pass the pill-shape check. Unlikely to cause false positives given the other constraints (height ≤ 44, width ≤ 260, short text, has skin). |
| 3 | designStyleExtractor.ts:262 | suggestion | [class*="tab"] could match tabpanel, tabindex, tabular, establish, stable. Downstream size filters mitigate most false positives, but consider additional :not() exclusions if observed in practice. |
| 4 | fontMetadataExtractor.ts:192 | nit | isIcon not propagated to FontFamilySummary. Consumers need to iterate files to discover if a family is an icon font. Worth noting for future consumers. |
| 5 | designStyleExtractor.ts:216-226 | nit | Same DOM element can appear in both chipEls and shapeChips, getting getStyles() called twice before dedup by key. No correctness issue, just redundant work. |
| 6 | rgbToHex (transparent fix) | nit | No unit test for the transparent sentinel. The function lives inside a page.evaluate script so unit testing requires extraction or e2e. Not a blocker. |
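
For finding 1, a type guard along these lines would avoid the double cast. This is a sketch under assumptions: it takes a plain `object` rather than the font library's `Font` type (which is not reproduced here), and `readCharacterSet` is a hypothetical caller. It relies on TypeScript's `in` narrowing for unlisted properties (TypeScript 4.9 or later).

```typescript
// Narrows any object to one that carries a numeric characterSet array.
// The try/catch mirrors the runtime safety the finding mentions, in case
// the property is a getter that throws while parsing the font.
function hasCharacterSet(f: object): f is { characterSet: number[] } {
  try {
    if (!("characterSet" in f)) return false;
    const cs = f.characterSet; // typed as unknown after the `in` check
    return Array.isArray(cs) && cs.every((n) => typeof n === "number");
  } catch {
    return false;
  }
}

// Hypothetical caller: no `as` assertion is needed once the guard passes.
function readCharacterSet(font: object): number[] {
  return hasCharacterSet(font) ? font.characterSet : [];
}
```

When the guard is called with a more specific type, the narrowed type is the intersection of that type and `{ characterSet: number[] }`, which is what the reviewer's suggested signature expresses.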

What looks good:

  • Icon-font detection heuristic is clever — dual-gate (Latin alphabet + PUA ratio) handles the tricky SF Pro / Booton false-positive case that naive PUA-only would miss
  • Transparent sentinel is a clean fix for a data-loss bug (transparent → #000000)
  • Type additions (chips?, statCells?, tabs?) are backward-compatible
  • Test coverage is solid — SF Pro test case is particularly valuable (validates the Latin gate)
  • Shape-fallback chip detection fills the gap for sites that don't use class names with "chip/tag/badge/pill"
  • Stat cell extraction correctly finds the biggest-font child for the "number" style
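
The "biggest-font child" rule for stat cells can be illustrated with a small pure function. The names `ChildStyle` and `pickNumberChild` are hypothetical, and the sketch assumes font sizes have already been read from computed styles as pixel numbers; how the extractor breaks ties is not stated, so this version keeps the first of equal sizes.

```typescript
// Hypothetical shape: one entry per descendant of a stat/metric cell.
interface ChildStyle {
  text: string;
  fontSizePx: number;
}

// Returns the child with the largest font size, or null for an empty cell.
function pickNumberChild(children: readonly ChildStyle[]): ChildStyle | null {
  let best: ChildStyle | null = null;
  for (const child of children) {
    // Strict comparison keeps the earliest child when sizes are equal.
    if (best === null || child.fontSizePx > best.fontSizePx) {
      best = child;
    }
  }
  return best;
}
```

For a cell containing a 14px label, a 48px figure, and a 12px caption, the 48px figure is selected as the "number" style.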

Verdict: LGTM — Well-designed extraction with real-world validation. The main actionable suggestion is replacing the as unknown as cast with a type guard.

— Miga

@xuanruli xuanruli merged commit 8694424 into main Jul 1, 2026
41 checks passed
@xuanruli xuanruli deleted the feat/capture-component-extraction branch July 1, 2026 04:30