v0.11.0

github-actions released this 29 May 09:18 · 121 commits to main since this release · c6fae76

Highlights

⟳ Re-record: the answer to "my UI changed and my spec broke"

When a Hover-generated Playwright spec turns red because its semantic selectors no longer match (a button renamed to Sign in, a label split, a role swapped), you no longer have to edit the .spec.ts by hand: the agent regenerates it. Two entry points:

1. Widget: open the new 📜 Saved sessions overlay, switch to the Specs tab, and click ⟳ Re-record next to any spec. The agent reads the Original prompt: line from the spec's JSDoc header (for example, "log in then add a todo"), drives the current UI, and Hover overwrites the file with new selectors.

2. CLI: run pnpm hover re-record <spec> from a terminal. It boots a temporary service, replays the prompt, prints the resulting git diff, and tells you the accept/reject commands. Flags: --dry-run (run without overwriting), --cwd <path> (monorepo workspaces), --port <n> (service port).

Each re-record takes about 30 seconds and costs about $0.10 per spec. CI itself stays pure Playwright: AI cost is concentrated at authoring time rather than spread across every test run.
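As a rough worked example of those figures, the sketch below multiplies the approximate per-spec numbers quoted above by the number of specs being re-recorded. The helper name and constants are illustrative assumptions, not part of Hover; actual time and cost will vary with the prompt and the model.

```typescript
// Ballpark figures quoted in these release notes, not measured guarantees.
const SECONDS_PER_SPEC = 30;
const CENTS_PER_SPEC = 10;

interface ReRecordEstimate {
  specs: number;
  seconds: number;
  usd: number;
}

// Hypothetical helper: estimates the one-off authoring cost of re-recording
// `specs` specs. CI runs add nothing to this total because they stay pure Playwright.
function estimateReRecord(specs: number): ReRecordEstimate {
  if (!Number.isInteger(specs) || specs < 0) {
    throw new RangeError("specs must be a non-negative integer");
  }
  return {
    specs,
    seconds: specs * SECONDS_PER_SPEC,
    // Multiply in whole cents first, then convert to dollars.
    usd: (specs * CENTS_PER_SPEC) / 100,
  };
}

// Re-recording four broken specs: roughly 120 seconds and $0.40.
const fourSpecs = estimateReRecord(4);
```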

Saved-sessions overlay (Skills + Specs tabs)

The widget's old single-purpose "Saved skills" overlay becomes the Saved sessions overlay with two tabs:

  • Skills: replayable agent instructions under .claude/skills/. They self-adapt to UI changes (the agent re-resolves selectors at runtime). Same UX as before.
  • Specs: Playwright tests under __vibe_tests__/. Each row shows the spec slug, the truncated original prompt, a relative mtime ("2h ago"), and a ⟳ Re-record button. The button is disabled, with a tooltip, when the spec has no Original prompt: header (hand-authored specs).

Per-tab hint paragraphs explain the distinction explicitly so users don't conflate the two artefacts.
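To make the Specs row concrete, here is a minimal sketch of how a row could be derived from a spec file. It assumes the header is a leading JSDoc block with a line of the form "Original prompt: ...". The function names, the row shape, and the 40-character truncation are hypothetical and are not the actual parseSpecHeader() / listSpecs() implementation.

```typescript
interface SpecRow {
  slug: string;
  prompt: string;        // truncated for display
  age: string;           // relative mtime, e.g. "2h ago"
  canReRecord: boolean;  // false for hand-authored specs without the header
}

// Assumed header shape: a leading JSDoc block containing a line such as
//   * Original prompt: log in then add a todo
function readOriginalPrompt(source: string): string | null {
  const jsdoc = source.match(/^\s*\/\*\*([\s\S]*?)\*\//);
  if (!jsdoc) return null;
  const line = (jsdoc[1] ?? "").match(/^\s*\*?\s*Original prompt:\s*(.+?)\s*$/m);
  return line ? line[1] ?? null : null;
}

// Formats the age of a file relative to `nowMs` (both in epoch milliseconds).
function relativeAge(mtimeMs: number, nowMs: number): string {
  const minutes = Math.floor((nowMs - mtimeMs) / 60000);
  if (minutes < 1) return "just now";
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}

function toSpecRow(
  slug: string,
  source: string,
  mtimeMs: number,
  nowMs: number,
  maxLen = 40, // illustrative truncation width
): SpecRow {
  const original = readOriginalPrompt(source);
  const prompt = original ?? "";
  return {
    slug,
    prompt: prompt.length > maxLen ? prompt.slice(0, maxLen - 1) + "…" : prompt,
    age: relativeAge(mtimeMs, nowMs),
    canReRecord: original !== null,
  };
}
```

A spec without the header still gets a row, but with canReRecord set to false, which is the case where the widget disables the button.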

Top-level FAQ in README + docs site

New FAQ section in both READMEs and a dedicated docs/faq.md page. Q1 is the load-bearing one: "My UI changed and my saved spec breaks. What now?" Covers the three-layer answer:

  1. Semantic selectors absorb most UI churn (the existing design).
  2. When semantics actually shift: ⟳ Re-record, hand-edit, or treat the failure as a regression.
  3. Why there is no auto-heal at CI time: Hover's stance against the Stagehand/Midscene model. CI tokens accumulate; concentrating LLM cost at deliberate Re-record moments is cheaper and more deterministic over a project's lifetime.

Also covers: Skill vs Spec semantics, why we don't ship re-record --all / --failed in v0.11, headless-Chromium concerns, data-upload boundaries, and production-build no-op behaviour.
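The first two layers of that answer can be illustrated with a toy model. This is not Playwright's matching engine, and the element shape, helper names, and the "Log in" starting label are assumptions made for the example. It only shows why a role-plus-name lookup survives a restyle but not a rename.

```typescript
// Toy model of a rendered element.
interface UiElement {
  role: string;      // ARIA role, e.g. "button"
  name: string;      // accessible name, usually the visible label
  className: string; // styling hook that tends to churn
}

// Brittle lookup: depends on a styling detail.
function findByClass(ui: UiElement[], className: string): UiElement | undefined {
  return ui.find((el) => el.className === className);
}

// Semantic lookup: depends only on what the element is and what it is called.
function findByRole(ui: UiElement[], role: string, name: string): UiElement | undefined {
  return ui.find((el) => el.role === role && el.name === name);
}

// Layer 1: a restyle changes the class but not the semantics.
const restyled: UiElement[] = [{ role: "button", name: "Log in", className: "cta" }];
const stillFound = findByRole(restyled, "button", "Log in") !== undefined;   // true
const classBroken = findByClass(restyled, "btn-primary") === undefined;      // true

// Layer 2: the button is renamed to "Sign in", so the old semantic selector
// no longer matches. This is the point where a re-record is needed.
const renamed: UiElement[] = [{ role: "button", name: "Sign in", className: "cta" }];
const needsReRecord = findByRole(renamed, "button", "Log in") === undefined; // true
```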

What's NOT in this release

  • re-record --all / --failed. Rejected on purpose for v0.11. --all burns LLM tokens on specs that are fine; --failed is the right shape but needs a first-class "run Playwright, collect failures" step the CLI doesn't yet ship. Rationale in the FAQ. On the v0.12+ roadmap.
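For context on what the missing "collect failures" step involves, the sketch below walks a report shaped like a small subset of Playwright's JSON reporter output (nested suites containing specs with ok and file fields) and returns the files that contain at least one failing spec. The subset and the function name are assumptions for illustration; nothing like this ships in the v0.11 CLI.

```typescript
// Minimal assumed subset of a Playwright JSON report.
interface ReportSpec {
  title: string;
  ok: boolean;
  file: string;
}

interface ReportSuite {
  specs?: ReportSpec[];
  suites?: ReportSuite[];
}

interface Report {
  suites: ReportSuite[];
}

// Hypothetical helper: returns the sorted, de-duplicated list of spec files
// that contain at least one failing spec.
function failedSpecFiles(report: Report): string[] {
  const failed = new Set<string>();
  const visit = (suite: ReportSuite): void => {
    for (const spec of suite.specs ?? []) {
      if (!spec.ok) failed.add(spec.file);
    }
    for (const child of suite.suites ?? []) {
      visit(child);
    }
  };
  for (const suite of report.suites) {
    visit(suite);
  }
  return Array.from(failed).sort();
}
```

Only the files returned here would be candidates for a re-record, which is why --failed avoids the wasted tokens that --all would spend on passing specs.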

Implementation

| Layer | Change | Where |
|---|---|---|
| Core lib | listSpecs() + parseSpecHeader(): 145 lines + 13 vitest cases | packages/core/src/specs/listSpecs.ts |
| WS protocol | New list-specs request + specs-list response; new command.reRecord.slug field | packages/core/src/service.ts |
| Service | Collects SkillStep[] from tool_use events when reRecord.slug is set; on clean session_end calls writeSpec({ overwrite: true }) | invocation loop in service.ts |
| CLI | hover re-record <spec> subcommand, ~340 lines | packages/cli/src/re-record.ts |
| Widget | Tabbed overlay, Specs list renderer, Re-record button → command { reRecord }, ~280 lines across client.js + style.css + template.html | packages/widget-bootstrap/src/widget/ |
| Docs | README + zh-CN FAQ, docs/faq.md, docs/features/re-record.md, new save-as-spec.md content, nav/sidebar updates | (see CHANGELOG) |
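The WS protocol and Service rows can be read together as the following sketch. Only the names list-specs, command.reRecord.slug, SkillStep, tool_use, session_end, and writeSpec with overwrite: true come from the table above. Every field shape, the ok flag, and the handleCommand function are assumptions made for illustration and do not reflect the real code in service.ts.

```typescript
interface SkillStep {
  tool: string;
  input: unknown;
}

// Assumed client-to-service messages.
type ClientMessage =
  | { type: "list-specs" }
  | { type: "command"; prompt: string; reRecord?: { slug: string } };

// Assumed agent events seen by the service during a session.
type AgentEvent =
  | { type: "tool_use"; tool: string; input: unknown }
  | { type: "session_end"; ok: boolean };

// Stand-in for the real writeSpec; the argument shape is hypothetical.
type WriteSpec = (args: { slug: string; steps: SkillStep[]; overwrite: boolean }) => void;

// Collects steps while reRecord.slug is set and overwrites the spec only when
// the session ends cleanly. Returns true if the spec was written.
function handleCommand(
  command: Extract<ClientMessage, { type: "command" }>,
  events: AgentEvent[],
  writeSpec: WriteSpec,
): boolean {
  const slug = command.reRecord?.slug;
  if (!slug) return false; // ordinary command: nothing to overwrite

  const steps: SkillStep[] = [];
  for (const event of events) {
    if (event.type === "tool_use") {
      steps.push({ tool: event.tool, input: event.input });
    } else {
      // session_end: write only on a clean finish, so a failed run
      // never replaces a spec with a partial recording.
      if (!event.ok) return false;
      writeSpec({ slug, steps, overwrite: true });
      return true;
    }
  }
  return false; // stream ended without session_end
}
```

Deferring the write until a clean session_end is the design point worth noting: an interrupted or failed re-record leaves the existing file untouched.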

Roadmap reshuffle

  • v0.11 ✓ Spec resilience (this release)
  • v0.12 → Security mode recording semantics (was v0.11)
  • v0.13+ or sibling repo → Chrome extension (was v0.12+)
  • "Re-record --failed / --all" added to Beyond v0.12.x

Validation

  • pnpm typecheck clean across all 10 publishable packages.
  • pnpm --filter @hover-dev/core test: 176 tests pass (was 163; +13 for parseSpecHeader / listSpecs).
  • pnpm test:e2e: 5 Playwright tests pass on examples/basic-app, with no regressions from the overlay rewrite.
  • Manual smoke (Re-record actually round-tripping against a running service) deferred to post-release. Same calculus as v0.10's bench-multi-tab: ship the surface first, iterate based on real usage.

Full diff

v0.10.0...v0.11.0