v0.11.0
Highlights
β³ Re-record β the answer to "my UI changed and my spec broke"
When a Hover-generated Playwright spec turns red because semantic selectors no longer match (button renamed Sign in, label split, role swapped), instead of editing the .spec.ts by hand, the agent regenerates it. Two entry points:
1. Widget β Open the new π Saved sessions overlay, switch to the Specs tab, click β³ Re-record next to any spec. The agent reads the spec's JSDoc Original prompt: ("log in then add a todo"), drives the current UI, and Hover overwrites the file with new selectors.
2. CLI β pnpm hover re-record <spec> from a terminal. Boots a temporary service, replays the prompt, prints the resulting git diff, tells you the accept/reject commands. Flags: --dry-run (run without overwriting), --cwd <path> (monorepo workspaces), --port <n> (service port).
About 30 seconds, about $0.10 per spec. CI itself stays pure Playwright β AI cost concentrates at authoring time, not amortised across every test run.
Saved-sessions overlay (Skills + Specs tabs)
The widget's old single-purpose "Saved skills" overlay becomes the Saved sessions overlay with two tabs:
- Skills β replayable agent instructions under
.claude/skills/. Self-adapt to UI changes (the agent re-resolves selectors at runtime). Same UX as before. - Specs β Playwright tests under
__vibe_tests__/. Each row carries the spec slug, truncated original prompt, relative mtime ("2h ago"), and a β³ Re-record button. Disabled with tooltip when the spec has noOriginal prompt:header (hand-authored specs).
Per-tab hint paragraphs explain the distinction explicitly so users don't conflate the two artefacts.
Top-level FAQ in README + docs site
New FAQ section in both READMEs and a dedicated docs/faq.md page. Q1 is the load-bearing one: "My UI changed and my saved spec breaks. What now?" Covers the three-layer answer:
- Semantic selectors absorb most UI churn (the existing design).
- When semantics actually shift, β³ Re-record, hand-edit, or treat as regression.
- Why no auto-heal at CI time β Hover's stance against the Stagehand/Midscene model. CI tokens accumulate; concentrating LLM cost at deliberate Re-record moments is cheaper and more deterministic over a project's lifetime.
Also covers: Skill vs Spec semantics, why we don't ship re-record --all / --failed in v0.11, headless-Chromium concerns, data-upload boundaries, and production-build no-op behaviour.
What's NOT in this release
re-record --all/--failed. Rejected on purpose for v0.11.--allburns LLM tokens on specs that are fine;--failedis the right shape but needs a first-class "run Playwright, collect failures" step the CLI doesn't yet ship. Rationale in the FAQ. On the v0.12+ roadmap.
Implementation
| Layer | Change | Where |
|---|---|---|
| Core lib | listSpecs() + parseSpecHeader() β 145 lines + 13 vitest cases |
packages/core/src/specs/listSpecs.ts |
| WS protocol | New list-specs request + specs-list response; new command.reRecord.slug field |
packages/core/src/service.ts |
| Service | Collects SkillStep[] from tool_use events when reRecord.slug is set; on clean session_end calls writeSpec({ overwrite: true }) |
invocation loop in service.ts |
| CLI | hover re-record <spec> subcommand β ~340 lines |
packages/cli/src/re-record.ts |
| Widget | Tabbed overlay, Specs list renderer, Re-record button β command { reRecord }, ~280 lines across client.js + style.css + template.html |
packages/widget-bootstrap/src/widget/ |
| Docs | README + zh-CN FAQ, docs/faq.md, docs/features/re-record.md, new save-as-spec.md content, nav/sidebar updates |
(see CHANGELOG) |
Roadmap reshuffle
- v0.11 β Spec resilience (this release)
- v0.12 β Security mode recording semantics (was v0.11)
- v0.13+ or sibling repo β Chrome extension (was v0.12+)
- "Re-record
--failed/--all" added to Beyond v0.12.x
Validation
pnpm typecheckclean across all 10 publishable packages.pnpm --filter @hover-dev/core test: 176 tests pass (was 163; +13 forparseSpecHeader/listSpecs).pnpm test:e2e: 5 Playwright tests pass onexamples/basic-appβ no regressions from the overlay rewrite.- Manual smoke (Re-record actually round-tripping against a running service) deferred to post-release β same calculus as v0.10's
bench-multi-tab: ship the surface first, iterate based on real usage.