Release v0.10.0 · Hyperyond/Hover

Highlights

The Stripe-checkout / OAuth-redirect / "Pay with X" popup flow has been the most brittle thing the agent does. v0.10 hardens it.

System-prompt addendum — three new rules in cdpHint.ts:
- Rule 5: Popup-opening clicks → browser_tabs(list) → switch to the popup → after window.close, refocus the opener via browser_tabs(select, 0) and re-snapshot. The opener's DOM may have updated via postMessage handlers in the interim.
- Rule 6: OAuth-style redirect chains, where the same tab idx silently changes origin underneath the agent. Treat the URL reported by browser_tabs(list) as authoritative.
- Rule 7: Cross-origin cookie/session updates. Do NOT browser_navigate to same-origin to force a refresh (rule #2 still applies). The most likely cause is a slow or missing postMessage handler, so wait once via browser_wait_for_text(expected, 3000) before reporting.
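The checks behind Rules 5 and 6 can be sketched as two small pure functions. This is an illustration only: the `TabInfo` shape and both function names are hypothetical, and the real browser_tabs(list) output may look different.

```typescript
// Hypothetical shape of one entry from browser_tabs(list); an assumption,
// not the actual tool output format.
interface TabInfo {
  index: number;
  url: string;
}

// Rule 5: after a popup-opening click, assume the popup is the tab whose
// index was not present in the listing taken before the click.
function findPopup(before: TabInfo[], after: TabInfo[]): TabInfo | undefined {
  const known = new Set(before.map((t) => t.index));
  return after.find((t) => !known.has(t.index));
}

// Rule 6: compare the origin the agent last saw with the origin the tab
// listing reports now. A missing tab is treated as "changed" so the caller
// re-lists and re-snapshots instead of acting on stale state.
function originChanged(lastSeenUrl: string, tabs: TabInfo[], index: number): boolean {
  const tab = tabs.find((t) => t.index === index);
  if (!tab) return true;
  return new URL(tab.url).origin !== new URL(lastSeenUrl).origin;
}
```

Comparing origins rather than full URLs keeps same-site navigation (for example /cart to /done) from being flagged as a redirect to another party.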
examples/payment-provider is upgraded to a two-step card + OTP flow with a simulated 600ms 3DS pre-check. Test card: 4242 4242 4242 4242; OTP: 123456. The Decline button on either step short-circuits to the same payment-result: declined postMessage. Every interactive control carries a data-testid so e2e specs and benches can target them deterministically. The postMessage shape is unchanged, so the existing examples/e-commerce handler still works.
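For readers unfamiliar with the opener side of this flow, here is a minimal sketch of how a handler might validate the popup's message. The message shape (`type` and `status` fields), the "approved" value, and the function name are assumptions for illustration; these notes only confirm a payment-result: declined message, not its exact structure.

```typescript
// Assumed message shape. The real shape used by examples/payment-provider
// is not shown in these notes.
type PaymentStatus = "approved" | "declined";

interface PaymentResultMessage {
  type: "payment-result";
  status: PaymentStatus;
}

// Returns the payment status if the message came from the trusted origin
// and matches the assumed shape, otherwise null.
function parsePaymentResult(
  origin: string,
  data: unknown,
  trustedOrigin: string,
): PaymentStatus | null {
  if (origin !== trustedOrigin) return null; // ignore other windows
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Partial<PaymentResultMessage>;
  if (msg.type !== "payment-result") return null;
  return msg.status === "approved" || msg.status === "declined" ? msg.status : null;
}
```

In a browser, this would typically be called from a window.addEventListener("message", ...) callback with event.origin and event.data. Keeping the parsing separate from the listener makes it testable without a DOM.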
pnpm bench-multi-tab — new benchmark script. Runs N iterations of the full e-commerce → PayHover checkout flow, reports success rate, median wall time, median turns, median cost. Companion to pnpm bench-ttfb. Use it to A/B prompt changes across branches.
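The reported metrics can be computed along these lines. This is a sketch, not the bench script itself: the `RunResult` fields and function names are hypothetical, and it assumes medians are taken over all runs rather than only successful ones.

```typescript
// Hypothetical per-iteration record; field names are assumptions.
interface RunResult {
  success: boolean;
  wallMs: number;
  turns: number;
  costUsd: number;
}

// Median of a list of numbers; NaN for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Aggregate N iterations into the four numbers the bench reports.
function summarize(runs: RunResult[]) {
  const successes = runs.filter((r) => r.success).length;
  return {
    successRate: runs.length === 0 ? 0 : successes / runs.length,
    medianWallMs: median(runs.map((r) => r.wallMs)),
    medianTurns: median(runs.map((r) => r.turns)),
    medianCostUsd: median(runs.map((r) => r.costUsd)),
  };
}
```

Medians are less sensitive than means to a single slow or expensive iteration, which matters when N is small.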

| Agent | Sandbox | Install | Stream | MCP | System prompt |
| --- | --- | --- | --- | --- | --- |
| `aider` | soft | `pipx install aider-chat` | plain-text only | none (no MCP yet) | preface prepended |
| `gemini-cli` | soft | `npm install -g @google/gemini-cli` | `stream-json` | `~/.gemini/settings.json` | preface prepended |
| `qwen-code` | soft | `npm install -g @qwen-code/qwen-code@latest` | `stream-json` (Anthropic-Messages envelope) | `~/.qwen/settings.json` | real `--append-system-prompt` |

Caveats:

aider has no MCP integration today, so picking aider from the Hover dropdown gets you an LLM chat with no browser-driving ability. This is documented prominently in the descriptor header as a degraded mode.
gemini-cli has no per-invocation system-prompt flag. Third-party docs claim --append-system-prompt exists, but it doesn't; the only option is the GEMINI_SYSTEM_MD env var pointing at a markdown file, which is a full replacement. The HOVER preface is therefore prepended to the user prompt.
qwen-code is the only one of the three with a real --append-system-prompt flag, which makes it cleaner than the other four soft-sandbox descriptors (codex / cursor / aider / gemini all prepend).

All three are soft-sandbox and show a ⚠ badge in the dropdown.
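The difference between the two preface strategies can be sketched as follows. The `Invocation` type, the function name, and the blank-line separator are assumptions for illustration; only the --append-system-prompt flag for qwen-code comes from these notes.

```typescript
type Agent = "aider" | "gemini-cli" | "qwen-code";

// Hypothetical result type: extra CLI args plus the final user prompt.
interface Invocation {
  extraArgs: string[];
  userPrompt: string;
}

// qwen-code gets the preface via its system-prompt flag; the other two
// have no such flag, so the preface is prepended to the user prompt.
function applyPreface(agent: Agent, preface: string, userPrompt: string): Invocation {
  if (agent === "qwen-code") {
    return { extraArgs: ["--append-system-prompt", preface], userPrompt };
  }
  // Separator is an assumption; the real descriptors may join differently.
  return { extraArgs: [], userPrompt: `${preface}\n\n${userPrompt}` };
}
```

With prepending, the preface travels in the user turn, so it competes with the user's own text; the flag keeps the two separate.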

v0.10 ✓ (you are here)
v0.11 — security mode recording semantics (unchanged)
v0.12+ or sibling repo — Chrome extension. Moved out of v0.10 — Web Store releases are manual and the extension's release cadence shouldn't gate on monorepo PRs. Likely lives in a separate hover-extension repo.

Manual smoke + bench run. Deferred to post-release. The bench is built to A/B prompt changes against a baseline; the first baseline is what gets published on v0.10. Running the bench pre-release just to "see if the prompt works" would burn ~25 min and meaningful API cost without producing a comparison data point. Iteration happens in v0.10.1+.
Playwright MCP wrapper for tab lifecycle (the "L2" item from the design: chrome.debugger-style auto-refocus when a tab closes). Not in this release. v0.10 deliberately ships prompt-only first; if bench data shows prompt engineering isn't enough, that's the next investment.

+53 tests (aider 16, gemini 19, qwen 18) on top of v0.9's 110 → 163 total.
+200 lines of bench script.
payment-provider App.tsx + index.css: ~290 new lines (form layout, validation, simulated latency, OTP step).
cdpHint.ts: +39 lines (3 new rule blocks).