v0.10.0
Highlights
Multi-tab agent reliability
The Stripe-checkout / OAuth-redirect / "Pay with X" popup flow has been the most brittle thing the agent does. v0.10 hardens it.
- System-prompt addendum: three new rules in `cdpHint.ts`:
  - Rule 5: Popup-opening clicks: `browser_tabs(list)` → switch to the popup → after `window.close`, refocus the opener via `browser_tabs(select, 0)` and re-snapshot. The opener's DOM may have updated via postMessage handlers in the interim.
  - Rule 6: OAuth-style redirect chains where the same tab idx silently changes origin underneath the agent. The `browser_tabs(list)` URL is authoritative.
  - Rule 7: Cross-origin cookie/session updates: do NOT `browser_navigate` to same-origin to force a refresh (rule #2 still applies). The most likely cause is a slow / missing postMessage handler. Wait once via `browser_wait_for_text(expected, 3000)` before reporting.
- `examples/payment-provider` upgraded to a two-step card + OTP flow with simulated 600ms 3DS pre-check. Card `4242 4242 4242 4242`, OTP `123456`. Decline button on either step short-circuits to the same `payment-result: declined` postMessage. `data-testid` on every interactive control so e2e specs / benches target them deterministically. postMessage shape unchanged, so the existing `examples/e-commerce` handler still works.
- `pnpm bench-multi-tab`: new benchmark script. Runs N iterations of the full e-commerce → PayHover checkout flow, reports success rate, median wall time, median turns, median cost. Companion to `pnpm bench-ttfb`. Use it to A/B prompt changes across branches.
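Rules 5 and 6 both come down to treating the tab listing as the source of truth. The sketch below illustrates that bookkeeping in TypeScript. It is not taken from `cdpHint.ts` (the rules themselves are prompt text); the `TabInfo` shape, the helper names `findPopup` and `originChanged`, and the example URLs are all hypothetical.

```typescript
// Hypothetical shape for one entry of a tab listing; the real
// browser_tabs(list) output may look different.
interface TabInfo {
  index: number;
  url: string;
}

// Rule 5 idea: after a popup-opening click, the popup is the tab that
// was not present in the listing taken before the click.
function findPopup(before: TabInfo[], after: TabInfo[]): TabInfo | undefined {
  const known = new Set(before.map((t) => t.index));
  return after.find((t) => !known.has(t.index));
}

// Rule 6 idea: the same tab index can change origin during a redirect
// chain, so compare against the freshly listed URL, not a remembered one.
function originChanged(rememberedUrl: string, listedUrl: string): boolean {
  return new URL(rememberedUrl).origin !== new URL(listedUrl).origin;
}

// Made-up listings for illustration.
const before: TabInfo[] = [{ index: 0, url: "https://shop.example/cart" }];
const after: TabInfo[] = [
  ...before,
  { index: 1, url: "https://pay.example/checkout" },
];

const popup = findPopup(before, after);
const redirected = originChanged(
  "https://shop.example/login",
  "https://idp.example/authorize?client_id=demo",
);
console.log(popup?.index, redirected);
```

Comparing origins rather than full URLs keeps the check quiet for ordinary same-site navigation while still catching a redirect to another host.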
3 new agents → 6 supported now
| Agent | Sandbox | Install | Stream | MCP | System prompt |
|---|---|---|---|---|---|
| `aider` | soft | `pipx install aider-chat` | plain-text only | none (no MCP yet) | preface prepended |
| `gemini-cli` | soft | `npm install -g @google/gemini-cli` | `stream-json` | `~/.gemini/settings.json` | preface prepended |
| `qwen-code` | soft | `npm install -g @qwen-code/qwen-code@latest` | `stream-json` (Anthropic-Messages envelope) | `~/.qwen/settings.json` | real `--append-system-prompt` |
Caveats:
- `aider` has no MCP integration today, so picking aider from the Hover dropdown gets you an LLM chat with no browser-driving ability. Documented prominently in the descriptor header as a degraded mode.
- `gemini-cli` has no per-invocation system-prompt flag. Third-party docs claim `--append-system-prompt` exists; it doesn't. The only option is the `GEMINI_SYSTEM_MD` env var pointing at a markdown file, which is a full replacement. The HOVER preface is prepended to the user prompt instead.
- `qwen-code` is the only one of the three with a real `--append-system-prompt` flag, which makes it the cleanest of the soft-sandbox descriptors (codex / cursor / aider / gemini all prepend).
All three are soft-sandbox, flagged with a badge in the dropdown.
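The two system-prompt delivery modes in the table can be pictured with a small TypeScript sketch. It is illustrative only: `PromptMode`, `Invocation`, and `buildInvocation` are hypothetical names rather than the actual descriptor code, and the sketch assumes the flag takes the prompt text as its next argument.

```typescript
// Hypothetical model of how a descriptor might deliver the preface.
type PromptMode = "append-flag" | "prepend";

interface Invocation {
  args: string[]; // extra CLI arguments for the agent process
  userPrompt: string; // text sent as the user prompt
}

function buildInvocation(
  mode: PromptMode,
  preface: string,
  userPrompt: string,
): Invocation {
  if (mode === "append-flag") {
    // qwen-code style: the preface travels in a dedicated flag
    // (assumed here to accept the text as the following argument).
    return { args: ["--append-system-prompt", preface], userPrompt };
  }
  // aider / gemini-cli style: no such flag, so the preface is
  // prepended to the user prompt.
  return { args: [], userPrompt: `${preface}\n\n${userPrompt}` };
}

const viaFlag = buildInvocation("append-flag", "PREFACE", "Buy the item");
const viaPrepend = buildInvocation("prepend", "PREFACE", "Buy the item");
console.log(viaFlag.args, viaPrepend.userPrompt);
```

Keeping the user prompt untouched in the flag case is what makes that path cleaner: the agent sees the task exactly as the user wrote it.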
Roadmap reshuffle
- v0.10: (you are here)
- v0.11: security mode recording semantics (unchanged)
- v0.12+ or sibling repo: Chrome extension. Moved out of v0.10 because Web Store releases are manual and the extension's release cadence shouldn't gate on monorepo PRs. Likely lives in a separate `hover-extension` repo.
What's NOT in this release
- Manual smoke + bench run. Deferred to post-release. The bench is built to A/B prompt changes against a baseline; the first baseline is what gets published on v0.10. Running the bench pre-release just to "see if the prompt works" would burn ~25 min and meaningful API cost without producing a comparison data point. Iteration happens in v0.10.1+.
- Playwright MCP wrapper for tab lifecycle (the "L2" item from the design: `chrome.debugger`-style auto-refocus when a tab closes). v0.10 deliberately ships prompt-only first; if bench data shows prompt engineering isn't enough, that's the next investment.
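For the A/B use of the bench described in the first item above, the reported numbers reduce to a success rate and a few medians per branch. The TypeScript sketch below shows one way to compute and compare them. `RunResult`, `median`, and `summarize` are hypothetical names, and the sample figures are invented for illustration; they are not output from the actual bench script.

```typescript
// Hypothetical per-iteration record; the real bench output may differ.
interface RunResult {
  success: boolean;
  wallMs: number;
  turns: number;
  costUsd: number;
}

// Median of a list; averages the two middle values for even lengths.
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summarize(runs: RunResult[]) {
  const passed = runs.filter((r) => r.success).length;
  return {
    successRate: runs.length === 0 ? 0 : passed / runs.length,
    medianWallMs: median(runs.map((r) => r.wallMs)),
    medianTurns: median(runs.map((r) => r.turns)),
    medianCostUsd: median(runs.map((r) => r.costUsd)),
  };
}

// Invented numbers, not measured results.
const baseline: RunResult[] = [
  { success: true, wallMs: 60000, turns: 12, costUsd: 0.4 },
  { success: false, wallMs: 90000, turns: 20, costUsd: 0.7 },
  { success: true, wallMs: 70000, turns: 14, costUsd: 0.5 },
  { success: true, wallMs: 80000, turns: 16, costUsd: 0.6 },
];
const candidate: RunResult[] = [
  { success: true, wallMs: 55000, turns: 11, costUsd: 0.35 },
  { success: true, wallMs: 65000, turns: 13, costUsd: 0.45 },
  { success: true, wallMs: 75000, turns: 15, costUsd: 0.55 },
];

const base = summarize(baseline);
const cand = summarize(candidate);
const successDelta = cand.successRate - base.successRate;
console.log(base, cand, successDelta);
```

Medians are used rather than means so that a single stalled iteration does not dominate the comparison between branches.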
Internal stats
- +53 tests (aider 16, gemini 19, qwen 18) on top of v0.9's 110 → 163 total.
- +200 lines of bench script.
- `payment-provider` `App.tsx` + `index.css`: ~290 new lines (form layout, validation, simulated latency, OTP step).
- `cdpHint.ts`: +39 lines (3 new rule blocks).