v0.10.0

@github-actions github-actions released this 29 May 08:23
· 122 commits to main since this release
1ac88e0

Highlights

Multi-tab agent reliability

The Stripe-checkout / OAuth-redirect / "Pay with X" popup flow has been the most brittle thing the agent does. v0.10 hardens it.

  • System-prompt addendum. Three new rules in cdpHint.ts:
    • Rule 5: Popup-opening clicks → browser_tabs(list) → switch to the popup → after window.close, refocus the opener via browser_tabs(select, 0) and re-snapshot. The opener's DOM may have updated via postMessage handlers in the interim.
    • Rule 6: OAuth-style redirect chains, where the same tab index silently changes origin underneath the agent. The URL reported by browser_tabs(list) is authoritative.
    • Rule 7: Cross-origin cookie/session updates. Do NOT browser_navigate to same-origin to force a refresh (rule #2 still applies). The most likely cause is a slow or missing postMessage handler. Wait once via browser_wait_for_text(expected, 3000) before reporting.
  • examples/payment-provider upgraded to a two-step card + OTP flow with a simulated 600 ms 3DS pre-check. Card 4242 4242 4242 4242, OTP 123456. The Decline button on either step short-circuits to the same payment-result: declined postMessage. data-testid on every interactive control so e2e specs / benches target them deterministically. postMessage shape unchanged → existing examples/e-commerce handler still works.
  • pnpm bench-multi-tab: new benchmark script. Runs N iterations of the full e-commerce → PayHover checkout flow, reports success rate, median wall time, median turns, median cost. Companion to pnpm bench-ttfb. Use it to A/B prompt changes across branches.
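
The summary that pnpm bench-multi-tab reports (success rate plus medians for wall time, turns, and cost) could be aggregated roughly as follows. This is a minimal sketch, not the actual bench script: the RunResult and BenchSummary shapes, the field names, and the choice to compute medians over all runs (including failed ones) are assumptions made for illustration.

```typescript
// Hypothetical per-iteration record; the real bench script's output shape
// is not shown in these notes, so every field name here is an assumption.
interface RunResult {
  success: boolean;
  wallTimeMs: number;
  turns: number;
  costUsd: number;
}

interface BenchSummary {
  runs: number;
  successRate: number; // fraction in [0, 1]
  medianWallTimeMs: number;
  medianTurns: number;
  medianCostUsd: number;
}

// Median of a list of numbers; NaN for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Assumption: medians are taken over all runs, not only successful ones.
function summarize(results: RunResult[]): BenchSummary {
  const succeeded = results.filter((r) => r.success).length;
  return {
    runs: results.length,
    successRate: results.length === 0 ? 0 : succeeded / results.length,
    medianWallTimeMs: median(results.map((r) => r.wallTimeMs)),
    medianTurns: median(results.map((r) => r.turns)),
    medianCostUsd: median(results.map((r) => r.costUsd)),
  };
}
```

Medians are less sensitive than means to a single slow or expensive outlier run, which matters when N is small and two branches are being compared.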

3 new agents, 6 supported now

| Agent | Sandbox | Install | Stream | MCP | System prompt |
|---|---|---|---|---|---|
| aider | soft | pipx install aider-chat | plain-text only | none (no MCP yet) | preface prepended |
| gemini-cli | soft | npm install -g @google/gemini-cli | stream-json | ~/.gemini/settings.json | preface prepended |
| qwen-code | soft | npm install -g @qwen-code/qwen-code@latest | stream-json (Anthropic-Messages envelope) | ~/.qwen/settings.json | real --append-system-prompt |

Caveats:

  • aider has no MCP integration today, so picking aider from the Hover dropdown gets you an LLM chat with no browser-driving ability. Documented prominently in the descriptor header as a degraded mode.
  • gemini-cli has no per-invocation system-prompt flag. Third-party docs claim --append-system-prompt exists; it doesn't. The only option is the GEMINI_SYSTEM_MD env var pointing at a markdown file, which is a full replacement of the system prompt. The HOVER preface is therefore prepended to the user prompt.
  • qwen-code is the only one of the three with a real --append-system-prompt flag, which makes it the cleanest of the five soft-sandbox descriptors (the other four, codex / cursor / aider / gemini, all prepend).

All three are soft-sandbox and carry a ⚠ badge in the dropdown.
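
The two system-prompt delivery strategies described in the caveats (a real --append-system-prompt flag versus prepending the preface to the user prompt) might be modelled like this. It is a sketch under stated assumptions: AgentDescriptor, SystemPromptMode, Invocation, and buildInvocation are hypothetical names, not the project's real descriptor types, and passing the preface as a separate argv element after the flag is also an assumption.

```typescript
// Hypothetical descriptor shape, for illustration only.
type SystemPromptMode = "append-flag" | "prepend-to-user-prompt";

interface AgentDescriptor {
  name: string;
  systemPromptMode: SystemPromptMode;
}

interface Invocation {
  extraArgs: string[]; // extra CLI arguments for the agent process
  userPrompt: string;  // the prompt text actually sent to the agent
}

function buildInvocation(
  agent: AgentDescriptor,
  preface: string,
  userPrompt: string,
): Invocation {
  if (agent.systemPromptMode === "append-flag") {
    // Agent exposes a per-invocation flag (qwen-code in the table above):
    // the preface travels as a system-prompt addendum and the user prompt
    // is left untouched.
    return { extraArgs: ["--append-system-prompt", preface], userPrompt };
  }
  // No usable flag (aider, gemini-cli): fall back to prepending the
  // preface to the user prompt itself.
  return { extraArgs: [], userPrompt: `${preface}\n\n${userPrompt}` };
}
```

Keeping the choice in a descriptor field rather than in per-agent branches means a future agent that gains a real flag only needs its descriptor changed.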

Roadmap reshuffle

  • v0.10 ✓ (you are here)
  • v0.11: security mode recording semantics (unchanged)
  • v0.12+ or sibling repo: Chrome extension. Moved out of v0.10 because Web Store releases are manual and the extension's release cadence shouldn't gate on monorepo PRs. Likely lives in a separate hover-extension repo.

What's NOT in this release

  • Manual smoke + bench run. Deferred to post-release. The bench is built to A/B prompt changes against a baseline; the first baseline is what gets published on v0.10. Running the bench pre-release just to "see if the prompt works" would burn ~25 min and meaningful API cost without producing a comparison data point. Iteration happens in v0.10.1+.
  • Playwright MCP wrapper for tab lifecycle (the "L2" item from the design: chrome.debugger-style auto-refocus when a tab closes). v0.10 deliberately ships prompt-only first. If bench data shows prompt engineering isn't enough, that's the next investment.

Internal stats

  • +53 tests (aider 16, gemini 19, qwen 18) on top of v0.9's 110 → 163 total.
  • +200 lines of bench script.
  • payment-provider App.tsx + index.css: ~290 new lines (form layout, validation, simulated latency, OTP step).
  • cdpHint.ts: +39 lines (3 new rule blocks).

Full diff

v0.9.0...v0.10.0