Releases: pluginslab/android-agent-bridge

v0.4.2 — cookie injection working end-to-end

19 Apr 13:15

Fix for the Looper crash in browser_set_cookie / browser_clear_cookies.

Android's CookieManager.setCookie and removeAllCookies must be called from a thread with a running Looper. The v0.4.1 implementation ran them directly from Ktor coroutine threads, which have no Looper, so the calls threw "removeAllCookies must be called on a thread with a running Looper."

Moved the calls into BrowserManager.setCookie / clearCookies, which post to the main handler and bridge the result back via CompletableDeferred. Tool handlers now delegate to these suspend functions.
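
The shape of this fix can be illustrated outside Android. The sketch below is a Python analog, not the project's Kotlin code: LooperThread is a hypothetical stand-in for the main thread's Looper and handler, and a concurrent.futures.Future plays the role that CompletableDeferred plays in the real implementation. All names in it are assumptions made for illustration.

```python
import queue
import threading
from concurrent.futures import Future


class LooperThread:
    """Hypothetical stand-in for a thread with a running Looper."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Drain posted tasks one at a time, like a message loop.
        while True:
            fn, fut = self._tasks.get()
            try:
                fut.set_result(fn())
            except Exception as exc:
                fut.set_exception(exc)

    def post(self, fn) -> Future:
        """Queue fn for the loop thread; the Future carries the result back."""
        fut = Future()
        self._tasks.put((fn, fut))
        return fut

    def is_current(self) -> bool:
        return threading.current_thread() is self._thread


main = LooperThread()
jar = {}  # toy cookie jar: url -> list of cookie strings


def set_cookie_on_looper(url, cookie):
    # Mimics an API that refuses to run off the loop thread.
    if not main.is_current():
        raise RuntimeError("must be called on a thread with a running Looper")
    jar.setdefault(url, []).append(cookie)
    return True


def set_cookie(url, cookie):
    # Safe from any thread: post to the loop thread, then wait for the result.
    return main.post(lambda: set_cookie_on_looper(url, cookie)).result(timeout=5)
```

Calling set_cookie_on_looper directly from a worker thread fails the same way the v0.4.1 handlers did, while set_cookie succeeds because the work runs on the loop thread and only the result crosses back.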

Verified end-to-end: injected LinkedIn session cookies (li_at, JSESSIONID, bscookie, bcookie) into the WebView's jar, navigated the authenticated feed, clicked through four "Show more feed updates" buttons, and scraped 43 posts directly via browser_eval — no screenshots, no OCR, no interactive login.

v0.4.1 — cookie jar injection

19 Apr 13:03

Two small tools: browser_set_cookie(url, cookie) and browser_clear_cookies(). Both go through Android's CookieManager, so HttpOnly cookies work. This lets you paste a session cookie from desktop Chrome DevTools and scrape authenticated sites without an interactive login.
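
As a rough sketch of how a client might call the tool: the helper below assembles a cookie string and wraps it in a standard MCP tools/call JSON-RPC request. The argument names url and cookie come from the signature above. That the tool accepts a Set-Cookie style string with attributes is an assumption (it matches what Android's CookieManager.setCookie takes), and the helper names and sample values are hypothetical.

```python
import json


def cookie_string(name, value, *, domain=None, path="/", secure=True, http_only=True):
    """Build a Set-Cookie style string from values copied out of DevTools."""
    parts = [f"{name}={value}"]
    if domain:
        parts.append(f"Domain={domain}")
    if path:
        parts.append(f"Path={path}")
    if secure:
        parts.append("Secure")
    if http_only:
        parts.append("HttpOnly")
    return "; ".join(parts)


def set_cookie_request(request_id, url, cookie):
    """Wrap the arguments in an MCP tools/call request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "browser_set_cookie",
            "arguments": {"url": url, "cookie": cookie},
        },
    }


# Placeholder values only; substitute the real cookie from DevTools.
cookie = cookie_string("session_id", "PLACEHOLDER", domain=".example.com")
request = set_cookie_request(1, "https://www.example.com", cookie)
print(json.dumps(request, indent=2))
```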

v0.4.0 — batch scripting (run_script + probe)

19 Apr 12:51

Cuts multi-step flow latency by eliminating model round-trips.

run_script — execute a whole sequence of ops in a single MCP call. Node predicates (text_contains, resource_id, etc.) are re-resolved on every step so ephemeral IDs never go stale. Supports:

  • Gestures: tap, tap_coords, long_press, swipe
  • Input: type_text, clear_text, send_key_events, paste
  • Navigation: open_url, launch_app, global_action
  • Waits: wait_for_node, wait_for_window, sleep, assert_node
  • Browser: browser_navigate, browser_eval
  • Output: capture — gathers find_nodes / screenshot / browser_screenshot / browser_info into named buckets

Returns a per-step trace ({step, op, ok, ms, error?, detail?}) and the captures.
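
To make the trace concrete, here is a hedged sketch of a script and a small helper that summarizes a returned trace. The op names and predicate names come from the notes above, and the trace fields follow the documented shape. The step keys (node, text, timeout_ms), the resource id, the zero-based step index, and the sample trace values are assumptions made up for illustration; check the README for the actual schema.

```python
# Hypothetical script: only the op and predicate names are taken from the notes.
script = [
    {"op": "wait_for_node", "node": {"text_contains": "Sign in"}, "timeout_ms": 5000},
    {"op": "tap", "node": {"text_contains": "Sign in"}},
    {"op": "type_text", "node": {"resource_id": "com.example:id/email"}, "text": "user@example.com"},
]


def summarize_trace(trace):
    """Report how far a script got, how long it took, and where it failed."""
    failed = next((entry for entry in trace if not entry["ok"]), None)
    return {
        "steps_run": len(trace),
        "total_ms": sum(entry["ms"] for entry in trace),
        "failed_step": failed["step"] if failed else None,
        "error": failed.get("error") if failed else None,
    }


# Made-up trace in the documented {step, op, ok, ms, error?, detail?} shape.
sample_trace = [
    {"step": 0, "op": "wait_for_node", "ok": True, "ms": 120},
    {"step": 1, "op": "tap", "ok": True, "ms": 35},
    {"step": 2, "op": "type_text", "ok": False, "ms": 10, "error": "node not found"},
]

print(summarize_trace(sample_trace))
```

A summary like this lets the caller decide in one place whether to retry, re-probe, or report the failing step back to the model.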

probe — compact curated survey of the current screen. Active window plus up to N clickable / editable / focused / scrollable nodes plus distinct visible texts. Much cheaper to consume than get_ui_tree.

Typical pattern: one probe to learn the surface, one run_script to execute the whole flow.

v0.3.2 — stop auto-showing keyboard

19 Apr 12:42

Tiny polish: the URL bar no longer steals focus when BrowserActivity opens, and the manifest sets windowSoftInputMode=stateHidden so the soft keyboard stays down. The black strip at the bottom of the viewer was the keyboard popping up, not a layout bug.

v0.3.1 — visible embedded browser

19 Apr 12:37

Small patch on top of v0.3.0: the embedded WebView is now viewable inside AgentBridge.

Open AgentBridge → Open Embedded Browser for a minimal browser chrome: URL bar, back, reload, and a full-width WebView below. Type a URL, hit Go.

Under the hood the WebView is a single instance owned by BrowserManager. The Activity borrows it on resume and returns it on pause, so MCP browser_eval / browser_console / browser_navigate continue to work whether the viewer is open or closed.

No new tools; no breaking changes from v0.3.0.

v0.3.0 — embedded headless browser + contributor docs

19 Apr 12:22

Real DevTools-without-CDP: an in-app headless WebView you can drive with JavaScript, with console output captured straight from WebChromeClient.onConsoleMessage.

New tools (all prefixed browser_):

  • browser_navigate — load a URL, wait for onPageFinished
  • browser_eval — run JavaScript, returns JSON-encoded result
  • browser_console — ring buffer of captured console.* messages
  • browser_info — current URL + document.title
  • browser_html — outerHTML (full doc or via CSS selector)
  • browser_screenshot — WebView rendering to PNG, full content height
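
For readers unfamiliar with the ring buffer behind browser_console, the idea can be sketched in a few lines: keep only the most recent N messages and silently drop the oldest. This is an illustration in Python, not the app's implementation; the class name, the field names, and the default capacity are assumptions.

```python
from collections import deque


class ConsoleBuffer:
    """Fixed-size buffer that keeps only the newest console messages."""

    def __init__(self, capacity=200):  # capacity is an assumed value
        # deque(maxlen=...) discards the oldest entry once the buffer is full.
        self._messages = deque(maxlen=capacity)

    def on_console_message(self, level, text):
        self._messages.append({"level": level, "text": text})

    def snapshot(self):
        """Return the buffered messages, oldest first."""
        return list(self._messages)


buffer = ConsoleBuffer(capacity=3)
for i in range(5):
    buffer.on_console_message("log", f"message {i}")

print([m["text"] for m in buffer.snapshot()])
# → ['message 2', 'message 3', 'message 4']
```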

The embedded browser is intentionally separate from Chrome: sessions don't carry over, which is the right behavior for deterministic E2E tests.

Also: README now has a full Contributing section documenting every hard-won lesson from this project's tight inner loop — HyperOS/MIUI quirks, the crash-to-file trick, MCP transport quirks, the /mcp URL typo, and the adding-a-tool recipe.

Tested on Xiaomi Pad 8 Pro (HyperOS, Android 15). See README for setup.

v0.2.0 — screen capture + e2e tools

19 Apr 12:05

  • get_screenshot: PNG screenshots via MediaProjection. Open the app and tap Grant Screen Capture once per session. Returns MCP ImageContent (or data_url / base64 text).
  • get_notifications: rolling buffer of the last 50 notification events.
  • clear_text: empty an editable node's text.
  • scroll_to_text: swipe until a text match is visible.
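
When get_screenshot is asked for text output rather than ImageContent, the client has to decode it. The sketch below accepts either a data URL or bare base64 and checks the PNG signature before returning the bytes. The exact text format the tool emits is an assumption (a standard data:image/png;base64,... URL is assumed), and the function name is hypothetical.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def png_bytes_from_text(payload):
    """Decode a data URL or bare base64 string into PNG bytes."""
    if payload.startswith("data:"):
        header, _, payload = payload.partition(",")
        if ";base64" not in header:
            raise ValueError("expected a base64 data URL")
    data = base64.b64decode(payload, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload is not a PNG")
    return data


# Usage with a stand-in payload (signature plus dummy bytes, not a real image).
fake_png = PNG_SIGNATURE + b"dummy"
encoded = base64.b64encode(fake_png).decode("ascii")
assert png_bytes_from_text("data:image/png;base64," + encoded) == fake_png
```

Checking the signature up front catches the case where an error message or a truncated payload is written to disk as a .png.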

Updated APK attached. MIUI optimization still needs to be off on HyperOS.

v0.1.0 — first public release

19 Apr 11:56

First tagged release of AndroidAgentBridge. Debug-signed APK attached — side-load and enable the accessibility service as described in the README.

Tool surface (22 tools):

  • UI introspection: get_active_window_info, get_ui_tree (multi-window), find_nodes, wait_for_node, wait_for_window
  • Gestures & input: tap_node, tap_coords, long_press_node, swipe, type_text, send_key_events, global_action, wait_for_idle
  • Clipboard: set_clipboard, get_clipboard, paste
  • App & intent: list_apps, launch_app, open_url, send_intent
  • Stub: get_screenshot (v2)

Tested on Xiaomi Pad 8 Pro (HyperOS, Android 15). See README for setup.