Releases: pluginslab/android-agent-bridge
v0.4.2 — cookie injection working end-to-end
Fix for the Looper crash in browser_set_cookie / browser_clear_cookies.
Android's CookieManager.setCookie and removeAllCookies must be called from a thread with a running Looper. The v0.4.1 implementation called them directly from Ktor coroutines, whose threads have no Looper, so the calls failed with "removeAllCookies must be called on a thread with a running Looper".
Moved the calls into BrowserManager.setCookie / clearCookies, which post to the main handler and bridge the result back via CompletableDeferred. Tool handlers now delegate to these suspend functions.
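The pattern behind this fix (hand the work to a thread that owns a message loop, then await the result from a coroutine) can be illustrated outside Android. The sketch below is a Python analogy, not the project's Kotlin code: LooperThread stands in for the main handler, a concurrent.futures.Future plays the role of CompletableDeferred, and every name in it is hypothetical.

```python
import asyncio
import queue
import threading
from concurrent.futures import Future


class LooperThread:
    """Hypothetical stand-in for a Looper-backed handler: one thread draining a task queue."""

    def __init__(self) -> None:
        self._tasks: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    @property
    def ident(self) -> int:
        return self._thread.ident

    def _loop(self) -> None:
        while True:
            fn, fut = self._tasks.get()
            try:
                fut.set_result(fn())
            except Exception as exc:  # propagate failures to the awaiting caller
                fut.set_exception(exc)

    def post(self, fn) -> Future:
        """Queue fn for the loop thread; the Future completes with its result."""
        fut: Future = Future()
        self._tasks.put((fn, fut))
        return fut


async def set_cookie(handler: LooperThread, jar: dict, url: str, cookie: str) -> int:
    """Coroutine-side wrapper: post the mutation, then await the bridged result."""
    def work() -> int:
        jar.setdefault(url, []).append(cookie)
        return threading.get_ident()  # report which thread did the work

    return await asyncio.wrap_future(handler.post(work))
```

The coroutine never touches the jar itself; it only awaits the future, which mirrors how the tool handlers delegate to suspend functions instead of calling CookieManager directly.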
Verified end-to-end: injected LinkedIn session cookies (li_at, JSESSIONID, bscookie, bcookie) into the WebView's jar, navigated the authenticated feed, clicked through four "Show more feed updates" buttons, and scraped 43 posts directly via browser_eval — no screenshots, no OCR, no interactive login.
v0.4.1 — cookie jar injection
Two small tools: browser_set_cookie(url, cookie) and browser_clear_cookies(). Goes through Android CookieManager so HttpOnly cookies work. Lets you paste a session cookie from desktop Chrome DevTools and scrape authenticated sites without interactive login.
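As an illustration of the call shape, the sketch below builds an MCP tools/call JSON-RPC request for browser_set_cookie. The url and cookie argument names come from the tool signature above; the helper names, the request id, and the choice of Set-Cookie-style attributes are assumptions, and the cookie value should be treated as a placeholder.

```python
import json


def cookie_string(name: str, value: str, *, domain: str, path: str = "/",
                  secure: bool = True, http_only: bool = True) -> str:
    """Format a cookie in Set-Cookie header style (assumed to be what the tool forwards)."""
    parts = [f"{name}={value}", f"Domain={domain}", f"Path={path}"]
    if secure:
        parts.append("Secure")
    if http_only:
        parts.append("HttpOnly")
    return "; ".join(parts)


def set_cookie_request(url: str, cookie: str, request_id: int = 1) -> str:
    """Wrap the arguments in an MCP tools/call JSON-RPC request (hypothetical helper)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "browser_set_cookie",
            "arguments": {"url": url, "cookie": cookie},
        },
    })
```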
v0.4.0 — batch scripting (run_script + probe)
Cuts multi-step flow latency by eliminating model round-trips.
run_script — execute a whole sequence of ops in a single MCP call. Node predicates (text_contains, resource_id, etc.) are re-resolved on every step so ephemeral IDs never go stale. Supports:
- Gestures: tap, tap_coords, long_press, swipe
- Input: type_text, clear_text, send_key_events, paste
- Navigation: open_url, launch_app, global_action
- Waits: wait_for_node, wait_for_window, sleep, assert_node
- Browser: browser_navigate, browser_eval
- Output: capture (gathers find_nodes / screenshot / browser_screenshot / browser_info into named buckets)
Returns a per-step trace ({step, op, ok, ms, error?, detail?}) and the captures.
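To make the shape concrete, here is a hedged sketch of a client-side step list and a trace summary. The op names and the trace fields (step, op, ok, ms, error) come from the notes above; the per-step argument keys (url, node, text, timeout_ms, name, source) are invented for illustration and may not match the real schema.

```python
from typing import Any

# Hypothetical step list: op names are from the release notes, argument keys are guesses.
steps: list[dict[str, Any]] = [
    {"op": "open_url", "url": "https://example.com/login"},
    {"op": "wait_for_node", "node": {"text_contains": "Sign in"}, "timeout_ms": 5000},
    {"op": "tap", "node": {"text_contains": "Sign in"}},
    {"op": "type_text", "text": "hello"},
    {"op": "capture", "name": "after_login", "source": "find_nodes"},
]


def summarize_trace(trace: list[dict[str, Any]]) -> dict[str, Any]:
    """Reduce a per-step trace to total time and the first failing step, if any."""
    first_failure = next((t for t in trace if not t["ok"]), None)
    return {
        "ok": first_failure is None,
        "total_ms": sum(t["ms"] for t in trace),
        "failed_step": None if first_failure is None else first_failure["step"],
        "error": None if first_failure is None else first_failure.get("error"),
    }
```

A summary like this lets the caller decide whether to retry from the failing step without re-reading the whole trace.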
probe — compact curated survey of the current screen. Active window plus up to N clickable / editable / focused / scrollable nodes plus distinct visible texts. Much cheaper to consume than get_ui_tree.
Typical pattern: one probe to learn the surface, one run_script to execute the whole flow.
v0.3.2 — stop auto-showing keyboard
Tiny polish: the URL bar no longer steals focus when BrowserActivity opens, and the manifest sets windowSoftInputMode=stateHidden so the soft keyboard stays down. The black strip at the bottom of the viewer was the keyboard popping up, not a layout bug.
v0.3.1 — visible embedded browser
Small patch on top of v0.3.0: the embedded WebView is now viewable inside AgentBridge.
Open AgentBridge → Open Embedded Browser for a simple chrome: URL bar, back, reload, full-width WebView below. Type a URL, hit Go.
Under the hood the WebView is a single instance owned by BrowserManager. The Activity borrows it on resume and returns it on pause, so MCP browser_eval / browser_console / browser_navigate continue to work whether the viewer is open or closed.
No new tools; no breaking changes from v0.3.0.
v0.3.0 — embedded headless browser + contributor docs
Real DevTools-without-CDP: an in-app headless WebView you can drive with JavaScript, with console output captured straight from WebChromeClient.onConsoleMessage.
New tools (all prefixed browser_):
- browser_navigate: load a URL, wait for onPageFinished
- browser_eval: run JavaScript, returns JSON-encoded result
- browser_console: ring buffer of captured console.* messages
- browser_info: current URL + document.title
- browser_html: outerHTML (full doc or via CSS selector)
- browser_screenshot: WebView rendering to PNG, full content height
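A ring buffer like the one behind browser_console can be sketched in a few lines. This is a Python illustration of the data structure only; the capacity of 200 and the class and field names are assumptions, not the app's actual values.

```python
from collections import deque


class ConsoleBuffer:
    """Fixed-capacity buffer: once full, each new message evicts the oldest one."""

    def __init__(self, capacity: int = 200) -> None:  # capacity is an assumed value
        self._messages: deque = deque(maxlen=capacity)

    def record(self, level: str, text: str) -> None:
        self._messages.append({"level": level, "text": text})

    def snapshot(self) -> list:
        """Return buffered messages, oldest first, without clearing them."""
        return list(self._messages)
```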
The embedded browser is intentionally separate from Chrome — sessions don't carry over, which is actually the right thing for deterministic E2E tests.
Also: README now has a full Contributing section documenting every hard-won lesson from this project's tight inner loop — HyperOS/MIUI quirks, the crash-to-file trick, MCP transport quirks, the /mcp URL typo, and the adding-a-tool recipe.
Tested on Xiaomi Pad 8 Pro (HyperOS, Android 15). See README for setup.
v0.2.0 — screen capture + e2e tools
- get_screenshot: PNG screenshots via MediaProjection. Open the app and tap Grant Screen Capture once per session. Returns MCP ImageContent (or data_url / base64 text).
- get_notifications: rolling buffer of the last 50 notification events.
- clear_text: empty an editable node's text.
- scroll_to_text: swipe until a text match is visible.
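The scroll_to_text behavior can be pictured as a bounded search. In this sketch, is_visible and swipe are hypothetical callbacks standing in for the accessibility lookup and the gesture, and the max_swipes limit is an assumption; the notes do not say how the real tool bounds its search.

```python
from typing import Callable


def scroll_to_text(text: str, is_visible: Callable[[str], bool],
                   swipe: Callable[[], None], max_swipes: int = 10) -> bool:
    """Swipe until text is on screen; give up after max_swipes attempts."""
    for _ in range(max_swipes):
        if is_visible(text):
            return True
        swipe()
    return is_visible(text)  # final check after the last swipe
```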
Updated APK attached. MIUI optimization still needs to be off on HyperOS.
v0.1.0 — first public release
First tagged release of AndroidAgentBridge. Debug-signed APK attached — side-load and enable the accessibility service as described in the README.
Tool surface (22 tools):
- UI introspection: get_active_window_info, get_ui_tree (multi-window), find_nodes, wait_for_node, wait_for_window
- Gestures & input: tap_node, tap_coords, long_press_node, swipe, type_text, send_key_events, global_action, wait_for_idle
- Clipboard: set_clipboard, get_clipboard, paste
- App & intent: list_apps, launch_app, open_url, send_intent
- Stub: get_screenshot (v2)
Tested on Xiaomi Pad 8 Pro (HyperOS, Android 15). See README for setup.