Turn an Android device into a remote-controllable automation surface for LLM agents, over LAN.
The app runs a tiny Ktor + MCP server on the device. An accessibility service reads the UI tree and dispatches gestures. Any MCP-capable client (Claude Code, Claude Desktop, a custom agent) on the same network can read the screen and perform taps, swipes, text input, and intent launches through a bearer-authenticated HTTP/SSE endpoint.
Built for end-to-end testing, lightweight device automation, and "claude, do something on my tablet" demos. v1 is debug-signed and side-loaded; no Play Store distribution.
v0.4.2 — stable batch scripting + cookie injection for the embedded browser. Tested on a Xiaomi Pad 8 Pro (HyperOS, Android 15) and a MacBook client. Other devices/Android versions are untested but nothing is hardware-specific.
See the GitHub Releases page for the full history. Recent stops:
- 0.4.2: Cookie injection fix. CookieManager calls must run on a thread with a Looper, so they're now dispatched through BrowserManager's main handler. browser_set_cookie confirmed working end-to-end with LinkedIn (authenticated feed scraping).
- 0.4.1: browser_set_cookie / browser_clear_cookies. Inject HttpOnly session cookies into the WebView's jar.
- 0.4.0: run_script + probe. Batch a multi-step flow in a single MCP call, no per-step model round-trip.
- 0.3.1–0.3.2: Visible embedded-browser UI (URL bar + back/reload), no-auto-keyboard polish.
- 0.3.0: Embedded headless WebView with JS eval + console capture.
- get_active_window_info: foreground package + window title, cheap.
- get_ui_tree: snapshot all interactive windows (app, status bar, IME, system overlays) under a synthetic AllWindows root. single_window=true reverts to legacy rootInActiveWindow behavior. Output in indented_text (token-efficient) or json.
- find_nodes: predicate query across the tree. Matches on text_contains, text_equals, resource_id, class_name (substring), content_description_contains, plus boolean filters for clickable / editable / focused. Flat result list, far cheaper than a full tree.
- wait_for_node: same predicate as find_nodes, polls until match or timeout. Replaces sleep-based flakiness in tests.
- wait_for_window: polls until a given package is foreground.
- get_notifications: rolling buffer of the last 50 notification events captured by the accessibility service (package, texts, timestamp). Filter by package or since_ms.
- get_screenshot: PNG screenshot via MediaProjection. Requires a one-time-per-session consent via the app's Grant Screen Capture button. Returns an MCP ImageContent block by default; format="data_url" or "base64" for text output.
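The find_nodes predicates listed above can be pictured as a conjunction of simple checks. The sketch below is a client-side model in Python, not the app's Kotlin implementation. Nodes are plain dicts, and it assumes that every supplied field must match, that string matching is case-sensitive, and that resource_id is compared exactly. None of those three points is confirmed by the tool description.

```python
from typing import Any


def matches(node: dict[str, Any], pred: dict[str, Any]) -> bool:
    """Return True if `node` satisfies every field present in `pred`."""
    text = node.get("text") or ""
    desc = node.get("content_description") or ""
    # Field names come from the tool description; comparison rules are assumed.
    checks = {
        "text_contains": lambda v: v in text,
        "text_equals": lambda v: v == text,
        "resource_id": lambda v: v == node.get("resource_id"),  # assumed exact match
        "class_name": lambda v: v in (node.get("class_name") or ""),  # substring
        "content_description_contains": lambda v: v in desc,
        "clickable": lambda v: bool(node.get("clickable")) == v,
        "editable": lambda v: bool(node.get("editable")) == v,
        "focused": lambda v: bool(node.get("focused")) == v,
    }
    unknown = set(pred) - set(checks)
    if unknown:
        raise ValueError(f"unknown predicate fields: {sorted(unknown)}")
    return all(checks[key](value) for key, value in pred.items())


def find_nodes(nodes, pred):
    """Flat list of matching nodes, mirroring the tool's flat result shape."""
    return [n for n in nodes if matches(n, pred)]


# Hand-written sample nodes, not real device output.
nodes = [
    {"id": 1, "text": "Sign in", "class_name": "android.widget.Button", "clickable": True},
    {"id": 2, "text": "Email", "class_name": "android.widget.EditText", "editable": True},
    {"id": 3, "text": "Sign in with Google", "class_name": "android.widget.TextView"},
]
hits = find_nodes(nodes, {"text_contains": "Sign in", "clickable": True})
```

Combining a text predicate with a boolean filter, as in the last line, is what keeps the result list short: the non-clickable label that merely mentions "Sign in" is dropped.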
- tap_node: tap a node by its registry ID. Prefers ACTION_CLICK, falls back to a center-coord gesture.
- tap_coords: absolute-coordinate tap.
- long_press_node: long-press a node.
- swipe: straight-line swipe with configurable duration.
- scroll_to_text: swipes up to max_swipes times until a node containing the target text is visible, then returns it. Direction: down (default) or up.
- type_text: ACTION_SET_TEXT on a focused editable node. Does not work for terminals and other non-editable views (see set_clipboard + paste).
- clear_text: empty out an editable node's text.
- send_key_events: ENTER, BACKSPACE, DELETE, TAB, ESCAPE, arrows, HOME / END, PAGE_UP / PAGE_DOWN, SPACE. Best-effort: Android a11y cannot inject arbitrary keystrokes without IME-level privileges.
- global_action: back, home, recents, notifications, quick_settings, power_dialog.
- wait_for_idle: wait until no a11y events have fired for a quiet period. Useful after triggering an animation.
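The scroll_to_text behaviour described above is a bounded search loop. Here is a minimal Python model of it, with the device operations passed in as callables. The names find and swipe are hypothetical stand-ins, the max_swipes default of 10 is invented for the example, and checking the screen before the first swipe is an assumption about ordering.

```python
from typing import Callable, Optional


def scroll_to_text(
    target: str,
    find: Callable[[str], Optional[dict]],  # hypothetical: look up a visible node
    swipe: Callable[[str], None],           # hypothetical: perform one swipe
    max_swipes: int = 10,                   # assumed default, not from the source
    direction: str = "down",
) -> Optional[dict]:
    """Swipe up to `max_swipes` times until a node containing `target` is visible."""
    if direction not in ("down", "up"):
        raise ValueError("direction must be 'down' or 'up'")
    node = find(target)  # assumed: look before the first swipe
    swipes = 0
    while node is None and swipes < max_swipes:
        swipe(direction)
        swipes += 1
        node = find(target)
    return node


# Fake screen: twelve rows behind a three-row viewport, to exercise the loop.
rows = [f"Item {i}" for i in range(12)]
state = {"top": 0, "swipes": 0}


def fake_find(text):
    visible = rows[state["top"]:state["top"] + 3]
    return next(({"text": r} for r in visible if text in r), None)


def fake_swipe(direction):
    state["top"] += 3 if direction == "down" else -3
    state["swipes"] += 1


found = scroll_to_text("Item 7", fake_find, fake_swipe)
```

The bound matters: without max_swipes, a target that never appears would keep the device swiping indefinitely, whereas here the call returns None once the budget is spent.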
- set_clipboard: set the Android primary clipboard text.
- get_clipboard: read the primary clipboard. Subject to per-OEM restrictions; observed to work on HyperOS.
- paste: ACTION_PASTE on a node (or the focused one). For terminals you'd need to orchestrate long-press → Paste manually.
- list_apps: installed apps with package + label, optional substring filter. Defaults to launcher-visible only.
- launch_app: foreground an app by package name.
- open_url: ACTION_VIEW with a URL, optionally targeted at a specific package.
- send_intent: generic Intent.ACTION_* dispatch with data, package, component, type, categories, flags, and extras. Always runs as startActivity(FLAG_ACTIVITY_NEW_TASK) plus any flags you add.
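An MCP client normally builds tool requests for you, but seeing the raw message can help when debugging a custom agent. The sketch below builds a JSON-RPC tools/call request for send_intent. The data and package keys come from the list above; the action key name and the value formats are assumptions, so check the tool's published input schema before relying on them.

```python
import json


def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# `action` is an assumed key name; `data` and `package` are listed in the tool description.
request = tool_call(
    "send_intent",
    {
        "action": "android.intent.action.VIEW",
        "data": "https://example.com",
        "package": "com.android.chrome",
    },
)
print(json.dumps(request, indent=2))
```

The same helper works for any other tool in the list, since only the name and the arguments object change between calls.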
A second browser lives inside AgentBridge — an Android WebView with a simple URL bar (tap Open Embedded Browser in the app). It's sandboxed from Chrome, so logged-in sessions and cookies don't carry over from Chrome by default — which is actually a feature for deterministic E2E testing. The tools drive it from outside; the UI lets you inspect what's happening.
- browser_navigate: load a URL, wait for onPageFinished. Clears the console buffer on each call.
- browser_eval: run JavaScript, returns the JSON-encoded result. Multi-statement? Wrap in an IIFE and return.
- browser_console: ring buffer (last 500) of captured console.* messages with level, source, and line.
- browser_info: current URL + document.title.
- browser_html: documentElement.outerHTML, or pass a CSS selector to get a single element's outerHTML.
- browser_screenshot: draws the WebView's current rendering to a PNG. Captures full content height, below the fold included. No MediaProjection needed.
- browser_set_cookie: inject a cookie into the WebView's jar via Android's native CookieManager. Works for HttpOnly cookies that JavaScript can't set (e.g. LinkedIn's li_at). Typical workflow: in desktop Chrome's DevTools → Application → Cookies, copy the session cookie value, paste it here, then browser_navigate to an authenticated URL.
- browser_clear_cookies: reset the jar.
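The cookie workflow and the IIFE advice above can be sketched from the client side as follows. This only prepares the calls; it does not talk to a device. The argument key names (url, name, value, script) are assumptions, and the cookie value is a placeholder to be pasted from DevTools.

```python
import json


def iife(body: str) -> str:
    """Wrap multi-statement JavaScript in an IIFE so browser_eval gets one value back."""
    return "(() => {\n" + body.strip() + "\n})()"


script = iife("""
const links = [...document.querySelectorAll('a')];
return links.slice(0, 5).map(a => a.href);
""")

# Ordered tool calls: set the session cookie, load an authenticated page, evaluate JS.
# Key names are assumed for illustration, not taken from the tool schemas.
calls = [
    ("browser_set_cookie", {"url": "https://www.linkedin.com", "name": "li_at", "value": "<session-cookie>"}),
    ("browser_navigate", {"url": "https://www.linkedin.com/feed/"}),
    ("browser_eval", {"script": script}),
]


def decode_eval_result(raw: str):
    """browser_eval returns a JSON-encoded result, so decode it on the client."""
    return json.loads(raw)
```

Setting the cookie before navigating matters: the first request to the authenticated URL then already carries the session.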
Eliminates model round-trips for multi-step flows. The agent composes a sequence of ops once, the device executes them at device speed.
- probe: a compact, curated screen survey (active window + up to N clickable / editable / focused / scrollable nodes + distinct visible texts). Much cheaper to consume than a full get_ui_tree when you just want to decide "what's actionable here?". Populates the node registry so IDs stay usable with subsequent calls.
- run_script: execute a sequence of device-side ops in a single MCP call. Node predicates (text_contains, resource_id, class_name, …) are re-resolved on every step, so ephemeral node IDs never go stale. Ops supported include everything under gestures & input, app & intent launching, and embedded browser, plus wait_for_node, wait_for_window, sleep, assert_node, and capture (gathers final state, i.e. find_nodes results, device screenshot, browser screenshot, browser_info, and active window, into named buckets in the return value). Returns a per-step trace ({step, op, ok, ms, error?, detail?}) so failures are diagnosable.
Typical pattern on a new flow: one probe to learn the surface, one run_script to execute the whole thing.
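As a rough illustration of that pattern, here is what a run_script payload and the handling of its trace might look like on the client. Only the op names and the trace fields {step, op, ok, ms, error?, detail?} come from the description above. The steps wrapper, the per-step argument keys, and the sample trace values are assumptions made up for this sketch.

```python
# Assumed payload shape: a "steps" list of {"op": ..., plus arguments}.
script = {
    "steps": [
        {"op": "launch_app", "package": "com.example.app"},
        {"op": "wait_for_node", "text_contains": "Sign in", "timeout_ms": 5000},
        {"op": "tap_node", "text_contains": "Sign in"},
        {"op": "assert_node", "resource_id": "com.example.app:id/email"},
        {"op": "capture", "screenshot": True},
    ]
}


def first_failure(trace):
    """Return the first trace entry with ok == False, or None if every step passed."""
    return next((entry for entry in trace if not entry["ok"]), None)


def summarize(trace):
    """One line per step, e.g. '2 tap_node FAIL 40ms: node not found'."""
    lines = []
    for entry in trace:
        status = "ok" if entry["ok"] else "FAIL"
        line = f"{entry['step']} {entry['op']} {status} {entry['ms']}ms"
        if entry.get("error"):
            line += f": {entry['error']}"
        lines.append(line)
    return lines


# Hand-written sample trace, not real device output.
sample_trace = [
    {"step": 0, "op": "launch_app", "ok": True, "ms": 310},
    {"step": 1, "op": "wait_for_node", "ok": True, "ms": 1200},
    {"step": 2, "op": "tap_node", "ok": False, "ms": 40, "error": "node not found"},
]
failed = first_failure(sample_trace)
```

Because each step addresses nodes by predicate rather than by ID, the script can be composed before any snapshot exists, and the trace tells the agent exactly which step to repair.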
- Build a debug APK (or grab one from a release):

      gradle :app:assembleDebug
      # APK at: app/build/outputs/apk/debug/app-debug.apk

- Side-load the APK (file manager → tap → install). On HyperOS/MIUI you need Developer options → MIUI optimization off before accessibility services can be enabled for sideloaded apps.
- Launch AgentBridge → Open Accessibility Settings → enable AgentBridge.
- Tap Start Server. The notification shows http://<lan-ip>:8080. Copy the bearer token from the app.
To register the server with Claude Code:

    claude mcp add --transport sse android-pad http://<pad-ip>:8080/ \
      --header "Authorization: Bearer <paste-token-from-app>"

Transport is legacy SSE (not Streamable HTTP). Base path is /, not /mcp.
For Claude Desktop, the equivalent config:
    {
      "mcpServers": {
        "android-pad": {
          "type": "sse",
          "url": "http://<pad-ip>:8080/",
          "headers": {
            "Authorization": "Bearer <paste-token-from-app>"
          }
        }
      }
    }

- Kotlin, min SDK 29, target SDK 34, compile SDK 35
- Ktor 3.x (CIO engine, smaller footprint than Netty on Android)
- io.modelcontextprotocol:kotlin-sdk-server
- kotlinx.serialization, kotlinx.coroutines
- Single debug-signed APK, side-loaded. No DI, no release build hardening.
    ┌────────────────────┐        LAN / SSE         ┌────────────────────────────────┐
    │ Client (Mac, ...)  │◄────────────────────────►│ Android device                 │
    │ MCP client         │    Bearer token auth     │ ┌────────────────────────────┐ │
    └────────────────────┘                          │ │ McpForegroundService       │ │
                                                    │ │  ├─ Ktor + MCP SDK         │ │
                                                    │ │  └─ ToolRegistry           │ │
                                                    │ │                            │ │
                                                    │ │ BridgeAccessibilityService │ │
                                                    │ │  ├─ TreeSnapshotter        │ │
                                                    │ │  ├─ NodeRegistry           │ │
                                                    │ │  └─ GestureDispatcher      │ │
                                                    │ └────────────────────────────┘ │
                                                    └────────────────────────────────┘
- McpForegroundService: long-running foreground service, persistent notification, owns the Ktor server.
- BridgeAccessibilityService: single a11y service, self-reference exposed via companion so tool handlers can reach it.
- TreeSnapshotter: walks service.windows and builds a serializable TreeNode tree under a synthetic AllWindows root. Registers each live node into NodeRegistry.
- NodeRegistry: Int → AccessibilityNodeInfo map. Cleared on every get_ui_tree / find_nodes / wait_for_node call. IDs are valid only until the next snapshot.
- ToolRegistry: registers all MCP tool handlers with the Server.
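The "valid only until the next snapshot" rule for NodeRegistry is easy to trip over, so here is a small Python model of it. It is not the app's Kotlin code: nodes are plain strings, and restarting IDs at 1 on every snapshot is an assumption chosen because it is the worst case for a caller holding an old ID.

```python
class NodeRegistry:
    """Maps small integer IDs to live nodes; every snapshot replaces the whole map."""

    def __init__(self):
        self._nodes = {}

    def snapshot(self, nodes):
        """Clear the registry, register `nodes`, and return their new IDs."""
        self._nodes.clear()
        ids = []
        for index, node in enumerate(nodes, start=1):  # assumed: IDs restart at 1
            self._nodes[index] = node
            ids.append(index)
        return ids

    def get(self, node_id):
        """Look up a node; raises KeyError if the ID is not in the current snapshot."""
        return self._nodes[node_id]


registry = NodeRegistry()
first_ids = registry.snapshot(["OK button", "Cancel button", "Title"])
ok_id = first_ids[0]

# A later get_ui_tree / find_nodes call takes a new snapshot...
registry.snapshot(["Search field"])

# ...so the old ID now names a different node, and ID 3 is gone entirely.
now_at_ok_id = registry.get(ok_id)
try:
    registry.get(3)
    stale_lookup_failed = False
except KeyError:
    stale_lookup_failed = True
```

Under this model a stale ID either fails loudly or, worse, resolves to an unrelated node, which is why acting on an ID right after the snapshot that produced it is the safe habit.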
- HyperOS dock is invisible. AccessibilityService.getWindows() does not return the HyperOS tablet dock as a separate window, and it's not inside the launcher's accessibility tree either. Workarounds: send_intent / launch_app bypass the launcher entirely, or tap_coords with a known y-offset.
- Terminal text input is not supported. type_text uses ACTION_SET_TEXT, which only works on editable EditText-like nodes. Terminals (Termux etc.) need a different path: the paste tool plus set_clipboard is usable if you orchestrate the long-press → Paste menu manually.
- send_key_events can't inject arbitrary keystrokes. The a11y API exposes ACTION_IME_ENTER and a few other node actions, but not raw KeyEvent injection. Full support would require shipping a companion IME.
- get_screenshot needs per-session consent. MediaProjection requires a user-visible "Start now" confirmation when the permission is requested. One grant lasts until the app is force-stopped or the user revokes. Fine for interactive sessions, awkward for unattended CI.
- Clipboard access is OEM-dependent. Android 10+ restricts ClipboardManager.getPrimaryClip() to foreground / default-IME apps. HyperOS appears to allow it from our a11y service, but this isn't guaranteed.
- Cleartext HTTP + bearer token. Fine on a trusted LAN; don't expose the port to the internet without a TLS proxy.
This is a hobby/research project — PRs and issues welcome. The feedback loop feels tight once you've internalised the ritual, but the first time through there are a surprising number of small things that can trip you up. Below is everything we've learned running it day-to-day.
Anywhere with JDK 17 + Android SDK 35 works (Mac, Linux, even Termux on the target device itself).
    gradle :app:assembleDebug
    # APK at app/build/outputs/apk/debug/app-debug.apk

Side-load on the target device. No signing configuration needed for debug.
Once the app is installed and running, the minimum repro cycle after each edit is:
1. gradle :app:assembleDebug (incremental builds are 5–15s).
2. Push the APK (e.g. scp / adb install -r / copy to /sdcard/Download and tap).
3. Tap-install on the device → Update (replaces the prior install).
4. Re-enable the accessibility toggle; see the HyperOS gotcha below. On most OEMs it survives updates, but HyperOS drops it.
5. Start Server in the app if it isn't still running.
6. In your MCP client, reconnect to pick up new tools (Claude Code: /mcp → reconnect android-pad). Tool schemas are cached at connect time, so without a reconnect new tools are invisible to the client.
Steps 4–6 are annoying but mandatory. Build muscle memory for them.
- MIUI optimization must be OFF before any sideloaded accessibility service can be enabled. Developer options → MIUI optimization → off → reboot. Without this the Accessibility toggle either doesn't appear or silently refuses to flip.
- The accessibility toggle gets dropped after most APK updates. Same package, same signing — doesn't matter. Re-enable manually: Settings → Accessibility → AgentBridge → On.
- HyperOS dock is invisible to service.windows. The tablet dock is drawn on screen but isn't exposed through the accessibility API at all, even with flagRetrieveInteractiveWindows. Don't waste time trying to tap dock icons; use launch_app / open_url / send_intent instead, which bypass the launcher entirely.
- Background activity starts from Termux uid are blocked. You can SSH in and run am start, but HyperOS will silently refuse to foreground the target. This is why the bridge has its own open_url / launch_app / send_intent: they run from the accessibility service's context, which has more latitude.
- HyperOS doesn't let a regular app read logcat from other UIDs. See "Debugging crashes without ADB" below.
The app installs a Thread.setDefaultUncaughtExceptionHandler in App.kt that writes the stack trace to /sdcard/Download/agentbridge-crash.txt. This is readable from Termux or over scp, so you don't need root or ADB to see why the app died.
    # From Termux on the device:
    cat /sdcard/Download/agentbridge-crash.txt
    # From your laptop:
    scp <pad>:/sdcard/Download/agentbridge-crash.txt .

If you're about to reproduce a crash, delete the file first so you know the new trace is fresh.
- The server speaks legacy SSE, not Streamable HTTP. When wiring up clients, the transport type must be sse and the base URL is /, not /mcp. Example for Claude Code:

      claude mcp add --transport sse android-pad http://<pad-ip>:8080/ \
        --header "Authorization: Bearer <token>"

- The URL shown in SettingsActivity says /mcp, a known cosmetic bug. The actual endpoint is /. Ignore the displayed path when configuring clients. (Fix is coming; the typo predates the transport switch.)
- Tool lists are snapshotted at connect time. If you've added, removed, or renamed a tool, the client won't see it until you disconnect + reconnect. In Claude Code: /mcp dialog → pick the server → disconnect → reconnect.
- One consent dialog per capture session. Survives until the app is force-stopped or the user revokes via the notification shade.
- On Android 14+, there's a tight time window between the user granting consent and the service successfully creating the VirtualDisplay. If screen capture crashes right after granting, it's usually a SecurityException about using the MediaProjection token before or after the foreground service was properly started. Check /sdcard/Download/agentbridge-crash.txt.
- The capture service has its own foreground notification separate from the MCP server's. Both show in the shade when capture is active.
Building inside Termux + proot-debian works but has one wrinkle: AGP bundles an aapt2 that's x86_64-only. On an arm64 device the build fails with cryptic errors. Workaround in gradle.properties:
android.aapt2FromMavenOverride=/path/to/arm64-aapt2
Most contributors won't hit this because most contributors will build on x86_64 hosts.
Minimal recipe:
- Add registerMyTool(server: Server) to ToolRegistry.kt, copying the shape of a neighbouring tool. Each tool gets: a name, a description (the client sees this, so make it specific), an input schema, and a handler.
- Reference the new registration from registerAll at the top of the object.
- Rebuild, reinstall, reconnect MCP. Call it.
Rules of thumb that match the existing style:
- Keep the input schema tight. Only add a parameter if it changes behaviour meaningfully.
- Prefer jsonResult for structured output, textResult for strings, ImageContent for PNGs.
- If the tool needs the accessibility service, call getService() ?: return@addTool serviceError() first. If it just needs a Context, grab svc.applicationContext.
- NodeRegistry IDs are ephemeral: valid only until the next get_ui_tree / find_nodes / wait_for_node / scroll_to_text / probe call. Tools that operate on a node ID should be called soon after the snapshot that produced it.
- If the new op is also useful inside a scripted flow, add a branch for it in runScriptStep(...) so run_script can dispatch it.
If it's just JavaScript, browser_eval probably covers it from the client side — no new tool needed. Add a dedicated tool only when:
- The feature requires the native Android surface (e.g. CookieManager for HttpOnly cookies, clipboard, downloads, file pickers).
- You want idiomatic input/output types instead of asking callers to stringify their own JS.
The BrowserManager singleton owns the WebView. If your new tool needs a WebView-level hook (e.g. intercepting requests, subscribing to load events), add it there and expose a thin wrapper in ToolRegistry. The BrowserActivity borrows the WebView on resume and returns it on pause — don't assume the Activity is alive, but do assume the WebView is (BrowserManager.ensureInitialized is safe to call).
Reproducers that include:
- Android version + OEM skin (HyperOS, One UI, Pixel, stock AOSP…)
- AgentBridge version (from SettingsActivity, which still shows the Server URL with a typo; otherwise use versionName from build.gradle.kts)
- Output of find_nodes or get_ui_tree near the failure
- Contents of agentbridge-crash.txt if the app died
are massively easier to act on than "it doesn't work". If you can record the exact sequence of tool calls that led to the failure, even better.
MIT — see LICENSE.
Built by Pluginslab as an experiment in LLM-driven device automation.
