A TypeScript agent runtime and field manual for autonomous Android development. The project combines planning, implementation, compile-fix recovery, real-device verification, memory, and messaging adapters into a repeatable Android delivery control plane.
Android Autonomous Development Agent is the Android-specific implementation layer for the broader Autonomous AI Development Framework. It is not positioned as a chatbot or a code-completion toy. Its job is to turn Android development into an auditable loop:
```text
Requirement
  -> plan
  -> implement with isolated agents
  -> build
  -> repair compiler failures
  -> install APK on a real device
  -> cold launch
  -> inspect logcat
  -> capture screenshot
  -> review specification compliance
  -> review code quality
  -> record reusable lessons
```
The key standard is simple: code that merely exists is not done, and a successful build is not done. Work is done only after the APK is built, installed, launched, checked through logcat, and visually verified when a device is available.
Most AI coding tools improve the model-facing part of software development: larger context windows, stronger frontier models, faster code generation. This project takes a different position: reliability should come from the delivery system, not only from the model.
The real value is not in making Opus stronger; it is in making affordable models good enough. That is the path to scalable AI-assisted development.
The practical bet is architecture over model size. Smaller and cheaper models can become useful for real delivery when the system gives them fresh task contexts, bounded retry loops, independent review, build feedback, device verification, and reusable memory.
| Area | Current State |
|---|---|
| Autonomy level | L4 validated, with L5 components under development |
| Primary target | Android apps built with Kotlin and Jetpack Compose, plus React Native and Flutter Android targets |
| Runtime | Bun and TypeScript |
| Core loop | Plan, implement, review, build, install, verify, learn |
| Verification | Gradle, ADB install, explicit Activity launch, logcat crash scan, screenshot evidence |
| Memory | Pitfalls, reusable patterns, decisions, environment details, error fixes |
| Adapters | CLI, Telegram, Feishu/Lark |
| Validated projects | TransLite, CyberDiviner, Voyager AI Mobile, Hermes Mobile, AntiScamAI, CustomCam |
This repository uses the same L1-L5 taxonomy as the upstream Autonomous AI Development Framework, adapted from SAE J3016 autonomous driving levels.
| Level | Stage | Definition | Human Role |
|---|---|---|---|
| L1 | Code Completion | AI assists with autocomplete and suggestions. The developer remains the writer and decides what code enters the project. | Writer |
| L2 | Pair Development | AI generates code through dialogue. The human prompts, reviews, and assembles the output into a working change. | Reviewer |
| L3 | Semi-autonomous Agent | AI leads implementation across a scoped task, while the human supervises key checkpoints, decisions, and fixes. | Supervisor |
| L4 | Fully Autonomous Agent | AI executes the end-to-end delivery loop from plan to implementation, build, install, runtime verification, review, and repair. The human validates the final result. | Accepter |
| L5 | AI Development Team | Multiple specialized agents collaborate in parallel with planning, implementation, verification, review, memory, and quality gates coordinated automatically. | None |
Current position: this Android implementation targets L4 validated delivery and develops the Android-specific components needed for L5: planner, implementer, independent spec reviewer, independent quality reviewer, verifier, memory, research, compile-fix recovery, and real-device evidence capture.
| Path | Purpose |
|---|---|
| `src/framework/androidDevFramework.ts` | Main framework class: task execution, review loop, compile-fix loop, phase execution, engine migration |
| `src/framework/androidPatterns.ts` | Field-tested Android patterns from TransLite, CyberDiviner, Voyager AI Mobile, and other projects |
| `src/agents/` | Planner, implementer, reviewer, verifier, and web researcher agents |
| `src/memory/` | Memory storage and retrieval for reusable development lessons |
| `src/services/llm/` | OpenAI, Anthropic, Ollama, vLLM, and custom endpoint support |
| `src/gateway/` | CLI, Telegram, and Feishu/Lark integration layer |
| `tests/` | Unit tests for memory, dialogue, GitHub research, and message bus components |
```mermaid
graph TB
    subgraph "Interfaces"
        CLI[CLI]
        TG[Telegram]
        LK[Feishu or Lark]
    end
    subgraph "Gateway"
        ROUTER[Message Router]
        SESSION[Session Manager]
    end
    subgraph "Agent Runtime"
        COORD[Coordinator]
        PLAN[Planner]
        IMPL[Implementer]
        SPEC[Spec Reviewer]
        QUAL[Quality Reviewer]
        VERIFY[Verifier]
        RESEARCH[Researcher]
    end
    subgraph "Android Control Plane"
        BUILD[Gradle Build]
        FIX[Compile-Fix Loop]
        ADB[ADB Install and Launch]
        LOGCAT[Logcat Analysis]
        SHOT[Screenshot Evidence]
    end
    subgraph "Memory"
        STORE[(Memory Store)]
        PATTERNS[Validated Patterns]
    end
    CLI --> ROUTER
    TG --> ROUTER
    LK --> ROUTER
    ROUTER --> SESSION
    SESSION --> COORD
    COORD --> PLAN
    COORD --> IMPL
    COORD --> SPEC
    COORD --> QUAL
    COORD --> VERIFY
    COORD --> RESEARCH
    VERIFY --> BUILD
    BUILD --> FIX
    FIX --> BUILD
    VERIFY --> ADB
    ADB --> LOGCAT
    ADB --> SHOT
    LOGCAT --> STORE
    SHOT --> STORE
    STORE --> PATTERNS
    PATTERNS --> IMPL
    PATTERNS --> FIX
```
Every Android task follows this gate unless the user explicitly scopes it as documentation-only or planning-only.
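```bash
# Build the debug APK.
./gradlew assembleDebug
# Install on the attached device (-t allows test APKs, -r replaces an existing install).
adb install -t -r app/build/outputs/apk/debug/app-debug.apk
# Force a cold start and clear old logs so the scan window is clean.
adb shell am force-stop <package>
adb logcat -c
# Launch through the explicit Activity, then give the app time to start.
adb shell am start -n <package>/<activity>
sleep 5
# Dump the log and flag crash markers; "|| true" keeps the script going when grep finds nothing.
adb logcat -d | grep -E "FATAL|AndroidRuntime|Exception|ANR" || true
# Capture a screenshot on the device and pull it as visual evidence.
mkdir -p ./artifacts
adb shell screencap -p /sdcard/android-agent-check.png
adb pull /sdcard/android-agent-check.png ./artifacts/android-agent-check.png
```

Completion requires: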
- Build result recorded.
- APK installed or install failure explained with device-specific evidence.
- App launched through explicit Activity, not only a deep link.
- Logcat checked in a clean window after launch.
- Screenshot captured and inspected when a device is attached.
- Specification compliance review completed.
- Code quality review completed.
- Findings repaired, then build and device verification repeated.
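
The device-side part of the gate can also be expressed in TypeScript so that an agent can assemble and check it programmatically. The following is a minimal sketch, not the framework's own ADB service: `GateTarget`, `buildDeviceCommands`, and `scanLogcat` are hypothetical names, and the package and activity values are placeholders. The crash markers are the same ones used by the `grep` in the gate above.

```typescript
// Hypothetical helper types and names; only the adb commands come from the gate above.
interface GateTarget {
  packageName: string; // placeholder, e.g. "com.example.app"
  activity: string;    // placeholder, e.g. ".MainActivity"
  apkPath: string;
}

// Builds the ordered device-side commands for one verification pass.
function buildDeviceCommands(t: GateTarget): string[] {
  return [
    `adb install -t -r ${t.apkPath}`,
    `adb shell am force-stop ${t.packageName}`,
    "adb logcat -c",
    `adb shell am start -n ${t.packageName}/${t.activity}`,
    "adb logcat -d",
  ];
}

// Same markers as the grep expression in the gate.
const CRASH_MARKERS = /FATAL|AndroidRuntime|Exception|ANR/;

// Returns the lines of a logcat dump that contain a crash marker.
function scanLogcat(dump: string): string[] {
  return dump.split(/\r?\n/).filter((line) => CRASH_MARKERS.test(line));
}

const sampleDump = [
  "I/ActivityManager: Start proc com.example.app",
  "E/AndroidRuntime: FATAL EXCEPTION: main",
  "D/MainActivity: onCreate finished",
].join("\n");

console.log(buildDeviceCommands({
  packageName: "com.example.app",
  activity: ".MainActivity",
  apkPath: "app/build/outputs/apk/debug/app-debug.apk",
}));
console.log(scanLogcat(sampleDump));
```

An empty result from `scanLogcat` only means that no marker was found in the captured window; it does not replace the screenshot inspection.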
For competition evaluation, the important artifact is not a demo video alone. It is a one-pass delivery record: one continuous run from requirement to implementation, test, typecheck, build, install, launch, logcat inspection, and screenshot evidence.
A validated run should produce the following evidence bundle:
| Stage | Evidence | Pass Criteria |
|---|---|---|
| Requirement intake | Task prompt and generated plan | Scope is explicit, bounded, and testable |
| Implementation | Git diff or patch set | Changes are traceable to the plan |
| Tests | Unit or integration test output | Test command exits successfully |
| Typecheck or lint | Static check output | No blocking type or lint errors |
| Android build | Gradle output and APK path | APK is produced successfully |
| Device install | ADB install output | Install returns success on the target device |
| Runtime launch | Explicit Activity launch command | App starts from a cold state |
| Logcat scan | Clean post-launch log window | No fatal crash, ANR, or AndroidRuntime exception |
| Visual verification | Device screenshot | UI is visible and corresponds to the requested feature |
| Review | Spec and quality review notes | No unresolved blocker before acceptance |
A one-pass validation run means the system completes the full delivery loop without manual code patching after the run starts. Human involvement is limited to the initial requirement and final acceptance. If the build or runtime fails, the agent may use its bounded compile-fix or debug-fix loop, but the repair must be driven by the system rather than by a human editing source files.
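
One way to make the one-pass rule checkable is to record each stage of the evidence bundle and validate the record at the end of the run. The sketch below is an illustration under assumptions: the `Stage` names, `RunRecord`, and `evaluateOnePass` are hypothetical and are not part of the published framework API.

```typescript
// Hypothetical stage identifiers mirroring the rows of the evidence table.
type Stage =
  | "requirement" | "implementation" | "tests" | "typecheck" | "build"
  | "install" | "launch" | "logcat" | "screenshot" | "review";

const REQUIRED_STAGES: Stage[] = [
  "requirement", "implementation", "tests", "typecheck", "build",
  "install", "launch", "logcat", "screenshot", "review",
];

interface StageRecord {
  stage: Stage;
  passed: boolean;
  evidencePath: string; // where the log, diff, or screenshot was saved
}

interface RunRecord {
  stages: StageRecord[];
  manualPatches: number; // human source edits made after the run started
}

// A run is one-pass valid when every stage has passing evidence
// and no human edited source files during the run.
function evaluateOnePass(run: RunRecord): { valid: boolean; problems: string[] } {
  const problems: string[] = [];
  for (const stage of REQUIRED_STAGES) {
    const record = run.stages.find((s) => s.stage === stage);
    if (!record) problems.push(`missing evidence: ${stage}`);
    else if (!record.passed) problems.push(`failed stage: ${stage}`);
  }
  if (run.manualPatches > 0) {
    problems.push("manual code patching occurred during the run");
  }
  return { valid: problems.length === 0, problems };
}

const completeRun: RunRecord = {
  stages: REQUIRED_STAGES.map((stage) => ({
    stage,
    passed: true,
    evidencePath: `artifacts/${stage}.txt`,
  })),
  manualPatches: 0,
};

console.log(evaluateOnePass(completeRun));
```

Repairs made by the agent's own bounded loops do not count as manual patches in this sketch; only human edits do.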
This is the standard used to separate a coding assistant from a delivery agent. A coding assistant can generate plausible files. A delivery agent must produce a running artifact and the evidence that it ran.
The compile-fix loop is the main resilience mechanism. It parses Gradle output, groups errors by file, injects known fix patterns, asks the implementation agent for a targeted patch, and rebuilds. The loop is bounded by `maxRounds`, which defaults to 3.
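
The parse-and-group step can be illustrated with a small sketch. It assumes Kotlin compiler error lines of the form `e: file:///path/File.kt:12:5 message`; other compiler versions and Java errors use different shapes. `parseCompileErrors` and `groupByFile` are hypothetical names, not the framework's internal functions.

```typescript
interface CompileError {
  file: string;
  line: number;
  column: number;
  message: string;
}

// Assumed line shape: "e: file:///abs/path/File.kt:12:5 Unresolved reference: foo"
const KOTLIN_ERROR = /^e: (?:file:\/\/)?(\S+?):(\d+):(\d+) (.+)$/;

// Extracts structured errors from raw build output, ignoring unrelated lines.
function parseCompileErrors(output: string): CompileError[] {
  const errors: CompileError[] = [];
  for (const raw of output.split(/\r?\n/)) {
    const m = KOTLIN_ERROR.exec(raw.trim());
    if (m) {
      errors.push({ file: m[1], line: Number(m[2]), column: Number(m[3]), message: m[4] });
    }
  }
  return errors;
}

// Groups errors by file so each repair request can target a single file.
function groupByFile(errors: CompileError[]): Map<string, CompileError[]> {
  const groups = new Map<string, CompileError[]>();
  for (const e of errors) {
    const bucket = groups.get(e.file) ?? [];
    bucket.push(e);
    groups.set(e.file, bucket);
  }
  return groups;
}

const sampleOutput = [
  "> Task :app:compileDebugKotlin FAILED",
  "e: file:///work/app/src/main/java/com/example/Main.kt:12:5 Unresolved reference: foo",
  "e: file:///work/app/src/main/java/com/example/Main.kt:30:9 Type mismatch: inferred type is Int but String was expected",
  "e: file:///work/app/src/main/java/com/example/Repo.kt:7:1 Unresolved reference: await",
].join("\n");

groupByFile(parseCompileErrors(sampleOutput)).forEach((errs, file) => {
  console.log(`${file}: ${errs.length} error(s)`);
});
```

Grouping by file keeps each patch request small, which fits the goal of giving the implementation agent a fresh, narrow context. The framework's own entry point for the loop follows.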
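```typescript
import { AndroidDevFramework } from "android-autonomous-dev-agent";

// Point the framework at any supported LLM endpoint (here: a local custom server).
const framework = new AndroidDevFramework({
  llm: {
    provider: "custom",
    model: "local-model",
    baseUrl: "http://localhost:8000",
  },
});

// Run the build command and let the loop repair compiler errors for up to 3 rounds.
const result = await framework.compileFixLoop(
  "cd /path/to/android/project && ./gradlew assembleDebug",
  { maxRounds: 3 }
);

// If the loop gives up, surface the last build output for inspection.
if (!result.success) {
  console.error(result.finalBuildOutput);
}
```

The pattern library is available directly:

```typescript
import { ANDROID_FIELD_PATTERNS, formatPatternsForPrompt } from "android-autonomous-dev-agent";

// Render the selected pattern groups as prompt-ready text.
console.log(formatPatternsForPrompt(["compile-fix", "runtime-verification"]));
```

The upstream framework defines the operating model: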
- Use fresh subagents for isolated tasks.
- Run independent specification and code-quality reviews.
- Treat build, install, launch, logcat, and screenshot as quality gates.
- Keep retry loops bounded.
- Store reusable lessons as memory or structured patterns.
- Split work so subagents write code while the parent session runs slow Gradle builds.
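
The "keep retry loops bounded" rule can be sketched as a small generic helper. This is an illustration under assumptions rather than the framework's implementation: `runBounded`, `AttemptResult`, and `LoopResult` are hypothetical names, and the helper is synchronous to keep the example self-contained, whereas a real build step would be asynchronous.

```typescript
interface AttemptResult {
  success: boolean;
  output: string;
}

interface LoopResult {
  success: boolean;
  rounds: number; // number of repair rounds that were actually used
  finalOutput: string;
}

// attempt: runs the gate (for example a build).
// repair: requests a targeted fix based on the failure output.
// The loop stops on success or after maxRounds repair rounds, whichever comes first.
function runBounded(
  attempt: () => AttemptResult,
  repair: (failureOutput: string) => void,
  maxRounds = 3,
): LoopResult {
  let last = attempt();
  let rounds = 0;
  while (!last.success && rounds < maxRounds) {
    repair(last.output);
    rounds += 1;
    last = attempt();
  }
  return { success: last.success, rounds, finalOutput: last.output };
}

// Fake build that fails twice and then succeeds.
let buildCalls = 0;
const fakeBuild = (): AttemptResult => {
  buildCalls += 1;
  return buildCalls < 3
    ? { success: false, output: `error on call ${buildCalls}` }
    : { success: true, output: "BUILD SUCCESSFUL" };
};

const seenFailures: string[] = [];
const outcome = runBounded(fakeBuild, (out) => { seenFailures.push(out); });
console.log(outcome);
```

A build that never recovers exits after `maxRounds` repairs with `success: false`, which hands the decision back to the coordinator instead of looping indefinitely.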
TransLite validated the L4 Android build loop on an offline translation application.
Reusable lessons:
- Long Gradle or R8 builds should run in the parent session, not inside code-writing subagents.
- Runtime engine migration should follow `migrateEngine`: scan old references, implement the new engine behind the same interface, update consumers, adjust dependencies and ProGuard, then build-fix until clean.
- Large on-device model support requires runtime download state, a foreground service, notification channel, model file verification, and keep rules for inference libraries.
- Model changes must account for APK size, device RAM, and co-installed apps.
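
The first steps of the migration lesson (scan old references, then implement the new engine behind the same interface) can be shown in miniature. The sketch is written in TypeScript for consistency with the runtime, although the migrated app code is Kotlin. `TranslationEngine`, `LegacyEngine`, `ReplacementEngine`, and `findOldReferences` are all hypothetical names; the real engine contract is not shown in this README.

```typescript
// Hypothetical contract shared by the old and new engines.
interface TranslationEngine {
  readonly name: string;
  translate(text: string, targetLang: string): string;
}

class LegacyEngine implements TranslationEngine {
  readonly name = "legacy";
  translate(text: string, targetLang: string): string {
    return `[legacy:${targetLang}] ${text}`;
  }
}

class ReplacementEngine implements TranslationEngine {
  readonly name = "replacement";
  translate(text: string, targetLang: string): string {
    return `[replacement:${targetLang}] ${text}`;
  }
}

// Step 1: list the files that still mention old-engine identifiers.
function findOldReferences(files: Record<string, string>, oldIds: string[]): string[] {
  return Object.keys(files)
    .filter((path) => oldIds.some((id) => files[path].indexOf(id) !== -1))
    .sort();
}

// Consumers depend only on the interface, so the engine can be swapped at one call site.
function translateGreeting(engine: TranslationEngine): string {
  return engine.translate("hello", "zh");
}

const sources: Record<string, string> = {
  "ui/TranslateScreen.kt": "val engine = LegacyEngine()",
  "data/History.kt": "class History",
};

console.log(findOldReferences(sources, ["LegacyEngine"]));
console.log(translateGreeting(new ReplacementEngine()));
```

The reference scan gives the agent a concrete list of consumers to update before the dependency, ProGuard, and build-fix steps.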
Common fixes now represented in code:
- ML Kit language enum mismatches: use stable string language codes where the API expects strings.
- Missing coroutine Play Services dependency: add `kotlinx-coroutines-play-services`.
- R8 stripping MediaPipe or LiteRT classes: add explicit keep rules.
- Kotlin metadata mismatch: do not force-upgrade the entire Android toolchain; isolate newer libraries behind Java reflection when needed.
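
A fix-pattern lookup of this kind might be structured as follows. This is a sketch, not the contents of `androidPatterns.ts`: the `FixPattern` shape, `matchFixes`, and `renderHints` are hypothetical, and the error signatures are illustrative guesses at typical compiler or runtime messages. Only the fix texts are taken from the list above.

```typescript
interface FixPattern {
  id: string;
  signature: RegExp; // illustrative guess at a typical error message
  fix: string;       // fix text taken from the lessons above
}

const FIX_PATTERNS: FixPattern[] = [
  {
    id: "coroutines-play-services",
    signature: /Unresolved reference: await/,
    fix: "Add the kotlinx-coroutines-play-services dependency.",
  },
  {
    id: "r8-keep-rules",
    signature: /ClassNotFoundException.*(?:mediapipe|litert)/i,
    fix: "Add explicit keep rules for MediaPipe or LiteRT classes.",
  },
  {
    id: "kotlin-metadata",
    signature: /compiled with an incompatible version of Kotlin/,
    fix: "Do not force-upgrade the toolchain; isolate the newer library behind Java reflection.",
  },
];

// Returns every known pattern whose signature appears in the error message.
function matchFixes(message: string): FixPattern[] {
  return FIX_PATTERNS.filter((p) => p.signature.test(message));
}

// Renders matched fixes as a short hint list for the repair prompt.
function renderHints(message: string): string {
  const hits = matchFixes(message);
  if (hits.length === 0) return "No known fix pattern matched.";
  return hits.map((p) => `- [${p.id}] ${p.fix}`).join("\n");
}

console.log(renderHints("e: file:///work/Repo.kt:7:1 Unresolved reference: await"));
```

Keeping patterns as data rather than prompt prose makes it possible to add a lesson once and reuse it across projects.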
CyberDiviner contributed visually intensive Compose and Kotlin patterns.
Reusable lessons:
- Visual apps should use incremental phase planning: plan Phase N, execute, inspect output, then plan Phase N+1.
- Define the navigation skeleton early. Adding a screen means route constant, composable registration, callback parameter, menu entry, and callback wiring in one batch.
- For Chinese UI, audit every `Text` composable and set the correct font family explicitly.
- When adding enum or sealed-class values, update every `when` expression in the same batch.
- Canvas is the production path for complex visual geometry. ASCII is only acceptable for debug output.
- Secrets must be scanned before commit. Partially masked API keys are still secrets.
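
The secret-hygiene lesson can be illustrated with a simple pre-commit check that flags partially masked keys as well as full ones. The two regular expressions below are illustrative heuristics chosen for this sketch, not the project's actual scanner, and they do not replace a dedicated secret-scanning tool. `scanForSecrets` is a hypothetical name.

```typescript
interface SecretFinding {
  line: number; // 1-based line number
  reason: string;
}

// Illustrative heuristics only.
// A key-like prefix followed by a long alphanumeric body.
const FULL_KEY = /\b(?:sk|pk|api|key)[-_][A-Za-z0-9]{16,}\b/;
// A token with a run of asterisks in the middle, such as "sk-ab12****9f3c".
const MASKED_KEY = /[A-Za-z0-9_-]{4,}\*{3,}[A-Za-z0-9_-]{2,}/;

// Flags lines that contain a full key or a partially masked key.
function scanForSecrets(text: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  text.split(/\r?\n/).forEach((content, index) => {
    if (FULL_KEY.test(content)) {
      findings.push({ line: index + 1, reason: "unmasked key-like token" });
    } else if (MASKED_KEY.test(content)) {
      findings.push({ line: index + 1, reason: "partially masked key still leaks characters" });
    }
  });
  return findings;
}

const sampleSource = [
  'val endpoint = "https://api.example.com"',
  'val key = "sk-live1234567890abcdef"',
  "// old key: sk-ab12****9f3c",
].join("\n");

console.log(scanForSecrets(sampleSource));
```

The masked case is flagged because the visible prefix and suffix narrow the search space and often identify the provider and account.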
Voyager AI Mobile contributed mobile data-flow, Expo, React Native, and native-module failure lessons.
Reusable lessons:
- React Native long-running operations should use async job polling, not SSE and not one long HTTP response.
- Completion handlers must validate payloads strictly. Do not let polling `catch` blocks swallow processing errors.
- Backend fields added for downstream UI must pass through every normalizer. Silent data loss often occurs outside the visible component.
- Tool schema and system prompt must list the same required fields. If a field is critical, it cannot be optional.
- NativeModules access must be guarded. Direct top-level access can crash the app before screen code runs.
- Risky native integrations should use a separate package name or product flavor so the stable installed app is not overwritten.
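
The polling lessons can be sketched as one poll step with strict payload validation. The job payload shape (`state`, `result`, `error`) and the names `parseJobStatus` and `pollOnce` are assumptions made for this example, not the Voyager AI Mobile API. The step is synchronous with an injected fetch function to keep the sketch self-contained; a real client would await a network call.

```typescript
type JobStatus =
  | { state: "pending" }
  | { state: "done"; result: string }
  | { state: "failed"; error: string };

type PollStep =
  | { kind: "retry"; reason: string }   // poll again later
  | { kind: "finished"; result: string }
  | { kind: "error"; message: string }; // processing error, surface it

// Strict validation: anything that is not a known, complete shape is rejected.
function parseJobStatus(payload: unknown): JobStatus {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("job payload is not an object");
  }
  const { state, result, error } = payload as {
    state?: unknown;
    result?: unknown;
    error?: unknown;
  };
  if (state === "pending") return { state: "pending" };
  if (state === "done" && typeof result === "string") return { state: "done", result };
  if (state === "failed" && typeof error === "string") return { state: "failed", error };
  throw new Error(`invalid job payload: ${JSON.stringify(payload)}`);
}

// fetchStatus is injected so the sketch stays free of network code.
function pollOnce(fetchStatus: () => unknown): PollStep {
  let payload: unknown;
  try {
    payload = fetchStatus();
  } catch (e) {
    // Only transport failures are treated as retryable.
    return { kind: "retry", reason: String(e) };
  }
  // Validation runs outside the try block, so a malformed payload is not swallowed.
  const status = parseJobStatus(payload);
  if (status.state === "pending") return { kind: "retry", reason: "job still running" };
  if (status.state === "failed") return { kind: "error", message: status.error };
  return { kind: "finished", result: status.result };
}

console.log(pollOnce(() => ({ state: "done", result: "itinerary ready" })));
```

Keeping validation outside the `try` block is the design choice that matters here: a malformed completion payload raises immediately instead of being retried as if it were a network glitch.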
| Project | Stack | What It Validated |
|---|---|---|
| TransLite | Kotlin, Compose, LiteRT or MediaPipe model runtime | L4 phase execution, engine migration, model download lifecycle, compile-fix recovery |
| CyberDiviner | Kotlin, Compose, Hilt, CameraX, MediaPipe | Incremental visual planning, navigation hardening, font rules, canvas rendering, secret hygiene |
| Voyager AI Mobile | Expo, React Native, Clerk, Mapbox, native Android bridge | Async polling, data-flow normalization, guarded NativeModules, parallel vSDK package isolation |
| Hermes Mobile | Flutter, Android, embedded Termux | Real-device install verification, Xiaomi device behavior, release discipline |
| AntiScamAI | Kotlin, Compose, foreground service | Recovery from corrupted source and Android service permission pitfalls |
| CustomCam | Kotlin, Camera2, NDK | Camera2 RAW capture, ImageReader pitfalls, native ISP path |
```bash
bun install
bun test
bun run typecheck
bun run src/main.ts
```

Gateway mode:

```bash
bun run src/main.ts --gateway
```

Build:

```bash
bun run build
```

- Real device evidence beats synthetic confidence.
- Build success is a gate, not a finish line.
- The implementation agent should not be the final reviewer.
- Subagents should not run long Gradle builds when timeout risk is high.
- Data-flow bugs require tracing source, normalizer, store, props, and component boundary.
- Native module failures must be treated as launch blockers until verified through logcat and screenshot.
- Public documentation should describe verified behavior only.
- The repository is an implementation scaffold and field-knowledge carrier, not a fully self-hosting Android IDE.
- Most validation is Android-focused. Web, iOS, and backend expansion are architectural targets, not equally verified paths.
- Automated Compose UI interaction through ADB remains fragile. Use log markers, screenshot inspection, or Compose test APIs where possible.
- The current ADB service is partly represented through framework commands and patterns; a dedicated typed ADB service remains a future implementation area.
MIT