sinonchum/android-autonomous-dev-agent

Android Autonomous Development Agent

Real-device autonomous Android delivery loop

A TypeScript agent runtime and field manual for autonomous Android development. The project combines planning, implementation, compile-fix recovery, real-device verification, memory, and messaging adapters into a repeatable Android delivery control plane.

License: MIT | Built with: TypeScript | Runtime: Bun


Positioning

Android Autonomous Development Agent is the Android-specific implementation layer for the broader Autonomous AI Development Framework. It is not positioned as a chatbot or a code-completion toy. Its job is to turn Android development into an auditable loop:

Requirement
  -> plan
  -> implement with isolated agents
  -> build
  -> repair compiler failures
  -> install APK on a real device
  -> cold launch
  -> inspect logcat
  -> capture screenshot
  -> review specification compliance
  -> review code quality
  -> record reusable lessons

The key standard is simple: "code exists" is not done. "Build succeeds" is not done. Work is done only after the APK is built, installed, launched, checked through logcat, and visually verified when a device is available.

Why This Matters

Most AI coding tools improve the model-facing part of software development: larger context windows, stronger frontier models, faster code generation. This project takes a different position: reliability should come from the delivery system, not only from the model.

The real value is not in making Opus stronger; it is in making affordable models good enough. That is the path to scalable AI-assisted development.

The practical bet is architecture over model size. Smaller and cheaper models can become useful for real delivery when the system gives them fresh task contexts, bounded retry loops, independent review, build feedback, device verification, and reusable memory.

Executive Summary

| Area | Current State |
| --- | --- |
| Autonomy level | L4 validated, with L5 components under development |
| Primary target | Android apps using Kotlin, Jetpack Compose, React Native or Flutter Android targets |
| Runtime | Bun and TypeScript |
| Core loop | Plan, implement, review, build, install, verify, learn |
| Verification | Gradle, ADB install, explicit Activity launch, logcat crash scan, screenshot evidence |
| Memory | Pitfalls, reusable patterns, decisions, environment details, error fixes |
| Adapters | CLI, Telegram, Feishu/Lark |
| Validated projects | TransLite, CyberDiviner, Voyager AI Mobile, Hermes Mobile, AntiScamAI, CustomCam |

AI Development Autonomy Levels

This repository uses the same L1-L5 taxonomy as the upstream Autonomous AI Development Framework, adapted from SAE J3016 autonomous driving levels.

| Level | Stage | Definition | Human Role |
| --- | --- | --- | --- |
| L1 | Code Completion | AI assists with autocomplete and suggestions. The developer remains the writer and decides what code enters the project. | Writer |
| L2 | Pair Development | AI generates code through dialogue. The human prompts, reviews, and assembles the output into a working change. | Reviewer |
| L3 | Semi-autonomous Agent | AI leads implementation across a scoped task, while the human supervises key checkpoints, decisions, and fixes. | Supervisor |
| L4 | Fully Autonomous Agent | AI executes the end-to-end delivery loop from plan to implementation, build, install, runtime verification, review, and repair. The human validates the final result. | Accepter |
| L5 | AI Development Team | Multiple specialized agents collaborate in parallel with planning, implementation, verification, review, memory, and quality gates coordinated automatically. | None |

Current position: this Android implementation targets L4 validated delivery and develops the Android-specific components needed for L5: planner, implementer, independent spec reviewer, independent quality reviewer, verifier, memory, research, compile-fix recovery, and real-device evidence capture.

What This Repository Contains

| Path | Purpose |
| --- | --- |
| src/framework/androidDevFramework.ts | Main framework class: task execution, review loop, compile-fix-loop, phase execution, engine migration |
| src/framework/androidPatterns.ts | Field-tested Android patterns from TransLite, CyberDiviner, Voyager AI Mobile, and other projects |
| src/agents/ | Planner, implementer, reviewer, verifier, and web researcher agents |
| src/memory/ | Memory storage and retrieval for reusable development lessons |
| src/services/llm/ | OpenAI, Anthropic, Ollama, vLLM, and custom endpoint support |
| src/gateway/ | CLI, Telegram, and Feishu/Lark integration layer |
| tests/ | Unit tests for memory, dialogue, GitHub research, and message bus components |

Core Architecture

graph TB
    subgraph "Interfaces"
        CLI[CLI]
        TG[Telegram]
        LK[Feishu or Lark]
    end

    subgraph "Gateway"
        ROUTER[Message Router]
        SESSION[Session Manager]
    end

    subgraph "Agent Runtime"
        COORD[Coordinator]
        PLAN[Planner]
        IMPL[Implementer]
        SPEC[Spec Reviewer]
        QUAL[Quality Reviewer]
        VERIFY[Verifier]
        RESEARCH[Researcher]
    end

    subgraph "Android Control Plane"
        BUILD[Gradle Build]
        FIX[Compile-Fix Loop]
        ADB[ADB Install and Launch]
        LOGCAT[Logcat Analysis]
        SHOT[Screenshot Evidence]
    end

    subgraph "Memory"
        STORE[(Memory Store)]
        PATTERNS[Validated Patterns]
    end

    CLI --> ROUTER
    TG --> ROUTER
    LK --> ROUTER
    ROUTER --> SESSION
    SESSION --> COORD
    COORD --> PLAN
    COORD --> IMPL
    COORD --> SPEC
    COORD --> QUAL
    COORD --> VERIFY
    COORD --> RESEARCH
    VERIFY --> BUILD
    BUILD --> FIX
    FIX --> BUILD
    VERIFY --> ADB
    ADB --> LOGCAT
    ADB --> SHOT
    LOGCAT --> STORE
    SHOT --> STORE
    STORE --> PATTERNS
    PATTERNS --> IMPL
    PATTERNS --> FIX
Mandatory Android Delivery Gate

Every Android task follows this gate unless the user explicitly scopes it as documentation-only or planning-only.

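# Build the debug APK.
./gradlew assembleDebug
# Install it, replacing any existing install (-r) and allowing test packages (-t).
adb install -t -r app/build/outputs/apk/debug/app-debug.apk
# Stop the app and clear the log buffer so the launch is cold and the log window is clean.
adb shell am force-stop <package>
adb logcat -c
# Launch the explicit Activity, wait for startup, then scan the log for crashes and ANRs.
adb shell am start -n <package>/<activity>
sleep 5
adb logcat -d | grep -E "FATAL|AndroidRuntime|Exception|ANR" || true
# Capture a screenshot on the device and pull it into the evidence folder.
adb shell screencap -p /sdcard/android-agent-check.png
adb pull /sdcard/android-agent-check.png ./artifacts/android-agent-check.png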
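
The logcat step above only prints matching lines. An agent usually needs a structured verdict instead. The following is a minimal sketch of that classification, assuming the dump from `adb logcat -d` is already available as a string. `LogcatFinding` and `scanLogcat` are hypothetical names for illustration, not part of the repository's API, and the patterns simply mirror the `grep -E` expression in the gate.

```typescript
// Hypothetical helper for illustration; not the repository's API.
interface LogcatFinding {
  kind: "fatal" | "anr" | "exception";
  line: string;
}

// Classifies lines using the same terms as the gate's grep pattern:
// FATAL | AndroidRuntime | Exception | ANR.
function scanLogcat(dump: string): LogcatFinding[] {
  const findings: LogcatFinding[] = [];
  for (const raw of dump.split(/\r?\n/)) {
    let kind: LogcatFinding["kind"] | null = null;
    if (/FATAL|AndroidRuntime/.test(raw)) kind = "fatal";
    else if (/ANR/.test(raw)) kind = "anr";
    else if (/Exception/.test(raw)) kind = "exception";
    if (kind === null) continue;
    findings.push({ kind, line: raw.trim() });
  }
  return findings;
}

// A launch passes the logcat gate only when nothing fatal or ANR-like was found.
function launchIsClean(dump: string): boolean {
  return scanLogcat(dump).every((finding) => finding.kind === "exception");
}
```

Plain `Exception` lines are kept as findings but do not fail the launch in this sketch, because handled exceptions are often logged by healthy apps. Whether to treat them as blockers is a policy choice for the reviewer.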

Completion requires:

  1. Build result recorded.
  2. APK installed or install failure explained with device-specific evidence.
  3. App launched through explicit Activity, not only a deep link.
  4. Logcat checked in a clean window after launch.
  5. Screenshot captured and inspected when a device is attached.
  6. Specification compliance review completed.
  7. Code quality review completed.
  8. Findings repaired, then build and device verification repeated.

Validation Stage

For competition evaluation, the important artifact is not a demo video alone. It is a one-pass delivery record: one continuous run from requirement to implementation, test, typecheck, build, install, launch, logcat inspection, and screenshot evidence.

A validated run should produce the following evidence bundle:

| Stage | Evidence | Pass Criteria |
| --- | --- | --- |
| Requirement intake | Task prompt and generated plan | Scope is explicit, bounded, and testable |
| Implementation | Git diff or patch set | Changes are traceable to the plan |
| Tests | Unit or integration test output | Test command exits successfully |
| Typecheck or lint | Static check output | No blocking type or lint errors |
| Android build | Gradle output and APK path | APK is produced successfully |
| Device install | ADB install output | Install returns success on the target device |
| Runtime launch | Explicit Activity launch command | App starts from a cold state |
| Logcat scan | Clean post-launch log window | No fatal crash, ANR, or AndroidRuntime exception |
| Visual verification | Device screenshot | UI is visible and corresponds to the requested feature |
| Review | Spec and quality review notes | No unresolved blocker before acceptance |
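
One way to make the evidence bundle machine-checkable is to record one entry per stage and refuse acceptance while any stage is missing or failed. The sketch below assumes that shape. The stage names follow the table above, but `StageEvidence`, `REQUIRED_STAGES`, and `evaluateBundle` are illustrative names, not the repository's API.

```typescript
// Hypothetical types for illustration; stage names follow the evidence table.
type Stage =
  | "requirement" | "implementation" | "tests" | "typecheck" | "build"
  | "install" | "launch" | "logcat" | "screenshot" | "review";

interface StageEvidence {
  stage: Stage;
  passed: boolean;
  artifact: string; // path to, or excerpt of, the recorded evidence
}

interface BundleVerdict {
  accepted: boolean;
  missing: Stage[];
  failed: Stage[];
}

const REQUIRED_STAGES: readonly Stage[] = [
  "requirement", "implementation", "tests", "typecheck", "build",
  "install", "launch", "logcat", "screenshot", "review",
];

function evaluateBundle(bundle: readonly StageEvidence[]): BundleVerdict {
  // Later entries overwrite earlier ones, so a stage re-run after a repair wins.
  const latest = new Map<Stage, StageEvidence>();
  for (const entry of bundle) latest.set(entry.stage, entry);

  const missing: Stage[] = [];
  const failed: Stage[] = [];
  for (const stage of REQUIRED_STAGES) {
    const entry = latest.get(stage);
    if (entry === undefined || entry.artifact.trim() === "") missing.push(stage);
    else if (!entry.passed) failed.push(stage);
  }
  return { accepted: missing.length === 0 && failed.length === 0, missing, failed };
}
```

A stage with an empty artifact counts as missing rather than failed, which keeps "we did not record it" distinct from "we ran it and it broke".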

One-Pass Delivery Criterion

A one-pass validation run means the system completes the full delivery loop without manual code patching after the run starts. Human involvement is limited to the initial requirement and final acceptance. If the build or runtime fails, the agent may use its bounded compile-fix or debug-fix loop, but the repair must be driven by the system rather than by a human editing source files.

This is the standard used to separate a coding assistant from a delivery agent. A coding assistant can generate plausible files. A delivery agent must produce a running artifact and the evidence that it ran.

Compile-Fix Recovery Logic

The compile-fix-loop is the main resilience mechanism. It parses Gradle output, groups errors by file, injects known fix patterns, asks the implementation agent for a targeted patch, and rebuilds. The loop is bounded by maxRounds, defaulting to 3.

import { AndroidDevFramework } from "android-autonomous-dev-agent";

const framework = new AndroidDevFramework({
  llm: {
    provider: "custom",
    model: "local-model",
    baseUrl: "http://localhost:8000",
  },
});

const result = await framework.compileFixLoop(
  "cd /path/to/android/project && ./gradlew assembleDebug",
  { maxRounds: 3 }
);

if (!result.success) {
  console.error(result.finalBuildOutput);
}
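
To illustrate what happens inside such a loop, here is a rough sketch of the parse, group, and bounded-retry steps. It is not the framework's implementation. The build and fix steps are injected as functions so the control flow can be read in isolation. The two Kotlin diagnostic layouts in the comments are formats commonly seen in Gradle output and are an assumption, and every name here (`parseKotlinErrors`, `groupByFile`, `boundedFixLoop`) is hypothetical.

```typescript
// Illustrative sketch only; these names are not the repository's API.
interface BuildError { file: string; line: number; column: number; message: string; }
interface BuildOutcome { success: boolean; output: string; }
interface LoopResult { success: boolean; rounds: number; finalBuildOutput: string; }

// Assumed Kotlin diagnostic layouts (not exhaustive):
//   e: file:///abs/path/Main.kt:12:5 Unresolved reference: foo
//   e: /abs/path/Main.kt: (12, 5): Unresolved reference: foo
const NEW_STYLE = /^e: (?:file:\/\/)?(.+?\.kts?):(\d+):(\d+):? (.+)$/;
const OLD_STYLE = /^e: (.+?\.kts?): \((\d+), (\d+)\): (.+)$/;

function parseKotlinErrors(gradleOutput: string): BuildError[] {
  const errors: BuildError[] = [];
  for (const raw of gradleOutput.split(/\r?\n/)) {
    const text = raw.trim();
    const match = NEW_STYLE.exec(text) ?? OLD_STYLE.exec(text);
    if (match === null) continue;
    errors.push({
      file: match[1] ?? "",
      line: Number(match[2]),
      column: Number(match[3]),
      message: match[4] ?? "",
    });
  }
  return errors;
}

function groupByFile(errors: readonly BuildError[]): Map<string, BuildError[]> {
  const grouped = new Map<string, BuildError[]>();
  for (const error of errors) {
    const bucket = grouped.get(error.file) ?? [];
    bucket.push(error);
    grouped.set(error.file, bucket);
  }
  return grouped;
}

// Rebuilds after each fix attempt and stops after maxRounds repair rounds.
async function boundedFixLoop(
  build: () => Promise<BuildOutcome>,
  fix: (errorsByFile: Map<string, BuildError[]>) => Promise<void>,
  maxRounds = 3,
): Promise<LoopResult> {
  let outcome = await build();
  let rounds = 0;
  while (!outcome.success && rounds < maxRounds) {
    rounds += 1;
    await fix(groupByFile(parseKotlinErrors(outcome.output)));
    outcome = await build();
  }
  return { success: outcome.success, rounds, finalBuildOutput: outcome.output };
}
```

Grouping by file lets the fix step send one targeted request per file instead of the whole log. The hard bound means a model that keeps producing the same broken patch costs at most `maxRounds` rebuilds before the failure is surfaced.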

The pattern library is available directly:

import { ANDROID_FIELD_PATTERNS, formatPatternsForPrompt } from "android-autonomous-dev-agent";

console.log(formatPatternsForPrompt(["compile-fix", "runtime-verification"]));

Field Lessons Integrated From Recent Projects

Autonomous AI Development Framework

The upstream framework defines the operating model:

  • Use fresh subagents for isolated tasks.
  • Run independent specification and code-quality reviews.
  • Treat build, install, launch, logcat, and screenshot as quality gates.
  • Keep retry loops bounded.
  • Store reusable lessons as memory or structured patterns.
  • Split work so subagents write code while the parent session runs slow Gradle builds.

TransLite

TransLite validated the L4 Android build loop on an offline translation application.

Reusable lessons:

  • Long Gradle or R8 builds should run in the parent session, not inside code-writing subagents.
  • Runtime engine migration should follow migrateEngine: scan old references, implement the new engine behind the same interface, update consumers, adjust dependencies and ProGuard, then build-fix until clean.
  • Large on-device model support requires runtime download state, a foreground service, notification channel, model file verification, and keep rules for inference libraries.
  • Model changes must account for APK size, device RAM, and co-installed apps.

Common fixes now represented in code:

  • ML Kit language enum mismatches: use stable string language codes where the API expects strings.
  • Missing coroutine Play Services dependency: add kotlinx-coroutines-play-services.
  • R8 stripping MediaPipe or LiteRT classes: add explicit keep rules.
  • Kotlin metadata mismatch: do not force-upgrade the entire Android toolchain; isolate newer libraries behind Java reflection when needed.

CyberDiviner

CyberDiviner contributed visually intensive Compose and Kotlin patterns.

Reusable lessons:

  • Visual apps should use incremental phase planning: plan Phase N, execute, inspect output, then plan Phase N+1.
  • Define the navigation skeleton early. Adding a screen means route constant, composable registration, callback parameter, menu entry, and callback wiring in one batch.
  • For Chinese UI, audit every Text composable and set the correct font family explicitly.
  • When adding enum or sealed-class values, update every when expression in the same batch.
  • Canvas is the production path for complex visual geometry. ASCII is only acceptable for debug output.
  • Secrets must be scanned before commit. Partially masked API keys are still secrets.
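
As a small illustration of the last lesson, a pre-commit scan can flag partially masked keys with the same severity as full ones. The rules below are deliberately simple, hypothetical patterns chosen for this sketch; a real scanner needs a much broader rule set, and `scanForSecrets` is not a function from the repository.

```typescript
// Hypothetical rules for illustration only; not a complete secret scanner.
interface SecretFinding {
  lineNumber: number;
  label: string;
}

const SECRET_RULES: ReadonlyArray<{ label: string; regex: RegExp }> = [
  {
    // e.g. API_KEY = "abcd1234efgh5678ijkl"
    label: "assigned-credential",
    regex: /(api[_-]?key|secret|token)["']?\s*[:=]\s*["'][A-Za-z0-9_\-]{16,}["']/i,
  },
  {
    // e.g. sk-ab12****9fQz : the visible prefix and suffix still leak key material
    label: "partially-masked-credential",
    regex: /[A-Za-z0-9_\-]{4,}\*{3,}[A-Za-z0-9_\-]{2,}/,
  },
];

function scanForSecrets(text: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    for (const rule of SECRET_RULES) {
      // The regexes carry no global flag, so test() keeps no state between lines.
      if (rule.regex.test(line)) findings.push({ lineNumber: index + 1, label: rule.label });
    }
  });
  return findings;
}
```

A commit hook could call `scanForSecrets` on each staged file and block the commit when the returned list is non-empty.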

Voyager AI Mobile

Voyager AI Mobile contributed mobile data-flow, Expo, React Native, and native-module failure lessons.

Reusable lessons:

  • React Native long-running operations should use async job polling, not SSE and not one long HTTP response.
  • Completion handlers must validate payloads strictly. Do not let polling catch blocks swallow processing errors.
  • Backend fields added for downstream UI must pass through every normalizer. Silent data loss often occurs outside the visible component.
  • Tool schema and system prompt must list the same required fields. If a field is critical, it cannot be optional.
  • NativeModules access must be guarded. Direct top-level access can crash the app before screen code runs.
  • Risky native integrations should use a separate package name or product flavor so the stable installed app is not overwritten.
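
The first two lessons can be sketched together: poll a job endpoint on an interval, retry only transport failures, and validate the completion payload outside the `try` block so validation errors are never swallowed. This is a minimal sketch under assumptions. `JobStatus`, `JobResult`, `parseResult`, and `pollJob` are hypothetical names, and the payload shape (`id`, `items`) is invented for illustration.

```typescript
// Hypothetical job-polling sketch; payload shape and names are illustrative.
interface JobStatus {
  state: "pending" | "done" | "failed";
  payload?: unknown;
  error?: string;
}

interface JobResult {
  id: string;
  items: string[];
}

// Strict validation: any missing or mistyped field is an error, never a default.
function parseResult(payload: unknown): JobResult {
  if (typeof payload !== "object" || payload === null) throw new Error("payload is not an object");
  const candidate = payload as Record<string, unknown>;
  const id = candidate["id"];
  const items = candidate["items"];
  if (typeof id !== "string" || id === "") throw new Error("payload.id is missing or not a string");
  if (!Array.isArray(items) || !items.every((item) => typeof item === "string")) {
    throw new Error("payload.items is missing or not a string array");
  }
  return { id, items: items as string[] };
}

async function pollJob(
  fetchStatus: () => Promise<JobStatus>,
  options: { intervalMs?: number; maxAttempts?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<JobResult> {
  const intervalMs = options.intervalMs ?? 2000;
  const maxAttempts = options.maxAttempts ?? 30;
  const sleep = options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    let current: JobStatus;
    try {
      current = await fetchStatus();
    } catch {
      // Only the network call is inside the try: transport errors are retried.
      await sleep(intervalMs);
      continue;
    }
    if (current.state === "failed") throw new Error(current.error ?? "job failed");
    // Validation runs outside the try, so a bad payload propagates to the caller.
    if (current.state === "done") return parseResult(current.payload);
    await sleep(intervalMs);
  }
  throw new Error("job did not finish after " + maxAttempts + " attempts");
}
```

In an app, `fetchStatus` would wrap a short HTTP request to the job-status endpoint. Injecting it (and `sleep`) keeps the loop testable without a network or real timers.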

Verified Projects

| Project | Stack | What It Validated |
| --- | --- | --- |
| TransLite | Kotlin, Compose, LiteRT or MediaPipe model runtime | L4 phase execution, engine migration, model download lifecycle, compile-fix recovery |
| CyberDiviner | Kotlin, Compose, Hilt, CameraX, MediaPipe | Incremental visual planning, navigation hardening, font rules, canvas rendering, secret hygiene |
| Voyager AI Mobile | Expo, React Native, Clerk, Mapbox, native Android bridge | Async polling, data-flow normalization, guarded NativeModules, parallel vSDK package isolation |
| Hermes Mobile | Flutter, Android, embedded Termux | Real-device install verification, Xiaomi device behavior, release discipline |
| AntiScamAI | Kotlin, Compose, foreground service | Recovery from corrupted source and Android service permission pitfalls |
| CustomCam | Kotlin, Camera2, NDK | Camera2 RAW capture, ImageReader pitfalls, native ISP path |

Quick Start

bun install
bun test
bun run typecheck
bun run src/main.ts

Gateway mode:

bun run src/main.ts --gateway

Build:

bun run build

Development Principles

  1. Real device evidence beats synthetic confidence.
  2. Build success is a gate, not a finish line.
  3. The implementation agent should not be the final reviewer.
  4. Subagents should not run long Gradle builds when timeout risk is high.
  5. Data-flow bugs require tracing source, normalizer, store, props, and component boundary.
  6. Native module failures must be treated as launch blockers until verified through logcat and screenshot.
  7. Public documentation should describe verified behavior only.

Known Limitations

  • The repository is an implementation scaffold and field-knowledge carrier, not a fully self-hosting Android IDE.
  • Most validation is Android-focused. Web, iOS, and backend expansion are architectural targets, not equally verified paths.
  • Automated Compose UI interaction through ADB remains fragile. Use log markers, screenshot inspection, or Compose test APIs where possible.
  • ADB support currently lives partly in framework commands and patterns; a dedicated, typed ADB service is not yet implemented.

License

MIT
