Scroller is a modular, production‑grade Android system for autonomous UI control. It is built as a sequence of strongly isolated modules that together form a closed‑loop Perception → Reasoning → Action engine, with strict safety guardrails.
This repository implements:
- Hands: AccessibilityActionExecutor
- Eyes: MediaProjection ScreenCapture
- Reasoning Boundary: Strict LLM Client (JSON‑only)
- Heartbeat: AgentLoopController (bounded control loop)
- Safety Layer: Supervisor + RecoveryPolicy
- Task Segmentation: Orchestrator (sequential subgoals)
- Short‑Term Memory: AgentMemory (bounded context)
No scaffolding. No fake responses. No hardcoded UI assumptions.
app/— Host Android app (minimal UI for enabling accessibility + MediaProjection permission)action-executor/— Library module containing all core logic and servicesdocs/— Architecture and module documentation
The system runs a bounded loop:
- Capture screen (MediaProjection)
- Ask LLM for next action (strict JSON schema)
- Execute action (Accessibility)
- Evaluate safety (Supervisor + Recovery)
- Repeat within strict bounds
Core modules are pure logic (no Android APIs) except for:
- Screen capture
- Accessibility execution
- MediaProjection control
Reference diagram: docs/SYSTEM_ARCHITECTURE.md.
- File(s):
action-executor/src/main/java/com/scroller/agent/executor/AccessibilityActionExecutorService.kt,AndroidActionExecutor.kt - Executes:
Tap(x, y)Scroll(direction)PressBackType(text)OpenApp(packageName)
- Uses Accessibility APIs only.
- All gestures dispatched on
Dispatchers.Main.immediate. - No hardcoded coordinates.
- Backpressure via
Channel<ActionCommand>(capacity = 1).
- File(s):
ScreenCaptureManager.kt,MediaProjectionController.kt,ScreenFrame.kt - MediaProjection‑based capture.
- Real-time bounds via
WindowMetricsCalculator. - Strict timeout on capture.
- No deprecated Display APIs.
- File(s):
LlmClient.kt,OpenAiLlmClient.kt,ActionSchema.kt - Strict JSON schema validation enforced.
- Invalid output never reaches executor.
- API key is injected at runtime.
- Input includes:
- Goal
- Last 10 actions
- Compressed memory summary
- Screenshot (JPEG, max dimension 1024, quality 75)
- File(s):
AgentLoopController.kt,LoopConfig.kt,LoopResult.kt,LoopState.kt - Bounded loop with:
maxStepsstepDelayMs- per‑stage timeouts
- Cancellation propagates cleanly.
- No Android dependencies.
- File(s):
Supervisor.kt,ScreenFingerprint.kt,RecoveryPolicy.kt - Detects:
- Repeated actions
- Stalled screen
- Oscillation
- Failure cascade
- Recovery is deterministic and bounded.
- File(s):
Orchestrator.kt,Subgoal.kt - One LLM call to split goal into 1–5 subgoals.
- Runs subgoals sequentially; aborts on first failure.
- File(s):
AgentMemory.kt,MemorySnapshot.kt - Bounded buffers for actions, events, fingerprints, decisions.
- Provides compressed summary for the LLM.
- No infinite loops:
maxSteps, failure thresholds, timeouts. - No silent failures: all errors surface via structured results.
- No uncontrolled retries: retries are explicit and bounded.
- No Android dependencies in core logic: Supervisor/Recovery/Memory are pure Kotlin.
- No unbounded memory growth: all history buffers capped.
- Android Studio (Giraffe+)
- Gradle 8.2.2 (wrapper scripts included; jar not committed)
- Kotlin 1.9.22
- JDK 17
- Open project in Android Studio.
- Ensure Gradle sync completes.
- Build
:appand install on device. - Enable accessibility service:
- Open app → Open Accessibility Settings → enable Scroller
- Request MediaProjection permission:
- Open app → Request Screen Capture Permission → allow.
All critical limits are configurable:
LoopConfigmaxSteps,stepDelayMs, timeouts
SupervisorConfig- repeated actions, oscillation thresholds
RecoveryConfigmaxRecoveryAttempts
No magic numbers in controllers.
- API keys must be injected at runtime.
- No API keys are stored in code or logs.
- LLM input/output is strictly validated.
Key docs:
docs/SYSTEM_ARCHITECTURE.mddocs/ACCESSIBILITY_ACTION_EXECUTOR.mddocs/SCREEN_CAPTURE_MODULE.mddocs/LLM_CLIENT.mddocs/AGENT_LOOP_CONTROLLER.mddocs/SUPERVISOR.mddocs/RECOVERY.mddocs/ORCHESTRATOR.mddocs/MEMORY.md
- No persistence across app restarts
- No background services beyond required Android services
- No multi‑agent parallelism
- No semantic memory or vector DB
- No hidden retries
This project is internal and experimental. Add a license if you plan to distribute.