Layer 3 Phase 4: install + launch wrapper for iOS sim and Android emulator by dadachi · Pull Request #39 · nativeapptemplate/nativeapptemplate-agent

dadachi · 2026-05-02T11:42:16Z

Summary

Adds installAndLaunch(input) — chains the two-step install + launch flow against whatever sim/emulator is currently booted, returns a structured LaunchResult ({ok, command, durationMs, error?}).

iOS path:

xcrun simctl install booted <appPath>
xcrun simctl launch  booted <bundleId>

Android path:

adb install -r <apkPath>
adb shell monkey -p <packageName> -c android.intent.category.LAUNCHER 1
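
As a rough illustration of the two chains above, the install and launch commands could be assembled like this. This is a minimal sketch, not the PR's code: `LaunchInputSketch`, `Command`, and `buildLaunchChain` are hypothetical names, and the Android field names (`apkPath`, `packageName`) are assumed from the placeholders in the commands.

```typescript
// Hypothetical input shape; the PR only shows {platform: "ios", appPath, bundleId}.
type LaunchInputSketch =
  | { platform: "ios"; appPath: string; bundleId: string }
  | { platform: "android"; apkPath: string; packageName: string };

type Command = { bin: string; args: string[] };

// Returns the install command followed by the launch command.
function buildLaunchChain(input: LaunchInputSketch): [Command, Command] {
  if (input.platform === "ios") {
    return [
      { bin: "xcrun", args: ["simctl", "install", "booted", input.appPath] },
      { bin: "xcrun", args: ["simctl", "launch", "booted", input.bundleId] },
    ];
  }
  return [
    { bin: "adb", args: ["install", "-r", input.apkPath] },
    {
      bin: "adb",
      args: [
        "shell", "monkey",
        "-p", input.packageName,
        "-c", "android.intent.category.LAUNCHER",
        "1", // inject a single event, which is enough to start the launcher activity
      ],
    },
  ];
}
```

Passing arguments as an array rather than a shell string keeps paths with spaces intact when they are handed to a spawn call.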

Why `monkey` on Android

It only needs the package name, not the activity name. The renamed substrate's package name follows the slug (com.<slugflat>.<slugpascal>App per buildProductRenamePairs); the activity entry-point could be anywhere. monkey lets Phase 5 orchestration derive the launch target from a single rename pair without parsing the AndroidManifest.
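
To make the derivation concrete, here is a sketch of turning a slug into the package name that `monkey` needs. It assumes slugs are kebab-case, that `slugflat` means lowercase with separators removed, and that `slugpascal` means PascalCase; none of that is confirmed here, and the real rules live in buildProductRenamePairs. All function names are hypothetical.

```typescript
// Assumption: slugs look like "coffee-shop".
function slugFlat(slug: string): string {
  // Lowercase and drop everything that is not a letter or digit.
  return slug.toLowerCase().replace(/[^a-z0-9]/g, "");
}

function slugPascal(slug: string): string {
  // Capitalize each separator-delimited part and join them.
  return slug
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1).toLowerCase())
    .join("");
}

// Follows the com.<slugflat>.<slugpascal>App pattern quoted above.
function androidPackageName(slug: string): string {
  return `com.${slugFlat(slug)}.${slugPascal(slug)}App`;
}

console.log(androidPackageName("coffee-shop")); // com.coffeeshop.CoffeeShopApp
```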

What this PR does NOT do

Discovery: locating the .app / .apk artifact post-build, deriving the bundle ID / package name from the slug. The caller passes already-resolved inputs; the orchestrator (Phase 5) wires this up.
Screen-readiness polling: caller should sleep briefly before captureScreenshot() so the home screen has rendered.

adb install detection nuance

adb install returns exit 0 even on "Failure" — so runOnce checks stdout for the literal "failure" string and surfaces the message in error. Quirky but standard; without this, adb install -r of a missing APK reports false success.
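
The check described above could look roughly like this. It is a sketch under assumptions: `checkAdbInstallOutput` is a hypothetical name, the match is case-insensitive because the text quotes both "Failure" and "failure", and the sample output in the usage lines is illustrative rather than captured from this repo.

```typescript
type InstallCheck = { ok: boolean; error?: string };

// Treats exit 0 as success only if stdout carries no "failure" line.
function checkAdbInstallOutput(exitCode: number | null, stdout: string): InstallCheck {
  if (exitCode !== 0) {
    return { ok: false, error: `adb exited with code ${exitCode}` };
  }
  // Assumption: match case-insensitively and surface the first matching line.
  const failureLine = stdout.split(/\r?\n/).find((line) => /failure/i.test(line));
  if (failureLine !== undefined) {
    return { ok: false, error: failureLine.trim() };
  }
  return { ok: true };
}

// Illustrative stdout samples, not real logs from this PR.
const failed = checkAdbInstallOutput(0, "Performing Streamed Install\nFailure [INSTALL_FAILED_INVALID_APK]\n");
const passed = checkAdbInstallOutput(0, "Performing Streamed Install\nSuccess\n");
console.log(failed.ok, passed.ok); // false true
```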

Hardening

Same shape as #38 (capture):

spawn() try/catch for synchronous exec errors.
scrubbedEnv() applied (defense-in-depth from #29, "Scrub ANTHROPIC_API_KEY from subprocess env + Security docs").
Configurable timeout (default 60s) with SIGTERM kill.
All failure paths return a well-formed LaunchResult instead of crashing the caller.
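
A condensed sketch of the hardening points above. For brevity it swaps in Node's synchronous `spawnSync` instead of the `spawn()` the PR uses, so it shows the same failure handling but not the PR's actual control flow. `runOnceSketch` and `scrubbedEnvSketch` are hypothetical stand-ins; the real scrubbedEnv() may remove more than one variable.

```typescript
import { spawnSync } from "node:child_process";

type LaunchResult = { ok: boolean; command: string; durationMs: number; error?: string };

// Hypothetical stand-in for scrubbedEnv(): copies the env and drops the API key.
function scrubbedEnvSketch(env: NodeJS.ProcessEnv): NodeJS.ProcessEnv {
  const copy = { ...env };
  delete copy.ANTHROPIC_API_KEY;
  return copy;
}

function runOnceSketch(bin: string, args: string[], timeoutMs = 60_000): LaunchResult {
  const command = [bin, ...args].join(" ");
  const started = Date.now();
  const done = (ok: boolean, error?: string): LaunchResult => ({
    ok,
    command,
    durationMs: Date.now() - started,
    ...(error ? { error } : {}),
  });
  try {
    const res = spawnSync(bin, args, {
      env: scrubbedEnvSketch(process.env),
      encoding: "utf8",
      timeout: timeoutMs,    // child is killed once this elapses
      killSignal: "SIGTERM", // signal used for the timeout kill
    });
    // Missing binaries and timeouts are reported via res.error, not thrown.
    if (res.error) return done(false, res.error.message);
    if (res.status !== 0) {
      return done(false, (res.stderr || res.stdout || `exit ${res.status}`).trim());
    }
    return done(true);
  } catch (err) {
    // Invalid arguments can still throw synchronously.
    return done(false, err instanceof Error ? err.message : String(err));
  }
}
```

Every branch returns a LaunchResult-shaped object, so a caller can check `ok` without its own try/catch.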

Test plan

Validation index exports installAndLaunch and the launch types.
Structured-failure test against missing iOS sim — asserts result shape.
npm run ci — 10/10 green.
Real-mode smoke against current dev env (no sim/emulator booted): iOS returns "No devices are booted." cleanly; Android surfaces the local adb / Apple Silicon issue without crashing the process.
After merge: with an iOS Simulator booted and a built .app, installAndLaunch({platform: "ios", appPath, bundleId}) should return ok: true and the app should be visible on the sim home screen.

🤖 Generated with Claude Code

…lator

Adds installAndLaunch(input): chains the two-step install + launch flow against whatever sim/emulator is currently booted, returns a structured LaunchResult {ok, command, durationMs, error?}.

iOS path:
  xcrun simctl install booted <appPath>
  xcrun simctl launch booted <bundleId>

Android path:
  adb install -r <apkPath>
  adb shell monkey -p <packageName> -c android.intent.category.LAUNCHER 1

Why monkey on Android: it only needs the package name, not the activity name. The renamed substrate's package name follows the slug (com.<slugflat>.<slugpascal>App per buildProductRenamePairs); the activity could be anywhere. monkey lets Phase 5 orchestration derive the launch target from a single rename pair.

Discovery (locating the .app / .apk artifact post-build, deriving the bundle ID / package name from the slug) is the caller's concern. This function takes already-resolved inputs and runs the chain.

adb install detection: adb returns exit 0 even on "Failure", so runOnce checks stdout for the literal "failure" string and surfaces the message in `error`.

Hardening: same shape as #38 (capture):
- spawn() try/catch for sync exec errors
- scrubbedEnv() applied
- configurable timeout (default 60s for install, kill on SIGTERM)
- all failure paths return well-formed LaunchResult

Real-mode smoke against current dev env (no sim booted):
- iOS: "No devices are booted." (clean error in result.error)
- Android: spawn-level error surfaced cleanly (local adb / Apple Silicon issue) without crashing the process

10/10 npm run ci green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#40)

Adds runVisualJudge(): chains the previously-shipped Phase-3 capture (#38) and Phase-4 install/launch (#39) primitives together with the Phase-1 vision judge (#37) into one end-to-end Stage 1 visual pipeline:

1. installAndLaunch (Phase 4)
2. sleep renderWaitMs (default 3s) for the home screen to render
3. captureScreenshot (Phase 3)
4. runLayer3 with rubric + spec (Phase 1)

Fail-fast: each step's failure short-circuits and surfaces in the returned VisualJudgeResult.

Caller responsibilities: build the artifact (Layer 2 build mode), resolve artifactPath / bundleId / packageName from the slug, ensure a sim/emulator is booted.

Also adds DEFAULT_STAGE1_RUBRIC: three Yes/No criteria for home-screen judging, phrased so pass=true is the desired state on each (per SPEC.md guardrail "structured Yes/No-per-criterion rubrics, never free-form scoring"):
- domain-match: does the screen read as the spec's product?
- no-substrate-leak: no Shop/Shopkeeper/ItemTag/NativeAppTemplate visible?
- renders-cleanly: no obvious layout breakage or placeholders?

Out of scope (Phase 5b):
- Wiring runVisualJudge into runJudge / dispatch
- Resolving artifactPath / bundleId / packageName from slug (probably via Info.plist / AndroidManifest.xml read post-build)
- Plumbing screenshot paths through JudgeInput

Tests:
- DEFAULT_STAGE1_RUBRIC has expected criterion ids ✓
- Short-circuit shape verified against missing sim ✓
- 12/12 npm run ci green

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
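
The fail-fast chain described in this commit message can be sketched with injected steps. This is an illustration only: `runVisualJudgeSketch`, `StepResult`, and the `failedStep` field are hypothetical, and the real VisualJudgeResult shape is not shown on this page. The steps are passed in so the short-circuit behavior can be exercised without a simulator.

```typescript
type StepResult = { ok: boolean; error?: string };

type VisualSteps = {
  installAndLaunch: () => Promise<StepResult>;
  captureScreenshot: () => Promise<StepResult>;
  runLayer3: () => Promise<StepResult>;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
};

type VisualJudgeSketchResult = { ok: boolean; failedStep?: string; error?: string };

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runVisualJudgeSketch(
  steps: VisualSteps,
  renderWaitMs = 3000, // 3s default render wait, per the text above
): Promise<VisualJudgeSketchResult> {
  const sleep = steps.sleep ?? defaultSleep;

  const launched = await steps.installAndLaunch();
  if (!launched.ok) return { ok: false, failedStep: "installAndLaunch", error: launched.error };

  await sleep(renderWaitMs); // give the home screen time to render

  const shot = await steps.captureScreenshot();
  if (!shot.ok) return { ok: false, failedStep: "captureScreenshot", error: shot.error };

  const judged = await steps.runLayer3();
  if (!judged.ok) return { ok: false, failedStep: "runLayer3", error: judged.error };

  return { ok: true };
}
```

A failed install returns immediately, so no screenshot is captured and no judge call is made for a run that never launched.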

Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in the shell, npm run dev:

1. Runs the existing planner + workers + reviewer chain.
2. Calls runJudge with layer2Mode: "build", forcing real xcodebuild build + ./gradlew assembleDebug instead of the fast-mode toolchain probe. The build outputs are what Stage 1 visual judging needs.
3. Calls runJudge with visual: { iosDir, androidDir, spec }. runJudge in turn calls runStage1Visual (#43) per platform: discoverArtifact (#42) → installAndLaunch (#39) → 3s render wait → captureScreenshot (#38) → runLayer3 (#37) with DEFAULT_STAGE1_RUBRIC (#40).
4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult. overallPass requires all three to pass; visual failures DO fail the run (matches PLAN.md "a run that green-builds without passing Layer 3 is a failed run").

Without the flag set, behavior is unchanged from #43: Layer 2 in fast mode, Layer 3 skipped, summary reads "Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

Refactors the JudgeInput.visual shape from per-platform pre-resolved configs ({artifactPath, bundleId} pairs) to outDir-based discovery ({iosDir, androidDir}). runJudge.runVisualPhase now delegates to runStage1Visual which does discovery + visual-judge atomically.

Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s (Android) on top of the fast-mode baseline. Hot rebuilds are much faster but vary with substrate caches.

Recommendations covered:
- Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem per the post-#30 convention; keep stub flags' rename for a follow-up PR).
- Build coupling: visual implies Layer 2 build mode.
- Failure semantics: visual failures fail the run.
- Render wait: 3s default for both platforms (in DEFAULT_RENDER_WAIT_MS, configurable via runVisualJudge input).
- Per-platform: judge both when visual enabled; discovery returns null gracefully if either platform's build is missing.

Tests: 16/16 npm run ci green.

README.md gains an "Optional flags" subsection documenting the trigger and its latency cost.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
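
The trigger and failure semantics in this commit message can be sketched as three small functions. The names and the `LayerOutcome` type are hypothetical. Two things are assumed rather than stated: that the fast mode's value is the string "fast", and that a skipped Layer 3 does not fail the run when the flag is off (inferred from "behavior is unchanged").

```typescript
type LayerOutcome = { pass: boolean } | "skipped";

// The flag is read from the environment; only the exact value "1" enables it.
function visualEnabled(env: Record<string, string | undefined>): boolean {
  return env.NATIVEAPPTEMPLATE_VISUAL === "1";
}

// Build coupling: visual judging implies Layer 2 build mode.
// Assumption: "fast" is the name of the other mode.
function layer2ModeFor(env: Record<string, string | undefined>): "build" | "fast" {
  return visualEnabled(env) ? "build" : "fast";
}

// Layers 1 and 2 must pass; Layer 3 must pass whenever it actually ran.
function overallPassSketch(
  layer1: { pass: boolean },
  layer2: { pass: boolean },
  layer3: LayerOutcome,
): boolean {
  const layer3Ok = layer3 === "skipped" ? true : layer3.pass;
  return layer1.pass && layer2.pass && layer3Ok;
}

console.log(layer2ModeFor({ NATIVEAPPTEMPLATE_VISUAL: "1" })); // build
console.log(overallPassSketch({ pass: true }, { pass: true }, { pass: false })); // false
```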

Surfaced during the first real Layer 3 visual smoke against a booted Android emulator. The agent's installAndLaunch (#39) and captureScreenshot (#38) were calling `adb` via the PATH-resolved binary, which on this dev machine was `~/.apportable/SDK/bin/adb`, an i386 Mach-O from 2014 that fails to exec on Apple Silicon with "spawn Unknown system error -86". Visible adb installs (Android Studio default, /Applications/android-sdk-macosx, Homebrew) were further down PATH and never reached.

New helper resolveAdbPath() in src/adb.ts walks a fixed priority order: env-var locations first ($ANDROID_HOME, $ANDROID_SDK_ROOT), then known macOS install paths (Android Studio default, the older /Applications/android-sdk-macosx, /opt/homebrew, /usr/local), and falls back to "adb" via PATH. First existing path wins.

src/validation/capture.ts (Android branch) and src/validation/launch.ts (install + launch) now resolve via this helper instead of trusting PATH.

Real-mode smoke confirms the fix: on this machine, adb now resolves to /Applications/android-sdk-macosx/platform-tools/adb (universal binary, x86_64+arm64). The Apple Silicon spawn error is gone.

Newly-surfaced (separate concern, documented in README): when multiple Android targets are attached (e.g. physical device + emulator), adb requires ANDROID_SERIAL=<serial> to disambiguate. This is a stock adb feature, not something the agent needs to implement: the agent runs adb directly and inherits the env var.

Tests: 19/19 npm run ci green. (No adb-specific test in CI since that requires a real Android SDK; resolution logic is verified by the real-mode smoke output above.)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
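
A sketch of the priority walk described for resolveAdbPath(). This is not the code in src/adb.ts. The function name, the injectable `exists` and `home` parameters, and the exact Homebrew subpaths (`/opt/homebrew/bin/adb`, `/usr/local/bin/adb`) are assumptions for illustration; `~/Library/Android/sdk` is the usual Android Studio SDK location on macOS.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

function resolveAdbPathSketch(
  env: Record<string, string | undefined> = process.env,
  exists: (path: string) => boolean = existsSync, // injectable for testing
  home: string = homedir(),
): string {
  const candidates: string[] = [];

  // 1. Env-var locations first.
  for (const sdkRoot of [env.ANDROID_HOME, env.ANDROID_SDK_ROOT]) {
    if (sdkRoot) candidates.push(join(sdkRoot, "platform-tools", "adb"));
  }

  // 2. Known macOS install paths (Homebrew subpaths are assumed).
  candidates.push(
    join(home, "Library", "Android", "sdk", "platform-tools", "adb"),
    "/Applications/android-sdk-macosx/platform-tools/adb",
    "/opt/homebrew/bin/adb",
    "/usr/local/bin/adb",
  );

  // 3. First existing path wins; otherwise fall back to PATH lookup.
  return candidates.find((candidate) => exists(candidate)) ?? "adb";
}
```

Injecting `exists` allows the ordering to be checked without a real Android SDK on the machine, which is the gap the commit message notes for CI.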

dadachi merged commit 44952ab into main May 2, 2026
1 check passed

dadachi deleted the layer3-install-launch branch May 2, 2026 12:11

dadachi mentioned this pull request May 2, 2026: Layer 3 Phase 5a: visual-judge orchestration + default Stage 1 rubric #40 (Merged)

dadachi mentioned this pull request May 2, 2026: Layer 3 Phase 5e: dispatch wires NATIVEAPPTEMPLATE_VISUAL=1 #44 (Merged)

dadachi mentioned this pull request May 3, 2026: adb: resolve to known-good binary instead of trusting PATH order #49 (Merged)
