Layer 3 Phase 4: install + launch wrapper for iOS sim and Android emulator #39
Merged
Adds installAndLaunch(input) — chains the two-step install + launch
flow against whatever sim/emulator is currently booted, returns a
structured LaunchResult {ok, command, durationMs, error?}.
iOS path:
xcrun simctl install booted <appPath>
xcrun simctl launch booted <bundleId>
Android path:
adb install -r <apkPath>
adb shell monkey -p <packageName> -c android.intent.category.LAUNCHER 1
Why monkey on Android: it only needs the package name, not the
activity name. The renamed substrate's package name follows the slug
(com.<slugflat>.<slugpascal>App per buildProductRenamePairs); the
activity could be anywhere. monkey lets Phase 5 orchestration derive
the launch target from a single rename pair.
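The two command chains above can be sketched as argument vectors. This is a minimal illustration, not the PR's code: the `LaunchInput` and `Command` types and the `buildInstallAndLaunchCommands` name are hypothetical, and only the CLI arguments come from the description.

```typescript
// Hypothetical input shapes; the PR does not show the real type definitions.
type LaunchInput =
  | { platform: "ios"; appPath: string; bundleId: string }
  | { platform: "android"; apkPath: string; packageName: string };

// One subprocess invocation: executable plus argument vector (no shell parsing).
interface Command {
  bin: string;
  args: string[];
}

// Returns the install command followed by the launch command for the platform.
function buildInstallAndLaunchCommands(input: LaunchInput): [Command, Command] {
  if (input.platform === "ios") {
    return [
      { bin: "xcrun", args: ["simctl", "install", "booted", input.appPath] },
      { bin: "xcrun", args: ["simctl", "launch", "booted", input.bundleId] },
    ];
  }
  return [
    { bin: "adb", args: ["install", "-r", input.apkPath] },
    {
      bin: "adb",
      // monkey only needs the package name; "1" injects a single launcher event.
      args: ["shell", "monkey", "-p", input.packageName, "-c", "android.intent.category.LAUNCHER", "1"],
    },
  ];
}

// Usage with example values (the path and package name are placeholders).
const [installCmd, launchCmd] = buildInstallAndLaunchCommands({
  platform: "android",
  apkPath: "app/build/outputs/apk/debug/app-debug.apk",
  packageName: "com.example.ExampleApp",
});
```

Keeping the commands as argument arrays rather than shell strings means paths with spaces need no quoting when they are handed to spawn().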
Discovery (locating the .app / .apk artifact post-build, deriving the
bundle ID / package name from the slug) is the caller's concern. This
function takes already-resolved inputs and runs the chain.
adb install detection: adb returns exit 0 even on "Failure" — so
runOnce checks stdout for the literal "failure" string and surfaces
the message in `error`.
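A rough sketch of that stdout check follows. The function name `findAdbInstallFailure` is hypothetical, and matching case-insensitively is an assumption (adb typically prints a capitalized "Failure [...]" line); the real runOnce may differ.

```typescript
// Returns the first stdout line that reports a failure, or undefined if none does.
// Assumption: the match is case-insensitive so "Failure [...]" and "failure" both count.
function findAdbInstallFailure(stdout: string): string | undefined {
  for (const line of stdout.split(/\r?\n/)) {
    if (/failure/i.test(line)) {
      return line.trim();
    }
  }
  return undefined;
}

// Usage: treat the install as failed when a failure line is present,
// regardless of the process exit code.
const sampleOutput = "Performing Streamed Install\nFailure [INSTALL_FAILED_INVALID_APK]\n";
const installError = findAdbInstallFailure(sampleOutput);
```

The returned line can be copied into LaunchResult.error so the caller sees adb's own message.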
Hardening: same shape as #38 (capture):
- spawn() try/catch for sync exec errors
- scrubbedEnv() applied
- configurable timeout (default 60s for install; the child is killed with SIGTERM on expiry)
- all failure paths return well-formed LaunchResult
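Two of the hardening points can be illustrated without touching a real subprocess. The sketch below is an assumption-laden simplification: `scrubbedEnv` here only drops ANTHROPIC_API_KEY (the key named in #29), and `runGuarded` is a hypothetical synchronous stand-in for the try/catch around spawn(); the real code also waits for the child to exit and enforces the timeout.

```typescript
interface LaunchResult {
  ok: boolean;
  command: string;
  durationMs: number;
  error?: string;
}

type Env = Record<string, string | undefined>;

// Copies the environment without the listed secret keys (the default list is an assumption).
function scrubbedEnv(env: Env, secretKeys: string[] = ["ANTHROPIC_API_KEY"]): Env {
  const copy: Env = {};
  for (const key of Object.keys(env)) {
    if (secretKeys.indexOf(key) === -1) {
      copy[key] = env[key];
    }
  }
  return copy;
}

// Runs `start` and converts a synchronous throw into a well-formed failure result.
// `start` stands in for the spawn() call; `now` is injectable for deterministic timing.
function runGuarded(command: string, start: () => void, now: () => number = Date.now): LaunchResult {
  const startedAt = now();
  try {
    start();
    return { ok: true, command, durationMs: now() - startedAt };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, command, durationMs: now() - startedAt, error: message };
  }
}
```

In Node, `scrubbedEnv(process.env)` would be passed as the `env` option of the child process so the key never reaches xcrun or adb.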
Real-mode smoke against current dev env (no sim booted):
iOS: "No devices are booted." — clean error in result.error
Android: spawn-level error surfaced cleanly (local adb / Apple
Silicon issue) without crashing the process
10/10 npm run ci green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dadachi added a commit that referenced this pull request on May 2, 2026
…#40)
Adds runVisualJudge(), which chains the previously shipped Phase-3 capture (#38)
and Phase-4 install/launch (#39) primitives together with the Phase-1 vision
judge (#37) into one end-to-end Stage 1 visual pipeline:
1. installAndLaunch (Phase 4)
2. sleep renderWaitMs (default 3s) for the home screen to render
3. captureScreenshot (Phase 3)
4. runLayer3 with rubric + spec (Phase 1)
Fail-fast: each step's failure short-circuits and surfaces in the returned
VisualJudgeResult.
Caller responsibilities: build the artifact (Layer 2 build mode), resolve
artifactPath / bundleId / packageName from the slug, ensure a sim/emulator is
booted.
Also adds DEFAULT_STAGE1_RUBRIC: three Yes/No criteria for home-screen judging,
phrased so pass=true is the desired state on each (per SPEC.md guardrail
"structured Yes/No-per-criterion rubrics, never free-form scoring"):
- domain-match: does the screen read as the spec's product?
- no-substrate-leak: no Shop/Shopkeeper/ItemTag/NativeAppTemplate visible?
- renders-cleanly: no obvious layout breakage or placeholders?
Out of scope (Phase 5b):
- Wiring runVisualJudge into runJudge / dispatch
- Resolving artifactPath / bundleId / packageName from slug (probably via
Info.plist / AndroidManifest.xml read post-build)
- Plumbing screenshot paths through JudgeInput
Tests:
- DEFAULT_STAGE1_RUBRIC has expected criterion ids ✓
- Short-circuit shape verified against missing sim ✓
- 12/12 npm run ci green
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
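The rubric described in this commit might look roughly like the following. The criterion ids come from the commit message; the `RubricCriterion` shape, the exact question wording, and the `rubricPasses` helper are assumptions made for illustration.

```typescript
// Hypothetical shape; the real rubric type is not shown in the commit message.
interface RubricCriterion {
  id: string;
  question: string; // phrased so that "yes" (pass = true) is the desired state
}

const DEFAULT_STAGE1_RUBRIC: RubricCriterion[] = [
  { id: "domain-match", question: "Does the screen read as the spec's product?" },
  {
    id: "no-substrate-leak",
    question: "Is the screen free of Shop/Shopkeeper/ItemTag/NativeAppTemplate text?",
  },
  { id: "renders-cleanly", question: "Is the layout free of obvious breakage or placeholders?" },
];

// One Yes/No verdict per criterion id, as returned by the vision judge.
type Verdicts = Record<string, boolean | undefined>;

// The rubric passes only when every criterion has an explicit "yes".
// A missing verdict counts as a failure rather than being ignored.
function rubricPasses(rubric: RubricCriterion[], verdicts: Verdicts): boolean {
  return rubric.every((criterion) => verdicts[criterion.id] === true);
}
```

Phrasing every criterion so that "yes" is the good outcome keeps the pass check a plain conjunction, with no per-criterion polarity to track.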
dadachi added a commit that referenced this pull request on May 2, 2026
Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in
the shell, npm run dev:
1. Runs the existing planner + workers + reviewer chain.
2. Calls runJudge with layer2Mode: "build", forcing real
xcodebuild build + ./gradlew assembleDebug instead of the
fast-mode toolchain probe. The build outputs are what Stage 1
visual judging needs.
3. Calls runJudge with visual: { iosDir, androidDir, spec }.
runJudge in turn calls runStage1Visual (#43) per platform:
discoverArtifact (#42) → installAndLaunch (#39) → 3s render
wait → captureScreenshot (#38) → runLayer3 (#37) with
DEFAULT_STAGE1_RUBRIC (#40).
4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult.
overallPass requires all three to pass; visual failures DO
fail the run (matches PLAN.md "a run that green-builds without
passing Layer 3 is a failed run").
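The aggregation in step 4 could be sketched as below. The `LayerCounts` and `LayerOutcome` types and the `aggregate` name are hypothetical, and treating a skipped Layer 3 as neutral is an assumption based on the flag-off behavior described in this commit; the summary format for failing layers is likewise a guess.

```typescript
interface LayerCounts {
  passed: number;
  total: number;
}

// Layer 3 is either judged or skipped (when the visual flag is off).
type LayerOutcome = LayerCounts | "skipped";

function describeLayer(label: string, outcome: LayerOutcome): string {
  return outcome === "skipped"
    ? `${label} skipped`
    : `${label} ${outcome.passed}/${outcome.total} pass`;
}

// Assumption: a skipped layer neither passes nor fails the run.
function layerOk(outcome: LayerOutcome): boolean {
  return outcome === "skipped" || outcome.passed === outcome.total;
}

function aggregate(
  layer1: LayerCounts,
  layer2: LayerCounts,
  layer3: LayerOutcome,
): { overallPass: boolean; summary: string } {
  const summary = [
    describeLayer("Layer 1", layer1),
    describeLayer("Layer 2", layer2),
    describeLayer("Layer 3", layer3),
  ].join(" · ");
  // Every layer that ran must pass; a Layer 3 failure fails the whole run.
  const overallPass = layerOk(layer1) && layerOk(layer2) && layerOk(layer3);
  return { overallPass, summary };
}
```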
Without the flag set, behavior is unchanged from #43: Layer 2 in
fast mode, Layer 3 skipped, summary reads
"Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".
Refactors the JudgeInput.visual shape from per-platform pre-resolved
configs ({artifactPath, bundleId} pairs) to outDir-based discovery
({iosDir, androidDir}). runJudge.runVisualPhase now delegates to
runStage1Visual which does discovery + visual-judge atomically.
Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s
(Android) on top of the fast-mode baseline. Hot rebuilds are much
faster but vary with substrate caches.
Recommendations covered:
- Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem
per the post-#30 convention; keep stub flags' rename for a
follow-up PR).
- Build coupling: visual implies Layer 2 build mode.
- Failure semantics: visual failures fail the run.
- Render wait: 3s default for both platforms (in
DEFAULT_RENDER_WAIT_MS, configurable via runVisualJudge input).
- Per-platform: judge both when visual enabled — discovery
returns null gracefully if either platform's build is missing.
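The trigger and build-coupling decisions listed above amount to a small mapping from the environment to a run configuration. The sketch below is illustrative only: `JudgePlan` and `planFromEnv` are hypothetical names, and "fast" as the non-build value of layer2Mode is an assumption ("build" is the only value the commit message spells out).

```typescript
const DEFAULT_RENDER_WAIT_MS = 3000; // 3s default render wait, per the commit message

// Hypothetical summary of what the flag changes for one run.
interface JudgePlan {
  visual: boolean;
  layer2Mode: "fast" | "build";
  renderWaitMs: number;
}

// The environment is passed in (process.env in Node) so the mapping stays pure and testable.
function planFromEnv(env: Record<string, string | undefined>): JudgePlan {
  const visual = env["NATIVEAPPTEMPLATE_VISUAL"] === "1";
  return {
    visual,
    // Visual judging needs real build outputs, so it implies build mode.
    layer2Mode: visual ? "build" : "fast",
    renderWaitMs: DEFAULT_RENDER_WAIT_MS,
  };
}
```

Only the exact value "1" enables the visual phase in this sketch; whether the real check is that strict is not stated in the commit message.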
Tests: 16/16 npm run ci green.
README.md gains an "Optional flags" subsection documenting the
trigger and its latency cost.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dadachi added a commit that referenced this pull request on May 3, 2026
Surfaced during the first real Layer 3 visual smoke against a booted Android
emulator.
The agent's installAndLaunch (#39) and captureScreenshot (#38) were calling
`adb` via the PATH-resolved binary, which on this dev machine was
`~/.apportable/SDK/bin/adb`, an i386 Mach-O from 2014 that fails to exec on
Apple Silicon with "spawn Unknown system error -86". Viable adb installs
(Android Studio default, /Applications/android-sdk-macosx, Homebrew) were
further down PATH and never reached.
New helper resolveAdbPath() in src/adb.ts walks a fixed priority order:
- env-var locations first ($ANDROID_HOME, $ANDROID_SDK_ROOT)
- then known macOS install paths (Android Studio default, the older
/Applications/android-sdk-macosx, /opt/homebrew, /usr/local)
- finally "adb" via PATH as the fallback
First existing path wins. src/validation/capture.ts (Android branch) and
src/validation/launch.ts (install + launch) now resolve via this helper
instead of trusting PATH.
Real-mode smoke confirms the fix: on this machine, adb now resolves to
/Applications/android-sdk-macosx/platform-tools/adb (universal binary,
x86_64+arm64). The Apple Silicon spawn error is gone.
Newly surfaced (separate concern, documented in README): when multiple
Android targets are attached (e.g. physical device + emulator), adb requires
ANDROID_SERIAL=<serial> to disambiguate. This is a stock adb feature, not
something the agent needs to implement: the agent runs adb directly and
inherits the env var.
Tests: 19/19 npm run ci green. (No adb-specific test in CI since that
requires a real Android SDK; resolution logic is verified by the real-mode
smoke output above.)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
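The priority walk performed by resolveAdbPath() can be sketched as follows. This is not the code in src/adb.ts: the signature with injected `home` and `exists` parameters is hypothetical (chosen so the logic runs without a filesystem), and the concrete file paths under each location are assumptions, apart from /Applications/android-sdk-macosx/platform-tools/adb, which the commit message names.

```typescript
type Env = Record<string, string | undefined>;

// Builds the ordered candidate list: env-var SDK roots first, then known macOS locations.
function adbCandidates(env: Env, home: string): string[] {
  const candidates: string[] = [];
  for (const key of ["ANDROID_HOME", "ANDROID_SDK_ROOT"]) {
    const sdkRoot = env[key];
    if (sdkRoot) {
      candidates.push(`${sdkRoot}/platform-tools/adb`);
    }
  }
  candidates.push(
    `${home}/Library/Android/sdk/platform-tools/adb`, // Android Studio default on macOS
    "/Applications/android-sdk-macosx/platform-tools/adb",
    "/opt/homebrew/bin/adb", // assumed location under the Homebrew prefix
    "/usr/local/bin/adb", // assumed location under /usr/local
  );
  return candidates;
}

// First existing candidate wins; otherwise fall back to "adb" and let PATH decide.
// `exists` would be fs.existsSync in Node; it is injected here to keep the sketch pure.
function resolveAdbPath(env: Env, home: string, exists: (path: string) => boolean): string {
  for (const candidate of adbCandidates(env, home)) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  return "adb";
}
```

Because the fallback is the bare name "adb", a machine with none of the known locations behaves exactly as it did before the helper existed.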
Summary
Adds installAndLaunch(input), which chains the two-step install + launch flow against whatever sim/emulator is currently booted and returns a structured LaunchResult ({ok, command, durationMs, error?}).
iOS path:
xcrun simctl install booted <appPath>
xcrun simctl launch booted <bundleId>
Android path:
adb install -r <apkPath>
adb shell monkey -p <packageName> -c android.intent.category.LAUNCHER 1
Why monkey on Android
It only needs the package name, not the activity name. The renamed substrate's package name follows the slug (com.<slugflat>.<slugpascal>App per buildProductRenamePairs); the activity entry-point could be anywhere. monkey lets Phase 5 orchestration derive the launch target from a single rename pair without parsing the AndroidManifest.
What this PR does NOT do
- Discovery: locating the .app / .apk artifact post-build, deriving the bundle ID / package name from the slug. The caller passes already-resolved inputs; the orchestrator (Phase 5) wires this up.
- … captureScreenshot() so the home screen has rendered.
adb install detection nuance
adb install returns exit 0 even on "Failure", so runOnce checks stdout for the literal "failure" string and surfaces the message in error. Quirky but standard; without this, adb install -r of a missing APK reports false success.
Hardening
Same shape as #38 (capture):
- spawn() try/catch for synchronous exec errors.
- scrubbedEnv() applied (defense-in-depth from "Scrub ANTHROPIC_API_KEY from subprocess env + Security docs" #29).
- Configurable timeout (default 60s for install) with SIGTERM kill.
- All failure paths return a well-formed LaunchResult instead of crashing the caller.
Test plan
- … installAndLaunch and the launch types.
- npm run ci: 10/10 green.
- Real-mode smoke (no sim booted): iOS returns "No devices are booted." cleanly; Android surfaces the local adb / Apple Silicon issue without crashing the process.
- … .app, installAndLaunch({platform: "ios", appPath, bundleId}) should return ok: true and the app should be visible on the sim home screen.
🤖 Generated with Claude Code