
Layer 3 Phase 4: install + launch wrapper for iOS sim and Android emulator#39

Merged
dadachi merged 1 commit into main from layer3-install-launch on May 2, 2026

dadachi commented May 2, 2026

Summary

Adds installAndLaunch(input), which chains the two-step install + launch flow against whatever sim/emulator is currently booted and returns a structured LaunchResult ({ok, command, durationMs, error?}).
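The result shape can be pinned down with a discriminated union on `platform`. This is a minimal sketch under assumptions: the input field names (`appPath`, `bundleId`, `apkPath`, `packageName`) are taken from the commands and test plan in this PR, while `timeoutMs` and the `failedResult` helper are hypothetical.

```typescript
// Input: one variant per platform. `timeoutMs` is a hypothetical knob
// standing in for the configurable timeout mentioned under Hardening.
type LaunchInput =
  | { platform: "ios"; appPath: string; bundleId: string; timeoutMs?: number }
  | { platform: "android"; apkPath: string; packageName: string; timeoutMs?: number };

// Result: always well-formed, whether the chain succeeded or not.
interface LaunchResult {
  ok: boolean;
  command: string; // the command line that ran last (the failing one, on error)
  durationMs: number;
  error?: string; // set only when ok is false
}

// Hypothetical helper: builds a failure result so no field is forgotten.
function failedResult(command: string, startedAt: number, error: string): LaunchResult {
  return { ok: false, command, durationMs: Date.now() - startedAt, error };
}

// Usage: how the "no sim booted" smoke case would be reported.
const example = failedResult("xcrun simctl install booted Foo.app", Date.now(), "No devices are booted.");
console.log(example.ok, example.error); // false No devices are booted.
```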

iOS path:

xcrun simctl install booted <appPath>
xcrun simctl launch  booted <bundleId>

Android path:

adb install -r <apkPath>
adb shell monkey -p <packageName> -c android.intent.category.LAUNCHER 1
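The two command pairs above can be expressed as plain data before anything is spawned, which keeps them easy to unit-test. A sketch, assuming a hypothetical `buildLaunchSteps` helper that produces argv-style arrays (no shell string), with `adbPath` defaulting to the PATH-resolved `adb`:

```typescript
type LaunchInput =
  | { platform: "ios"; appPath: string; bundleId: string }
  | { platform: "android"; apkPath: string; packageName: string };

// One external command: program plus argv, meant for spawn() without a shell.
interface Step {
  cmd: string;
  args: string[];
}

// Returns [install, launch] for the given platform, in execution order.
function buildLaunchSteps(input: LaunchInput, adbPath = "adb"): [Step, Step] {
  if (input.platform === "ios") {
    return [
      { cmd: "xcrun", args: ["simctl", "install", "booted", input.appPath] },
      { cmd: "xcrun", args: ["simctl", "launch", "booted", input.bundleId] },
    ];
  }
  return [
    { cmd: adbPath, args: ["install", "-r", input.apkPath] },
    // The usual monkey idiom: one event, restricted to LAUNCHER activities
    // of this package, which starts the app.
    {
      cmd: adbPath,
      args: ["shell", "monkey", "-p", input.packageName, "-c", "android.intent.category.LAUNCHER", "1"],
    },
  ];
}

const [install, launch] = buildLaunchSteps({
  platform: "android",
  apkPath: "app-debug.apk",
  packageName: "com.example.app",
});
console.log(install.cmd, install.args.join(" ")); // adb install -r app-debug.apk
console.log(launch.args.slice(0, 4).join(" ")); // shell monkey -p com.example.app
```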

Why monkey on Android

It only needs the package name, not the activity name. The renamed substrate's package name follows the slug (com.<slugflat>.<slugpascal>App per buildProductRenamePairs); the activity entry-point could be anywhere. monkey lets Phase 5 orchestration derive the launch target from a single rename pair without parsing the AndroidManifest.
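For illustration only, here is a sketch of deriving the monkey target from a slug under the `com.<slugflat>.<slugpascal>App` pattern. The real rules live in `buildProductRenamePairs`, which is not shown here; `slugFlat`, `slugPascal` and `androidPackageName` are hypothetical stand-ins that assume kebab-case slugs.

```typescript
// Hypothetical: lowercase, with every non-alphanumeric character removed.
function slugFlat(slug: string): string {
  return slug.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Hypothetical: split on separators and capitalize each word.
function slugPascal(slug: string): string {
  return slug
    .split(/[^A-Za-z0-9]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
    .join("");
}

// com.<slugflat>.<slugpascal>App, the only input `monkey -p` needs.
function androidPackageName(slug: string): string {
  return `com.${slugFlat(slug)}.${slugPascal(slug)}App`;
}

console.log(androidPackageName("habit-tracker")); // com.habittracker.HabitTrackerApp
```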

What this PR does NOT do

  • Discovery: locating the .app / .apk artifact post-build, deriving the bundle ID / package name from the slug. The caller passes already-resolved inputs; the orchestrator (Phase 5) wires this up.
  • Screen-readiness polling: the caller should sleep briefly before captureScreenshot() so the home screen has time to render.

adb install detection nuance

adb install returns exit 0 even on "Failure", so runOnce checks stdout for the literal "failure" string and surfaces the message in error. Quirky but standard; without this check, adb install -r of a missing APK would report false success.
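A sketch of that check, assuming the match is case-insensitive (adb prints lines such as `Failure [INSTALL_FAILED_INVALID_APK]`) and that the offending line is what ends up in `error`. `adbInstallError` is a hypothetical name; in this PR the logic sits inside `runOnce`.

```typescript
// Returns an error message if an `adb install` run failed, or undefined on success.
// A non-zero exit is always a failure; a zero exit is still scanned for "failure"
// in stdout, because adb can report an install failure while exiting 0.
function adbInstallError(exitCode: number | null, stdout: string, stderr: string): string | undefined {
  if (exitCode !== 0) {
    const msg = (stderr || stdout).trim();
    return msg || `adb exited with code ${exitCode}`;
  }
  const failureLine = stdout.split(/\r?\n/).find((line) => /failure/i.test(line));
  return failureLine?.trim();
}

console.log(adbInstallError(0, "Performing Streamed Install\nSuccess\n", "")); // undefined
console.log(adbInstallError(0, "Performing Streamed Install\nFailure [INSTALL_FAILED_INVALID_APK]\n", ""));
// Failure [INSTALL_FAILED_INVALID_APK]
```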

Hardening

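Same shape as #38 (capture):

  • spawn() try/catch for sync exec errors
  • scrubbedEnv() applied
  • configurable timeout (default 60s for install; the child is killed with SIGTERM on expiry)
  • all failure paths return a well-formed LaunchResult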

Test plan

  • Validation index exports installAndLaunch and the launch types.
  • Structured-failure test against missing iOS sim — asserts result shape.
  • npm run ci — 10/10 green.
  • Real-mode smoke against current dev env (no sim/emulator booted): iOS returns "No devices are booted." cleanly; Android surfaces the local adb / Apple Silicon issue without crashing the process.
  • After merge: with an iOS Simulator booted and a built .app, installAndLaunch({platform: "ios", appPath, bundleId}) should return ok: true and the app should be visible on the sim home screen.

🤖 Generated with Claude Code

…lator

Adds installAndLaunch(input), which chains the two-step install +
launch flow against whatever sim/emulator is currently booted and
returns a structured LaunchResult {ok, command, durationMs, error?}.

iOS path:
  xcrun simctl install booted <appPath>
  xcrun simctl launch  booted <bundleId>

Android path:
  adb install -r <apkPath>
  adb shell monkey -p <packageName> -c android.intent.category.LAUNCHER 1

Why monkey on Android: it only needs the package name, not the
activity name. The renamed substrate's package name follows the slug
(com.<slugflat>.<slugpascal>App per buildProductRenamePairs); the
activity could be anywhere. monkey lets Phase 5 orchestration derive
the launch target from a single rename pair.

Discovery (locating the .app / .apk artifact post-build, deriving the
bundle ID / package name from the slug) is the caller's concern. This
function takes already-resolved inputs and runs the chain.

adb install detection: adb returns exit 0 even on "Failure" — so
runOnce checks stdout for the literal "failure" string and surfaces
the message in `error`.

Hardening: same shape as #38 (capture):
  - spawn() try/catch for sync exec errors
  - scrubbedEnv() applied
  - configurable timeout (default 60s for install; child killed with
    SIGTERM on expiry)
  - all failure paths return well-formed LaunchResult
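The four hardening points can be sketched with Node's `child_process.spawn`. This is an illustration under assumptions, not the project's `runOnce`: `scrubbedEnv()` is not reproduced (the caller passes whatever environment it wants through `env`), and `RunResult` is a hypothetical per-command type rather than `LaunchResult`.

```typescript
import { spawn, ChildProcess } from "child_process";

// Hypothetical result type for one external command.
interface RunResult {
  code: number | null; // exit code, or null if the process never exited normally
  stdout: string;
  stderr: string;
  durationMs: number;
  error?: string; // set on spawn failure or timeout
}

function runOnce(
  cmd: string,
  args: string[],
  opts: { timeoutMs?: number; env?: Record<string, string | undefined> } = {},
): Promise<RunResult> {
  const started = Date.now();
  const timeoutMs = opts.timeoutMs ?? 60_000; // 60s default, as for install in this PR
  // The promise only ever resolves: every failure becomes a well-formed RunResult.
  return new Promise<RunResult>((resolve) => {
    let stdout = "";
    let stderr = "";
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;

    const finish = (code: number | null, error?: string): void => {
      if (settled) return; // 'error', 'close' and the timer can all fire; report once
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      resolve({ code, stdout, stderr, durationMs: Date.now() - started, error });
    };

    let child: ChildProcess;
    try {
      // Stand-in for scrubbedEnv(): the caller decides what environment the child sees.
      child = spawn(cmd, args, { env: opts.env ?? process.env });
    } catch (e) {
      // spawn() throws synchronously for some exec failures, not just bad arguments.
      finish(null, e instanceof Error ? e.message : String(e));
      return;
    }

    child.stdout?.on("data", (chunk) => (stdout += chunk));
    child.stderr?.on("data", (chunk) => (stderr += chunk));
    // Missing binaries (ENOENT) and permission errors arrive asynchronously as 'error'.
    child.on("error", (err) => finish(null, err.message));
    child.on("close", (code) => finish(code));

    timer = setTimeout(() => {
      child.kill("SIGTERM");
      finish(null, `timed out after ${timeoutMs}ms`);
    }, timeoutMs);
  });
}
```

Resolving on every path, instead of rejecting, lets a caller such as installAndLaunch map any outcome to a result object without wrapping each command in its own try/catch.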

Real-mode smoke against current dev env (no sim booted):
  iOS:     "No devices are booted." — clean error in result.error
  Android: spawn-level error surfaced cleanly (local adb / Apple
           Silicon issue) without crashing the process

10/10 npm run ci green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dadachi merged commit 44952ab into main May 2, 2026
1 check passed
dadachi deleted the layer3-install-launch branch May 2, 2026 12:11
dadachi added a commit that referenced this pull request May 2, 2026
…#40)

Adds runVisualJudge() — chains the previously-shipped Phase-3 capture
(#38) and Phase-4 install/launch (#39) primitives together with the
Phase-1 vision judge (#37) into one end-to-end Stage 1 visual pipeline:

  1. installAndLaunch (Phase 4)
  2. sleep renderWaitMs (default 3s) for the home screen to render
  3. captureScreenshot (Phase 3)
  4. runLayer3 with rubric + spec (Phase 1)

Fail-fast: each step's failure short-circuits and surfaces in the
returned VisualJudgeResult. Caller responsibilities: build the
artifact (Layer 2 build mode), resolve artifactPath / bundleId /
packageName from the slug, ensure a sim/emulator is booted.
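A sketch of the fail-fast chain, assuming each phase is injected as an async function returning `{ ok, error? }`. The order follows the four steps listed in this commit; `runVisualJudgeSketch`, `VisualSteps` and the `failedStep` field are hypothetical and do not describe the actual `VisualJudgeResult`.

```typescript
interface StepOutcome {
  ok: boolean;
  error?: string;
}

// Hypothetical injected phases; in the real pipeline these would be
// installAndLaunch, captureScreenshot and runLayer3.
interface VisualSteps {
  launch: () => Promise<StepOutcome>;
  capture: () => Promise<StepOutcome & { path?: string }>;
  judge: (screenshotPath: string) => Promise<StepOutcome>;
}

type VisualOutcome =
  | { ok: true; screenshotPath: string }
  | { ok: false; failedStep: "launch" | "capture" | "judge"; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs the phases in order and stops at the first failure.
async function runVisualJudgeSketch(steps: VisualSteps, renderWaitMs = 3000): Promise<VisualOutcome> {
  const launched = await steps.launch();
  if (!launched.ok) return { ok: false, failedStep: "launch", error: launched.error ?? "launch failed" };

  await sleep(renderWaitMs); // give the home screen time to render

  const shot = await steps.capture();
  if (!shot.ok || shot.path === undefined) {
    return { ok: false, failedStep: "capture", error: shot.error ?? "no screenshot path" };
  }

  const verdict = await steps.judge(shot.path);
  if (!verdict.ok) return { ok: false, failedStep: "judge", error: verdict.error ?? "judge failed" };

  return { ok: true, screenshotPath: shot.path };
}
```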

Also adds DEFAULT_STAGE1_RUBRIC — three Yes/No criteria for
home-screen judging, phrased so pass=true is the desired state on
each (per SPEC.md guardrail "structured Yes/No-per-criterion
rubrics, never free-form scoring"):
  - domain-match: does the screen read as the spec's product?
  - no-substrate-leak: no Shop/Shopkeeper/ItemTag/NativeAppTemplate
    visible?
  - renders-cleanly: no obvious layout breakage or placeholders?
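A sketch of the rubric as data plus a combined pass check. The three ids come from this commit; the `Criterion` shape, the question wording, and the all-criteria-must-pass rule are assumptions, since the commit does not say how per-criterion verdicts are combined.

```typescript
interface Criterion {
  id: string;
  question: string; // phrased so that "yes" (pass = true) is the desired state
}

interface Verdict {
  id: string;
  pass: boolean;
}

// Ids from this commit; question wording paraphrased.
const STAGE1_RUBRIC_SKETCH: readonly Criterion[] = [
  { id: "domain-match", question: "Does the screen read as the spec's product?" },
  { id: "no-substrate-leak", question: "Is the screen free of Shop, Shopkeeper, ItemTag and NativeAppTemplate?" },
  { id: "renders-cleanly", question: "Is the screen free of obvious layout breakage or placeholders?" },
];

// Assumed policy: pass only if every criterion has a verdict and that verdict is true.
// A missing verdict counts as a failure rather than being ignored.
function rubricPasses(rubric: readonly Criterion[], verdicts: readonly Verdict[]): boolean {
  const byId = new Map(verdicts.map((v): [string, boolean] => [v.id, v.pass]));
  return rubric.every((c) => byId.get(c.id) === true);
}
```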

Out of scope (Phase 5b):
  - Wiring runVisualJudge into runJudge / dispatch
  - Resolving artifactPath / bundleId / packageName from slug
    (probably via Info.plist / AndroidManifest.xml read post-build)
  - Plumbing screenshot paths through JudgeInput

Tests:
  - DEFAULT_STAGE1_RUBRIC has expected criterion ids ✓
  - Short-circuit shape verified against missing sim ✓
  - 12/12 npm run ci green

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dadachi added a commit that referenced this pull request May 2, 2026
Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in
the shell, npm run dev:

  1. Runs the existing planner + workers + reviewer chain.
  2. Calls runJudge with layer2Mode: "build", forcing real
     xcodebuild build + ./gradlew assembleDebug instead of the
     fast-mode toolchain probe. The build outputs are what Stage 1
     visual judging needs.
  3. Calls runJudge with visual: { iosDir, androidDir, spec }.
     runJudge in turn calls runStage1Visual (#43) per platform:
     discoverArtifact (#42) → installAndLaunch (#39) → 3s render
     wait → captureScreenshot (#38) → runLayer3 (#37) with
     DEFAULT_STAGE1_RUBRIC (#40).
  4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult.
     overallPass requires all three to pass; visual failures DO
     fail the run (matches PLAN.md "a run that green-builds without
     passing Layer 3 is a failed run").
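A sketch of the aggregation rule and the summary string, assuming each layer reports passed/total counts or is skipped. Only the all-pass summary format is quoted in this commit; the `fail` wording for a partial layer, and treating a skipped Layer 3 as not passing when the visual flag is on, are assumptions.

```typescript
type LayerOutcome = { passed: number; total: number } | "skipped";

const layerPasses = (l: LayerOutcome): boolean => l !== "skipped" && l.passed === l.total;

// Layers 1 and 2 must always pass. Layer 3 counts only when visual judging is on,
// and then a failing (or, by assumption, skipped) Layer 3 fails the whole run.
function overallPass(l1: LayerOutcome, l2: LayerOutcome, l3: LayerOutcome, visualEnabled: boolean): boolean {
  if (!layerPasses(l1) || !layerPasses(l2)) return false;
  return visualEnabled ? layerPasses(l3) : true;
}

function summarize(layers: LayerOutcome[]): string {
  return layers
    .map((l, i) =>
      l === "skipped"
        ? `Layer ${i + 1} skipped`
        : `Layer ${i + 1} ${l.passed}/${l.total} ${l.passed === l.total ? "pass" : "fail"}`,
    )
    .join(" · ");
}

console.log(summarize([{ passed: 3, total: 3 }, { passed: 3, total: 3 }, "skipped"]));
// Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped
```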

Without the flag set, behavior is unchanged from #43: Layer 2 in
fast mode, Layer 3 skipped, summary reads
"Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

Refactors the JudgeInput.visual shape from per-platform pre-resolved
configs ({artifactPath, bundleId} pairs) to outDir-based discovery
({iosDir, androidDir}). runJudge.runVisualPhase now delegates to
runStage1Visual which does discovery + visual-judge atomically.

Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s
(Android) on top of the fast-mode baseline. Hot rebuilds are much
faster but vary with substrate caches.

Recommendations covered:
  - Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem
    per the post-#30 convention; renaming the stub flags is left
    for a follow-up PR).
  - Build coupling: visual implies Layer 2 build mode.
  - Failure semantics: visual failures fail the run.
  - Render wait: 3s default for both platforms (in
    DEFAULT_RENDER_WAIT_MS, configurable via runVisualJudge input).
  - Per-platform: judge both platforms when visual is enabled;
    discovery returns null gracefully if either platform's build is
    missing.
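A sketch of the trigger and the build coupling, assuming the flag is read from an environment map and compared strictly against `"1"`. `layer2Mode`, `visual`, `iosDir`, `androidDir` and `spec` are named in this commit; the `"fast"` literal, the `string` type for `spec`, and `judgeOptionsFromEnv` are hypothetical.

```typescript
interface VisualInput {
  iosDir: string;
  androidDir: string;
  spec: string; // assumption: the real spec type may be richer
}

interface JudgeOptions {
  layer2Mode: "fast" | "build";
  visual?: VisualInput;
}

// Visual judging needs real build outputs, so enabling it also switches
// Layer 2 from the fast toolchain probe to a real build.
function judgeOptionsFromEnv(env: Record<string, string | undefined>, visual: VisualInput): JudgeOptions {
  if (env.NATIVEAPPTEMPLATE_VISUAL === "1") {
    return { layer2Mode: "build", visual };
  }
  return { layer2Mode: "fast" }; // Layer 3 skipped
}

const dirs = { iosDir: "out/ios", androidDir: "out/android", spec: "A habit tracker" };
console.log(judgeOptionsFromEnv({ NATIVEAPPTEMPLATE_VISUAL: "1" }, dirs).layer2Mode); // build
console.log(judgeOptionsFromEnv({}, dirs).visual); // undefined
```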

Tests: 16/16 npm run ci green.
README.md gains an "Optional flags" subsection documenting the
trigger and its latency cost.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dadachi added a commit that referenced this pull request May 3, 2026
Surfaced during the first real Layer 3 visual smoke against a booted
Android emulator. The agent's installAndLaunch (#39) and
captureScreenshot (#38) were calling `adb` via the PATH-resolved
binary, which on this dev machine was
`~/.apportable/SDK/bin/adb` — an i386 Mach-O from 2014 that fails
to exec on Apple Silicon with "spawn Unknown system error -86".
Working adb installs (Android Studio default, /Applications/android-sdk-macosx,
Homebrew) were further down PATH and never reached.

New helper resolveAdbPath() in src/adb.ts walks a fixed priority
order: env-var locations first ($ANDROID_HOME, $ANDROID_SDK_ROOT),
then known macOS install paths (Android Studio default, the older
/Applications/android-sdk-macosx, /opt/homebrew, /usr/local), and
falls back to "adb" via PATH. First existing path wins.
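A sketch of that priority walk. The ordering follows the description above; the concrete candidate paths (for example `~/Library/Android/sdk` as the Android Studio default and `/opt/homebrew/bin/adb`) are assumptions about a typical macOS setup, and `resolveAdbPathSketch` takes injectable `env` and `exists` parameters purely so it can be tested. An existence check only confirms a file is present, not that it runs on the host architecture; the stale i386 binary is avoided here only because the known locations are checked before PATH.

```typescript
import { existsSync } from "fs";
import { homedir } from "os";
import { join } from "path";

type Env = Record<string, string | undefined>;

// Candidate adb locations in priority order: SDK env vars first, then known macOS installs.
function adbCandidates(env: Env, home: string): string[] {
  const fromEnv = [env.ANDROID_HOME, env.ANDROID_SDK_ROOT]
    .filter((dir): dir is string => Boolean(dir))
    .map((dir) => join(dir, "platform-tools", "adb"));
  return [
    ...fromEnv,
    join(home, "Library", "Android", "sdk", "platform-tools", "adb"), // Android Studio default (assumed)
    "/Applications/android-sdk-macosx/platform-tools/adb",
    "/opt/homebrew/bin/adb",
    "/usr/local/bin/adb",
  ];
}

// First existing candidate wins; otherwise fall back to a bare "adb" resolved via PATH.
function resolveAdbPathSketch(env: Env = process.env, exists: (p: string) => boolean = existsSync): string {
  return adbCandidates(env, homedir()).find((p) => exists(p)) ?? "adb";
}
```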

src/validation/capture.ts (Android branch) and
src/validation/launch.ts (install + launch) now resolve via this
helper instead of trusting PATH.

Real-mode smoke confirms the fix: on this machine, adb now resolves
to /Applications/android-sdk-macosx/platform-tools/adb (universal
binary, x86_64+arm64). The Apple Silicon spawn error is gone.

Newly-surfaced (separate concern, documented in README): when
multiple Android targets are attached (e.g. physical device +
emulator), adb requires ANDROID_SERIAL=<serial> to disambiguate.
This is a stock adb feature, not something the agent needs to
implement — the agent runs adb directly and inherits the env var.
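Although disambiguation is left to stock adb, a caller can still detect the ambiguous case up front. A sketch, assuming the usual `adb devices` output (a header line, then one serial and state per target, separated by whitespace); `attachedSerials`, `needsSerial` and `withSerial` are hypothetical helpers, and only targets in the `device` state are counted.

```typescript
type Env = Record<string, string | undefined>;

// Serials of targets in the "device" state (ready for adb commands).
// Header, daemon-status, blank, "offline" and "unauthorized" lines are skipped.
function attachedSerials(adbDevicesOutput: string): string[] {
  return adbDevicesOutput
    .split(/\r?\n/)
    .map((line) => line.trim().split(/\s+/))
    .filter((cols) => cols.length >= 2 && cols[1] === "device")
    .map((cols) => cols[0]);
}

// True when adb would see more than one ready target and nothing pins one down.
function needsSerial(adbDevicesOutput: string, env: Env): boolean {
  return attachedSerials(adbDevicesOutput).length > 1 && !env.ANDROID_SERIAL;
}

// ANDROID_SERIAL in the child's environment pins every adb call to one target
// (the per-command equivalent is `adb -s <serial> ...`).
function withSerial(env: Env, serial: string): Env {
  return { ...env, ANDROID_SERIAL: serial };
}
```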

Tests: 19/19 npm run ci green. (No adb-specific test in CI since
that requires a real Android SDK; resolution logic is verified by
the real-mode smoke output above.)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>