Skip to content

Layer 3 Phase 5e: dispatch wires NATIVEAPPTEMPLATE_VISUAL=1#44

Merged
dadachi merged 1 commit intomainfrom
layer3-dispatch-visual-integration
May 2, 2026
Merged

Layer 3 Phase 5e: dispatch wires NATIVEAPPTEMPLATE_VISUAL=1#44
dadachi merged 1 commit intomainfrom
layer3-dispatch-visual-integration

Conversation

@dadachi
Copy link
Copy Markdown
Contributor

@dadachi dadachi commented May 2, 2026

Summary

Closes the Layer 3 integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in the shell, npm run dev:

  1. Runs the existing planner + workers + reviewer chain.
  2. Calls runJudge with layer2Mode: "build", forcing real xcodebuild build + ./gradlew assembleDebug instead of the fast-mode toolchain probe.
  3. Calls runJudge with visual: { iosDir, androidDir, spec }. runJudge delegates to runStage1Visual (Layer 3 Phase 5d: runStage1Visual — convenience runner (discover + judge) #43) per platform: discoverArtifact (Layer 3 Phase 5c: artifact discovery for iOS .app and Android .apk #42) → installAndLaunch (Layer 3 Phase 4: install + launch wrapper for iOS sim and Android emulator #39) → 3s render wait → captureScreenshot (Layer 3 Phase 3: screenshot capture primitive #38) → runLayer3 (Layer 3 Phase 1: Opus 4.7 vision judge — runLayer3() implementation #37) with DEFAULT_STAGE1_RUBRIC (Layer 3 Phase 5a: visual-judge orchestration + default Stage 1 rubric #40).
  4. Aggregates Layer 1 + Layer 2 + Layer 3 into the final JudgeResult. overallPass requires all three to pass.

Without the flag set, behavior is unchanged from #43: Layer 2 in fast mode, Layer 3 skipped, summary reads "Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

Recommendations applied

Question Decision
Trigger mechanism Env var NATIVEAPPTEMPLATE_VISUAL=1 — canonical stem per post-#30 convention. Stub flags (NATEMPLATE_STUB_*) should be renamed in a follow-up PR.
Build coupling Visual forces Layer 2 build mode. Otherwise visual would always fail (no artifacts).
Failure semantics Visual failures fail the run (matches PLAN.md "a run that green-builds without passing Layer 3 is a failed run").
Render wait 3s default for both platforms. Configurable via runVisualJudge input; not yet exposed as an env var.
Per-platform opt-in Judge both when enabled. Discovery returns null gracefully if either platform's build is missing — surfaces as a clean error per platform.

Refactor

Reshapes JudgeInput.visual from per-platform pre-resolved configs ({artifactPath, bundleId} pairs) to outDir-based discovery ({iosDir, androidDir}). runJudge.runVisualPhase now delegates to runStage1Visual which does discovery + visual-judge atomically. Cleaner contract — callers don't pre-resolve paths.

Latency cost

A cold build + judge run adds:

  • iOS: ~30-60s for xcodebuild build (cold), then ~10-20s for install + launch + capture + 3 vision API calls.
  • Android: ~60-180s for ./gradlew assembleDebug (cold), then ~10-20s for install + launch + capture + 3 vision API calls.

Hot rebuilds (Xcode DerivedData warm, Gradle daemon up) are much faster but vary with substrate caches.

Test plan

  • npm run ci — 16/16 green. Existing dispatch e2e test unchanged (visual env var not set in CI → skip path).
  • npm run build — clean.
  • Manual smoke after merge: NATIVEAPPTEMPLATE_VISUAL=1 npm run dev -- "a walk-in clinic queue" against booted iPhone 17 sim + Android emulator. Confirms Layer 3 2/2 pass lands in the summary.
  • Follow-up: rename NATEMPLATE_STUB_*NATIVEAPPTEMPLATE_STUB_* for symmetry.

🤖 Generated with Claude Code

Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in
the shell, npm run dev:

  1. Runs the existing planner + workers + reviewer chain.
  2. Calls runJudge with layer2Mode: "build", forcing real
     xcodebuild build + ./gradlew assembleDebug instead of the
     fast-mode toolchain probe. The build outputs are what Stage 1
     visual judging needs.
  3. Calls runJudge with visual: { iosDir, androidDir, spec }.
     runJudge in turn calls runStage1Visual (#43) per platform:
     discoverArtifact (#42) → installAndLaunch (#39) → 3s render
     wait → captureScreenshot (#38) → runLayer3 (#37) with
     DEFAULT_STAGE1_RUBRIC (#40).
  4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult.
     overallPass requires all three to pass; visual failures DO
     fail the run (matches PLAN.md "a run that green-builds without
     passing Layer 3 is a failed run").

Without the flag set, behavior is unchanged from #43: Layer 2 in
fast mode, Layer 3 skipped, summary reads
"Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

Refactors the JudgeInput.visual shape from per-platform pre-resolved
configs ({artifactPath, bundleId} pairs) to outDir-based discovery
({iosDir, androidDir}). runJudge.runVisualPhase now delegates to
runStage1Visual which does discovery + visual-judge atomically.

Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s
(Android) on top of the fast-mode baseline. Hot rebuilds are much
faster but vary with substrate caches.

Recommendations covered:
  - Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem
    per the post-#30 convention; keep stub flags' rename for a
    follow-up PR).
  - Build coupling: visual implies Layer 2 build mode.
  - Failure semantics: visual failures fail the run.
  - Render wait: 3s default for both platforms (in
    DEFAULT_RENDER_WAIT_MS, configurable via runVisualJudge input).
  - Per-platform: judge both when visual enabled — discovery
    returns null gracefully if either platform's build is missing.

Tests: 16/16 npm run ci green.
README.md gains an "Optional flags" subsection documenting the
trigger and its latency cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dadachi dadachi merged commit 9ade45d into main May 2, 2026
1 check passed
@dadachi dadachi deleted the layer3-dispatch-visual-integration branch May 2, 2026 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant