Layer 3 Phase 5e: dispatch wires NATIVEAPPTEMPLATE_VISUAL=1 by dadachi · Pull Request #44 · nativeapptemplate/nativeapptemplate-agent

dadachi · 2026-05-02T22:57:41Z

Summary

Closes the Layer 3 integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in the shell, npm run dev:

1. Runs the existing planner + workers + reviewer chain.
2. Calls runJudge with layer2Mode: "build", forcing real xcodebuild build + ./gradlew assembleDebug instead of the fast-mode toolchain probe.
3. Calls runJudge with visual: { iosDir, androidDir, spec }. runJudge delegates to runStage1Visual (#43) per platform: discoverArtifact (#42) → installAndLaunch (#39) → 3s render wait → captureScreenshot (#38) → runLayer3 (#37) with DEFAULT_STAGE1_RUBRIC (#40).
4. Aggregates Layer 1 + Layer 2 + Layer 3 into the final JudgeResult. overallPass requires all three to pass.

Without the flag set, behavior is unchanged from #43: Layer 2 in fast mode, Layer 3 skipped, summary reads "Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".
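
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `judgeOptionsFromEnv`, `JudgeOptions`, `LayerOutcome`, and the `overallPass` helper are hypothetical names, `spec` is typed as `unknown` because its real shape is not shown here, and treating a skipped Layer 3 as non-blocking is an assumption drawn from the "Layer 3 skipped" summary line.

```typescript
// Hypothetical shapes; the real JudgeInput in the repo may differ.
type Layer2Mode = "fast" | "build";

interface VisualInput {
  iosDir: string;
  androidDir: string;
  spec: unknown; // real spec type is not shown in this PR description
}

interface JudgeOptions {
  layer2Mode: Layer2Mode;
  visual?: VisualInput;
}

// Takes an env-like map (instead of process.env) so the sketch has no Node typing dependency.
function judgeOptionsFromEnv(
  env: Record<string, string | undefined>,
  dirs: { iosDir: string; androidDir: string },
  spec: unknown,
): JudgeOptions {
  const visualEnabled = env["NATIVEAPPTEMPLATE_VISUAL"] === "1";
  if (!visualEnabled) {
    // Flag unset: fast-mode toolchain probe, Layer 3 skipped.
    return { layer2Mode: "fast" };
  }
  // Flag set: visual judging needs real build artifacts, so force build mode.
  return { layer2Mode: "build", visual: { ...dirs, spec } };
}

type LayerOutcome = "pass" | "fail" | "skipped";

// Assumption: a skipped Layer 3 does not block the run, but a failed one does.
function overallPass(l1: LayerOutcome, l2: LayerOutcome, l3: LayerOutcome): boolean {
  return l1 === "pass" && l2 === "pass" && l3 !== "fail";
}
```

Coupling the build mode to the same flag keeps callers from enabling visual judging without artifacts to judge.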

Recommendations applied

| Question | Decision |
| --- | --- |
| Trigger mechanism | Env var `NATIVEAPPTEMPLATE_VISUAL=1`, the canonical stem per post-#30 convention. Stub flags (`NATEMPLATE_STUB_*`) should be renamed in a follow-up PR. |
| Build coupling | Visual forces Layer 2 build mode. Otherwise visual would always fail (no artifacts). |
| Failure semantics | Visual failures fail the run (matches PLAN.md "a run that green-builds without passing Layer 3 is a failed run"). |
| Render wait | 3s default for both platforms. Configurable via `runVisualJudge` input; not yet exposed as an env var. |
| Per-platform opt-in | Judge both when enabled. Discovery returns `null` gracefully if either platform's build is missing, which surfaces as a clean error per platform. |
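
The per-platform behavior in the last row could look something like the sketch below. It uses synchronous stand-ins rather than the real `discoverArtifact` and judge functions, whose signatures are not shown here; `judgePlatforms`, `PlatformVisualResult`, and the error wording are all hypothetical.

```typescript
type Platform = "ios" | "android";

interface PlatformVisualResult {
  platform: Platform;
  pass: boolean;
  error?: string;
}

// Stand-ins for discovery and judging; the real functions are likely async.
type Discover = (platform: Platform, dir: string) => string | null;
type Judge = (platform: Platform, artifactPath: string) => boolean;

function judgePlatforms(
  dirs: Record<Platform, string>,
  discover: Discover,
  judge: Judge,
): PlatformVisualResult[] {
  const platforms: Platform[] = ["ios", "android"];
  return platforms.map((platform): PlatformVisualResult => {
    const artifact = discover(platform, dirs[platform]);
    if (artifact === null) {
      // A missing build is reported for this platform only; the other still runs.
      return { platform, pass: false, error: `no build artifact found in ${dirs[platform]}` };
    }
    return { platform, pass: judge(platform, artifact) };
  });
}
```

Reporting a missing artifact as a per-platform failure, rather than throwing, lets the other platform's result still appear in the summary.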

Refactor

Reshapes JudgeInput.visual from per-platform pre-resolved configs ({artifactPath, bundleId} pairs) to outDir-based discovery ({iosDir, androidDir}). runJudge.runVisualPhase now delegates to runStage1Visual which does discovery + visual-judge atomically. Cleaner contract — callers don't pre-resolve paths.
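
For illustration, the contract change might be sketched as below. Only the field names `artifactPath`, `bundleId`, `iosDir`, `androidDir`, and `spec` come from this description; the interface names, the optional per-platform keys in the old shape, and the `<outDir>/ios` and `<outDir>/android` layout used by the `toVisualInput` helper are assumptions.

```typescript
// Before (hypothetical reconstruction): callers resolved artifacts themselves.
interface PlatformVisualConfig {
  artifactPath: string;
  bundleId: string;
}

interface VisualInputBefore {
  ios?: PlatformVisualConfig;
  android?: PlatformVisualConfig;
}

// After: callers pass output directories; discovery happens inside the judge.
interface VisualInputAfter {
  iosDir: string;
  androidDir: string;
  spec: unknown; // real spec type is not shown in this PR description
}

// Assumed directory layout: <outDir>/ios and <outDir>/android.
function toVisualInput(outDir: string, spec: unknown): VisualInputAfter {
  return { iosDir: `${outDir}/ios`, androidDir: `${outDir}/android`, spec };
}

// Example of what a caller had to assemble under the old shape.
const before: VisualInputBefore = {
  ios: { artifactPath: "out/ios/build/App.app", bundleId: "com.example.app" },
};
const after = toVisualInput("out", { name: "example" });
```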

Latency cost

A cold build + judge run adds:

iOS: ~30-60s for xcodebuild build (cold), then ~10-20s for install + launch + capture + 3 vision API calls.
Android: ~60-180s for ./gradlew assembleDebug (cold), then ~10-20s for install + launch + capture + 3 vision API calls.

Hot rebuilds (Xcode DerivedData warm, Gradle daemon up) are much faster but vary with substrate caches.
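
Summing the cold-run ranges quoted above gives a rough per-platform total. The snippet below only does that arithmetic; `Range` and `add` are illustrative helpers, and whether the two platforms run serially or in parallel is not stated here, so no combined figure is computed.

```typescript
interface Range {
  min: number; // seconds
  max: number; // seconds
}

const add = (a: Range, b: Range): Range => ({ min: a.min + b.min, max: a.max + b.max });

// Judge step (install + launch + capture + vision calls), same estimate for both platforms.
const judgeStep: Range = { min: 10, max: 20 };

const iosTotal = add({ min: 30, max: 60 }, judgeStep); // 40 to 80 seconds
const androidTotal = add({ min: 60, max: 180 }, judgeStep); // 70 to 200 seconds
```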

Test plan

npm run ci — 16/16 green. Existing dispatch e2e test unchanged (visual env var not set in CI → skip path).
npm run build — clean.
Manual smoke after merge: NATIVEAPPTEMPLATE_VISUAL=1 npm run dev -- "a walk-in clinic queue" against booted iPhone 17 sim + Android emulator. Confirms Layer 3 2/2 pass lands in the summary.
Follow-up: rename NATEMPLATE_STUB_* → NATIVEAPPTEMPLATE_STUB_* for symmetry.

🤖 Generated with Claude Code

Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in the shell, npm run dev:

1. Runs the existing planner + workers + reviewer chain.
2. Calls runJudge with layer2Mode: "build", forcing real xcodebuild build + ./gradlew assembleDebug instead of the fast-mode toolchain probe. The build outputs are what Stage 1 visual judging needs.
3. Calls runJudge with visual: { iosDir, androidDir, spec }. runJudge in turn calls runStage1Visual (#43) per platform: discoverArtifact (#42) → installAndLaunch (#39) → 3s render wait → captureScreenshot (#38) → runLayer3 (#37) with DEFAULT_STAGE1_RUBRIC (#40).
4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult. overallPass requires all three to pass; visual failures DO fail the run (matches PLAN.md "a run that green-builds without passing Layer 3 is a failed run").

Without the flag set, behavior is unchanged from #43: Layer 2 in fast mode, Layer 3 skipped, summary reads "Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

Refactors the JudgeInput.visual shape from per-platform pre-resolved configs ({artifactPath, bundleId} pairs) to outDir-based discovery ({iosDir, androidDir}). runJudge.runVisualPhase now delegates to runStage1Visual which does discovery + visual-judge atomically.

Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s (Android) on top of the fast-mode baseline. Hot rebuilds are much faster but vary with substrate caches.

Recommendations covered:

- Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem per the post-#30 convention; keep stub flags' rename for a follow-up PR).
- Build coupling: visual implies Layer 2 build mode.
- Failure semantics: visual failures fail the run.
- Render wait: 3s default for both platforms (in DEFAULT_RENDER_WAIT_MS, configurable via runVisualJudge input).
- Per-platform: judge both when visual enabled; discovery returns null gracefully if either platform's build is missing.

Tests: 16/16 npm run ci green.

README.md gains an "Optional flags" subsection documenting the trigger and its latency cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dadachi merged commit 9ade45d into main May 2, 2026
1 check passed

dadachi deleted the layer3-dispatch-visual-integration branch May 2, 2026 22:58
