Layer 3 Phase 5e: dispatch wires NATIVEAPPTEMPLATE_VISUAL=1 #44
Merged
Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in
the shell, npm run dev:
1. Runs the existing planner + workers + reviewer chain.
2. Calls runJudge with layer2Mode: "build", forcing real
xcodebuild build + ./gradlew assembleDebug instead of the
fast-mode toolchain probe. The build outputs are what Stage 1
visual judging needs.
3. Calls runJudge with visual: { iosDir, androidDir, spec }.
runJudge in turn calls runStage1Visual (#43) per platform:
discoverArtifact (#42) → installAndLaunch (#39) → 3s render
wait → captureScreenshot (#38) → runLayer3 (#37) with
DEFAULT_STAGE1_RUBRIC (#40).
4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult.
overallPass requires all three to pass; visual failures DO
fail the run (matches PLAN.md "a run that green-builds without
passing Layer 3 is a failed run").
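The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the type shapes and the helper names `buildJudgeInput` and `aggregate` are hypothetical. Only the env var name, `layer2Mode: "build"`, the `visual: { iosDir, androidDir, spec }` shape, and the rule that all three layers must pass come from the description. Treating a skipped layer as non-failing is also an assumption, inferred from the unchanged flag-off behavior.

```typescript
// Hypothetical sketch of the flag-gated dispatch described above.
// Type shapes and helper names are assumptions, not the PR's actual code.

type LayerOutcome = { pass: boolean } | "skipped";

interface VisualInput {
  iosDir: string;
  androidDir: string;
  spec: string;
}

interface JudgeInput {
  layer2Mode: "fast" | "build";
  visual?: VisualInput;
}

// The visual flag implies Layer 2 build mode; without it, Layer 2 stays in
// fast mode and no visual input is passed (Layer 3 is skipped).
function buildJudgeInput(
  env: Record<string, string | undefined>,
  dirs: { iosDir: string; androidDir: string },
  spec: string,
): JudgeInput {
  if (env.NATIVEAPPTEMPLATE_VISUAL !== "1") {
    return { layer2Mode: "fast" };
  }
  return { layer2Mode: "build", visual: { ...dirs, spec } };
}

// Assumption: a skipped layer does not fail the run, while a layer that ran
// and failed (including Layer 3) does.
function aggregate(layers: LayerOutcome[]): boolean {
  return layers.every((layer) => layer === "skipped" || layer.pass);
}
```

With the flag unset, `buildJudgeInput` returns fast mode with no `visual` field, which mirrors the unchanged default path.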
Without the flag set, behavior is unchanged from #43: Layer 2 in
fast mode, Layer 3 skipped, summary reads
"Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".
Refactors the JudgeInput.visual shape from per-platform pre-resolved
configs ({artifactPath, bundleId} pairs) to outDir-based discovery
({iosDir, androidDir}). runJudge.runVisualPhase now delegates to
runStage1Visual which does discovery + visual-judge atomically.
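To make the contract change concrete, here is a minimal sketch of the two shapes. The field names `artifactPath`, `bundleId`, `iosDir`, and `androidDir` come from the description; everything else is an assumption for illustration, including the interface names, the optional `ios`/`android` keys, the presence of `spec` in the old shape, the example paths, and the `ios`/`android` subdirectory layout used by the hypothetical `toVisualInput` helper.

```typescript
// Before (as of #43): callers pre-resolved one config per platform.
// Interface names and the optional per-platform keys are assumptions.
interface ResolvedPlatform {
  artifactPath: string;
  bundleId: string;
}

interface VisualInputBefore {
  ios?: ResolvedPlatform;
  android?: ResolvedPlatform;
  spec: string;
}

// After (this PR): callers pass output directories and discovery happens
// downstream, so no paths or bundle ids are resolved at the call site.
interface VisualInputAfter {
  iosDir: string;
  androidDir: string;
  spec: string;
}

// Hypothetical helper; the "<outDir>/ios" and "<outDir>/android" layout is
// an assumption, not something the PR states.
function toVisualInput(outDir: string, spec: string): VisualInputAfter {
  return { iosDir: `${outDir}/ios`, androidDir: `${outDir}/android`, spec };
}

// Call sites side by side (example values are made up).
const beforeExample: VisualInputBefore = {
  ios: { artifactPath: "out/ios/build/App.app", bundleId: "com.example.app" },
  spec: "a walk-in clinic queue",
};
const afterExample: VisualInputAfter = toVisualInput("out", "a walk-in clinic queue");
```

The second call site needs no knowledge of artifact names or bundle ids, which is the point of moving discovery behind `runStage1Visual`.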
Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s
(Android) on top of the fast-mode baseline. Hot rebuilds (Xcode
DerivedData warm, Gradle daemon up) are much faster, but the
saving varies with substrate caches.
Recommendations covered:
- Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem
per the post-#30 convention; renaming the stub flags is left
for a follow-up PR).
- Build coupling: visual implies Layer 2 build mode.
- Failure semantics: visual failures fail the run.
- Render wait: 3s default for both platforms (in
DEFAULT_RENDER_WAIT_MS, configurable via runVisualJudge input).
- Per-platform: judge both when visual enabled — discovery
returns null gracefully if either platform's build is missing.
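The per-platform and render-wait points can be illustrated with a small sketch. `DEFAULT_RENDER_WAIT_MS` and its 3s value come from the description. The function `judgeBothPlatforms`, the injected `discover` and `judge` callbacks, the verdict shape, and the choice to record a missing artifact as a failed verdict with an error message are all assumptions made for this example, not the PR's actual implementation.

```typescript
// 3s default render wait, per the description.
const DEFAULT_RENDER_WAIT_MS = 3000;

type Platform = "ios" | "android";

interface PlatformVerdict {
  platform: Platform;
  pass: boolean;
  error?: string;
}

// Hypothetical stand-ins for discovery and the visual judge, injected so the
// sketch stays self-contained.
type Discover = (platform: Platform, dir: string) => string | null;
type Judge = (
  platform: Platform,
  artifactPath: string,
  renderWaitMs: number,
) => Promise<boolean>;

// Judges both platforms. A null discovery result is reported as a
// per-platform error instead of throwing (recording it as a failed verdict
// is an assumption of this sketch).
async function judgeBothPlatforms(
  dirs: Record<Platform, string>,
  discover: Discover,
  judge: Judge,
  renderWaitMs: number = DEFAULT_RENDER_WAIT_MS,
): Promise<PlatformVerdict[]> {
  const platforms: Platform[] = ["ios", "android"];
  const verdicts: PlatformVerdict[] = [];
  for (const platform of platforms) {
    const artifact = discover(platform, dirs[platform]);
    if (artifact === null) {
      verdicts.push({
        platform,
        pass: false,
        error: `no build artifact found in ${dirs[platform]}`,
      });
      continue;
    }
    verdicts.push({ platform, pass: await judge(platform, artifact, renderWaitMs) });
  }
  return verdicts;
}
```

Passing a fourth argument overrides the wait for slower simulators, while omitting it keeps the 3s default on both platforms.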
Tests: npm run ci is green (16/16).
README.md gains an "Optional flags" subsection documenting the
trigger and its latency cost.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes the Layer 3 integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in the shell, npm run dev:
- Calls runJudge with layer2Mode: "build", forcing real xcodebuild build + ./gradlew assembleDebug instead of the fast-mode toolchain probe.
- Calls runJudge with visual: { iosDir, androidDir, spec }. runJudge delegates to runStage1Visual (#43) per platform: discoverArtifact (#42) → installAndLaunch (#39) → 3s render wait → captureScreenshot (#38) → runLayer3 (#37) with DEFAULT_STAGE1_RUBRIC (#40).
- Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult; overallPass requires all three to pass.

Without the flag set, behavior is unchanged from #43: Layer 2 in fast mode, Layer 3 skipped, summary reads "Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

Recommendations applied
- Trigger: NATIVEAPPTEMPLATE_VISUAL=1, the canonical stem per the post-#30 convention. Stub flags (NATEMPLATE_STUB_*) should be renamed in a follow-up PR.
- Render wait: 3s default for both platforms, configurable via runVisualJudge input; not yet exposed as an env var.
- Per-platform: discovery returns null gracefully if either platform's build is missing, which surfaces as a clean error per platform.

Refactor
Reshapes JudgeInput.visual from per-platform pre-resolved configs ({artifactPath, bundleId} pairs) to outDir-based discovery ({iosDir, androidDir}). runJudge.runVisualPhase now delegates to runStage1Visual, which does discovery + visual-judge atomically. Cleaner contract: callers don't pre-resolve paths.

Latency cost
A cold build + judge run adds:
- iOS: xcodebuild build (cold), then ~10-20s for install + launch + capture + 3 vision API calls.
- Android: ./gradlew assembleDebug (cold), then ~10-20s for install + launch + capture + 3 vision API calls.
Hot rebuilds (Xcode DerivedData warm, Gradle daemon up) are much faster but vary with substrate caches.

Test plan
- npm run ci: 16/16 green. Existing dispatch e2e test unchanged (visual env var not set in CI → skip path).
- npm run build: clean.
- NATIVEAPPTEMPLATE_VISUAL=1 npm run dev -- "a walk-in clinic queue" against booted iPhone 17 sim + Android emulator. Confirms "Layer 3 2/2 pass" lands in the summary.
- Follow-up: rename NATEMPLATE_STUB_* → NATIVEAPPTEMPLATE_STUB_* for symmetry.

🤖 Generated with Claude Code