fix(mcp/mobile): scroll button into view + sweep orphan qemu/adb in heal (claim-opp + ensureAvdRunning)#349
Merged
Merged
Conversation
…eal (claim-opp + ensureAvdRunning) Two related Phase-6 blockers batched into one PR. ## Bug A: connect-claim-opp.yaml button below fold after title-scroll PR #346 added a pre-branch `scrollUntilVisible { text: ${OPP_NAME} }` that brings the target tile's TITLE into the viewport before Branch A / Branch B dispatch. With 5+ prior-run invite cards rendered ahead of the target (steady state on any dogfood machine), the title-only scroll lands at the viewport BOTTOM, and the button below the title (`btn_resume` for in-progress, `btn_view_opportunity` for new) is clipped off-screen. Both branches' `when: visible: { id: btn_*, below: text: ${OPP_NAME} }` guards then evaluate to false and the in-body `scrollUntilVisible` (which would scroll the button into view) never fires. Catch-22. Fix: after the unconditional title-scroll, add two `runFlow when:` blocks that scroll the per-branch button into view. Whichever button the tile actually carries is scrolled; the other branch's runFlow no-ops because its `when:` guard matches nothing. Both Branch A and Branch B guards downstream resolve correctly afterwards. Live reproducer: malaria-itn-fgd run 20260515-1645 Phase 6 attempt 11; screenshot at .maestro/tests/2026-05-18_100623/. Third PR in the claim-opp recipe series (PR #346 = title scroll; PR #348 = post-start wait; this = post-title button scroll). Class also flagged in skills/app-screenshot-capture/SKILL.md § Step 4 "OPP_NAME uniqueness assumption" as a durable cross-recipe pattern. ## Bug B: adb-daemon-wedge heal pre-step (ensureAvdRunning) Same precondition-restore class as the cold-boot itself (CLAUDE.md § "Phase preconditions are restored, not adapted") — one layer lower: qemu+adb daemon state, not AVD content state. Three observed instances on malaria-itn-fgd/20260515-1645: - Attempt 8: post-PR-#345 boot-wait fix passed but a wedged adb daemon independent of the wait still 500'd; manual `adb kill-server` + `adb start-server` cleared it. - Attempt 10: 2 orphan qemu-system-aarch64 processes + 3 stale adb daemons from prior crashed boots — `package service` never bound; 3 consecutive `mobile_ensure_avd_running` failures. - Attempt 11: even after the parent manually swept qemu+adb, the first in-dispatch `ensureAvdRunning` call still threw the same signature until a second `kill-server`/`start-server` fired inside the dispatch — proving the heal itself needs to own daemon restoration, not the operator. Fix: new `sweepStaleEmulatorState()` runs as the FIRST step of `ensureAvdRunning`. Two-part contract: 1. Orphan qemu sweep — pgrep qemu-system PIDs; if `adb devices` shows zero live emulators, every qemu PID is by definition orphan, kill -9. Conservative on mixed-state boxes (only kills excess when pgrep PIDs ≥ adb devices + 2). Logs structured info per kill so future debug avoids the manual `pgrep -af qemu` step. 2. Always run `adb kill-server` + `adb start-server`. Cheap (~500ms), resets the daemon to a known-clean state regardless of whether we detected wedge symptoms. Same trade-off as the cold-boot widening — pay a small fixed cost for guaranteed clean state, avoid junk-state debug cycles. Both steps are best-effort: every shell call swallows its error. The cold-boot funnel still proceeds if sweeping fails. ## Tests - `test/mcp/mobile/static-recipe-invariants.test.ts`: new invariant asserting both per-branch button-anchored scrollUntilVisibles exist BEFORE the `# --- BRANCH A:` marker, each scoped to `below: text: ${OPP_NAME}`. - `test/mcp/mobile/avd.test.ts`: three new tests on `sweepStaleEmulatorState`: - adb daemon restart fires before listAvds + AVD-specific emu kill. - orphan qemu PIDs are killed when adb sees no devices. - pgrep/kill failures are tolerated (best-effort, no propagation). Full suite: 1235 passing, up from 1230 pre-PR.
jjackson
added a commit
that referenced
this pull request
May 19, 2026
…step ordering Two orthogonal bugs surfaced on malaria-itn-fgd/20260515-1645 Phase 6 attempt 12. Both unrelated to PR #354's Learn-Deliver chain scope. Bug A — learn-tap-module.yaml Branch-B over-fires when form-name != module-name: The same-name-suppressed-auto-skip branch guarded only on `screen_suite_menu_list visible AND nav_btn_next NOT visible`, then re-tapped `text:${MODULE_NAME}`. On J1 (module "Briefing Acknowledgement" → form "Acknowledge Readiness"), the form-list screen's toolbar still reads ${MODULE_NAME}, so Branch-B fired and re-tapped the non-tappable toolbar TextView instead of the form row. Fix: nested runFlow with inner guard `visible: text: ${MODULE_NAME}, below: id: screen_suite_menu_list` so the text-match is scoped to the list body. When form-name != module-name no body row matches, Branch-B skips, caller's next learn-tap-module(FORM_NAME) drills the form row by its own label. Bug B — sweepStaleEmulatorState ordering (PR #349 follow-up): PR #349 wired up the orphan-qemu sweep and adb-server restart but (a) used a conservative `qemuPids.length >= liveCount + 2` kill threshold that left the attempt-12 signature (2 orphan qemu + 1 stale adb-devices entry, 2 NOT >= 1+2) below the kill bar, and (b) ran the adb-restart immediately after the kill without waiting for the kernel to release the emulator-NNNN TCP sockets — letting the freshly-restarted daemon adopt the wedged-port state. Result: 2 of the next 3 ensureAvdRunning calls still failed with "package service did not bind" until a second manual adb kill-server/start-server fired inside the dispatch. Fix: 1. Loosen the orphan kill to fire whenever qemuPids.length > liveCount. 2. Add a 500ms socket-release wait between the last orphan SIGKILL and adb kill-server. Tests: - static-recipe-invariants.test.ts asserts Branch-B's inner runFlow has `visible: text: ${MODULE_NAME}, below: id: screen_suite_menu_list`. - avd.test.ts asserts orphan-qemu kill precedes adb kill-server AND there's a ≥400ms gap between them (we wait 500ms, 100ms scheduling slack). Second test asserts the loosened threshold actually fires kills on the attempt-12 2-PID + 1-live-device signature. Live reproducer: /Users/jjackson/.maestro/tests/2026-05-19_151650/ screenshot-❌-1779218294468-(chunk-0.yaml).png. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related Phase-6 blockers from
malaria-itn-fgd/20260515-1645attempts 10 + 11. Batched into one PR because both ship through the ace-mobile MCP.Bug A —
connect-claim-opp.yamlbutton below fold after title-scrollPR #346 added a pre-branch
scrollUntilVisible { text: ${OPP_NAME} }that brings the target tile's TITLE into the viewport before Branch A / Branch B dispatch. With 5+ prior-run invite cards rendered ahead of the target (steady state on any dogfood machine), the title-only scroll lands the title at the viewport bottom, and the button below the title (btn_resumefor in-progress,btn_view_opportunityfor new) is clipped off-screen. Both branches'when: visible: { id: btn_*, below: text: ${OPP_NAME} }guards then evaluate to false and the in-bodyscrollUntilVisible(which would scroll the button into view) never fires. Catch-22.Fix: after the unconditional title-scroll, add two
runFlow when:blocks that scroll the per-branch button into view. Whichever button the tile actually carries is scrolled; the other no-ops.Live reproducer: attempt 11 failure screenshot at
.maestro/tests/2026-05-18_100623/. Class also flagged inskills/app-screenshot-capture/SKILL.md § Step 4 "OPP_NAME uniqueness assumption".Third PR in the claim-opp recipe series (#346 title scroll; #348 post-start wait; this = post-title button scroll).
Bug B —
adb-daemon-wedgeheal pre-step (ensureAvdRunning)Same precondition-restore class as the cold-boot itself (CLAUDE.md § "Phase preconditions are restored, not adapted") — one layer lower: qemu+adb daemon state, not AVD content state. Three observed instances on malaria-itn-fgd/20260515-1645 (attempts 8, 10, 11), all resolved with the same
adb kill-server+adb start-serverpair. Attempt 11 proved the heal itself needs to own daemon restoration, not the operator.Fix: new
sweepStaleEmulatorState()runs as the FIRST step ofensureAvdRunning:qemu-systemPIDs; ifadb devicesshows zero live emulators, every qemu PID is by definition orphan, kill -9. Conservative on mixed-state boxes. Logs structured info per kill.adb kill-server+adb start-server. Cheap (~500ms), resets the daemon to a known-clean state.Best-effort: every shell call swallows its error.
Test plan
npm test— 1235 passing (was 1230 baseline)test/mcp/mobile/static-recipe-invariants.test.ts— new invariant: both per-branch button-anchored scrolls exist BEFORE# --- BRANCH A:, each scoped tobelow: text: ${OPP_NAME}.test/mcp/mobile/avd.test.ts— three new tests onsweepStaleEmulatorState: adb daemon restart fires first; orphan qemu PIDs killed when adb sees no devices; kill failures tolerated.🤖 Generated with Claude Code