Skip to content

fix(mcp/mobile): scroll button into view + sweep orphan qemu/adb in heal (claim-opp + ensureAvdRunning)#349

Merged
jjackson merged 1 commit into
mainfrom
emdash/e2e-malaria-fgd-7yoah
May 18, 2026
Merged

fix(mcp/mobile): scroll button into view + sweep orphan qemu/adb in heal (claim-opp + ensureAvdRunning)#349
jjackson merged 1 commit into
mainfrom
emdash/e2e-malaria-fgd-7yoah

Conversation

@jjackson
Copy link
Copy Markdown
Owner

Summary

Two related Phase-6 blockers from malaria-itn-fgd/20260515-1645 attempts 10 + 11. Batched into one PR because both ship through the ace-mobile MCP.

Bug A — connect-claim-opp.yaml button below fold after title-scroll

PR #346 added a pre-branch scrollUntilVisible { text: ${OPP_NAME} } that brings the target tile's TITLE into the viewport before Branch A / Branch B dispatch. With 5+ prior-run invite cards rendered ahead of the target (steady state on any dogfood machine), the title-only scroll lands the title at the viewport bottom, and the button below the title (btn_resume for in-progress, btn_view_opportunity for new) is clipped off-screen. Both branches' when: visible: { id: btn_*, below: text: ${OPP_NAME} } guards then evaluate to false and the in-body scrollUntilVisible (which would scroll the button into view) never fires. Catch-22.

Fix: after the unconditional title-scroll, add two runFlow when: blocks that scroll the per-branch button into view. Whichever button the tile actually carries is scrolled; the other no-ops.

Live reproducer: attempt 11 failure screenshot at .maestro/tests/2026-05-18_100623/. Class also flagged in skills/app-screenshot-capture/SKILL.md § Step 4 "OPP_NAME uniqueness assumption".

Third PR in the claim-opp recipe series (#346 title scroll; #348 post-start wait; this = post-title button scroll).

Bug B — adb-daemon-wedge heal pre-step (ensureAvdRunning)

Same precondition-restore class as the cold-boot itself (CLAUDE.md § "Phase preconditions are restored, not adapted") — one layer lower: qemu+adb daemon state, not AVD content state. Three observed instances on malaria-itn-fgd/20260515-1645 (attempts 8, 10, 11), all resolved with the same adb kill-server + adb start-server pair. Attempt 11 proved the heal itself needs to own daemon restoration, not the operator.

Fix: new sweepStaleEmulatorState() runs as the FIRST step of ensureAvdRunning:

  1. Orphan qemu sweep — pgrep qemu-system PIDs; if adb devices shows zero live emulators, every qemu PID is by definition orphan, kill -9. Conservative on mixed-state boxes. Logs structured info per kill.
  2. Always run adb kill-server + adb start-server. Cheap (~500ms), resets the daemon to a known-clean state.

Best-effort: every shell call swallows its error.

Test plan

  • npm test — 1235 passing (was 1230 baseline)
  • test/mcp/mobile/static-recipe-invariants.test.ts — new invariant: both per-branch button-anchored scrolls exist BEFORE # --- BRANCH A:, each scoped to below: text: ${OPP_NAME}.
  • test/mcp/mobile/avd.test.ts — three new tests on sweepStaleEmulatorState: adb daemon restart fires first; orphan qemu PIDs killed when adb sees no devices; kill failures tolerated.
  • Post-merge: parent session full-restarts Claude Code (MCP backend code + static recipe both require respawn), re-dispatches Phase 6.

🤖 Generated with Claude Code

…eal (claim-opp + ensureAvdRunning)

Two related Phase-6 blockers batched into one PR.

## Bug A: connect-claim-opp.yaml button below fold after title-scroll

PR #346 added a pre-branch `scrollUntilVisible { text: ${OPP_NAME} }`
that brings the target tile's TITLE into the viewport before Branch A
/ Branch B dispatch. With 5+ prior-run invite cards rendered ahead of
the target (steady state on any dogfood machine), the title-only
scroll lands at the viewport BOTTOM, and the button below the title
(`btn_resume` for in-progress, `btn_view_opportunity` for new) is
clipped off-screen. Both branches' `when: visible: { id: btn_*,
below: text: ${OPP_NAME} }` guards then evaluate to false and the
in-body `scrollUntilVisible` (which would scroll the button into
view) never fires. Catch-22.

Fix: after the unconditional title-scroll, add two `runFlow when:`
blocks that scroll the per-branch button into view. Whichever button
the tile actually carries is scrolled; the other branch's runFlow
no-ops because its `when:` guard matches nothing. Both Branch A and
Branch B guards downstream resolve correctly afterwards.

Live reproducer: malaria-itn-fgd run 20260515-1645 Phase 6 attempt
11; screenshot at .maestro/tests/2026-05-18_100623/.

Third PR in the claim-opp recipe series (PR #346 = title scroll;
PR #348 = post-start wait; this = post-title button scroll). Class
also flagged in skills/app-screenshot-capture/SKILL.md § Step 4
"OPP_NAME uniqueness assumption" as a durable cross-recipe pattern.

## Bug B: adb-daemon-wedge heal pre-step (ensureAvdRunning)

Same precondition-restore class as the cold-boot itself (CLAUDE.md
§ "Phase preconditions are restored, not adapted") — one layer
lower: qemu+adb daemon state, not AVD content state. Three observed
instances on malaria-itn-fgd/20260515-1645:

- Attempt 8: post-PR-#345 boot-wait fix passed but a wedged adb
  daemon independent of the wait still 500'd; manual `adb
  kill-server` + `adb start-server` cleared it.
- Attempt 10: 2 orphan qemu-system-aarch64 processes + 3 stale adb
  daemons from prior crashed boots — `package service` never bound;
  3 consecutive `mobile_ensure_avd_running` failures.
- Attempt 11: even after the parent manually swept qemu+adb, the
  first in-dispatch `ensureAvdRunning` call still threw the same
  signature until a second `kill-server`/`start-server` fired inside
  the dispatch — proving the heal itself needs to own daemon
  restoration, not the operator.

Fix: new `sweepStaleEmulatorState()` runs as the FIRST step of
`ensureAvdRunning`. Two-part contract:
  1. Orphan qemu sweep — pgrep qemu-system PIDs; if `adb devices`
     shows zero live emulators, every qemu PID is by definition
     orphan, kill -9. Conservative on mixed-state boxes (only kills
     excess when pgrep PIDs ≥ adb devices + 2). Logs structured
     info per kill so future debug avoids the manual `pgrep -af
     qemu` step.
  2. Always run `adb kill-server` + `adb start-server`. Cheap
     (~500ms), resets the daemon to a known-clean state regardless
     of whether we detected wedge symptoms. Same trade-off as the
     cold-boot widening — pay a small fixed cost for guaranteed
     clean state, avoid junk-state debug cycles.

Both steps are best-effort: every shell call swallows its error.
The cold-boot funnel still proceeds if sweeping fails.

## Tests

- `test/mcp/mobile/static-recipe-invariants.test.ts`: new invariant
  asserting both per-branch button-anchored scrollUntilVisibles
  exist BEFORE the `# --- BRANCH A:` marker, each scoped to
  `below: text: ${OPP_NAME}`.
- `test/mcp/mobile/avd.test.ts`: three new tests on
  `sweepStaleEmulatorState`:
  - adb daemon restart fires before listAvds + AVD-specific emu kill.
  - orphan qemu PIDs are killed when adb sees no devices.
  - pgrep/kill failures are tolerated (best-effort, no propagation).

Full suite: 1235 passing, up from 1230 pre-PR.
@jjackson jjackson enabled auto-merge May 18, 2026 14:24
@jjackson jjackson merged commit de64f82 into main May 18, 2026
2 checks passed
jjackson added a commit that referenced this pull request May 19, 2026
…step ordering

Two orthogonal bugs surfaced on malaria-itn-fgd/20260515-1645 Phase 6
attempt 12. Both unrelated to PR #354's Learn-Deliver chain scope.

Bug A — learn-tap-module.yaml Branch-B over-fires when form-name != module-name:
The same-name-suppressed-auto-skip branch guarded only on
`screen_suite_menu_list visible AND nav_btn_next NOT visible`, then re-tapped
`text:${MODULE_NAME}`. On J1 (module "Briefing Acknowledgement" → form
"Acknowledge Readiness"), the form-list screen's toolbar still reads
${MODULE_NAME}, so Branch-B fired and re-tapped the non-tappable toolbar
TextView instead of the form row. Fix: nested runFlow with inner guard
`visible: text: ${MODULE_NAME}, below: id: screen_suite_menu_list` so the
text-match is scoped to the list body. When form-name != module-name no
body row matches, Branch-B skips, caller's next learn-tap-module(FORM_NAME)
drills the form row by its own label.

Bug B — sweepStaleEmulatorState ordering (PR #349 follow-up):
PR #349 wired up the orphan-qemu sweep and adb-server restart but
(a) used a conservative `qemuPids.length >= liveCount + 2` kill threshold
that left the attempt-12 signature (2 orphan qemu + 1 stale adb-devices
entry, 2 NOT >= 1+2) below the kill bar, and (b) ran the adb-restart
immediately after the kill without waiting for the kernel to release the
emulator-NNNN TCP sockets — letting the freshly-restarted daemon adopt
the wedged-port state. Result: 2 of the next 3 ensureAvdRunning calls
still failed with "package service did not bind" until a second manual
adb kill-server/start-server fired inside the dispatch. Fix:
  1. Loosen the orphan kill to fire whenever qemuPids.length > liveCount.
  2. Add a 500ms socket-release wait between the last orphan SIGKILL and
     adb kill-server.

Tests:
  - static-recipe-invariants.test.ts asserts Branch-B's inner runFlow has
    `visible: text: ${MODULE_NAME}, below: id: screen_suite_menu_list`.
  - avd.test.ts asserts orphan-qemu kill precedes adb kill-server AND
    there's a ≥400ms gap between them (we wait 500ms, 100ms scheduling
    slack). Second test asserts the loosened threshold actually fires
    kills on the attempt-12 2-PID + 1-live-device signature.

Live reproducer: /Users/jjackson/.maestro/tests/2026-05-19_151650/
  screenshot-❌-1779218294468-(chunk-0.yaml).png.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant