Phase 5 screenshot chain: Connect→Learn handoff fails + 3 supporting platform gaps (from leep-paint-collection 20260506-1440) #115

@jjackson

Context

Backfilling the screenshot chain on leep-paint-collection/20260506-1440 after the 0.13.47 input-completeness pre-flight fix surfaced four cascading blockers. Task 1 (app-test-cases) succeeded — 4 validated Maestro recipes are now under 2-commcare/recipes/. Tasks 2 (app-screenshot-capture) and 3 (training-deck-build) are blocked by the four issues below. The first one is load-bearing; the other three are independent platform gaps that surfaced during the same run.

Filed as a single issue because they were discovered together; can be split into child issues if useful.


🚨 1. Connect → Learn handoff fails on the AVD ("Failed to start learning")

Symptom

On the local AVD running Connect (post-claim, LEEP opp visible in claimed list), tapping btn_start on the LEEP opp detail produces an on-screen banner reading "Failed to start learning". Reproduces consistently. Both J1 (Deliver smoke) and J4 (Learn smoke) recipes depend on entering the Learn app first, so neither can capture screenshots.

Local evidence PNG: /tmp/ace-screenshots/leep-paint-collection-20260506-1440/_probe-start2/after-tap-start-by-point.png.

Where

Connect Android client → CCHQ Learn-app fetch → in-device launch path. The Connect opportunity is correctly wired to the released CCHQ apps:

  • Connect opp f14d8c5d-8859-4d0c-8952-8a6a30d06c43 has learn_app.cc_app_id = 0506ae3aae3c4d73ab92e329e5d843a0 and deliver_app.cc_app_id = 76266ff1fce44a859ffa2a395797b7c5 (verified via connect_get_opportunity).
  • Phase 2 reported both apps released to v1 with build IDs 5b9443748d2a4b26a4826aff14a80741 (Learn) / d301692229064f6ab765638517234476 (Deliver).
  • CCZ marker counts grepped from the released CCZs at deploy time: Learn = 8 learn_module + 8 assessment; Deliver = 5 deliver_unit. Markers are structurally present.

So the wire-up looks right by metadata, but Connect can't actually launch the Learn app at runtime.

Hypotheses (order of likelihood)

  1. CCHQ App-Editor permission gap on the Connect API key user — the HQ API key Connect uses to fetch the CCZ may not have access to a released build for connect-ace-prod. app-release SKILL says the standard Admin role includes edit_apps, but the CCHQ user backing Connect's API call may be a different user without that role.
  2. CCZ format/version mismatch between Nova-built apps and what Connect expects — Nova's autobuild emits CommCare 2.62.0+ XForms with the Connect connect.learn_module blocks, but the released CCZ may be missing a header field (e.g. commcare_app_type=learn or connect_app_id) that Connect uses to dispatch the launch.
  3. Connect cached the Learn-app metadata from an earlier run — this opp is on a Connect program that's seen 5 prior runs (opp.yaml.runs) including some that explicitly blocked. Connect may be holding a stale learn_app_id or build id that doesn't match the one we just released.
  4. The released build is a multi-app upload artifact: nova_upload_to_hq always creates a fresh HQ app document (no atomic update); each Phase 2 re-upload bumps the HQ app id. The opp record was created in this run pointing at the freshly-uploaded ids, so this should be correct, but worth verifying nothing else re-uploaded between Phase 2 and now.

Deep-dive plan

  1. adb logcat on the AVD while reproducing the tap. Exception class + message will narrow to (a) network / auth, (b) parse error, or (c) CommCare runtime error.
  2. Curl the Learn CCZ as the Connect API user — verify the released build is fetchable end-to-end with the same auth Connect uses.
  3. Inspect the Connect opp's HTML/admin view for any "broken-app" diagnostic Connect surfaces.
  4. Compare the leep Learn CCZ to a known-working ACE Learn CCZ (e.g. turmeric's, if one exists) for header / manifest differences.
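
Step 2 of the plan can be turned into a quick triage rule. The sketch below maps the HTTP status returned when fetching the Learn CCZ as the Connect API user onto the numbered hypotheses above. The function name, the `Triage` type, and the status-to-hypothesis mapping are illustrative assumptions based on standard HTTP semantics, not confirmed Connect or CCHQ behavior.

```typescript
// Hypothetical triage helper: maps the HTTP status of a CCZ fetch
// (made with the same credentials Connect uses) to the hypothesis it supports.
type Triage = {
  hypothesis: 1 | 2 | 3 | 4 | null; // number from the "Hypotheses" list; null if inconclusive
  note: string;
};

function triageCczFetch(status: number): Triage {
  if (status === 401 || status === 403) {
    // Auth or permission failure points at the API-key user (hypothesis 1).
    return { hypothesis: 1, note: "API user is not authorized to read the released build" };
  }
  if (status === 404) {
    // The id does not resolve: stale cached id (3) or a later re-upload (4).
    return { hypothesis: 3, note: "app or build id does not resolve; also check hypothesis 4" };
  }
  if (status >= 200 && status < 300) {
    // Fetch works, so the problem is more likely in the CCZ contents (2).
    return { hypothesis: 2, note: "CCZ is fetchable; diff it against a known-working CCZ" };
  }
  return { hypothesis: null, note: `inconclusive status ${status}; fall back to adb logcat` };
}
```

A successful fetch does not prove the CCZ is valid; it only moves attention from auth to content, which is where step 4 picks up.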

Proposed fix shape

Depends on root cause from the deep-dive. Most likely candidates:

  • App-permission fix: add an explicit App-Editor role grant for the Connect-side API user during app-release, or document the prerequisite in connect-opp-setup SKILL.md so Phase 3 verifies it before claiming the wire-up is complete.
  • Cache-invalidation: call a Connect "refresh apps" endpoint as part of connect-opp-setup after the wire-up so the per-FLW client doesn't see stale data.
  • CCZ header fix: if Nova's autobuild is missing a Connect-required field, file upstream against voidcraft-labs/nova-plugin and ship a Phase 2 patch that injects the field via commcare_patch_xform until upstream lands.

2. Recipe selectors are calibrated against an imagined schema, not the live app

Symptom

Phase 2 app-test-cases produced recipes whose tapOn:text strings are e.g. "L0 — Why this matters", "F1 — Shop Registration", "Stage 1 — Market Analysis" — calibrated against the brief sent to Nova's autobuild. The deployed app actually renders "1. Why this matters", "Stage 1: shop visits & interviews" etc. (Nova/CommCare's own ordering + label conventions). Recipe text matchers will never hit live app screens, even on a working Connect→Learn handoff.

mobile_validate_recipe accepts these recipes — it's a static lint that doesn't execute against the AVD. It can verify selector syntax is well-formed but not that the strings exist on a screen.

Where

skills/app-test-cases/SKILL.md Step 3 (recipe composition). The skill reads Nova get_app/get_form for IDs but uses the brief's labels for text matchers. Live label rendering is determined by Nova's scaffold + CommCare's app-editor, not by the brief.

Proposed fix

Two tracks; do both:

  1. Read live labels from get_form's response. The form response carries each field's label as Nova would render it. Use those strings in tapOn:text matchers instead of the brief's strings. Eliminates the imagined-vs-live drift at composition time.
  2. Add a runtime smoke validator. Extend app-test-cases SKILL with a new optional Step 4: after writing recipes, boot the AVD (if mobile bootstrap is healthy) and dry-run each smoke recipe with mobile_run_recipe's validation mode. Selectors that don't resolve fail the SKILL with a structured error pointing at the offending recipe + step. Feature-flagged so non-mobile-bootstrapped operators can opt out.
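
Both tracks come down to checking every text matcher against the labels the deployed app actually carries. A minimal sketch, assuming the live labels have already been extracted from the get_form responses into a flat list; the `RecipeStep` shape, the `findUnresolvedSelectors` helper, and the recipe file name are hypothetical and do not reflect the real recipe or Nova response schema.

```typescript
// Hypothetical, simplified recipe step: only the text matcher matters here.
interface RecipeStep {
  recipe: string;    // recipe file name
  index: number;     // step position within the recipe
  tapOnText: string; // string used in the tapOn:text matcher
}

// Returns the steps whose matcher text is not among the live labels.
// Comparison is exact after trimming whitespace; no fuzzy matching is attempted.
function findUnresolvedSelectors(steps: RecipeStep[], liveLabels: string[]): RecipeStep[] {
  const known = new Set(liveLabels.map((label) => label.trim()));
  return steps.filter((step) => !known.has(step.tapOnText.trim()));
}

// Example using label strings quoted in this issue (\u2014 is the dash in the brief's label).
const unresolved = findUnresolvedSelectors(
  [
    { recipe: "learn-smoke.yaml", index: 3, tapOnText: "L0 \u2014 Why this matters" },
    { recipe: "learn-smoke.yaml", index: 4, tapOnText: "1. Why this matters" },
  ],
  ["1. Why this matters", "Stage 1: shop visits & interviews"],
);

// Structured report pointing at the offending recipe and step.
console.log(unresolved.map((s) => `${s.recipe} step ${s.index}: "${s.tapOnText}"`));
```

Returning the offending steps rather than a boolean keeps the error structured, so the SKILL can report the recipe and step that need fixing.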

3. ace-gdrive MCP has no atom for setting "anyone with link" permission

Symptom

app-screenshot-capture SKILL.md Step 5 marks this CRITICAL: "after uploading each PNG, set its sharing permission to anyone-with-link (role: reader) via drive.permissions.create. Slides' createImage (used by training-deck-build) fetches PNGs via Google's image-import service, which doesn't carry the SA's auth — so an SA-only file gets 'image cannot be reached' and the deck slide comes out blank."

But there's no drive_set_permission / drive_set_anyone_with_link atom in ace-gdrive. The SKILL contract is unfulfillable through the MCP today.

Where

mcp/google-drive-server.ts. Either:

  • (a) Add a new atom drive_set_anyone_with_link(fileId) that wraps drive.permissions.create({fileId, role: 'reader', type: 'anyone'}).
  • (b) Auto-set anyone-with-link inside drive_upload_binary for any file uploaded under a Phase 5 screenshots subfolder (heuristic on parentFolderId or via an explicit share: 'anyone-with-link' parameter).

(b) is friendlier to skill authors; (a) is more orthogonal. Both can ship in the same change.

Proposed fix

Add explicit drive_set_anyone_with_link(fileId) atom in ace-gdrive, plus a shareAnyoneWithLink: boolean = false optional parameter to drive_upload_binary that calls the same permission-setter inline. app-screenshot-capture switches to drive_upload_binary({..., shareAnyoneWithLink: true}) for the screenshot uploads.
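
A minimal sketch of the permission-setter behind the proposed atom. It assumes the `googleapis` Node client, where `drive.permissions.create` takes the permission fields under `requestBody`. The `DrivePermissionsClient` interface is a hypothetical structural stand-in for that client so the sketch stays self-contained, and `setAnyoneWithLink` is an illustrative name, not existing ace-gdrive code.

```typescript
// Minimal structural stand-in for the part of the Drive v3 client used here.
// Assumption: the real object is the `googleapis` Drive v3 client, whose
// permissions.create accepts { fileId, requestBody: { role, type } }.
interface DrivePermissionsClient {
  permissions: {
    create(params: {
      fileId: string;
      requestBody: { role: "reader"; type: "anyone" };
    }): Promise<unknown>;
  };
}

// Builds the request for "anyone with the link can view".
function anyoneWithLinkRequest(fileId: string) {
  return { fileId, requestBody: { role: "reader" as const, type: "anyone" as const } };
}

// Hypothetical body of drive_set_anyone_with_link(fileId).
async function setAnyoneWithLink(drive: DrivePermissionsClient, fileId: string): Promise<void> {
  if (!fileId) throw new Error("fileId is required");
  await drive.permissions.create(anyoneWithLinkRequest(fileId));
}
```

Under this shape, drive_upload_binary with shareAnyoneWithLink: true would call the same helper with the uploaded file's id once the upload returns, so the atom and the inline option share one code path.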


4. connect-claim-opp.yaml static recipe doesn't pin to OPP_NAME

Symptom

The static prerequisite recipe at mcp/mobile/recipes/static/connect-claim-opp.yaml taps the first opp card on the AVD's Connect home, regardless of which opp the run intends. With multiple visible opps (the LEEP one + stale turmerics from prior runs), the recipe grabs whichever sorts first — which can be the wrong opp without any error surfaced. Confirmed live during this run.

Where

mcp/mobile/recipes/static/connect-claim-opp.yaml.

Proposed fix

Take OPP_NAME as an env var (substituted via mobile_run_recipe's envVars param) and use tapOn: with text: matching the full opp name, with visibilityPercentage: 30 so multi-line cards still match. Add an assertion step before the tap that the card matching OPP_NAME (the LEEP opp in this run) is visible, so the recipe fails loudly rather than silently grabbing the wrong one.
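
The fail-loud selection can be sketched independently of the recipe format. This illustrates the intended behavior only and is not Maestro syntax; `pickOppCard`, its inputs, and the substring match are hypothetical choices.

```typescript
// Hypothetical helper: choose the opp card to tap by name instead of by position.
// Throws when OPP_NAME is missing, absent from the screen, or ambiguous.
function pickOppCard(visibleCardTitles: string[], oppName: string | undefined): number {
  if (!oppName) throw new Error("OPP_NAME is not set");
  // Substring match so multi-line card text that contains the full name still matches.
  const matches = visibleCardTitles
    .map((title, index) => ({ title, index }))
    .filter(({ title }) => title.includes(oppName));
  if (matches.length === 0) {
    throw new Error(`No visible opp card matches "${oppName}"`);
  }
  if (matches.length > 1) {
    throw new Error(`${matches.length} visible opp cards match "${oppName}"`);
  }
  return matches[0].index; // position of the single matching card
}
```

Treating multiple matches as an error, not just zero matches, covers the case where stale opps from prior runs share a name prefix with the intended one.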


Run-level context

  • Run: ACE/leep-paint-collection/runs/20260506-1440/ (Drive)
  • Phase 5 verdicts (incomplete): 5-qa-and-training/app-screenshot-capture_verdict.yaml, _verdict-shallow.yaml
  • Block doc: 5-qa-and-training/app-screenshot-capture_block.md (Drive id 11bV4qx0TgWPETjbNw07r71Pl-Swn_xj2i7QpSynPhS8)
  • ACE plugin version: 0.13.47
  • Operator: jjackson@dimagi.com
