Context
Backfilling the screenshot chain on leep-paint-collection/20260506-1440 after the 0.13.47 input-completeness pre-flight fix surfaced four cascading blockers. Task 1 (app-test-cases) succeeded — 4 validated Maestro recipes are now under 2-commcare/recipes/. Tasks 2 (app-screenshot-capture) and 3 (training-deck-build) are blocked by the four issues below. The first one is load-bearing; the other three are independent platform gaps that surfaced during the same run.
Filed as a single issue because they were discovered together; can be split into children if useful.
🚨 1. Connect → Learn handoff fails on the AVD ("Failed to start learning")
Symptom
On the local AVD running Connect (post-claim, LEEP opp visible in claimed list), tapping btn_start on the LEEP opp detail produces an on-screen banner reading "Failed to start learning". Reproduces consistently. Both J1 (Deliver smoke) and J4 (Learn smoke) recipes depend on entering the Learn app first, so neither can capture screenshots.
Local evidence PNG: /tmp/ace-screenshots/leep-paint-collection-20260506-1440/_probe-start2/after-tap-start-by-point.png.
Where
Connect Android client → CCHQ Learn-app fetch → in-device launch path. The Connect opportunity is correctly wired to the released CCHQ apps:
- Connect opp f14d8c5d-8859-4d0c-8952-8a6a30d06c43 has learn_app.cc_app_id = 0506ae3aae3c4d73ab92e329e5d843a0 and deliver_app.cc_app_id = 76266ff1fce44a859ffa2a395797b7c5 (verified via connect_get_opportunity).
- Phase 2 reported both apps released to v1 with build IDs 5b9443748d2a4b26a4826aff14a80741 (Learn) / d301692229064f6ab765638517234476 (Deliver).
- CCZ marker counts grepped from the released CCZs at deploy time: Learn = 8 learn_module + 8 assessment; Deliver = 5 deliver_unit. Markers are structurally present.
So the wire-up looks right by metadata, but Connect can't actually launch the Learn app at runtime.
Hypotheses (order of likelihood)
- CCHQ App-Editor permission gap on the Connect API key user: the HQ API key Connect uses to fetch the CCZ may not have access to a released build for connect-ace-prod. app-release SKILL says the standard Admin role includes edit_apps, but the CCHQ user backing Connect's API call may be a different user without that role.
- CCZ format/version mismatch between Nova-built apps and what Connect expects: Nova's autobuild emits CommCare 2.62.0+ XForms with the Connect connect.learn_module blocks, but the released CCZ may be missing a header field (e.g. commcare_app_type=learn or connect_app_id) that Connect uses to dispatch the launch.
- Connect cached the Learn-app metadata from an earlier run: this opp is on a Connect program that's seen 5 prior runs (opp.yaml.runs), including some that explicitly blocked. Connect may be holding a stale learn_app_id or build id that doesn't match the one we just released.
- The released build is a multi-app upload artifact: nova_upload_to_hq always creates a fresh HQ app document (no atomic update); each Phase 2 re-upload bumps the HQ app id. The opp record was created in this run pointing at the freshly-uploaded ids, so this should be correct, but worth verifying nothing else re-uploaded between Phase 2 and now.
Deep-dive plan
- adb logcat on the AVD while reproducing the tap. Exception class + message will narrow to (a) network / auth, (b) parse error, or (c) CommCare runtime error.
- Curl the Learn CCZ as the Connect API user — verify the released build is fetchable end-to-end with the same auth Connect uses.
- Inspect the Connect opp's HTML/admin view for any "broken-app" diagnostic Connect surfaces.
- Compare the leep Learn CCZ to a known-working ACE Learn CCZ (e.g. turmeric's, if one exists) for header / manifest differences.
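The CCZ comparison in the last step could be mechanized roughly as follows. This is a minimal sketch, assuming the header / manifest fields of each CCZ have already been extracted into a flat key/value map; the `ManifestFields` shape, the function name, and the sample values are all hypothetical, and the field names are only the examples floated in the hypotheses above.

```typescript
// Hypothetical shape: a flat key/value view of a CCZ's header or manifest fields.
type ManifestFields = Record<string, string>;

interface ManifestDiff {
  missingInCandidate: string[]; // keys only the known-good CCZ has
  extraInCandidate: string[];   // keys only the candidate CCZ has
  changed: { key: string; knownGood: string; candidate: string }[];
}

// Compare a known-working CCZ's fields against the candidate's.
function diffManifestFields(knownGood: ManifestFields, candidate: ManifestFields): ManifestDiff {
  const diff: ManifestDiff = { missingInCandidate: [], extraInCandidate: [], changed: [] };
  for (const [key, goodValue] of Object.entries(knownGood)) {
    const candidateValue = candidate[key];
    if (candidateValue === undefined) {
      diff.missingInCandidate.push(key);
    } else if (candidateValue !== goodValue) {
      diff.changed.push({ key, knownGood: goodValue, candidate: candidateValue });
    }
  }
  for (const key of Object.keys(candidate)) {
    if (!(key in knownGood)) diff.extraInCandidate.push(key);
  }
  diff.missingInCandidate.sort();
  diff.extraInCandidate.sort();
  return diff;
}

// Illustrative values only; real field names and values must come from the CCZs.
const exampleDiff = diffManifestFields(
  { commcare_app_type: "learn", connect_app_id: "example-id" },
  { connect_app_id: "example-id" },
);
console.log(exampleDiff.missingInCandidate);
```

A non-empty `missingInCandidate` would point toward the CCZ header hypothesis; an empty diff would shift attention back to auth or caching.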
Proposed fix shape
Depends on root cause from the deep-dive. Most likely candidates:
- App-permission fix: add an explicit App-Editor role grant for the Connect-side API user during app-release, or document the prerequisite in connect-opp-setup SKILL.md so Phase 3 verifies it before claiming the wire-up is complete.
- Cache-invalidation: call a Connect "refresh apps" endpoint as part of connect-opp-setup after the wire-up so the per-FLW client doesn't see stale data.
- CCZ header fix: if Nova's autobuild is missing a Connect-required field, file upstream against voidcraft-labs/nova-plugin and ship a Phase 2 patch that injects the field via commcare_patch_xform until upstream lands.
2. Recipe selectors are calibrated against an imagined schema, not the live app
Symptom
Phase 2 app-test-cases produced recipes whose tapOn:text strings are e.g. "L0 — Why this matters", "F1 — Shop Registration", "Stage 1 — Market Analysis" — calibrated against the brief sent to Nova's autobuild. The deployed app actually renders "1. Why this matters", "Stage 1: shop visits & interviews" etc. (Nova/CommCare's own ordering + label conventions). Recipe text matchers will never hit live app screens, even on a working Connect→Learn handoff.
mobile_validate_recipe accepts these recipes — it's a static lint that doesn't execute against the AVD. It can verify selector syntax is well-formed but not that the strings exist on a screen.
Where
skills/app-test-cases/SKILL.md Step 3 (recipe composition). The skill reads Nova get_app/get_form for IDs but uses the brief's labels for text matchers. Live label rendering is determined by Nova's scaffold + CommCare's app-editor, not by the brief.
Proposed fix
Two tracks; do both:
- Read live labels from get_form's response. The form response carries each field's label as Nova would render it. Use those strings in tapOn:text matchers instead of the brief's strings. Eliminates the imagined-vs-live drift at composition time.
- Add a runtime smoke validator. Extend app-test-cases SKILL with a new optional Step 4: after writing recipes, boot the AVD (if mobile bootstrap is healthy) and dry-run each smoke recipe with mobile_run_recipe's validation mode. Selectors that don't resolve fail the SKILL with a structured error pointing at the offending recipe + step. Feature-flagged so non-mobile-bootstrapped operators can opt out.
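The structured error from the second track might look something like the sketch below. It assumes the recipe steps and the live labels are already available as plain data; the `Recipe`, `RecipeStep`, and `UnresolvedSelector` shapes and the recipe name are hypothetical, and exact string comparison is a simplification of whatever matching the real runner applies.

```typescript
// Hypothetical shapes; the real recipe and get_form structures may differ.
interface RecipeStep {
  tapOnText?: string; // the text matcher of a tapOn step, if the step has one
}
interface Recipe {
  name: string;
  steps: RecipeStep[];
}
interface UnresolvedSelector {
  recipe: string;
  stepIndex: number;
  text: string;
}

// Report every tapOn text matcher that does not appear among the live labels.
function findUnresolvedSelectors(recipes: Recipe[], liveLabels: string[]): UnresolvedSelector[] {
  const known = new Set(liveLabels);
  const unresolved: UnresolvedSelector[] = [];
  for (const recipe of recipes) {
    recipe.steps.forEach((step, stepIndex) => {
      if (step.tapOnText !== undefined && !known.has(step.tapOnText)) {
        unresolved.push({ recipe: recipe.name, stepIndex, text: step.tapOnText });
      }
    });
  }
  return unresolved;
}

// Brief-derived matcher vs. the label the deployed app renders (strings from this issue).
const unresolvedExample = findUnresolvedSelectors(
  [{ name: "J4-learn-smoke", steps: [{ tapOnText: "L0 — Why this matters" }] }],
  ["1. Why this matters", "Stage 1: shop visits & interviews"],
);
console.log(unresolvedExample);
```

Returning the full list instead of failing on the first miss lets one run surface every drifted selector across all recipes.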
3. ace-gdrive MCP has no atom for setting "anyone with link" permission
Symptom
app-screenshot-capture SKILL.md Step 5 marks this CRITICAL: "after uploading each PNG, set its sharing permission to anyone-with-link (role: reader) via drive.permissions.create. Slides' createImage (used by training-deck-build) fetches PNGs via Google's image-import service, which doesn't carry the SA's auth — so an SA-only file gets 'image cannot be reached' and the deck slide comes out blank."
But there's no drive_set_permission / drive_set_anyone_with_link atom in ace-gdrive. The SKILL contract is unfulfillable through the MCP today.
Where
mcp/google-drive-server.ts. Either:
- (a) Add a new atom drive_set_anyone_with_link(fileId) that wraps drive.permissions.create({fileId, role: 'reader', type: 'anyone'}).
- (b) Auto-set anyone-with-link inside drive_upload_binary for any file uploaded under a Phase 5 screenshots subfolder (heuristic on parentFolderId or via an explicit share: 'anyone-with-link' parameter).
(b) is more friendly to skill authors; (a) is more orthogonal. Either ships in the same change.
Proposed fix
Add explicit drive_set_anyone_with_link(fileId) atom in ace-gdrive, plus a shareAnyoneWithLink: boolean = false optional parameter to drive_upload_binary that calls the same permission-setter inline. app-screenshot-capture switches to drive_upload_binary({..., shareAnyoneWithLink: true}) for the screenshot uploads.
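A rough sketch of how the atom and the upload flag could share one permission-setter. The `DrivePermissionsClient` interface is a hand-written stand-in, assumed to line up with the googleapis Node client, where the permission fields go under `requestBody`; `uploadAndMaybeShare` and its `upload` callback are hypothetical placeholders for the existing drive_upload_binary internals.

```typescript
// Minimal structural view of the Drive client (assumed to match the
// googleapis Node client, which takes the permission under requestBody).
interface DrivePermissionsClient {
  permissions: {
    create(params: {
      fileId: string;
      requestBody: { role: "reader"; type: "anyone" };
    }): Promise<unknown>;
  };
}

// The proposed atom: grant anyone-with-link read access to one file.
async function driveSetAnyoneWithLink(drive: DrivePermissionsClient, fileId: string): Promise<void> {
  if (fileId.trim() === "") throw new Error("fileId is required");
  await drive.permissions.create({ fileId, requestBody: { role: "reader", type: "anyone" } });
}

interface UploadOptions {
  shareAnyoneWithLink?: boolean; // defaults to false, as in the proposal
}

// `upload` stands in for the existing upload logic and resolves to the new file id.
async function uploadAndMaybeShare(
  drive: DrivePermissionsClient,
  upload: () => Promise<string>,
  options: UploadOptions = {},
): Promise<string> {
  const fileId = await upload();
  if (options.shareAnyoneWithLink ?? false) {
    await driveSetAnyoneWithLink(drive, fileId);
  }
  return fileId;
}
```

Taking the client as a parameter keeps the permission-setter testable with a fake, and lets the standalone atom and the inline flag call the same code path.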
4. connect-claim-opp.yaml static recipe doesn't pin to OPP_NAME
Symptom
The static prerequisite recipe at mcp/mobile/recipes/static/connect-claim-opp.yaml taps the first opp card on the AVD's Connect home, regardless of which opp the run intends. With multiple visible opps (the LEEP one + stale turmerics from prior runs), the recipe grabs whichever sorts first — which can be the wrong opp without any error surfaced. Confirmed live during this run.
Where
mcp/mobile/recipes/static/connect-claim-opp.yaml.
Proposed fix
Take OPP_NAME as an env var (substituted via mobile_run_recipe's envVars param) and use tapOn: with text: matching the full opp name, with visibilityPercentage: 30 so multi-line cards still match. Add an assertion step before the tap that the LEEP opp card is visible — fail loud rather than silently grab the wrong one.
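The fail-loud behaviour can be stated as a small selection rule, sketched here outside of the recipe format. The function name and the card titles are hypothetical; the sketch only models the logic (match by full name, refuse to guess), not how the recipe reads cards off the screen.

```typescript
// Pick the card whose title matches oppName exactly; never fall back to the first card.
function findOppCardIndex(visibleCardTitles: string[], oppName: string): number {
  const wanted = oppName.trim();
  if (wanted === "") {
    throw new Error("OPP_NAME is empty; refusing to fall back to the first card");
  }
  const matches = visibleCardTitles
    .map((title, index) => ({ title: title.trim(), index }))
    .filter((card) => card.title === wanted);
  const [only] = matches;
  if (matches.length !== 1 || only === undefined) {
    throw new Error(`expected exactly one card named "${wanted}", found ${matches.length}`);
  }
  return only.index;
}

// Hypothetical card titles: a stale opp sorts first, the intended one second.
const pickedIndex = findOppCardIndex(["Stale turmeric opp", "LEEP opp"], "LEEP opp");
console.log(pickedIndex);
```

Treating zero matches and duplicate matches the same way means a wrong claim can only happen if the configured name itself is wrong.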
Run-level context
- Run: ACE/leep-paint-collection/runs/20260506-1440/ (Drive)
- Phase 5 verdicts (incomplete): 5-qa-and-training/app-screenshot-capture_verdict.yaml, _verdict-shallow.yaml
- Block doc: 5-qa-and-training/app-screenshot-capture_block.md (Drive id 11bV4qx0TgWPETjbNw07r71Pl-Swn_xj2i7QpSynPhS8)
- ACE plugin version: 0.13.47
- Operator: jjackson@dimagi.com