Phase 5 screenshot chain: Connect→Learn handoff fails + 3 supporting platform gaps (from leep-paint-collection 20260506-1440)

## Context

Backfilling the screenshot chain on `leep-paint-collection/20260506-1440`, after the [0.13.47 input-completeness pre-flight](https://github.com/jjackson/ace/pull/110) fix, surfaced four blockers. Task 1 (`app-test-cases`) succeeded: 4 validated Maestro recipes are now under `2-commcare/recipes/`. Tasks 2 (`app-screenshot-capture`) and 3 (`training-deck-build`) are blocked by the four issues below. The first one is load-bearing; the other three are independent platform gaps that surfaced during the same run.

Filed as a single issue because they were discovered together; can be split into children if useful.

---

## 🚨 1. Connect → Learn handoff fails on the AVD ("Failed to start learning")

### Symptom

On the local AVD running Connect (post-claim, LEEP opp visible in claimed list), tapping `btn_start` on the LEEP opp detail produces an on-screen banner reading **"Failed to start learning"**. Reproduces consistently. Both J1 (Deliver smoke) and J4 (Learn smoke) recipes depend on entering the Learn app first, so neither can capture screenshots.

Local evidence PNG: `/tmp/ace-screenshots/leep-paint-collection-20260506-1440/_probe-start2/after-tap-start-by-point.png`.

### Where

Connect Android client → CCHQ Learn-app fetch → in-device launch path. The Connect opportunity is correctly wired to the released CCHQ apps:
- Connect opp `f14d8c5d-8859-4d0c-8952-8a6a30d06c43` has `learn_app.cc_app_id = 0506ae3aae3c4d73ab92e329e5d843a0` and `deliver_app.cc_app_id = 76266ff1fce44a859ffa2a395797b7c5` (verified via `connect_get_opportunity`).
- Phase 2 reported both apps released to v1 with build IDs `5b9443748d2a4b26a4826aff14a80741` (Learn) / `d301692229064f6ab765638517234476` (Deliver).
- CCZ marker counts grepped from the released CCZs at deploy time: Learn = 8 `learn_module` + 8 `assessment`; Deliver = 5 `deliver_unit`. Markers are structurally present.

So the wire-up looks right by metadata, but Connect can't actually launch the Learn app at runtime.

### Hypotheses (order of likelihood)

1. **CCHQ App-Editor permission gap on the Connect API key user** — the HQ API key Connect uses to fetch the CCZ may not have access to a *released* build for `connect-ace-prod`. `app-release` SKILL says the standard Admin role includes `edit_apps`, but the CCHQ user backing Connect's API call may be a *different* user without that role.
2. **CCZ format/version mismatch between Nova-built apps and what Connect expects** — Nova's autobuild emits CommCare 2.62.0+ XForms with the Connect `connect.learn_module` blocks, but the released CCZ may be missing a header field (e.g. `commcare_app_type=learn` or `connect_app_id`) that Connect uses to dispatch the launch.
3. **Connect cached the Learn-app metadata from an earlier run** — this opp is on a Connect program that's seen 5 prior runs (`opp.yaml.runs`) including some that explicitly blocked. Connect may be holding a stale `learn_app_id` or build id that doesn't match the one we just released.
4. **The released build is a multi-app upload artifact** — `nova_upload_to_hq` always creates a *fresh* HQ app document (no atomic update); each Phase 2 re-upload bumps the HQ app id. The opp record was created in this run pointing at the freshly-uploaded ids, so this should be correct, but worth verifying nothing else re-uploaded between Phase 2 and now.

### Deep-dive plan

1. **`adb logcat` on the AVD** while reproducing the tap. Exception class + message will narrow to (a) network / auth, (b) parse error, or (c) CommCare runtime error.
2. **Curl the Learn CCZ as the Connect API user** — verify the released build is fetchable end-to-end with the same auth Connect uses.
3. **Inspect the Connect opp's HTML/admin view** for any "broken-app" diagnostic Connect surfaces.
4. **Compare the leep Learn CCZ to a known-working ACE Learn CCZ** (e.g. turmeric's, if one exists) for header / manifest differences.
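For step 1, a small triage helper can sort captured logcat lines into the three buckets named above. This is a minimal sketch: the function names are hypothetical, and the regex patterns are guesses at which standard Java/Android exception classes might appear, not anything taken from Connect's source.

```typescript
// Hypothetical triage helper for step 1: buckets logcat lines into
// (a) network/auth, (b) parse error, (c) runtime error.
// The patterns are illustrative assumptions and should be tuned to real output.
type FailureClass = "network-auth" | "parse" | "runtime" | "unknown";

// Order matters: the first matching bucket wins, so a RuntimeException that
// wraps a parse failure on the same line is reported as "parse".
const PATTERNS: Array<[FailureClass, RegExp]> = [
  [
    "network-auth",
    /UnknownHostException|SocketTimeoutException|SSLHandshakeException|\b(?:HTTP|code|status)\D{0,10}40[13]\b/i,
  ],
  ["parse", /XmlPullParserException|JSONException|ZipException/],
  ["runtime", /RuntimeException|IllegalStateException|NullPointerException/],
];

function classifyLogcatLine(line: string): FailureClass {
  for (const [cls, re] of PATTERNS) {
    if (re.test(line)) return cls;
  }
  return "unknown";
}

// Counts how many lines fall into each bucket.
function summarizeLogcat(lines: string[]): Record<FailureClass, number> {
  const counts: Record<FailureClass, number> = {
    "network-auth": 0,
    parse: 0,
    runtime: 0,
    unknown: 0,
  };
  for (const line of lines) counts[classifyLogcatLine(line)] += 1;
  return counts;
}
```

Feeding it the lines captured around the `btn_start` tap gives a quick first read on which hypothesis to chase, before reading the full stack trace.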

### Proposed fix shape

Depends on root cause from the deep-dive. Most likely candidates:
- **App-permission fix:** add an explicit App-Editor role grant for the Connect-side API user during `app-release`, or document the prerequisite in `connect-opp-setup` SKILL.md so Phase 3 verifies it before claiming the wire-up is complete.
- **Cache-invalidation:** call a Connect "refresh apps" endpoint as part of `connect-opp-setup` after the wire-up so the per-FLW client doesn't see stale data.
- **CCZ header fix:** if Nova's autobuild is missing a Connect-required field, file upstream against `voidcraft-labs/nova-plugin` and ship a Phase 2 patch that injects the field via `commcare_patch_xform` until upstream lands.

---

## 2. Recipe selectors are calibrated against an imagined schema, not the live app

### Symptom

Phase 2 `app-test-cases` produced recipes whose `tapOn:text` strings are e.g. **"L0 — Why this matters"**, **"F1 — Shop Registration"**, **"Stage 1 — Market Analysis"** — calibrated against the brief sent to Nova's autobuild. The deployed app actually renders **"1. Why this matters"**, **"Stage 1: shop visits & interviews"** etc. (Nova/CommCare's own ordering + label conventions). Recipe text matchers will never hit live app screens, even on a working Connect→Learn handoff.

`mobile_validate_recipe` accepts these recipes — it's a static lint that doesn't execute against the AVD. It can verify selector syntax is well-formed but not that the strings exist on a screen.

### Where

`skills/app-test-cases/SKILL.md` Step 3 (recipe composition). The skill reads Nova `get_app`/`get_form` for IDs but uses the brief's labels for text matchers. Live label rendering is determined by Nova's scaffold + CommCare's app-editor, not by the brief.

### Proposed fix

Two tracks; do both:

1. **Read live labels from `get_form`'s response.** The form response carries each field's `label` as Nova would render it. Use those strings in `tapOn:text` matchers instead of the brief's strings. Eliminates the imagined-vs-live drift at composition time.
2. **Add a runtime smoke validator.** Extend `app-test-cases` SKILL with a new optional Step 4: after writing recipes, boot the AVD (if mobile bootstrap is healthy) and dry-run each smoke recipe with `mobile_run_recipe`'s validation mode. Selectors that don't resolve fail the SKILL with a structured error pointing at the offending recipe + step. Feature-flagged so non-mobile-bootstrapped operators can opt out.
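The drift check behind both tracks can be sketched as a pure function that compares recipe text matchers against the labels the live app reports. The names `findSelectorDrift` and `normalizeLabel` are hypothetical, and how the live labels are obtained (for example from `get_form`) is assumed rather than shown.

```typescript
// Hypothetical drift check: flags recipe text matchers that do not appear
// among the live labels, and suggests a live label when only the ordering
// prefix differs (e.g. "L0 \u2014 " vs "1. ").
interface DriftReport {
  selector: string;          // the text matcher found in the recipe
  suggestion: string | null; // a live label with the same title, if any
}

// Drops a leading ordering token such as "L0 \u2014 ", "F1 \u2014 " or "1. "
// and lowercases the remainder so titles can be compared.
function normalizeLabel(label: string): string {
  return label
    .replace(/^\s*[A-Za-z]?\d+\s*[.:\u2014-]\s*/, "")
    .trim()
    .toLowerCase();
}

function findSelectorDrift(
  recipeTexts: string[],
  liveLabels: string[],
): DriftReport[] {
  const live = new Set(liveLabels);
  const byTitle = new Map<string, string>();
  for (const label of liveLabels) byTitle.set(normalizeLabel(label), label);

  return recipeTexts
    .filter((text) => !live.has(text)) // exact matches are fine as-is
    .map((text) => ({
      selector: text,
      suggestion: byTitle.get(normalizeLabel(text)) ?? null,
    }));
}
```

With the strings from the symptom above, "L0 \u2014 Why this matters" would be reported with the suggestion "1. Why this matters", while "Stage 1 \u2014 Market Analysis" would be reported with no suggestion, since no live label shares its title.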

---

## 3. `ace-gdrive` MCP has no atom for setting "anyone with link" permission

### Symptom

`app-screenshot-capture` SKILL.md Step 5 marks this CRITICAL: "after uploading each PNG, set its sharing permission to anyone-with-link (role: reader) via `drive.permissions.create`. Slides' `createImage` (used by `training-deck-build`) fetches PNGs via Google's image-import service, which doesn't carry the SA's auth — so an SA-only file gets 'image cannot be reached' and the deck slide comes out blank."

But there's no `drive_set_permission` / `drive_set_anyone_with_link` atom in `ace-gdrive`. The SKILL contract is unfulfillable through the MCP today.

### Where

`mcp/google-drive-server.ts`. Either:
- (a) Add a new atom `drive_set_anyone_with_link(fileId)` that wraps `drive.permissions.create({fileId, role: 'reader', type: 'anyone'})`.
- (b) Auto-set anyone-with-link inside `drive_upload_binary` for any file uploaded under a Phase 5 screenshots subfolder (heuristic on `parentFolderId` or via an explicit `share: 'anyone-with-link'` parameter).

(b) is friendlier to skill authors; (a) is more orthogonal. Both can ship in the same change.

### Proposed fix

Add explicit `drive_set_anyone_with_link(fileId)` atom in `ace-gdrive`, plus a `shareAnyoneWithLink: boolean = false` optional parameter to `drive_upload_binary` that calls the same permission-setter inline. `app-screenshot-capture` switches to `drive_upload_binary({..., shareAnyoneWithLink: true})` for the screenshot uploads.
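A minimal sketch of the two pieces, written against a narrow interface instead of the real Drive client so it stays self-contained. `PermissionsClient`, `anyoneWithLinkParams`, `driveSetAnyoneWithLink`, and `uploadThenShare` are hypothetical names; the assumption is that the server uses the `googleapis` Node client, where, as far as I know, the permission fields are passed under `requestBody`.

```typescript
// Narrow slice of a Drive v3 client, assumed to match the shape of
// googleapis' `drive.permissions.create` (permission fields under `requestBody`).
interface PermissionsClient {
  permissions: {
    create(params: {
      fileId: string;
      requestBody: { role: string; type: string };
    }): Promise<unknown>;
  };
}

// Builds the request for "anyone with the link can read".
function anyoneWithLinkParams(fileId: string) {
  if (!fileId) throw new Error("fileId is required");
  return { fileId, requestBody: { role: "reader", type: "anyone" } };
}

// Sketch of the proposed `drive_set_anyone_with_link` atom.
async function driveSetAnyoneWithLink(
  drive: PermissionsClient,
  fileId: string,
): Promise<void> {
  await drive.permissions.create(anyoneWithLinkParams(fileId));
}

// Sketch of the optional `shareAnyoneWithLink` flag on upload. `upload` is a
// stand-in for the existing upload logic and resolves to the new file's id.
async function uploadThenShare(
  upload: () => Promise<string>,
  drive: PermissionsClient,
  shareAnyoneWithLink: boolean = false,
): Promise<string> {
  const fileId = await upload();
  if (shareAnyoneWithLink) await driveSetAnyoneWithLink(drive, fileId);
  return fileId;
}
```

Routing the inline flag through the same setter keeps one code path for the permission call, so the atom and the upload parameter cannot drift apart.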

---

## 4. `connect-claim-opp.yaml` static recipe doesn't pin to `OPP_NAME`

### Symptom

The static prerequisite recipe at `mcp/mobile/recipes/static/connect-claim-opp.yaml` taps the first opp card on the AVD's Connect home, regardless of which opp the run intends. With multiple visible opps (the LEEP one + stale turmerics from prior runs), the recipe grabs whichever sorts first — which can be the wrong opp without any error surfaced. Confirmed live during this run.

### Where

`mcp/mobile/recipes/static/connect-claim-opp.yaml`.

### Proposed fix

Take `OPP_NAME` as an env var (substituted via `mobile_run_recipe`'s envVars param) and use `tapOn:` with `text:` matching the full opp name, with `visibilityPercentage: 30` so multi-line cards still match. Add an assertion step before the tap that the `OPP_NAME` opp card is visible, so the recipe fails loudly rather than silently grabbing the wrong one.
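The intended behavior (match by name, fail loudly otherwise) can be modeled outside the recipe as a small function. This is only an illustration of the contract; the real change belongs in the YAML recipe, and `pickOppCard` and the card titles in the usage note are hypothetical.

```typescript
// Hypothetical model of the pinned selection: returns the index of the card
// whose title equals `oppName`, and throws instead of falling back to the
// first card when the name is missing or ambiguous.
function pickOppCard(visibleCardTitles: string[], oppName: string): number {
  const wanted = oppName.trim();
  if (!wanted) throw new Error("OPP_NAME is empty");

  const matches: number[] = [];
  visibleCardTitles.forEach((title, index) => {
    if (title.trim() === wanted) matches.push(index);
  });

  if (matches.length === 0) {
    throw new Error(
      `Opp "${wanted}" is not visible; saw: ${visibleCardTitles.join(", ")}`,
    );
  }
  if (matches.length > 1) {
    throw new Error(`Opp "${wanted}" matches ${matches.length} cards`);
  }
  return matches[0];
}
```

For example, with made-up titles `["Turmeric pilot", "LEEP paint collection"]`, asking for `"LEEP paint collection"` selects the second card, while asking for a name that is not on screen raises an error listing what was visible.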

---

## Run-level context

- Run: `ACE/leep-paint-collection/runs/20260506-1440/` (Drive)
- Phase 5 verdicts (incomplete): `5-qa-and-training/app-screenshot-capture_verdict.yaml`, `_verdict-shallow.yaml`
- Block doc: `5-qa-and-training/app-screenshot-capture_block.md` (Drive id `11bV4qx0TgWPETjbNw07r71Pl-Swn_xj2i7QpSynPhS8`)
- ACE plugin version: 0.13.47
- Operator: jjackson@dimagi.com
