test(e2e): tighten fake-passing assertions; add cold-start init smoke by ottovlotto · Pull Request #114 · paritytech/playground-cli

ottovlotto · 2026-05-04T13:11:26Z

Closes #112

Summary

Two-pass tighten-up of the E2E test suite, plus a new daily-cron CI job that catches the class of bug a CI runner can't see.

Pass 1 — fake-passing assertions

Several tests passed without exercising the path their name claimed:

/checking availability|deploy/i always matched because deploy is the command name itself.
exitCode-only assertions on failure paths — any unrelated crash satisfied them.
Generic word matches (/not found/i, /signer|init|log.?in/i) caught help text and stack traces, not the specific failure under test.

Fixed across deploy, mod, diagnostic, init, session to use literal checkpoint strings (Checking availability, No signer available, not found in registry, etc.) and pair output checks with exit-code checks.

Pass 2 — adversarial QA review

dot update now asserts on the actual outcome line, not just exit 0 (a silent no-op was previously invisible).
build tests now stage fixtures into a tempdir, removing a cross-file order dependency on frontend-only/dist. Error-path assertions tightened to exact strings.
init no-session / corrupted-session tests now fail loudly if init falls into the "Login skipped" branch instead of silently accepting it. Corrupt-session test additionally asserts the file on disk is unchanged.
diagnostic verbose-mode tests now assert on the verbose-only markers ([+<seconds>s] and [mem +<seconds>s]) so a regression where the env var is silently ignored is caught.
deploy:
- end-to-end test queries the registry independently and verifies the on-chain metadata CID matches what the CLI printed.
- re-deploy test mutates build output between deploys and asserts the second deploy produced a different CID — a silent no-op re-deploy would otherwise still print "Deploy complete" and pass.
- "availability runs before build/upload" now verifies the temporal ordering claimed in its name.
- multi-contract test renamed (the old name claimed plurality detection, but headless output doesn't surface per-contract names).
globalSetup SIGNER funding failure now throws by default. Set E2E_ALLOW_OFFLINE_SETUP=1 to opt back into lenient behaviour for local dev on a flaky network.

Cold-start init smoke job (new CI job, daily cron only)

The regular e2e job runs on a runner with rustup, IPFS, foundry already on PATH — so init's install code path is never exercised. paritytech/playground-app#118 (newly-installed rustup unreachable from the same init process) is exactly the class of bug this gap masks.

The new init-cold-smoke job runs dot init -y inside ubuntu:22.04 with nothing pre-installed beyond what's needed to bootstrap bun + pnpm. If init reaches "setup complete" with no failed rows, every tool was both installed AND reachable to the next step in the same process. Daily cron + manual dispatch only because it pulls full toolchains and takes several minutes.

Test-only PR + workflow change — no changeset needed.

Follow-ups (separate PRs)

Add Layer 2 E2E tests: session injection for phone-mode signing #50 — Layer 2 phone-mode session tests, still blocked on @polkadot-apps/terminal publishing the merged createTestSession helper. Two existing init/session assertions in this PR will be tightened further once that lands.
refactor(toolchain): extract post-install path-update logic for unit testability #115: refactor src/utils/toolchain.ts to expose post-install path-update logic as a pure function so unit tests can catch [BUG] Apps uploaded as moddable with private repos trigger failure #118-class regressions in milliseconds, complementing the daily container test.

Test plan

npx tsc --noEmit clean
Per-PR e2e job green
Manual workflow_dispatch of init-cold-smoke job green before merge

github-actions · 2026-05-04T14:48:15Z

Dev build ready — try this branch:

curl -fsSL https://raw.githubusercontent.com/paritytech/playground-cli/main/install.sh | VERSION=dev/becca/e2e-fixed-domains bash

github-actions · 2026-05-04T15:06:40Z

E2E Test Pass · ✅ PASS

Tag: e2e-ci-pr · Branch: becca/e2e-fixed-domains · Commit: 8a80ee5 · Run logs

Cell	Result	Time
`pr-preflight`	✅ PASS	2m29s
`pr-init-session`	✅ PASS	1m54s
`pr-deploy-frontend`	✅ PASS	8m10s
`pr-install`	✅ PASS	0m48s
`pr-deploy-foundry`	✅ PASS	1m56s
`pr-mod`	✅ PASS	1m33s
`${{ matrix.cell }}`	⏭️ SKIP	0m-1s
`${{ matrix.cell }}`	⏭️ SKIP	0m-1s

Sentry traces: view spans for this run

* fix(mod,deploy): reject private repos with a clear error message - dot deploy --modable now calls assertPublicGitHubRepo during preflight and throws ModablePreflightError for private or missing GitHub repos, preventing the obscure downstream failure - resolveDefaultBranch in dot mod detects 401/404 API responses and surfaces 'private or does not exist' instead of the misleading 'pin one in metadata.branch' hint - Covers both paths with unit tests * chore: fix biome formatting

Several tests were passing without actually exercising the path their name claimed. The common patterns: - Loose regexes like `/checking availability|deploy/i` always matched because "deploy" is the command name itself, so the test passed even when nothing of substance ran. - exitCode-only assertions on failure paths — any unrelated crash that produced a non-zero exit satisfied the check. - Generic word matches (`/not found/i`, `/signer|init|log.?in/i`) that could match help text, stack traces, or transient gateway errors rather than the specific failure mode under test. Fixes per file: - deploy.test.ts: replaced `/checking availability|deploy/i` with the literal "Checking availability" checkpoint (only printed after preflight succeeds). Tightened the mainnet, --contracts-no-project, and "domain taken by another account" tests with exit-code assertions and exact error fragments. - mod.test.ts: registry-miss now asserts the exact "not found in registry" wording plus the domain; signer-suggestion asserts the exact "No signer available" / "dot init" text rather than matching any of those words anywhere. - diagnostic.test.ts: DOT_DEPLOY_VERBOSE test switched off ALICE (unfunded → would abort before producing verbose output) onto SIGNER + --no-build so it actually reaches the availability check. - session.test.ts: "build does not touch sessions" now verifies both builds returned exit 0 (a crashed build can't write any file, so the previous version was tautological); "logout with no session" now asserts exitCode === 0. - init.test.ts: dropped `-y` from the session-detection tests — with -y, init skips the connect()/login block entirely, so the tests were really just verifying that toolchain installation in a fresh tempHome takes longer than 15s. Now they assert on the actual "Scan with the Polkadot mobile app" / "Login skipped" output. Added a new `dot init — toolchain detection` test that strips rustup/cargo/foundry from PATH and verifies init does NOT report ✓ rustup — catches detection regressions even though CI runners have toolchains pre-installed.

Second pass on the same fake-passing class of bug, this time catching weaknesses the previous tighten-up missed: - install.test.ts: `dot update` now asserts on the actual outcome line ("already on latest" / "Updated dot to") instead of just exitCode 0. A silent no-op update was previously indistinguishable from a real one. - build.test.ts: each test stages its fixture into a tempdir and runs the build there, instead of mutating the source-controlled fixture's dist/. This removes a hidden cross-file order dependency where build tests would delete the dist that later deploy --no-build tests needed. Error-path assertions tightened to the exact strings the fixture/runner emit ("Compile error: unexpected token", "No build strategy detected") rather than /error|fail|detect/. - session.test.ts: tightened "no signer" / "logout" assertions to the exact "No signer available" / "No account is signed in" strings. - init.test.ts: the no-session and corrupted-session tests now fail loudly when init falls into the "Login skipped" branch (login service unreachable from the runner) instead of silently accepting it as a pass — which was masking the QR path's testability gap. Corrupted-session test additionally asserts the corrupt file is unchanged on disk, catching a hypothetical regression where init "fixes" parse failures by deleting user data. - diagnostic.test.ts: both verbose-mode tests now assert on a verbose-only marker that proves the env var actually engaged its code path (`[+<seconds>s]` for DOT_DEPLOY_VERBOSE, `[mem +<seconds>s]` for DOT_MEMORY_TRACE). Previously the tests could pass with the env var silently ignored. - deploy.test.ts: - "frontend-only deploy completes end-to-end" now queries the registry independently after deploy and verifies the on-chain metadata CID matches what the CLI claimed it published. - "re-deploy same domain" now mutates the build output between the two deploys and asserts the second deploy produced a *different* metadata CID. A silently-no-op re-deploy would otherwise still print "Deploy complete" and pass. - "domain availability check runs before build/upload" now verifies the temporal ordering claimed in its name — availability banner must precede any build-runner header or storage-phase marker. - Multi-contract test renamed; it cannot prove plurality from headless output (logHeadlessEvent doesn't surface contract names), so the old name was misleading. - e2e/cli/setup/global.ts: SIGNER funding failure now throws by default with a clear message, instead of swallowing into a console.warn that let downstream deploy tests fail with cryptic "Invalid Payment" errors. Set E2E_ALLOW_OFFLINE_SETUP=1 to opt back into the lenient behaviour for local dev on a flaky network.

- Thread signer.address as defaultOrigin through resolveLiveContractAddresses so the CDM meta-registry lookup that resolves the live playground registry address uses the logged-in account. Previously fell through to the Alice dev fallback in @polkadot-apps/contracts and printed `[contracts] No origin configured — using dev fallback (Alice) for query dry-run` even when the user was signed in via dot init. - Lazy-probe the picked app's repository via assertPublicGitHubRepo() between picker dismount and SetupScreen mount. Catches the case where a publisher flips a repo to private after deploying as --modable; user gets a clean two-line console message instead of a mid-step StepRunner failure. The direct-domain path (`dot mod some-domain.dot`) still falls through to resolveDefaultBranch's existing 404/401 message. - Annotate the cli.mod.repo-check span with `cli.mod.repo` so Sentry dashboards can group failures by repository.

The regular e2e job runs on a GitHub Actions runner that already has rustup, Bun, IPFS, and friends on PATH. As a result, init's toolchain-install code path is never exercised in CI — regressions that only surface on a fresh user machine (e.g., paritytech/ playground-app#118, where newly-installed rustup wasn't reachable from the same init process) pass every PR silently. Add a sibling job that runs `dot init -y` inside ubuntu:22.04 with nothing pre-installed beyond what's strictly needed to bootstrap bun and pnpm. If init reaches "setup complete" with no failed dependency rows, every tool was both installed AND reachable to the next step in the same process — which is the exact contract #118 broke. Excluded from per-PR runs because it pulls full toolchains over the network and takes several minutes. Daily cron + manual dispatch only.

fix(mod): silence false origin warning and verify repo before download

Bring in 19 commits of e2e infrastructure work that landed since this branch was forked, including: - PR #68: dynamic registry contract address resolution at runtime - PR #78: e2e fixture fix to use the same live address (otherwise our registry-state assertions would query the snapshot contract while deploys write to the live one) - PR #113: rustup post-install path patching — the actual fix for the paritytech/playground-app#118 bug our cold-start smoke job was added to catch - Phase 5/6/7/8 e2e workflow restructuring (matrix-based cells, surface-failure helpers, post-release smoke, sticky PR comments) Conflict resolution notes: - e2e/cli/deploy.test.ts: kept main's runContractDeployTest helper + the new full-pipeline contract describes (foundry/hardhat/multi). Re-applied my QA tightening to the preflight tests (mainnet exact text + exit code, contracts-detected use the 'Checking availability' checkpoint, --contracts-no-project asserts exit code, availability- before-build verifies temporal ordering). Added registry-state verification to the frontend-only and re-deploy tests via the new getApp() fixture. DROPPED my 'already registered' tightening on the domain-taken test — in dev mode the availability check defers ownership to bulletin-deploy's preflight, so the rejection comes from the registry contract revert (Revive.ContractReverted), not the friendly availability message. Main's loose regex catches both paths correctly. - e2e/cli/mod.test.ts: kept main's split into 'dot mod — clone' and 'dot mod — registry miss' describes. Re-applied my exact-text tightening to both tests. - .github/workflows/e2e.yml: kept main's matrix-cell structure entirely, appended my init-cold-smoke job at the end (still daily cron + workflow_dispatch only).

… test - Bump init session-detection timeouts 15s → 25s. The connect()/QR flow can take >15s when the statement-store endpoint is slow, producing flaky 'expected QR prompt' failures even though the CLI behaviour is correct. - Pass `-y` to the toolchain-detection test (was reaching SIGTERM before the TUI rendered) so the dependency table renders within ~1s. The 12s execa timeout still terminates well before any real rustup install completes — exactly the window where we should see '· rustup' / '⠋ rustup', not '✓ rustup'. Verified locally: full offline-eligible subset (init, install, build, session) passes with E2E_ALLOW_OFFLINE_SETUP=1.

The previous version mutated `dist/index.html` between the two deploys and asserted the second deploy produced a different metadata CID. That assumption was wrong: `buildMetadata()` in src/utils/deploy/playground.ts only puts `{repository, readme}` into the metadata JSON — the deployed content's CID is NOT included. Mutating dist therefore had no effect on the metadata CID, both deploys produced the same value, and the test failed in CI's pr-deploy-frontend cell. Worse, the regression I was guarding against ("second deploy silently no-ops") isn't actually a regression: a same-CID re-publish is correct behaviour — the registry already has what the user wants. Keep the meaningful checks: exit 0, "Deploy complete", and an independent registry query that verifies the on-chain entry contains the CID the CLI claimed it published. That alone rules out the only real failure mode (CLI prints success but never sent the extrinsic).

The previous version of this test verified `exitCode === 0` and that a slug-prefix directory existed with a `package.json`. That assertion would pass even if `dot mod` invented an arbitrary `package.json` from thin air and silently skipped both setup.sh and the side effects of the source-download step. `dot mod`'s flow has three steps (see src/commands/mod/SetupScreen.tsx): 1. fetch app metadata 2. download source — extract tarball + writeDotJson + ignoreModLogs + stripPostinstall + createOptionalGitBaseline 3. run setup.sh (or warn if absent) Steps 2's side effects and step 3 weren't observed at all. Strengthen the test to assert on: - multiple specific files from the upstream paritytech/Rock-Paper- Scissors fixture (README.md, setup.sh, src/, vite.config.ts) — proves the GitHub tarball was actually fetched and extracted, not invented. - dot.json was written with the expected `domain` (relative slug-suffix path, per writeDotJson()) and `name` fields. - .gitignore contains the mod-log entries appended by ignoreModLogs(). - .dot-mod-setup.log exists with `[setup]`-prefixed content from the fixture's setup.sh — proves step 3 ran, not just that the file was copied across. Bumped the per-test timeout 180s → 240s to give setup.sh's `npm install` enough headroom on a cold cache. Same fixture, no new registration needed.

Hides the e2e-cli-* fixture domains from the public playground registry grid (visibility=0). Tests still publish and verify exactly as before; SIGNER still sees them under My Apps. Cleans up new test traffic only — the existing 9 visible entries need a one-off setVisibility flip.

- init-cold-smoke: switch runs-on to the parity-default conditional expression added in #120 (was the only ubuntu-latest literal left in the file). Tighten the failed-row grep: the previous " error" / " failed" substring match would also trip on benign output like "0 errors" and apt warnings; now anchor on "✕ "/"✗ " rendered by the dot TUI plus the literal "failed dependency" marker. Add a comment confirming `bun run src/index.ts` is intentional — it matches the entry path used by the e2e/cli/ vitest suite (per CLAUDE.md), and the published-binary install is covered separately by e2e-post-release.yml. - deploy.test.ts ordering check: the previous regex /\n>\s+\w/ also matches build-tool stdout that the CLI pipes through (pnpm prints `> @scope/pkg@1 build /path` for every run-script, vite logs `> built in 234ms`, etc.). Anchor on the dot CLI's exact banner format `\n> (pnpm|npm|yarn|bun|npx) ` from src/commands/build.ts:15 + src/utils/build/detect.ts so unrelated `> ` lines can't satisfy it. - init.test.ts toolchain detection: drop the timing-based 12s window in favour of forcing the rustup install to fail deterministically via HTTPS_PROXY=http://127.0.0.1:1 (curl's connection-refused is ~instant). The test's contract is that init should not falsely conclude rustup is installed when PATH is stripped — that no longer depends on whether a cache makes the real install fast. Cold-start install success is covered separately by the smoke job.

… just patterns The previous version stripped any PATH segment matching /cargo/, /rustup/, or /foundry/. That works on standard GitHub runners (rustup at ~/.cargo/bin) but fails on parity-default, where the rustup binary lives at a path that doesn't contain any of those tokens (e.g. /usr/local/bin). Result: commandExists("rustup") in init returns true even with PATH "stripped", and the test asserts ✕ rustup but observes ✓ rustup. Resolve each toolchain binary's actual location(s) via `which -a` up front and strip exactly those parent dirs. Keep the original pattern-strip as a belt-and-braces fallback for related dirs that don't contain a binary directly (e.g. an empty ~/.cargo/bin on a clean home).

The "detects rustup as missing when not on PATH" test failed in CI in a way that no PATH-strip approach can fix on this runner. Pattern: with PATH stripped of every directory we can resolve, `command -v rustup` inside init STILL returned 0 — but actually executing `rustup ...` returned 127 (command not found). That signature is consistent with parity-default having a rustup wrapper script in /usr/bin (or similar shared system path) that delegates into $HOME/.cargo/bin/rustup. We set HOME to a clean tempHome so the delegation target doesn't exist — the wrapper is found but exec fails. We can't strip /usr/bin without breaking sibling binaries init legitimately needs. The cold-start smoke job in .github/workflows/e2e.yml runs `dot init` in a fresh ubuntu:22.04 with no toolchain pre-installed. That's the correct environment to assert detection + install-then-use against, and it's already wired up (daily cron + workflow_dispatch). Drop the per-PR test and update the file's KNOWN GAP comment to point readers to the smoke job.

charlesHetterich · 2026-05-05T15:53:12Z

+                      # every tool was installed AND was reachable to the
+                      # next step in the same process — which is the exact
+                      # path config that #118 broke.
+                      bun run src/index.ts init -y 2>&1 | tee /tmp/init.log


We shouldn't be running the test itself inside of the ci yml file. This should just minimally setup and call to a typescript e2e test

The other important note on this line is that we should call dot init through the compiled bun binary (which is the same as what an end user would download & use).

There are some errors that only show up on the compiled bun binary, but don't when when we call bun run ...

charlesHetterich · 2026-05-05T15:58:16Z

 * - Corrupted session file handling
 * - Dev signer (--suri) bypasses session requirement
+ *
+ * KNOWN GAP — toolchain install paths (rustup, IPFS, foundry, cdm) are NOT


Hm i see this is why you have the one test directly in CI. I suppose for now it could be fine to keep that one test scenario as a CI job for simplicity

charlesHetterich

LGTM, we can merge and look at improving individual test cases in the future.

On the init-cold-smoke test in CI: on every PR, we have a bot comment that goes up such as:

curl -fsSL https://raw.githubusercontent.com/paritytech/playground-cli/main/install.sh | VERSION=dev/becca/e2e-fixed-domains bash

This is what the init-cold-smoke test should use instead.

This test should run after the CI job that updates the branch version of the CLI
install the CLI inside the test using that single curl command
run the dot init command

Please update the test to follow this flow then feel free to merge

Replaces the bun + pnpm + `bun run src/index.ts` setup with the same `curl … install.sh | VERSION=dev/<branch> bash` one-liner the dev-release bot posts on each PR, so the smoke exercises the SEA binary end users actually install (catches Bun-compile-only regressions that source-mode masked) and triggers via workflow_run after Dev Release succeeds. Schedule + workflow_dispatch keep working against the latest stable tag.

ottovlotto · 2026-05-05T17:10:01Z

Addressed in a41c046:

cold-smoke now installs via the curl … install.sh | VERSION=dev/<branch> bash one-liner, so it exercises the SEA binary an end user actually downloads (also covers the compiled-binary-only-bugs point from the earlier subthread)
new workflow_run trigger on Dev Release fires the smoke after the per-PR dev/<branch> tag is published
schedule + workflow_dispatch on main keep working; with no VERSION set, install.sh resolves the latest stable tag
dropped the workspace mount + checkout step — install.sh fetches the binary on its own, no source needed in the container

ensureTemplateRegistered() previously only verified the fixture exists and threw if missing — leaving the mod test red whenever the registry contract was redeployed. The mod test is read-only, so unlike the deploy fixtures (which re-publish on every run via the deploy tests themselves), the mod fixture had no self-healing path. Now: register-if-missing using the same SIGNER fundDeployerIfLow has already topped up. Same domain, same owner, idempotent re-publish; no fresh DotNS entry is ever burned, so the registry stays clean.

Mirrors the test-side --private flag added in #114 to the bootstrap tool. Without this, re-running register-e2e-fixtures.ts re-publishes fixtures as PUBLIC and undoes the visibility flip. CLI tests that hit these domains use direct getMetadataUri queries, which are unaffected by visibility — so no test changes needed.

This was referenced May 4, 2026

refactor(toolchain): extract post-install path-update logic for unit testability #115

Open

Add Layer 2 E2E tests: session injection for phone-mode signing #50

Open

ottovlotto requested review from EnderOfWorlds007 and charlesHetterich May 4, 2026 18:53

EnderOfWorlds007 and others added 12 commits May 5, 2026 14:26

chore: version packages

567ec43

Merge pull request #121 from paritytech/fix/mod-origin-and-repo-probe

a13568e

fix(mod): silence false origin warning and verify repo before download

Merge branch 'main' into becca/e2e-fixed-domains

8bb6b97

ottovlotto force-pushed the becca/e2e-fixed-domains branch from 4fe6328 to 8bb6b97 Compare May 5, 2026 12:35

ottovlotto added 6 commits May 5, 2026 14:37

Merge branch 'main' into becca/e2e-fixed-domains

fb1bded

Merge remote-tracking branch 'origin/main' into becca/e2e-fixed-domains

017f91f

charlesHetterich reviewed May 5, 2026

View reviewed changes

charlesHetterich approved these changes May 5, 2026

View reviewed changes

ottovlotto merged commit b147280 into main May 5, 2026
18 checks passed

ottovlotto deleted the becca/e2e-fixed-domains branch May 5, 2026 18:08

ottovlotto mentioned this pull request May 6, 2026

test(e2e): register fixtures with visibility=0 (private) #133

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(e2e): tighten fake-passing assertions; add cold-start init smoke#114

test(e2e): tighten fake-passing assertions; add cold-start init smoke#114
ottovlotto merged 20 commits intomainfrom
becca/e2e-fixed-domains

ottovlotto commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

charlesHetterich May 5, 2026

Uh oh!

charlesHetterich May 5, 2026

Uh oh!

charlesHetterich May 5, 2026

Uh oh!

charlesHetterich left a comment

Uh oh!

ottovlotto commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

ottovlotto commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Pass 1 — fake-passing assertions

Pass 2 — adversarial QA review

Cold-start init smoke job (new CI job, daily cron only)

Follow-ups (separate PRs)

Test plan

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Pass · ✅ PASS

Uh oh!

charlesHetterich May 5, 2026

Choose a reason for hiding this comment

Uh oh!

charlesHetterich May 5, 2026

Choose a reason for hiding this comment

Uh oh!

charlesHetterich May 5, 2026

Choose a reason for hiding this comment

Uh oh!

charlesHetterich left a comment

Choose a reason for hiding this comment

Uh oh!

ottovlotto commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ottovlotto commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading