test(e2e): tighten fake-passing assertions; add cold-start init smoke#114
test(e2e): tighten fake-passing assertions; add cold-start init smoke#114ottovlotto merged 20 commits intomainfrom
Conversation
|
Dev build ready — try this branch: |
E2E Test Pass · ✅ PASSTag:
Sentry traces: view spans for this run |
* fix(mod,deploy): reject private repos with a clear error message - dot deploy --modable now calls assertPublicGitHubRepo during preflight and throws ModablePreflightError for private or missing GitHub repos, preventing the obscure downstream failure - resolveDefaultBranch in dot mod detects 401/404 API responses and surfaces 'private or does not exist' instead of the misleading 'pin one in metadata.branch' hint - Covers both paths with unit tests * chore: fix biome formatting
Several tests were passing without actually exercising the path their name claimed. The common patterns: - Loose regexes like `/checking availability|deploy/i` always matched because "deploy" is the command name itself, so the test passed even when nothing of substance ran. - exitCode-only assertions on failure paths — any unrelated crash that produced a non-zero exit satisfied the check. - Generic word matches (`/not found/i`, `/signer|init|log.?in/i`) that could match help text, stack traces, or transient gateway errors rather than the specific failure mode under test. Fixes per file: - deploy.test.ts: replaced `/checking availability|deploy/i` with the literal "Checking availability" checkpoint (only printed after preflight succeeds). Tightened the mainnet, --contracts-no-project, and "domain taken by another account" tests with exit-code assertions and exact error fragments. - mod.test.ts: registry-miss now asserts the exact "not found in registry" wording plus the domain; signer-suggestion asserts the exact "No signer available" / "dot init" text rather than matching any of those words anywhere. - diagnostic.test.ts: DOT_DEPLOY_VERBOSE test switched off ALICE (unfunded → would abort before producing verbose output) onto SIGNER + --no-build so it actually reaches the availability check. - session.test.ts: "build does not touch sessions" now verifies both builds returned exit 0 (a crashed build can't write any file, so the previous version was tautological); "logout with no session" now asserts exitCode === 0. - init.test.ts: dropped `-y` from the session-detection tests — with -y, init skips the connect()/login block entirely, so the tests were really just verifying that toolchain installation in a fresh tempHome takes longer than 15s. Now they assert on the actual "Scan with the Polkadot mobile app" / "Login skipped" output. Added a new `dot init — toolchain detection` test that strips rustup/cargo/foundry from PATH and verifies init does NOT report ✓ rustup — catches detection regressions even though CI runners have toolchains pre-installed.
Second pass on the same fake-passing class of bug, this time catching
weaknesses the previous tighten-up missed:
- install.test.ts: `dot update` now asserts on the actual outcome line
("already on latest" / "Updated dot to") instead of just exitCode 0.
A silent no-op update was previously indistinguishable from a real
one.
- build.test.ts: each test stages its fixture into a tempdir and runs
the build there, instead of mutating the source-controlled fixture's
dist/. This removes a hidden cross-file order dependency where build
tests would delete the dist that later deploy --no-build tests
needed. Error-path assertions tightened to the exact strings the
fixture/runner emit ("Compile error: unexpected token", "No build
strategy detected") rather than /error|fail|detect/.
- session.test.ts: tightened "no signer" / "logout" assertions to the
exact "No signer available" / "No account is signed in" strings.
- init.test.ts: the no-session and corrupted-session tests now fail
loudly when init falls into the "Login skipped" branch (login
service unreachable from the runner) instead of silently accepting
it as a pass — which was masking the QR path's testability gap.
Corrupted-session test additionally asserts the corrupt file is
unchanged on disk, catching a hypothetical regression where init
"fixes" parse failures by deleting user data.
- diagnostic.test.ts: both verbose-mode tests now assert on a
verbose-only marker that proves the env var actually engaged its
code path (`[+<seconds>s]` for DOT_DEPLOY_VERBOSE,
`[mem +<seconds>s]` for DOT_MEMORY_TRACE). Previously the tests
could pass with the env var silently ignored.
- deploy.test.ts:
- "frontend-only deploy completes end-to-end" now queries the
registry independently after deploy and verifies the on-chain
metadata CID matches what the CLI claimed it published.
- "re-deploy same domain" now mutates the build output between the
two deploys and asserts the second deploy produced a *different*
metadata CID. A silently-no-op re-deploy would otherwise still
print "Deploy complete" and pass.
- "domain availability check runs before build/upload" now verifies
the temporal ordering claimed in its name — availability banner
must precede any build-runner header or storage-phase marker.
- Multi-contract test renamed; it cannot prove plurality from
headless output (logHeadlessEvent doesn't surface contract names),
so the old name was misleading.
- e2e/cli/setup/global.ts: SIGNER funding failure now throws by default
with a clear message, instead of swallowing into a console.warn that
let downstream deploy tests fail with cryptic "Invalid Payment"
errors. Set E2E_ALLOW_OFFLINE_SETUP=1 to opt back into the lenient
behaviour for local dev on a flaky network.
- Thread signer.address as defaultOrigin through resolveLiveContractAddresses so the CDM meta-registry lookup that resolves the live playground registry address uses the logged-in account. Previously fell through to the Alice dev fallback in @polkadot-apps/contracts and printed `[contracts] No origin configured — using dev fallback (Alice) for query dry-run` even when the user was signed in via dot init. - Lazy-probe the picked app's repository via assertPublicGitHubRepo() between picker dismount and SetupScreen mount. Catches the case where a publisher flips a repo to private after deploying as --modable; user gets a clean two-line console message instead of a mid-step StepRunner failure. The direct-domain path (`dot mod some-domain.dot`) still falls through to resolveDefaultBranch's existing 404/401 message. - Annotate the cli.mod.repo-check span with `cli.mod.repo` so Sentry dashboards can group failures by repository.
The regular e2e job runs on a GitHub Actions runner that already has rustup, Bun, IPFS, and friends on PATH. As a result, init's toolchain-install code path is never exercised in CI — regressions that only surface on a fresh user machine (e.g., paritytech/ playground-app#118, where newly-installed rustup wasn't reachable from the same init process) pass every PR silently. Add a sibling job that runs `dot init -y` inside ubuntu:22.04 with nothing pre-installed beyond what's strictly needed to bootstrap bun and pnpm. If init reaches "setup complete" with no failed dependency rows, every tool was both installed AND reachable to the next step in the same process — which is the exact contract #118 broke. Excluded from per-PR runs because it pulls full toolchains over the network and takes several minutes. Daily cron + manual dispatch only.
fix(mod): silence false origin warning and verify repo before download
Bring in 19 commits of e2e infrastructure work that landed since this branch was forked, including: - PR #68: dynamic registry contract address resolution at runtime - PR #78: e2e fixture fix to use the same live address (otherwise our registry-state assertions would query the snapshot contract while deploys write to the live one) - PR #113: rustup post-install path patching — the actual fix for the paritytech/playground-app#118 bug our cold-start smoke job was added to catch - Phase 5/6/7/8 e2e workflow restructuring (matrix-based cells, surface-failure helpers, post-release smoke, sticky PR comments) Conflict resolution notes: - e2e/cli/deploy.test.ts: kept main's runContractDeployTest helper + the new full-pipeline contract describes (foundry/hardhat/multi). Re-applied my QA tightening to the preflight tests (mainnet exact text + exit code, contracts-detected use the 'Checking availability' checkpoint, --contracts-no-project asserts exit code, availability- before-build verifies temporal ordering). Added registry-state verification to the frontend-only and re-deploy tests via the new getApp() fixture. DROPPED my 'already registered' tightening on the domain-taken test — in dev mode the availability check defers ownership to bulletin-deploy's preflight, so the rejection comes from the registry contract revert (Revive.ContractReverted), not the friendly availability message. Main's loose regex catches both paths correctly. - e2e/cli/mod.test.ts: kept main's split into 'dot mod — clone' and 'dot mod — registry miss' describes. Re-applied my exact-text tightening to both tests. - .github/workflows/e2e.yml: kept main's matrix-cell structure entirely, appended my init-cold-smoke job at the end (still daily cron + workflow_dispatch only).
… test - Bump init session-detection timeouts 15s → 25s. The connect()/QR flow can take >15s when the statement-store endpoint is slow, producing flaky 'expected QR prompt' failures even though the CLI behaviour is correct. - Pass `-y` to the toolchain-detection test (was reaching SIGTERM before the TUI rendered) so the dependency table renders within ~1s. The 12s execa timeout still terminates well before any real rustup install completes — exactly the window where we should see '· rustup' / '⠋ rustup', not '✓ rustup'. Verified locally: full offline-eligible subset (init, install, build, session) passes with E2E_ALLOW_OFFLINE_SETUP=1.
The previous version mutated `dist/index.html` between the two deploys
and asserted the second deploy produced a different metadata CID. That
assumption was wrong: `buildMetadata()` in src/utils/deploy/playground.ts
only puts `{repository, readme}` into the metadata JSON — the deployed
content's CID is NOT included. Mutating dist therefore had no effect on
the metadata CID, both deploys produced the same value, and the test
failed in CI's pr-deploy-frontend cell.
Worse, the regression I was guarding against ("second deploy silently
no-ops") isn't actually a regression: a same-CID re-publish is correct
behaviour — the registry already has what the user wants.
Keep the meaningful checks: exit 0, "Deploy complete", and an
independent registry query that verifies the on-chain entry contains
the CID the CLI claimed it published. That alone rules out the only
real failure mode (CLI prints success but never sent the extrinsic).
The previous version of this test verified `exitCode === 0` and that a
slug-prefix directory existed with a `package.json`. That assertion
would pass even if `dot mod` invented an arbitrary `package.json` from
thin air and silently skipped both setup.sh and the side effects of
the source-download step.
`dot mod`'s flow has three steps (see src/commands/mod/SetupScreen.tsx):
1. fetch app metadata
2. download source — extract tarball + writeDotJson + ignoreModLogs
+ stripPostinstall + createOptionalGitBaseline
3. run setup.sh (or warn if absent)
Steps 2's side effects and step 3 weren't observed at all. Strengthen
the test to assert on:
- multiple specific files from the upstream paritytech/Rock-Paper-
Scissors fixture (README.md, setup.sh, src/, vite.config.ts) — proves
the GitHub tarball was actually fetched and extracted, not invented.
- dot.json was written with the expected `domain` (relative slug-suffix
path, per writeDotJson()) and `name` fields.
- .gitignore contains the mod-log entries appended by ignoreModLogs().
- .dot-mod-setup.log exists with `[setup]`-prefixed content from the
fixture's setup.sh — proves step 3 ran, not just that the file was
copied across.
Bumped the per-test timeout 180s → 240s to give setup.sh's `npm
install` enough headroom on a cold cache. Same fixture, no new
registration needed.
4fe6328 to
8bb6b97
Compare
Hides the e2e-cli-* fixture domains from the public playground registry grid (visibility=0). Tests still publish and verify exactly as before; SIGNER still sees them under My Apps. Cleans up new test traffic only — the existing 9 visible entries need a one-off setVisibility flip.
- init-cold-smoke: switch runs-on to the parity-default conditional expression added in #120 (was the only ubuntu-latest literal left in the file). Tighten the failed-row grep: the previous " error" / " failed" substring match would also trip on benign output like "0 errors" and apt warnings; now anchor on "✕ "/"✗ " rendered by the dot TUI plus the literal "failed dependency" marker. Add a comment confirming `bun run src/index.ts` is intentional — it matches the entry path used by the e2e/cli/ vitest suite (per CLAUDE.md), and the published-binary install is covered separately by e2e-post-release.yml. - deploy.test.ts ordering check: the previous regex /\n>\s+\w/ also matches build-tool stdout that the CLI pipes through (pnpm prints `> @scope/pkg@1 build /path` for every run-script, vite logs `> built in 234ms`, etc.). Anchor on the dot CLI's exact banner format `\n> (pnpm|npm|yarn|bun|npx) ` from src/commands/build.ts:15 + src/utils/build/detect.ts so unrelated `> ` lines can't satisfy it. - init.test.ts toolchain detection: drop the timing-based 12s window in favour of forcing the rustup install to fail deterministically via HTTPS_PROXY=http://127.0.0.1:1 (curl's connection-refused is ~instant). The test's contract is that init should not falsely conclude rustup is installed when PATH is stripped — that no longer depends on whether a cache makes the real install fast. Cold-start install success is covered separately by the smoke job.
… just patterns
The previous version stripped any PATH segment matching /cargo/, /rustup/,
or /foundry/. That works on standard GitHub runners (rustup at
~/.cargo/bin) but fails on parity-default, where the rustup binary lives
at a path that doesn't contain any of those tokens (e.g. /usr/local/bin).
Result: commandExists("rustup") in init returns true even with PATH
"stripped", and the test asserts ✕ rustup but observes ✓ rustup.
Resolve each toolchain binary's actual location(s) via `which -a` up
front and strip exactly those parent dirs. Keep the original
pattern-strip as a belt-and-braces fallback for related dirs that don't
contain a binary directly (e.g. an empty ~/.cargo/bin on a clean home).
The "detects rustup as missing when not on PATH" test failed in CI in a way that no PATH-strip approach can fix on this runner. Pattern: with PATH stripped of every directory we can resolve, `command -v rustup` inside init STILL returned 0 — but actually executing `rustup ...` returned 127 (command not found). That signature is consistent with parity-default having a rustup wrapper script in /usr/bin (or similar shared system path) that delegates into $HOME/.cargo/bin/rustup. We set HOME to a clean tempHome so the delegation target doesn't exist — the wrapper is found but exec fails. We can't strip /usr/bin without breaking sibling binaries init legitimately needs. The cold-start smoke job in .github/workflows/e2e.yml runs `dot init` in a fresh ubuntu:22.04 with no toolchain pre-installed. That's the correct environment to assert detection + install-then-use against, and it's already wired up (daily cron + workflow_dispatch). Drop the per-PR test and update the file's KNOWN GAP comment to point readers to the smoke job.
| # every tool was installed AND was reachable to the | ||
| # next step in the same process — which is the exact | ||
| # path config that #118 broke. | ||
| bun run src/index.ts init -y 2>&1 | tee /tmp/init.log |
There was a problem hiding this comment.
We shouldn't be running the test itself inside of the ci yml file. This should just minimally setup and call to a typescript e2e test
There was a problem hiding this comment.
The other important note on this line is that we should call dot init through the compiled bun binary (which is the same as what an end user would download & use).
There are some errors that only show up on the compiled bun binary, but don't when when we call bun run ...
| * - Corrupted session file handling | ||
| * - Dev signer (--suri) bypasses session requirement | ||
| * | ||
| * KNOWN GAP — toolchain install paths (rustup, IPFS, foundry, cdm) are NOT |
There was a problem hiding this comment.
Hm i see this is why you have the one test directly in CI. I suppose for now it could be fine to keep that one test scenario as a CI job for simplicity
charlesHetterich
left a comment
There was a problem hiding this comment.
LGTM, we can merge and look at improving individual test cases in the future.
On the init-cold-smoke test in CI: on every PR, we have a bot comment that goes up such as:
curl -fsSL https://raw.githubusercontent.com/paritytech/playground-cli/main/install.sh | VERSION=dev/becca/e2e-fixed-domains bash
This is what the init-cold-smoke test should use instead.
- This test should run after the CI job that updates the branch version of the CLI
- install the CLI inside the test using that single curl command
- run the dot init command
Please update the test to follow this flow then feel free to merge
Replaces the bun + pnpm + `bun run src/index.ts` setup with the same `curl … install.sh | VERSION=dev/<branch> bash` one-liner the dev-release bot posts on each PR, so the smoke exercises the SEA binary end users actually install (catches Bun-compile-only regressions that source-mode masked) and triggers via workflow_run after Dev Release succeeds. Schedule + workflow_dispatch keep working against the latest stable tag.
|
Addressed in a41c046:
|
ensureTemplateRegistered() previously only verified the fixture exists and threw if missing — leaving the mod test red whenever the registry contract was redeployed. The mod test is read-only, so unlike the deploy fixtures (which re-publish on every run via the deploy tests themselves), the mod fixture had no self-healing path. Now: register-if-missing using the same SIGNER fundDeployerIfLow has already topped up. Same domain, same owner, idempotent re-publish; no fresh DotNS entry is ever burned, so the registry stays clean.
Mirrors the test-side --private flag added in #114 to the bootstrap tool. Without this, re-running register-e2e-fixtures.ts re-publishes fixtures as PUBLIC and undoes the visibility flip. CLI tests that hit these domains use direct getMetadataUri queries, which are unaffected by visibility — so no test changes needed.
Closes #112
Summary
Two-pass tighten-up of the E2E test suite, plus a new daily-cron CI job that catches the class of bug a CI runner can't see.
Pass 1 — fake-passing assertions
Several tests passed without exercising the path their name claimed:
/checking availability|deploy/ialways matched because deploy is the command name itself./not found/i,/signer|init|log.?in/i) caught help text and stack traces, not the specific failure under test.Fixed across
deploy,mod,diagnostic,init,sessionto use literal checkpoint strings (Checking availability,No signer available,not found in registry, etc.) and pair output checks with exit-code checks.Pass 2 — adversarial QA review
dot updatenow asserts on the actual outcome line, not just exit 0 (a silent no-op was previously invisible).buildtests now stage fixtures into a tempdir, removing a cross-file order dependency onfrontend-only/dist. Error-path assertions tightened to exact strings.initno-session / corrupted-session tests now fail loudly if init falls into the "Login skipped" branch instead of silently accepting it. Corrupt-session test additionally asserts the file on disk is unchanged.diagnosticverbose-mode tests now assert on the verbose-only markers ([+<seconds>s]and[mem +<seconds>s]) so a regression where the env var is silently ignored is caught.deploy:globalSetupSIGNER funding failure now throws by default. SetE2E_ALLOW_OFFLINE_SETUP=1to opt back into lenient behaviour for local dev on a flaky network.Cold-start init smoke job (new CI job, daily cron only)
The regular
e2ejob runs on a runner with rustup, IPFS, foundry already on PATH — so init's install code path is never exercised. paritytech/playground-app#118 (newly-installed rustup unreachable from the same init process) is exactly the class of bug this gap masks.The new
init-cold-smokejob runsdot init -yinsideubuntu:22.04with nothing pre-installed beyond what's needed to bootstrap bun + pnpm. If init reaches "setup complete" with no failed rows, every tool was both installed AND reachable to the next step in the same process. Daily cron + manual dispatch only because it pulls full toolchains and takes several minutes.Test-only PR + workflow change — no changeset needed.
Follow-ups (separate PRs)
@polkadot-apps/terminalpublishing the mergedcreateTestSessionhelper. Two existing init/session assertions in this PR will be tightened further once that lands.src/utils/toolchain.tsto expose post-install path-update logic as a pure function so unit tests can catch [BUG] Apps uploaded as moddable with private repos trigger failure #118-class regressions in milliseconds, complementing the daily container test.Test plan
npx tsc --noEmitcleane2ejob greenworkflow_dispatchofinit-cold-smokejob green before merge