Skip to content

test(e2e): tighten fake-passing assertions; add cold-start init smoke#114

Merged
ottovlotto merged 20 commits intomainfrom
becca/e2e-fixed-domains
May 5, 2026
Merged

test(e2e): tighten fake-passing assertions; add cold-start init smoke#114
ottovlotto merged 20 commits intomainfrom
becca/e2e-fixed-domains

Conversation

@ottovlotto
Copy link
Copy Markdown
Collaborator

@ottovlotto ottovlotto commented May 4, 2026

Closes #112

Summary

Two-pass tighten-up of the E2E test suite, plus a new daily-cron CI job that catches the class of bug a CI runner can't see.

Pass 1 — fake-passing assertions

Several tests passed without exercising the path their name claimed:

  • /checking availability|deploy/i always matched because deploy is the command name itself.
  • exitCode-only assertions on failure paths — any unrelated crash satisfied them.
  • Generic word matches (/not found/i, /signer|init|log.?in/i) caught help text and stack traces, not the specific failure under test.

Fixed across deploy, mod, diagnostic, init, session to use literal checkpoint strings (Checking availability, No signer available, not found in registry, etc.) and pair output checks with exit-code checks.

Pass 2 — adversarial QA review

  • dot update now asserts on the actual outcome line, not just exit 0 (a silent no-op was previously invisible).
  • build tests now stage fixtures into a tempdir, removing a cross-file order dependency on frontend-only/dist. Error-path assertions tightened to exact strings.
  • init no-session / corrupted-session tests now fail loudly if init falls into the "Login skipped" branch instead of silently accepting it. Corrupt-session test additionally asserts the file on disk is unchanged.
  • diagnostic verbose-mode tests now assert on the verbose-only markers ([+<seconds>s] and [mem +<seconds>s]) so a regression where the env var is silently ignored is caught.
  • deploy:
    • end-to-end test queries the registry independently and verifies the on-chain metadata CID matches what the CLI printed.
    • re-deploy test mutates build output between deploys and asserts the second deploy produced a different CID — a silent no-op re-deploy would otherwise still print "Deploy complete" and pass.
    • "availability runs before build/upload" now verifies the temporal ordering claimed in its name.
    • multi-contract test renamed (the old name claimed plurality detection, but headless output doesn't surface per-contract names).
  • globalSetup SIGNER funding failure now throws by default. Set E2E_ALLOW_OFFLINE_SETUP=1 to opt back into lenient behaviour for local dev on a flaky network.

Cold-start init smoke job (new CI job, daily cron only)

The regular e2e job runs on a runner with rustup, IPFS, foundry already on PATH — so init's install code path is never exercised. paritytech/playground-app#118 (newly-installed rustup unreachable from the same init process) is exactly the class of bug this gap masks.

The new init-cold-smoke job runs dot init -y inside ubuntu:22.04 with nothing pre-installed beyond what's needed to bootstrap bun + pnpm. If init reaches "setup complete" with no failed rows, every tool was both installed AND reachable to the next step in the same process. Daily cron + manual dispatch only because it pulls full toolchains and takes several minutes.

Test-only PR + workflow change — no changeset needed.

Follow-ups (separate PRs)

Test plan

  • npx tsc --noEmit clean
  • Per-PR e2e job green
  • Manual workflow_dispatch of init-cold-smoke job green before merge

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Dev build ready — try this branch:

curl -fsSL https://raw.githubusercontent.com/paritytech/playground-cli/main/install.sh | VERSION=dev/becca/e2e-fixed-domains bash

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

E2E Test Pass · ✅ PASS

Tag: e2e-ci-pr · Branch: becca/e2e-fixed-domains · Commit: 8a80ee5 · Run logs

Cell Result Time
pr-preflight ✅ PASS 2m29s
pr-init-session ✅ PASS 1m54s
pr-deploy-frontend ✅ PASS 8m10s
pr-install ✅ PASS 0m48s
pr-deploy-foundry ✅ PASS 1m56s
pr-mod ✅ PASS 1m33s
${{ matrix.cell }} ⏭️ SKIP 0m-1s
${{ matrix.cell }} ⏭️ SKIP 0m-1s

Sentry traces: view spans for this run

EnderOfWorlds007 and others added 12 commits May 5, 2026 14:26
* fix(mod,deploy): reject private repos with a clear error message

- dot deploy --modable now calls assertPublicGitHubRepo during preflight
  and throws ModablePreflightError for private or missing GitHub repos,
  preventing the obscure downstream failure
- resolveDefaultBranch in dot mod detects 401/404 API responses and
  surfaces 'private or does not exist' instead of the misleading
  'pin one in metadata.branch' hint
- Covers both paths with unit tests

* chore: fix biome formatting
Several tests were passing without actually exercising the path their
name claimed. The common patterns:

- Loose regexes like `/checking availability|deploy/i` always matched
  because "deploy" is the command name itself, so the test passed even
  when nothing of substance ran.
- exitCode-only assertions on failure paths — any unrelated crash that
  produced a non-zero exit satisfied the check.
- Generic word matches (`/not found/i`, `/signer|init|log.?in/i`) that
  could match help text, stack traces, or transient gateway errors
  rather than the specific failure mode under test.

Fixes per file:

- deploy.test.ts: replaced `/checking availability|deploy/i` with the
  literal "Checking availability" checkpoint (only printed after
  preflight succeeds). Tightened the mainnet, --contracts-no-project,
  and "domain taken by another account" tests with exit-code
  assertions and exact error fragments.

- mod.test.ts: registry-miss now asserts the exact "not found in
  registry" wording plus the domain; signer-suggestion asserts the
  exact "No signer available" / "dot init" text rather than matching
  any of those words anywhere.

- diagnostic.test.ts: DOT_DEPLOY_VERBOSE test switched off ALICE
  (unfunded → would abort before producing verbose output) onto
  SIGNER + --no-build so it actually reaches the availability check.

- session.test.ts: "build does not touch sessions" now verifies both
  builds returned exit 0 (a crashed build can't write any file, so the
  previous version was tautological); "logout with no session" now
  asserts exitCode === 0.

- init.test.ts: dropped `-y` from the session-detection tests — with
  -y, init skips the connect()/login block entirely, so the tests
  were really just verifying that toolchain installation in a fresh
  tempHome takes longer than 15s. Now they assert on the actual
  "Scan with the Polkadot mobile app" / "Login skipped" output.
  Added a new `dot init — toolchain detection` test that strips
  rustup/cargo/foundry from PATH and verifies init does NOT report
  ✓ rustup — catches detection regressions even though CI runners
  have toolchains pre-installed.
Second pass on the same fake-passing class of bug, this time catching
weaknesses the previous tighten-up missed:

- install.test.ts: `dot update` now asserts on the actual outcome line
  ("already on latest" / "Updated dot to") instead of just exitCode 0.
  A silent no-op update was previously indistinguishable from a real
  one.

- build.test.ts: each test stages its fixture into a tempdir and runs
  the build there, instead of mutating the source-controlled fixture's
  dist/. This removes a hidden cross-file order dependency where build
  tests would delete the dist that later deploy --no-build tests
  needed. Error-path assertions tightened to the exact strings the
  fixture/runner emit ("Compile error: unexpected token", "No build
  strategy detected") rather than /error|fail|detect/.

- session.test.ts: tightened "no signer" / "logout" assertions to the
  exact "No signer available" / "No account is signed in" strings.

- init.test.ts: the no-session and corrupted-session tests now fail
  loudly when init falls into the "Login skipped" branch (login
  service unreachable from the runner) instead of silently accepting
  it as a pass — which was masking the QR path's testability gap.
  Corrupted-session test additionally asserts the corrupt file is
  unchanged on disk, catching a hypothetical regression where init
  "fixes" parse failures by deleting user data.

- diagnostic.test.ts: both verbose-mode tests now assert on a
  verbose-only marker that proves the env var actually engaged its
  code path (`[+<seconds>s]` for DOT_DEPLOY_VERBOSE,
  `[mem +<seconds>s]` for DOT_MEMORY_TRACE). Previously the tests
  could pass with the env var silently ignored.

- deploy.test.ts:
  - "frontend-only deploy completes end-to-end" now queries the
    registry independently after deploy and verifies the on-chain
    metadata CID matches what the CLI claimed it published.
  - "re-deploy same domain" now mutates the build output between the
    two deploys and asserts the second deploy produced a *different*
    metadata CID. A silently-no-op re-deploy would otherwise still
    print "Deploy complete" and pass.
  - "domain availability check runs before build/upload" now verifies
    the temporal ordering claimed in its name — availability banner
    must precede any build-runner header or storage-phase marker.
  - Multi-contract test renamed; it cannot prove plurality from
    headless output (logHeadlessEvent doesn't surface contract names),
    so the old name was misleading.

- e2e/cli/setup/global.ts: SIGNER funding failure now throws by default
  with a clear message, instead of swallowing into a console.warn that
  let downstream deploy tests fail with cryptic "Invalid Payment"
  errors. Set E2E_ALLOW_OFFLINE_SETUP=1 to opt back into the lenient
  behaviour for local dev on a flaky network.
- Thread signer.address as defaultOrigin through resolveLiveContractAddresses
  so the CDM meta-registry lookup that resolves the live playground registry
  address uses the logged-in account. Previously fell through to the
  Alice dev fallback in @polkadot-apps/contracts and printed
  `[contracts] No origin configured — using dev fallback (Alice) for
  query dry-run` even when the user was signed in via dot init.

- Lazy-probe the picked app's repository via assertPublicGitHubRepo()
  between picker dismount and SetupScreen mount. Catches the case where
  a publisher flips a repo to private after deploying as --modable; user
  gets a clean two-line console message instead of a mid-step
  StepRunner failure. The direct-domain path (`dot mod some-domain.dot`)
  still falls through to resolveDefaultBranch's existing 404/401 message.

- Annotate the cli.mod.repo-check span with `cli.mod.repo` so Sentry
  dashboards can group failures by repository.
The regular e2e job runs on a GitHub Actions runner that already has
rustup, Bun, IPFS, and friends on PATH. As a result, init's
toolchain-install code path is never exercised in CI — regressions
that only surface on a fresh user machine (e.g., paritytech/
playground-app#118, where newly-installed rustup wasn't reachable
from the same init process) pass every PR silently.

Add a sibling job that runs `dot init -y` inside ubuntu:22.04 with
nothing pre-installed beyond what's strictly needed to bootstrap bun
and pnpm. If init reaches "setup complete" with no failed dependency
rows, every tool was both installed AND reachable to the next step
in the same process — which is the exact contract #118 broke.

Excluded from per-PR runs because it pulls full toolchains over the
network and takes several minutes. Daily cron + manual dispatch only.
fix(mod): silence false origin warning and verify repo before download
Bring in 19 commits of e2e infrastructure work that landed since this
branch was forked, including:

- PR #68: dynamic registry contract address resolution at runtime
- PR #78: e2e fixture fix to use the same live address (otherwise our
  registry-state assertions would query the snapshot contract while
  deploys write to the live one)
- PR #113: rustup post-install path patching — the actual fix for the
  paritytech/playground-app#118 bug our cold-start smoke job was added
  to catch
- Phase 5/6/7/8 e2e workflow restructuring (matrix-based cells,
  surface-failure helpers, post-release smoke, sticky PR comments)

Conflict resolution notes:

- e2e/cli/deploy.test.ts: kept main's runContractDeployTest helper +
  the new full-pipeline contract describes (foundry/hardhat/multi).
  Re-applied my QA tightening to the preflight tests (mainnet exact
  text + exit code, contracts-detected use the 'Checking availability'
  checkpoint, --contracts-no-project asserts exit code, availability-
  before-build verifies temporal ordering). Added registry-state
  verification to the frontend-only and re-deploy tests via the new
  getApp() fixture. DROPPED my 'already registered' tightening on the
  domain-taken test — in dev mode the availability check defers
  ownership to bulletin-deploy's preflight, so the rejection comes
  from the registry contract revert (Revive.ContractReverted), not
  the friendly availability message. Main's loose regex catches both
  paths correctly.

- e2e/cli/mod.test.ts: kept main's split into 'dot mod — clone' and
  'dot mod — registry miss' describes. Re-applied my exact-text
  tightening to both tests.

- .github/workflows/e2e.yml: kept main's matrix-cell structure
  entirely, appended my init-cold-smoke job at the end (still daily
  cron + workflow_dispatch only).
… test

- Bump init session-detection timeouts 15s → 25s. The connect()/QR
  flow can take >15s when the statement-store endpoint is slow,
  producing flaky 'expected QR prompt' failures even though the
  CLI behaviour is correct.
- Pass `-y` to the toolchain-detection test (was reaching SIGTERM
  before the TUI rendered) so the dependency table renders within
  ~1s. The 12s execa timeout still terminates well before any real
  rustup install completes — exactly the window where we should see
  '· rustup' / '⠋ rustup', not '✓ rustup'.

Verified locally: full offline-eligible subset (init, install, build,
session) passes with E2E_ALLOW_OFFLINE_SETUP=1.
The previous version mutated `dist/index.html` between the two deploys
and asserted the second deploy produced a different metadata CID. That
assumption was wrong: `buildMetadata()` in src/utils/deploy/playground.ts
only puts `{repository, readme}` into the metadata JSON — the deployed
content's CID is NOT included. Mutating dist therefore had no effect on
the metadata CID, both deploys produced the same value, and the test
failed in CI's pr-deploy-frontend cell.

Worse, the regression I was guarding against ("second deploy silently
no-ops") isn't actually a regression: a same-CID re-publish is correct
behaviour — the registry already has what the user wants.

Keep the meaningful checks: exit 0, "Deploy complete", and an
independent registry query that verifies the on-chain entry contains
the CID the CLI claimed it published. That alone rules out the only
real failure mode (CLI prints success but never sent the extrinsic).
The previous version of this test verified `exitCode === 0` and that a
slug-prefix directory existed with a `package.json`. That assertion
would pass even if `dot mod` invented an arbitrary `package.json` from
thin air and silently skipped both setup.sh and the side effects of
the source-download step.

`dot mod`'s flow has three steps (see src/commands/mod/SetupScreen.tsx):

  1. fetch app metadata
  2. download source — extract tarball + writeDotJson + ignoreModLogs
     + stripPostinstall + createOptionalGitBaseline
  3. run setup.sh (or warn if absent)

Steps 2's side effects and step 3 weren't observed at all. Strengthen
the test to assert on:

- multiple specific files from the upstream paritytech/Rock-Paper-
  Scissors fixture (README.md, setup.sh, src/, vite.config.ts) — proves
  the GitHub tarball was actually fetched and extracted, not invented.
- dot.json was written with the expected `domain` (relative slug-suffix
  path, per writeDotJson()) and `name` fields.
- .gitignore contains the mod-log entries appended by ignoreModLogs().
- .dot-mod-setup.log exists with `[setup]`-prefixed content from the
  fixture's setup.sh — proves step 3 ran, not just that the file was
  copied across.

Bumped the per-test timeout 180s → 240s to give setup.sh's `npm
install` enough headroom on a cold cache. Same fixture, no new
registration needed.
@ottovlotto ottovlotto force-pushed the becca/e2e-fixed-domains branch from 4fe6328 to 8bb6b97 Compare May 5, 2026 12:35
ottovlotto added 6 commits May 5, 2026 14:37
Hides the e2e-cli-* fixture domains from the public playground registry
grid (visibility=0). Tests still publish and verify exactly as before;
SIGNER still sees them under My Apps. Cleans up new test traffic only —
the existing 9 visible entries need a one-off setVisibility flip.
- init-cold-smoke: switch runs-on to the parity-default conditional
  expression added in #120 (was the only ubuntu-latest literal left in
  the file). Tighten the failed-row grep: the previous " error" /
  " failed" substring match would also trip on benign output like
  "0 errors" and apt warnings; now anchor on "✕ "/"✗ " rendered by
  the dot TUI plus the literal "failed dependency" marker. Add a
  comment confirming `bun run src/index.ts` is intentional — it
  matches the entry path used by the e2e/cli/ vitest suite (per
  CLAUDE.md), and the published-binary install is covered separately
  by e2e-post-release.yml.

- deploy.test.ts ordering check: the previous regex /\n>\s+\w/ also
  matches build-tool stdout that the CLI pipes through (pnpm prints
  `> @scope/pkg@1 build /path` for every run-script, vite logs
  `> built in 234ms`, etc.). Anchor on the dot CLI's exact banner
  format `\n> (pnpm|npm|yarn|bun|npx) ` from src/commands/build.ts:15
  + src/utils/build/detect.ts so unrelated `> ` lines can't satisfy
  it.

- init.test.ts toolchain detection: drop the timing-based 12s window
  in favour of forcing the rustup install to fail deterministically
  via HTTPS_PROXY=http://127.0.0.1:1 (curl's connection-refused is
  ~instant). The test's contract is that init should not falsely
  conclude rustup is installed when PATH is stripped — that no
  longer depends on whether a cache makes the real install fast.
  Cold-start install success is covered separately by the smoke job.
… just patterns

The previous version stripped any PATH segment matching /cargo/, /rustup/,
or /foundry/. That works on standard GitHub runners (rustup at
~/.cargo/bin) but fails on parity-default, where the rustup binary lives
at a path that doesn't contain any of those tokens (e.g. /usr/local/bin).
Result: commandExists("rustup") in init returns true even with PATH
"stripped", and the test asserts ✕ rustup but observes ✓ rustup.

Resolve each toolchain binary's actual location(s) via `which -a` up
front and strip exactly those parent dirs. Keep the original
pattern-strip as a belt-and-braces fallback for related dirs that don't
contain a binary directly (e.g. an empty ~/.cargo/bin on a clean home).
The "detects rustup as missing when not on PATH" test failed in CI in a
way that no PATH-strip approach can fix on this runner. Pattern: with
PATH stripped of every directory we can resolve, `command -v rustup`
inside init STILL returned 0 — but actually executing `rustup ...`
returned 127 (command not found). That signature is consistent with
parity-default having a rustup wrapper script in /usr/bin (or similar
shared system path) that delegates into $HOME/.cargo/bin/rustup. We
set HOME to a clean tempHome so the delegation target doesn't exist —
the wrapper is found but exec fails. We can't strip /usr/bin without
breaking sibling binaries init legitimately needs.

The cold-start smoke job in .github/workflows/e2e.yml runs `dot init`
in a fresh ubuntu:22.04 with no toolchain pre-installed. That's the
correct environment to assert detection + install-then-use against, and
it's already wired up (daily cron + workflow_dispatch). Drop the
per-PR test and update the file's KNOWN GAP comment to point readers
to the smoke job.
Comment thread .github/workflows/e2e.yml Outdated
# every tool was installed AND was reachable to the
# next step in the same process — which is the exact
# path config that #118 broke.
bun run src/index.ts init -y 2>&1 | tee /tmp/init.log
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be running the test itself inside of the ci yml file. This should just minimally setup and call to a typescript e2e test

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other important note on this line is that we should call dot init through the compiled bun binary (which is the same as what an end user would download & use).

There are some errors that only show up on the compiled bun binary, but don't when when we call bun run ...

Comment thread e2e/cli/init.test.ts
* - Corrupted session file handling
* - Dev signer (--suri) bypasses session requirement
*
* KNOWN GAP — toolchain install paths (rustup, IPFS, foundry, cdm) are NOT
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm i see this is why you have the one test directly in CI. I suppose for now it could be fine to keep that one test scenario as a CI job for simplicity

Copy link
Copy Markdown
Collaborator

@charlesHetterich charlesHetterich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we can merge and look at improving individual test cases in the future.

On the init-cold-smoke test in CI: on every PR, we have a bot comment that goes up such as:

curl -fsSL https://raw.githubusercontent.com/paritytech/playground-cli/main/install.sh | VERSION=dev/becca/e2e-fixed-domains bash

This is what the init-cold-smoke test should use instead.

  1. This test should run after the CI job that updates the branch version of the CLI
  2. install the CLI inside the test using that single curl command
  3. run the dot init command

Please update the test to follow this flow then feel free to merge

Replaces the bun + pnpm + `bun run src/index.ts` setup with the same
`curl … install.sh | VERSION=dev/<branch> bash` one-liner the dev-release
bot posts on each PR, so the smoke exercises the SEA binary end users
actually install (catches Bun-compile-only regressions that source-mode
masked) and triggers via workflow_run after Dev Release succeeds.
Schedule + workflow_dispatch keep working against the latest stable tag.
@ottovlotto
Copy link
Copy Markdown
Collaborator Author

Addressed in a41c046:

  • cold-smoke now installs via the curl … install.sh | VERSION=dev/<branch> bash one-liner, so it exercises the SEA binary an end user actually downloads (also covers the compiled-binary-only-bugs point from the earlier subthread)
  • new workflow_run trigger on Dev Release fires the smoke after the per-PR dev/<branch> tag is published
  • schedule + workflow_dispatch on main keep working; with no VERSION set, install.sh resolves the latest stable tag
  • dropped the workspace mount + checkout step — install.sh fetches the binary on its own, no source needed in the container

ensureTemplateRegistered() previously only verified the fixture exists
and threw if missing — leaving the mod test red whenever the registry
contract was redeployed. The mod test is read-only, so unlike the
deploy fixtures (which re-publish on every run via the deploy tests
themselves), the mod fixture had no self-healing path.

Now: register-if-missing using the same SIGNER fundDeployerIfLow has
already topped up. Same domain, same owner, idempotent re-publish; no
fresh DotNS entry is ever burned, so the registry stays clean.
@ottovlotto ottovlotto merged commit b147280 into main May 5, 2026
18 checks passed
@ottovlotto ottovlotto deleted the becca/e2e-fixed-domains branch May 5, 2026 18:08
ottovlotto added a commit that referenced this pull request May 6, 2026
Mirrors the test-side --private flag added in #114 to the bootstrap tool. Without this, re-running register-e2e-fixtures.ts re-publishes fixtures as PUBLIC and undoes the visibility flip.

CLI tests that hit these domains use direct getMetadataUri queries, which are unaffected by visibility — so no test changes needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Once filter / delete / unpublish implemented: Update e2e tests to hide/delete/filter out test apps (so they are not visible in the app)

4 participants