refactor(test): migrate destructive e2e (L4/L5) to local Tart VM #80
Merged
scripts/vm/run.sh provisions an ephemeral Tart VM, rsyncs the working tree in, runs an in-VM make target over SSH, and tears down on EXIT. New Makefile targets test-vm / test-vm-run / test-vm-inner / test-vm-inner-run plumb this through.

Implementation notes vs. the original plan:
- The macos-tahoe-base image ships mise but not Go; run.sh installs Go via `mise install go@latest` on each fresh clone. This adds ~10s on first run but keeps the base image unmodified.
- An ephemeral ED25519 key is generated per run and injected via `tart exec` before SSH. This avoids the 1Password SSH agent (or any local agent) exhausting MaxAuthTries before a real key is tried. lib.sh's ssh_exec now takes the key path as the second argument.
- The `OPENBOOT_VM_KEEP=1` debug message now prints the ephemeral SSH key path for attaching to the running VM.

Old destructive targets (test-vm-quick/release/full, test-destructive, test-smoke) stay for now; they are removed in a follow-up after build tags collapse.

See docs/superpowers/specs/2026-05-17-l4-l5-tart-local-design.md.
After Task 1 introduced the Tart VM driver, every destructive test runs inside an ephemeral VM — there is no longer a meaningful 'destructive vs vm' distinction. Merge into a single e2e,vm tag. The e2e,destructive build tag is retired and unused after this commit.
scripts/vm/run.sh sets OPENBOOT_IN_VM=1 over SSH when it invokes the in-VM make target. requireEphemeralHost now accepts that as a more precise signal than CI=true (which leaks in from any GHA runner, not just throwaway ones). CI=true and OPENBOOT_E2E_DESTRUCTIVE=1 stay as fallbacks for ad-hoc/legacy use. Comment block at top of file rewritten to drop the obsolete 'no Tart VM, no SSH' description.
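The precedence described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code in testutil/machost.go: the helper name `ephemeralSignal` and its injected `getenv` parameter are hypothetical, and only the three environment variables come from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// ephemeralSignal is a hypothetical helper (not the real requireEphemeralHost).
// It reports which environment variable, if any, marks the host as safe for
// destructive tests. OPENBOOT_IN_VM is checked first because it is the most
// precise signal; CI and OPENBOOT_E2E_DESTRUCTIVE are the legacy fallbacks.
func ephemeralSignal(getenv func(string) string) (string, bool) {
	switch {
	case getenv("OPENBOOT_IN_VM") == "1":
		return "OPENBOOT_IN_VM", true
	case getenv("CI") == "true":
		return "CI", true
	case getenv("OPENBOOT_E2E_DESTRUCTIVE") == "1":
		return "OPENBOOT_E2E_DESTRUCTIVE", true
	}
	return "", false
}

func main() {
	if name, ok := ephemeralSignal(os.Getenv); ok {
		fmt.Println("ephemeral host confirmed via", name)
		return
	}
	// A real test helper would presumably call t.Skip here instead.
	fmt.Println("skip: not an ephemeral host; run via scripts/vm/run.sh")
}
```

Passing `getenv` as a parameter is only a convenience for this sketch: it lets the precedence be exercised without mutating the process environment.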
Removed from test.yml:
- macos-e2e job (L4)
- destructive job (L5)
- the run_destructive workflow_dispatch input
Removed from release.yml:
- the 'Destructive tests' step in gate-tests
- the smoke-test job and its dependent edge from release.needs
Removed entirely:
- .github/workflows/smoke-test.yml (redundant with release.yml's
smoke-test, which also goes away here)
Destructive e2e now runs only locally via scripts/vm/run.sh (added in
the previous commits). No CI gate replaces this — running 'make test-vm'
before tagging is a documented expectation, not enforced.
See docs/superpowers/specs/2026-05-17-l4-l5-tart-local-design.md.
Deleted:
- test-destructive
- test-smoke / test-smoke-prebuilt
- test-vm-quick / test-vm-release / test-vm-full
- the temporary test-vm-OLD-DELETE-ME alias

The new test-vm / test-vm-run / test-vm-inner / test-vm-inner-run targets (added two commits back) are now the only entrypoints. Header comment block rewritten.
Patch (fix:-only) bumps continue to auto-tag and dispatch release.yml. Minor bumps (feat: present) now open a 'release-ready' labeled issue with a checklist instead of auto-tagging; the human is expected to run make test-vm locally and then tag manually. Skipping test-vm is allowed: the issue is a nudge, not a hard gate.

Rationale: feat: changes carry more risk and benefit from the local Tart VM e2e suite added in earlier commits. fix: patches keep going through the existing fast lane.

Adds 'issues: write' to the workflow permissions. Header comment block rewritten.
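As a rough illustration of the patch-vs-feat split, the decision could be modeled like this. It is a simplified sketch under stated assumptions: the function names `hasType` and `releaseLane` and the lane labels are hypothetical, and the real auto-release.yml may apply thresholds or other rules that are not shown in this commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// hasType reports whether subject looks like a conventional commit of the
// given type, e.g. "feat: x", "feat(cli): x", or "feat!: x".
func hasType(subject, typ string) bool {
	rest, ok := strings.CutPrefix(subject, typ)
	if !ok {
		return false
	}
	return strings.HasPrefix(rest, ":") ||
		strings.HasPrefix(rest, "(") ||
		strings.HasPrefix(rest, "!")
}

// releaseLane is a hypothetical model of the split described above:
// any feat commit sends the release to the manual lane (open an issue),
// fix-only ranges stay on the auto-tag fast lane, anything else is a no-op.
func releaseLane(subjects []string) string {
	sawFix := false
	for _, s := range subjects {
		if hasType(s, "feat") {
			return "open-release-ready-issue"
		}
		if hasType(s, "fix") {
			sawFix = true
		}
	}
	if sawFix {
		return "auto-tag"
	}
	return "none"
}

func main() {
	fmt.Println(releaseLane([]string{"fix: handle empty config", "docs: typo"}))
	fmt.Println(releaseLane([]string{"fix: retry ssh", "feat(vm): add keep flag"}))
}
```

The point of the sketch is the asymmetry: a single feat commit is enough to leave the fast lane, while fix-only history never requires a human.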
CONTRIBUTING.md: L4 and L5 rows collapse to a single L4 VM e2e row
('runs inside Tart VM, local only, no CI gate'). New 'VM E2E setup'
section walks through tart pull / tart clone. Rules of thumb updated.
CLAUDE.md: Commands block drops test-vm-release / test-destructive,
adds test-vm with a 'requires Apple Silicon + Tart' note.
docs/HARNESS.md: table rows for L4/L5 merge; auto-release row
reflects patch-vs-feat split; new 'intentionally NOT' entry explains
why there is no CI gate for VM e2e.
Codecov Report: All modified and coverable lines are covered by tests.
Summary
- Moves destructive e2e off `macos-latest` (a shared, dirty runner) into a local Tart VM on Apple Silicon. L4 + L5 collapse into a single L4 VM e2e tier (`make test-vm`).
- No CI gate replaces the removed jobs: running `make test-vm` before tagging is convention, not enforced.
- `auto-release.yml` keeps auto-tagging `fix:`-only patch bumps but now opens a `release-ready` issue for `feat:` thresholds to nudge a human to run the local VM suite first.
- `MacHost` keeps its API. The migration is driver-layer plumbing: `scripts/vm/run.sh` clones an ephemeral Tart VM from a local base image, rsyncs the working tree, and SSHs in to run `make test-vm-inner`.

What changed
- `scripts/vm/{run.sh, lib.sh, README.md}`: Tart driver + one-time setup docs (base image: `ghcr.io/cirruslabs/macos-tahoe-base:latest`).
- New Makefile targets: `test-vm`, `test-vm-run TEST=...`, plus internal `test-vm-inner` / `test-vm-inner-run` invoked over SSH.
- Deleted Makefile targets: `test-vm-quick`, `test-vm-release`, `test-vm-full`, `test-destructive`, `test-smoke`, `test-smoke-prebuilt`.
- Build tags: `e2e,destructive` retired; everything is `e2e,vm` now.
- `testutil/machost.go`: `requireEphemeralHost` gains an `OPENBOOT_IN_VM` branch (set by `run.sh`); legacy `CI=true` / `OPENBOOT_E2E_DESTRUCTIVE=1` stay as fallbacks.
- Removed from CI: `macos-e2e` + `destructive` jobs from `test.yml`; `Destructive tests` step + `smoke-test` job from `release.yml`; entire `smoke-test.yml`. The remaining `gate-tests` job runs Vet + L1 only.
- `auto-release.yml`: patch fast lane auto-tags as before; feat threshold opens an issue labeled `release-ready` with a `make test-vm` checklist instead of tagging.
- Docs: `CONTRIBUTING.md` (Test Layering table + new VM E2E setup section), `CLAUDE.md` (Commands block), `docs/HARNESS.md` (table + "intentionally NOT in the harness" entry), `AGENTS.md` (stale tag list), one ship-pr / bootstrap-feature SKILL.md fix.

Test plan
- `make test-unit` (L1): green
- `go vet ./...`: clean
- `go test ./internal/archtest/...`: passes (no archtest baseline drift)
- `make test-vm-run TEST=TestVM_Infra` end-to-end on local Apple Silicon: VM clones, boots, test passes, VM destroyed (verified during Task 1 implementation)
- `OPENBOOT_VM_KEEP=1` debug knob: VM stays, attach via `tart ssh <vm>` (uses Tart's `admin`/`admin`)
- A leftover `openboot-ephemeral-99999` VM is cleaned up by the next `run.sh` startup
- `go test -tags="e2e,vm" -run TestVM_Infra ./test/e2e/...` correctly SKIPs with the new message (gate works)
- `test.yml` and `release.yml`: no dangling `needs` edges

Notes
An ephemeral SSH key is injected via `tart exec` (the base image has no pre-authorized keys; a local 1Password agent floods MaxAuthTries before any password attempt), and Go is installed via `mise install go@latest` on each cold boot (the base image has mise but not Go). Both are documented in the Task 1 commit message.