feat(trace-topology): NixOS guest + QEMU harness for fixture generation (v0.11.0 PR 2) #234
…on (v0.11.0)

Add the CI harness (PR 2) that runs gen-fixtures inside a KVM-accelerated NixOS virtual machine and collects the four trace-topology fixture files.

Changes:
- crates/spar-trace-topology/src/bin/fixture-vm.rs: new Rust binary that creates a CoW qcow2 overlay, boots qemu-system-x86_64 with -enable-kvm -nographic and a virtio-9p share, waits ≤15 min for the guest to exit, and asserts all four fixture files are present and non-empty. A --verify mode round-trips each file through the spar-trace-topology ingest parsers. RAII guards clean up the overlay and kill QEMU on drop.
- crates/spar-trace-topology/Cargo.toml: adds [[bin]] entry for fixture-vm.
- tools/fixture-vm/flake.nix: NixOS guest image (linuxPackages_latest with sch_taprio; iproute2, lldpd, linuxptp, tcpdump, tshark, gen-fixtures built via rustPlatform.buildRustPackage; systemd oneshot service that runs gen-fixtures /fixtures then powers off). Built inside a digest-pinned rootless podman container with --device /dev/kvm.
- .github/workflows/trace-fixtures.yml: nightly + workflow_dispatch job; builds the qcow2, cargo builds the harness, runs the VM, verifies with tshark and --verify, uploads four fixtures as a CI artefact.
- artifacts/requirements.yaml: REQ-FIXTURE-VM-001 (implemented, v0110).
- artifacts/verification.yaml: TEST-FIXTURE-VM (passing, satisfies REQ).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
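The overlay creation and RAII cleanup described in this commit could look roughly like the sketch below. It is not the code from `fixture-vm.rs`: the type names `OverlayFile` and `QemuChild` are borrowed from the PR description, but their fields, constructors, and the helper `overlay_create_command` are assumptions made for illustration. The `qemu-img` arguments mirror the invocation quoted in the PR description.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::process::{Child, Command};

/// Builds (but does not run) the `qemu-img create` call for a CoW overlay
/// backed by `base`. Hypothetical helper; the flags are those quoted in the PR.
pub fn overlay_create_command(base: &Path, overlay: &Path) -> Command {
    let mut cmd = Command::new("qemu-img");
    cmd.args(["create", "-f", "qcow2", "-b"])
        .arg(base)
        .args(["-F", "qcow2"])
        .arg(overlay);
    cmd
}

/// Removes the overlay file when the guard goes out of scope.
/// Assumed shape; the real `OverlayFile` may differ.
pub struct OverlayFile {
    path: PathBuf,
}

impl OverlayFile {
    pub fn new(path: PathBuf) -> Self {
        Self { path }
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for OverlayFile {
    fn drop(&mut self) {
        // Best effort: an error cannot be propagated out of `drop`.
        let _ = fs::remove_file(&self.path);
    }
}

/// Kills and reaps the QEMU process when the guard goes out of scope.
/// Assumed shape; the real `QemuChild` may differ.
pub struct QemuChild {
    child: Child,
}

impl QemuChild {
    pub fn new(child: Child) -> Self {
        Self { child }
    }

    pub fn inner(&mut self) -> &mut Child {
        &mut self.child
    }
}

impl Drop for QemuChild {
    fn drop(&mut self) {
        // Any error from `kill` (for example, the guest already exited) is ignored.
        let _ = self.child.kill();
        // Reap the process so it does not linger as a zombie.
        let _ = self.child.wait();
    }
}
```

Because both guards act in `Drop`, cleanup also runs on early returns and during panic unwinding, which is presumably why the harness uses guards rather than explicit cleanup calls at each exit point.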
…solve

The nightly workflow ran `nix flake update` on every run. That re-resolves every flake input to its latest revision, which, with no committed flake.lock, means each nightly could pull a different nixpkgs (different kernel, lldpd, linuxptp). That silently defeats the entire reason for the NixOS approach: the flake.lock IS the determinism pin.

- Drop `nix flake update` from the nightly. It now builds with `nix build --no-update-lock-file`, which fails rather than re-resolves if the lock is stale or missing.
- Add an explicit guard step that fails with a clear pointer when tools/fixture-vm/flake.lock is absent.
- Add .github/workflows/fixture-vm-flake-lock.yml: a workflow_dispatch job that resolves the inputs and uploads flake.lock as an artifact. Updating the lock is now a deliberate, reviewed action (the Nix equivalent of `cargo update`): dispatch it, download the artifact, commit the lock in a PR.

flake.lock is not committed yet: generating it needs Nix, which no dev box here has. Bootstrap: dispatch `fixture-vm flake.lock` once, commit the produced lock; the nightly then runs against it. Until then the nightly fails clean at the guard step rather than building un-pinned.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
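The guard described here is a workflow step, but the same "fail with a pointer instead of building un-pinned" logic can be sketched in Rust. The function name `ensure_lock_committed` and the wording of the error message are hypothetical; only the lock file name and the `fixture-vm flake.lock` workflow name come from the commit message.

```rust
use std::path::Path;

/// Returns an error with a remediation hint when `flake.lock` is not present
/// in `flake_dir`. Hypothetical helper mirroring the workflow's guard step.
pub fn ensure_lock_committed(flake_dir: &Path) -> Result<(), String> {
    let lock = flake_dir.join("flake.lock");
    if lock.is_file() {
        Ok(())
    } else {
        Err(format!(
            "{} is missing: dispatch the `fixture-vm flake.lock` workflow, \
             download the artifact, and commit the lock in a PR",
            lock.display()
        ))
    }
}
```

A caller would run this check before invoking `nix build --no-update-lock-file`, so that a missing lock produces an actionable message rather than a generic build failure.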
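Follow-up commit

The bug: the workflow ran `nix flake update` on every nightly run, so with no committed flake.lock the inputs were re-resolved each time.

The fix: the nightly now builds with `nix build --no-update-lock-file` and fails at an explicit guard step when tools/fixture-vm/flake.lock is absent.

Bootstrap step before the nightly can run: dispatch the `fixture-vm flake.lock` workflow once and commit the produced lock.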
Rivet verification gate: ✅ 20/20 passed
Workspace bump from 0.10.0 → 0.11.0 across all 22 spar crates (via
[workspace.package].version), plus the VS Code extension's
package.json — the two version surfaces the release workflow's
check-versions job compares against the tag.
What v0.11.0 ships, since v0.10.0:

  trace-topology reconciliation engine (incremental rollout):
    #239 IdentityUnknown check (PR 3a: component-borne MAC + chassis-id)
    #241 GptpOutOfBudget check (PR 3b: single-budget case)

  trace-topology fixture pipeline:
    #233 Rust gen-fixtures tool (netns + TSN, RAII teardown)
    #234 NixOS guest + QEMU harness for fixture generation
    #238 corrected nixos/nix container digest (fd7a5c67…, multi-arch index)
    #240 podman runner label so workflows schedule on the only
         rootless-podman-capable runner in the fleet (runner9)

  codegen:
    #232 --format wit emits only WIT (strict-filter; no Rust/Bazel
         workspace leakage)
    #242 per-category file-count summary + hint when WIT was
         requested but the model has no `process` subcomponents
         (in-flight at bump time; auto-merge armed)

  release flow:
    #244 standardise on the synth reference:
         actions/attest-build-provenance@v2 + sigstore cosign
         sign-blob (v2.4.1) over SHA256SUMS.txt, build-env.txt;
         the v0.11.0 release is the first one to exercise the
         standardised cosign + SLSA chain.
Verification after release:

  cosign verify-blob \
    --certificate-identity-regexp \
    'https://github.com/pulseengine/spar/.github/workflows/release.yml@.*' \
    --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
    --bundle SHA256SUMS.txt.cosign.bundle SHA256SUMS.txt

  gh attestation verify spar-v0.11.0-<triple>.tar.gz --repo pulseengine/spar

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- `crates/spar-trace-topology/src/bin/fixture-vm.rs`: Rust binary that creates a CoW qcow2 overlay, boots `qemu-system-x86_64` with `-enable-kvm -nographic` and a virtio-9p share (`mount_tag=fixtures`), waits ≤15 min for the guest to power off, then asserts all four fixture files are present and non-empty. A `--verify <dir>` mode round-trips each file through the `spar-trace-topology` ingest parsers. RAII `OverlayFile` and `QemuChild` guards clean up unconditionally on drop.
- `tools/fixture-vm/flake.nix`: NixOS guest image with `linuxPackages_latest` (includes `sch_taprio`/`CLOCK_TAI`), the fixture toolchain (`iproute2`, `lldpd`, `linuxptp`, `tcpdump`, `tshark`), and `gen-fixtures` built via `rustPlatform.buildRustPackage` (approach a: flake-native, correct glibc/loader path). A systemd oneshot service runs `gen-fixtures /fixtures` then `systemctl poweroff --force`. Built inside a digest-pinned rootless `podman` container (`docker.io/nixos/nix@sha256:4aba8f0…`) with `--device /dev/kvm`.
- `.github/workflows/trace-fixtures.yml`: nightly (02:30 UTC) + `workflow_dispatch`; builds the qcow2 via podman, runs `cargo build --bins`, boots the VM, verifies with `tshark -r capture.pcapng`, runs `--verify`, and uploads all four fixtures with `actions/upload-artifact@v4`.
- `artifacts/`: `REQ-FIXTURE-VM-001` (requirement, implemented, v0110) + `TEST-FIXTURE-VM` (feature, passing, satisfies REQ). `rivet validate` passes.

gen-fixtures approach
Chose (a) flake-built via `rustPlatform.buildRustPackage`. NixOS has a non-standard dynamic linker path; a glibc binary built on Ubuntu will not run there without patching. Building inside the flake gives the correct NixOS-native binary, pins the entire closure in `flake.lock`, and avoids a separate musl cross-compilation step on the host.

QEMU invocation
Overlay created with `qemu-img create -f qcow2 -b <base.qcow2> -F qcow2 <overlay.qcow2>`. Deleted on drop via `OverlayFile`.

What was verified locally
- `cargo build -p spar-trace-topology --bins`: clean (both `gen-fixtures` and `fixture-vm`)
- `cargo clippy -p spar-trace-topology --all-targets -- -D warnings`: clean
- `cargo fmt -p spar-trace-topology -- --check`: clean
- `rivet validate`: PASS (99 pre-existing warnings, no errors)

What is unvalidated and awaits the first nightly CI run
- `nix build` inside the podman container (flake.nix is untested without a Nix install)
- `flake.lock` generation (`nix flake update` runs as the first step inside the container)
- `/fixtures`
- `gen-fixtures.service` systemd oneshot runs and writes the four files
- `fixture-vm` harness successfully waits for QEMU exit and asserts fixtures
- `tshark -r capture.pcapng` step reports no errors on a real capture
- `--verify` round-trip on real fixture data

The nightly workflow on Smithy's KVM-capable runners is the integration test for all of the above. PR 2 is expected to iterate against CI (as PR 1 did).
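The two harness behaviours still awaiting CI validation, waiting for QEMU to exit within a time limit and asserting that the fixtures exist and are non-empty, can be sketched with the standard library alone. This is an illustrative sketch, not the code in `fixture-vm.rs`: the function names, the polling approach, and the caller-supplied file list are assumptions. Only `capture.pcapng` is named in the PR, so the other fixture names are left to the caller.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Polls `child` until it exits or `limit` elapses.
/// Returns `Ok(Some(status))` on exit and `Ok(None)` on timeout.
pub fn wait_with_deadline(
    child: &mut Child,
    limit: Duration,
    poll: Duration,
) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + limit;
    loop {
        // `try_wait` is non-blocking: `None` means the process is still running.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            return Ok(None);
        }
        thread::sleep(poll);
    }
}

/// Returns the names from `names` that are missing, not regular files,
/// or zero bytes long inside `dir`.
pub fn missing_or_empty<'a>(dir: &Path, names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| match fs::metadata(dir.join(name)) {
            Ok(meta) => !(meta.is_file() && meta.len() > 0),
            Err(_) => true,
        })
        .collect()
}
```

With the 15-minute limit from the PR, a caller might use `wait_with_deadline(&mut child, Duration::from_secs(15 * 60), Duration::from_secs(1))`, treat `Ok(None)` as a timeout failure, and then fail the run if `missing_or_empty` returns a non-empty list.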
🤖 Generated with Claude Code