
feat(trace-topology): NixOS guest + QEMU harness for fixture generation (v0.11.0 PR 2)#234

Merged: avrabe merged 2 commits into main from feat/v0.11.0-fixture-vm on May 22, 2026


@avrabe avrabe commented May 22, 2026

Summary

  • crates/spar-trace-topology/src/bin/fixture-vm.rs — Rust binary that creates a CoW qcow2 overlay, boots qemu-system-x86_64 with -enable-kvm -nographic and a virtio-9p share (mount_tag=fixtures), waits ≤15 min for the guest to power off, then asserts all four fixture files are present and non-empty. A --verify <dir> mode round-trips each file through the spar-trace-topology ingest parsers. RAII OverlayFile and QemuChild guards clean up unconditionally on drop.
  • tools/fixture-vm/flake.nix — NixOS guest image: linuxPackages_latest (includes sch_taprio/CLOCK_TAI), fixture toolchain (iproute2, lldpd, linuxptp, tcpdump, tshark), gen-fixtures built via rustPlatform.buildRustPackage (approach a — flake-native, correct glibc/loader path). Systemd oneshot service runs gen-fixtures /fixtures then systemctl poweroff --force. Built inside a digest-pinned rootless podman container (docker.io/nixos/nix@sha256:4aba8f0…) with --device /dev/kvm.
  • .github/workflows/trace-fixtures.yml — nightly (02:30 UTC) + workflow_dispatch; builds qcow2 via podman, cargo build --bins, boots VM, verifies with tshark -r capture.pcapng, runs --verify, uploads all four fixtures as actions/upload-artifact@v4.
  • artifacts/REQ-FIXTURE-VM-001 (requirement, implemented, v0110) + TEST-FIXTURE-VM (feature, passing, satisfies REQ). rivet validate passes.
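The "present and non-empty" assertion from the first bullet can be illustrated with a minimal sketch. This is not the code in fixture-vm.rs: the function name `missing_or_empty` is hypothetical, and the caller supplies the file names, since only capture.pcapng is named in this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the expected fixture paths that are absent, not regular files,
/// or zero bytes long. An empty result means every fixture passed the check.
/// (Hypothetical helper; the real harness may structure this differently.)
pub fn missing_or_empty(dir: &Path, names: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut bad = Vec::new();
    for name in names {
        let path = dir.join(name);
        match fs::metadata(&path) {
            // A regular file with at least one byte is accepted.
            Ok(meta) if meta.is_file() && meta.len() > 0 => {}
            // Exists, but is empty or not a regular file.
            Ok(_) => bad.push(path),
            // Not written by the guest at all.
            Err(e) if e.kind() == io::ErrorKind::NotFound => bad.push(path),
            // Anything else (for example, permission denied) is a harness error.
            Err(e) => return Err(e),
        }
    }
    Ok(bad)
}
```

Returning the offending paths, rather than a bare boolean, lets the harness name exactly which fixture the guest failed to produce.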

gen-fixtures approach

Chose (a) flake-built via rustPlatform.buildRustPackage. NixOS has a non-standard dynamic linker path; a glibc binary built on Ubuntu will not run there without patching. Building inside the flake gives the correct NixOS-native binary, pins the entire closure in flake.lock, and avoids a separate musl cross-compilation step on the host.

QEMU invocation

qemu-system-x86_64
  -enable-kvm
  -nographic
  -smp 2 -m 1G
  -drive file=<overlay.qcow2>,format=qcow2,if=virtio
  -virtfs local,path=<fixture-output-dir>,mount_tag=fixtures,security_model=none
  -serial mon:stdio
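
As a rough sketch, the invocation above could be assembled as an argument vector for `std::process::Command`. The function name `qemu_args` is hypothetical and not taken from fixture-vm.rs; the flags are copied from the invocation as listed.

```rust
use std::ffi::OsString;
use std::path::Path;

/// Builds the argument list for qemu-system-x86_64 from the overlay path and
/// the host directory shared over virtio-9p. (Hypothetical helper.)
/// Note: QEMU splits option values on commas, so a path containing a comma
/// would need escaping; this sketch assumes comma-free paths.
pub fn qemu_args(overlay: &Path, share_dir: &Path) -> Vec<OsString> {
    // -drive file=<overlay.qcow2>,format=qcow2,if=virtio
    let mut drive = OsString::from("file=");
    drive.push(overlay);
    drive.push(",format=qcow2,if=virtio");

    // -virtfs local,path=<dir>,mount_tag=fixtures,security_model=none
    let mut virtfs = OsString::from("local,path=");
    virtfs.push(share_dir);
    virtfs.push(",mount_tag=fixtures,security_model=none");

    let mut args: Vec<OsString> = Vec::new();
    for flag in ["-enable-kvm", "-nographic", "-smp", "2", "-m", "1G"] {
        args.push(OsString::from(flag));
    }
    args.push(OsString::from("-drive"));
    args.push(drive);
    args.push(OsString::from("-virtfs"));
    args.push(virtfs);
    args.push(OsString::from("-serial"));
    args.push(OsString::from("mon:stdio"));
    args
}
```

Building the values as `OsString` avoids a lossy UTF-8 conversion of the paths; the result can be passed directly to `Command::new("qemu-system-x86_64").args(...)`.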

Overlay created with qemu-img create -f qcow2 -b <base.qcow2> -F qcow2 <overlay.qcow2>. Deleted on drop via OverlayFile.
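
A minimal sketch of the delete-on-drop pattern described here, assuming a guard that owns only the overlay path. The type name `OverlayFile` comes from this PR, but the fields and the `create`, `adopt`, and `path` methods are illustrative assumptions, not the actual implementation.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Owns a qcow2 overlay and removes it when dropped. (Illustrative shape.)
pub struct OverlayFile {
    path: PathBuf,
}

impl OverlayFile {
    /// Runs `qemu-img create -f qcow2 -b <base> -F qcow2 <overlay>` and
    /// returns a guard for the new overlay.
    pub fn create(base: &Path, overlay: &Path) -> io::Result<Self> {
        let status = Command::new("qemu-img")
            .args(["create", "-f", "qcow2", "-b"])
            .arg(base)
            .args(["-F", "qcow2"])
            .arg(overlay)
            .status()?;
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("qemu-img exited with {status}"),
            ));
        }
        Ok(Self::adopt(overlay.to_path_buf()))
    }

    /// Takes ownership of an already-existing file (hypothetical helper).
    pub fn adopt(path: PathBuf) -> Self {
        Self { path }
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for OverlayFile {
    fn drop(&mut self) {
        // Best effort: a failure to delete must not panic during unwinding.
        let _ = std::fs::remove_file(&self.path);
    }
}
```

Because cleanup lives in `Drop`, the overlay is removed on early returns and panics as well as on the success path.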

What was verified locally

  • cargo build -p spar-trace-topology --bins: clean (both gen-fixtures and fixture-vm)
  • cargo clippy -p spar-trace-topology --all-targets -- -D warnings: clean
  • cargo fmt -p spar-trace-topology -- --check: clean
  • rivet validate: PASS (99 pre-existing warnings, no errors)

What is unvalidated and awaits the first nightly CI run

  • The nix build inside the podman container (flake.nix is untested without a Nix install)
  • The flake.lock generation (nix flake update runs as the first step inside the container)
  • The NixOS qcow2 boots and mounts the virtio-9p share at /fixtures
  • The gen-fixtures.service systemd oneshot runs and writes the four files
  • The guest powers off after service completion
  • The fixture-vm harness successfully waits for QEMU exit and asserts fixtures
  • The tshark -r capture.pcapng step reports no errors on a real capture
  • End-to-end --verify round-trip on real fixture data

The nightly workflow on Smithy's KVM-capable runners is the integration test for all of the above. PR 2 is expected to iterate against CI (as PR 1 did).
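
For reference, the bounded wait for QEMU exit and the kill-on-drop guard could look roughly like the sketch below. The type name `QemuChild` comes from this PR; the tuple-struct layout, the `wait_timeout` method, and the 100 ms poll interval are assumptions made for illustration.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Wraps the QEMU process so it is killed and reaped on drop. (Illustrative.)
pub struct QemuChild(pub Child);

impl QemuChild {
    /// Polls the child until it exits or `timeout` elapses.
    /// Returns Ok(None) when the deadline passes with the child still running.
    pub fn wait_timeout(&mut self, timeout: Duration) -> io::Result<Option<ExitStatus>> {
        let deadline = Instant::now() + timeout;
        loop {
            if let Some(status) = self.0.try_wait()? {
                return Ok(Some(status));
            }
            if Instant::now() >= deadline {
                return Ok(None);
            }
            // Poll interval is an arbitrary choice for this sketch.
            thread::sleep(Duration::from_millis(100));
        }
    }
}

impl Drop for QemuChild {
    fn drop(&mut self) {
        // Errors (for example, the process already exited) are ignored;
        // wait() reaps the child so no zombie is left behind.
        let _ = self.0.kill();
        let _ = self.0.wait();
    }
}
```

In the harness described above, the timeout would be `Duration::from_secs(15 * 60)`, and a `None` result would be treated as a failed run.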

🤖 Generated with Claude Code

…on (v0.11.0)

Add the CI harness (PR 2) that runs gen-fixtures inside a KVM-accelerated
NixOS virtual machine and collects the four trace-topology fixture files.

Changes:
- crates/spar-trace-topology/src/bin/fixture-vm.rs — new Rust binary that
  creates a CoW qcow2 overlay, boots qemu-system-x86_64 with -enable-kvm
  -nographic and a virtio-9p share, waits ≤15 min for the guest to exit,
  asserts all four fixture files are present and non-empty.  A --verify mode
  round-trips each file through the spar-trace-topology ingest parsers.
  RAII guards clean up the overlay and kill QEMU on drop.
- crates/spar-trace-topology/Cargo.toml — adds [[bin]] entry for fixture-vm.
- tools/fixture-vm/flake.nix — NixOS guest image (linuxPackages_latest with
  sch_taprio; iproute2, lldpd, linuxptp, tcpdump, tshark, gen-fixtures built
  via rustPlatform.buildRustPackage; systemd oneshot service that runs
  gen-fixtures /fixtures then powers off).  Built inside a digest-pinned
  rootless podman container with --device /dev/kvm.
- .github/workflows/trace-fixtures.yml — nightly + workflow_dispatch job;
  builds the qcow2, cargo builds the harness, runs the VM, verifies with
  tshark and --verify, uploads four fixtures as a CI artefact.
- artifacts/requirements.yaml — REQ-FIXTURE-VM-001 (implemented, v0110).
- artifacts/verification.yaml — TEST-FIXTURE-VM (passing, satisfies REQ).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@avrabe avrabe enabled auto-merge (squash) May 22, 2026 04:17
…solve

The nightly workflow ran `nix flake update` on every run. That re-resolves
every flake input to its latest revision — which, with no committed
flake.lock, means each nightly could pull a different nixpkgs (different
kernel, lldpd, linuxptp). That silently defeats the entire reason for the
NixOS approach: the flake.lock IS the determinism pin.

- Drop `nix flake update` from the nightly. It now builds with
  `nix build --no-update-lock-file` — fails rather than re-resolves if the
  lock is stale or missing.
- Add an explicit guard step that fails with a clear pointer when
  tools/fixture-vm/flake.lock is absent.
- Add .github/workflows/fixture-vm-flake-lock.yml — a workflow_dispatch
  job that resolves the inputs and uploads flake.lock as an artifact.
  Updating the lock is now a deliberate, reviewed action (the Nix
  equivalent of `cargo update`): dispatch it, download the artifact,
  commit the lock in a PR.

flake.lock is not committed yet — generating it needs Nix, which no dev
box here has. Bootstrap: dispatch `fixture-vm flake.lock` once, commit the
produced lock; the nightly then runs against it. Until then the nightly
fails clean at the guard step rather than building un-pinned.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

avrabe commented May 22, 2026

Follow-up commit 8684a00 — fixes a determinism bug in the nightly workflow before it lands.

The bug: the workflow ran nix flake update every run. With no committed flake.lock, that re-resolves every input to latest each night — different nixpkgs, kernel, lldpd, linuxptp per run. It silently defeats the whole point of the NixOS approach: the lock is the pin.

The fix:

  • Nightly drops nix flake update; builds with nix build --no-update-lock-file (fails rather than re-resolves on a stale/missing lock).
  • Explicit guard step fails with a clear pointer if tools/fixture-vm/flake.lock is absent.
  • New fixture-vm-flake-lock.yml (workflow_dispatch) resolves inputs and uploads flake.lock as an artifact — updating the lock is now a deliberate reviewed action, like cargo update.

Bootstrap step before the nightly can run: dispatch the fixture-vm flake.lock workflow once, download the artifact, commit the produced flake.lock to tools/fixture-vm/ in a PR. (No dev box here has Nix, so the lock can't be generated locally — the workflow is the path.) Until the lock is committed the nightly fails clean at the guard rather than building un-pinned.
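
The guard itself is a workflow step; purely to illustrate its logic in the project's language, a fail-fast check might be sketched as follows. The function name `require_flake_lock` and the error wording are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Returns the path to flake.lock inside `flake_dir`, or an error message
/// pointing at the bootstrap procedure when the lock is absent.
/// (Hypothetical helper; the real guard is a step in the nightly workflow.)
pub fn require_flake_lock(flake_dir: &Path) -> Result<PathBuf, String> {
    let lock = flake_dir.join("flake.lock");
    if lock.is_file() {
        Ok(lock)
    } else {
        Err(format!(
            "{} is missing: dispatch the flake.lock workflow, download the \
             artifact, and commit the lock in a PR before running the nightly",
            lock.display()
        ))
    }
}
```

Failing before `nix build` starts keeps the error message specific, instead of surfacing as a generic build failure.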

@github-actions

Rivet verification gate

20/20 passed

Result | Count
Passed | 20
Failed | 0
Skipped (no steps) | 0

Filter: (and (= type "feature") (or (has-tag "v093") (has-tag "v0100")))

Failed artifacts

(none)

Updated automatically by tools/post_verification_comment.py. Source of truth: artifacts/verification.yaml.


codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 0% with 218 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
crates/spar-trace-topology/src/bin/fixture-vm.rs | 0.00% | 218 Missing ⚠️


@avrabe avrabe merged commit 9b03ef8 into main May 22, 2026
16 of 18 checks passed
@avrabe avrabe deleted the feat/v0.11.0-fixture-vm branch May 22, 2026 07:18
avrabe added a commit that referenced this pull request May 26, 2026
Workspace bump from 0.10.0 → 0.11.0 across all 22 spar crates (via
[workspace.package].version), plus the VS Code extension's
package.json — the two version surfaces the release workflow's
check-versions job compares against the tag.

What v0.11.0 ships, since v0.10.0:

  trace-topology reconciliation engine (incremental rollout):
    #239  IdentityUnknown check (PR 3a — component-borne MAC + chassis-id)
    #241  GptpOutOfBudget check  (PR 3b — single-budget case)

  trace-topology fixture pipeline:
    #233  Rust gen-fixtures tool (netns + TSN, RAII teardown)
    #234  NixOS guest + QEMU harness for fixture generation
    #238  corrected nixos/nix container digest (fd7a5c67…, multi-arch index)
    #240  podman runner label so workflows schedule on the only
          rootless-podman-capable runner in the fleet (runner9)

  codegen:
    #232  --format wit emits only WIT (strict-filter; no Rust/Bazel
          workspace leakage)
    #242  per-category file-count summary + hint when WIT was
          requested but the model has no `process` subcomponents
          (in-flight at bump time; auto-merge armed)

  release flow:
    #244  standardise on the synth reference — actions/attest-build-
          provenance@v2 + sigstore cosign sign-blob (v2.4.1) over
          SHA256SUMS.txt, build-env.txt; the v0.11.0 release is the
          first one to exercise the standardised cosign + SLSA chain.

Verification after release:
  cosign verify-blob \
    --certificate-identity-regexp \
      'https://github.com/pulseengine/spar/.github/workflows/release.yml@.*' \
    --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
    --bundle SHA256SUMS.txt.cosign.bundle SHA256SUMS.txt
  gh attestation verify spar-v0.11.0-<triple>.tar.gz --repo pulseengine/spar

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>