
fix(ci): perf job gates on the real frame-budget guard, not TDD stubs #915

Merged
ruvnet merged 1 commit into main from fix/perf-job-real-guard
Jun 2, 2026

@ruvnet ruvnet commented Jun 2, 2026

Background

#914 fixed the perf job's collection error (No module named 'src'). With collection working, the suite actually executed on the main push and revealed the residual failures are not regressions — they're pre-existing, by-design TDD red-phase stubs in archived v1 code.

What the failures actually are

test_api_throughput.py and test_inference_speed.py: every test is named ..._should_fail_initially (TDD red-phase) and times a mock that sleeps — not a real performance signal. They carry:

  • machine-dependent wall-clock asserts (actual_rps >= 40, batch_time < individual_time) — inherently flaky on shared CI runners
  • a cross-class fixture-scope bug → fixture 'standard_model' not found (10 errors at setup)
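For context on the fixture error, here is a minimal sketch of how a cross-class fixture lookup behaves in pytest and one common way to fix it. The class names and the fixture body are hypothetical; only the fixture name `standard_model` comes from the error message above, and the stub files may be laid out differently.

```python
import pytest


# Defined at module level, so every test class in this file can request it.
@pytest.fixture
def standard_model():
    # Hypothetical stand-in for whatever object the real fixture builds.
    return {"name": "standard", "latency_ms": 5}


class TestSingleInference:
    def test_model_is_available(self, standard_model):
        assert standard_model["name"] == "standard"


class TestBatchInference:
    # If standard_model were instead defined as a method of
    # TestSingleInference, pytest would stop at setup here with
    # "fixture 'standard_model' not found", because a fixture defined
    # inside a class is visible only to tests in that same class.
    def test_model_is_available(self, standard_model):
        assert standard_model["latency_ms"] == 5
```

Moving a shared fixture to module level (or to a `conftest.py`) is the usual remedy when more than one class needs it.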

Net on the main push: 3 failed and 10 errored, all by design. The 16 others pass only because the mock happens to satisfy them.

Forcing them green (tuning thresholds) would manufacture a false perf signal.

Fix

Gate the perf job on test_frame_budget.py only — it times the real CSIProcessor pipeline against the ADR 50 ms per-frame budget (single-frame, p95 over 100 frames, +Doppler). That's a genuine regression guard.

python -m pytest tests/performance/test_frame_budget.py -o addopts="" -v --junitxml=perf-junit.xml
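As a rough illustration of what a "p95 over 100 frames" budget check involves, the sketch below times a stand-in workload and compares the 95th-percentile latency with the 50 ms budget. It is not the content of test_frame_budget.py: `fake_process_frame`, the helper names, and the nearest-rank percentile method are assumptions made for the example, and the real test drives CSIProcessor.

```python
import math
import time
from typing import Callable, List, Sequence

FRAME_BUDGET_MS = 50.0  # per-frame budget quoted in the PR description


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with >= pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def time_frames_ms(process_frame: Callable[[int], object], n_frames: int = 100) -> List[float]:
    """Call process_frame once per frame and return per-frame latencies in ms."""
    latencies = []
    for index in range(n_frames):
        start = time.perf_counter()
        process_frame(index)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_process_frame(frame_index: int) -> int:
    # Hypothetical stand-in workload, not the real pipeline.
    return sum(range(1000))


latencies = time_frames_ms(fake_process_frame, n_frames=100)
p95_ms = percentile_nearest_rank(latencies, 95)
within_budget = p95_ms < FRAME_BUDGET_MS
```

Gating on p95 rather than on the maximum tolerates an occasional slow frame on a busy runner while still catching a sustained slowdown.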

The stub files stay in-repo for local TDD; they re-enter CI when their features are implemented and the mock-timing asserts are made deterministic.
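One possible way to make a mock-timing assert deterministic is to inject a virtual clock, so the mock's "sleep" advances simulated time instead of wall-clock time. The sketch below is only an assumption about how that could look; `FakeClock`, `measure_rps`, and `mock_handler` are hypothetical names, not code from the stub files.

```python
from typing import Callable


class FakeClock:
    """Hypothetical test clock: sleep() advances virtual time instantly."""

    def __init__(self) -> None:
        self._now = 0.0

    def now(self) -> float:
        return self._now

    def sleep(self, seconds: float) -> None:
        self._now += seconds


def measure_rps(handler: Callable[[], None], n_requests: int, clock: FakeClock) -> float:
    """Requests per second as seen by the injected clock."""
    start = clock.now()
    for _ in range(n_requests):
        handler()
    elapsed = clock.now() - start
    return n_requests / elapsed


clock = FakeClock()


def mock_handler() -> None:
    # Simulates 20 ms of work per request without touching wall-clock time.
    clock.sleep(0.020)


actual_rps = measure_rps(mock_handler, n_requests=100, clock=clock)
assert actual_rps >= 40  # same threshold as the stub, now independent of the runner
```

With the clock injected, the result depends only on the simulated per-request cost (about 50 requests per second here), so the threshold cannot flake on a loaded CI runner. It still measures the mock rather than real performance, which is why the frame-budget test remains the actual gate.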

Verification (local, exact CI command)

3 passed, 5 warnings in 10.38s
  test_single_frame_under_50ms PASSED
  test_sustained_100_frames_p95 PASSED
  test_doppler_pipeline PASSED

YAML validated.

🤖 Generated with claude-flow

After #914 fixed collection, the perf job actually ran the suite and
exposed that test_api_throughput.py / test_inference_speed.py are TDD
red-phase stubs (every test suffixed `_should_fail_initially`) that time
a *mock that sleeps* — not a real perf signal. They carry machine-
dependent wall-clock asserts (actual_rps >= 40, batch_time < individual_time)
that are inherently flaky on shared CI runners, plus a cross-class
fixture-scope bug (`fixture 'standard_model' not found`). Result: 3 failed,
10 errored — by design, not a regression.

Forcing those green would manufacture a false signal. Instead, gate only
on test_frame_budget.py, which times the *real* CSIProcessor pipeline
against the ADR 50 ms per-frame budget (single-frame, p95/100-frames,
+Doppler) — a genuine regression guard. Verified locally: 3 passed.

The stub files remain in-repo for local TDD; they re-enter CI when their
features are implemented and the mock-timing asserts are made deterministic.

Co-Authored-By: claude-flow <ruv@ruv.net>
@ruvnet ruvnet merged commit 88b835d into main Jun 2, 2026
20 checks passed
@ruvnet ruvnet deleted the fix/perf-job-real-guard branch June 2, 2026 16:31
ruvnet added a commit that referenced this pull request Jun 2, 2026
Since #915 the perf job gates only on test_frame_budget.py, which drives
the CSIProcessor pipeline in-process and makes no HTTP calls. The
"Start application" step (uvicorn + `sleep 10`) was therefore dead weight:
it existed only for the now-excluded api_throughput/inference_speed tests,
wasted ~10-15 s per main-push run, and dumped ~50 misleading
"router requires hardware setup" ERROR lines into every CI log for a
server no test touched. MOCK_POSE_DATA is server-only, unused here.

Removed the step and the vestigial env. The gated test is unchanged and
passes (verified locally, 3/3).