fix(ci): perf job gates on the real frame-budget guard, not TDD stubs by ruvnet · Pull Request #915 · ruvnet/RuView

ruvnet · 2026-06-02T16:24:19Z

Background

#914 fixed the perf job's collection error (No module named 'src'). With collection working, the suite actually executed on the main push and revealed the residual failures are not regressions — they're pre-existing, by-design TDD red-phase stubs in archived v1 code.

What the failures actually are

test_api_throughput.py and test_inference_speed.py: every test is named ..._should_fail_initially (TDD red-phase) and times a mock that sleeps — not a real performance signal. They carry:

- machine-dependent wall-clock asserts (actual_rps >= 40, batch_time < individual_time), which are inherently flaky on shared CI runners
- a cross-class fixture-scope bug → fixture 'standard_model' not found (10 errors at setup)
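
The stub files themselves are not shown in this PR, so the following is only a sketch of one common way to hit `fixture 'standard_model' not found`: a fixture defined as a method of one test class is invisible to tests in another class. The class names, the `build_standard_model` helper, and the model contents are hypothetical; only the fixture name comes from the error message. Defining the fixture at module level (or in `conftest.py`) makes it visible to both classes.

```python
import pytest


def build_standard_model():
    """Hypothetical factory; the real fixture body is not shown in the PR."""
    return {"name": "standard", "latency_s": 0.01}


# Module-level fixture: visible to every test class in this file.
# If this were instead a method of TestSingleInference, pytest would
# report "fixture 'standard_model' not found" when setting up the
# tests in TestBatchInference.
@pytest.fixture
def standard_model():
    return build_standard_model()


class TestSingleInference:
    def test_model_has_name(self, standard_model):
        assert standard_model["name"] == "standard"


class TestBatchInference:
    def test_model_has_latency(self, standard_model):
        assert standard_model["latency_s"] > 0
```

Whether the archived v1 tests should be repaired this way is a separate question; this PR deliberately leaves them untouched.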

Net on the main push: 3 failed and 10 errored, by design. The 16 others pass only because the mock happens to satisfy them.

Forcing them green (tuning thresholds) would manufacture a false perf signal.
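
To make the "false perf signal" point concrete, here is a minimal sketch of a throughput test that times a sleeping mock. The names `mock_handle_request` and `measure_rps` and the 10 ms latency are assumptions for illustration, not code from the repository. The measured rate is capped by the sleep duration and otherwise depends on scheduler jitter, so any threshold tuned against it describes the mock and the runner rather than the application.

```python
import time


def mock_handle_request(latency_s=0.01):
    """Hypothetical stand-in for a stubbed endpoint: it only sleeps."""
    time.sleep(latency_s)


def measure_rps(n_requests=20, latency_s=0.01):
    """Time n_requests sequential calls and return requests per second."""
    start = time.perf_counter()
    for _ in range(n_requests):
        mock_handle_request(latency_s)
    elapsed = time.perf_counter() - start
    return n_requests / elapsed


if __name__ == "__main__":
    latency_s = 0.01
    rps = measure_rps(latency_s=latency_s)
    ceiling = 1.0 / latency_s  # upper bound set purely by the sleep
    print(f"measured {rps:.1f} req/s, ceiling from sleep alone {ceiling:.0f} req/s")
```

On a loaded shared runner each sleep can overshoot by an arbitrary amount, which is why an assert such as `actual_rps >= 40` can fail without any code change.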

Fix

Gate the perf job on test_frame_budget.py only — it times the real CSIProcessor pipeline against the ADR 50 ms per-frame budget (single-frame, p95 over 100 frames, +Doppler). That's a genuine regression guard.

python -m pytest tests/performance/test_frame_budget.py -o addopts="" -v --junitxml=perf-junit.xml

The stub files stay in-repo for local TDD; they re-enter CI when their features are implemented and the mock-timing asserts are made deterministic.
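
For readers unfamiliar with this style of guard, the sketch below shows the general shape of a p95-over-100-frames budget check. It is not the contents of test_frame_budget.py: `process_frame` is a hypothetical stand-in for the real CSIProcessor call, and the frame is a plain list of numbers. Only the 50 ms budget and the 100-frame sample size come from the description above.

```python
import statistics
import time

BUDGET_MS = 50.0  # per-frame budget quoted from the ADR


def process_frame(frame):
    """Hypothetical stand-in for the real per-frame processing call."""
    return sum(x * x for x in frame)


def p95_latency_ms(fn, frame, n_frames=100):
    """Run fn on the frame n_frames times and return the p95 latency in ms."""
    samples = []
    for _ in range(n_frames):
        start = time.perf_counter()
        fn(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples, n=20)[-1]


def test_sustained_p95_under_budget():
    frame = list(range(256))
    assert p95_latency_ms(process_frame, frame) < BUDGET_MS
```

Asserting on p95 rather than the maximum tolerates an occasional slow frame caused by runner noise while still catching a sustained slowdown in the code under test.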

Verification (local, exact CI command)

3 passed, 5 warnings in 10.38s
  test_single_frame_under_50ms PASSED
  test_sustained_100_frames_p95 PASSED
  test_doppler_pipeline PASSED

YAML validated.

🤖 Generated with claude-flow

After #914 fixed collection, the perf job actually ran the suite and exposed that test_api_throughput.py / test_inference_speed.py are TDD red-phase stubs (every test suffixed `_should_fail_initially`) that time a *mock that sleeps*, not a real perf signal. They carry machine-dependent wall-clock asserts (actual_rps >= 40, batch_time < individual_time) that are inherently flaky on shared CI runners, plus a cross-class fixture-scope bug (`fixture 'standard_model' not found`). Result: 3 failed, 10 errored, by design and not a regression. Forcing those green would manufacture a false signal.

Instead, gate only on test_frame_budget.py, which times the *real* CSIProcessor pipeline against the ADR 50 ms per-frame budget (single-frame, p95/100-frames, +Doppler): a genuine regression guard. Verified locally: 3 passed.

The stub files remain in-repo for local TDD; they re-enter CI when their features are implemented and the mock-timing asserts are made deterministic.

Co-Authored-By: claude-flow <ruv@ruv.net>

Since #915 the perf job gates only on test_frame_budget.py, which drives the CSIProcessor pipeline in-process and makes no HTTP calls. The "Start application" step (uvicorn + `sleep 10`) was therefore dead weight: it existed only for the now-excluded api_throughput/inference_speed tests, wasted ~10-15 s per main-push run, and dumped ~50 misleading "router requires hardware setup" ERROR lines into every CI log for a server no test touched. MOCK_POSE_DATA is server-only, unused here. Removed the step and the vestigial env. The gated test is unchanged and passes (verified locally, 3/3).

ruvnet merged commit 88b835d into main Jun 2, 2026
20 checks passed

ruvnet deleted the fix/perf-job-real-guard branch June 2, 2026 16:31

ruvnet mentioned this pull request Jun 2, 2026

perf(ci): drop dead uvicorn start from perf job #917

Merged

ruvnet merged 1 commit into main from fix/perf-job-real-guard
