fix(ci): perf job gates on the real frame-budget guard, not TDD stubs #915
Merged
After #914 fixed collection, the perf job actually ran the suite and exposed that test_api_throughput.py / test_inference_speed.py are TDD red-phase stubs (every test suffixed `_should_fail_initially`) that time a *mock that sleeps*, not a real perf signal. They carry machine-dependent wall-clock asserts (`actual_rps >= 40`, `batch_time < individual_time`) that are inherently flaky on shared CI runners, plus a cross-class fixture-scope bug (`fixture 'standard_model' not found`). Result: 3 failed, 10 errored, by design rather than as a regression. Forcing those green would manufacture a false signal.

Instead, gate only on test_frame_budget.py, which times the *real* CSIProcessor pipeline against the ADR 50 ms per-frame budget (single frame, p95 over 100 frames, +Doppler): a genuine regression guard. Verified locally: 3 passed. The stub files remain in-repo for local TDD; they re-enter CI when their features are implemented and the mock-timing asserts are made deterministic.

Co-Authored-By: claude-flow <ruv@ruv.net>
ruvnet added a commit that referenced this pull request on Jun 2, 2026
Since #915 the perf job gates only on test_frame_budget.py, which drives the CSIProcessor pipeline in-process and makes no HTTP calls. The "Start application" step (uvicorn + `sleep 10`) was therefore dead weight: it existed only for the now-excluded api_throughput/inference_speed tests, wasted ~10-15 s per main-push run, and dumped ~50 misleading "router requires hardware setup" ERROR lines into every CI log for a server no test touched. `MOCK_POSE_DATA` is read only by the server, so it is unused here as well. Removed the step and the vestigial env var. The gated test is unchanged and passes (verified locally, 3/3).
Background
#914 fixed the perf job's collection error (`No module named 'src'`). With collection working, the suite actually executed on the main push and revealed that the residual failures are not regressions: they're pre-existing, by-design TDD red-phase stubs in archived v1 code.
What the failures actually are
`test_api_throughput.py` and `test_inference_speed.py`: every test is named `..._should_fail_initially` (TDD red-phase) and times a mock that sleeps, not a real performance signal. They carry:
- machine-dependent wall-clock asserts (`actual_rps >= 40`, `batch_time < individual_time`), inherently flaky on shared CI runners
- a cross-class fixture-scope bug, `fixture 'standard_model' not found` (10 errors at setup)

Net on the main push: 3 failed and 10 errored, by design; the 16 others pass only because the mock happens to satisfy them.
Forcing them green (tuning thresholds) would manufacture a false perf signal.
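One way the mock-timing asserts could be made deterministic before they re-enter CI is to replace the sleeping mock with an injected fake clock that the mock advances by a declared cost. This is a minimal sketch under stated assumptions: `FakeClock`, `MockModel`, `elapsed`, and `measure_rps` are hypothetical names, not the stubs' actual mock, and the batch cost model is invented for illustration. With a fake clock the asserts check only the mock's cost model, which is exactly why they are not a perf signal; they just stop being flaky.

```python
from typing import Callable, Iterable


class FakeClock:
    """Manually advanced clock, injected wherever the code under test reads time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class MockModel:
    """Hypothetical mock: charges its 'cost' to the fake clock instead of calling time.sleep()."""

    def __init__(self, clock: FakeClock, per_item_s: float,
                 batch_overhead_s: float, batch_per_item_s: float) -> None:
        self.clock = clock
        self.per_item_s = per_item_s
        self.batch_overhead_s = batch_overhead_s
        self.batch_per_item_s = batch_per_item_s

    def infer(self, item):
        self.clock.advance(self.per_item_s)
        return item

    def infer_batch(self, items: Iterable) -> list:
        batch = list(items)
        # Assumed cost model: fixed overhead plus a cheaper per-item cost.
        self.clock.advance(self.batch_overhead_s + self.batch_per_item_s * len(batch))
        return batch


def elapsed(clock: FakeClock, fn: Callable[[], object]) -> float:
    """Fake-clock seconds spent inside fn()."""
    start = clock()
    fn()
    return clock() - start


def measure_rps(model: MockModel, clock: FakeClock, n: int) -> float:
    """Requests per second over n sequential infer() calls, measured on the fake clock."""
    duration = elapsed(clock, lambda: [model.infer(i) for i in range(n)])
    return n / duration


if __name__ == "__main__":
    clock = FakeClock()
    model = MockModel(clock, per_item_s=0.02, batch_overhead_s=0.01, batch_per_item_s=0.01)
    # Same result on any runner: depends only on the declared costs.
    assert measure_rps(model, clock, 100) >= 40
    batch_time = elapsed(clock, lambda: model.infer_batch(range(10)))
    individual_time = elapsed(clock, lambda: [model.infer(i) for i in range(10)])
    assert batch_time < individual_time
```

Because no real time passes, the suite also runs instantly, which removes the `sleep`-driven wall-clock cost from CI.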
Fix
Gate the perf job on `test_frame_budget.py` only: it times the real `CSIProcessor` pipeline against the ADR 50 ms per-frame budget (single-frame, p95 over 100 frames, +Doppler). That's a genuine regression guard.

The stub files stay in-repo for local TDD; they re-enter CI when their features are implemented and the mock-timing asserts are made deterministic.
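The shape of a p95 frame-budget check like the one described above can be sketched as follows. This is an illustration, not the contents of test_frame_budget.py: `process_frame` is a hypothetical stand-in for the real `CSIProcessor` call (its API isn't shown in this PR), the nearest-rank p95 is an assumed definition (the real test may interpolate), and the +Doppler variant is not modeled. A single-frame check is the degenerate case, since the p95 of one sample is that sample.

```python
import time
from typing import Callable, Sequence

# ADR per-frame budget from the PR text: 50 ms.
FRAME_BUDGET_S = 0.050


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (assumed definition; numpy.percentile interpolates by default)."""
    if not samples:
        raise ValueError("p95 of an empty sample is undefined")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def time_frames(process_frame: Callable[[object], object],
                frames: Sequence[object]) -> list[float]:
    """Wall-clock seconds spent in process_frame for each frame (monotonic perf_counter)."""
    durations = []
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        durations.append(time.perf_counter() - start)
    return durations


def check_frame_budget(process_frame: Callable[[object], object],
                       frames: Sequence[object],
                       budget_s: float = FRAME_BUDGET_S) -> bool:
    """True if the p95 per-frame latency stays under the budget."""
    return p95(time_frames(process_frame, frames)) < budget_s


if __name__ == "__main__":
    # Hypothetical stand-in for the real CSIProcessor call: 100 dummy frames.
    frames = [[0.0] * 64 for _ in range(100)]
    print(check_frame_budget(lambda frame: sum(frame), frames))
```

Gating on p95 rather than the maximum tolerates a few outlier frames (for example, a scheduler hiccup on a shared runner) while still catching a systematic slowdown, and the 50 ms budget leaves far more headroom than the mock-timing thresholds did.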
Verification (local, exact CI command)
YAML validated.
🤖 Generated with claude-flow