test: fail proxysql-tester.py TAP runs when zero binaries are discovered (safety net) #5602
Defence-in-depth against the silent false-green failure mode we just
tracked down. Previously, if CI-builds produced no TAP test binaries
on the host (for any reason -- wrong make target, broken volume mount,
typo in a docker-compose override, misconfigured groups.json), the
tester would walk three workdirs, find zero `*-t` binaries in each,
report:
SUMMARY: 'tests' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: 'deprecate_eof_support' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: 'unit' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: ret_rc = [0]
...and exit 0. Zero failures out of zero tests is technically "success"
by the old logic, so the workflow would turn green. Combined with a
build-target regression on the GH-Actions branch, this masked the fact
that the entire TAP test suite (CI-legacy-g*, CI-mysql84-g*, CI-basictests,
CI-taptests-pgsql-cluster, CI-legacy-clickhouse-g1, CI-legacy-g2-genai)
had been running zero tests per commit for several weeks while still
reporting green.
Safety net
----------
At the end of `run_tap_tests`, check whether `glob(*-t)` found ANY
binaries across ALL workdirs. The summary tuple (cmd, rc) is populated
for every discovered binary, including ones that end up skipped by
version filter or INCL/EXCL regexes -- those get `rc=None` rather
than being absent. So `len(ret_summary) == 0` is a strict "nothing
was even on disk" signal.
When that happens, check groups.json for how many tests TAP_GROUP is
expected to run. If N > 0, log a SAFETY NET FAIL line and bump ret_rc
to 1. Version-filter skips and INCL/EXCL skips are not affected -- they
still produce summary entries, so ret_summary is non-empty in those
cases. Only the actual "test binaries were never built" case trips
the guard.
If TAP_GROUP is unset (running the full TAP suite), any zero-discovery
result is a failure regardless -- there's no legitimate configuration
where proxysql-tester should run zero tests.
This complements the ci-builds.yml fix on the GH-Actions branch that
restores the TAP build target. With both in place, a future regression
in either the build or the test discovery chain surfaces loudly
instead of silently.
Ruff (0.15.9), `test/scripts/bin/proxysql-tester.py`: [warning] 1121-1121: Do not catch blind exception: `Exception` (BLE001)
Code Review
This pull request introduces a safety net to the TAP test runner to prevent silent CI successes when no test binaries are discovered. The review feedback highlights that the logic for counting expected tests should exclude metadata tags (keys starting with '@') and that the safety net should also trigger if a specified group is not found in the configuration file, as the system defaults to running all tests in that scenario.
    expected_tests = sum(
        1 for _, test_groups in groups.items()
        if tap_group in test_groups
    )
The current logic counts all entries in groups.json where the group matches, including metadata entries (keys starting with @). According to the logic in test/tap/groups/check_groups.py (line 71), keys starting with @ are metadata tags and not actual test names. These should be excluded from the expected_tests count to ensure the safety net accurately reflects the number of runnable tests.
Suggested change:

    - expected_tests = sum(
    -     1 for _, test_groups in groups.items()
    -     if tap_group in test_groups
    - )
    + expected_tests = sum(
    +     1 for test_name, test_groups in groups.items()
    +     if not test_name.startswith("@") and tap_group in test_groups
    + )
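To see the effect of the suggested filter, here is a small self-contained sketch. The `sample_groups` mapping is invented sample data that only assumes the shape implied by the snippet above (test name mapped to a list of group names, plus `@`-prefixed metadata keys); it is not copied from the real `groups.json`, and `count_expected_tests` is a hypothetical helper name.

```python
def count_expected_tests(groups, tap_group):
    """Count runnable tests assigned to tap_group, ignoring '@' metadata keys."""
    return sum(
        1 for test_name, test_groups in groups.items()
        if not test_name.startswith("@") and tap_group in test_groups
    )


# Hypothetical sample data, NOT the real groups.json contents.
sample_groups = {
    "@note": ["legacy-g1"],                    # metadata entry, must not be counted
    "test_a-t": ["legacy-g1", "mysql84-g1"],
    "test_b-t": ["legacy-g1"],
    "test_c-t": ["mysql84-g1"],
}

# Without the startswith("@") filter this would count 3 entries.
print(count_expected_tests(sample_groups, "legacy-g1"))  # 2
```

With the filter in place, a group that is referenced only by metadata keys counts as zero expected tests.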
        )
    except Exception as e:
        log.warning(f"Safety net: could not parse groups.json: {e}")
    if not tap_group or expected_tests > 0:
If TAP_GROUP is set but not found in groups.json (or if groups.json is missing/unreadable), the tester defaults to running all discovered TAP tests (see lines 650 and 653). In this scenario, if zero binaries are discovered, the safety net will remain silent because expected_tests will be 0. To ensure the safety net catches these cases, it should also trigger if a group is specified but cannot be validated against the configuration file, as we still expect the full suite to be present.
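One possible way to act on this comment is to distinguish "the group was validated" from "the group could not be validated". The sketch below is an assumption about how that could look, not the change that was merged: `expected_tests_or_none` and `should_fail_on_zero_discovery` are hypothetical names, and `None` is used here to mean "unreadable `groups.json` or group not found".

```python
import json
from typing import Optional


def expected_tests_or_none(groups_path: str, tap_group: str) -> Optional[int]:
    """Return the number of tests in tap_group, or None if it cannot be validated."""
    try:
        with open(groups_path) as fh:
            groups = json.load(fh)
    except (OSError, json.JSONDecodeError):
        return None  # groups.json missing or unparsable
    count = sum(
        1 for name, test_groups in groups.items()
        if not name.startswith("@") and tap_group in test_groups
    )
    return count if count > 0 else None  # group not present in the file


def should_fail_on_zero_discovery(tap_group: Optional[str], expected: Optional[int]) -> bool:
    """Decide whether a run that discovered zero binaries must fail."""
    # Unset group: the full suite is expected. Unvalidated group: the review
    # comment notes the tester falls back to running all discovered tests.
    return (not tap_group) or expected is None or expected > 0


# A missing file means the group cannot be validated, so the guard still trips.
expected = expected_tests_or_none("/nonexistent/groups.json", "legacy-g1")
print(expected, should_fail_on_zero_discovery("legacy-g1", expected))  # None True
```

Under these assumptions every zero-discovery run ends up failing, which matches the comment's reasoning that the full suite is still expected on disk whenever the group cannot be validated.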
Validates that #5601 (GH-Actions ci-builds -tap target fix) and #5602 (v3.0 proxysql-tester.py zero-test safety net) combine to produce an actually-running TAP test pipeline.

Expected signals in CI-builds log:
- make ubuntu22-tap (not ubuntu22-dbg)
- >>>tap-matrix.txt<<< section with 3+ workdirs and hundreds of -t entries
- _test cache size >> 2 MB

Expected signals in CI-legacy-g* / CI-mysql84-g* runs:
- Run <group> tests step takes minutes, not seconds
- proxysql-tester.py reports PASS N/N with N > 0
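The last signal, a non-zero PASS count, can be checked mechanically against a captured log. The following is a minimal sketch that assumes only the `SUMMARY:` line format quoted in this PR; `pass_counts` and `ran_any_tests` are hypothetical helper names, and the meaning of the number after the slash is not asserted here.

```python
import re

# Matches lines such as: SUMMARY: 'tests' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY_RE = re.compile(
    r"SUMMARY: '(?P<workdir>[^']+)' PASS (?P<passed>\d+)/(?P<denominator>\d+)"
)


def pass_counts(log_text):
    """Map each workdir name to its (passed, denominator) pair."""
    return {
        m.group("workdir"): (int(m.group("passed")), int(m.group("denominator")))
        for m in SUMMARY_RE.finditer(log_text)
    }


def ran_any_tests(log_text):
    """True if at least one workdir reports a non-zero PASS count."""
    return any(passed > 0 for passed, _ in pass_counts(log_text).values())


# The false-green output quoted in this PR.
false_green = """\
SUMMARY: 'tests' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: 'deprecate_eof_support' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: 'unit' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: ret_rc = [0]
"""

print(ran_any_tests(false_green))  # False
```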
Summary
Adds a defence-in-depth safety net to `test/scripts/bin/proxysql-tester.py` so that a TAP run with zero discovered test binaries fails loudly instead of silently reporting green.
Depends conceptually on #5601 (GH-Actions), which fixes the underlying build regression. This PR is the backstop that will catch any future regression of the same class.
Why
Up until now, if CI-builds produced no `test/tap/tests/**/*-t` binaries on the host (for any reason -- wrong make target, broken docker-compose mount, typo in a rewrite, removal of a build target), `proxysql-tester.py` would walk the three TAP workdirs (`tests`, `tests_with_deps/deprecate_eof_support`, `tests/unit`), `glob(*-t)` would return zero in each, and the tester would print:
```
SUMMARY: 'tests' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: 'deprecate_eof_support' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: 'unit' PASS 0/0 : FAIL 0/0 : SKIP 0/0
SUMMARY: ret_rc = [0]
```
Zero failures out of zero tests is "success" by the old logic, so the workflow turns green.
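The vacuous-success trap is easy to reproduce in isolation. The function below is a deliberately simplified stand-in for "exit non-zero only if something failed"; it is an illustration under that assumption, not the tester's actual code, and `exit_code_from_failures` is a hypothetical name.

```python
def exit_code_from_failures(results):
    """results is a list of (cmd, rc) pairs; rc None means the test was skipped."""
    failures = [cmd for cmd, rc in results if rc not in (0, None)]
    return 1 if failures else 0


# An empty result list contains no failures, so the run "succeeds".
print(exit_code_from_failures([]))               # 0
print(exit_code_from_failures([("a-t", 1)]))     # 1
```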
That is exactly what happened during the multi-week silent regression on the `GH-Actions` branch after commit `6e7d93229` landed the wrong `make` target in `ci-builds.yml`. The entire TAP test suite -- CI-legacy-g*, CI-mysql84-g*, CI-basictests, CI-taptests-pgsql-cluster, CI-legacy-clickhouse-g1, CI-legacy-g2-genai -- was reporting green every commit while running zero tests. The build-target fix is in #5601. This PR is the guard that means the next regression of this kind surfaces immediately instead of hiding.
What the safety net does
At the end of `Psqlt.run_tap_tests`, after the workdir loop completes, it checks `len(ret_summary)`.
The `summary` tuple `(cmd, rc)` is populated for every discovered binary, including ones that end up skipped by version filter, group membership, or `TEST_PY_TAP_INCL/EXCL` regexes -- those get `rc=None` rather than being absent from the summary. So `len(ret_summary) == 0` is a strict "`glob(*-t)` found nothing in any workdir" signal.
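The distinction between "skipped" and "never on disk" can be sketched as follows. This is an assumption-laden illustration of the idea, not the real `run_tap_tests`: `discover` is a hypothetical helper, nothing is executed, and binaries that would run simply get the placeholder `rc` of 0.

```python
import glob
import os
import re
import tempfile


def discover(workdirs, excl_regex=None):
    """Return (cmd, rc) entries for every *-t file; skipped ones get rc=None."""
    summary = []
    for workdir in workdirs:
        for path in sorted(glob.glob(os.path.join(workdir, "*-t"))):
            if excl_regex and re.search(excl_regex, os.path.basename(path)):
                summary.append((path, None))   # skipped, but still recorded
            else:
                summary.append((path, 0))      # placeholder rc, nothing is run
    return summary


with tempfile.TemporaryDirectory() as empty, tempfile.TemporaryDirectory() as built:
    for name in ("alpha-t", "beta-t"):
        open(os.path.join(built, name), "w").close()

    print(len(discover([empty])))                   # 0: nothing on disk
    print(len(discover([built], excl_regex=".*")))  # 2: all skipped, none absent
```

Excluding every binary still leaves two entries in the summary, so only the empty workdir produces a zero-length result.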
When that happens, the safety net looks up `TAP_GROUP` in `test/tap/groups/groups.json` and counts how many tests that group is expected to contain. If the expected count is > 0 (or `TAP_GROUP` is unset, meaning we're running the full TAP suite), the safety net logs a `SAFETY NET FAIL` message and bumps `ret_rc` to at least 1, making the workflow fail.
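Put together, the guard described above might look like the sketch below. It follows the condition quoted in the review (`not tap_group or expected_tests > 0`), but the function name `apply_safety_net`, the logger name, and the wording of the log message are placeholders rather than the PR's actual code.

```python
import logging

log = logging.getLogger("safety-net-sketch")


def apply_safety_net(ret_rc, ret_summary, tap_group, expected_tests):
    """Return the final exit code after the zero-discovery check."""
    if len(ret_summary) > 0:
        return ret_rc  # something was discovered, even if every entry was skipped
    if not tap_group or expected_tests > 0:
        # Placeholder message text; the real log line may differ.
        log.error(
            "SAFETY NET FAIL: zero TAP binaries discovered (TAP_GROUP=%r, expected=%d)",
            tap_group, expected_tests,
        )
        return max(ret_rc, 1)  # bump to at least 1, keep any higher code
    return ret_rc


print(apply_safety_net(0, [], "legacy-g1", 12))                 # 1
print(apply_safety_net(0, [("a-t", None)], "legacy-g1", 12))    # 0
```

Using `max(ret_rc, 1)` rather than assigning 1 keeps any more specific non-zero code that was already set.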
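What does NOT trip the safety net:
- Binaries skipped by the version filter
- Binaries skipped by group membership
- Binaries skipped by the `TEST_PY_TAP_INCL/EXCL` regexes

Each of these still produces a summary entry with `rc=None`, so `ret_summary` is non-empty.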
Only the "nothing was even on disk" case trips it. That's the case that used to silently report green and must stop doing so.
Test plan
Trade-offs