Run your test command N times. Find out which tests don't always agree with themselves.
TL;DR: `/flaky-detector --cmd "pytest -v" --runs 10` → per-test flakiness %, sorted worst-first, ready to triage.
A test that fails 1-in-20 wastes more team time than a test that never fails. CI flakes erode trust; everyone learns to "just re-run it". This tool runs your suite N times, parses pass/fail per test, and tells you exactly which tests are flaky and at what rate — so you can decide: rerun, isolate, mark @pytest.mark.flaky, or fix the underlying race.
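The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the plugin's actual code: `summarize` and the shape of `runs` are hypothetical, and flakiness is assumed to be the share of failing runs, which matches the sample report (6 passes and 4 failures give 40.0).

```python
from collections import defaultdict


def summarize(runs):
    """Aggregate per-run outcomes into per-test rows, worst first.

    `runs` is a list with one dict per run, mapping a test id to
    "pass" or "fail" (a hypothetical shape chosen for this sketch).
    """
    counts = defaultdict(lambda: {"pass": 0, "fail": 0})
    for run in runs:
        for test_id, outcome in run.items():
            counts[test_id][outcome] += 1

    rows = []
    for test_id, c in counts.items():
        total = c["pass"] + c["fail"]
        # Assumption: flakiness % is the percentage of runs that failed.
        pct = round(100.0 * c["fail"] / total, 1)
        rows.append(
            {"test": test_id, "pass": c["pass"], "fail": c["fail"], "flakiness_pct": pct}
        )

    # Highest failure share first, so the worst offenders lead the report.
    rows.sort(key=lambda r: r["flakiness_pct"], reverse=True)
    return rows
```

Under this definition a test at 0 always passes, a test at 100 always fails, and anything in between is flaky.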
git clone https://github.com/mturac/pluginpool-flaky-detector ~/.claude/plugins/flaky-detector

Restart Claude Code; the slash command `/flaky-detector` appears.
python3 scripts/flaky.py --cmd "pytest -v" --runs 10 --format md
python3 scripts/flaky.py --cmd "go test ./..." --runs 20 --parallel 4 --out report.json
python3 scripts/flaky.py --cmd "jest --ci" --parser jest --runs 5

Tip: prefer `pytest -v` over `pytest -q` so every result lands on its own line. If you use `-q`, flaky-detector still picks up the tail `FAILED path::test` summary lines and exits non-zero with a warning rather than reporting a false green.
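To make the difference between `-v` and `-q` concrete, here is a rough sketch of how both line shapes could be recognized. The regular expressions and the `parse_pytest` name are assumptions for illustration; the plugin's real patterns may differ.

```python
import re

# Assumed line shapes; not taken from the plugin's source.
# Verbose mode: "tests/foo.py::test_bar PASSED   [ 50%]"
VERBOSE_LINE = re.compile(r"^(?P<test>\S+::\S+)\s+(?P<status>PASSED|FAILED)\b")
# Short summary at the end of a run: "FAILED tests/foo.py::test_bar - AssertionError"
SUMMARY_LINE = re.compile(r"^FAILED\s+(?P<test>\S+::\S+)")


def parse_pytest(output):
    """Return {test_id: "pass" | "fail"} for one pytest run."""
    results = {}
    for line in output.splitlines():
        m = VERBOSE_LINE.match(line)
        if m:
            results[m.group("test")] = "pass" if m.group("status") == "PASSED" else "fail"
            continue
        m = SUMMARY_LINE.match(line)
        if m:
            # Fallback for -q: only failures are named in the summary.
            results.setdefault(m.group("test"), "fail")
    return results
```

With `-q`, passing tests never appear by name, so only failures can be counted. That is why verbose output gives a more complete picture.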
| Flag | Default | Description |
|---|---|---|
| `--cmd` | required | The test command (single line, no shell wrapping) |
| `--runs` | `10` | How many times to invoke the command |
| `--parallel` | `1` | Concurrent runs (only safe for parallel-clean suites) |
| `--parser` | `auto` | pytest, jest, gotest, tap, or auto |
| `--out` | none | Write JSON report to this path |
| `--format` | `json` | json or md |
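As a hedged illustration of what `--cmd`, `--runs`, and `--parallel` imply, the sketch below tokenizes a single-line command without a shell and invokes it repeatedly. `run_once` and `run_many` are hypothetical names, and the use of `shlex` and a thread pool is an assumption rather than a description of `scripts/flaky.py`.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd):
    """Run the test command once and return (exit code, combined output)."""
    # shlex.split tokenizes the line; no shell is involved, so pipes,
    # redirects, and `&&` inside cmd would not be interpreted.
    argv = shlex.split(cmd)
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_many(cmd, runs=10, parallel=1):
    """Invoke the command `runs` times, at most `parallel` at a time."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda _: run_once(cmd), range(runs)))
```

With `parallel=1` the runs execute one after another. Higher values overlap them, which is only meaningful when the suite does not share files, ports, or databases between runs.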
| Parser | Matches |
|---|---|
| `pytest` | `tests/foo.py::test_bar PASSED` … |
| `jest` / `vitest` | `✓ name`, `✗ name`, `PASS file`, `FAIL file` |
| `gotest` | `--- PASS:`, `--- FAIL:`, `--- SKIP:` |
| `tap` | `ok N - name`, `not ok N - name` |
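For the line shapes listed for `gotest` and `tap`, a matcher could look roughly like the following. The patterns, function names, and the choice to ignore skipped Go tests are assumptions made for this sketch.

```python
import re

# Assumed patterns for the line shapes in the table above.
GOTEST_LINE = re.compile(r"^\s*--- (?P<status>PASS|FAIL|SKIP): (?P<test>\S+)")
TAP_LINE = re.compile(r"^(?P<status>ok|not ok)\s+\d+\s+-\s+(?P<test>.+)$")


def parse_gotest(output):
    """Return {test_name: "pass" | "fail"} from `go test -v` style output."""
    results = {}
    for line in output.splitlines():
        m = GOTEST_LINE.match(line)
        # Assumption: skipped tests are left out of the flakiness counts.
        if m and m.group("status") != "SKIP":
            results[m.group("test")] = "pass" if m.group("status") == "PASS" else "fail"
    return results


def parse_tap(output):
    """Return {test_name: "pass" | "fail"} from TAP output."""
    results = {}
    for line in output.splitlines():
        m = TAP_LINE.match(line)
        if m:
            name = m.group("test").strip()
            results[name] = "pass" if m.group("status") == "ok" else "fail"
    return results
```

Both functions return the same mapping shape, so whatever aggregates the runs does not need to know which runner produced the output.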
| Code | Meaning |
|---|---|
| `0` | No flakies, no always-failing |
| `1` | At least one test is flaky (0 < flakiness_pct < 100) |
| `2` | At least one test is always-failing |
| `3` | Zero tests parsed but the runner reported activity; re-run with `-v` |
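The exit codes can be read as a small decision function. The sketch below is one plausible reading, with several assumptions: `exit_code` is a hypothetical name, "runner reported activity" is approximated by a boolean flag, and an always-failing test is assumed to take precedence over a flaky one when both are present. The table does not say which wins.

```python
def exit_code(rows, runner_had_output):
    """Map aggregated rows to a process exit code.

    `rows` is a list of dicts with a "flakiness_pct" key (hypothetical shape).
    """
    if not rows:
        # Nothing parsed: treat runner output as a sign the parser missed it.
        # Assumption: no rows and no output at all counts as a clean exit.
        return 3 if runner_had_output else 0

    pcts = [r["flakiness_pct"] for r in rows]
    # Assumption: always-failing (100%) outranks flaky when both occur.
    if any(p >= 100 for p in pcts):
        return 2
    if any(0 < p < 100 for p in pcts):
        return 1
    return 0
```

In CI, a distinct code for "zero tests parsed" keeps a parser mismatch from looking like a green build.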
# Flaky-detector report (10 runs)
- flaky: **2** | always-failing: **0** | always-passing: 47
| test | pass | fail | flakiness % |
|---|---|---|---|
| tests/test_payment.py::test_idempotency | 6 | 4 | 40.0 |
| tests/test_search.py::test_index_warmup | 8 | 2 | 20.0 |
- `--parallel > 1` only works for parallel-safe suites; otherwise concurrent runs share state and lie.
- Streaming stdout from very long suites is buffered, so be patient on the first run.
- The parser is tuned for default reporters. Custom plugins (pytest-rich, etc.) may need a tweak.
Step-by-step walkthroughs with real input fixtures and the helper's actual output live in examples/. Three or four scenarios per plugin — from the happy path to the edge cases the test suite guards.
Ten focused Claude Code plugins for everyday productivity: commit-narrator · pr-storyteller · test-gap · deps-doctor · env-lint · secret-guard · standup-gen · todo-harvest · flaky-detector · changelog-forge
MIT — see LICENSE. Contributions welcome.