[DX-3769] CLI and AI Skill to Discover and Diagnose Flaky Tests (and easier test runs in general)#22125
[DX-3769] CLI and AI Skill to Discover and Diagnose Flaky Tests (and easier test runs in general)#22125
Conversation
|
👋 kalverra, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks! |
|
✅ No conflicts with other open PRs targeting |
There was a problem hiding this comment.
Pull request overview
Risk Rating: MEDIUM
Adds a new nested Go module (tools/test) that provides a developer-friendly harness for running Chainlink unit tests (with optional ephemeral Postgres) plus a diagnose mode for surfacing flakes/timeouts/slow tests, and ships a corresponding agent skill/playbook.
Changes:
- Introduce
tools/testCLI withtest,gotestsum, and multi-iterationdiagnoseworkflows (including progress UI + report/log generation). - Add Postgres lifecycle management via
testcontainers-go, plus runner/analyzer logic and unit tests. - Wire repository docs + make targets + agent skill docs to make the new workflows discoverable.
Scrupulous human review recommended (high-impact areas):
tools/test/internal/db/db.go: container lifecycle, Ryuk disablement, and failure/interrupt cleanup paths.tools/test/internal/runner/runner.go: correctness of stdout/stderr capture and JSONL integrity under concurrent writes/cancellation.tools/test/go.mod: toolchain version selection and compatibility with the root module.
Reviewed changes
Copilot reviewed 28 out of 31 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/test/main.go | Entry point for the nested tools/test module CLI. |
| tools/test/internal/termstyle/termstyle.go | Shared lipgloss palette for CLI progress/summary output. |
| tools/test/internal/runner/runner.go | Implements go test, gotestsum, and the diagnose loop orchestration. |
| tools/test/internal/runner/runner_test.go | Tests diagnose arg building, results dir naming, canceled-ctx behavior, etc. |
| tools/test/internal/runner/diagnose_results_dir.go | Generates bounded-length results directory basenames. |
| tools/test/internal/runner/diagnose_progress.go | Parses go test -json to track progress and render status lines. |
| tools/test/internal/runner/diagnose_progress_test.go | Tests progress parsing/rendering helpers. |
| tools/test/internal/runner/analyze.go | Parses iteration logs into report.json/csv + per-test log files + summary rendering. |
| tools/test/internal/runner/analyze_test.go | Unit tests for flake/failure/timeout/slow classification and log capture. |
| tools/test/internal/runner/analyze_files_test.go | Unit tests for writing log files and CSV output. |
| tools/test/internal/repo/repo.go | Locates the chainlink repo root by walking up to the root go.mod. |
| tools/test/internal/repo/repo_test.go | Tests repo root discovery behavior. |
| tools/test/internal/db/db.go | Manages ephemeral Postgres via testcontainers + reset/dump/cleanup helpers. |
| tools/test/internal/db/db_test.go | Tests dump-state nil/no-container behavior. |
| tools/test/internal/config/config.go | Viper/cobra config binding and defaults for the harness. |
| tools/test/internal/config/config_test.go | Tests persistent + local flag binding into config. |
| tools/test/internal/cmd/root.go | Cobra root command, signal-aware execution, and DB setup pre-run. |
| tools/test/internal/cmd/root_test.go | Tests displayed command path formatting. |
| tools/test/internal/cmd/test.go | test subcommand (passthrough to go test). |
| tools/test/internal/cmd/gotestsum.go | gotestsum subcommand wiring. |
| tools/test/internal/cmd/diagnose.go | diagnose subcommand wiring and flags. |
| tools/test/go.mod | Declares the nested module and dependencies. |
| tools/test/go.sum | Adds dependency checksums for the nested module. |
| tools/test/README.md | Basic usage docs for the new harness. |
| tools/test/AGENTS.md | Agent-facing constraints/goals/commands for working on this tool. |
| tools/test/fixing-flaky-tests.md | General playbook for diagnosing/fixing flakes. |
| tools/README.md | Adds a pointer to the new tools/test harness. |
| GNUmakefile | Adds new_test, new_gotestsum, and new_test_diagnose targets. |
| .gitignore | Ignores diagnose-* output dirs while explicitly keeping the agent skill directory. |
| .agents/skills/diagnose-tests/SKILL.md | Adds an AI skill for running diagnose and using its report/logs to drive fixes. |
|
|
Had to revert the |





Includes a new tool,
tools/testthat provides a simpler way for devs to run /chainlink unit tests, and includes features to help hunt down flakes and races.Simpler Test Runs
Running tests in /chainlink requires chaining together 2-3 make commands, and you can miss steps. Not anymore!
Why not just
go test ./...?I'd love to, but the problem is overhead.
go testdoesn't have any sort of "before all test runs" control. So, to makego test ./...work, we'd need to launch a new PostgreSQL test container for every package that needs a real DB. This can cause serious memory, processing, and runtime overhead.It is possible this isn't a big deal, and we could reduce/eliminate this concern with some refactoring, but that's outside the scope of this approach.
diagnoseYou can now easily diagnose test issues with a single command, re-running tests, packages, and suites, and generating summarized results. With one command, you can exhaustively search for:
/diagnose-testsAI SkillIt contains steps for the agent to run various survey and test commands, review logs and code, and provide fixes for issues you encounter.
Note: Current sandbox and security restrictions mean that most agents will have to ask you to run the commands for them. Working with the security team on a workaround.
Note: Most agents will discover this skill automatically. But Claude Code won't, you'll need to point it to the file.
Future Plans/Improvements
/diagnose-testsas I use it more, include more advice and restrictionsgo tool testinstead ofgo -C ./tools/testonce update to testcontainers-go version with moby chainlink-testing-framework#2510 is merged.