Skip to content

[DX-3769] CLI and AI Skill to Discover and Diagnose Flaky Tests (and easier test runs in general)#22125

Open
kalverra wants to merge 54 commits intodevelopfrom
testRunner
Open

[DX-3769] CLI and AI Skill to Discover and Diagnose Flaky Tests (and easier test runs in general)#22125
kalverra wants to merge 54 commits intodevelopfrom
testRunner

Conversation

@kalverra
Copy link
Copy Markdown
Collaborator

@kalverra kalverra commented Apr 22, 2026

Includes a new tool, tools/test that provides a simpler way for devs to run /chainlink unit tests, and includes features to help hunt down flakes and races.

Simpler Test Runs

Running tests in /chainlink requires chaining together 2-3 make commands, and you can miss steps. Not anymore!

# Help menu for all available commands
go -C ./tools/test run . -h

# Spin up a Postgres container, run go tests, and tear it down all with one command!
go -C ./tools/test run . run -v -count=1 -p 4 ./core/...
# Or use make
make new_test ARGS="-v -p 4 ./core/..."

# Use gotestsum if you prefer!
go -C ./tools/test run . gotestsum --format=dots -- -count=1 ./core/...
# Even with make!
make new_gotestsum ARGS="--format=dots -- -count=1 ./core/..."

Why not just go test ./...?

I'd love to, but the problem is overhead.

go test doesn't have any sort of "before all test runs" control. So, to make go test ./... work, we'd need to launch a new PostgreSQL test container for every package that needs a real DB. This can cause serious memory, processing, and runtime overhead.

It is possible this isn't a big deal, and we could reduce/eliminate this concern with some refactoring, but that's outside the scope of this approach.

diagnose

You can now easily diagnose test issues with a single command, re-running tests, packages, and suites, and generating summarized results. With one command, you can exhaustively search for:

  • Flakes
  • Races
  • Slow tests
  • Deadlocks
# Run a package 25 times to look for slow tests (30s or more)
go -C ./tools/test run . diagnose --iterations 25 --slow-threshold 30s -- ./core/services/ocr2/plugins/ocr2keeper/...
# Run all of core 10 times with race (will take a while!)
go -C ./tools/test run . diagnose --iterations 10 -- --timeout=10m -race ./core/...

/diagnose-tests AI Skill

It contains steps for the agent to run various survey and test commands, review logs and code, and provide fixes for issues you encounter.

Note: Current sandbox and security restrictions mean that most agents will have to ask you to run the commands for them. Working with the security team on a workaround.

Note: Most agents will discover this skill automatically. But Claude Code won't, you'll need to point it to the file.

Future Plans/Improvements

Copilot AI review requested due to automatic review settings April 22, 2026 01:41
@kalverra kalverra requested review from a team as code owners April 22, 2026 01:41
@github-actions
Copy link
Copy Markdown
Contributor

👋 kalverra, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 22, 2026

✅ No conflicts with other open PRs targeting develop

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Risk Rating: MEDIUM

Adds a new nested Go module (tools/test) that provides a developer-friendly harness for running Chainlink unit tests (with optional ephemeral Postgres) plus a diagnose mode for surfacing flakes/timeouts/slow tests, and ships a corresponding agent skill/playbook.

Changes:

  • Introduce tools/test CLI with test, gotestsum, and multi-iteration diagnose workflows (including progress UI + report/log generation).
  • Add Postgres lifecycle management via testcontainers-go, plus runner/analyzer logic and unit tests.
  • Wire repository docs + make targets + agent skill docs to make the new workflows discoverable.

Scrupulous human review recommended (high-impact areas):

  • tools/test/internal/db/db.go: container lifecycle, Ryuk disablement, and failure/interrupt cleanup paths.
  • tools/test/internal/runner/runner.go: correctness of stdout/stderr capture and JSONL integrity under concurrent writes/cancellation.
  • tools/test/go.mod: toolchain version selection and compatibility with the root module.

Reviewed changes

Copilot reviewed 28 out of 31 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tools/test/main.go Entry point for the nested tools/test module CLI.
tools/test/internal/termstyle/termstyle.go Shared lipgloss palette for CLI progress/summary output.
tools/test/internal/runner/runner.go Implements go test, gotestsum, and the diagnose loop orchestration.
tools/test/internal/runner/runner_test.go Tests diagnose arg building, results dir naming, canceled-ctx behavior, etc.
tools/test/internal/runner/diagnose_results_dir.go Generates bounded-length results directory basenames.
tools/test/internal/runner/diagnose_progress.go Parses go test -json to track progress and render status lines.
tools/test/internal/runner/diagnose_progress_test.go Tests progress parsing/rendering helpers.
tools/test/internal/runner/analyze.go Parses iteration logs into report.json/csv + per-test log files + summary rendering.
tools/test/internal/runner/analyze_test.go Unit tests for flake/failure/timeout/slow classification and log capture.
tools/test/internal/runner/analyze_files_test.go Unit tests for writing log files and CSV output.
tools/test/internal/repo/repo.go Locates the chainlink repo root by walking up to the root go.mod.
tools/test/internal/repo/repo_test.go Tests repo root discovery behavior.
tools/test/internal/db/db.go Manages ephemeral Postgres via testcontainers + reset/dump/cleanup helpers.
tools/test/internal/db/db_test.go Tests dump-state nil/no-container behavior.
tools/test/internal/config/config.go Viper/cobra config binding and defaults for the harness.
tools/test/internal/config/config_test.go Tests persistent + local flag binding into config.
tools/test/internal/cmd/root.go Cobra root command, signal-aware execution, and DB setup pre-run.
tools/test/internal/cmd/root_test.go Tests displayed command path formatting.
tools/test/internal/cmd/test.go test subcommand (passthrough to go test).
tools/test/internal/cmd/gotestsum.go gotestsum subcommand wiring.
tools/test/internal/cmd/diagnose.go diagnose subcommand wiring and flags.
tools/test/go.mod Declares the nested module and dependencies.
tools/test/go.sum Adds dependency checksums for the nested module.
tools/test/README.md Basic usage docs for the new harness.
tools/test/AGENTS.md Agent-facing constraints/goals/commands for working on this tool.
tools/test/fixing-flaky-tests.md General playbook for diagnosing/fixing flakes.
tools/README.md Adds a pointer to the new tools/test harness.
GNUmakefile Adds new_test, new_gotestsum, and new_test_diagnose targets.
.gitignore Ignores diagnose-* output dirs while explicitly keeping the agent skill directory.
.agents/skills/diagnose-tests/SKILL.md Adds an AI skill for running diagnose and using its report/logs to drive fixes.

Comment thread tools/test/internal/runner/analyze.go Outdated
Comment thread tools/test/internal/runner/analyze_files_test.go Outdated
Comment thread tools/test/internal/db/db.go Outdated
Comment thread tools/test/internal/cmd/diagnose.go Outdated
Comment thread tools/test/internal/config/config.go Outdated
Comment thread tools/test/internal/runner/runner.go Outdated
Comment thread tools/test/internal/runner/runner.go
@trunk-io
Copy link
Copy Markdown

trunk-io Bot commented Apr 22, 2026

Static BadgeStatic BadgeStatic BadgeStatic Badge

View Full Report ↗︎Docs

Comment thread GNUmakefile Outdated
Comment thread tools/test/internal/runner/analyze.go Outdated
Comment thread tools/test/internal/runner/runner.go
Comment thread tools/test/internal/runner/runner.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 39 out of 50 changed files in this pull request and generated 5 comments.

Comment thread tools/test/internal/runner/runner.go
Comment thread tools/test/internal/runner/analyze.go
Comment thread tools/test/internal/repo/repo.go
Comment thread tools/test/internal/cmd/run.go Outdated
Comment thread tools/test/internal/cmd/gotestsum.go Outdated
@cl-sonarqube-production
Copy link
Copy Markdown

@kalverra
Copy link
Copy Markdown
Collaborator Author

Had to revert the go tool usage as it was causing dependency hell. Need to wait until smartcontractkit/chainlink-testing-framework#2510 is merged.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 49 changed files in this pull request and generated 2 comments.

Comment thread tools/test/internal/runner/diagnose_progress.go Outdated
Comment thread tools/test/internal/db/db.go
@kalverra kalverra requested review from Tofel and sebawo April 27, 2026 20:22
@kalverra kalverra enabled auto-merge April 27, 2026 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants