Skip to content

Split approval matrix test groups#19454

Merged
dylan-hurd-oai merged 5 commits intomainfrom
codex/split-approval-matrix-tests
Apr 25, 2026
Merged

Split approval matrix test groups#19454
dylan-hurd-oai merged 5 commits intomainfrom
codex/split-approval-matrix-tests

Conversation

@dylan-hurd-oai
Copy link
Copy Markdown
Collaborator

@dylan-hurd-oai dylan-hurd-oai commented Apr 24, 2026

Why

Recent main CI repeatedly timed out in:

  • codex-core::all suite::approvals::approval_matrix_covers_all_modes

It failed in runs 24909500958, 24908076251, 24906197645, 24905823212, 24903439629, 24903336028, and 24898949647.

The failure pattern was a 60s Linux remote timeout. Logs showed many approval scenarios completing before the single matrix test timed out.

Root Cause

approval_matrix_covers_all_modes packed every approval/sandbox/tool scenario into one test case. That made the test vulnerable to normal CI variance: one slow scenario or a slow process startup could push the whole monolithic case past the 60s per-test timeout. It also hid which part of the matrix was slow because the runner only reported the one large matrix test.

What Changed

  • Keep the shared scenarios() table as the single source of approval matrix coverage.
  • Use one #[test_case] per ScenarioGroup to generate five async Tokio tests: danger/full-access, read-only, workspace-write, apply-patch, and unified-exec.
  • Keep the group runner small and add per-scenario error context so a failure still reports the specific scenario name.

Why This Should Be Reliable

Each scenario group now has its own test harness timeout instead of sharing one timeout window with the full matrix. That removes the long sequential loop from a single test while keeping the implementation compact and easy to scan.

The tests still run through the same scenario definitions and runner, so this preserves coverage. test-case already composes with #[tokio::test] in this crate and is already available for test code.

Verification

  • cargo test -p codex-core --test all approval_matrix_ -- --list
  • cargo test -p codex-core --test all approval_matrix_

@dylan-hurd-oai dylan-hurd-oai requested a review from a team as a code owner April 24, 2026 22:05
Comment thread codex-rs/core/tests/suite/approvals.rs Outdated

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn approval_matrix_covers_all_modes() -> Result<()> {
async fn approval_matrix_covers_danger_full_access_modes() -> Result<()> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should you use https://crates.io/crates/test-case or does it not work with tokio test?

@dylan-hurd-oai dylan-hurd-oai merged commit f5497f4 into main Apr 25, 2026
25 checks passed
@dylan-hurd-oai dylan-hurd-oai deleted the codex/split-approval-matrix-tests branch April 25, 2026 04:38
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 25, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants