Split approval matrix test groups #19454
Merged
dylan-hurd-oai merged 5 commits into main on Apr 25, 2026
bolinfest approved these changes on Apr 24, 2026
bolinfest reviewed on Apr 24, 2026
  #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
- async fn approval_matrix_covers_all_modes() -> Result<()> {
+ async fn approval_matrix_covers_danger_full_access_modes() -> Result<()> {
Collaborator
Can/should you use https://crates.io/crates/test-case or does it not work with tokio test?
Why
Recent main CI repeatedly timed out in:
codex-core::all suite::approvals::approval_matrix_covers_all_modes
It failed in runs 24909500958, 24908076251, 24906197645, 24905823212, 24903439629, 24903336028, and 24898949647.
The failure pattern was a 60s Linux remote timeout. Logs showed many approval scenarios completing before the single matrix test timed out.
Root Cause
approval_matrix_covers_all_modes packed every approval/sandbox/tool scenario into one test case. That made the test vulnerable to normal CI variance: one slow scenario or a slow process startup could push the whole monolithic case past the 60s per-test timeout. It also hid which part of the matrix was slow because the runner only reported the one large matrix test.
What Changed
Kept the scenarios() table as the single source of approval matrix coverage.
Used #[test_case] per ScenarioGroup to generate five async Tokio tests: danger/full-access, read-only, workspace-write, apply-patch, and unified-exec.
Why This Should Be Reliable
Each scenario group now has its own test harness timeout instead of sharing one timeout window with the full matrix. That removes the long sequential loop from a single test while keeping the implementation compact and easy to scan.
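The timeout argument above can be made concrete with a little arithmetic. The sketch below uses made-up per-group durations (they are illustrative assumptions, not measurements from the failing runs); only the 60s per-test timeout comes from this description. The helper names are hypothetical.

```rust
use std::time::Duration;

/// Per-test timeout named in this PR description (60s).
const PER_TEST_TIMEOUT: Duration = Duration::from_secs(60);

/// True when a test taking `elapsed` finishes inside the per-test timeout.
fn fits_budget(elapsed: Duration) -> bool {
    elapsed < PER_TEST_TIMEOUT
}

/// Hypothetical wall-clock seconds per group, for illustration only.
/// These are NOT taken from CI logs.
fn hypothetical_group_secs() -> [(&'static str, u64); 5] {
    [
        ("danger_full_access", 14),
        ("read_only", 12),
        ("workspace_write", 16),
        ("apply_patch", 9),
        ("unified_exec", 13),
    ]
}

/// Elapsed time of one monolithic test that runs every group sequentially.
fn monolithic_elapsed() -> Duration {
    let total: u64 = hypothetical_group_secs().iter().map(|(_, s)| *s).sum();
    Duration::from_secs(total)
}

/// Groups that would still exceed the timeout when run as their own test.
fn groups_over_budget() -> Vec<&'static str> {
    hypothetical_group_secs()
        .iter()
        .filter(|(_, s)| !fits_budget(Duration::from_secs(*s)))
        .map(|(name, _)| *name)
        .collect()
}
```

With these assumed numbers the sequential total is 64s, so the single test misses the 60s window even though no individual group comes close to it, and `groups_over_budget()` is empty once each group is timed separately.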
The tests still run through the same scenario definitions and runner, so this preserves coverage.
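A minimal, std-only sketch of how one shared table can feed several per-group tests without losing coverage. The `Scenario` struct, its fields, the scenario names, and the `names_for` and `total_across_groups` helpers are all hypothetical stand-ins; only `scenarios()`, `ScenarioGroup`, and the five group names come from this description. The attribute stacking in the trailing comment is an assumption about how the generated tests look and is not compiled here.

```rust
/// Stand-in for the PR's `ScenarioGroup`; variants mirror the five listed groups.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScenarioGroup {
    DangerFullAccess,
    ReadOnly,
    WorkspaceWrite,
    ApplyPatch,
    UnifiedExec,
}

/// Hypothetical scenario record; the real struct is not shown in this text.
#[derive(Clone, Debug)]
struct Scenario {
    name: &'static str,
    group: ScenarioGroup,
}

/// Single source of coverage. Entries are invented for illustration.
fn scenarios() -> Vec<Scenario> {
    vec![
        Scenario { name: "danger_shell", group: ScenarioGroup::DangerFullAccess },
        Scenario { name: "read_only_shell", group: ScenarioGroup::ReadOnly },
        Scenario { name: "read_only_write_denied", group: ScenarioGroup::ReadOnly },
        Scenario { name: "workspace_write_in_cwd", group: ScenarioGroup::WorkspaceWrite },
        Scenario { name: "apply_patch_basic", group: ScenarioGroup::ApplyPatch },
        Scenario { name: "unified_exec_basic", group: ScenarioGroup::UnifiedExec },
    ]
}

const ALL_GROUPS: [ScenarioGroup; 5] = [
    ScenarioGroup::DangerFullAccess,
    ScenarioGroup::ReadOnly,
    ScenarioGroup::WorkspaceWrite,
    ScenarioGroup::ApplyPatch,
    ScenarioGroup::UnifiedExec,
];

/// Names of the scenarios that one per-group test would run.
fn names_for(group: ScenarioGroup) -> Vec<&'static str> {
    scenarios()
        .into_iter()
        .filter(|s| s.group == group)
        .map(|s| s.name)
        .collect()
}

/// Coverage check: summing over groups should visit every table entry once.
fn total_across_groups() -> usize {
    ALL_GROUPS.iter().map(|g| names_for(*g).len()).sum()
}

// Assumed shape of one generated test in the real crate (not compiled here):
//
//   #[test_case(ScenarioGroup::ReadOnly ; "read_only")]
//   #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
//   async fn approval_matrix_covers(group: ScenarioGroup) -> Result<()> { /* ... */ }
```

Because each group only filters the shared table, `total_across_groups()` equals `scenarios().len()`, which is the sense in which the split keeps the same coverage.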
test-case already composes with #[tokio::test] in this crate and is already available for test code.
Verification
cargo test -p codex-core --test all approval_matrix_ -- --list
cargo test -p codex-core --test all approval_matrix_