test: harden app-server integration tests #19683

Merged: bolinfest merged 1 commit into main from pr19683 on Apr 26, 2026
bolinfest commented Apr 26, 2026

Why

Windows Bazel runs in the permissions stack exposed that app-server integration tests were launching normal plugin startup warmups in every subprocess. Those warmups can call https://chatgpt.com/backend-api/plugins/featured even when a test is not exercising plugin startup, which adds slow background work, noisy stderr, and a dependence on external network state. The relevant startup/featured-plugin behavior was introduced across #15042 and #15264.

A few app-server tests also had long optional waits or unbounded cleanup paths, making failures expensive to diagnose and contributing to slow Windows shards. One external-agent config test from #18246 used a GitHub-style marketplace source, which was enough to exercise the pending remote-import path but also meant the background completion task could attempt a real clone.

What Changed

  • Adds explicit AppServerRuntimeOptions / PluginStartupTasks plumbing and a hidden debug-only --disable-plugin-startup-tasks-for-tests app-server flag, so integration tests can suppress startup plugin warmups without adding a production env-var gate.
  • Has the app-server test harness pass that hidden flag by default, while opting plugin-startup coverage back in for tests that intentionally exercise startup sync and featured-plugin warmup behavior.
  • Lowers normal app-server subprocess logging from info/debug to warn to avoid multi-megabyte stderr output in Bazel logs.
  • Prevents the external-agent config test from attempting a real marketplace clone by using an invalid non-local source while still exercising the pending-import completion path.
  • Bounds optional filesystem/realtime waits and fake WebSocket test-server shutdown so failures produce targeted timeouts instead of hanging a shard.
  • Fixes the Unix script-resolution test in rmcp-client to exercise PATH resolution directly and include the actual spawn error in failures.
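
The bounded-wait item in the list above can be illustrated with a minimal sketch. It is not the code from this PR: `wait_bounded` is a hypothetical helper, and a plain `std::sync::mpsc` channel stands in for whatever the test harness actually waits on. The point is only that a timeout turns a hang into an error that names the thing being waited for.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical helper (not from the PR): wait for one message on `rx`,
/// but never longer than `limit`. `what` names the awaited event so the
/// resulting error points at the wait that failed.
fn wait_bounded<T>(rx: &Receiver<T>, limit: Duration, what: &str) -> Result<T, String> {
    match rx.recv_timeout(limit) {
        Ok(value) => Ok(value),
        // The sender is still alive but nothing arrived in time.
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("timed out after {limit:?} waiting for {what}"))
        }
        // Every sender was dropped, so the event can never arrive.
        Err(RecvTimeoutError::Disconnected) => {
            Err(format!("sender dropped while waiting for {what}"))
        }
    }
}
```

A test would call something like `wait_bounded(&rx, Duration::from_secs(5), "fake WebSocket server shutdown")` in place of a bare `rx.recv()`, so a stuck shutdown fails that one test quickly rather than stalling the shard.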

Verification

  • cargo check -p codex-app-server
  • cargo clippy -p codex-app-server --tests -- -D warnings
  • cargo test -p codex-rmcp-client program_resolver::tests::test_unix_executes_script_without_extension
  • cargo test -p codex-app-server --test all external_agent_config_import_sends_completion_notification_after_pending_plugins_finish -- --nocapture
  • cargo test -p codex-app-server --test all plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request -- --nocapture
  • Windows Local Bazel passed with this test-hardening bundle before it was extracted from #19606 (permissions: make runtime config profile-backed).

Stack created with Sapling. Best reviewed with ReviewStack.

chatgpt-codex-connector (bot) left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3099353c93

thread_manager
    .plugins_manager()
    .maybe_start_plugin_startup_tasks_for_config(&config, auth_manager.clone());
if std::env::var_os(TEST_DISABLE_PLUGIN_STARTUP_TASKS_ENV_VAR).is_none() {

[P2] Avoid shipping a test env gate in startup path

Gating maybe_start_plugin_startup_tasks_for_config behind CODEX_APP_SERVER_TEST_DISABLE_PLUGIN_STARTUP_TASKS changes production behavior based on a test-only environment variable. If that variable is present in a real app-server environment (for example from reused CI/test wrappers), startup plugin tasks are silently skipped, which prevents startup sync/warmup behavior and can leave plugin state stale until later user actions. This toggle should be constrained to test-only code paths rather than the main runtime constructor.
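
For contrast with an env-var gate, here is a rough sketch of the explicit-options approach the PR description names. Only the type names `AppServerRuntimeOptions` and `PluginStartupTasks` and the flag string come from this page. The variants, the field, and both functions are assumptions made for illustration, and the hidden/debug-only aspect of the real flag is not modeled.

```rust
/// Whether startup plugin tasks should run. The variants are assumed;
/// only the type name appears in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum PluginStartupTasks {
    #[default]
    Start,
    Skip,
}

/// Options handed to the runtime explicitly by its caller.
/// The field name is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct AppServerRuntimeOptions {
    plugin_startup_tasks: PluginStartupTasks,
}

/// Flag string as quoted in the PR description.
const DISABLE_FLAG: &str = "--disable-plugin-startup-tasks-for-tests";

/// Hypothetical parser: derive options from command-line arguments only,
/// so a stray environment variable cannot change startup behavior.
fn runtime_options_from_args<I, S>(args: I) -> AppServerRuntimeOptions
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut options = AppServerRuntimeOptions::default();
    for arg in args {
        if arg.as_ref() == DISABLE_FLAG {
            options.plugin_startup_tasks = PluginStartupTasks::Skip;
        }
    }
    options
}

/// The startup path consults the options it was given, not the process environment.
fn should_start_plugin_tasks(options: &AppServerRuntimeOptions) -> bool {
    options.plugin_startup_tasks == PluginStartupTasks::Start
}
```

With this shape the default is to start the tasks, and skipping them requires the launcher to pass the flag on purpose. That is the property the review comment asks for.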

bolinfest disabled auto-merge April 26, 2026 19:43
bolinfest merged commit ac2bffa into main Apr 26, 2026
39 checks passed
bolinfest deleted the pr19683 branch April 26, 2026 19:43
github-actions (bot) locked and limited conversation to collaborators Apr 26, 2026