Skip to content

Harden timing-sensitive tests: replace Process.sleep with proper synchronization#43

Merged
guess merged 16 commits intoguess:mainfrom
ppsplus-bradh:fix/harden-timing-sensitive-tests
Mar 28, 2026
Merged

Harden timing-sensitive tests: replace Process.sleep with proper synchronization#43
guess merged 16 commits intoguess:mainfrom
ppsplus-bradh:fix/harden-timing-sensitive-tests

Conversation

@ppsplus-bradh
Copy link
Copy Markdown
Contributor

Summary

Replace Process.sleep calls across 3 test files with deterministic synchronization, eliminating a class of flaky test failures on fast or slow CI runners.

Motivation

Several tests use fixed Process.sleep(50-100) calls to wait for async operations (request delivery, stream cleanup, process shutdown, supervisor restarts). These are brittle — too short on slow runners, unnecessarily slow on fast ones. The codebase already has MockCLI.poll_until/2 for proper async polling; these tests just weren't using it.

Note on timeouts

MockCLI.poll_until/2 has a default timeout of 5 seconds and a polling interval of 10ms. Both can be overridden via options (timeout:, interval:) to shorten the overall polling period where appropriate. If the condition is never met, the test fails with a raised exception rather than hanging indefinitely.

Changes

test/claude_code/session_test.exs

Test Was Now
"stream cleanup removes request" (adapter) Process.sleep(50) x2 Direct :sys.get_state assert (sync call guarantees state) + MockCLI.poll_until for cleanup
"discards raw message for unknown request_id" Process.sleep(50) Synchronous :sys.get_state call
"logs warning for unparseable raw message" Process.sleep(50) Synchronous :sys.get_state call
"stream cleanup removes request" (streaming) Process.sleep(100) + Process.sleep(50) Direct :sys.get_state assert + MockCLI.poll_until for cleanup
"handles multiple concurrent queries" Process.sleep(100) MockCLI.poll_until on map_size(state.requests)
"handles sequential streaming queries" Process.sleep(100) MockCLI.poll_until on map_size(state.requests)

test/claude_code_test.exs

Test Was Now
"returns false for stopped session" Process.sleep(100) Process.monitor + assert_receive {:DOWN, ...}
"stops a running session" Process.sleep(100) Process.monitor + assert_receive {:DOWN, ...}

test/claude_code/supervisor_test.exs

Test Was Now
"restarts an existing session" Process.sleep(50) MockCLI.poll_until for new pid
"restarts crashed sessions automatically" Process.sleep(100) MockCLI.poll_until for new pid
"handles multiple session crashes independently" Process.sleep(100) MockCLI.poll_until for count + pid change

Additionally, two supervisor tests were corrected after validation revealed their premises were no longer accurate:

Test Was Now Reason
"handles session startup errors gracefully" Process.sleep(100) + assert length(children) <= 1 Renamed to "handles session startup with empty config", asserts count == 1 Empty config [] is valid — api_key is now optional (defaults to ANTHROPIC_API_KEY env var). The child starts successfully and stays alive; it never crashes. The original test assumed a missing api_key would cause a crash, which is no longer the case.
"handles invalid session config" Process.sleep(50) + assert count >= 0 Renamed to "starts session with empty config", asserts count == 1 and Process.alive?(pid) Same reason — testing the inverse (child survives) validates current behavior.

Test plan

  • All 1509 tests pass
  • mix quality passes (compile, format, credo, dialyzer)
  • Zero Process.sleep calls remain in the 3 modified files
  • Code review verified polling conditions are correct and no no-op polls remain
  • CHANGELOG.md updated

Eliminates shell escaping entirely by spawning the CLI binary directly
via Port.open with native :args, :env, and :cd options instead of
building a concatenated command string for /bin/sh -c.
Prevents leaking sensitive host environment (SSH keys, database URLs,
cloud credentials) to the CLI subprocess. Filters by CLI-recognized
prefixes (ANTHROPIC_, CLAUDE_CODE_, CLAUDE_, VERTEX_REGION_), an
explicit allowlist of non-namespaced CLI vars, and essential system
vars (PATH, HOME, etc.). User-provided :env bypasses the filter.
The test expected exactly {:unhealthy, :provisioning} but on fast runners
the adapter can resolve (and fail) before the assertion, landing in
:not_connected. Accept either state since both are valid unhealthy
states during startup without a real CLI.
Direct spawn_executable resolves faster than sh -c, making it more
likely that the adapter fails before the stream request is queued.
Accept both :stream_init_error and :stream_error in session_test.exs
and session_adapter_test.exs.
@ppsplus-bradh
Copy link
Copy Markdown
Contributor Author

CI Note: The test failure (health/1 during provisioning in port_integration_test.exs) is a pre-existing race condition that is already fixed in #42. This PR is dependent on #42 merging first.

Once #42 is merged, this branch will need to be rebased onto the updated main for CI to pass.

If #42 is rejected or closed, the fix for that test can be brought into this PR directly.

Copy link
Copy Markdown
Owner

@guess guess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - we can merge after getting in #42

ppsplus-bradh and others added 7 commits March 28, 2026 10:22
Add `allowed_env` option that accepts a list of environment variable
names to pass through from the system environment to the CLI, beyond
the built-in allowlist. Unlike `env` (key-value pairs), `allowed_env`
takes only keys — values are read from System.get_env() at spawn time.

This enables applications to forward specific env vars (e.g.
DATABASE_URL, custom config) without hardcoding values in the `env`
map, while still benefiting from the security filtering that excludes
RELEASE_*, SSH keys, and other sensitive process-level vars.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs `mix format` after every Write or Edit tool use on Elixir files,
ensuring code is always formatted before it reaches git.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete the env control surface with two new options:

- `filter_env` (boolean, default true) — when true, applies the
  built-in allowlist (ANTHROPIC_*, CLAUDE_*, PATH, HOME, etc.).
  When false, passes all system env vars through unfiltered.

- `disallowed_env` (list of strings) — keys to exclude from the
  CLI environment. Works in both filtered and unfiltered modes.

Combined with the existing `allowed_env` and `env` options, this
gives users full control over what reaches the CLI:

  # Filtered (default): built-in allowlist + extras
  filter_env: true, allowed_env: ["DATABASE_URL"]

  # Unfiltered: everything minus exclusions
  filter_env: false, disallowed_env: ["RELEASE_COOKIE", "SECRET_KEY"]

  # Explicit overrides always win regardless of mode
  env: %{"FORCE_THIS" => "value"}

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ppsplus-bradh and others added 2 commits March 28, 2026 13:51
Remove filter_env, allowed_env, disallowed_env, and all filtering
infrastructure from this PR. These will be submitted as a separate
PR to keep the shell escaping refactor focused.

build_env now passes System.get_env() through unfiltered, matching
the pre-refactor behavior. The spawn_executable change is the sole
focus of this PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hronization

Replace 13 Process.sleep calls across 3 test files with deterministic
synchronization:

- session_test.exs: Use MockCLI.poll_until to wait for request map
  state changes instead of fixed 50-100ms sleeps
- claude_code_test.exs: Use Process.monitor + assert_receive {:DOWN}
  instead of sleeping after stop
- supervisor_test.exs: Use MockCLI.poll_until to wait for supervisor
  child restarts instead of fixed sleeps. Tighten meaningless
  ">= 0" assertion to "in [0, 1]"

Fixes #5
@ppsplus-bradh ppsplus-bradh force-pushed the fix/harden-timing-sensitive-tests branch from f055520 to 67cb465 Compare March 28, 2026 18:58
@guess
Copy link
Copy Markdown
Owner

guess commented Mar 28, 2026

@ppsplus-bradh can you rebase this please 🙏

@guess guess merged commit f086ebe into guess:main Mar 28, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants