Skip to content

Conversation

@maxisbey
Copy link
Contributor

Summary

Fixes a flaky test that was causing intermittent CI failures due to race conditions with time-based timeouts.

Problem

The test_notification_validation_error test was using a session-level 50ms timeout for all operations. In CI environments with high system load, even "fast" operations could occasionally take longer than 50ms, causing the test to fail unpredictably.

The test's purpose is to verify that the server remains responsive after a client request times out, not to test specific timing constraints.

Solution

Replaced the session-level timeout with per-request timeouts:

  • Fast operations that should succeed: Use read_timeout_seconds=None (no timeout). This makes them reliable in any environment regardless of system load or resource contention.

  • Slow operation that should timeout: Use read_timeout_seconds=timedelta(microseconds=1) (1 microsecond). This triggers the timeout immediately without adding execution time to the test.

Benefits

  • Eliminates time-based race conditions in CI
  • Test runs faster (no waiting for timeouts)
  • Maintains the test's purpose of verifying server resilience after timeouts
  • More reliable across different system loads and environments

Test Plan

  • Ran the test locally and verified it passes consistently
  • Tested with artificial delays to ensure the fix handles slow operations correctly

The test was using a session-level 50ms timeout which caused race
conditions in CI environments where operations could occasionally take
longer than expected due to system load.

The fix uses per-request timeouts instead:
- Fast operations use no timeout (None), making them reliable in any
  environment regardless of system load
- The slow operation that should timeout uses a minimal timeout (1
  microsecond) to trigger immediately without adding test execution time

This eliminates time-based race conditions while keeping the test
fast and maintaining its purpose of verifying that the server remains
responsive after a timeout occurs.
@maxisbey maxisbey added bug Something isn't working P3 Nice to haves, rare edge cases labels Oct 28, 2025
@maxisbey maxisbey enabled auto-merge (squash) October 28, 2025 18:18
Copy link
Member

@johnw188 johnw188 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shipit

@maxisbey maxisbey merged commit c44e68f into main Oct 28, 2025
21 checks passed
@maxisbey maxisbey deleted the fix-flaky-timeout-test branch October 28, 2025 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working P3 Nice to haves, rare edge cases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants