[BUG] Flaky test: test_agent_concurrent_structured_output_raises_exception on macOS #1492

@strands-agent

Description

The test test_agent_concurrent_structured_output_raises_exception in tests/strands/agent/test_agent.py is flaky on macOS Python 3.13. It intermittently passes or fails depending on timing conditions.

Observed Behavior

The test expects:

  • Thread 1 to acquire the lock and hold it
  • Thread 2 (started after 50ms delay) to hit ConcurrencyException
  • Result: 1 success, 1 error

Actual behavior (intermittent):

  • Both threads complete successfully
  • Result: 2 successes, 0 errors

Failure output:

FAILED tests/strands/agent/test_agent.py::test_agent_concurrent_structured_output_raises_exception - AssertionError: Expected 1 success, got 2
assert 2 == 1

Root Cause Analysis

The test uses time.sleep(0.05) (50ms) to delay Thread 2, while SlowMockedModel.stream() holds the lock for asyncio.sleep(0.15) (150ms). That leaves a window of roughly 100ms in which Thread 2 must reach the lock, and nothing in the test enforces that it does. On some machines (so far observed on macOS with Python 3.13), the timing can result in:

  1. Thread 1: start → acquire lock → wait 150ms → complete → release lock (~155ms total)
  2. Thread 2: starts at 50ms → its lock attempt does not land until ~155ms or later, when the lock is already released → acquires lock successfully

This is a race condition: Thread 2 can acquire the lock after Thread 1 releases it, rather than hitting the concurrency exception.
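The timeline above can be expressed as a tiny model. This is only an illustrative sketch: the function name and the 155ms release time are taken from the estimate in this issue, not measured, and it assumes Thread 1 already holds the lock when Thread 2 makes its attempt.

```python
def second_thread_outcome(attempt_at_ms: float, released_at_ms: float = 155.0) -> str:
    """Model what Thread 2 sees, given when its lock attempt lands.

    Assumes Thread 1 took the lock at roughly t=0 and releases it at
    released_at_ms (hypothetical figure from the timeline above).
    """
    if attempt_at_ms < released_at_ms:
        # Thread 1 still holds the lock: the non-blocking attempt fails.
        return "ConcurrencyException"
    # Thread 1 has already released the lock: Thread 2 simply succeeds.
    return "success"


# The test assumes the first case; the flaky runs correspond to the second.
expected_case = second_thread_outcome(attempt_at_ms=50.0)
flaky_case = second_thread_outcome(attempt_at_ms=160.0)
```

The point is that the test result depends on a value (when Thread 2's attempt lands) that the test does not control.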

Suggested Fix

Use explicit synchronization instead of relying on timing:

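import asyncio
import threading

# Set once t1 is inside stream(), i.e. while the lock is held.
# Module-level here for brevity; create or clear() it per test if reused.
lock_acquired = threading.Event()

class SlowMockedModelWithSignal(MockedModelProvider):
    async def stream(self, *args, **kwargs):
        lock_acquired.set()  # Signal that lock was acquired
        await asyncio.sleep(0.15)  # Keep holding the lock while t2 starts
        async for event in super().stream(*args, **kwargs):
            yield event

def test_agent_concurrent_structured_output_raises_exception(...):
    # ... setup ...
    
    t1.start()
    # Wait for t1 to actually acquire lock; fail loudly if it never does
    assert lock_acquired.wait(timeout=1.0), "t1 never reached stream()"
    t2.start()  # t2 now starts while t1 still holds the lock
    
    # ... rest of test ...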

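Alternatively, increase the sleep duration in SlowMockedModel.stream() significantly (e.g., 500ms) to make overlap more likely. This only widens the timing window; it does not remove the race, so explicit synchronization is the more robust option.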
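For reference, here is a self-contained sketch of the same idea that runs without the strands test fixtures. Everything in it is a hypothetical stand-in: GuardedResource plays the role of the agent, ConcurrencyError the role of ConcurrencyException, and a second event (release) replaces the sleep entirely so that no timing assumption remains. It is not the project's implementation, only a demonstration of the synchronization pattern.

```python
import threading


class ConcurrencyError(Exception):
    """Hypothetical stand-in for the ConcurrencyException in this issue."""


class GuardedResource:
    """Hypothetical stand-in for an agent that rejects concurrent calls."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def invoke(self, holding: threading.Event, release: threading.Event) -> None:
        # Non-blocking acquire: a second caller fails instead of waiting.
        if not self._lock.acquire(blocking=False):
            raise ConcurrencyError("resource is already in use")
        try:
            holding.set()              # tell the test the lock is held
            release.wait(timeout=5.0)  # hold the lock until the test says so
        finally:
            self._lock.release()


def run_once() -> tuple[int, int]:
    """Return (successes, errors) for two overlapping calls."""
    resource = GuardedResource()
    holding, release = threading.Event(), threading.Event()
    successes, errors = [], []

    def worker() -> None:
        try:
            resource.invoke(holding, release)
            successes.append("ok")
        except ConcurrencyError as exc:
            errors.append(exc)

    t1 = threading.Thread(target=worker)
    t2 = threading.Thread(target=worker)
    t1.start()
    assert holding.wait(timeout=5.0), "t1 never acquired the lock"
    t2.start()             # t1 is guaranteed to hold the lock here
    t2.join(timeout=5.0)   # t2 fails fast with ConcurrencyError
    release.set()          # now let t1 finish
    t1.join(timeout=5.0)
    return len(successes), len(errors)
```

Because t1 only releases the lock after t2 has finished, the outcome no longer depends on machine speed or scheduler behavior. Whether the real test can hold the lock on an event instead of a sleep depends on how the mocked model is wired up, which this sketch does not assume.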

Environment

Related

Labels

bug, flaky-test, good first issue
