This repository was archived by the owner on Apr 14, 2026. It is now read-only.

test(agent): formalize integration test ladder with fixture-driven architecture #397

@runyaga

Summary

The current integration tests (M4-M7) pass, but they have grown organically. We need a formal, progressive integration test architecture modeled on dart_monty's integration ladder pattern: fixture-driven, tiered, and extensible for scripting (0008) and future features.

Current State

  • 29 integration tests across 2 files, all passing
  • run_orchestrator_integration_test.dart — 9 tests (M4/M5/M6)
  • m7_room_integration_test.dart — 20 tests (19 groups)
  • Helpers duplicated between files
  • No fixture files — all test data inline
  • Server gets overwhelmed under heavy concurrent load (thread deletion warnings)
  • Test 10 has a latent unicode dash bug (en-dash vs ASCII hyphen)

Proposal: Integration Ladder

Adopt dart_monty's tier-based fixture pattern with progressive complexity.

Tier Structure

| Tier | Layer | Focus | Example Tests |
|---|---|---|---|
| tier_01_lifecycle | L0 | Basic room lifecycle | Idle -> Running -> Completed, error rooms, 404 |
| tier_02_tools | L1 | Tool yielding and resume | Single tool, multi-tool, tool failure recovery |
| tier_03_conversation | L0+ | Multi-turn history | Accumulation, context depth, thread reuse |
| tier_04_runtime | L2 | AgentRuntime patterns | spawn, waitAll, waitAny, cancel, introspection |
| tier_05_pipelines | L2+ | Multi-agent orchestration | Fan-out/fan-in, write-review-revise, cascading |
| tier_06_advanced | L2++ | Complex compositions | Debate, consensus, MapReduce, speculative exec |
| tier_07_scripting | L3 | Monty bridge integration | HostFunctionWiring, MontyToolExecutor, script rooms |

Fixture Format

JSON fixtures per tier, matching dart_monty's pattern:

{
  "id": 1,
  "tier": 1,
  "name": "echo room basic lifecycle",
  "room": "echo",
  "prompt": "Say hello",
  "expectedState": "CompletedState",
  "responseContains": null,
  "tools": null,
  "toolResponses": null,
  "turns": 1,
  "concurrency": 1,
  "xfail": null
}
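
To make the fixture shape concrete, here is a minimal Dart sketch of how such an entry might be decoded. Everything in it is an assumption for illustration: the `LadderFixture` class and `parseFixtures` function are hypothetical names, each tier file is assumed to hold a JSON array of these objects, and `responseContains` and `xfail` are assumed to be nullable strings. The `tools` and `toolResponses` fields are left out because their shape is not specified above.

```dart
import 'dart:convert';

/// Hypothetical in-memory form of one ladder fixture entry.
class LadderFixture {
  const LadderFixture({
    required this.id,
    required this.tier,
    required this.name,
    required this.room,
    required this.prompt,
    required this.expectedState,
    this.responseContains,
    this.turns = 1,
    this.concurrency = 1,
    this.xfail,
  });

  /// Builds a fixture from one decoded JSON object.
  factory LadderFixture.fromJson(Map<String, dynamic> json) {
    return LadderFixture(
      id: json['id'] as int,
      tier: json['tier'] as int,
      name: json['name'] as String,
      room: json['room'] as String,
      prompt: json['prompt'] as String,
      expectedState: json['expectedState'] as String,
      // Assumed to be a plain substring to look for, or null.
      responseContains: json['responseContains'] as String?,
      turns: (json['turns'] as int?) ?? 1,
      concurrency: (json['concurrency'] as int?) ?? 1,
      // Assumed to be a human-readable reason, or null.
      xfail: json['xfail'] as String?,
    );
  }

  final int id;
  final int tier;
  final String name;
  final String room;
  final String prompt;
  final String expectedState;
  final String? responseContains;
  final int turns;
  final int concurrency;
  final String? xfail;

  /// True when the fixture is marked as a known failure.
  bool get isExpectedFailure => xfail != null;
}

/// Decodes a JSON array of fixture objects (assumed file layout).
List<LadderFixture> parseFixtures(String source) {
  final decoded = jsonDecode(source) as List<dynamic>;
  return [
    for (final entry in decoded)
      LadderFixture.fromJson(entry as Map<String, dynamic>),
  ];
}
```

A typed model like this would surface malformed fixtures as cast errors at load time rather than as confusing failures in the middle of a test run.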

Shared Test Infrastructure

Extract duplicated helpers into a shared module:

packages/soliplex_agent/test/
  integration/
    fixtures/
      tier_01_lifecycle.json
      tier_02_tools.json
      tier_03_conversation.json
      tier_04_runtime.json
      tier_05_pipelines.json
      tier_06_advanced.json
      tier_07_scripting.json
    helpers/
      integration_harness.dart    # Shared setup/teardown, HTTP clients
      state_waiters.dart          # _waitForTerminalState, _waitForYieldOrTerminal
      fixture_runner.dart         # registerLadderTests() equivalent
      assertions.dart             # Response matchers (unicode-safe)
    tier_01_lifecycle_test.dart
    tier_02_tools_test.dart
    tier_03_conversation_test.dart
    tier_04_runtime_test.dart
    tier_05_pipelines_test.dart
    tier_06_advanced_test.dart
    tier_07_scripting_test.dart
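
As a rough idea of what state_waiters.dart could expose, the sketch below waits on a stream of states until one matches a predicate or a timeout expires. The existing `_waitForTerminalState` and `_waitForYieldOrTerminal` helpers are not shown in this issue, so the `waitForState` name, its signature, and the stream-based approach are assumptions rather than a description of the current code.

```dart
import 'dart:async';

/// Completes with the first event on [states] that satisfies [matches].
///
/// Throws a [TimeoutException] if nothing matches within [timeout], and a
/// [StateError] if the stream closes without a matching event.
Future<T> waitForState<T>(
  Stream<T> states,
  bool Function(T state) matches, {
  Duration timeout = const Duration(seconds: 30),
}) {
  return states.firstWhere(matches).timeout(timeout);
}
```

One caveat with this approach: on a broadcast stream, events emitted before the call subscribes are missed, so a real helper would probably also need to check the current state first.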

Key Improvements

  1. Fixture-driven tests — JSON fixtures enable:

    • Easy addition of new test cases without code changes
    • Parity comparison across environments (native vs WASM)
    • xfail markers for known issues (like WASM concurrency limits)
  2. Shared harness — Single IntegrationHarness class:

    • Creates/disposes HTTP clients, API, AgUiClient
    • Manages backend health checks before test runs
    • Handles thread cleanup with bounded retries (not infinite)
  3. Unicode-safe assertions — Normalize dashes/quotes before string comparison

  4. Sequential tier execution — Run tiers in order to avoid server overload:

    • Tiers 1-3: Sequential (single-session tests)
    • Tiers 4-6: Sequential between tiers, parallel within a tier where appropriate
    • Server health check between tiers
  5. Scripting tier (tier_07) — New tests for 0008-soliplex-scripting:

    • HostFunctionWiring binds correctly
    • MontyToolExecutor dispatches tool calls through bridge
    • ScriptingToolRegistryResolver resolves tools from script context
    • Script room runs Python code and returns results
    • Script room with external function calls (yield/resume through Monty bridge)
    • Error propagation from Python runtime
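
For the unicode-safe assertions in item 3, a minimal sketch of the normalization step might look like the following. The function names are hypothetical, and the set of characters mapped (Unicode hyphens and dashes U+2010 to U+2015, the minus sign U+2212, and curly single and double quotes) is an assumption about what the matchers need to cover.

```dart
/// Maps common typographic dashes and quotes to their ASCII equivalents.
String normalizeForComparison(String input) {
  return input
      // Hyphen, non-breaking hyphen, figure dash, en dash, em dash,
      // horizontal bar, and the minus sign all become ASCII '-'.
      .replaceAll(RegExp('[\u2010-\u2015\u2212]'), '-')
      // Curly single quotes become an ASCII apostrophe.
      .replaceAll(RegExp('[\u2018\u2019]'), "'")
      // Curly double quotes become an ASCII double quote.
      .replaceAll(RegExp('[\u201C\u201D]'), '"');
}

/// Substring check that ignores dash and quote styling on both sides.
bool containsNormalized(String actual, String expected) {
  return normalizeForComparison(actual)
      .contains(normalizeForComparison(expected));
}
```

With this in place, `normalizeForComparison('3\u20135')` returns `'3-5'`, so a response that uses an en-dash would still match an expectation written with an ASCII hyphen.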

Known Issues to Address

  • Thread deletion retry loop is unbounded — cap at 3 retries with backoff
  • Test 10 unicode dash comparison (en-dash vs ASCII hyphen)
  • Server instability under heavy concurrent load (tests 12+ in sequence)
  • Helpers duplicated between two test files
  • No dart_test.yaml configuration for integration test timeouts
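
The bounded retry proposed in the first item could be sketched as below. This is an illustration, not the existing cleanup code: `retryWithBackoff` is a hypothetical helper, the doubling delay and the 200 ms starting value are assumptions, and only `Exception` subtypes are retried so that programming errors still surface immediately.

```dart
import 'dart:async';

/// Runs [action], retrying up to [maxRetries] times with a doubling delay.
///
/// The action is called at most `maxRetries + 1` times. The last exception
/// is rethrown once the retries are exhausted.
Future<T> retryWithBackoff<T>(
  Future<T> Function() action, {
  int maxRetries = 3,
  Duration initialDelay = const Duration(milliseconds: 200),
}) async {
  var delay = initialDelay;
  var attempt = 0;
  while (true) {
    try {
      return await action();
    } on Exception {
      // Give up after the configured number of retries.
      if (attempt >= maxRetries) rethrow;
      attempt++;
      await Future<void>.delayed(delay);
      delay *= 2; // Exponential backoff between attempts.
    }
  }
}
```

Thread deletion in teardown could then be wrapped in a call to this helper, so a struggling server produces a single test failure after a few attempts instead of an endless loop.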

Migration Path

  1. Extract shared helpers from existing test files
  2. Create fixture JSON files from existing inline test data
  3. Build fixture_runner.dart (registerLadderTests equivalent)
  4. Migrate existing 29 tests to tier files
  5. Add tier_07 scripting tests
  6. Verify all 29+ tests still pass
  7. Remove old test files
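
For step 3, one piece the fixture runner will likely need is handling of the `xfail` marker. The sketch below shows one possible interpretation, which is an assumption on my part and not taken from dart_monty: a fixture marked xfail passes when its body fails and fails when its body unexpectedly succeeds. The `runWithXfail` name is hypothetical.

```dart
/// Runs [body] and inverts the outcome when [xfail] is set.
///
/// With no xfail reason, the body runs normally and any error propagates.
/// With an xfail reason, an error from the body is swallowed (the failure
/// was expected), and a clean run throws so the stale marker gets noticed.
Future<void> runWithXfail(
  Future<void> Function() body, {
  String? xfail,
}) async {
  if (xfail == null) {
    await body();
    return;
  }
  try {
    await body();
  } catch (_) {
    return; // Expected failure: treat as a pass.
  }
  throw StateError('Expected failure ($xfail) but the test passed');
}
```

Failing on an unexpected pass keeps xfail markers from going stale once the underlying issue is fixed.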

Related
