Testing utilities for @noetaris/harness agents. Provides isolated step execution, route testing, and an Observer test double — all designed for unit and integration tests of individual steps without spinning up the full agent loop.
- `runStep`: executes a single named step in isolation with a synthetic `ctx`, capturing state updates, interrupts, and errors in a typed result
- `runRoute`: invokes a step's `route` function directly with a synthetic state snapshot
- `MockObserver`: records all harness lifecycle calls (`onRunStart`, `onStepEnd`, `onEvent`, …) for assertion
```sh
pnpm add -D @noetaris/harness-testing
```

Peer dependency:

```sh
pnpm add @noetaris/harness
```

```typescript
import { runStep, MockObserver } from '@noetaris/harness-testing'

// test a step in isolation
const result = await runStep(harness, 'fetchData', {
  slots: { llm: mockLlm },
  state: { query: 'hello' },
})
expect(result.state).toEqual({ response: 'world' })
expect(result.interrupted).toBeNull()
expect(result.error).toBeNull()

// observe a full run
const obs = new MockObserver()
await agent.run({}, { observer: obs })
expect(obs.calls.onStepEnd).toHaveLength(3)
expect(obs.events('llm.response')).toHaveLength(1)
```

```typescript
runStep<S, Ctx>(
  harness: Harness<Ctx, S, any, any>,
  stepName: string,
  options: RunStepOptions<S, Ctx>,
): Promise<StepResult<S>>
```

Executes one step from the harness loop definition in isolation. Always resolves: interrupts and errors are captured in `StepResult` rather than thrown.
The exceptions are lookup failures, which are detected before the step runs: `runStep` throws `StepNotFoundError` if `stepName` doesn't exist in the loop, or `NoRunFunctionError` if the step has no `run` function (decision-only steps).
| Field | Type | Default | Description |
|---|---|---|---|
| `slots` | `Ctx` | none | User-defined context slots passed as the step's `ctx` argument. |
| `state` | `Partial<S>` | none | Initial state snapshot passed as the step's first argument. |
| `interruptResponses` | `Record<string, unknown>` | none | Pre-loaded interrupt responses keyed by interrupt ID. When a key is present the interrupt is answered immediately (replay mode). When absent the interrupt is captured and execution stops (pause mode). |
| `sessionId` | `string` | `"test-session"` | Injected as `ctx.sessionId`. |
| `agentId` | `string` | `"test-agent"` | Injected as `ctx.agentId`. |
| `runId` | `string` | `crypto.randomUUID()` | Injected as `ctx.runId`. |
| `signal` | `AbortSignal` | a signal that never aborts | Injected as `ctx.signal`. |
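To make the defaults in the table concrete, here is a minimal sketch of how the identity fields could be filled in when a test omits them. `IdentityOptions` and `resolveIdentity` are hypothetical names invented for this illustration and are not part of `@noetaris/harness-testing`; only the default values themselves come from the table.

```typescript
import { randomUUID } from 'node:crypto'

// Hypothetical mirror of the identity-related fields in the options table.
interface IdentityOptions {
  sessionId?: string
  agentId?: string
  runId?: string
  signal?: AbortSignal
}

// Fills in the documented default for every field the caller left out.
function resolveIdentity(options: IdentityOptions = {}): Required<IdentityOptions> {
  return {
    sessionId: options.sessionId ?? 'test-session',
    agentId: options.agentId ?? 'test-agent',
    runId: options.runId ?? randomUUID(),
    // The signal of a controller that nobody aborts never fires.
    signal: options.signal ?? new AbortController().signal,
  }
}

const defaults = resolveIdentity()
const pinned = resolveIdentity({ runId: 'run-1', sessionId: 's-42' })
```

Because the default `runId` is random, a test that asserts on it will probably want to pin it explicitly, as `pinned` does.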
Exactly one of `state`, `interrupted`, or `error` in the returned `StepResult` is non-null per call.

| Field | Type | Description |
|---|---|---|
| `state` | `Partial<S> \| null` | State update returned by the step. Null if interrupted or errored. |
| `interrupted` | `{ interruptId: string; prompt: unknown } \| null` | Populated when the step called `ctx.interrupt()` without a matching `interruptResponses` entry. |
| `error` | `Error \| null` | Error thrown by the step (non-interrupt errors only). |
| `events` | `Array<{ name: string; payload: unknown }>` | All events emitted via `ctx.emit()` during the step, in call order. |
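The capture behaviour described above (replay versus pause for interrupts, errors returned rather than thrown, events collected in order) can be pictured with a small self-contained sketch. This is not the package's implementation: `runStepSketch`, `PauseSignal`, `SketchCtx`, and the synchronous `interrupt(id, prompt)` signature are all assumptions made for illustration.

```typescript
// Hypothetical stand-ins; none of these names come from @noetaris/harness-testing.
class PauseSignal {
  constructor(readonly interruptId: string, readonly prompt: unknown) {}
}

interface SketchCtx {
  interrupt(id: string, prompt: unknown): unknown
  emit(name: string, payload: unknown): void
}

interface SketchResult<S> {
  state: Partial<S> | null
  interrupted: { interruptId: string; prompt: unknown } | null
  error: Error | null
  events: Array<{ name: string; payload: unknown }>
}

type SketchStep<S> = (state: Partial<S>, ctx: SketchCtx) => Promise<Partial<S>>

async function runStepSketch<S>(
  step: SketchStep<S>,
  state: Partial<S>,
  interruptResponses: Record<string, unknown> = {},
): Promise<SketchResult<S>> {
  const events: SketchResult<S>['events'] = []
  const ctx: SketchCtx = {
    interrupt(id, prompt) {
      // Replay mode: a pre-loaded response answers the interrupt immediately.
      if (id in interruptResponses) return interruptResponses[id]
      // Pause mode: unwind the step so the caller can inspect the prompt.
      throw new PauseSignal(id, prompt)
    },
    emit(name, payload) {
      events.push({ name, payload })
    },
  }
  try {
    return { state: await step(state, ctx), interrupted: null, error: null, events }
  } catch (err) {
    if (err instanceof PauseSignal) {
      const interrupted = { interruptId: err.interruptId, prompt: err.prompt }
      return { state: null, interrupted, error: null, events }
    }
    // Anything else is an ordinary failure, captured instead of rethrown.
    const error = err instanceof Error ? err : new Error(String(err))
    return { state: null, interrupted: null, error, events }
  }
}

// A hypothetical step that asks for confirmation before proceeding.
type ConfirmState = { action: string; confirmed: boolean }
const confirmAction: SketchStep<ConfirmState> = async (state, ctx) => {
  ctx.emit('confirm.asked', { action: state.action })
  const answer = ctx.interrupt('confirm-delete', { message: 'Are you sure?' })
  return { confirmed: answer === true }
}
```

Every exit path in the sketch fills exactly one of `state`, `interrupted`, or `error`, which is the invariant the table describes, while `events` is populated regardless of how the step ended.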
Happy path:
```typescript
const result = await runStep(harness, 'classify', {
  slots: { llm: mockLlm },
  state: { input: 'hello world' },
})
expect(result.state?.label).toBe('greeting')
expect(result.events).toEqual([{ name: 'classify.done', payload: { label: 'greeting' } }])
```

Interrupt (pause mode, no pre-loaded response):
```typescript
const result = await runStep(harness, 'confirmAction', {
  slots: { llm: mockLlm },
  state: { action: 'delete' },
})
expect(result.interrupted).toEqual({
  interruptId: 'confirm-delete',
  prompt: { message: 'Are you sure?' },
})
```

Interrupt (replay mode, pre-loaded response):
```typescript
const result = await runStep(harness, 'confirmAction', {
  slots: { llm: mockLlm },
  state: { action: 'delete' },
  interruptResponses: { 'confirm-delete': true },
})
expect(result.state?.confirmed).toBe(true)
```

Error capture:
```typescript
const result = await runStep(harness, 'riskyStep', {
  slots: { llm: failingLlm },
  state: {},
})
expect(result.error?.message).toMatch('upstream failed')
```

```typescript
runRoute<S, Ctx>(
  harness: Harness<Ctx, S, any, any>,
  stepName: string,
  options: RunRouteOptions<S>,
): string
```

Invokes a step's `route` function synchronously with a synthetic state snapshot. Returns the next step name.
Throws StepNotFoundError if stepName doesn't exist, or NoRouteFunctionError if the step has no route function.
| Field | Type | Description |
|---|---|---|
| `state` | `Partial<S>` | State snapshot passed to the route function. |
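For orientation, the kind of route function that `runRoute` exercises can be thought of as a pure mapping from a state snapshot to a step name. The sketch below is an assumption-laden illustration: `ReviewState`, `decideRoute`, the 0.5 threshold, and the `'reject'` fallback for a missing score are all hypothetical, chosen only to line up with the `decide` step used in this README's examples.

```typescript
// Hypothetical state shape for a `decide`-style step.
interface ReviewState {
  score: number
}

// A route function is pure: state snapshot in, next step name out.
// The 0.5 threshold and the fallback for a missing score are assumptions.
function decideRoute(state: Partial<ReviewState>): 'approve' | 'reject' {
  return (state.score ?? 0) >= 0.5 ? 'approve' : 'reject'
}

// Table-driven cases make the boundary behaviour explicit.
const cases: Array<[Partial<ReviewState>, string]> = [
  [{ score: 0.9 }, 'approve'],
  [{ score: 0.5 }, 'approve'],
  [{ score: 0.1 }, 'reject'],
  [{}, 'reject'],
]

// Collect every case whose actual route differs from the expected one.
const mismatches = cases.filter(([state, expected]) => decideRoute(state) !== expected)
```

Because the state is a `Partial<S>`, it is worth including a case with the relevant field missing, as the last row does.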
```typescript
const next = runRoute(harness, 'decide', {
  state: { score: 0.9 },
})
expect(next).toBe('approve')

const next2 = runRoute(harness, 'decide', {
  state: { score: 0.1 },
})
expect(next2).toBe('reject')
```

`MockObserver` is a test double implementing the harness `Observer` interface. It records every lifecycle call for assertion.
```typescript
const obs = new MockObserver()
await agent.run({}, { observer: obs })

obs.calls.onRunStart  // OnRunStartCall[]
obs.calls.onRunEnd    // OnRunEndCall[]
obs.calls.onStepStart // OnStepStartCall[]
obs.calls.onStepEnd   // OnStepEndCall[]
obs.calls.onStepError // OnStepErrorCall[]
obs.calls.onInterrupt // OnInterruptCall[]
obs.calls.onEvent     // OnEventCall[]
```

All arrays are always present (never null), even if no calls were made.
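One way to picture this recording behaviour is a class that appends one record per hook call to a pre-initialised array. The sketch below is a trimmed-down, hypothetical stand-in: `RecordingObserver`, its two hooks, and their argument shapes are assumptions, and the real `Observer` interface has more hooks than shown here.

```typescript
// Hypothetical record shapes; the real call types are not reproduced here.
interface EventCall {
  name: string
  payload: unknown
}
interface StepEndCall {
  stepName: string
}
interface RecordedCalls {
  onStepEnd: StepEndCall[]
  onEvent: EventCall[]
}

// Starting from empty arrays keeps every list present even before any call.
function emptyCalls(): RecordedCalls {
  return { onStepEnd: [], onEvent: [] }
}

class RecordingObserver {
  calls: RecordedCalls = emptyCalls()

  onStepEnd(stepName: string): void {
    this.calls.onStepEnd.push({ stepName })
  }

  onEvent(name: string, payload: unknown): void {
    this.calls.onEvent.push({ name, payload })
  }

  // Returns the payloads of all recorded events with the given name, in call order.
  events(name: string): unknown[] {
    return this.calls.onEvent.filter((call) => call.name === name).map((call) => call.payload)
  }

  // Drops everything recorded so far.
  reset(): void {
    this.calls = emptyCalls()
  }
}

const recorder = new RecordingObserver()
recorder.onEvent('llm.response', { tokens: 12 })
recorder.onEvent('tool.call', { tool: 'search' })
recorder.onStepEnd('fetchData')
```

Recording plain data rather than wiring up spies keeps assertions framework-agnostic: any test runner can compare arrays.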
`obs.events(name)` filters `onEvent` records by name and returns the payload values in call order.

```typescript
const payloads = obs.events('llm.response')
expect(payloads).toHaveLength(1)
expect(payloads[0]).toMatchObject({ model: 'claude-sonnet-4-6' })
```

`obs.reset()` clears all recorded calls. Useful when reusing the same observer across multiple `agent.run()` calls in one test.
```typescript
await agent.run({}, { observer: obs })
obs.reset()
await agent.run({}, { observer: obs })
expect(obs.calls.onRunStart).toHaveLength(1) // only the second run
```

`StepNotFoundError` is thrown by `runStep` and `runRoute` when the step name does not exist in the loop definition.

`NoRunFunctionError` is thrown by `runStep` when the named step has no `run` function. This happens for decision-only steps that only define a `route` function; test their routing logic with `runRoute` instead.

`NoRouteFunctionError` is thrown by `runRoute` when the named step has no `route` function.
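Distinct error classes let a test tell a misconfigured test (wrong step name, wrong helper) apart from a step that genuinely failed. The sketch below illustrates that idea with hypothetical stand-ins: `SketchError`, `StepNotFound`, `NoRunFunction`, `NoRouteFunction`, `StepDef`, and the lookup helpers are invented for this example and only mirror the behaviour described above.

```typescript
// Hypothetical stand-ins for the package's error classes.
class SketchError extends Error {
  constructor(message: string) {
    super(message)
    // Report the concrete subclass name instead of the generic "Error".
    this.name = new.target.name
  }
}
class StepNotFound extends SketchError {}
class NoRunFunction extends SketchError {}
class NoRouteFunction extends SketchError {}

// Hypothetical loop shape: a step may define `run`, `route`, or both.
interface StepDef {
  run?: () => void
  route?: () => string
}

function lookupStep(loop: Record<string, StepDef>, name: string): StepDef {
  const step = loop[name]
  if (!step) throw new StepNotFound(`no step named "${name}"`)
  return step
}

function getRun(loop: Record<string, StepDef>, name: string): () => void {
  const step = lookupStep(loop, name)
  if (!step.run) throw new NoRunFunction(`step "${name}" has no run function`)
  return step.run
}

function getRoute(loop: Record<string, StepDef>, name: string): () => string {
  const step = lookupStep(loop, name)
  if (!step.route) throw new NoRouteFunction(`step "${name}" has no route function`)
  return step.route
}

// Returns whatever `fn` throws, or undefined if it completes normally.
function thrownBy(fn: () => unknown): unknown {
  try {
    fn()
  } catch (err) {
    return err
  }
  return undefined
}

const loop: Record<string, StepDef> = {
  fetchData: { run: () => {} },
  decide: { route: () => 'approve' },
}
```

In a test, `thrownBy(() => getRun(loop, 'decide'))` yields a `NoRunFunction` instance, which signals that the step is decision-only and that its routing should be tested instead.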
MIT