noetaris-lab/harness-testing

@noetaris/harness-testing

Testing utilities for @noetaris/harness agents. Provides isolated step execution, route testing, and an Observer test double. The step and route helpers let you unit-test individual steps without spinning up the full agent loop; the observer records lifecycle calls during a real run so integration tests can assert on them.

Overview

  • runStep — executes a single named step in isolation with a synthetic ctx, capturing state updates, interrupts, and errors in a typed result
  • runRoute — invokes a step's route function directly with a synthetic state snapshot
  • MockObserver — records all harness lifecycle calls (onRunStart, onStepEnd, onEvent, …) for assertion

Installation

pnpm add -D @noetaris/harness-testing

Peer dependency:

pnpm add @noetaris/harness

Quick Start

import { runStep, MockObserver } from '@noetaris/harness-testing'

// `harness`, `agent`, and `mockLlm` are assumed to come from your own test setup

// test a step in isolation
const result = await runStep(harness, 'fetchData', {
  slots: { llm: mockLlm },
  state: { query: 'hello' },
})

expect(result.state).toEqual({ response: 'world' })
expect(result.interrupted).toBeNull()
expect(result.error).toBeNull()

// observe a full run
const obs = new MockObserver()
await agent.run({}, { observer: obs })
expect(obs.calls.onStepEnd).toHaveLength(3)
expect(obs.events('llm.response')).toHaveLength(1)

API Reference

runStep

runStep<S, Ctx>(
  harness: Harness<Ctx, S, any, any>,
  stepName: string,
  options: RunStepOptions<S, Ctx>,
): Promise<StepResult<S>>

Executes one step from the harness loop definition in isolation. Once the step starts running, the returned promise always resolves: interrupts and errors raised by the step are captured in StepResult rather than thrown.

Setup failures are the exception and are thrown rather than captured: StepNotFoundError if stepName doesn't exist in the loop, or NoRunFunctionError if the step has no run function (decision-only steps).

RunStepOptions<S, Ctx>

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| slots | Ctx | | User-defined context slots passed as the step's ctx argument. |
| state | Partial<S> | | Initial state snapshot passed as the step's first argument. |
| interruptResponses | Record<string, unknown> | | Pre-loaded interrupt responses keyed by interrupt ID. When a key is present, the interrupt is answered immediately (replay mode). When absent, the interrupt is captured and execution stops (pause mode). |
| sessionId | string | "test-session" | Injected as ctx.sessionId. |
| agentId | string | "test-agent" | Injected as ctx.agentId. |
| runId | string | crypto.randomUUID() | Injected as ctx.runId. |
| signal | AbortSignal | never-aborts signal | Injected as ctx.signal. |
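
The replay and pause behaviour of interruptResponses can be pictured with a small stand-alone sketch. This is not the package's source: PendingInterrupt, makeInterrupt, confirmStep, and runConfirm are hypothetical names, and the interrupt call is synchronous here only for brevity.

```typescript
// Sketch only: one plausible way replay/pause could work, not the library's code.

// Hypothetical marker thrown when an interrupt has no pre-loaded response.
class PendingInterrupt {
  interruptId: string
  prompt: unknown
  constructor(interruptId: string, prompt: unknown) {
    this.interruptId = interruptId
    this.prompt = prompt
  }
}

// Builds a stand-in for ctx.interrupt() backed by pre-loaded responses.
function makeInterrupt(responses: Record<string, unknown> = {}) {
  return (interruptId: string, prompt: unknown): unknown => {
    // Replay mode: the key is present, so answer immediately.
    if (Object.prototype.hasOwnProperty.call(responses, interruptId)) {
      return responses[interruptId]
    }
    // Pause mode: no key, so stop the step and surface the interrupt.
    throw new PendingInterrupt(interruptId, prompt)
  }
}

// Hypothetical step body that asks for confirmation before acting.
function confirmStep(interrupt: ReturnType<typeof makeInterrupt>) {
  const confirmed = interrupt('confirm-delete', { message: 'Are you sure?' })
  return { confirmed }
}

type Captured = {
  state: { confirmed: unknown } | null
  interrupted: { interruptId: string; prompt: unknown } | null
}

// Runs the step and captures a pending interrupt instead of letting it escape.
function runConfirm(responses?: Record<string, unknown>): Captured {
  try {
    return { state: confirmStep(makeInterrupt(responses)), interrupted: null }
  } catch (err) {
    if (err instanceof PendingInterrupt) {
      return {
        state: null,
        interrupted: { interruptId: err.interruptId, prompt: err.prompt },
      }
    }
    throw err
  }
}

const replayed = runConfirm({ 'confirm-delete': true }) // replay mode
const paused = runConfirm() // pause mode
```

In this sketch, replayed carries a state update and no interrupt, while paused carries the captured interrupt and no state, which mirrors the two modes described in the table.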

StepResult<S>

Exactly one of state, interrupted, or error is non-null per call.

| Field | Type | Description |
| --- | --- | --- |
| state | Partial<S> \| null | State update returned by the step. Null if interrupted or errored. |
| interrupted | { interruptId: string; prompt: unknown } \| null | Populated when the step called ctx.interrupt() without a matching interruptResponses entry. |
| error | Error \| null | Error thrown by the step (non-interrupt errors only). |
| events | Array<{ name: string; payload: unknown }> | All events emitted via ctx.emit() during the step, in call order. |
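
Because exactly one of state, interrupted, or error is non-null, a test can branch on the outcome once instead of checking three fields. The sketch below uses a local copy of the StepResult shape from the table above so that it runs on its own; outcomeOf is a hypothetical helper, not part of the package.

```typescript
// Local mirror of the StepResult fields documented above.
type StepResult<S> = {
  state: Partial<S> | null
  interrupted: { interruptId: string; prompt: unknown } | null
  error: Error | null
  events: Array<{ name: string; payload: unknown }>
}

type Outcome = 'state' | 'interrupted' | 'error'

// Hypothetical helper: reports which single outcome a result carries,
// and fails loudly if the "exactly one" invariant does not hold.
function outcomeOf<S>(result: StepResult<S>): Outcome {
  const present: Outcome[] = []
  if (result.state !== null) present.push('state')
  if (result.interrupted !== null) present.push('interrupted')
  if (result.error !== null) present.push('error')

  const [only] = present
  if (present.length !== 1 || only === undefined) {
    throw new Error(`expected exactly one outcome, got ${present.length}`)
  }
  return only
}

// Hand-built result resembling a paused interrupt.
const pausedResult: StepResult<{ confirmed: boolean }> = {
  state: null,
  interrupted: { interruptId: 'confirm-delete', prompt: { message: 'Are you sure?' } },
  error: null,
  events: [],
}

const kind = outcomeOf(pausedResult)
```

With the real package, the same helper could be applied to the value returned by runStep, assuming its result matches the documented shape.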

Examples

Happy path:

const result = await runStep(harness, 'classify', {
  slots: { llm: mockLlm },
  state: { input: 'hello world' },
})
expect(result.state?.label).toBe('greeting')
expect(result.events).toEqual([{ name: 'classify.done', payload: { label: 'greeting' } }])

Interrupt — pause mode (no pre-loaded response):

const result = await runStep(harness, 'confirmAction', {
  slots: { llm: mockLlm },
  state: { action: 'delete' },
})
expect(result.interrupted).toEqual({
  interruptId: 'confirm-delete',
  prompt: { message: 'Are you sure?' },
})

Interrupt — replay mode (pre-loaded response):

const result = await runStep(harness, 'confirmAction', {
  slots: { llm: mockLlm },
  state: { action: 'delete' },
  interruptResponses: { 'confirm-delete': true },
})
expect(result.state?.confirmed).toBe(true)

Error capture:

const result = await runStep(harness, 'riskyStep', {
  slots: { llm: failingLlm },
  state: {},
})
expect(result.error?.message).toMatch('upstream failed')

runRoute

runRoute<S, Ctx>(
  harness: Harness<Ctx, S, any, any>,
  stepName: string,
  options: RunRouteOptions<S>,
): string

Invokes a step's route function synchronously with a synthetic state snapshot. Returns the next step name.

Throws StepNotFoundError if stepName doesn't exist, or NoRouteFunctionError if the step has no route function.

RunRouteOptions<S>

| Field | Type | Description |
| --- | --- | --- |
| state | Partial<S> | State snapshot passed to the route function. |

Example

const next = runRoute(harness, 'decide', {
  state: { score: 0.9 },
})
expect(next).toBe('approve')

const next2 = runRoute(harness, 'decide', {
  state: { score: 0.1 },
})
expect(next2).toBe('reject')
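
Route tests tend to be table-driven, since a route function is a mapping from state to a step name. The sketch below shows that pattern against a stand-in route function. decideRoute and its 0.5 threshold are assumptions made for illustration; the README does not state the real threshold or the route function's exact signature.

```typescript
type DecideState = { score: number }

// Hypothetical route function for the 'decide' step.
// The 0.5 cut-off is an assumption, not taken from the harness.
const decideRoute = (state: Partial<DecideState>): string =>
  (state.score ?? 0) >= 0.5 ? 'approve' : 'reject'

// Each row: input score and the step name we expect to be routed to.
const cases: Array<[number, string]> = [
  [0.9, 'approve'],
  [0.5, 'approve'], // boundary value
  [0.1, 'reject'],
]

// Collect every row whose actual route differs from the expectation.
const mismatches = cases.filter(
  ([score, expected]) => decideRoute({ score }) !== expected,
)
```

Against a real harness, the call to decideRoute({ score }) would be replaced by runRoute(harness, 'decide', { state: { score } }), and mismatches would be asserted to be empty.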

MockObserver

A test double implementing the harness Observer interface. Records every lifecycle call for assertion.

const obs = new MockObserver()
await agent.run({}, { observer: obs })

obs.calls.onRunStart   // OnRunStartCall[]
obs.calls.onRunEnd     // OnRunEndCall[]
obs.calls.onStepStart  // OnStepStartCall[]
obs.calls.onStepEnd    // OnStepEndCall[]
obs.calls.onStepError  // OnStepErrorCall[]
obs.calls.onInterrupt  // OnInterruptCall[]
obs.calls.onEvent      // OnEventCall[]

All arrays are always present (never null) even if no calls were made.

obs.events(name): unknown[]

Filters onEvent records by name and returns the payload values in call order.

const payloads = obs.events('llm.response')
expect(payloads).toHaveLength(1)
expect(payloads[0]).toMatchObject({ model: 'claude-sonnet-4-6' })

obs.reset(): void

Clears all recorded calls. Useful when reusing the same observer across multiple agent.run() calls in one test.

await agent.run({}, { observer: obs })
obs.reset()
await agent.run({}, { observer: obs })
expect(obs.calls.onRunStart).toHaveLength(1) // only the second run
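
The recording behaviour described above can be sketched in a few lines. RecordingObserver is a hypothetical, reduced stand-in that tracks only two hooks, and the argument shapes of onRunStart and onEvent are assumptions; the real MockObserver may record more detail per call.

```typescript
// Assumed record shape for onEvent calls.
type EventCall = { name: string; payload: unknown }

// Reduced stand-in for a recording test double (two hooks only).
class RecordingObserver {
  calls = {
    onRunStart: [] as unknown[],
    onEvent: [] as EventCall[],
  }

  onRunStart(info: unknown): void {
    this.calls.onRunStart.push(info)
  }

  onEvent(name: string, payload: unknown): void {
    this.calls.onEvent.push({ name, payload })
  }

  // Filter recorded events by name and return their payloads in call order.
  events(name: string): unknown[] {
    return this.calls.onEvent
      .filter((call) => call.name === name)
      .map((call) => call.payload)
  }

  // Replace every array with a fresh empty one, so nothing is ever null.
  reset(): void {
    this.calls = { onRunStart: [], onEvent: [] }
  }
}

const recorder = new RecordingObserver()
recorder.onRunStart({ runId: 'r1' })
recorder.onEvent('llm.response', { text: 'world' })
recorder.onEvent('tool.call', { tool: 'search' })

const beforeReset = recorder.events('llm.response')
recorder.reset()
const afterReset = recorder.events('llm.response')
```

Here beforeReset holds the single 'llm.response' payload and afterReset is empty, which is the same pattern the reset() example relies on.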

Error Classes

StepNotFoundError

Thrown by runStep and runRoute when the step name does not exist in the loop definition.

NoRunFunctionError

Thrown by runStep when the named step has no run function. This happens for decision-only steps that only define a route function — test their routing logic with runRoute instead.

NoRouteFunctionError

Thrown by runRoute when the named step has no route function.
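
The three errors correspond to three lookup failures, which the following sketch makes explicit. The classes here are local stand-ins that only reuse the names from this README, and StepDef, getRun, getRoute, and errorNameOf are hypothetical; the package's actual constructors and step shape may differ.

```typescript
// Local stand-ins that reuse the error names documented above.
class StepNotFoundError extends Error {
  constructor(step: string) {
    super(`step "${step}" not found`)
    this.name = 'StepNotFoundError'
  }
}
class NoRunFunctionError extends Error {
  constructor(step: string) {
    super(`step "${step}" has no run function`)
    this.name = 'NoRunFunctionError'
  }
}
class NoRouteFunctionError extends Error {
  constructor(step: string) {
    super(`step "${step}" has no route function`)
    this.name = 'NoRouteFunctionError'
  }
}

// Assumed step shape: run and route are both optional.
type StepDef = { run?: () => unknown; route?: (state: unknown) => string }

function getStep(loop: Record<string, StepDef>, stepName: string): StepDef {
  if (!Object.prototype.hasOwnProperty.call(loop, stepName)) {
    throw new StepNotFoundError(stepName)
  }
  return loop[stepName] as StepDef
}

// What a runStep-style helper would need before executing a step.
function getRun(loop: Record<string, StepDef>, stepName: string) {
  const step = getStep(loop, stepName)
  if (step.run === undefined) throw new NoRunFunctionError(stepName)
  return step.run
}

// What a runRoute-style helper would need before routing.
function getRoute(loop: Record<string, StepDef>, stepName: string) {
  const step = getStep(loop, stepName)
  if (step.route === undefined) throw new NoRouteFunctionError(stepName)
  return step.route
}

// Returns the name of the error thrown by fn, or null if it did not throw.
function errorNameOf(fn: () => unknown): string | null {
  try {
    fn()
    return null
  } catch (err) {
    return err instanceof Error ? err.name : String(err)
  }
}

const loop: Record<string, StepDef> = {
  fetchData: { run: () => ({ response: 'world' }) }, // run only
  decide: { route: () => 'approve' }, // decision-only step
}

const missing = errorNameOf(() => getRun(loop, 'nope'))
const decisionOnly = errorNameOf(() => getRun(loop, 'decide'))
const noRoute = errorNameOf(() => getRoute(loop, 'fetchData'))
```

In this sketch a decision-only step fails the run lookup but passes the route lookup, which is why such steps are tested with runRoute rather than runStep.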

License

MIT
