Conformance test harness, mock CLIs, and benchmark suite for Animus plugins (v0.4.x stdio JSON-RPC protocol).
This repo lets a plugin author run the same set of scenarios the official
Animus CI runs, locally, without any network, API keys, or a live LLM
account. Every scenario drives the plugin through the same handshake the real
daemon uses and validates the streaming notification + response shape against
animus-protocol.
Status: v0.3.0 — provider, subject, transport, and trigger plugin conformance suites; concurrent-cancel dispatcher; oai-style scenario variants.
| Crate | Description |
|---|---|
testkit-core |
Shared types: ScenarioFile, TestResult, MatrixReport. |
plugin-harness |
Bin animus-plugin-harness — runs scenarios against a plugin. |
plugin-bench |
Bin animus-plugin-bench — TTFT, throughput, end-to-end duration. |
provider-conformance |
Baseline scenarios for provider plugins (10 scenarios). |
subject-conformance |
Baseline scenarios for subject backend plugins (5 scenarios). |
transport-conformance |
Baseline scenarios for transport backend plugins (4 scenarios). |
trigger-conformance |
Baseline scenarios for trigger backend plugins (3 scenarios). |
mock-cli-claude / -codex / -gemini / |
Fake LLM CLIs that emit canonical streams for each scenario. |
mock-cli-opencode |
|
mock-cli-oai |
Mock OpenAI-compatible HTTP server for animus-provider-oai. |
# 1. Build the workspace (mocks + harness + bench).
cargo build --release --workspace
# 2. Build the plugin you want to test (here: animus-provider-claude).
cd ../animus-provider-claude
cargo build --release
cd -
# 3. Run conformance — provider (default), subject, transport, or trigger.
./target/release/animus-plugin-harness conformance \
--plugin ../animus-provider-claude/target/release/animus-provider-claude
./target/release/animus-plugin-harness conformance --kind subject \
--plugin ../animus-subject-default/target/release/animus-subject-default
./target/release/animus-plugin-harness conformance --kind transport \
--plugin ../animus-transport-http/target/release/animus-transport-http
./target/release/animus-plugin-harness conformance --kind trigger \
--plugin ../animus-trigger-webhook/target/release/animus-trigger-webhook
# 4. (Optional) save a machine-readable report.
./target/release/animus-plugin-harness conformance \
--plugin ../animus-provider-claude/target/release/animus-provider-claude \
--report-json ./report-claude.jsonThe harness injects CLAUDE_BIN, CODEX_BIN, GEMINI_BIN,
OPENCODE_BIN, and MOCK_SCENARIO into the plugin's environment before
spawning, so the provider plugin transparently uses our mock CLIs instead
of the real binaries on $PATH. No network, no API keys.
The 10 baseline scenarios live in scenarios/:
| Name | What it exercises |
|---|---|
streaming-short |
3-delta short completion, final aggregated text. |
streaming-medium |
~40 deltas — sanity check buffering. |
streaming-long |
~300 deltas — back-pressure, large output assembly. |
tool-call-single |
One tool_use + tool_result round-trip surrounded by output. |
tool-call-parallel |
Two parallel tool_use blocks resolved in one envelope. |
tool-call-single-oai |
Stateless OpenAI-style: ToolCall only, host owns tool execution. |
tool-call-parallel-oai |
Same as above, parallel. |
error-recovery |
Mid-stream garbled line that the provider parser must ignore. |
cancellation |
Concurrent dispatcher issues agent/cancel mid-flight (see below). |
resume-session |
agent/resume against a prior session id. |
Plugins that don't advertise the relevant $harness/* capability for a
scenario are SKIPPED (not failed). Plugins opt in by adding to their
initialize capabilities:
$harness/cancellation-loop-v2— opt-in to the v0.3.0 concurrent-cancel test$harness/oai-style— opt-in to the stateless OpenAI tool-call scenarios
The harness now spawns a side-task per scenario when cancel_after_ms is
set. It watches for the first notification to learn the session id, waits
the configured delay, then issues agent/cancel { session_id } via the
same stdio pipe. The plugin should terminate the run with
BackendError::Cancelled (REQUEST_CANCELLED, -32002) or emit a
non-recoverable error notification within the scenario timeout.
A fake-cancellable-plugin test fixture lives at
crates/plugin-harness/src/bin/fake_cancellable_plugin.rs and is
exercised by crates/plugin-harness/tests/cancellation.rs to verify the
wire dance end-to-end without depending on any real provider.
Each is a separate crate that exports pub fn baseline_scenarios() -> Vec<TestScenario> plus a pub async fn run_conformance(plugin_path: &Path) -> Result<MatrixReport>. External CI pipelines can depend on the
crate directly:
[dev-dependencies]
subject-conformance = { git = "https://github.com/launchapp-dev/animus-plugin-testkit", tag = "v0.3.0" }| Suite | Scenarios |
|---|---|
| Subject | handshake, advertise-kinds, subject-list, subject-crud-round-trip, subject-watch-stream |
| Transport | handshake, start-shutdown, schema-health, serve-and-accept |
| Trigger | handshake, watch-fires-event, event-payload-shape |
Trigger backends that need an external stimulus (a webhook POST, a Slack
message, a cron tick) will SKIP watch-fires-event and
event-payload-shape. The handshake still PASSes.
Captured 2026-05-24 against v0.3.0:
$ animus-plugin-harness conformance --kind subject \
--plugin animus-subject-default/target/release/animus-subject-default
==> conformance report: animus-subject-default v0.1.1
kind: subject_backend protocol: 1.0.0
[PASS] handshake 0ms
[PASS] advertise-kinds 0ms
[PASS] subject-list 8ms
[PASS] subject-crud-round-trip 8ms
[PASS] subject-watch-stream 8ms
summary: total 5 passed 5 failed 0 skipped 0
OVERALL: PASS
$ animus-plugin-harness conformance --kind transport \
--plugin animus-transport-http/target/release/animus-transport-http
==> conformance report: animus-transport-http v0.1.0
kind: transport_backend protocol: 1.0.0
[PASS] handshake 0ms
[PASS] start-shutdown 6ms
[PASS] schema-health 5ms
[PASS] serve-and-accept 83ms
summary: total 4 passed 4 failed 0 skipped 0
OVERALL: PASS
$ animus-plugin-harness conformance --kind trigger \
--plugin animus-trigger-webhook/target/release/animus-trigger-webhook
==> conformance report: animus-trigger-webhook v0.1.1
kind: trigger_backend protocol: 1.0.0
[PASS] handshake 0ms
[SKIP] watch-fires-event 0ms
[SKIP] event-payload-shape 0ms
summary: total 3 passed 1 failed 0 skipped 2
OVERALL: PASS
$ animus-plugin-harness conformance \
--plugin animus-provider-claude/target/release/animus-provider-claude
==> conformance report: animus-provider-claude v0.2.1
kind: provider protocol: 1.0.0
[SKIP] cancellation 34ms
[PASS] error-recovery 409ms
[PASS] resume-session 395ms
[PASS] streaming-long 402ms
[PASS] streaming-medium 391ms
[PASS] streaming-short 406ms
[PASS] tool-call-parallel 399ms
[SKIP] tool-call-parallel-oai 34ms
[PASS] tool-call-single 374ms
[SKIP] tool-call-single-oai 33ms
summary: total 10 passed 7 failed 0 skipped 3
OVERALL: PASS
The 3 SKIPs are expected: animus-provider-claude does not advertise the
opt-in capabilities $harness/cancellation-loop-v2 or
$harness/oai-style. Provider plugins that want those tests to run
should advertise those capabilities in their initialize capabilities
list once their backend supports the relevant semantics.
Drop a YAML file into scenarios/ (or your own directory and pass
--scenarios <dir>):
name: my-scenario
description: ...
timeout_ms: 8000
method: run # or `resume`
requires_capabilities: ["agent/resume"] # optional gate
cancel_after_ms: 100 # optional — triggers the concurrent-cancel dispatcher
request:
prompt: "hello"
model: claude-sonnet-4-6
expected_notifications:
- kind: output
contains: "Hello"
expected_response:
output_contains: "Hello"
exit_code: 0
mock:
tool: claude
mock_scenario: streaming-shortSee docs/writing-scenarios.md for the full
matcher surface.
./target/release/animus-plugin-bench \
--plugin ../animus-provider-claude/target/release/animus-provider-claude \
--iterations 10 \
--mock-scenario streaming-mediumReports TTFT (time-to-first-token), end-to-end duration, and notification
throughput. There's also a criterion micro-bench for the scenario loader
(cargo bench -p plugin-bench).
See docs/ci-integration.md. The shipped
provider-matrix.yml workflow runs the
harness against every published launchapp-dev/animus-provider-* repo every
Monday morning.
- Trigger conformance is shallow without an external stimulus. The
webhookbackend (and any backend that needs an inbound HTTP POST, Slack event, cron tick, ...) will SKIPwatch-fires-eventandevent-payload-shape. A future revision could spin up a stimulus injector per backend kind. - Subject CRUD requires create+get to share state. The harness keeps a single plugin process alive across the round-trip so backends with in-process state (the default task store) can be exercised. Backends that persist to a global path may surface cross-test contamination.
- Provider plugins must advertise opt-in capabilities for the new
cancellation + oai-style tests to run. Plugins that don't are SKIPPED
(not failed). Update each plugin's
initializecapabilities to opt in. - The harness depends only on published
animus-protocolcrates — it intentionally has no dependency on the in-treeanimus-cli.
This release is pinned to animus-protocol v0.1.9. Protocol bumps in the
patch range remain compatible; minor bumps may require harness changes.
MIT — see LICENSE.