@hyperdx/hdx-eval@0.2.0

github-actions released this 30 Jun 21:22

e2103f7

Minor Changes

5bd1c68: feat: add AI eval framework for benchmarking MCP servers

New @hyperdx/hdx-eval package for benchmarking AI agents against
observability MCP servers. Generates deterministic synthetic telemetry
with planted anomalies, spawns Claude Code as an SRE agent, records full
trajectories, and grades answers using programmatic checks and an
LLM-as-judge.
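
To make "deterministic synthetic telemetry with planted anomalies" and "programmatic checks" concrete, here is a minimal sketch of the general idea. It is not taken from the package: every name (SyntheticSpan, mulberry32, generateSpans, overlapsPlanted) is hypothetical, and the 10x latency spike is an assumed anomaly shape chosen for illustration.

```typescript
// Illustrative sketch only: these names are hypothetical and are not the
// @hyperdx/hdx-eval API.

interface SyntheticSpan {
  timestampMs: number;
  durationMs: number;
  planted: boolean; // ground-truth label, never shown to the agent
}

// mulberry32: a small seeded PRNG. The same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Emit one span per second; spans with index in [start, end) get a 10x latency spike.
function generateSpans(seed: number, count: number, start: number, end: number): SyntheticSpan[] {
  const rand = mulberry32(seed);
  const spans: SyntheticSpan[] = [];
  for (let i = 0; i < count; i++) {
    const planted = i >= start && i < end;
    const baseline = 50 + rand() * 50; // 50 to 100 ms of normal latency
    spans.push({
      timestampMs: i * 1000,
      durationMs: planted ? baseline * 10 : baseline,
      planted,
    });
  }
  return spans;
}

// Programmatic check: does the agent's reported time window overlap the planted one?
function overlapsPlanted(spans: SyntheticSpan[], fromMs: number, toMs: number): boolean {
  return spans.some(s => s.planted && s.timestampMs >= fromMs && s.timestampMs <= toMs);
}
```

Because the generator is seeded, two runs with the same arguments produce identical data, so differences in an agent's score can be attributed to the agent or MCP server rather than to the dataset.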

Includes 5 scenarios (error-root-cause, latency-spike, noisy-signals,
segmented-regression, service-health-check), MCP-agnostic N-way
comparison, blinded judging, and a web viewer for browsing results.
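
One common way to implement blinded judging in an N-way comparison is to shuffle the candidate answers and replace their origin with neutral labels before the judge sees them. The sketch below shows that technique under assumed names (Candidate, BlindedBatch, blind); it is not the package's implementation.

```typescript
// Illustrative sketch only; hypothetical names, not the package's API.

interface Candidate {
  source: string; // e.g. which MCP server the agent used
  answer: string;
}

interface BlindedBatch {
  forJudge: { label: string; answer: string }[]; // what the judge sees
  key: Map<string, string>; // label -> source, kept away from the judge
}

function blind(candidates: Candidate[], rand: () => number = Math.random): BlindedBatch {
  // Fisher-Yates shuffle on a copy so presentation order carries no signal.
  const shuffled = [...candidates];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const key = new Map<string, string>();
  const forJudge = shuffled.map((c, i) => {
    const label = `Candidate ${i + 1}`;
    key.set(label, c.source);
    return { label, answer: c.answer };
  });
  return { forJudge, key };
}
```

After the judge returns per-label scores, the key map is used to attribute each score back to its source.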

Patch Changes

6a80031: Support multi-model comparison in eval batches via comma-separated --model flag
1a64796: Replace relative imports with path aliases
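
As a rough illustration of what a comma-separated --model value implies for a batch, the sketch below splits the flag value and expands it into one run per scenario and model. The helper names (parseModels, planRuns), the de-duplication step, and the assumption that a batch covers every scenario for each model are all hypothetical; only the scenario names come from these notes.

```typescript
// Illustrative sketch only; hypothetical helper names, not the package's CLI code.

// Scenario names as listed in the 0.2.0 release notes.
const SCENARIOS: string[] = [
  "error-root-cause",
  "latency-spike",
  "noisy-signals",
  "segmented-regression",
  "service-health-check",
];

// Split a comma-separated --model value, trimming whitespace and dropping empties.
// De-duplicating repeated names is an assumption of this sketch.
function parseModels(flagValue: string): string[] {
  const names = flagValue.split(",").map(s => s.trim()).filter(s => s.length > 0);
  return Array.from(new Set(names));
}

// One planned run per (scenario, model) pair.
function planRuns(models: string[]): { scenario: string; model: string }[] {
  const runs: { scenario: string; model: string }[] = [];
  for (const scenario of SCENARIOS) {
    for (const model of models) {
      runs.push({ scenario, model });
    }
  }
  return runs;
}
```

Here "model-a,model-b" is a placeholder value: under these assumptions it would expand to ten planned runs, five scenarios for each of the two models.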