
@hyperdx/hdx-eval@0.2.0


github-actions released this on 30 Jun at 21:22 (commit e2103f7)

Minor Changes

  • 5bd1c68: feat: add AI eval framework for benchmarking MCP servers

    New @hyperdx/hdx-eval package for benchmarking AI agents against
    observability MCP servers. Generates deterministic synthetic telemetry
    with planted anomalies, spawns Claude Code as an SRE agent, records full
    trajectories, and grades answers using programmatic checks and an
    LLM-as-judge.

    Includes 5 scenarios (error-root-cause, latency-spike, noisy-signals,
    segmented-regression, service-health-check), MCP-agnostic N-way
    comparison, blinded judging, and a web viewer for browsing results.
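The data-generation idea above (deterministic telemetry with a planted anomaly, graded by a programmatic check) can be sketched roughly as follows. This is an illustrative TypeScript sketch under stated assumptions, not code from @hyperdx/hdx-eval. The Span and PlantedAnomaly shapes, the service names, the error and latency rates, and the helpers generateSpans, errorRateByService and answerNamesRootCause are all hypothetical. mulberry32 is simply one common seeded PRNG chosen for the example.

```typescript
// Hypothetical sketch: deterministic synthetic telemetry with one planted anomaly.
// None of these names come from @hyperdx/hdx-eval; they only illustrate the idea.

interface Span {
  service: string;
  timestampMs: number;
  durationMs: number;
  isError: boolean;
}

interface PlantedAnomaly {
  service: string;
  startMs: number; // inclusive start of the anomaly window
  endMs: number; // exclusive end of the anomaly window
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
// The same seed always yields the same sequence, which makes the data reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generate `count` spans over one hour; spans of the planted service that fall
// inside the anomaly window get a much higher error rate and latency.
function generateSpans(seed: number, anomaly: PlantedAnomaly, count = 1000): Span[] {
  const rand = mulberry32(seed);
  const services = ["checkout", "payments", "inventory", "frontend"]; // assumed names
  const windowMs = 60 * 60 * 1000;
  const spans: Span[] = [];
  for (let i = 0; i < count; i++) {
    const service = services[Math.floor(rand() * services.length)];
    const timestampMs = Math.floor(rand() * windowMs);
    const inAnomaly =
      service === anomaly.service &&
      timestampMs >= anomaly.startMs &&
      timestampMs < anomaly.endMs;
    // Baseline: about 1% errors and 20-70 ms latency. Inside the planted window
    // the target service fails about 50% of the time and runs 10x slower.
    const isError = rand() < (inAnomaly ? 0.5 : 0.01);
    const durationMs = (20 + rand() * 50) * (inAnomaly ? 10 : 1);
    spans.push({ service, timestampMs, durationMs, isError });
  }
  return spans;
}

// Per-service error rate, e.g. to confirm the planted anomaly is detectable.
function errorRateByService(spans: Span[]): Map<string, number> {
  const totals = new Map<string, { errors: number; total: number }>();
  for (const s of spans) {
    const t = totals.get(s.service) ?? { errors: 0, total: 0 };
    t.total += 1;
    if (s.isError) t.errors += 1;
    totals.set(s.service, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, service) => rates.set(service, t.errors / t.total));
  return rates;
}

// A deliberately simple programmatic check: does the answer name the planted service?
function answerNamesRootCause(answer: string, anomaly: PlantedAnomaly): boolean {
  return answer.toLowerCase().includes(anomaly.service.toLowerCase());
}

// Example: plant a 20-minute payments outage and confirm the data is reproducible.
const anomaly: PlantedAnomaly = { service: "payments", startMs: 20 * 60000, endMs: 40 * 60000 };
const runA = generateSpans(42, anomaly);
const runB = generateSpans(42, anomaly);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
console.log(errorRateByService(runA).get("payments"));
console.log(answerNamesRootCause("Root cause: the payments service started failing", anomaly)); // true
```

Because every random draw comes from the seeded generator, rerunning with the same seed reproduces the same spans, which is what would let different MCP servers be compared against identical data. A real grader would likely need more than a substring match; the check here only shows where a programmatic pass/fail could sit alongside an LLM judge.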

Patch Changes

  • 6a80031: Support multi-model comparison in eval batches via a comma-separated --model flag
  • 1a64796: Replace relative imports with path aliases
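A comma-separated --model value can be expanded into per-model runs along these lines. This is a hypothetical TypeScript sketch: parseModelFlag, planBatch, the PlannedRun shape and the model names are assumptions for illustration, not the package's actual CLI code. Only the five scenario names come from these notes.

```typescript
// Hypothetical sketch: expanding a comma-separated --model value into a batch plan.
// The helpers and the run-plan shape are illustrative, not the package's code.

const SCENARIOS = [
  "error-root-cause",
  "latency-spike",
  "noisy-signals",
  "segmented-regression",
  "service-health-check",
] as const;

type Scenario = (typeof SCENARIOS)[number];

interface PlannedRun {
  model: string;
  scenario: Scenario;
}

// Split on commas, trim whitespace, drop empty entries and duplicates (keeping order).
function parseModelFlag(value: string): string[] {
  const models = value
    .split(",")
    .map((m) => m.trim())
    .filter((m) => m.length > 0);
  return Array.from(new Set(models));
}

// One run per (model, scenario) pair so every model sees the same scenarios.
function planBatch(models: string[], scenarios: readonly Scenario[] = SCENARIOS): PlannedRun[] {
  const runs: PlannedRun[] = [];
  for (const model of models) {
    for (const scenario of scenarios) {
      runs.push({ model, scenario });
    }
  }
  return runs;
}

const runs = planBatch(parseModelFlag("model-a, model-b,,model-a"));
console.log(runs.length); // 10
```

Trimming and deduplicating means input like "model-a, model-b,,model-a" yields two models and therefore ten runs; crossing every model with every scenario keeps the comparison like-for-like.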