Agent Eval is a skill for evaluating agentic AI pipeline systems at both the component level and end-to-end level. It helps you define what to measure, build or sample eval cases, run repeatable tests, track regressions over time, and turn results into grounded takeaways about what improved, what regressed, and what to change next.
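The loop described above (define cases, run them repeatably, compute a pass rate) can be sketched in plain Python. This is a hypothetical illustration of the general pattern, not the agent-eval skill's actual interface; `EvalCase`, `fake_agent`, and `run_eval` are invented names for the sketch.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    """One eval case: an input and the behavior we expect to see."""
    prompt: str
    expected_substring: str

def fake_agent(prompt: str) -> str:
    # Stand-in for the pipeline component under test.
    return prompt.upper()

def run_eval(cases: list[EvalCase]) -> float:
    """Run every case against the agent and return the pass rate (0.0-1.0)."""
    passed = sum(
        1 for case in cases
        if case.expected_substring in fake_agent(case.prompt)
    )
    return passed / len(cases)

cases = [
    EvalCase("hello agent", "HELLO"),
    EvalCase("run tests", "TESTS"),
]
print(run_eval(cases))  # → 1.0
```

Tracking this pass rate across runs is what makes regressions visible: a drop after a pipeline change points at exactly which cases broke.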
Manual

```shell
git clone https://github.com/fsilavong/agent-eval.git ~/.claude/skills/agent-eval
```

Install with the Vercel Skills CLI:

```shell
npx skills add fsilavong/agent-eval
```