ACP-Evals v1.0.0 - Built for BeeAI

@jbarnes850 released this 22 Jun 15:47 · 7 commits to main since this release

ACP-Evals v1.0.0

Production-grade evaluation framework for ACP agents

What's New

This is the initial release of ACP-Evals, built specifically for the BeeAI community. The framework provides three core evaluators for testing AI agents, with assessment performed by an LLM judge.

Core Features

  • AccuracyEval: LLM-powered semantic evaluation of response quality
  • PerformanceEval: Latency and resource efficiency tracking
  • ReliabilityEval: Consistency and tool usage validation
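
To make the latency and consistency ideas behind these evaluators concrete, here is a minimal standard-library sketch. It does not use the acp-evals API: `measure_latency`, `consistency_rate`, and `toy_agent` are hypothetical names chosen for illustration, and the real evaluators may measure these properties differently.

```python
import statistics
import time
from collections import Counter
from typing import Callable, Dict, List


def measure_latency(agent: Callable[[str], str], prompt: str, runs: int = 5) -> Dict[str, float]:
    """Call the agent `runs` times and summarize wall-clock latency in seconds."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        agent(prompt)
        samples.append(time.perf_counter() - start)
    return {"mean_s": statistics.mean(samples), "max_s": max(samples), "runs": runs}


def consistency_rate(agent: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Return the fraction of runs that produced the most common response."""
    responses = [agent(prompt) for _ in range(runs)]
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / runs


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for an agent: a deterministic plain Python function."""
    return prompt.strip().upper()
```

A deterministic function such as `toy_agent` yields a consistency rate of 1.0. A real LLM-backed agent would usually need a semantic comparison rather than exact string matching, which is the gap an LLM judge is meant to fill.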

Key Capabilities

  • Complete transparency with full LLM judge reasoning
  • Professional CLI interface with rich terminal output
  • No text truncation - see complete agent responses
  • Support for ACP agents, Python functions, and BeeAI integration
  • Multiple evaluation rubrics (factual, research_quality, code_quality)
  • CI/CD ready with JSON export and standard exit codes
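
As a rough illustration of how a JSON export and exit codes could gate a CI job, the sketch below turns a results report into a process exit status. The report layout (a top-level `results` list whose entries carry `passed` and `score`) and the 0.7 threshold are assumptions made for this example, not the documented acp-evals export schema. Check the actual export before relying on any field name.

```python
import json
import sys
from typing import Any, Dict


def exit_code_from_report(report: Dict[str, Any], min_score: float = 0.7) -> int:
    """Map a (hypothetical) evaluation report to a conventional exit status.

    0 = every result passed, 1 = at least one failure, 2 = nothing was evaluated.
    """
    results = report.get("results", [])
    if not results:
        # An empty report is treated as an error so CI does not pass silently.
        return 2
    failed = [
        r for r in results
        if not r.get("passed", False) or r.get("score", 0.0) < min_score
    ]
    return 1 if failed else 0


def gate(path: str) -> None:
    """Load a JSON report from `path` and exit the process with the mapped status."""
    with open(path, encoding="utf-8") as fh:
        sys.exit(exit_code_from_report(json.load(fh)))
```

In a pipeline step, calling `gate("results.json")` (the file name is also hypothetical) would fail the job whenever any evaluation falls below the threshold.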

Getting Started

pip install acp-evals
acp-evals check

Documentation

Note on History

This release represents a fresh start for the project with a clean, focused codebase. Previous development history is preserved in the archive/pre-v1 branch.

Built with ❤️ for the BeeAI community.

Full Changelog: https://github.com/jbarnes850/acp-evals/commits/v1.0.0