ACP-Evals v1.0.0
Production-grade evaluation framework for ACP agents
What's New
This is the initial release of ACP-Evals, built for the BeeAI community. The framework provides three core evaluators that test AI agents using LLM-powered assessment.
Core Features
- AccuracyEval: LLM-powered semantic evaluation of response quality
- PerformanceEval: Latency and resource efficiency tracking
- ReliabilityEval: Consistency and tool usage validation
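To make the performance and reliability ideas above concrete, here is a minimal, library-independent sketch. It does not use the ACP-Evals API: the helper names `measure_latency`, `is_consistent`, and `echo_agent` are hypothetical, and the agent is assumed to be a plain Python callable that takes a prompt and returns a string.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(agent: Callable[[str], str], prompt: str, runs: int = 5) -> Dict[str, float]:
    """Call `agent` several times and summarize wall-clock latency in seconds."""
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        agent(prompt)
        timings.append(time.perf_counter() - start)
    return {
        "runs": float(runs),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }


def is_consistent(agent: Callable[[str], str], prompt: str, runs: int = 3) -> bool:
    """Return True if every run produces exactly the same response."""
    responses = {agent(prompt) for _ in range(runs)}
    return len(responses) == 1


def echo_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent; deterministic by construction."""
    return prompt.upper()


stats = measure_latency(echo_agent, "hello", runs=3)
consistent = is_consistent(echo_agent, "hello")
```

Exact string equality is a deliberately strict notion of consistency. Responses from a real LLM-backed agent usually vary in wording, which is why a semantic comparison is generally the better fit there.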
Key Capabilities
- Complete transparency with full LLM judge reasoning
- Professional CLI with rich terminal output
- No text truncation: complete agent responses are always shown
- Support for ACP agents, Python functions, and BeeAI integration
- Multiple evaluation rubrics (factual, research_quality, code_quality)
- CI/CD ready with JSON export and standard exit codes
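As an illustration of how a JSON export and exit codes can gate a CI pipeline, here is a rough sketch. The report schema (a `results` list whose entries carry `name` and `score`), the 0.8 threshold, and the function name `exit_code_from_report` are all assumptions made for this example; they are not taken from the ACP-Evals documentation.

```python
import json
from typing import Any, Dict


def exit_code_from_report(report: Dict[str, Any], min_score: float = 0.8) -> int:
    """Map an evaluation report to a conventional process exit code.

    Assumed (hypothetical) schema: {"results": [{"name": str, "score": float}, ...]}
    Returns 0 if every result meets the threshold, 1 if any falls below it,
    and 2 if the report contains no results at all.
    """
    results = report.get("results", [])
    if not results:
        return 2
    failed = [r for r in results if r.get("score", 0.0) < min_score]
    return 1 if failed else 0


# Example reports, parsed from JSON text as a CI step would after an export.
passing = json.loads('{"results": [{"name": "factual", "score": 0.93}]}')
failing = json.loads(
    '{"results": [{"name": "factual", "score": 0.93},'
    ' {"name": "code_quality", "score": 0.41}]}'
)

passing_code = exit_code_from_report(passing)
failing_code = exit_code_from_report(failing)
```

A CI job would pass the returned value to `sys.exit`, so that any non-zero code fails the pipeline step.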
Getting Started
pip install acp-evals
acp-evals check
Documentation
Note on History
This release represents a fresh start for the project with a clean, focused codebase. Previous development history is preserved in the archive/pre-v1 branch.
Built with ❤️ for the BeeAI community.
Full Changelog: https://github.com/jbarnes850/acp-evals/commits/v1.0.0