AgentClaimGuard v0.4.3
This is an evaluation hardening release.
It adds a small deterministic evaluation suite that shows which unsupported structured claim patterns AgentClaimGuard blocks, and which valid structured patterns pass, under its evidence, tool-result, and policy contracts.
Added
- Added examples/evaluation/cases.jsonl with deterministic evaluation cases for:
  - numeric claim missing calculator result
  - numeric claim with calculator result
  - claim missing required evidence
  - claim with invalid evidence_refs
  - compliance claim missing standard/regulation evidence
  - explicit conflict metadata
  - RAG answer missing required citation
- Added examples/evaluation/run_eval.py.
- Added docs/evaluation.md covering scope, non-goals, run instructions, and case format.
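
The case format itself is described in docs/evaluation.md. Purely as an illustration of what one-case-per-line JSONL looks like, the sketch below parses two made-up records. The field names (id, expected) and the sample values are hypothetical and are not taken from examples/evaluation/cases.jsonl.

```python
import json

# Hypothetical case records. The field names are illustrative assumptions,
# not the schema used by examples/evaluation/cases.jsonl.
SAMPLE_JSONL = """\
{"id": "numeric-missing-calculator", "expected": "blocked"}
{"id": "numeric-with-calculator", "expected": "passed"}
"""


def load_cases(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


cases = load_cases(SAMPLE_JSONL)
for case in cases:
    print(case["id"], "->", case["expected"])
```

Keeping one self-describing record per line makes the suite easy to diff and keeps each case independent of the others, which fits a deterministic evaluation.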
Changed
- Added a concise README Evaluation section linking to the eval runner and docs.
- Bumped package and server versions to 0.4.3.
Validation
python -m compileall agentclaimguard
python -m pytest -q
python examples/evaluation/run_eval.py
Expected result:
33 passed
total_cases=7
passed_cases=7
failed_cases=0
blocked_count=6
passed_count=1
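
One way to read these counters, assuming that passed_cases and failed_cases count cases whose observed decision matched or missed the expected one, while blocked_count and passed_count count the guard decisions themselves, is sketched below. The summarize helper and the result tuples are hypothetical and do not reproduce run_eval.py.

```python
from collections import Counter

# Hypothetical (expected, observed) decisions for seven cases.
# Illustrative only; not the data or logic of run_eval.py.
results = [("blocked", "blocked")] * 6 + [("passed", "passed")]


def summarize(results):
    """Aggregate per-case outcomes into summary counters."""
    # Count how often each decision was observed.
    decisions = Counter(observed for _, observed in results)
    # A case succeeds when the observed decision equals the expected one.
    matched = sum(1 for expected, observed in results if expected == observed)
    return {
        "total_cases": len(results),
        "passed_cases": matched,
        "failed_cases": len(results) - matched,
        "blocked_count": decisions["blocked"],
        "passed_count": decisions["passed"],
    }


summary = summarize(results)
for key, value in summary.items():
    print(f"{key}={value}")
```

Under that reading, all seven cases succeed even though six of them are blocked, because a blocked decision is the expected outcome for an unsupported claim.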
Scope
This release does not introduce mandatory LLM dependencies, LLM/NLI semantic support checking, public API breaks, a package rename, factuality-verifier claims, or a new framework adapter.