AgentClaimGuard v0.4.3

This is an evaluation hardening release.

It adds a small deterministic evaluation suite to show which unsupported structured claim patterns AgentClaimGuard blocks and which valid structured patterns pass under evidence, tool-result, and policy contracts.

Added

- Added examples/evaluation/cases.jsonl with deterministic evaluation cases for:
  - numeric claim missing calculator result
  - numeric claim with calculator result
  - claim missing required evidence
  - claim with invalid evidence_refs
  - compliance claim missing standard/regulation evidence
  - explicit conflict metadata
  - RAG answer missing required citation
- Added examples/evaluation/run_eval.py.
- Added docs/evaluation.md covering scope, non-goals, run instructions, and case format.
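
For orientation, a deterministic case file of this kind could be read as sketched below. This is a minimal illustration, not the project's actual schema: the field names `id`, `claim`, `tool_results`, and `expected` and the `load_cases` helper are assumptions, and docs/evaluation.md remains the authority on the real case format.

```python
import json

# Hypothetical case records. The field names are illustrative assumptions,
# not the schema used by examples/evaluation/cases.jsonl.
SAMPLE_JSONL = """\
{"id": "numeric-missing-calculator", "claim": {"type": "numeric", "value": 42}, "tool_results": [], "expected": "blocked"}
{"id": "numeric-with-calculator", "claim": {"type": "numeric", "value": 42}, "tool_results": [{"tool": "calculator", "value": 42}], "expected": "passed"}
"""


def load_cases(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


cases = load_cases(SAMPLE_JSONL)
for case in cases:
    print(case["id"], "->", case["expected"])
```

Keeping one self-contained JSON object per line means a case can be added or removed without touching the others, which suits a small deterministic suite.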

Changed

- Added a concise README Evaluation section linking to the eval runner and docs.
- Bumped package and server versions to 0.4.3.

Validation

python -m compileall agentclaimguard
python -m pytest -q
python examples/evaluation/run_eval.py

Expected result (pytest summary first, then the evaluation runner summary):

33 passed
total_cases=7
passed_cases=7
failed_cases=0
blocked_count=6
passed_count=1
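
One plausible reading of these counters, offered as an assumption rather than a description of run_eval.py: `passed_cases` and `failed_cases` count cases whose observed decision matched or missed the expected one, while `blocked_count` and `passed_count` count the decisions themselves. The `summarize` helper and the (expected, actual) pair layout below are hypothetical.

```python
from collections import Counter


def summarize(results):
    """Summarize (expected, actual) decision pairs, each "blocked" or "passed"."""
    decisions = Counter(actual for _, actual in results)
    matched = sum(1 for expected, actual in results if expected == actual)
    return {
        "total_cases": len(results),
        "passed_cases": matched,
        "failed_cases": len(results) - matched,
        "blocked_count": decisions["blocked"],
        "passed_count": decisions["passed"],
    }


# Six cases expected and observed as blocked, one expected and observed as passed.
results = [("blocked", "blocked")] * 6 + [("passed", "passed")]
for key, value in summarize(results).items():
    print(f"{key}={value}")
```

Under this reading, a blocked claim counts as a passing case whenever blocking was the expected outcome, which is why `passed_cases=7` can sit alongside `blocked_count=6`.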

Scope

This release does not add mandatory LLM dependencies, LLM/NLI semantic support checking, factuality-verifier claims, or a new framework adapter. It also does not break the public API or rename the package.
