AgentClaimGuard v0.4.3 - Evaluation hardening

@konoeph konoeph released this 09 Jun 08:03

AgentClaimGuard v0.4.3

This is an evaluation hardening release.

It adds a small deterministic evaluation suite that shows which unsupported structured claim patterns AgentClaimGuard blocks, and which valid structured patterns pass, under its evidence, tool-result, and policy contracts.

Added

  • Added examples/evaluation/cases.jsonl with deterministic evaluation cases for:
    • numeric claim missing calculator result
    • numeric claim with calculator result
    • claim missing required evidence
    • claim with invalid evidence_refs
    • compliance claim missing standard/regulation evidence
    • explicit conflict metadata
    • RAG answer missing required citation
  • Added examples/evaluation/run_eval.py.
  • Added docs/evaluation.md covering scope, non-goals, run instructions, and case format.
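To illustrate what a deterministic evaluation over JSONL cases can look like, here is a minimal sketch. It is not the code in examples/evaluation/run_eval.py, and the case schema (`id`, `claim`, `tool_results`, `expected`) and the `decide` function are hypothetical stand-ins; the real format is described in docs/evaluation.md. The sketch covers only the two numeric-claim cases and reuses the summary keys listed under Validation.

```python
import json

# Hypothetical case records; the real schema in cases.jsonl may differ.
CASES_JSONL = """\
{"id": "numeric_missing_calc", "claim": {"type": "numeric", "value": 42}, "tool_results": [], "expected": "block"}
{"id": "numeric_with_calc", "claim": {"type": "numeric", "value": 42}, "tool_results": [{"tool": "calculator", "value": 42}], "expected": "pass"}
"""


def decide(case):
    """Toy contract (assumption): a numeric claim passes only when a
    calculator tool result with the same value is attached."""
    claim = case["claim"]
    if claim["type"] == "numeric":
        for result in case["tool_results"]:
            if result["tool"] == "calculator" and result["value"] == claim["value"]:
                return "pass"
        return "block"
    return "pass"


def run_eval(jsonl_text):
    """Evaluate every case and compare each decision with its expected outcome."""
    cases = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    decisions = [decide(case) for case in cases]
    matched = sum(d == c["expected"] for d, c in zip(decisions, cases))
    return {
        "total_cases": len(cases),
        "passed_cases": matched,                # decision matched expectation
        "failed_cases": len(cases) - matched,   # decision differed from expectation
        "blocked_count": decisions.count("block"),
        "passed_count": decisions.count("pass"),
    }


if __name__ == "__main__":
    for key, value in run_eval(CASES_JSONL).items():
        print(f"{key}={value}")
```

Because the decision logic has no randomness and no model calls, running it twice on the same cases yields the same summary, which is the property a deterministic suite depends on.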

Changed

  • Added a concise README Evaluation section linking to the eval runner and docs.
  • Bumped package and server versions to 0.4.3.

Validation

python -m compileall agentclaimguard
python -m pytest -q
python examples/evaluation/run_eval.py

Expected result (pytest summary, followed by the eval runner's output):

33 passed
total_cases=7
passed_cases=7
failed_cases=0
blocked_count=6
passed_count=1
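The key=value lines above are easy to check mechanically, for example in CI. The sketch below parses that output and verifies two relationships. The first, passed_cases + failed_cases = total_cases, follows from the names. The second, blocked_count + passed_count = total_cases, is an assumption inferred from the numbers shown (6 + 1 = 7), not a documented guarantee. The helper names `parse_summary` and `check_summary` are hypothetical.

```python
def parse_summary(text):
    """Collect integer key=value pairs, ignoring other lines such as '33 passed'."""
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            summary[key.strip()] = int(value)
    return summary


def check_summary(summary):
    """Return a list of problems; an empty list means the summary looks consistent."""
    problems = []
    if summary["passed_cases"] + summary["failed_cases"] != summary["total_cases"]:
        problems.append("passed_cases + failed_cases != total_cases")
    # Assumption: every case is counted as either blocked or passed.
    if summary["blocked_count"] + summary["passed_count"] != summary["total_cases"]:
        problems.append("blocked_count + passed_count != total_cases")
    if summary["failed_cases"] != 0:
        problems.append("some cases did not match their expected outcome")
    return problems


EXPECTED_OUTPUT = """\
33 passed
total_cases=7
passed_cases=7
failed_cases=0
blocked_count=6
passed_count=1
"""

if __name__ == "__main__":
    problems = check_summary(parse_summary(EXPECTED_OUTPUT))
    print("ok" if not problems else "\n".join(problems))
```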

Scope

This release does not introduce mandatory LLM dependencies, LLM/NLI semantic support checking, breaking public API changes, a package rename, factuality-verifier claims, or a new framework adapter.