Commit dea5600
feat(emergent): add EmergentJudge — LLM-as-judge for forged tool evaluation
Three evaluation modes scaled to risk level:
- reviewCreation(): full code audit + test validation via single LLM call
- validateReuse(): pure programmatic schema conformance (zero LLM calls)
- reviewPromotion(): dual-judge panel (safety + correctness), both must approve
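The two cheaper-to-illustrate modes above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `Schema` and `Verdict` shapes, the flat primitive-only schema, and the `combinePanel` helper are all hypothetical, and the real `validateReuse()` and `reviewPromotion()` signatures are not shown in the commit message. The LLM calls themselves are left out; only the programmatic check and the "both must approve" rule are shown.

```typescript
// Hypothetical shapes for illustration; the actual types in the commit are not shown.
type FieldType = "string" | "number" | "boolean";
type Schema = Record<string, FieldType>;

interface Verdict {
  approved: boolean;
  reason: string;
}

// Reuse check with no LLM call: every field declared in the schema must be
// present in the input and have the declared primitive type.
function validateReuse(schema: Schema, input: Record<string, unknown>): Verdict {
  for (const [key, expected] of Object.entries(schema)) {
    if (!(key in input)) {
      return { approved: false, reason: `missing field: ${key}` };
    }
    if (typeof input[key] !== expected) {
      return { approved: false, reason: `field ${key} is not a ${expected}` };
    }
  }
  return { approved: true, reason: "input conforms to schema" };
}

// Promotion rule: the safety and correctness verdicts must both approve.
// Rejection reasons from whichever judges declined are joined together.
function combinePanel(safety: Verdict, correctness: Verdict): Verdict {
  if (safety.approved && correctness.approved) {
    return { approved: true, reason: "both judges approved" };
  }
  const reasons = [safety, correctness]
    .filter((v) => !v.approved)
    .map((v) => v.reason)
    .join("; ");
  return { approved: false, reason: reasons };
}
```

In this sketch a single dissenting judge is enough to block promotion, which matches the "both must approve" wording above; how the real panel reports its reasons is an assumption.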
Includes 26 passing tests covering all approval, rejection, malformed JSON,
schema validation, and multi-judge scenarios.
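For the malformed JSON scenario, one plausible approach is to parse the judge's reply defensively and treat anything unparseable as a rejection. The sketch below assumes that fail-closed policy and a reply of the form `{"approved": boolean, "reason": string}`; both the `parseJudgeReply` name and the reply shape are hypothetical and are not taken from the commit.

```typescript
// Hypothetical verdict shape; the judge's real reply format is not shown in the commit.
interface JudgeVerdict {
  approved: boolean;
  reason: string;
}

// Parse a raw LLM reply. Any reply that is not valid JSON, or that lacks a
// boolean `approved` field, is treated as a rejection (fail closed).
function parseJudgeReply(raw: string): JudgeVerdict {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { approved: false, reason: "judge reply was not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { approved: false, reason: "judge reply was not a JSON object" };
  }
  const candidate = parsed as { approved?: unknown; reason?: unknown };
  if (typeof candidate.approved !== "boolean") {
    return { approved: false, reason: "judge reply missing boolean 'approved'" };
  }
  return {
    approved: candidate.approved,
    reason: typeof candidate.reason === "string" ? candidate.reason : "",
  };
}
```

Failing closed means a garbled reply can never be mistaken for an approval, at the cost of occasionally rejecting a tool the judge meant to accept.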
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3271b13
3 files changed
Lines changed: 1142 additions & 0 deletions