
feat(core): GovernanceCallbackHandler for tool execution authorization#35529

Open
Devon Generally (devongenerally-png) wants to merge 2 commits into langchain-ai:master from devongenerally-png:governance-callback-handler

@devongenerally-png

Summary

Adds GovernanceCallbackHandler — a callback handler that enforces deterministic governance policies on tool calls using the existing on_tool_start / raise_error mechanism. No core changes required.

What it does

Implements a three-phase authorization pipeline for tool calls:

  • PROPOSE: Converts each on_tool_start invocation into a structured intent object with a SHA-256 content hash
  • DECIDE: Evaluates the intent against user-defined policy rules — pure function, no LLM involvement, no interpretation ambiguity
  • PROMOTE: Allows approved calls to proceed normally; raises ToolExecutionDenied for denied calls (propagated via raise_error=True)
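The PROPOSE and PROMOTE phases can be sketched in plain Python as below. This is a minimal, stdlib-only illustration and not the handler's actual code. The `ToolIntent` dataclass, its fields, and the `propose`/`promote` function names are hypothetical. Hashing a canonical JSON encoding of the tool name and input is an assumption about what the "content hash" covers. Only the `ToolExecutionDenied` name comes from this PR.

```python
import hashlib
import json
from dataclasses import dataclass


class ToolExecutionDenied(Exception):
    """Raised when a tool call is denied (exception name taken from the PR)."""


@dataclass(frozen=True)
class ToolIntent:
    # Hypothetical intent shape: the tool name, its raw input, and a hash.
    tool: str
    tool_input: str
    content_hash: str


def propose(tool: str, tool_input: str) -> ToolIntent:
    """PROPOSE: turn a tool call into an intent with a SHA-256 content hash."""
    # Canonical JSON (sorted keys, compact separators) so identical calls
    # always hash to the same value.
    payload = json.dumps(
        {"input": tool_input, "tool": tool}, sort_keys=True, separators=(",", ":")
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return ToolIntent(tool=tool, tool_input=tool_input, content_hash=digest)


def promote(intent: ToolIntent, verdict: str) -> None:
    """PROMOTE: return normally for "approve", raise for any other verdict."""
    if verdict != "approve":
        raise ToolExecutionDenied(
            f"{intent.tool!r} denied (intent {intent.content_hash[:12]})"
        )
```

Fixing the key order and separators keeps the hash stable across calls, which is what makes it usable as an identifier for the same intent in later audit entries. Treating every verdict other than `"approve"` as a denial keeps PROMOTE fail-closed.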

Policy format

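from langchain_core.callbacks.governance import GovernanceCallbackHandler

policy = {
    "default": "deny",  # fail-closed: tools not named in any rule are denied
    "rules": [
        {"tools": ["search", "wikipedia"], "verdict": "approve"},
        {"tools": ["shell"], "verdict": "deny"},
        {
            # approved unless the input matches a blocked pattern
            "tools": ["python_repl"],
            "verdict": "approve",
            "constraints": {"blocked_patterns": ["os.system", "subprocess"]},
        },
    ],
}
# witness_path enables the optional hash-chained audit log
handler = GovernanceCallbackHandler(policy=policy, witness_path="witness.jsonl")
# `agent` and `inputs` stand for any existing runnable and its input
agent.invoke(inputs, config={"callbacks": [handler]})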
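A minimal sketch of the DECIDE step for a policy shaped like the one above follows. The `decide` function and `POLICY` constant are hypothetical, and two semantics are assumptions because the description does not pin them down: the first rule that names the tool wins, and `blocked_patterns` entries are matched as plain substrings of the tool input.

```python
from typing import Any

# The example policy from the description above.
POLICY: dict[str, Any] = {
    "default": "deny",
    "rules": [
        {"tools": ["search", "wikipedia"], "verdict": "approve"},
        {"tools": ["shell"], "verdict": "deny"},
        {
            "tools": ["python_repl"],
            "verdict": "approve",
            "constraints": {"blocked_patterns": ["os.system", "subprocess"]},
        },
    ],
}


def decide(tool: str, tool_input: str, policy: dict[str, Any]) -> str:
    """DECIDE: a pure function from (tool call, policy) to a verdict string."""
    for rule in policy.get("rules", []):
        if tool not in rule.get("tools", []):
            continue
        # Assumption: the first rule that names the tool decides the outcome.
        verdict = rule.get("verdict", "deny")
        blocked = rule.get("constraints", {}).get("blocked_patterns", [])
        # Assumption: a blocked pattern is a plain substring of the input.
        if verdict == "approve" and any(p in tool_input for p in blocked):
            return "deny"
        return verdict
    # No rule names the tool: use the default, failing closed if it is absent.
    return policy.get("default", "deny")
```

Because `decide` reads only its arguments, it can be unit-tested exhaustively without a handler or an agent, which matches the "pure function" design decision.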

Witness logging

Optional hash-chained audit trail, enabled by the witness_path parameter. Each entry links to the previous one via a SHA-256 hash, so modifying an earlier entry breaks the chain and is detectable. A verify_witness_log() utility checks the chain independently of the handler.
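One way to build such a chain, sketched with an in-memory list rather than a JSONL file: each entry stores the previous entry's hash plus a SHA-256 over that hash and the canonical record. `GENESIS`, `append_entry`, and `verify_chain` are hypothetical names for illustration; they are not the PR's `verify_witness_log()` API.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # Assumption: placeholder "previous hash" for entry one.


def _link_hash(prev_hash: str, record: dict[str, Any]) -> str:
    # Bind each record to its predecessor by hashing both together.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict[str, Any]], record: dict[str, Any]) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev, "hash": _link_hash(prev, record)})


def verify_chain(chain: list[dict[str, Any]]) -> bool:
    """Recompute every link and report whether the chain is consistent."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != _link_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Writing each entry as one `json.dumps(entry)` line gives a JSONL layout like `witness.jsonl`. A chain like this catches an edited entry whose later hashes were not recomputed; someone able to rewrite the whole file could regenerate every hash, so the latest hash is usually also recorded somewhere outside the log.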

Changes

  • libs/core/langchain_core/callbacks/governance.py — handler implementation
  • libs/core/tests/unit_tests/callbacks/test_governance.py — 24 unit tests

Design decisions

  • raise_error = True by default — the handler must be able to block tool execution. This uses the existing handle_event exception propagation, requiring zero changes to the callback manager.
  • Fail-closed default — when no rule matches a tool, the default verdict is deny. Unknown tools require explicit policy approval.
  • Static methods for propose/decide — both phases are pure functions, independently testable without handler instantiation.
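The raise_error design decision relies on exceptions escaping the callback dispatcher only for handlers that opt in. The model below is a deliberately simplified, hypothetical dispatcher written to illustrate that behavior; it is not LangChain's `handle_event`, and the `Handler`, `DenyShellHandler`, `dispatch`, and `run_tool` names are made up for the example.

```python
from typing import Callable


class ToolExecutionDenied(Exception):
    """Denial signal raised by a governance-style handler."""


class Handler:
    # Same flag name LangChain callback handlers use; off by default.
    raise_error = False

    def on_tool_start(self, tool: str, tool_input: str) -> None:
        pass


class DenyShellHandler(Handler):
    # A governance handler opts in so its exception can stop the tool.
    raise_error = True

    def on_tool_start(self, tool: str, tool_input: str) -> None:
        if tool == "shell":
            raise ToolExecutionDenied(f"{tool!r} denied by policy")


def dispatch(handlers: list[Handler], tool: str, tool_input: str) -> None:
    """Simplified, hypothetical dispatcher; not LangChain's handle_event."""
    for handler in handlers:
        try:
            handler.on_tool_start(tool, tool_input)
        except Exception:
            if handler.raise_error:
                raise  # propagate to the caller, so the tool never runs
            # Otherwise swallow the error; a real dispatcher would log it.


def run_tool(
    handlers: list[Handler], tool: str, tool_input: str, fn: Callable[[str], str]
) -> str:
    # Callbacks fire before the tool body, so a propagated denial skips fn.
    dispatch(handlers, tool, tool_input)
    return fn(tool_input)
```

The same flag that lets the governance handler block a call also means a bug inside that handler stops the run, which is consistent with a fail-closed design.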

Test plan

  • 24 unit tests covering: intent hashing, policy evaluation, constraint matching, exception propagation, witness chain integrity, tamper detection
  • Verify no existing tests regress

For a more complete standalone implementation with YAML policy files and adversarial test coverage, see Governance-Guard.

This PR was developed with AI assistance.

Adds a callback handler that enforces deterministic governance policies
on tool calls via the PROPOSE/DECIDE/PROMOTE pattern:

- PROPOSE: Converts tool calls into structured intents with SHA-256 hash
- DECIDE: Evaluates intents against user-defined policy rules (no LLM)
- PROMOTE: Allows approved calls, raises ToolExecutionDenied for denied

Includes hash-chained witness logging for audit trails and 24 unit tests
covering policy evaluation, constraint checking, chain integrity, and
tamper detection.

Uses raise_error=True (default) so denied tools propagate as exceptions
via the existing handle_event mechanism — no core changes required.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the external, feature, and core labels and removed the feature and external labels on Mar 3, 2026
@codspeed-hq

codspeed-hq bot commented Mar 3, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 13 untouched benchmarks
⏩ 23 skipped benchmarks[1]


Comparing devongenerally-png:governance-callback-handler (cd78744) with master (cdf140e)


Footnotes

  1. 23 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

WitnessLog previously reset prev_hash to genesis on every init, breaking
the hash chain when appending to an existing log file. Now reads the
last entry's hash from the file on startup.

Adds test verifying chain integrity across handler restarts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
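The fix in this second commit amounts to recovering the chain tip on startup instead of restarting from genesis. A minimal sketch, assuming one JSON object per line that stores its own hash under a `"hash"` key and an all-zeros genesis value (both are assumptions, and `load_prev_hash` is a hypothetical name, not the PR's WitnessLog code):

```python
import json
import os

GENESIS = "0" * 64  # Assumption: the value a brand-new log starts from.


def load_prev_hash(path: str) -> str:
    """Return the hash of the last entry in a JSONL witness log, or GENESIS."""
    if not os.path.exists(path):
        return GENESIS
    last_line = ""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # ignore blank lines
                last_line = line
    if not last_line:
        return GENESIS
    # Assumption: each entry stores its own hash under the "hash" key.
    return json.loads(last_line)["hash"]
```

Scanning the whole file is linear in log size; a long-lived log could seek backwards from the end of the file instead, at the cost of more code.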

Labels

core `langchain-core` package issues & PRs
