Idea
Add a new policy condition type that warns or blocks when AI activity touches designated sensitive files (e.g. auth.rs, encryption.rs, signing.rs). The goal is to give teams a governance guardrail: "AI should not be modifying cryptographic or authentication code without explicit review."
Two possible approaches
Option A — Tool-call based (ForbiddenToolCall)
Evaluate against session tool events: if Write or Edit was called with a file_path matching a sensitive glob pattern, fail the policy.
{
  "type": "ForbiddenToolCall",
  "tool_names": ["Write", "Edit"],
  "when_files_match": ["**/encryption.rs", "**/auth.rs", "**/signing.rs"]
}
Pros: Fast to build; the data is already captured in events.jsonl via tool_input.file_path.
Cons — significant:
- False positives: agent wrote to the file during the session but the developer reverted it before committing. The push is blocked even though no AI code landed.
- False negatives: agent modified the file via Bash (freeform shell command), which is not captured as a structured file_path.
- Evaluates intent to modify, not what was committed. Framing matters: this can be presented as "agent attempted to touch sensitive files", but it is not a reliable gate on committed code.
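A minimal sketch of how Option A might be evaluated, to make the trade-offs above concrete. Everything here is an assumption for illustration: the `ToolEvent` and `ForbiddenToolCall` structs, the `violations` function, and the simplified `matches_pattern` helper (which only understands `**/<suffix>` patterns and exact paths) are hypothetical and are not taken from tracevault-core. A real implementation would likely use a proper glob library.

```rust
/// One tool event from a session. Hypothetical shape, loosely modelled on
/// the `tool_input.file_path` field mentioned for events.jsonl.
struct ToolEvent {
    tool_name: String,
    /// None for tools such as Bash that carry no structured path.
    file_path: Option<String>,
}

/// Hypothetical in-memory form of the ForbiddenToolCall condition.
struct ForbiddenToolCall {
    tool_names: Vec<String>,
    when_files_match: Vec<String>,
}

/// Simplified matcher (assumption): handles only `**/<suffix>` patterns
/// and exact path matches. Not a full glob implementation.
fn matches_pattern(pattern: &str, path: &str) -> bool {
    match pattern.strip_prefix("**/") {
        Some(suffix) => path == suffix || path.ends_with(&format!("/{suffix}")),
        None => path == pattern,
    }
}

/// Returns the file paths that violate the condition: a listed tool was
/// called with a structured file_path matching a sensitive pattern.
fn violations<'a>(cond: &ForbiddenToolCall, events: &'a [ToolEvent]) -> Vec<&'a str> {
    events
        .iter()
        .filter(|e| cond.tool_names.iter().any(|t| t == &e.tool_name))
        .filter_map(|e| e.file_path.as_deref())
        .filter(|p| cond.when_files_match.iter().any(|pat| matches_pattern(pat, p)))
        .collect()
}
```

Note that a Bash event has no `file_path` in this model, so it is skipped entirely. That is the false-negative case described above.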
Option B — Attribution-based (SensitivePath on committed diff)
Evaluate against the commit diff + attribution data: if committed lines in the push are AI-attributed and touch a file matching the pattern, fail the policy.
{
  "type": "SensitivePath",
  "patterns": ["**/encryption.rs", "**/auth.rs"],
  "action": "warn"
}
Pros: Correct answer — only fires when AI-written code actually landed in the commit.
Cons:
- Requires the attribution pipeline to have run (server-side clone + tree-sitter line attribution). Not always available.
- More complex evaluation path — needs to join commit diff with attribution data rather than just inspecting session tool calls.
- Attribution confidence scores add ambiguity: what threshold counts as "AI-written"?
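To illustrate the join between the commit diff and attribution data, here is a rough sketch of an Option B evaluator. All names are assumptions: `AttributedLine`, `SensitivePath`, `Action`, `evaluate`, and in particular the `min_confidence` field, which does not appear in the JSON example above and stands in for the still-undecided threshold. The `matches_pattern` helper is a simplification that only handles `**/<suffix>` patterns.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Warn,
    Block,
}

/// Hypothetical in-memory form of the SensitivePath condition.
struct SensitivePath {
    patterns: Vec<String>,
    action: Action,
    /// Assumed knob: minimum attribution confidence to count as AI-written.
    min_confidence: f64,
}

/// One added or changed line in the committed diff, joined with its
/// attribution result. Hypothetical shape.
struct AttributedLine {
    file: String,
    ai_confidence: f64,
}

/// Simplified matcher (assumption): `**/<suffix>` patterns and exact paths only.
fn matches_pattern(pattern: &str, path: &str) -> bool {
    match pattern.strip_prefix("**/") {
        Some(suffix) => path == suffix || path.ends_with(&format!("/{suffix}")),
        None => path == pattern,
    }
}

/// Returns the configured action plus the sensitive files that contain at
/// least one AI-attributed committed line, or None if nothing matched.
fn evaluate<'a>(
    cond: &SensitivePath,
    lines: &'a [AttributedLine],
) -> Option<(Action, Vec<&'a str>)> {
    let mut files: Vec<&str> = lines
        .iter()
        .filter(|l| l.ai_confidence >= cond.min_confidence)
        .map(|l| l.file.as_str())
        .filter(|f| cond.patterns.iter().any(|p| matches_pattern(p, f)))
        .collect();
    files.sort_unstable();
    files.dedup();
    if files.is_empty() {
        None
    } else {
        Some((cond.action, files))
    }
}
```

In this sketch, lines whose confidence falls below the threshold are treated like human-written lines and never trigger the condition, so the choice of `min_confidence` directly decides how ambiguous attributions are handled.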
Recommendation
Option B is the right long-term answer. Option A could be shipped as a stepping stone with clear UI copy that sets expectations ("flags sessions where the agent attempted to modify these files").
Before building either, worth deciding:
- Should this block push or warn only? (Blocking with false positives from Option A would be very disruptive.)
- What's the attribution confidence threshold for Option B?
- Should Bash calls be scanned for path patterns in their input text? (Partial mitigation for Option A false negatives, but brittle.)
- Human-written changes to sensitive files should never trigger this — how do we ensure that? (Option B handles it naturally; Option A does not.)
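For the Bash question above, one possible heuristic is to tokenize the command text and test each token against the sensitive patterns. The sketch below is only an assumption about what such a scan could look like: `bash_mentions_sensitive_path` and `matches_pattern` are hypothetical names, and the tokenizer is deliberately naive (it splits on whitespace and a few shell metacharacters and strips surrounding quotes).

```rust
/// Simplified matcher (assumption): `**/<suffix>` patterns and exact paths only.
fn matches_pattern(pattern: &str, path: &str) -> bool {
    match pattern.strip_prefix("**/") {
        Some(suffix) => path == suffix || path.ends_with(&format!("/{suffix}")),
        None => path == pattern,
    }
}

/// Returns true if any token of the shell command looks like a sensitive path.
/// This is a textual heuristic; it does not parse or execute the shell command.
fn bash_mentions_sensitive_path(command: &str, patterns: &[&str]) -> bool {
    command
        .split(|c: char| {
            c.is_whitespace() || matches!(c, ';' | '|' | '&' | '>' | '<' | '=' | '(' | ')')
        })
        .map(|tok| tok.trim_matches(|c: char| c == '"' || c == '\''))
        .filter(|tok| !tok.is_empty())
        .any(|tok| patterns.iter().any(|p| matches_pattern(p, tok)))
}
```

The brittleness shows up quickly. A command such as `cat src/auth.rs` is flagged even though it only reads the file, while a path assembled from a shell variable (for example `src/$F.rs`) is missed because the literal file name never appears in the command text.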
Related
- Existing ConditionalToolCall condition: requires that a tool was called on matching files (the opposite direction)
- Attribution engine in tracevault-core/src/diff.rs and policy_eval.rs