Summary
Implement a structured audit trail that captures every consequential agent decision (risk classification, review findings, approval/rejection, model used, thinking budget, cost) as machine-readable JSONL records with full provenance chains. This goes beyond the token-metrics JSONL (which tracks cost) to capture decision reasoning, enabling post-incident investigation ("why was this PR auto-approved?"), compliance reporting, and pattern analysis across the agent fleet.
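The proposal does not specify what a "provenance chain" looks like. One possible reading, shown here as an assumption rather than the planned design, is to hash-link each PR's decision records so the triage, deep-review, and audit entries cannot be silently edited or reordered after the fact. The sketch is in Python for readability. record_digest, append_linked, verify_chain, the prev_hash field, and the all-zero GENESIS value are all hypothetical names.

```python
import hashlib
import json

# Hypothetical sentinel used as prev_hash for the first record in a chain.
GENESIS = "0" * 64


def record_digest(record):
    """SHA-256 over a canonical JSON encoding (sorted keys, no spaces)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_linked(chain, record):
    """Append a copy of record that points at the digest of the previous one.

    prev_hash is a hypothetical field; it is not part of the proposed schema.
    """
    prev = record_digest(chain[-1]) if chain else GENESIS
    linked = dict(record, prev_hash=prev)
    chain.append(linked)
    return linked


def verify_chain(chain):
    """Return True if every record's prev_hash matches its predecessor's digest."""
    expected = GENESIS
    for rec in chain:
        if rec.get("prev_hash") != expected:
            return False
        expected = record_digest(rec)
    return True
```

Editing any earlier record changes its digest, so the next record's prev_hash no longer matches and verify_chain returns False.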
Market Signal
Agent observability is shifting from cost tracking to decision provenance. Braintrust's continuous evaluation platform (2026) scores agent traces with decision metadata. OpenTelemetry GenAI conventions now include agent spans that capture reasoning chains. The six-layer agent testing guide (Atlan, 2026) identifies "audit trail completeness" as a distinct testing layer. As AI-authored code comes under regulatory scrutiny, decision provenance in regulated industries is becoming a compliance requirement, not a nice-to-have.
User Signal
Issue #617 ("Agent-authored infra PRs are hard to merge — approval-identity + bot-review friction") shows that agent decisions face trust challenges. When a PR is auto-approved and later causes an incident, the investigation has to answer which model reviewed it, what risk level was assigned, what the deep review found, and why the audit tier passed it. Today this information is scattered across GitHub Actions run logs, which are kept for 90 days and are not queryable. The Token Cost Observatory shows the team already values structured logging; this proposal extends it from cost to decisions.
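Once decisions are logged as JSONL, "why was this PR auto-approved?" becomes a filter over one file instead of a search through run logs. The Python sketch below assumes each record carries pr_number, tier, model, classification, and action_taken (taken from the field list proposed under Technical Opportunity) plus a hypothetical record_type discriminator, and that lines are appended in the order the tiers ran. decision_timeline and explain are illustrative names.

```python
import json


def decision_timeline(jsonl_lines, pr_number):
    """Return the decision records for one PR, in the order they were logged.

    Assumes non-decision lines (e.g. token records) carry a different
    record_type value; record_type itself is a hypothetical field.
    """
    timeline = []
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the log
        rec = json.loads(raw)
        if rec.get("record_type") == "decision" and rec.get("pr_number") == pr_number:
            timeline.append(rec)
    return timeline


def explain(timeline):
    """Render one line per tier: what was classified, what was done, by which model."""
    return [
        f'{r["tier"]}: {r["classification"]} -> {r["action_taken"]} '
        f'(model={r["model"]})'
        for r in timeline
    ]
```

Because token and decision records can share a file, the filter on record_type keeps the cost data out of the answer.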
Technical Opportunity
review-one-pr.sh already emits structured outputs at each tier (triage classification, deep review findings, audit results). The infrastructure for JSONL emission exists in token-metrics.sh. Adding a parallel emit_decision_record function that captures (pr_number, tier, model, classification, findings_summary, action_taken, justification_hash) alongside the token records creates a queryable decision log. The existing artifact upload mechanism (used by token_report.sh for cross-repo collection) can distribute decision records to the org-wide Token Cost Observatory for federated analysis.
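A minimal sketch of emit_decision_record, written in Python for readability even though the real function would live in the shell script review-one-pr.sh. It appends one JSON object per line in the same JSONL style as the token records. The seven data fields come from the proposal; record_type and ts are hypothetical additions, and the log path is whatever the caller passes in.

```python
import json
import time
from pathlib import Path


def emit_decision_record(log_path, pr_number, tier, model, classification,
                         findings_summary, action_taken, justification_hash):
    """Append one decision record to log_path as a single JSON line.

    record_type and ts are hypothetical fields added so decision records
    can share a pipeline with token records and be ordered in time.
    """
    record = {
        "record_type": "decision",
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # UTC timestamp
        "pr_number": pr_number,
        "tier": tier,
        "model": model,
        "classification": classification,
        "findings_summary": findings_summary,
        "action_taken": action_taken,
        "justification_hash": justification_hash,
    }
    # sort_keys gives a stable field order, which keeps diffs and hashes predictable.
    line = json.dumps(record, sort_keys=True, ensure_ascii=False)
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return record
```

Calling it once per tier transition (triage, deep review, audit) with the same pr_number yields a per-PR decision trail that can be uploaded with the existing artifacts.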
Assessment
| Dimension | Score | Rationale |
|---|---|---|
| Feasibility | high | Extends existing token-metrics.sh JSONL infrastructure with a parallel record type |
| Impact | high | Enables post-incident investigation, trust-building, and compliance reporting |
| Urgency | med | Current 90-day Actions log retention is a known data loss risk for decision provenance |
Adversarial Review
Strongest objection: PR review findings may contain sensitive code snippets or security vulnerability details. Logging them in a JSONL audit trail creates a secondary attack surface — if the audit trail leaks, it is a roadmap to unpatched vulnerabilities.
Rebuttal: The audit record captures decision metadata (classification, action, model), not raw findings. The findings_summary field holds a one-line categorization (e.g., "3 medium-severity findings in auth module") rather than the full finding text. The justification_hash is a SHA-256 digest of the full reasoning: anyone holding the original text can recompute the digest and confirm it matches the record, which proves provenance without exposing content. This follows the same principle as git commit hashes: prove integrity without embedding the full content in the index. Sensitive content stays in the GitHub Actions log; the audit trail stores only the decision skeleton.
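The two fields the rebuttal relies on can be sketched in Python. findings_summary keeps only severity counts and a module name, never finding text; justification_hash is a SHA-256 hex digest of the reasoning, and verify recomputes it for comparison. The input shape (a list of dicts with a "severity" key) and all three function names are assumptions for illustration.

```python
import hashlib
from collections import Counter


def findings_summary(findings, module):
    """Collapse detailed findings into a one-line severity count.

    findings is assumed to be a list of dicts with a "severity" key;
    any other keys (such as finding text) are deliberately ignored.
    """
    if not findings:
        return f"no findings in {module} module"
    counts = Counter(item["severity"] for item in findings)
    parts = [f"{n} {sev}-severity" for sev, n in sorted(counts.items())]
    noun = "finding" if len(findings) == 1 else "findings"
    return f"{', '.join(parts)} {noun} in {module} module"


def justification_hash(reasoning):
    """SHA-256 hex digest of the full reasoning text, encoded as UTF-8."""
    return hashlib.sha256(reasoning.encode("utf-8")).hexdigest()


def verify(reasoning, recorded_hash):
    """Recompute the digest and compare it with the one stored in the record."""
    return justification_hash(reasoning) == recorded_hash
```

Verification needs the original reasoning text, so the hash can only be checked while that text is still retained somewhere, such as the Actions log.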
Suggested Next Step
Define the decision record schema (JSONL fields, retention policy, sensitivity classification). Add emit_decision_record to review-one-pr.sh at each tier transition point. Validate that the token_report.sh artifact collection pipeline can handle the additional record type.
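A first cut at the schema and the pipeline check could be as small as the Python sketch below. The required field names mirror the proposal plus a hypothetical record_type discriminator; the types, the function names, and the rule that lines without record_type are existing token records are all assumptions, since the current token-metrics.sh output format is not shown here. Retention policy and sensitivity classification are not covered.

```python
import json

# Hypothetical required fields and types for a decision record.
DECISION_SCHEMA = {
    "record_type": str,
    "pr_number": int,
    "tier": str,
    "model": str,
    "classification": str,
    "findings_summary": str,
    "action_taken": str,
    "justification_hash": str,
}


def validate_decision(rec):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, typ in DECISION_SCHEMA.items():
        if field not in rec:
            problems.append(f"missing {field}")
        elif type(rec[field]) is not typ:  # strict: rejects bool where int is expected
            problems.append(f"{field}: expected {typ.__name__}")
    digest = rec.get("justification_hash")
    if isinstance(digest, str) and not (
        len(digest) == 64 and all(c in "0123456789abcdef" for c in digest)
    ):
        problems.append("justification_hash: not a SHA-256 hex digest")
    return problems


def split_stream(lines):
    """Partition a mixed JSONL stream by record_type.

    Lines without record_type are assumed to be legacy token records.
    """
    buckets = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        rec = json.loads(raw)
        buckets.setdefault(rec.get("record_type", "token"), []).append(rec)
    return buckets
```

Running split_stream over a collected artifact and validate_decision over its "decision" bucket is one way to confirm the collection pipeline tolerates the new record type without disturbing token records.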