feat: enforce agent capabilities in propose() by renasami · Pull Request #8 · renasami/tegata

renasami · 2026-04-14T16:57:27Z

Added agent capability checking in propose(). Registered agents are checked for capability match and maxApprovableRisk before the global threshold check. Unregistered proposers skip these checks for zero-config compatibility. 5 new test cases added.
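For readers skimming the thread, the check order described above can be sketched roughly as follows. This is a simplified model, not the code in `src/core/runtime.ts`: the `decide` function, the `Outcome` type, the `"approved"` status, the global-threshold reason string, and the segment-wise wildcard matcher are all assumptions made for illustration.

```typescript
// Hypothetical, simplified model of the check order described in this PR.
// Names and shapes here are illustrative assumptions, not Tegata's real API.
type Agent = { capabilities: string[]; maxApprovableRisk: number };
type Action = { type: string; riskScore?: number };
type Outcome = { status: "approved" | "escalated"; reason?: string };

// Assumed matcher: colon-separated segments, "*" matches exactly one segment.
function matchesCapability(capabilities: string[], actionType: string): boolean {
  const target = actionType.split(":");
  return capabilities.some((cap) => {
    const parts = cap.split(":");
    return (
      parts.length === target.length &&
      parts.every((p, i) => p === "*" || p === target[i])
    );
  });
}

function decide(
  agents: Map<string, Agent>,
  proposer: string,
  action: Action,
  escalateAbove: number,
): Outcome {
  const agent = agents.get(proposer);
  // Agent checks run only for registered proposers (zero-config compatibility).
  if (agent !== undefined) {
    if (!matchesCapability(agent.capabilities, action.type)) {
      return { status: "escalated", reason: "proposer lacks capability for this action type" };
    }
    if (action.riskScore !== undefined && action.riskScore > agent.maxApprovableRisk) {
      return { status: "escalated", reason: "riskScore exceeds agent's maxApprovableRisk" };
    }
  }
  // The global threshold applies to everyone, registered or not.
  if (action.riskScore !== undefined && action.riskScore > escalateAbove) {
    return { status: "escalated", reason: "riskScore exceeds global threshold" };
  }
  return { status: "approved" };
}
```

In this model an unregistered proposer falls straight through to the global threshold, while a registered agent must clear both agent-level checks first and can still be escalated afterwards.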

Summary by CodeRabbit

Tests
- Added test coverage validating that global escalation thresholds can override individual agent risk approval limits, ensuring proper escalation behavior in edge cases.

coderabbitai · 2026-04-14T16:57:41Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9efc1888-d52f-4a18-ac41-3c776a1e2476

📥 Commits

Reviewing files that changed from the base of the PR and between f0b92f0 and 9b938ae.

📒 Files selected for processing (1)

src/core/runtime.test.ts

📝 Walkthrough

A test case was added to verify escalation behavior when a global escalateAbove threshold exceeds an agent's maxApprovableRisk limit, ensuring the global threshold takes precedence even when an agent is capable of approving the action.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Escalation Threshold Test `src/core/runtime.test.ts` | Added Vitest case verifying that actions exceeding the global `escalateAbove` threshold are escalated regardless of agent `maxApprovableRisk` approval capability. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

feat: enforce agent capabilities in propose() #8: Tests global escalation threshold behavior when an agent's risk capacity is higher than the action's risk score, directly validating the tested scenario.
fix: record audit events for validation failures #7: Modifies the propose() function to enforce per-agent risk checks followed by global threshold logic, implementing the exact control flow this test validates.

Poem

🐰 A brave little test hops into the ring,
Checking if thresholds make escalations sing,
Global limits reign supreme, it's true,
Even agents with high approval won't do! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: enforce agent capabilities in propose()' directly and clearly describes the main change: adding agent capability enforcement to the propose() method. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 033c670a93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-14T17:01:09Z

+    // Agent capability check (only if proposer is registered)
+    const agent = this.agents.get(proposal.proposer);
+    if (agent !== undefined) {
+      if (!matchesCapability(agent.capabilities, proposal.action.type)) {


Guard capability checks against invalid agent shapes

propose() now calls matchesCapability(agent.capabilities, ...) for registered proposers, but registerAgent() does not enforce runtime shape. In JavaScript usage, registering an agent with capabilities: null (or non-string entries) causes propose() to throw (.some/.split type errors) instead of returning a decision, so a malformed registration can crash authorization flow. Please validate/coerce capabilities before calling the matcher.


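Will address in a separate task. This will be handled in one batch as part of strengthening registerAgent() validation: verify that capabilities is an array and that every element is a string.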
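A minimal sketch of what that registration-time check might look like, assuming a result shape with an `ok` flag (the tests in this PR read `registered.ok`). The `validateCapabilities` name and the `error` field are hypothetical and not part of the PR.

```typescript
// Hypothetical helper: checks the runtime shape of `capabilities` so that a
// matcher relying on .some() / .split() never sees null or non-string entries.
type ValidationResult = { ok: true } | { ok: false; error: string };

function validateCapabilities(capabilities: unknown): ValidationResult {
  if (!Array.isArray(capabilities)) {
    return { ok: false, error: "capabilities must be an array" };
  }
  for (const cap of capabilities) {
    if (typeof cap !== "string") {
      return { ok: false, error: "each capability must be a string" };
    }
  }
  return { ok: true };
}
```

Rejecting the malformed registration up front means propose() can keep calling the matcher without defensive checks of its own.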

chatgpt-codex-connector · 2026-04-14T17:01:09Z

+      const agentRiskScore = proposal.action.riskScore;
+      if (
+        agentRiskScore !== undefined &&
+        agentRiskScore > agent.maxApprovableRisk


Reject non-finite maxApprovableRisk before risk comparison

The per-agent risk gate assumes agent.maxApprovableRisk is a valid number, but for non-typed callers it can be undefined/NaN/string and make agentRiskScore > agent.maxApprovableRisk evaluate false. That silently disables the new agent-specific escalation check and can auto-approve proposals that should be escalated by agent policy. Add runtime validation for maxApprovableRisk at registration (or before comparison).


Will address in a separate task. As with the comment above, this will be handled together as part of strengthening registerAgent() validation.
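The failure mode Codex describes comes from JavaScript's relational comparison: `x > NaN` is always false, and `undefined` or a non-numeric string is converted to `NaN` before comparing. The sketch below demonstrates this and shows one possible guard; `isValidMaxApprovableRisk` is a hypothetical name, not something in the PR.

```typescript
// Hypothetical guard: accept only finite numbers as an agent risk limit.
function isValidMaxApprovableRisk(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}

// Demonstration of the silent pass-through with malformed limits.
const riskScore = 95;
const malformedLimits: unknown[] = [undefined, NaN, "high"];

// For each malformed limit, `riskScore > limit` is false, so an
// "escalate if score exceeds limit" check would never fire.
const silentlyPassed = malformedLimits.map(
  (limit) => !(riskScore > (limit as number)),
);
```

Every entry of `silentlyPassed` is true here, which is why validating at registration (or before the comparison) closes the gap.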

gemini-code-assist

Code Review

This pull request introduces agent capability and risk score validation within the propose method, ensuring that registered agents are escalated if they lack necessary permissions or exceed their risk thresholds. New test cases verify these checks, including wildcard matching and handling of unregistered agents. Feedback suggests refactoring the escalation logic in src/core/runtime.ts to reduce code duplication when recording audit logs and returning decisions.

gemini-code-assist · 2026-04-14T17:02:17Z

+      if (!matchesCapability(agent.capabilities, proposal.action.type)) {
+        const capDecision: Decision = {
+          proposalId,
+          proposal,
+          status: "escalated",
+          tier: resolved.tier,
+          reviewers: [...resolved.reviewers],
+          reason: "proposer lacks capability for this action type",
+          timestamp,
+        };
+
+        this.audit.record({
+          proposalId,
+          eventType: "escalated",
+          proposal,
+          decision: capDecision,
+          timestamp: new Date().toISOString(),
+        });
+
+        return capDecision;
+      }
+
+      const agentRiskScore = proposal.action.riskScore;
+      if (
+        agentRiskScore !== undefined &&
+        agentRiskScore > agent.maxApprovableRisk
+      ) {
+        const riskDecision: Decision = {
+          proposalId,
+          proposal,
+          status: "escalated",
+          tier: resolved.tier,
+          reviewers: [...resolved.reviewers],
+          reason: "riskScore exceeds agent's maxApprovableRisk",
+          timestamp,
+        };
+
+        this.audit.record({
+          proposalId,
+          eventType: "escalated",
+          proposal,
+          decision: riskDecision,
+          timestamp: new Date().toISOString(),
+        });
+
+        return riskDecision;
+      }


The logic for handling agent capability and risk score escalation is duplicated across two blocks. This can be refactored to improve maintainability and reduce redundancy by determining the escalation reason first and then performing the audit recording and return in a single block.

let escalationReason: string | undefined;

if (!matchesCapability(agent.capabilities, proposal.action.type)) {
  escalationReason = "proposer lacks capability for this action type";
} else if (
  proposal.action.riskScore !== undefined &&
  proposal.action.riskScore > agent.maxApprovableRisk
) {
  escalationReason = "riskScore exceeds agent's maxApprovableRisk";
}

if (escalationReason) {
  const agentDecision: Decision = {
    proposalId,
    proposal,
    status: "escalated",
    tier: resolved.tier,
    reviewers: [...resolved.reviewers],
    reason: escalationReason,
    timestamp,
  };

  this.audit.record({
    proposalId,
    eventType: "escalated",
    proposal,
    decision: agentDecision,
    timestamp: new Date().toISOString(),
  });

  return agentDecision;
}

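Won't fix. capDecision and riskDecision are structurally similar today, but they may carry different information in the future (the specific capability name, the amount by which the limit was exceeded, etc.). About 50 lines of duplication is acceptable, and extracting a helper would make the code flow harder to follow.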

coderabbitai

🧹 Nitpick comments (2)

src/core/runtime.ts (1)

166-216: Consider a fail-closed option for unknown proposers.

This preserves zero-config compatibility, but a typoed or spoofed proposal.proposer still skips both new checks and falls back to the legacy policy/threshold path. An opt-in requireRegisteredProposer mode, or at least a distinct unknown-agent escalation/audit path, would make capability enforcement much safer to roll out.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/runtime.ts` around lines 166 - 216, Add a fail-closed path for
unknown proposers by honoring a config flag (e.g., requireRegisteredProposer) on
the runtime: if this.agents.get(proposal.proposer) returns undefined and
this.requireRegisteredProposer is true, construct an escalated Decision (use
same shape as capDecision/riskDecision, include proposalId, proposal,
status:"escalated", tier:resolved.tier, reviewers:[...resolved.reviewers],
reason:"unknown proposer" and timestamp) call this.audit.record with
eventType:"escalated" and return that decision; otherwise preserve the existing
legacy policy/threshold behavior. Reference this.agents, proposal.proposer,
matchesCapability, Decision, and this.audit.record when locating where to add
the check.
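As a rough illustration of the opt-in behavior suggested above, the sketch below isolates just the unknown-proposer decision. It assumes a `requireRegisteredProposer` flag as named in the suggestion; the `checkUnknownProposer` function and its return type are hypothetical, and the audit recording and Decision construction are omitted.

```typescript
// Hypothetical sketch of an opt-in fail-closed check for unknown proposers.
type RuntimeOptions = { requireRegisteredProposer?: boolean };

type UnknownProposerOutcome =
  | { kind: "continue" } // fall through to the existing checks
  | { kind: "escalate"; reason: string };

function checkUnknownProposer(
  knownAgentIds: ReadonlySet<string>,
  proposer: string,
  options: RuntimeOptions = {},
): UnknownProposerOutcome {
  if (knownAgentIds.has(proposer)) {
    return { kind: "continue" };
  }
  // Fail closed only when explicitly requested; the default keeps
  // zero-config behavior for unregistered proposers.
  if (options.requireRegisteredProposer === true) {
    return { kind: "escalate", reason: "unknown proposer" };
  }
  return { kind: "continue" };
}
```

With the flag off (the default), a typoed proposer id still takes the legacy path, so this only helps deployments that opt in.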

src/core/runtime.test.ts (1)

379-467: Add a registered-agent case that still trips the global threshold.

These tests prove the new early escalations, but they never show that a registered agent still hits escalateAbove after capability and maxApprovableRisk both pass. A bug that short-circuits registered agents to approval would still satisfy this block.

Suggested test

+  it("registered agent still respects global escalateAbove after agent checks pass", async () => {
+    const tegata = new Tegata({ escalateAbove: 70 });
+    const registered = tegata.registerAgent({
+      id: "ci-bot",
+      name: "CI Bot",
+      role: "proposer",
+      capabilities: ["ci:*:deploy"],
+      maxApprovableRisk: 90,
+    });
+    expect(registered.ok).toBe(true);
+
+    const decision = await tegata.propose({
+      proposer: "ci-bot",
+      action: { type: "ci:production:deploy", riskScore: 80 },
+    });
+
+    expect(decision.status).toBe("escalated");
+    expect(decision.reason).toContain("threshold");
+  });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/core/runtime.test.ts` around lines 379 - 467, Add a test that verifies a
registered agent who has matching capability and whose maxApprovableRisk allows
the action still gets escalated when the action.riskScore exceeds the global
escalateAbove threshold: create a Tegata, register an agent (e.g., id "ci-bot")
with a capability that matches the action type and maxApprovableRisk above the
action's riskScore check, then call Tegata.propose with an action whose
riskScore is greater than the global escalateAbove (the current tests imply 70)
and assert decision.status is "escalated" and decision.reason contains the
global-threshold message; place this alongside the other it(...) tests
referencing Tegata.registerAgent and Tegata.propose.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8bc9c326-63a0-47c4-b7ff-32b895b7b4ca

📥 Commits

Reviewing files that changed from the base of the PR and between d1cf1c4 and 033c670.

📒 Files selected for processing (2)

src/core/runtime.test.ts
src/core/runtime.ts

renasami · 2026-04-14T17:34:07Z

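Response to the CodeRabbit nitpicks:

1. Fail-closed requireRegisteredProposer option: won't do (for now). Good idea, but it is v0.2 scope. It conflicts with the current zero-config requirement (new Tegata() must work with an unregistered proposer), so it will be considered for v0.2 as an opt-in config.

2. Add a test where a registered agent trips the global threshold: will do. This is a legitimate test gap. The case where a proposal passes the agent checks (capability + maxApprovableRisk) and is then escalated by the global escalateAbove is not yet verified. It will be added to this PR.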

Registered agent that passes capability + maxApprovableRisk checks should still be escalated when riskScore exceeds global escalateAbove. Addresses CodeRabbit review feedback on PR #8. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add global threshold test for registered agents

  Registered agent that passes capability + maxApprovableRisk checks should still be escalated when riskScore exceeds global escalateAbove. Addresses CodeRabbit review feedback on PR #8.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: implement review/approve handler flow

  PolicyRule is now a discriminated union (AutoPolicy | ReviewPolicy | ApprovePolicy). Review and approve tiers require a handler at the type level, so no runtime validation is needed. Handlers return ReviewResult (approved | denied); escalation and timeout are Tegata's responsibility.

  - Add ReviewResult, ReviewHandler, ApprovalHandler types
  - Split PolicyRule into discriminated union keyed on tier
  - Add Decision.decidedBy for audit attribution
  - Extend ResolvedPolicy with handler/timeoutMs
  - Switch-based resolvePolicy for exhaustive case handling
  - clonePolicyRule helper (structuredClone can't clone functions)
  - executeHandler with Promise.race timeout + .then(ok, err)
  - propose() review/approve: pending → handler → decided/timed_out/escalated
  - ADR-004: design decisions and PR strategy

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden handler execution for LLM/agent safety

  - Wrap handler call in Promise.resolve().then() to catch sync throws
  - Clone proposal before passing to handler (prevent mutation)
  - Validate handler result shape at trust boundary (status, decidedBy)
  - Sanitize handler error messages in Decision.reason (no leaking)
  - Add tests: sync throw, proposal mutation, invalid shape, empty decidedBy, error message sanitization

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector bot reviewed Apr 14, 2026

gemini-code-assist bot reviewed Apr 14, 2026

coderabbitai bot reviewed Apr 14, 2026

renasami force-pushed the feat/agent-capabilities-check branch from f0b92f0 to 9b938ae Compare April 16, 2026 16:25

renasami merged commit d51a9d4 into main Apr 16, 2026
1 of 2 checks passed
