Skip to content

feat: enforce agent capabilities in propose()#8

Merged
renasami merged 1 commit intomainfrom
feat/agent-capabilities-check
Apr 16, 2026
Merged

feat: enforce agent capabilities in propose()#8
renasami merged 1 commit intomainfrom
feat/agent-capabilities-check

Conversation

@renasami
Copy link
Copy Markdown
Owner

@renasami renasami commented Apr 14, 2026

Added agent capability checking in propose(). Registered agents are checked for capability match and maxApprovableRisk before the global threshold check. Unregistered proposers skip these checks for zero-config compatibility. 5 new test cases added.

Summary by CodeRabbit

  • Tests
    • Added test coverage validating that global escalation thresholds can override individual agent risk approval limits, ensuring proper escalation behavior in edge cases.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 14, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9efc1888-d52f-4a18-ac41-3c776a1e2476

📥 Commits

Reviewing files that changed from the base of the PR and between f0b92f0 and 9b938ae.

📒 Files selected for processing (1)
  • src/core/runtime.test.ts

📝 Walkthrough

Walkthrough

A test case was added to verify escalation behavior when a global escalateAbove threshold exceeds an agent's maxApprovableRisk limit, ensuring the global threshold takes precedence even when an agent is capable of approving the action.

Changes

Cohort / File(s) Summary
Escalation Threshold Test
src/core/runtime.test.ts
Added Vitest case verifying that actions exceeding the global escalateAbove threshold are escalated regardless of agent maxApprovableRisk approval capability.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

Poem

🐰 A brave little test hops into the ring,
Checking if thresholds make escalations sing,
Global limits reign supreme, it's true,
Even agents with high approval won't do! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: enforce agent capabilities in propose()' directly and clearly describes the main change: adding agent capability enforcement to the propose() method.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-capabilities-check

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 033c670a93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/core/runtime.ts
// Agent capability check (only if proposer is registered)
const agent = this.agents.get(proposal.proposer);
if (agent !== undefined) {
if (!matchesCapability(agent.capabilities, proposal.action.type)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard capability checks against invalid agent shapes

propose() now calls matchesCapability(agent.capabilities, ...) for registered proposers, but registerAgent() does not enforce runtime shape. In JavaScript usage, registering an agent with capabilities: null (or non-string entries) causes propose() to throw (.some/.split type errors) instead of returning a decision, so a malformed registration can crash authorization flow. Please validate/coerce capabilities before calling the matcher.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

別タスクで対応する。registerAgent() のバリデーション強化として一括対応予定。capabilities が配列かつ各要素が string であることを検証する。

Comment thread src/core/runtime.ts
const agentRiskScore = proposal.action.riskScore;
if (
agentRiskScore !== undefined &&
agentRiskScore > agent.maxApprovableRisk
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject non-finite maxApprovableRisk before risk comparison

The per-agent risk gate assumes agent.maxApprovableRisk is a valid number, but for non-typed callers it can be undefined/NaN/string and make agentRiskScore > agent.maxApprovableRisk evaluate false. That silently disables the new agent-specific escalation check and can auto-approve proposals that should be escalated by agent policy. Add runtime validation for maxApprovableRisk at registration (or before comparison).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

別タスクで対応する。上記と同じく registerAgent() のバリデーション強化として一括対応。

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces agent capability and risk score validation within the propose method, ensuring that registered agents are escalated if they lack necessary permissions or exceed their risk thresholds. New test cases verify these checks, including wildcard matching and handling of unregistered agents. Feedback suggests refactoring the escalation logic in src/core/runtime.ts to reduce code duplication when recording audit logs and returning decisions.

Comment thread src/core/runtime.ts
Comment on lines +169 to +215
if (!matchesCapability(agent.capabilities, proposal.action.type)) {
const capDecision: Decision = {
proposalId,
proposal,
status: "escalated",
tier: resolved.tier,
reviewers: [...resolved.reviewers],
reason: "proposer lacks capability for this action type",
timestamp,
};

this.audit.record({
proposalId,
eventType: "escalated",
proposal,
decision: capDecision,
timestamp: new Date().toISOString(),
});

return capDecision;
}

const agentRiskScore = proposal.action.riskScore;
if (
agentRiskScore !== undefined &&
agentRiskScore > agent.maxApprovableRisk
) {
const riskDecision: Decision = {
proposalId,
proposal,
status: "escalated",
tier: resolved.tier,
reviewers: [...resolved.reviewers],
reason: "riskScore exceeds agent's maxApprovableRisk",
timestamp,
};

this.audit.record({
proposalId,
eventType: "escalated",
proposal,
decision: riskDecision,
timestamp: new Date().toISOString(),
});

return riskDecision;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for handling agent capability and risk score escalation is duplicated across two blocks. This can be refactored to improve maintainability and reduce redundancy by determining the escalation reason first and then performing the audit recording and return in a single block.

      let escalationReason: string | undefined;
      if (!matchesCapability(agent.capabilities, proposal.action.type)) {
        escalationReason = "proposer lacks capability for this action type";
      } else if (
        proposal.action.riskScore !== undefined &&
        proposal.action.riskScore > agent.maxApprovableRisk
      ) {
        escalationReason = "riskScore exceeds agent's maxApprovableRisk";
      }

      if (escalationReason) {
        const agentDecision: Decision = {
          proposalId,
          proposal,
          status: "escalated",
          tier: resolved.tier,
          reviewers: [...resolved.reviewers],
          reason: escalationReason,
          timestamp,
        };

        this.audit.record({
          proposalId,
          eventType: "escalated",
          proposal,
          decision: agentDecision,
          timestamp: new Date().toISOString(),
        });

        return agentDecision;
      }

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

対応しない。capDecision と riskDecision は現状では構造が似ているが、将来的に異なる情報(具体的な capability 名、超過量等)を含める可能性がある。~50 行の重複は許容範囲で、ヘルパー化するとコードの流れが追いにくくなる。

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/core/runtime.ts (1)

166-216: Consider a fail-closed option for unknown proposers.

This preserves zero-config compatibility, but a typoed or spoofed proposal.proposer still skips both new checks and falls back to the legacy policy/threshold path. An opt-in requireRegisteredProposer mode, or at least a distinct unknown-agent escalation/audit path, would make capability enforcement much safer to roll out.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/runtime.ts` around lines 166 - 216, Add a fail-closed path for
unknown proposers by honoring a config flag (e.g., requireRegisteredProposer) on
the runtime: if this.agents.get(proposal.proposer) returns undefined and
this.requireRegisteredProposer is true, construct an escalated Decision (use
same shape as capDecision/riskDecision, include proposalId, proposal,
status:"escalated", tier:resolved.tier, reviewers:[...resolved.reviewers],
reason:"unknown proposer" and timestamp) call this.audit.record with
eventType:"escalated" and return that decision; otherwise preserve the existing
legacy policy/threshold behavior. Reference this.agents, proposal.proposer,
matchesCapability, Decision, and this.audit.record when locating where to add
the check.
src/core/runtime.test.ts (1)

379-467: Add a registered-agent case that still trips the global threshold.

These tests prove the new early escalations, but they never show that a registered agent still hits escalateAbove after capability and maxApprovableRisk both pass. A bug that short-circuits registered agents to approval would still satisfy this block.

Suggested test
+  it("registered agent still respects global escalateAbove after agent checks pass", async () => {
+    const tegata = new Tegata({ escalateAbove: 70 });
+    const registered = tegata.registerAgent({
+      id: "ci-bot",
+      name: "CI Bot",
+      role: "proposer",
+      capabilities: ["ci:*:deploy"],
+      maxApprovableRisk: 90,
+    });
+    expect(registered.ok).toBe(true);
+
+    const decision = await tegata.propose({
+      proposer: "ci-bot",
+      action: { type: "ci:production:deploy", riskScore: 80 },
+    });
+
+    expect(decision.status).toBe("escalated");
+    expect(decision.reason).toContain("threshold");
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/runtime.test.ts` around lines 379 - 467, Add a test that verifies a
registered agent who has matching capability and whose maxApprovableRisk allows
the action still gets escalated when the action.riskScore exceeds the global
escalateAbove threshold: create a Tegata, register an agent (e.g., id "ci-bot")
with a capability that matches the action type and maxApprovableRisk above the
action's riskScore check, then call Tegata.propose with an action whose
riskScore is greater than the global escalateAbove (the current tests imply 70)
and assert decision.status is "escalated" and decision.reason contains the
global-threshold message; place this alongside the other it(...) tests
referencing Tegata.registerAgent and Tegata.propose.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/core/runtime.test.ts`:
- Around line 379-467: Add a test that verifies a registered agent who has
matching capability and whose maxApprovableRisk allows the action still gets
escalated when the action.riskScore exceeds the global escalateAbove threshold:
create a Tegata, register an agent (e.g., id "ci-bot") with a capability that
matches the action type and maxApprovableRisk above the action's riskScore
check, then call Tegata.propose with an action whose riskScore is greater than
the global escalateAbove (the current tests imply 70) and assert decision.status
is "escalated" and decision.reason contains the global-threshold message; place
this alongside the other it(...) tests referencing Tegata.registerAgent and
Tegata.propose.

In `@src/core/runtime.ts`:
- Around line 166-216: Add a fail-closed path for unknown proposers by honoring
a config flag (e.g., requireRegisteredProposer) on the runtime: if
this.agents.get(proposal.proposer) returns undefined and
this.requireRegisteredProposer is true, construct an escalated Decision (use
same shape as capDecision/riskDecision, include proposalId, proposal,
status:"escalated", tier:resolved.tier, reviewers:[...resolved.reviewers],
reason:"unknown proposer" and timestamp) call this.audit.record with
eventType:"escalated" and return that decision; otherwise preserve the existing
legacy policy/threshold behavior. Reference this.agents, proposal.proposer,
matchesCapability, Decision, and this.audit.record when locating where to add
the check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8bc9c326-63a0-47c4-b7ff-32b895b7b4ca

📥 Commits

Reviewing files that changed from the base of the PR and between d1cf1c4 and 033c670.

📒 Files selected for processing (2)
  • src/core/runtime.test.ts
  • src/core/runtime.ts

@renasami
Copy link
Copy Markdown
Owner Author

CodeRabbit nitpick への回答:

1. requireRegisteredProposer の fail-closed option — 対応しない(今は)。良いアイデアだが v0.2 scope。現状の zero-config 要件(new Tegata() + 未登録 proposer で動く)と矛盾するため、opt-in config として v0.2 で検討する。

2. registered agent が global threshold で引っかかるテスト追加 — 対応する。正当なテストギャップ。agent checks(capability + maxApprovableRisk)通過後に global escalateAbove で escalate されるケースが未検証。この PR に追加する。

renasami added a commit that referenced this pull request Apr 14, 2026
Registered agent that passes capability + maxApprovableRisk checks
should still be escalated when riskScore exceeds global escalateAbove.
Addresses CodeRabbit review feedback on PR #8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Registered agent that passes capability + maxApprovableRisk checks
should still be escalated when riskScore exceeds global escalateAbove.
Addresses CodeRabbit review feedback on PR #8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@renasami renasami force-pushed the feat/agent-capabilities-check branch from f0b92f0 to 9b938ae Compare April 16, 2026 16:25
@renasami renasami merged commit d51a9d4 into main Apr 16, 2026
1 of 2 checks passed
renasami added a commit that referenced this pull request Apr 16, 2026
* test: add global threshold test for registered agents

Registered agent that passes capability + maxApprovableRisk checks
should still be escalated when riskScore exceeds global escalateAbove.
Addresses CodeRabbit review feedback on PR #8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: implement review/approve handler flow

PolicyRule is now a discriminated union (AutoPolicy | ReviewPolicy |
ApprovePolicy). Review and approve tiers require a handler at the type
level — no runtime validation needed. Handlers return ReviewResult
(approved | denied); escalation and timeout are Tegata's responsibility.

- Add ReviewResult, ReviewHandler, ApprovalHandler types
- Split PolicyRule into discriminated union keyed on tier
- Add Decision.decidedBy for audit attribution
- Extend ResolvedPolicy with handler/timeoutMs
- Switch-based resolvePolicy for exhaustive case handling
- clonePolicyRule helper (structuredClone can't clone functions)
- executeHandler with Promise.race timeout + .then(ok, err)
- propose() review/approve: pending → handler → decided/timed_out/escalated
- ADR-004: design decisions and PR strategy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden handler execution for LLM/agent safety

- Wrap handler call in Promise.resolve().then() to catch sync throws
- Clone proposal before passing to handler (prevent mutation)
- Validate handler result shape at trust boundary (status, decidedBy)
- Sanitize handler error messages in Decision.reason (no leaking)
- Add tests: sync throw, proposal mutation, invalid shape, empty
  decidedBy, error message sanitization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant