Skip to content

feat: gate MCP tool calls through EvalOps approvals#44

Merged
haasonsaas merged 1 commit intomainfrom
codex/evalops-mcp-approvals
Apr 21, 2026
Merged

feat: gate MCP tool calls through EvalOps approvals#44
haasonsaas merged 1 commit intomainfrom
codex/evalops-mcp-approvals

Conversation

@haasonsaas
Copy link
Copy Markdown
Collaborator

Summary

  • add an EvalOps approvals client path for RequestApproval and GetApproval
  • gate MCP tool execution through EvalOps before calling the local MCP server manager
  • annotate existing wide events with approval risk, decision, request id, and offline fallback state

This is the first backend slice for #17. It adds the approvals enforcement point and telemetry, while leaving the dedicated Electron approval UI/modal as follow-up work.

Validation

  • npm run build
  • git diff --check
  • Earlier branch validation: npm run contextkit:build && npm test built ContextKit and passed the ContextKit/meeting tests, then hit the existing local audio duration assertion in tests/audio-data-size.test.mjs (Duration > 3s, got 0.0s) despite non-empty WAV output.

@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 21, 2026

PR Summary

Medium Risk
Changes the MCP tool execution path to synchronously request/poll EvalOps approvals before running tools, which can block or deny actions and introduces new network/offline behaviors that could affect reliability and latency.

Overview
MCP tool execution is now gated by EvalOps approvals: before MCPServerManager.callTool, the executor requests an approval, optionally polls GetApproval for a decision, and denies execution when not approved.

Adds a new evalops/approvals module that infers a tool risk level from tool metadata, encodes tool-call payload/context for EvalOps, supports configurable wait/poll intervals, and falls back to allow when approvals are offline/errors occur.

Extends the EvalOps consumer SDK to support RequestApproval and GetApproval, and annotates WideEvent telemetry with approval risk/offline/decision/request-id fields. Also adds MCPServerManager.getTool to fetch tool metadata for approval context.

Reviewed by Cursor Bugbot for commit 7254668. Bugbot is set up for automated code reviews on this repo. Configure here.

@haasonsaas haasonsaas merged commit 631e18b into main Apr 21, 2026
5 checks passed
@haasonsaas haasonsaas deleted the codex/evalops-mcp-approvals branch April 21, 2026 00:22
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

Reviewed by Cursor Bugbot for commit 7254668. Configure here.

})
const result = decisionResult(response, input.requestId, input.riskLevel)
if (result.decision || normalizeState(result.state) === 'resolved') return result
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polling loop ignores offline results, blocks then denies

High Severity

The waitForDecision polling loop exit condition (result.decision || normalizeState(result.state) === 'resolved') never matches an offline fallback result from decisionResult, because allowOffline sets neither decision nor state. When EvalOps goes offline mid-polling, the loop silently discards the allowed: true offline result, polls for the full 2-minute timeout, and then returns allowed: false — contradicting the fail-open offline policy used everywhere else. The loop needs to also check result.offline to break out and honor the offline fallback.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7254668. Configure here.


if (/\b(delete|remove|destroy|drop|truncate|wipe|erase|format|revoke|kill|shutdown|terminate)\b/u.test(haystack)) {
return { level: 'RISK_LEVEL_HIGH', reason: 'destructive tool name or schema' }
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex keyword "format" misclassifies tools with JSON Schema annotations

Medium Severity

inferMCPToolRisk stringifies the tool's inputSchema into the haystack, then tests it against the HIGH risk regex which includes the word format. Since "format" is a standard JSON Schema keyword used ubiquitously for date-time, email, URI, and other annotations, any MCP tool with such schema fields will be misclassified as RISK_LEVEL_HIGH ("destructive"). This will cause routine read-only tools to trigger the strictest approval path.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7254668. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant