feat: gate MCP tool calls through EvalOps approvals #44
PR Summary
Medium Risk
Overview: Adds a new [...] Extends the EvalOps consumer SDK to support [...]
Reviewed by Cursor Bugbot for commit 7254668. Bugbot is set up for automated code reviews on this repo.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.
  })
  const result = decisionResult(response, input.requestId, input.riskLevel)
  if (result.decision || normalizeState(result.state) === 'resolved') return result
}
Polling loop ignores offline results, blocks then denies
High Severity
The waitForDecision polling loop exit condition (result.decision || normalizeState(result.state) === 'resolved') never matches an offline fallback result from decisionResult, because allowOffline sets neither decision nor state. When EvalOps goes offline mid-polling, the loop silently discards the allowed: true offline result, polls for the full 2-minute timeout, and then returns allowed: false — contradicting the fail-open offline policy used everywhere else. The loop needs to also check result.offline to break out and honor the offline fallback.
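One way to express the suggested fix is to pull the exit condition into a small predicate that also treats an offline fallback as terminal. The sketch below is only an illustration: the `DecisionResult` shape, the `isTerminal` helper, the stand-in `normalizeState`, and the injected `poll` callback are assumptions, not the PR's actual code.

```typescript
// Hypothetical result shape; the real type in the PR may carry more fields.
interface DecisionResult {
  allowed: boolean
  decision?: string
  state?: string
  offline?: boolean
}

// Stand-in for the PR's normalizeState (assumed here to trim and lower-case).
function normalizeState(state?: string): string {
  return (state ?? '').trim().toLowerCase()
}

// A result ends polling when it carries a decision, is resolved,
// or is an offline fallback that should be honored immediately.
function isTerminal(result: DecisionResult): boolean {
  return (
    Boolean(result.decision) ||
    normalizeState(result.state) === 'resolved' ||
    result.offline === true
  )
}

// Polls until a terminal result arrives or the timeout elapses.
// `poll` is a hypothetical callback wrapping the approval lookup.
async function waitForDecision(
  poll: () => Promise<DecisionResult>,
  timeoutMs: number,
  intervalMs: number,
): Promise<DecisionResult> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    const result = await poll()
    if (isTerminal(result)) return result
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  // Only a genuine timeout falls through to a denial.
  return { allowed: false, state: 'timeout' }
}
```

With this shape, an offline result returned mid-polling exits the loop on the first iteration instead of being discarded until the timeout.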
if (/\b(delete|remove|destroy|drop|truncate|wipe|erase|format|revoke|kill|shutdown|terminate)\b/u.test(haystack)) {
  return { level: 'RISK_LEVEL_HIGH', reason: 'destructive tool name or schema' }
}
Regex keyword "format" misclassifies tools with JSON Schema annotations
Medium Severity
inferMCPToolRisk stringifies the tool's inputSchema into the haystack, then tests it against the HIGH risk regex which includes the word format. Since "format" is a standard JSON Schema keyword used ubiquitously for date-time, email, URI, and other annotations, any MCP tool with such schema fields will be misclassified as RISK_LEVEL_HIGH ("destructive"). This will cause routine read-only tools to trigger the strictest approval path.
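A possible mitigation is to build the haystack from the tool name, description, schema property names, and schema `description`/`title` text only, so that JSON Schema keywords such as `format` never reach the regex. The sketch below assumes a simplified tool shape; `ToolLike`, `schemaWords`, `inferRisk`, and the non-high return value are hypothetical names for illustration, not the PR's implementation.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Hypothetical, simplified view of an MCP tool definition.
interface ToolLike {
  name: string
  description?: string
  inputSchema?: Json
}

// Same keyword list as the snippet under review.
const DESTRUCTIVE =
  /\b(delete|remove|destroy|drop|truncate|wipe|erase|format|revoke|kill|shutdown|terminate)\b/u

// Collects property names and human-readable text from a schema,
// skipping keyword keys and their values (type, format, enum, ...).
function schemaWords(schema: Json | undefined, out: string[] = []): string[] {
  if (schema === null || schema === undefined || typeof schema !== 'object') return out
  if (Array.isArray(schema)) {
    for (const item of schema) schemaWords(item, out)
    return out
  }
  for (const [key, value] of Object.entries(schema)) {
    if (key === 'properties' && value !== null && typeof value === 'object' && !Array.isArray(value)) {
      for (const [propName, propSchema] of Object.entries(value)) {
        out.push(propName)
        schemaWords(propSchema, out)
      }
    } else if ((key === 'description' || key === 'title') && typeof value === 'string') {
      out.push(value)
    } else {
      schemaWords(value, out)
    }
  }
  return out
}

function inferRisk(tool: ToolLike): { level: string; reason: string } {
  // Underscores become spaces so that \b matches inside snake_case names.
  const haystack = [tool.name, tool.description ?? '', ...schemaWords(tool.inputSchema)]
    .join(' ')
    .replace(/_/g, ' ')
    .toLowerCase()
  if (DESTRUCTIVE.test(haystack)) {
    return { level: 'RISK_LEVEL_HIGH', reason: 'destructive tool name or schema' }
  }
  // Placeholder level; the PR's actual non-high levels are not shown in the review.
  return { level: 'RISK_LEVEL_UNSPECIFIED', reason: 'no destructive keyword found' }
}
```

A property that is itself named `format` would still match under this sketch; dropping `format` from the keyword list is the simpler alternative if that trade-off is unacceptable.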
Summary
RequestApproval and GetApproval
This is the first backend slice for #17. It adds the approvals enforcement point and telemetry, while leaving the dedicated Electron approval UI/modal as follow-up work.
Validation
- npm run build
- git diff --check
- npm run contextkit:build && npm test: built ContextKit and passed the ContextKit/meeting tests, then hit the existing local audio duration assertion in tests/audio-data-size.test.mjs (Duration > 3s, got 0.0s) despite non-empty WAV output.