Goal Policy and Risk Levels

nick3 edited this page May 28, 2026 · 1 revision

Every goal declares what it is allowed to do. The policy gate intercepts every tool call during the goal's lifetime and either allows it (risk at or below the ceiling), denies it (denylist), or prompts the user (above the ceiling, outside the allowlist, or outside the sandbox directory).

Source: src/main/goal-policy.ts.


The risk ladder

read_only  →  write_local  →  network_get  →  network_write  →  spends_money

A goal with risk ceiling X allows all tools whose declared risk is ≤ X. Tools strictly above X prompt the user via the approval modal before executing.

| Tier | What it allows |
| --- | --- |
| read_only | Only inspect state. No mutations anywhere. |
| write_local | Run shell commands, edit files, type in browser inputs. No HTTP form submissions. |
| network_get | Navigate browsers to non-localhost URLs, hover. |
| network_write | Submit forms, set cookies, POST, drag-and-drop, file upload (still prompts for the file pick), JS execute. |
| spends_money | Checkout pages, payment forms, banking. Use sparingly. |
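
The "≤ ceiling" rule can be read as an index comparison on the ordered ladder. The following is a minimal sketch, not the code in goal-policy.ts; the names RISK_LADDER and withinCeiling are hypothetical and introduced only for illustration.

```typescript
// Risk tiers in ascending order, as listed in the ladder above.
const RISK_LADDER = [
  "read_only",
  "write_local",
  "network_get",
  "network_write",
  "spends_money",
] as const;

type GoalRisk = (typeof RISK_LADDER)[number];

// Hypothetical helper: true when the tool's declared risk is at or below the ceiling.
function withinCeiling(toolRisk: GoalRisk, ceiling: GoalRisk): boolean {
  return RISK_LADDER.indexOf(toolRisk) <= RISK_LADDER.indexOf(ceiling);
}

console.log(withinCeiling("network_get", "network_write")); // true
console.log(withinCeiling("network_write", "write_local")); // false
```

A tool for which this returns false is not rejected outright; per the rule above it goes to the approval modal.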

Built-in per-tool risk table

The table is BUILTIN_PERMISSIONS in src/main/goal-policy.ts. Excerpt:

| Risk | Tools |
| --- | --- |
| read_only | declare_step, verify_step, list_panes, capture_screenshot, read_terminal_output, poll_terminal_status, wait_for_output, get_fleet_status, browser_get_content, browser_get_axtree, browser_query, browser_query_all, browser_screenshot, browser_screenshot_full_page, browser_screenshot_annotated, browser_get_action_log, browser_get_cookies, browser_verify_visual_state, browser_describe_screen, browser_scroll, browser_wait_for_selector, browser_wait_for_navigation, browser_wait_for_text |
| write_local | focus_pane, maximize_pane, create_workspace, restart_terminal, write_to_terminal, set_agent_role, assign_task, complete_task, fail_task, wait_for_agent, share_context, create_goal, browser_save_pdf, browser_save_html, convert_pane_to_browser, convert_pane_to_terminal |
| network_get | browser_navigate, browser_back, browser_forward, browser_reload, browser_hover |
| network_write | browser_click, browser_smart_click, browser_click_at, browser_type, browser_keypress, browser_select_option, browser_check, browser_set_files, browser_set_cookie, browser_execute_js, browser_drag, browser_run_recipe |
| (unknown / plugin) | Defaults to network_write to be safe |

Unknown tools (custom plugins that don't register permissions) default to network_write, so they prompt in any goal whose ceiling is below network_write. To override, your plugin can call getPermissions and pre-register a mapping (currently this lives in source code; a plugin-friendly API is on the roadmap).
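
The fallback for unregistered tools can be sketched as a lookup with a default. This is an illustration under assumptions: PERMISSIONS_EXCERPT holds only a few rows copied from the excerpt above, and riskOf is a hypothetical helper, not a function exported by goal-policy.ts.

```typescript
type GoalRisk =
  | "read_only"
  | "write_local"
  | "network_get"
  | "network_write"
  | "spends_money";

// A few entries copied from the excerpt above; the real table is larger.
const PERMISSIONS_EXCERPT = new Map<string, GoalRisk>([
  ["browser_get_content", "read_only"],
  ["write_to_terminal", "write_local"],
  ["browser_navigate", "network_get"],
  ["browser_click", "network_write"],
]);

// Hypothetical lookup: tools with no registered risk fall back to network_write.
function riskOf(toolName: string): GoalRisk {
  return PERMISSIONS_EXCERPT.get(toolName) ?? "network_write";
}

console.log(riskOf("browser_navigate")); // network_get
console.log(riskOf("my_plugin_tool")); // network_write
```

Here my_plugin_tool is a made-up name standing in for any plugin tool that has not registered permissions.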


Policy fields

interface GoalPolicy {
  risk: GoalRisk             // ceiling — required
  allowedTools?: string[]    // only these are allowed (more restrictive than risk)
  deniedTools?: string[]     // always blocked, regardless of risk
  sandboxDir?: string        // file-touching tools may only operate inside this dir
}

allowedTools (allowlist)

When set, only tools in the list are allowed. Useful for super-narrow goals: "this goal can only call browser_navigate and browser_get_content."

deniedTools (denylist)

Always blocked, regardless of risk. Useful for risk-management: "never call browser_execute_js even though my goal is network_write."
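
The two quoted scenarios translate into policy objects roughly as follows. The interface is repeated from above so the snippet stands alone; the GoalRisk union and the variable names readPagesOnly and noScriptExecution are assumptions made for illustration.

```typescript
type GoalRisk =
  | "read_only"
  | "write_local"
  | "network_get"
  | "network_write"
  | "spends_money";

interface GoalPolicy {
  risk: GoalRisk;
  allowedTools?: string[];
  deniedTools?: string[];
  sandboxDir?: string;
}

// Narrow goal: may only navigate and read page content.
// The ceiling is network_get because browser_navigate is listed at that tier.
const readPagesOnly: GoalPolicy = {
  risk: "network_get",
  allowedTools: ["browser_navigate", "browser_get_content"],
};

// Broad goal that still never executes arbitrary JS in the page.
const noScriptExecution: GoalPolicy = {
  risk: "network_write",
  deniedTools: ["browser_execute_js"],
};
```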

sandboxDir

For tools that take a filesystem path (browser_save_pdf, browser_save_html, browser_set_files), the path argument is checked against this prefix. Paths outside prompt the user.

The path check normalizes trailing slashes; it's a literal prefix match (not a full canonicalization, so symlinks could escape). Treat it as a hint, not a hard sandbox.
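
To see why a literal prefix match is only a hint, consider this sketch of such a check. It is not the code in goal-policy.ts: stripTrailingSlashes and insideSandbox are hypothetical names, and matching on a directory boundary (so that /work/output is not treated as inside /work/out) is an assumption about how the prefix is compared.

```typescript
// Hypothetical helper: drop trailing slashes so "/work/out/" and "/work/out" compare equal.
function stripTrailingSlashes(p: string): string {
  const stripped = p.replace(/\/+$/, "");
  return stripped === "" ? "/" : stripped;
}

// Literal prefix match: no symlink resolution and no handling of ".." segments.
function insideSandbox(path: string, sandboxDir: string): boolean {
  const dir = stripTrailingSlashes(sandboxDir);
  const target = stripTrailingSlashes(path);
  const prefix = dir === "/" ? "/" : dir + "/";
  return target === dir || target.startsWith(prefix);
}

console.log(insideSandbox("/work/out/report.pdf", "/work/out/")); // true
console.log(insideSandbox("/work/other/report.pdf", "/work/out")); // false
// A ".." segment still passes the string comparison even though it leaves the directory.
console.log(insideSandbox("/work/out/../secret.txt", "/work/out")); // true
```

The last call shows the kind of escape a string comparison cannot catch; a symlink inside the directory that points elsewhere would pass in the same way.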


Evaluation

evaluate(toolName, permissions, policy): { allow, reason?, needsApproval? }

Logic order (in goal-policy.ts:70-100):

  1. Denylist wins absolutely. allow: false, no approval.
  2. Allowlist — if set, tool must be in it. Otherwise allow: false, needsApproval: true (prompt).
  3. Risk ceiling — if tool's risk > policy.risk: allow: false, needsApproval: true (prompt).
  4. Sandbox path check — if any path argument is outside sandboxDir: allow: false, needsApproval: true (prompt).
  5. Otherwise: allow: true.
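
The five steps above can be sketched as one function. This is an illustrative reading of the documented order, not the code at goal-policy.ts:70-100. The documented signature is evaluate(toolName, permissions, policy); the sketch, named evaluateSketch to keep the two apart, takes the tool's risk and its path arguments directly so that every step is visible. The reason strings other than the ceiling message are made up.

```typescript
type GoalRisk =
  | "read_only"
  | "write_local"
  | "network_get"
  | "network_write"
  | "spends_money";

const LADDER: GoalRisk[] = [
  "read_only", "write_local", "network_get", "network_write", "spends_money",
];

interface GoalPolicy {
  risk: GoalRisk;
  allowedTools?: string[];
  deniedTools?: string[];
  sandboxDir?: string;
}

interface Verdict {
  allow: boolean;
  reason?: string;
  needsApproval?: boolean;
}

function evaluateSketch(
  toolName: string,
  toolRisk: GoalRisk,
  policy: GoalPolicy,
  paths: string[] = [], // assumption: path arguments are extracted by the caller
): Verdict {
  // 1. Denylist wins absolutely: hard block, no prompt.
  if (policy.deniedTools?.includes(toolName)) {
    return { allow: false, needsApproval: false, reason: "denied by policy" };
  }
  // 2. Allowlist: if set, the tool must be in it, otherwise prompt.
  if (policy.allowedTools && !policy.allowedTools.includes(toolName)) {
    return { allow: false, needsApproval: true, reason: `${toolName} is not in allowedTools` };
  }
  // 3. Risk ceiling: a tool above the ceiling prompts.
  if (LADDER.indexOf(toolRisk) > LADDER.indexOf(policy.risk)) {
    return {
      allow: false,
      needsApproval: true,
      reason: `requires ${toolRisk} but goal allows up to ${policy.risk}`,
    };
  }
  // 4. Sandbox: any path outside sandboxDir prompts (literal prefix match).
  if (policy.sandboxDir !== undefined) {
    const dir = policy.sandboxDir.replace(/\/+$/, "");
    const outside = paths.some((p) => !(p === dir || p.startsWith(dir + "/")));
    if (outside) {
      return { allow: false, needsApproval: true, reason: "path is outside sandboxDir" };
    }
  }
  // 5. Otherwise the call is allowed.
  return { allow: true };
}

console.log(evaluateSketch("browser_click", "network_write", { risk: "write_local" }).reason);
// requires network_write but goal allows up to write_local
```

Because the denylist is checked first, a denied tool never reaches a prompt, even when it is also in allowedTools or below the ceiling.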

The dispatcher (AIManager.executeTool) consults the verdict:

  • allow: true → run the tool
  • allow: false, needsApproval: false → return error to the model ("denied by policy")
  • allow: false, needsApproval: true → show the #approval-modal to the user; user choice determines outcome
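
The three cases reduce to a small mapping from verdict to action. This sketch is not the body of AIManager.executeTool; the DispatchAction type and the nextAction helper are hypothetical names used to make the branching explicit.

```typescript
interface Verdict {
  allow: boolean;
  reason?: string;
  needsApproval?: boolean;
}

type DispatchAction = "run_tool" | "return_error" | "show_approval_modal";

// Hypothetical helper mirroring the three bullets above.
function nextAction(verdict: Verdict): DispatchAction {
  if (verdict.allow) return "run_tool";
  // A missing needsApproval is treated like false: a hard denial.
  return verdict.needsApproval ? "show_approval_modal" : "return_error";
}

console.log(nextAction({ allow: true })); // run_tool
console.log(nextAction({ allow: false, needsApproval: false })); // return_error
console.log(nextAction({ allow: false, needsApproval: true })); // show_approval_modal
```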

Approval modal

When a tool exceeds policy and needsApproval is true, the user gets a modal showing:

  • Tool name
  • Arguments (truncated)
  • Why it's outside policy ("requires network_write but goal allows up to write_local")
  • Three buttons: Allow once / Allow for this goal / Deny

  • Allow once: runs this single call. Next time, prompts again.
  • Allow for this goal: adds the tool to the running goal's allowedTools for the remainder of the goal. Subsequent calls of the same tool don't prompt.
  • Deny: blocks the call. The model gets a "denied" tool result and can adjust.

The modal times out after 60 seconds; no response counts as Deny.


Defaults are conservative

By default, unknown tools are network_write. So a goal at risk write_local will prompt for browser navigation, for browser interactions such as clicks and typing, and for any unregistered plugin tool. This is intentional: false positives (extra prompts) are better than false negatives (silent risky actions).

When you author a custom tool that's truly read-only, register its permissions explicitly so it doesn't prompt unnecessarily. Until the plugin API is extended to expose this, edit BUILTIN_PERMISSIONS in source.


Global bypass

Settings → Global bypass mode (or status bar chip) suppresses every approval gate. Useful for automation rigs where there's no human to click "Allow."

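⚠ With bypass on, the policy gate still evaluates but never prompts: anything that would have prompted just runs. Per the evaluation order above, that includes calls that exceed the risk ceiling, fall outside the allowlist, or fail the sandbox path check. Only the denylist still applies, because it is a hard block rather than a prompt.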

Don't leave bypass on accidentally.

