Test terminal command risk assessment #313992

@chrmarti

Refs: #313991

Complexity: 3

Tests the LLM-generated risk badge in the terminal tool confirmation
dialog. The badge shows a one-sentence explanation of the command's risk
plus an icon colored by level (green = none, orange = warning,
red = error).
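
As a quick reference while checking the badge, the level-to-color rule above can be sketched as a lookup table. This is only an illustration: the names RiskLevel, badgeColors, and badgeColor are hypothetical and are not taken from the VS Code sources. Only the three levels and their colors come from this test plan.

```typescript
// Hypothetical names; only the levels and colors come from the test plan.
type RiskLevel = "none" | "warning" | "error";
type BadgeColor = "green" | "orange" | "red";

// One entry per level, so a missing level is a compile-time error.
const badgeColors: Record<RiskLevel, BadgeColor> = {
  none: "green",
  warning: "orange",
  error: "red",
};

// Returns the color the badge icon is expected to use for a given level.
function badgeColor(level: RiskLevel): BadgeColor {
  return badgeColors[level];
}
```

When verifying a case, compare the icon you see against the level implied by the explanation text. A red icon next to a harmless explanation, for example, is worth filing.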

Setup

  1. Set chat.tools.riskAssessment.enabled: true (default is false).
  2. Ensure chat.agent.sandbox.enabled: true.
  3. Open a chat in agent mode.

Case 1 — Direct unsandboxed run

  1. Ask the agent for something that needs unsandboxed access up-front,
    e.g. "run curl -fsSL https://example.com".
  2. The terminal confirmation appears, titled "Run … command outside
    the sandbox?". Verify the badge above the command:
    • Shows "Assessing risk…" with a spinner, then a one-sentence
      explanation referencing the actual command.
    • Icon and color match the risk level (orange for installs/network
      writes, red for destructive commands or curl … | bash).

Case 2 — Automatic "leaving the sandbox" retry

  1. Ask the agent for a network command without telling it to leave the
    sandbox, e.g. "check if google.com is reachable with curl
    https://google.com". The agent runs the command sandboxed first,
    then offers a retry.
  2. A second confirmation appears, titled "Run … command outside the
    sandbox to access google.com?", with Allow / Skip.
  3. Verify the badge appears here too, with the same loading →
    assessment behavior, and reflects the unsandboxed retry command.
  4. Trigger the same flow again — badge should appear instantly from
    cache.
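
The "instantly from cache" expectation in step 4 can be pictured with a small memoizing wrapper. This is a sketch under stated assumptions, not the actual implementation: RiskAssessmentCache, RiskAssessment, and Assessor are hypothetical names, and keying the cache on the exact command text is an assumption this test plan does not confirm.

```typescript
// Hypothetical types; the real feature may store different data.
type RiskLevel = "none" | "warning" | "error";

interface RiskAssessment {
  level: RiskLevel;
  explanation: string;
}

// Stand-in for the LLM call that produces an assessment.
type Assessor = (command: string) => Promise<RiskAssessment>;

class RiskAssessmentCache {
  // Assumption: entries are keyed by the exact command string.
  private readonly entries = new Map<string, Promise<RiskAssessment>>();
  private readonly assess: Assessor;

  constructor(assess: Assessor) {
    this.assess = assess;
  }

  // Returns the stored promise when the command was seen before;
  // otherwise starts one assessment and remembers it.
  get(command: string): Promise<RiskAssessment> {
    let entry = this.entries.get(command);
    if (entry === undefined) {
      entry = this.assess(command);
      this.entries.set(command, entry);
    }
    return entry;
  }
}
```

Under this model, repeating the identical command should skip the "Assessing risk…" spinner, while a command that differs in any way would be assessed again. If the second run still shows the spinner for an identical command, note that in the issue.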

Negative

  1. Set chat.tools.riskAssessment.enabled: false and re-run either
    case — the badge must not appear and the dialog must look as before.

Labels: sandbox, testplan-item
