
Agent Workspace

Samuele Giampieri edited this page May 16, 2026 · 2 revisions

Watch the demo: RedAmon Agent Workspace: AI Runs 4 Parallel Pentests and Writes Its Own Report (YouTube) — one prompt, four concurrent job_spawn scans, live FileSystem Drawer activity, auto-generated notes/REPORT.md, and a downloadable evidence bundle. Shows every concept on this page in action against a deliberately vulnerable lab target.

The Agent Workspace is a per-project filesystem that the AI agent, the kali-sandbox, and you (through the web UI) all see at the same time. Every project gets its own isolated /workspace/<projectId>/ directory with four purpose-built folders, 24 filesystem tools the agent can call, 5 background-job tools for long-running scans, and a browser drawer to drag files in, preview results, monitor jobs, and download archives.

This page explains how to use the drawer, what every folder is for, and every workspace tool the agent has at its disposal.

Operator guide vs. technical deep dive. This page is the operator-facing guide — what the drawer looks like, what each folder is for, what every tool does. For implementation details (path validation, undo stacks, atomic writes, job lifecycle state machine), read agentic/workspace_fs.py and agentic/job_runner.py in the project repository.


Table of Contents

  1. What the Workspace Is
  2. Opening the FileSystem Drawer
  3. Workspace Layout — the Four Folders
  4. Files Tab — Walkthrough
  5. Jobs Tab — Walkthrough
  6. Uploading Files for the Agent
  7. How the Agent Uses the Workspace
  8. Filesystem Tools Reference — 24 fs_*
  9. Background-Job Tools Reference — 5 job_*
  10. Tool-Output Auto-Offload
  11. Safety, Protected Paths, and Path Validation
  12. Next Steps

What the Workspace Is

Until v4.10.0, the agent reasoned over a graph database and a chat transcript, but had nowhere to persist intermediate work between turns. Long nuclei outputs blew up the LLM context window. Files you wanted the agent to read had to be pasted into the chat. Long-running scans blocked the agent from doing anything else until they finished. And there was no place for the agent to keep notes across iterations, so each session started from a blank slate.

The Agent Workspace closes every one of those gaps in one piece of infrastructure. It gives the agent a real working surface it can read, write, search, diff, and archive across iterations. It gives you a drag-and-drop inbox the agent checks on every turn, with a CHECK THESE NOW directive that pushes the agent to read what you just dropped before continuing whatever else it was doing. It gives long-running scans a background-job lane so the agent can spawn a 30-minute nuclei sweep, keep reasoning on a different angle of the target, and search the partial output mid-flight with fs_grep. And it auto-routes huge tool outputs to disk so the LLM gets a head/tail stub plus a file path instead of a 5 MB blob that would derail its reasoning or get truncated mid-sentence.

Everything is per-project. The workspace for project A is invisible to project B, and the path-validation layer rejects every traversal trick (see Safety, Protected Paths, and Path Validation).

What This Unlocks in Practice

A few scenarios that were painful or impossible before v4.10.0, and now just work:

Scenario 1: Multi-day investigation with continuity. You start a pentest on Monday. The agent maps the auth surface, finds three candidate CVEs, and drafts initial notes in notes/auth-surface.md and notes/cve-shortlist.md. On Tuesday you reopen the project, ask "where were we?", and the agent reads its own notes back and resumes exactly where it stopped, including the rationale for the candidates it shortlisted. No transcript scrolling, no re-explaining context, no lost reasoning when a session is closed.

Scenario 2: Custom credential or wordlist drop. You have a leaked employee list from a breach you are authorized to test against. Drag the .txt file into uploads/. On the agent's next think step the workspace block surfaces the new file under CHECK THESE NOW. The agent reads it with fs_read, deduplicates against the graph's existing user nodes, and uses it as the seed for execute_hydra or a targeted OSINT pass. No copy-paste into chat, no truncation when the list is 50,000 lines long.

Scenario 3: Background nuclei scan with mid-flight triage. The agent needs to run a broad execute_nuclei template sweep that will take 30 minutes. It calls job_spawn, gets the job ID abc123 back, and immediately keeps working on a different angle of the target. Five minutes in, it runs fs_grep "critical|high" jobs/abc123.log to peek at hits so far, sees one CVE-2024-XXXX flagged, and pivots into manual verification of that finding while the bulk scan keeps running in the background. The blocking-sequential pattern is gone.

Scenario 4: Document-driven scoping. You upload the engagement's Rules of Engagement PDF or a vendor security advisory into uploads/. The agent reads it with fs_read, extracts the in-scope domains and prohibited techniques, drops a notes/roe-summary.md for itself, and references it on every subsequent decision. This complements rather than replaces the formal Rules of Engagement enforcement layer: RoE governs hard-coded scope at the tool executor; uploaded documents give the agent the softer context (business hours, contact escalation paths, customer-specific caveats) it needs to reason within those rails.

Scenario 5: Iterative payload crafting. The agent writes a candidate SSTI payload to notes/payloads/jinja2-rce.txt with fs_write, tests it via execute_curl, sees a sanitizer block one variant, uses fs_edit to tweak the encoding, tests again, then calls fs_undo_edit to roll back when the new version regresses. The payload's iteration history is preserved on disk for you to inspect, hand to a teammate, or paste into the final report.

Scenario 6: Pre-existing recon ingest. You already ran a Burp scan or have an nmap XML from a prior engagement window. Drop both into uploads/. The agent parses them with fs_read plus a quick fs_grep for findings of interest, cross-references against the live graph, and surfaces deltas: services present in the old export but missing now, new endpoints since the last sweep, ports that changed banners between scans. The agent does the diff so you don't have to.

Scenario 7: Fireteam coordination across specialists. A four-member fireteam wave deploys: one specialist on auth, one on routing, one on headers, one on JS recon. All four write to the same notes/ directory in parallel. Each member uses fs_diff vs_last_read to detect when another member has updated a shared file (notes/shared-context.md), avoiding lost-update races without needing explicit locks. The chat stays clean while the coordination happens through the filesystem. See Fireteam — Parallel Specialists for the wave-deployment model that makes this work.

Scenario 8: Huge tool output that would have killed context. execute_playwright returns a 4 MB DOM dump after a deep crawl. The auto-offload writes it to tool-outputs/2026-05-15T14-22-08Z-execute_playwright.txt. The LLM sees the file path plus a head/tail stub (the first 80 lines and last 40 lines, char-capped) and then surgically pulls only what it cares about with fs_grep "csrf|api_key|/admin" tool-outputs/.... Pre-workspace, that one tool call would have either truncated mid-sentence or eaten the entire context window.

Scenario 9: Long hydra brute-force without losing visibility. The agent spawns execute_hydra against a discovered SSH service as a background job. The drawer's Jobs tab shows it running with a live byte-count badge so you can see at a glance that the scan is still making progress. You click View log to jump straight to jobs/<id>.log and read live output without interrupting the agent. If the job goes sideways, Cancel stops it cleanly and the status flips to cancelled. If the agent container restarts mid-scan, the recover_on_boot hook flips orphan running jobs to interrupted so the UI never shows a forever-spinning row.

Scenario 10: Operator handoff at engagement end. You click the download icon on the workspace root and pull a tar.gz of the entire workspace to your laptop. Inside is the agent's complete working memory: every notes/ file, every offloaded scan output in tool-outputs/, every job log in jobs/, every artifact you uploaded. You hand it to a co-operator, attach it as supporting evidence to the engagement report, or archive it for the customer's record. Reproducible, auditable, no information loss.


Opening the FileSystem Drawer

There are two entry points for the FileSystem Drawer. Both open the same drawer; both close the Node Drawer first if it's open, to avoid overlap.

| Entry point | Location |
|---|---|
| Folder icon in the graph toolbar | Top row of the Red Zone toolbar, near the AI Agent button. Always available. |
| Folder icon in the AI Agent drawer header | Inside the AI Agent drawer header, to the left of the chat. Convenient when you're already chatting and want to drop a file mid-conversation. |

The drawer slides in from the left side of the screen (the AI Agent drawer slides from the right), so you can have chat and files visible at the same time on a wide monitor. You can resize the drawer by dragging its right edge; the width is remembered per user.

The drawer has two tabs: Files and Jobs. Both auto-refresh every 5 seconds while open. The Jobs tab also receives instant job_update events over the agent WebSocket, so status flips are visible without waiting for the poll.

Agent Workspace — Files tab


Workspace Layout — the Four Folders

When a project is created, the workspace is empty. On first access, four subdirectories are auto-created. All four are protected: the drawer will refuse to rename or delete them, and the backend enforces the same rule (the lock badge in the UI is backed by a server-side check, not just a frontend hint). The files inside them remain fully editable — only the folders themselves are pinned.

| Folder | What it's for | Who writes here | Protected? |
|---|---|---|---|
| notes/ | Agent scratch space: investigation notes, draft reports, generated payloads, anything the agent wants to keep across iterations. | Agent (via fs_write / fs_edit) and you (via upload or rename into it). | Folder yes, contents no |
| tool-outputs/ | Auto-managed. When a tool's stdout exceeds 20 KB, the executor writes it here as <utc-iso>-<tool>.txt and gives the LLM a head/tail stub plus the file path. | The agent runtime (automatic). | Folder yes, contents no |
| jobs/ | Auto-managed. Background-job logs (<job_id>.log) and metadata (<job_id>.meta.json). The agent reads these via job_status / job_wait or directly with fs_grep mid-flight. | The job_runner (automatic). | Folder yes, contents no |
| uploads/ | Your inbox for the agent. Drag files here for the agent to read on its next think step. | You (via the drawer). | Folder yes, contents no |
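
The folder-only protection rule can be illustrated with a short sketch. This is not the project's is_protected_path; the function name is_protected_folder and the PROTECTED set are assumptions made for illustration. It uses posixpath.normpath (the POSIX flavour of the os.path.normpath call mentioned under Safety) so that spelling variants of a folder name collapse to one form before the check.

```python
import posixpath

# The four default folders named on this page.
PROTECTED = {"notes", "tool-outputs", "jobs", "uploads"}


def is_protected_folder(rel_path: str) -> bool:
    """Return True if rel_path (relative to the workspace root) names one
    of the pinned top-level folders itself, not a file inside it."""
    # normpath collapses "./notes", "notes/", and "notes//" to "notes".
    normalized = posixpath.normpath(rel_path)
    return normalized in PROTECTED
```

With this check, a rename of notes/ is refused while a rename of notes/report.md passes, which matches the "folder yes, contents no" column above.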

The agent is taught this layout on every think step via the WORKSPACE_LAYOUT_BLOCK prepended to its prompt. When uploads/ has files, the block adds a CHECK THESE NOW directive listing each filename (newest first, capped at 20), so the agent reflexively reads what you've just dropped before continuing whatever else it was doing.
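
As a rough sketch of how such a directive could be assembled, the function below lists uploads/ newest first and caps the list at 20 entries. The function name render_uploads_directive and the exact text layout are assumptions; only the ordering, the cap, and the CHECK THESE NOW wording come from this page.

```python
from pathlib import Path

MAX_LISTED = 20  # cap stated on this page


def render_uploads_directive(uploads_dir: Path) -> str:
    """Build a hypothetical CHECK THESE NOW block, newest file first."""
    files = [p for p in uploads_dir.iterdir() if p.is_file()]
    if not files:
        return ""  # nothing uploaded, so no directive is emitted
    # Sort by modification time, most recent upload first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    lines = ["CHECK THESE NOW:"]
    lines += [f"  - uploads/{p.name}" for p in files[:MAX_LISTED]]
    return "\n".join(lines)
```

Returning an empty string when uploads/ is empty keeps the prompt free of a directive the agent would have nothing to act on.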


Files Tab — Walkthrough

The Files tab is a familiar file-manager surface, scoped to the project workspace.

| Feature | What it does |
|---|---|
| Breadcrumb navigation | Click any path segment to jump back up the tree. The root crumb takes you to /workspace/<projectId>/. |
| Sort columns | Click Name, Size, or Modified to toggle ascending/descending. Directories are always grouped on top. |
| Filter box | Real-time, case-insensitive substring match on the current directory's entries. |
| Multi-select | Checkbox on each row plus a header checkbox for select-all. Selected rows are highlighted. |
| Drag-and-drop upload | Drop one or more files anywhere on the drawer. The current directory is the destination. On name collision the backend returns 409 and you get an overwrite-confirm modal, so nothing is overwritten silently. |
| File picker upload | Browser file input as the keyboard-friendly alternative to drag-and-drop. |
| Inline preview | Click any file row to open a preview pane. Text files render with syntax highlighting; binary files fall back to a base64 hex view. Hard cap of 1 MB so the drawer never hangs on a huge file. |
| Properties popover | The info icon on each row opens a popover with size, mtime, mode (octal), SHA-256, and the symlink target if applicable. |
| Per-folder tar.gz download | The download icon on any folder row packages the entire subtree into a tar.gz stream, which is useful for shipping notes/ to a co-operator or grabbing all tool-outputs/. |
| Rename inline | Pencil icon on each row. Inline input field. Rejected on protected folders, on / or \ in the new name, and on .. segments. |
| Delete | Trash icon. Themed confirmation modal (not the native browser dialog). Recursive flag required when deleting a non-empty folder. |
| Bulk download | When multiple rows are selected, the bulk-download button packages every selected path into a single tar.gz archive. Useful for pulling specific files from across the tree. |
| Bulk delete | Same selection model. Protected folders in a mixed selection are filtered out automatically, and the confirmation modal explains exactly what will and won't be removed. |
| Lock badges | The four default folders show a Lock badge so the protected status is obvious before you try to rename or delete. |
| New Folder | Creates a subdirectory in the current path. Parent directories auto-create if missing. |

The Files tab auto-refreshes every 5 seconds while open. Refresh is paused while you have a preview open, so you don't get flicker on a file you're reading. Switching projects (via the project selector) resets the drawer state — preview, selection, filter, and current path all clear, so you never see project A's SHA-256 leak into project B's view.

File preview and properties


Jobs Tab — Walkthrough

The Jobs tab lists every background job ever spawned for this project, newest first.

| Column | What you see |
|---|---|
| Tool | The tool that was spawned (execute_nuclei, execute_curl, etc.), plus an optional human-readable label the agent set on job_spawn. |
| Status | Color-coded badge: running (yellow), done (green), failed (red), cancelled (orange), interrupted (gray, set when the agent restarted while the job was running). |
| Elapsed | Wall-clock duration. Live for running jobs; frozen for terminal ones. |
| Log size | Current byte count of jobs/<id>.log. Tells you at a glance whether a job is making progress. |
| Actions | View log jumps to the Files tab and opens jobs/<id>.log in the preview pane. Cancel is shown for running jobs only: after a themed confirmation, the asyncio task is cancelled and the status flips to cancelled. |

The Jobs tab also receives WebSocket job_update events pushed by the agent runtime whenever a job transitions (spawned → running → done/failed/cancelled). The 5-second poll is the fallback if the WebSocket isn't connected.

Jobs survive agent restarts: their <id>.meta.json is persisted to disk, so when the agent container reboots, orphan running jobs are flipped to interrupted by the recover_on_boot hook — they don't show up as forever-spinning in the UI.
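
The recovery step can be pictured with a minimal sketch. It assumes each <id>.meta.json holds a JSON object with a "status" field; that field name, and the function name recover_orphans, are illustrative assumptions rather than the actual job_runner code.

```python
import json
from pathlib import Path


def recover_orphans(jobs_dir: Path) -> list[str]:
    """Flip every meta file still marked 'running' to 'interrupted'.

    Intended to run once at startup: no task can be running yet, so any
    'running' status on disk is left over from the previous process.
    """
    recovered = []
    for meta_path in sorted(jobs_dir.glob("*.meta.json")):
        meta = json.loads(meta_path.read_text())
        if meta.get("status") == "running":
            meta["status"] = "interrupted"
            meta_path.write_text(json.dumps(meta))
            recovered.append(meta_path.name.removesuffix(".meta.json"))
    return recovered
```

Because the function only rewrites entries marked running, calling it a second time changes nothing, so a crash during boot cannot corrupt terminal jobs.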

Agent Workspace — Jobs tab


Uploading Files for the Agent

The uploads/ folder is your inbox for the agent. Typical things to drop here:

  • Credentials — a .env snippet, an SSH key, a service-account JSON.
  • Wordlists — a custom list of usernames or paths for fuzzing.
  • Screenshots — UI screenshots of an unauthenticated page you want the agent to reason about.
  • PDFs / docs — a vendor advisory, a target's privacy policy, an RoE document.
  • Existing recon output — a Burp export, an nmap XML, a Nuclei JSON.

On the agent's next think step, the WORKSPACE_LAYOUT_BLOCK rendered into its prompt will list the contents of uploads/ under a CHECK THESE NOW directive (newest 20 files), pushing the agent to read them with fs_read / fs_read_many before doing anything else. You don't need to tell the agent in chat that you've dropped a file — but you can if you want extra steering ("the file I just uploaded is the production OpenAPI spec; map all endpoints from it").


How the Agent Uses the Workspace

When you ask the agent something, the workspace participates in three roles:

  1. Memory across iterations. The agent uses notes/ like a scratchpad. A typical investigation will leave behind notes/auth-surface-summary.md, notes/cve-shortlist.md, notes/draft-report.md, etc. — and the agent reads its own notes back later. This is why long sessions on the same project get smarter over time.
  2. Tool-output overflow buffer. Heavy tools like execute_nuclei and execute_playwright dump megabytes of output. The runtime offloads everything over 20 KB to tool-outputs/<utc-iso>-<tool>.txt and gives the LLM a head/tail stub. The agent then uses fs_grep / fs_read to drill in surgically instead of carrying the whole blob in context.
  3. Background scan staging area. When the agent decides a scan will take more than 60 seconds and it has other useful work to do, it uses job_spawn to detach the scan as a background job. Output streams to jobs/<id>.log in real time. The agent can call fs_grep "critical" jobs/<id>.log mid-flight to see if the scan has hit anything interesting yet, without blocking.

Fireteam members share the same workspace as the root agent — see Fireteam — Parallel Specialists for how multiple specialists coordinate via shared files (fs_diff vs_last_read detects concurrent edits between members).
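
The vs_last_read idea can be sketched with the standard library's difflib. The class below is a simplified stand-in, not the real fs_read / fs_diff pair: it keeps snapshots in memory and takes the file's current text as an argument instead of touching disk, and the name SnapshotReader is hypothetical.

```python
import difflib


class SnapshotReader:
    """Remembers the text returned by the last read of each path."""

    def __init__(self) -> None:
        self._snapshots: dict[str, str] = {}

    def read(self, path: str, current_text: str) -> str:
        # Stand-in for fs_read: the caller passes the file's current text.
        self._snapshots[path] = current_text
        return current_text

    def diff_vs_last_read(self, path: str, current_text: str) -> str:
        """Unified diff from the last-read snapshot to the current text.
        An empty string means the file is unchanged since it was read."""
        before = self._snapshots.get(path, "")
        return "".join(difflib.unified_diff(
            before.splitlines(keepends=True),
            current_text.splitlines(keepends=True),
            fromfile=f"{path} (last read)",
            tofile=f"{path} (now)",
        ))
```

A member that gets a non-empty diff knows another specialist wrote to the file in the meantime and can re-read before editing, which is how lost updates are avoided without locks.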


Filesystem Tools Reference — 24 fs_*

These 24 tools are available to the agent in all three operational phases (Informational, Exploitation, Post-Exploitation). They operate only on paths inside the project workspace; any attempt to escape via .. or symlink resolution returns a clean error.

Read & Inspect

| Tool | Purpose |
|---|---|
| fs_read | Read a file with line numbers. Auto-detects binary content. Stores an internal snapshot for fs_diff vs_last_read. |
| fs_read_many | Batched read of multiple files in one call, capped at max_total_bytes (default 200 KB). |
| fs_stat | Metadata only: size, mtime, mode, type, optional SHA-256. |

Write & Mutate

| Tool | Purpose |
|---|---|
| fs_write | Atomic create / overwrite / append. Auto-creates parent directories. |
| fs_edit | Exact-string replacement with uniqueness check (refuses if the search string appears more than once). Records an undo entry. |
| fs_multi_edit | Multiple ordered edits to one file as a single all-or-nothing transaction. One undo snapshot per call. |
| fs_undo_edit | Roll back the last fs_edit / fs_multi_edit. In-memory undo stack, capped at 20 entries per file, cleared on agent restart. |
| fs_delete | Delete a file or directory. Recursive flag required for directories. |
| fs_move | Move / rename a path. Auto-creates destination parents. Works across subdirectories. |
| fs_copy | Copy a file or directory tree. Recursive flag required for directories. Permissions normalized so the host user can edit the copy. |
| fs_mkdir | Create a directory. Idempotent. The four default subdirectories are auto-created on first workspace access. |
| fs_chmod | Change permission bits, octal (0o755) or symbolic (+x, -w). |
| fs_symlink_create | Create a symlink. Refuses if the link already exists or if either endpoint would escape the workspace. |
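
The uniqueness check in fs_edit and the bounded undo stack behind fs_undo_edit can be sketched together. This is a minimal in-memory model, not the project's implementation: the EditBuffer class is hypothetical, files are plain strings, and the sketch also refuses when the search string is absent, which is an assumption beyond what this page states.

```python
from collections import defaultdict, deque

UNDO_CAP = 20  # per-file cap stated on this page


class EditBuffer:
    """In-memory files with exact-string edits and a bounded undo stack."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        # deque(maxlen=...) silently drops the oldest snapshot when full.
        self._undo = defaultdict(lambda: deque(maxlen=UNDO_CAP))

    def edit(self, path: str, old: str, new: str) -> None:
        text = self.files[path]
        count = text.count(old)
        if count != 1:
            # Zero matches or an ambiguous match: refuse rather than guess.
            raise ValueError(f"expected exactly 1 match, found {count}")
        self._undo[path].append(text)  # snapshot before mutating
        self.files[path] = text.replace(old, new, 1)

    def undo(self, path: str) -> None:
        if not self._undo[path]:
            raise ValueError("nothing to undo")
        self.files[path] = self._undo[path].pop()
```

Refusing ambiguous matches forces the caller to supply more surrounding context, so an edit never lands on the wrong occurrence.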

Search & Navigate

| Tool | Purpose |
|---|---|
| fs_grep | Ripgrep over the workspace with full regex support. 30s timeout, 1000-match cap. Works on in-flight job logs in jobs/. |
| fs_glob | Glob pattern search (**/*.json, notes/*.md) sorted newest-first. Capped at 500 results. |
| fs_find | Metadata-driven search: name glob plus filters on mtime, size, type. 30s walk timeout, 5000-result ceiling. |
| fs_list | Single-directory listing with type, size, mtime. Capped at 200 entries per call. |
| fs_tree | Depth-limited ASCII tree view. Skips .git, node_modules, __pycache__. Bounded by both depth and total entry count. |
| fs_symbols | Tree-sitter AST outline: function / class / method names and their line ranges. Supported languages: Python, JS, TS, TSX, JSX, Java, Go, Rust, Ruby, PHP, C, C++, C#, Kotlin, Swift, Scala. |
| fs_symlink_read | Resolve a symlink to its raw target without following it. |

Integrity & Archive

| Tool | Purpose |
|---|---|
| fs_hash | SHA-256 or MD5 hash of a file. Streams in 64 KB chunks so large files don't blow memory. |
| fs_diff | Unified diff between two files, or between a file and its last fs_read snapshot (vs_last_read mode, which catches concurrent writes in fireteam sessions). |
| fs_extract | Safe extraction of tar / zip / gz archives. Refuses zip-slip and tar-slip paths that would write outside the destination. |
| fs_archive | Bundle workspace paths into a tar.gz or zip for one-click operator download. |
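
Chunked hashing, as described for fs_hash, is a standard pattern that can be sketched with hashlib. The function name hash_file is illustrative; the 64 KB chunk size and the two algorithms are taken from the table above.

```python
import hashlib
from pathlib import Path

CHUNK = 64 * 1024  # 64 KB, the chunk size stated on this page


def hash_file(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file without loading it into memory all at once."""
    if algorithm not in ("sha256", "md5"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Memory use stays at roughly one chunk regardless of file size, which matters for multi-megabyte scan outputs in tool-outputs/.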

For deeper UI use of files and notes, see also the dedicated Pentest Reports page — agent-drafted reports typically land in notes/ before being exported.


Background-Job Tools Reference — 5 job_*

Background jobs let the agent detach long-running tools and keep reasoning in parallel.

| Tool | Purpose |
|---|---|
| job_spawn | Spawn a tool call as an asyncio background task. Returns {job_id, output_path, status: running} immediately. Output is tee'd to jobs/<job_id>.log in real time. Phase restrictions and RoE checks are enforced at spawn time, not when results are read. |
| job_status | Non-blocking status query. Returns current status, log size, and the last 40 lines of output. Survives agent restart via <id>.meta.json. |
| job_wait | Block up to N seconds for the job to finish. Returns the same shape as job_status whether the timeout fired or the job completed. |
| job_cancel | Cancel a running job. No-op on terminal jobs. Status flips to cancelled. |
| job_list | List jobs for the project. Filter by active=true (running), active=false (terminal), or omit for all. Sorted by start time, newest first. |

The agent is steered (via WORKSPACE_LAYOUT_BLOCK) to use job_spawn when three conditions hold: the tool will likely take longer than 60 seconds, there is other useful work to do in parallel, and live feedback isn't critical. For everything else (quick HTTP probes, graph queries, KB / CVE lookups, single-file edits), synchronous tool calls remain the default.
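
The spawn-then-wait pattern can be sketched with plain asyncio. This toy runner is an assumption-laden illustration, not agentic/job_runner.py: JobRunner, fake_scan, and the in-memory status dictionary are all hypothetical, and the "tool" is an async generator of output lines rather than a real scanner.

```python
import asyncio
from pathlib import Path


class JobRunner:
    """Toy runner: each job is an asyncio task that streams lines to a log."""

    def __init__(self, jobs_dir: Path) -> None:
        self.jobs_dir = jobs_dir
        self.status: dict[str, str] = {}
        self.tasks: dict[str, asyncio.Task] = {}

    def spawn(self, job_id: str, lines) -> dict:
        # Must be called from inside a running event loop.
        log = self.jobs_dir / f"{job_id}.log"
        self.status[job_id] = "running"
        self.tasks[job_id] = asyncio.create_task(self._run(job_id, lines, log))
        return {"job_id": job_id, "output_path": str(log), "status": "running"}

    async def _run(self, job_id: str, lines, log: Path) -> None:
        try:
            with log.open("a") as fh:
                async for line in lines:
                    fh.write(line + "\n")
                    fh.flush()  # make partial output searchable mid-flight
            self.status[job_id] = "done"
        except asyncio.CancelledError:
            self.status[job_id] = "cancelled"
            raise
        except Exception:
            self.status[job_id] = "failed"

    async def wait(self, job_id: str, timeout: float) -> str:
        # shield() keeps the timeout from cancelling the job itself.
        try:
            await asyncio.wait_for(asyncio.shield(self.tasks[job_id]), timeout)
        except asyncio.TimeoutError:
            pass
        return self.status[job_id]


async def fake_scan(n: int, delay: float):
    """Hypothetical stand-in for a slow tool such as a scanner."""
    for i in range(n):
        await asyncio.sleep(delay)
        yield f"finding {i}"
```

The wait method returns the current status whether the timeout fired or the job finished, mirroring the job_wait row above. Flushing after every line is what makes the log useful to a mid-flight search.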


Tool-Output Auto-Offload

When a tool returns more than 20,000 bytes of output, the runtime automatically:

  1. Writes the full output to tool-outputs/<utc-iso>-<tool>.txt.
  2. Replaces the LLM-facing result with a stub that contains the file path, the byte count, the first ~80 lines (4 KB cap), and the last ~40 lines (2 KB cap).
  3. Tells the agent it can use fs_read for the full output or fs_grep over tool-outputs/ to search.

The char-capped head and tail prevent single-line blobs (base64 dumps, minified JSON) from defeating the offload by being technically "small in line count."
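
The stub construction above can be sketched as a pure function. The thresholds come from this page; the stub's text layout, the function name build_stub, and the reading of "4 KB" and "2 KB" as 4096 and 2048 characters are assumptions. Writing the full output to disk is left out to keep the sketch short.

```python
THRESHOLD = 20_000               # bytes, from this page
HEAD_LINES, HEAD_CHARS = 80, 4096
TAIL_LINES, TAIL_CHARS = 40, 2048


def build_stub(output: str, path: str) -> str:
    """Return output unchanged if small, else a head/tail stub."""
    size = len(output.encode("utf-8"))
    if size <= THRESHOLD:
        return output
    lines = output.splitlines()
    # Line cap first, then a character cap, so one giant line cannot
    # slip through just because it counts as a single line.
    head = "\n".join(lines[:HEAD_LINES])[:HEAD_CHARS]
    tail = "\n".join(lines[-TAIL_LINES:])[-TAIL_CHARS:]
    return (
        f"[output offloaded: {size} bytes -> {path}]\n"
        f"--- head ---\n{head}\n--- tail ---\n{tail}\n"
        "Use fs_read for the full output or fs_grep to search it."
    )
```

For a 50,000-character single-line blob, the line caps alone would keep everything; the character caps are what hold the stub to a few kilobytes.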

Auto-offload is governed by a per-tool policy map:

| Mode | Behavior | Examples |
|---|---|---|
| auto (default) | Offload only if output exceeds 20 KB. | Most execute_* tools. |
| always | Always offload, even small outputs. | execute_nuclei, execute_playwright (consistently huge). |
| never | Always inline, regardless of size. | query_graph, web_search, cve_intel, shodan, google_dork, tradecraft_lookup (structured, small, want full context). |

The agent can also force the mode on a per-call basis with an output_mode parameter set to inline, file, or auto. The executor strips this parameter before dispatch so the underlying tool / MCP server never sees it.
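
One way to picture how the policy map and the per-call override interact is the sketch below. The POLICY entries are taken from the table above, but the function should_offload, its signature, and the rule that a per-call override beats the per-tool policy are assumptions. For brevity the sketch strips output_mode and makes the size decision in one step, although in a real executor the size is only known after the tool has run.

```python
POLICY = {
    "execute_nuclei": "always",
    "execute_playwright": "always",
    "query_graph": "never",
    "web_search": "never",
}
THRESHOLD = 20_000  # bytes


def should_offload(tool: str, args: dict, output_size: int) -> tuple[bool, dict]:
    """Decide whether to offload, and return args with output_mode removed."""
    # The underlying tool must never see the executor-only parameter.
    clean_args = {k: v for k, v in args.items() if k != "output_mode"}
    override = args.get("output_mode", "auto")
    if override == "file":
        return True, clean_args
    if override == "inline":
        return False, clean_args
    mode = POLICY.get(tool, "auto")  # unlisted tools default to auto
    if mode == "always":
        return True, clean_args
    if mode == "never":
        return False, clean_args
    return output_size > THRESHOLD, clean_args
```

A call such as should_offload("query_graph", {"output_mode": "file"}, 10) therefore offloads even though the tool's policy is never, because the override is checked first.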


Safety, Protected Paths, and Path Validation

Several layers of validation keep the workspace contained and inert against directory traversal, symlink escape, and archive-slip attacks.

| Concern | Defense |
|---|---|
| Project-id injection (projectId="../etc" in the API query string) | _validate_project_id rejects /, \, NUL byte, leading ., and .. before the path is joined to the workspace root. |
| Path traversal (fs_read ../etc/passwd) | Every fs_* tool resolves the requested path against the project root and rejects anything outside it. |
| Symlink escape | The path resolver follows symlinks and re-checks containment, so a symlink to /etc/passwd inside the workspace can't be read through fs_read. |
| Zip-slip / tar-slip on extract | fs_extract rejects entries whose resolved destination falls outside the target directory. |
| Archive symlink leak | Archive builders (fs_archive, drawer downloads) skip symlinks so a workspace symlink to a host file can't be smuggled into a downloaded zip. |
| Protected-folder bypass via normalization (./notes, notes/, notes//) | is_protected_path normalizes the path with os.path.normpath before checking; 11 variants are regression-pinned. |
| Host-user lockout | The agent container runs as root, but os.umask(0) at startup ensures files it creates are mode 0o666 / dirs 0o777, so your host user (UID 1000) can edit and remove them. fs_copy re-applies the same modes explicitly to defeat shutil.copy2's metadata copy. |
| Download-error navigation | The drawer uses an <a download> anchor rather than window.location.href, so a backend error never navigates the page away from the graph view. |

Path validation is server-enforced, not a frontend convenience. Even a direct HTTP call to /workspace/... with projectId=../etc is rejected with a clean 400 — the lock badges in the drawer mirror the same rule but are not the gate.
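
The first three defenses in the table can be sketched with pathlib. These functions are illustrative and not the code in agentic/workspace_fs.py: the names validate_project_id and resolve_in_workspace, the exception types, and the rejection of an empty project id are assumptions. Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def validate_project_id(project_id: str) -> str:
    """Reject ids that could change meaning once joined to a path."""
    bad = (
        not project_id
        or "/" in project_id
        or "\\" in project_id
        or "\x00" in project_id
        or project_id.startswith(".")
        or ".." in project_id
    )
    if bad:
        raise ValueError("invalid project id")
    return project_id


def resolve_in_workspace(root: Path, project_id: str, rel_path: str) -> Path:
    """Resolve rel_path inside the project, refusing anything outside it."""
    project_root = (root / validate_project_id(project_id)).resolve()
    # resolve() follows symlinks, so a link pointing outside the project
    # resolves to its real target and fails the containment check below.
    candidate = (project_root / rel_path).resolve()
    if not candidate.is_relative_to(project_root):
        raise PermissionError(f"path escapes workspace: {rel_path}")
    return candidate
```

Checking containment on the fully resolved path, rather than on the string the caller supplied, is what lets one check cover .. segments, absolute paths, and symlinks alike.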


Next Steps

  • AI Agent Guide — how the agent reasons about which tool to call (including when to spawn a job_* vs. call a tool synchronously).
  • Fireteam — Parallel Specialists — how multiple specialist sub-agents share the same workspace and coordinate via fs_diff vs_last_read.
  • Red Zone — the toolbar where the folder icon lives.
  • Pentest Reports — agent-drafted reports usually originate as notes/draft-report.md before export.
  • Project Settings Reference — every configurable parameter, including the tool-phase matrix that gates which fs_* and job_* tools are available in each phase.
