SafeAgent

Project home: https://safeagent.ca

SafeAgent is governed execution for autonomous systems.

It is not another agent framework. SafeAgent is the boundary layer that decides, contains, and records whether an autonomous system may safely affect real systems. It is designed for teams that want autonomous agents to run commands, inspect repositories, trigger tests, or touch controlled infrastructure without granting them open-ended host access.

What Problem It Solves

Autonomous agents are useful because they can act. That same property makes them risky. A coding agent, build agent, operations assistant, or remediation loop can:

  • run commands the operator did not expect;
  • read files outside the intended workspace;
  • write too much or write in the wrong place;
  • use network access in ways that bypass review;
  • hide important context in unstructured logs;
  • create changes without a durable approval and audit trail.

SafeAgent gives those systems a controlled execution path. It checks policy before execution, isolates each job in a disposable sandbox, requires approval for risky actions, and writes JSONL audit events that an operator can inspect after the fact.

Who It Is For

SafeAgent is for:

  • developers experimenting with autonomous coding agents against local repositories;
  • platform teams building approval gates around agent-driven execution;
  • security teams that need evidence of what an agent requested, what was allowed, and what actually ran;
  • product teams adding autonomous workflows where runtime containment and provenance matter.

It is currently a reference implementation and starter control plane, not a high-assurance sandbox product.

How It Differs From Agent Runtimes

Agent runtimes such as OpenClaw or NanoClaw focus on agent capability: tool use, planning loops, coding workflows, and runtime ergonomics.

SafeAgent focuses on the execution boundary:

  • which command may run;
  • which path it may touch;
  • which network profile it may use;
  • whether approval is required;
  • how execution is isolated;
  • what evidence is recorded.

An agent runtime can call SafeAgent. SafeAgent does not try to replace the runtime. It gives the runtime a governed place to act.

SafeAgent And VetoGuard

SafeAgent and VetoGuard should remain distinct:

  • SafeAgent is execution containment and runtime enforcement.
  • VetoGuard is policy authority, veto logic, authorisation arbitration, and escalation.

This repository includes a simple local policy model. A production deployment could ask VetoGuard for a decision before SafeAgent launches a sandbox, but SafeAgent should still own the execution mechanics, audit records, workspace copy, approval state, and runtime limits.

See docs/vetoguard-integration.md for the intended integration shape.

Current Capabilities

  • FastAPI control plane for job submission and approval workflow.
  • Disposable Docker sandbox per job.
  • Non-root sandbox user.
  • Read-only sandbox root filesystem with a writable /workspace.
  • No privileged mode and no Docker socket inside the sandbox.
  • Named network profiles, including Docker's none network.
  • Conservative default inspect-only policy.
  • Optional execution policy for tests and builds.
  • Policy checks for command allow/deny/approval decisions.
  • Path checks for common file-oriented commands.
  • Source workspace validation that rejects symlinks and special files.
  • JSONL audit records with stable top-level fields and original payload detail.
  • Output truncation and workspace write counting.
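As a rough illustration, the runtime items in this list could map onto `docker run` arguments along the following lines. This is a sketch under stated assumptions, not the repository's actual launcher: the helper name `build_docker_argv`, the UID/GID, and the `--cap-drop` and `--security-opt` choices are hypothetical, and only the image name comes from the Quickstart.

```python
from typing import List


def build_docker_argv(workspace_dir: str, command: List[str],
                      network: str = "none",
                      image: str = "agent-safe-sandbox:latest") -> List[str]:
    """Hypothetical helper: assemble a hardened `docker run` argument vector."""
    return [
        "docker", "run",
        "--rm",                                  # disposable: container is removed on exit
        "--read-only",                           # read-only root filesystem
        "--user", "1000:1000",                   # non-root user (UID/GID are assumed values)
        "--cap-drop", "ALL",                     # assumption: drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # assumption: block setuid escalation
        "--network", network,                    # named profile, e.g. Docker's "none"
        "-v", f"{workspace_dir}:/workspace:rw",  # the only writable mount
        "-w", "/workspace",
        image,
        *command,                                # passed as argv, never through a shell
    ]
```

Note what is absent as much as what is present: no `--privileged` flag and no Docker socket mount appear in the argument list.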

Threat Model

SafeAgent is intended to reduce blast radius when an autonomous system executes commands. It assumes the agent may make mistakes, request risky commands, or attempt basic command/path escapes. The current implementation is meant to protect the host from casual or accidental damage, not from an adversary determined to escape the container.

Primary controls:

  • pre-execution policy decisions;
  • no shell invocation for sandboxed commands;
  • rejection of shell control operators;
  • source repository validation before copying into the job workspace;
  • workspace-root confinement for submitted repositories;
  • disposable per-job workspaces;
  • Docker runtime hardening flags;
  • approval records for higher-risk commands;
  • audit records for decisions and execution results.
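Two of the controls above, rejecting shell control operators and running commands without a shell, can be sketched together. The operator list and the helper name `normalise_command` are assumptions for illustration; the project's real checks may be broader or narrower.

```python
import shlex
from typing import List

# Assumed set of shell control operators; "&" and "|" also cover "&&" and "||".
SHELL_OPERATORS = (";", "&", "|", ">", "<", "`", "$(")


def normalise_command(command: str) -> List[str]:
    """Hypothetical helper: reject shell operators, then split into argv."""
    for op in SHELL_OPERATORS:
        if op in command:
            raise ValueError(f"shell control operator not allowed: {op!r}")
    argv = shlex.split(command)  # raises ValueError on unbalanced quotes
    if not argv:
        raise ValueError("empty command")
    return argv
```

The substring check is deliberately conservative: it also rejects operators that appear inside quotes. The resulting list can be handed to an exec-style call such as `subprocess.run(argv)`, which does not use a shell by default.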

Important limitations:

  • The control plane mounts the Docker socket, which makes it a high-trust component.
  • Docker containers are not a high-assurance isolation boundary.
  • Repo-defined test/build commands can execute arbitrary code inside the sandbox.
  • Network controls depend on correct host and Docker network configuration.
  • Approval auth is a minimal token check, not a full identity system.
  • File write limits are detected after execution, not prevented at the first write.

Read the full model in docs/threat-model.md.
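The last limitation listed above, that write limits are detected after execution, follows naturally from a snapshot-and-compare approach. A minimal sketch, assuming the workspace is compared before and after the job runs; the function names and the size/mtime heuristic are hypothetical and not taken from the repository:

```python
import os
from typing import Dict, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # relative path -> (size, mtime_ns)


def snapshot(root: str) -> Snapshot:
    """Record size and modification time for every file under root."""
    state: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: do not follow symlinks
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def count_changes(before: Snapshot, after: Snapshot) -> Dict[str, int]:
    """Count created, deleted, and modified files between two snapshots."""
    created = after.keys() - before.keys()
    deleted = before.keys() - after.keys()
    modified = {p for p in before.keys() & after.keys() if before[p] != after[p]}
    return {"created": len(created), "deleted": len(deleted), "modified": len(modified)}
```

Because the comparison only happens once the job has finished, a policy limit on these counts can flag a job but cannot stop the first offending write.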

Quickstart

  1. Copy .env.example to .env and set APPROVAL_TOKEN to a non-default value before exposing the API beyond local testing.

  2. Set HOST_WORKSPACES_ROOT in .env to the absolute host path for this repository's workspaces directory.

  3. Create a sample repository inside the mounted workspace root:

mkdir -p ./workspaces/example-repo
printf 'hello\n' > ./workspaces/example-repo/hello.txt
printf 'delete me\n' > ./workspaces/example-repo/delete-me.txt

  4. Build the sandbox image:

docker build -t agent-safe-sandbox:latest ./sandbox-images

  5. Start the control plane:

docker compose up --build -d

  6. Submit an allowed inspect-only command:

curl -X POST http://localhost:8080/jobs \
  -H 'Content-Type: application/json' \
  -d '{
    "repo_path": "/opt/agent-stack/workspaces/example-repo",
    "command": "ls",
    "network_profile": "none"
  }'

  7. Submit a denied command:

curl -X POST http://localhost:8080/jobs \
  -H 'Content-Type: application/json' \
  -d '{
    "repo_path": "/opt/agent-stack/workspaces/example-repo",
    "command": "docker ps",
    "network_profile": "none"
  }'

  8. Submit an approval-required command:

curl -X POST http://localhost:8080/jobs \
  -H 'Content-Type: application/json' \
  -d '{
    "repo_path": "/opt/agent-stack/workspaces/example-repo",
    "command": "rm delete-me.txt",
    "network_profile": "none"
  }'

  9. Inspect or decide the approval:

curl http://localhost:8080/approvals/<approval_id> \
  -H 'X-Approval-Token: change-me-approval-token'

curl -X POST http://localhost:8080/approvals/<approval_id>/deny \
  -H 'Content-Type: application/json' \
  -H 'X-Approval-Token: change-me-approval-token' \
  -d '{"reviewer": "operator", "note": "not needed"}'

More examples are in examples/README.md.
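For callers that prefer Python over curl, the job submission request can be built with the standard library alone. The endpoint, headers, and body fields mirror the Quickstart; the helper names are hypothetical, and treating the response body as JSON is an assumption about the API rather than documented behaviour.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same address as the curl examples


def build_job_request(repo_path: str, command: str,
                      network_profile: str = "none") -> urllib.request.Request:
    """Hypothetical helper: build the POST /jobs request without sending it."""
    body = json.dumps({
        "repo_path": repo_path,
        "command": command,
        "network_profile": network_profile,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_job(repo_path: str, command: str, network_profile: str = "none"):
    """Send the request; assumes the control plane replies with a JSON body."""
    request = build_job_request(repo_path, command, network_profile)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

`urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a caller should be prepared to catch it if the API reports denied commands with an error status.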

Policy Model

The default policy is configs/policy.yaml. It allows inspection commands and requires approval for test/build/package/network/destructive actions.

configs/policy.execution.yaml is an explicit opt-in policy that allows common test/build commands such as pytest, python3 -m pytest, and make test. It is less conservative because those commands can execute repository-defined code.

Policy decisions are evaluated in this order:

  1. explicit deny;
  2. approval required;
  3. allow;
  4. default deny.

See docs/policy-model.md.
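The evaluation order above can be expressed as a short first-match loop. This is a sketch only: the policy keys (`deny`, `require_approval`, `allow`) and the glob-style matching are assumptions for illustration, and the real schema is whatever configs/policy.yaml defines.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Order matters: explicit deny, then approval required, then allow.
EVALUATION_ORDER = (
    ("deny", "deny"),
    ("require_approval", "approval_required"),
    ("allow", "allow"),
)


def decide(command: str, policy: Dict[str, List[str]]) -> str:
    """Return the first matching decision, falling back to default deny."""
    for key, decision in EVALUATION_ORDER:
        patterns = policy.get(key, [])
        if any(fnmatchcase(command, pattern) for pattern in patterns):
            return decision
    return "deny"  # default deny: nothing matched


# Hypothetical policy shaped after the Quickstart commands.
EXAMPLE_POLICY = {
    "deny": ["docker *"],
    "require_approval": ["rm *"],
    "allow": ["ls", "ls *", "rm *"],
}
```

With this ordering, `rm delete-me.txt` still requires approval even though an `allow` pattern also matches it, and an unlisted command such as `curl example.com` is denied.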

Audit Events

Audit records are written as JSONL under AUDIT_ROOT, one file per job. Records include stable top-level fields such as:

  • timestamp
  • request_id
  • actor
  • action
  • target
  • policy_decision
  • approval_required
  • approved_by
  • result
  • exit_code
  • evidence
  • path_changes

The original event-specific details remain in payload.

See docs/audit-schema.md.
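A minimal sketch of appending and reading such records follows. The field names come from the list above; naming the file after `request_id`, the ISO 8601 timestamp, and both helper names are assumptions, so check docs/audit-schema.md for the real layout.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def append_audit_event(audit_root: str, request_id: str, **fields: Any) -> Path:
    """Hypothetical helper: append one JSON object per line to the job's file."""
    record: Dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        **fields,
    }
    path = Path(audit_root) / f"{request_id}.jsonl"  # assumed: one file per job
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def read_audit_events(path: Path) -> List[Dict[str, Any]]:
    """Parse a JSONL audit file back into a list of dictionaries."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is a complete JSON object, an operator can filter a job's history with ordinary line-oriented tools or with a few lines of Python like the reader above.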

Architecture Overview

Agent or operator
  -> SafeAgent control plane
  -> policy decision
  -> optional approval
  -> disposable sandbox
  -> result and audit log

The control plane receives a job request, normalises the command, checks the configured policy, validates the source repository, copies it into a per-job workspace, launches a hardened Docker sandbox without a shell, captures output, counts workspace changes, and writes audit events.

See docs/architecture.md.
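The validate-then-copy step in this flow might look like the sketch below. It assumes validation means confining the repository to the workspace root and rejecting symlinks and special files, as listed under Current Capabilities; the function names and error messages are hypothetical.

```python
import os
import shutil
import stat
from pathlib import Path


def validate_source_repo(repo: str, workspaces_root: str) -> Path:
    """Hypothetical check: confine to the root, reject symlinks and special files."""
    root = Path(workspaces_root).resolve()
    src = Path(repo).resolve()
    if root not in src.parents:
        raise ValueError(f"{src} is outside the workspace root {root}")
    for dirpath, dirnames, filenames in os.walk(src, followlinks=False):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            mode = os.lstat(full).st_mode  # lstat inspects the link itself
            if stat.S_ISLNK(mode):
                raise ValueError(f"symlink not allowed: {full}")
            if not (stat.S_ISREG(mode) or stat.S_ISDIR(mode)):
                raise ValueError(f"special file not allowed: {full}")
    return src


def copy_to_job_workspace(src: Path, job_dir: str) -> str:
    """Copy the validated repository into a fresh per-job directory."""
    # copytree fails if job_dir already exists, which suits disposable workspaces.
    return shutil.copytree(src, job_dir)
```

Validating before copying means the sandbox only ever sees a private copy of plain files and directories, so a job cannot modify the source repository directly.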

Development

Install runtime dependencies:

python3 -m pip install -r control-plane/requirements.txt

Install test dependencies:

python3 -m pip install -r requirements-dev.txt

Run tests:

python3 -m pytest

There is no separate lint configuration yet. Keep changes small, readable, and dependency-light.

Repository Layout

  • control-plane/app.py -- minimal FastAPI control plane.
  • configs/policy.yaml -- conservative inspect-only policy.
  • configs/policy.execution.yaml -- opt-in execution policy.
  • configs/nftables-agent.conf -- host egress-control example.
  • sandbox-images/Dockerfile -- non-root sandbox image.
  • scripts/launch-sandbox.sh -- standalone sandbox launcher helper.
  • scripts/apply-nftables.sh -- helper to load the nftables example.
  • examples/ -- request and audit examples.
  • docs/ -- architecture, threat model, policy model, VetoGuard integration, roadmap, and audit schema.
  • tests/ -- focused safety behaviour tests.

Current Limitations

  • This is a reference implementation, not a production security boundary.
  • The control plane is high trust because it can access the Docker socket.
  • Approval workflow uses a single shared token.
  • There is no multi-user identity, RBAC, or signed policy bundle.
  • There is no built-in VetoGuard client yet.
  • Runtime filesystem enforcement is coarse; some write controls are post-execution checks.
  • Path policy extraction covers common commands, not every possible command shape.
  • Host firewall rules are examples and must be adapted before relying on network isolation.
  • Logs should be shipped to append-only storage for production use.

Licence

SafeAgent is licensed under the Apache License, Version 2.0. Licence terms are in LICENSE.
