Agent terminal substrate: persistent PTY sessions with semantic screen reads, predicate waits, replayable evidence, policy gates, proof bundles, forks, and agent-compatible transports.
This is the layer below agent orchestration frameworks. It gives coding and ops agents durable hands in a terminal: start a session, send input, observe the screen, wait for real conditions, record every mutation, and let a human or another agent audit what happened.
Install from the public repo:
cargo install --git https://github.com/evalops/agent-pty
See docs/INSTALL.md for requirements, source builds, and local verification.
Run the self-contained demo first. It does not require a daemon:
agent-pty demo
The demo creates a disposable git repo under ~/.agent-pty/demos, opens a real
PTY session, runs a failing check, fixes the repo from inside the session, waits
for the passing check, and writes proof artifacts. The output points to:
- the demo workspace
- a Markdown report
- generated proof bundle artifacts in JSON, Markdown, and HTML
- the append-only event log
For automation, use JSON output:
agent-pty demo --json
After that, try the persistent daemon flow:
agent-pty doctor
agent-pty serve --socket ~/.agent-pty.sock
In another terminal:
agent-pty status
agent-pty new --repo "$PWD" --name first-run
agent-pty send first-run "printf 'hello from agent-pty\n'"
agent-pty wait first-run --until "hello from agent-pty"
agent-pty screen first-run --format markdown
agent-pty attach first-run --read-only --timeout 2s
agent-pty proof first-run
agent-pty proof first-run --html
agent-pty kill first-run
agent-pty stop
For ready-to-run integration examples, see examples/. They cover
CLI daemon control, HTTP/JSON, and MCP stdio flows against real sessions.
- Persistent PTY sessions through portable-pty.
- vt100 screen snapshots with spans plus semantic summaries for prompts, commands, error lines, URLs, file paths, spinners, and active processes.
- Predicate waits for literal/regex text, prompt return, idle output, and process exit.
- Append-only JSONL evidence logs for actions and observations.
- Evidence redaction for common token/password/API-key shapes before JSONL write.
- SHA-256 event hash chains with proof-level log-integrity reporting.
- Git status/diff snapshots around mutating actions.
- Durable session index with workspace, shell, env, pid, dimensions, start time, active state, and event-log path.
- Process snapshots in observations, including root pid and child processes.
- Proof bundles as JSON, Markdown, and self-contained HTML artifacts.
- Git worktree-backed forks for parallel repair attempts.
- Configurable policy gate with audited denials and one-time approval tokens.
- Unix-socket JSON daemon.
- Daemon lifecycle diagnostics with doctor, status, and stop.
- Human attach over the Unix socket with live output and optional stdin control.
- Tmux-backed sessions that can reconnect to live processes after daemon restart.
- HTTP/JSON request surface.
- MCP-compatible stdio JSON-RPC tool surface.
- OTEL-style JSONL trace events for daemon requests.
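The predicate waits listed above take a single string, and the examples in this README use three shapes: a bare literal (`finished in`), a `regex:` prefix, and an `idle:` prefix with a duration. The sketch below shows one way a client could classify those strings before sending them. It is not agent-pty's parser: the `Predicate` type, the function names, and the accepted duration units (`ms`, `s`, `m`) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    kind: str      # "literal", "regex", or "idle"
    value: object  # str for literal, compiled pattern for regex, seconds for idle


# Assumed duration units; the README only shows values such as "2s" and "10m".
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_duration(text: str) -> float:
    """Convert a duration such as '2s' into seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_until(spec: str) -> Predicate:
    """Classify an --until string by its prefix; anything else is a literal."""
    if spec.startswith("regex:"):
        return Predicate("regex", re.compile(spec[len("regex:"):]))
    if spec.startswith("idle:"):
        return Predicate("idle", parse_duration(spec[len("idle:"):]))
    return Predicate("literal", spec)


def text_matches(pred: Predicate, screen_text: str) -> bool:
    """Check a text predicate against a screen snapshot."""
    if pred.kind == "literal":
        return pred.value in screen_text
    if pred.kind == "regex":
        return pred.value.search(screen_text) is not None
    raise ValueError("idle predicates depend on output timing, not screen text")
```

Prompt-return and process-exit waits are left out because this README does not show their string form.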
agent-pty demo
agent-pty doctor
agent-pty serve --socket ~/.agent-pty.sock
agent-pty status
agent-pty stop
agent-pty serve-http --addr 127.0.0.1:4319
agent-pty mcp-stdio
agent-pty new --repo ~/src/evalops/platform --name codex-1
agent-pty new --repo ~/src/evalops/platform --name durable-1 --backend tmux
agent-pty send codex-1 "cargo test"
agent-pty attach codex-1
agent-pty screen codex-1 --format markdown
agent-pty screen codex-1 --format json
agent-pty wait codex-1 --until "finished in"
agent-pty wait codex-1 --until "idle:2s"
agent-pty list
agent-pty proof codex-1
agent-pty proof codex-1 --html
agent-pty replay codex-1 --json
agent-pty fork codex-1 --new-name repair-b --copy-worktree
agent-pty trace-path
agent-pty kill codex-1
The daemon accepts newline-delimited JSON on the configured Unix socket. Most
connections carry one request and receive one response. attach is the
exception: after the JSON handshake, the socket switches into a bidirectional
terminal stream.
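For the one-request, one-response case, a client only needs to write a JSON line and read a JSON line back. The following is a minimal Python sketch of that exchange, not an official client: the `request` and `call` helper names are hypothetical, and it deliberately does not cover `attach`, which switches to a raw stream after the handshake.

```python
import json
import socket


def request(sock: socket.socket, payload: dict) -> dict:
    """Send one newline-delimited JSON request and read one JSON response line."""
    sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")
    buf = bytearray()
    while b"\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buf.extend(chunk)
    if not buf:
        raise ConnectionError("daemon closed the connection without replying")
    # Keep only the first line in case more bytes arrived after it.
    line, _, _ = bytes(buf).partition(b"\n")
    return json.loads(line.decode("utf-8"))


def call(socket_path: str, payload: dict) -> dict:
    """Open a fresh connection for a single request (hypothetical helper)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        return request(sock, payload)
```

With a daemon listening on the socket used elsewhere in this README, a call would look like `call(os.path.expanduser("~/.agent-pty.sock"), {"op": "list"})`. Typical request bodies: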
{"op":"new","id":"codex-1","repo":"/repo","shell":"/bin/sh","rows":24,"cols":80,"env":{}}
{"op":"new","id":"durable-1","repo":"/repo","shell":"/bin/sh","rows":24,"cols":80,"env":{},"backend":"tmux"}
{"op":"send","id":"codex-1","text":"cargo test","enter":true}
{"op":"approve","id":"codex-1","command":"printf 'ok\\n' # vault write","rule":"vault write","ttl_ms":600000}
{"op":"send","id":"codex-1","text":"printf 'ok\\n' # vault write","enter":true,"approval":"<token>"}
{"op":"attach","id":"codex-1","read_only":false,"history_bytes":12000}
{"op":"wait","id":"codex-1","until":"regex:finished in","timeout_ms":30000}
{"op":"screen","id":"codex-1"}
{"op":"list"}
{"op":"proof","id":"codex-1"}
{"op":"fork","id":"codex-1","name":"repair-b","copy_worktree":true}
{"op":"replay","id":"codex-1"}
{"op":"trace_path"}
{"op":"kill","id":"codex-1"}
{"op":"shutdown"}
Use doctor before starting the daemon or when a user reports that the CLI
cannot connect:
agent-pty doctor
agent-pty doctor --json
Use status and stop for day-to-day lifecycle checks:
agent-pty status
agent-pty status --json
agent-pty stop
status --json succeeds even when the daemon is unreachable, returning
running: false with the connection diagnostic in error. That makes it safe
for scripts and agents to call without turning "not running" into an exception.
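A script can lean on that behavior by branching on the `running` field instead of catching a failure. The sketch below assumes that "succeeds" means a zero exit code and that `running` and `error` are top-level keys of the JSON output; only those two field names come from this README, and the helper names are hypothetical.

```python
import json
import subprocess
from typing import Optional, Tuple


def parse_status(stdout: str) -> Tuple[bool, Optional[str]]:
    """Extract (running, error) from `agent-pty status --json` output."""
    status = json.loads(stdout)
    return bool(status.get("running")), status.get("error")


def daemon_status() -> Tuple[bool, Optional[str]]:
    """Run the status command and parse its output.

    check=True reflects the assumption that the command exits 0 even when
    the daemon is down, so a nonzero exit is treated as a real failure.
    """
    proc = subprocess.run(
        ["agent-pty", "status", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(proc.stdout)
```

A caller can then start the daemon when `running` is false and log the `error` string for diagnosis.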
Attach streams the session transcript and live output into the current terminal. By default, stdin is forwarded into the PTY, so a human can answer prompts or interrupt a process directly:
agent-pty attach codex-1
For observation-only use, pass --read-only:
agent-pty attach codex-1 --read-only
For smoke tests and scripts, --timeout exits automatically:
agent-pty attach codex-1 --read-only --timeout 2s
printf 'yes\n' | agent-pty attach codex-1 --timeout 1s
Attach uses a streaming Unix-socket handshake, not the one-request/one-response
JSON protocol. Each attach is recorded as evidence, and any bytes typed through
the attach channel are logged as normal send_keys actions.
agent-pty screen <name> --format json returns the raw screen text, spans, and
a semantic summary. The summary is built for agent consumption:
- latest prompt-looking line
- latest command-looking line
- error-looking line indexes
- URLs and file paths
- spinner-looking line indexes
- active child process, when visible from the process tree
The default pty backend is an in-process portable PTY. It is fast and has the
most complete local state, but the live process dies with the daemon.
Use --backend tmux when a session must survive daemon restarts:
agent-pty new --repo "$PWD" --name durable-1 --backend tmux
agent-pty send durable-1 "npm test -- --watch"
agent-pty stop
agent-pty serve --socket ~/.agent-pty.sock
agent-pty screen durable-1
Tmux-backed sessions are persisted in sessions.json with their tmux session
name. A fresh daemon using the same log directory can reconnect to the live tmux
session for send, screen, wait, attach, proof, replay, and kill.
agent-pty serve-http exposes the same request model over POST /request.
curl -sS http://127.0.0.1:4319/request \
  -d '{"op":"screen","id":"codex-1"}'
Responses use the same envelope as the Unix socket transport:
{"ok":true,"data":{"type":"screen","data":{"rows":24,"cols":80,"text":"...","spans":[]}},"error":null}
agent-pty mcp-stdio exposes these MCP-compatible tools over stdio JSON-RPC:
- terminal.new
- terminal.send
- terminal.screen
- terminal.wait
- terminal.kill
- terminal.replay
- terminal.list
- terminal.proof
- terminal.fork
- terminal.approve
Every action and observation is written to the session JSONL log. A proof bundle summarizes:
- commands run
- files changed
- latest git status and diff
- blocked policy actions
- approved policy actions
- nonzero exits
- screen tail
- log-integrity status
- event-log path
- generated JSON, Markdown, and HTML artifact paths
Before events are written, common secret-like values are redacted from commands, observations, and git snapshots. Each event is linked into a SHA-256 hash chain; proof bundles report whether the replayed log verifies.
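To make the hash-chain idea concrete, here is a small Python sketch of an append-only log in which every record commits to its predecessor, plus a verifier that replays the chain. The record layout (`event`, `prev`, `hash`), the all-zero genesis value, and the canonical JSON encoding are assumptions for illustration; agent-pty's on-disk JSONL format may differ.

```python
import hashlib
import json

# Assumed starting value for the first record's "prev" field.
GENESIS = "0" * 64


def event_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous digest together with a canonical encoding of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": event_hash(prev, event)})


def verify_chain(log: list) -> bool:
    """Replay the log and confirm every link and digest still matches."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev:
            return False
        if record["hash"] != event_hash(prev, record["event"]):
            return False
        prev = record["hash"]
    return True
```

Because each digest covers the previous one, editing or dropping an earlier record invalidates every record after it, which is what lets a proof bundle report log integrity as a single pass or fail.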
agent-pty proof <name> prints the Markdown proof path by default.
agent-pty proof <name> --html prints the browser-friendly HTML proof path, and
agent-pty proof <name> --json returns the full structured bundle.
The built-in policy requires approval for these high-risk command families before they reach the PTY:
- rm -rf
- git push --force
- terraform apply
- kubectl delete
- vault write
- gh pr merge
Policy denials and approvals become evidence events, so a run can prove that a dangerous mutation was avoided, explicitly approved, or blocked by policy.
Create a one-time approval for the exact command, then spend it on send:
cmd="printf 'approved\n' # vault write"
token="$(agent-pty approve codex-1 "$cmd" --rule "vault write" --ttl 10m)"
agent-pty send codex-1 "$cmd" --approval "$token"
Daemon surfaces can load JSON policy files with --policy:
agent-pty serve --policy ./agent-pty-policy.json
agent-pty serve-http --policy ./agent-pty-policy.json
agent-pty mcp-stdio --policy ./agent-pty-policy.json
Example policy:
{
  "builtin_rules": true,
  "rules": [
    {
      "label": "block secret writes",
      "pattern": "(?i)secret-token",
      "action": "deny"
    },
    {
      "label": "allow tmp cleanup",
      "pattern": "rm -rf /tmp/agent-pty-safe",
      "action": "allow"
    }
  ]
}
Rule actions are allow, require_approval, or deny. Custom rules are
evaluated before built-ins, which lets a policy create tightly scoped exceptions.
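The ordering matters because it decides which rule sees a command first. The Python sketch below models that ordering under several assumptions that this README does not confirm: patterns are regular expressions searched anywhere in the command, the first matching rule wins, unmatched commands are allowed, and each built-in is a plain substring match that requires approval. The `evaluate` function and `BUILTIN_RULES` table are illustrative names, not agent-pty internals.

```python
import re

# Built-in command families named in this README, modeled as literal matches.
BUILTIN_RULES = [
    {"label": name, "pattern": re.escape(name), "action": "require_approval"}
    for name in (
        "rm -rf",
        "git push --force",
        "terraform apply",
        "kubectl delete",
        "vault write",
        "gh pr merge",
    )
]


def evaluate(command: str, policy: dict):
    """Return (action, label) for the first rule matching the command."""
    rules = list(policy.get("rules", []))  # custom rules are checked first
    if policy.get("builtin_rules", True):
        rules += BUILTIN_RULES             # built-ins act as the fallback
    for rule in rules:
        if re.search(rule["pattern"], command):
            return rule["action"], rule["label"]
    return "allow", None                   # assumed default when nothing matches
```

Under these assumptions and the example policy above, `rm -rf /tmp/agent-pty-safe` resolves to `allow` through the custom exception, while `rm -rf /` falls through to the built-in rule and resolves to `require_approval`.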
cargo fmt -- --check
cargo test
cargo build
The deepest local check is an ignored black-box test that drives the compiled binary through real tmux panes and real daemon processes:
cargo test --test e2e_tmux -- --ignored --nocapture
The test invokes:
scripts/e2e-tmux.sh --ci --artifacts target/e2e-tmux/latest
It creates a temporary broken Rust repo from fixtures/broken-rust-project,
starts a detached tmux session, and exercises:
- Unix socket daemon and CLI commands
- HTTP daemon with curl
- MCP stdio JSON-RPC tools
- interactive prompt handling with read
- Python REPL interaction
- long-running output and idle waits
- ANSI/color and cursor-return output
- blocked policy commands and audited denial evidence
- git worktree forks for passing and failing repair attempts
- proof bundles for both repair attempts
- concurrent replay and proof clients hammering the same evidence log
- daemon restart followed by replay from durable logs
- tmux pane capture and script terminal transcript capture
Artifacts are written under target/e2e-tmux/latest:
- report.md
- tmux-driver-pane.txt
- tmux-daemon-pane.txt
- tmux-http-pane.txt
- tmux-observer-pane.txt
- driver.typescript
- agent-pty-logs/*.jsonl
- agent-pty-logs/traces.jsonl
- proofs/*.proof.md
- proofs/*.proof.html
This harness is intentionally slower and more operationally realistic than the normal Cargo suite. It is the check to run before claiming that terminal, transport, replay, proof, policy, and artifact behavior work end to end.