agent-pty

Agent terminal substrate: persistent PTY sessions with semantic screen reads, predicate waits, replayable evidence, policy gates, proof bundles, forks, and agent-compatible transports.

This is the layer below agent orchestration frameworks. It gives coding and ops agents durable hands in a terminal: start a session, send input, observe the screen, wait for real conditions, record every mutation, and let a human or another agent audit what happened.

Five-Minute Quickstart

Install from the public repo:

cargo install --git https://github.com/evalops/agent-pty

See docs/INSTALL.md for requirements, source builds, and local verification.

Run the self-contained demo first. It does not require a daemon:

agent-pty demo

The demo creates a disposable git repo under ~/.agent-pty/demos, opens a real PTY session, runs a failing check, fixes the repo from inside the session, waits for the passing check, and writes proof artifacts. The output points to:

the demo workspace
a Markdown report
generated proof bundle artifacts in JSON, Markdown, and HTML
the append-only event log

For automation, use JSON output:

agent-pty demo --json

After that, try the persistent daemon flow:

agent-pty doctor
agent-pty serve --socket ~/.agent-pty.sock

In another terminal:

agent-pty status
agent-pty new --repo "$PWD" --name first-run
agent-pty send first-run "printf 'hello from agent-pty\n'"
agent-pty wait first-run --until "hello from agent-pty"
agent-pty screen first-run --format markdown
agent-pty attach first-run --read-only --timeout 2s
agent-pty proof first-run
agent-pty proof first-run --html
agent-pty kill first-run
agent-pty stop

For ready-to-run integration examples, see examples/. They cover CLI daemon control, HTTP/JSON, and MCP stdio flows against real sessions.

What Works Now

Persistent PTY sessions through portable-pty.
vt100 screen snapshots with spans plus semantic summaries for prompts, commands, error lines, URLs, file paths, spinners, and active processes.
Predicate waits for literal/regex text, prompt return, idle output, and process exit.
Append-only JSONL evidence logs for actions and observations.
Evidence redaction for common token/password/API-key shapes before JSONL write.
SHA-256 event hash chains with proof-level log-integrity reporting.
Git status/diff snapshots around mutating actions.
Durable session index with workspace, shell, env, pid, dimensions, start time, active state, and event-log path.
Process snapshots in observations, including root pid and child processes.
Proof bundles as JSON, Markdown, and self-contained HTML artifacts.
Git worktree-backed forks for parallel repair attempts.
Configurable policy gate with audited denials and one-time approval tokens.
Unix-socket JSON daemon.
Daemon lifecycle diagnostics with doctor, status, and stop.
Human attach over the Unix socket with live output and optional stdin control.
Tmux-backed sessions that can reconnect to live processes after daemon restart.
HTTP/JSON request surface.
MCP-compatible stdio JSON-RPC tool surface.
OTEL-style JSONL trace events for daemon requests.

CLI

agent-pty demo

agent-pty doctor
agent-pty serve --socket ~/.agent-pty.sock
agent-pty status
agent-pty stop
agent-pty serve-http --addr 127.0.0.1:4319
agent-pty mcp-stdio

agent-pty new --repo ~/src/evalops/platform --name codex-1
agent-pty new --repo ~/src/evalops/platform --name durable-1 --backend tmux
agent-pty send codex-1 "cargo test"
agent-pty attach codex-1
agent-pty screen codex-1 --format markdown
agent-pty screen codex-1 --format json
agent-pty wait codex-1 --until "finished in"
agent-pty wait codex-1 --until "idle:2s"
agent-pty list
agent-pty proof codex-1
agent-pty proof codex-1 --html
agent-pty replay codex-1 --json
agent-pty fork codex-1 --new-name repair-b --copy-worktree
agent-pty trace-path
agent-pty kill codex-1

Unix Socket Protocol

The daemon accepts newline-delimited JSON on the configured Unix socket. Most connections carry one request and receive one response. The attach op is the exception: after the JSON handshake, the socket switches into a bidirectional terminal stream.

{"op":"new","id":"codex-1","repo":"/repo","shell":"/bin/sh","rows":24,"cols":80,"env":{}}
{"op":"new","id":"durable-1","repo":"/repo","shell":"/bin/sh","rows":24,"cols":80,"env":{},"backend":"tmux"}
{"op":"send","id":"codex-1","text":"cargo test","enter":true}
{"op":"approve","id":"codex-1","command":"printf 'ok\\n' # vault write","rule":"vault write","ttl_ms":600000}
{"op":"send","id":"codex-1","text":"printf 'ok\\n' # vault write","enter":true,"approval":"<token>"}
{"op":"attach","id":"codex-1","read_only":false,"history_bytes":12000}
{"op":"wait","id":"codex-1","until":"regex:finished in","timeout_ms":30000}
{"op":"screen","id":"codex-1"}
{"op":"list"}
{"op":"proof","id":"codex-1"}
{"op":"fork","id":"codex-1","name":"repair-b","copy_worktree":true}
{"op":"replay","id":"codex-1"}
{"op":"trace_path"}
{"op":"kill","id":"codex-1"}
{"op":"shutdown"}
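
As an illustration of the one-request/one-response pattern, here is a small client sketch. It assumes the daemon replies with a single newline-terminated JSON line; the helper names are hypothetical, and the attach op is out of scope because it switches to a stream.

```python
import json
import socket


def encode_request(op, **fields):
    """Serialize one request as a single newline-terminated JSON line."""
    return (json.dumps({"op": op, **fields}, separators=(",", ":")) + "\n").encode("utf-8")


def read_line(sock):
    """Read bytes until a newline arrives or the peer closes the connection."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf.extend(chunk)
    if not buf:
        raise ConnectionError("daemon closed the connection without replying")
    return bytes(buf)


def roundtrip(sock, op, **fields):
    """Send one request on an already-connected socket and decode the reply."""
    sock.sendall(encode_request(op, **fields))
    return json.loads(read_line(sock))


def request(socket_path, op, **fields):
    """Open a fresh connection per request (assumed to match the daemon's model)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        return roundtrip(sock, op, **fields)
```

A call such as request("/path/to/agent-pty.sock", "screen", id="codex-1") would send the same line as the screen example above; the socket path is a placeholder.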

Daemon Lifecycle

Use doctor before starting the daemon or when a user reports that the CLI cannot connect:

agent-pty doctor
agent-pty doctor --json

Use status and stop for day-to-day lifecycle checks:

agent-pty status
agent-pty status --json
agent-pty stop

The status --json command succeeds even when the daemon is unreachable: it returns running: false and puts the connection diagnostic in error. That makes it safe for scripts and agents to call without turning "not running" into an exception.
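
A script can lean on that behavior by branching on the running field instead of catching a failure. The sketch below assumes only the two fields named above (running and error); any other fields in the real output are ignored, and the helper names are hypothetical.

```python
import json
import subprocess


def daemon_state(status_json):
    """Return (running, diagnostic) from the text of `agent-pty status --json`."""
    status = json.loads(status_json)
    running = bool(status.get("running", False))
    # The diagnostic is only meaningful when the daemon is not running.
    return running, (None if running else status.get("error"))


def read_status():
    """Run the CLI and parse its output; requires agent-pty on PATH."""
    result = subprocess.run(
        ["agent-pty", "status", "--json"], capture_output=True, text=True
    )
    return daemon_state(result.stdout)
```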

Human Attach

Attach streams the session transcript and live output into the current terminal. By default, stdin is forwarded into the PTY, so a human can answer prompts or interrupt a process directly:

agent-pty attach codex-1

For observation-only use, pass --read-only:

agent-pty attach codex-1 --read-only

For smoke tests and scripts, --timeout exits automatically:

agent-pty attach codex-1 --read-only --timeout 2s
printf 'yes\n' | agent-pty attach codex-1 --timeout 1s

Attach starts with a JSON handshake and then turns the socket into a stream, so it does not follow the one-request/one-response JSON protocol. Each attach is recorded as evidence, and any bytes typed through the attach channel are logged as normal send_keys actions.

Semantic Screen

agent-pty screen <name> --format json returns the raw screen text, spans, and a semantic summary. The summary is built for agent consumption:

latest prompt-looking line
latest command-looking line
error-looking line indexes
URLs and file paths
spinner-looking line indexes
active child process, when visible from the process tree
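
To make two of these summary fields concrete, here is a toy approximation of URL extraction and error-looking line indexes over raw screen text. The regexes, the summarize name, and the output keys are illustrative assumptions; agent-pty's actual heuristics and JSON field names are not documented in this README.

```python
import re

# Deliberately simple patterns; a real implementation is likely more careful.
URL_RE = re.compile(r"https?://[^\s'\"<>]+")
ERROR_RE = re.compile(r"\b(error|failed|panicked)\b", re.IGNORECASE)


def summarize(screen_text):
    """Return URLs and zero-based indexes of error-looking lines."""
    lines = screen_text.splitlines()
    return {
        "urls": [url for line in lines for url in URL_RE.findall(line)],
        "error_lines": [i for i, line in enumerate(lines) if ERROR_RE.search(line)],
    }
```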

Session Backends

The default pty backend is an in-process portable PTY. It is fast and has the most complete local state, but the live process dies with the daemon.

Use --backend tmux when a session must survive daemon restarts:

agent-pty new --repo "$PWD" --name durable-1 --backend tmux
agent-pty send durable-1 "npm test -- --watch"
agent-pty stop
agent-pty serve --socket ~/.agent-pty.sock
agent-pty screen durable-1

Tmux-backed sessions are persisted in sessions.json with their tmux session name. A fresh daemon using the same log directory can reconnect to the live tmux session for send, screen, wait, attach, proof, replay, and kill.

HTTP Transport

agent-pty serve-http exposes the same request model over POST /request.

curl -sS http://127.0.0.1:4319/request \
  -d '{"op":"screen","id":"codex-1"}'

Responses use the same envelope as the Unix socket transport:

{"ok":true,"data":{"type":"screen","data":{"rows":24,"cols":80,"text":"...","spans":[]}},"error":null}
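
A minimal client sketch for this transport, using only the Python standard library. It assumes that a failed request comes back with ok set to false and a message in error, that successful responses nest type and data as in the screen example above, and that the server accepts a JSON Content-Type header; none of these are confirmed beyond the one sample shown.

```python
import json
import urllib.request


def unwrap(envelope):
    """Return (type, data) from a response envelope, or raise on failure."""
    if not envelope.get("ok"):
        raise RuntimeError(envelope.get("error") or "request failed")
    payload = envelope["data"]
    return payload["type"], payload["data"]


def post_request(base_url, request):
    """POST one request object to <base_url>/request and unwrap the reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/request",
        data=json.dumps(request).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be accepted
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return unwrap(json.load(resp))
```

With a server listening on the address used above, post_request("http://127.0.0.1:4319", {"op": "screen", "id": "codex-1"}) would mirror the curl call.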

MCP Tools

agent-pty mcp-stdio exposes these MCP-compatible tools over stdio JSON-RPC:

terminal.new
terminal.send
terminal.screen
terminal.wait
terminal.kill
terminal.replay
terminal.list
terminal.proof
terminal.fork
terminal.approve
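
For orientation, this is roughly what a JSON-RPC tools/call message for one of these tools could look like. The method name and params layout follow the general MCP convention; the argument names are an assumption that mirrors the Unix socket fields, and an MCP client would normally complete an initialize handshake before calling tools.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids, unique per process


def tool_call(name, arguments):
    """Build one newline-terminated JSON-RPC 2.0 tools/call message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# Argument names below are assumed, not taken from the tool schemas.
line = tool_call("terminal.send", {"id": "codex-1", "text": "cargo test", "enter": True})
```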

Evidence Model

Every action and observation is written to the session JSONL log. A proof bundle summarizes:

commands run
files changed
latest git status and diff
blocked policy actions
approved policy actions
nonzero exits
screen tail
log-integrity status
event-log path
generated JSON, Markdown, and HTML artifact paths

Before events are written, common secret-like values are redacted from commands, observations, and git snapshots. Each event is linked into a SHA-256 hash chain; proof bundles report whether the replayed log verifies.
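
The hash-chain idea can be sketched in a few lines: each record stores the previous record's hash, and its own hash covers both that value and the event body, so editing any earlier event breaks every later link. The record layout (prev_hash, event, hash), the all-zero genesis value, and the canonical JSON encoding are assumptions for illustration, not agent-pty's on-disk format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record


def event_hash(prev_hash, event):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log, event):
    """Append an event, linking it to the last record in the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev_hash": prev_hash, "event": event, "hash": event_hash(prev_hash, event)})


def verify_chain(log):
    """Return None if the chain verifies, else the index of the first bad record."""
    prev_hash = GENESIS
    for index, record in enumerate(log):
        expected = event_hash(prev_hash, record["event"])
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return index
        prev_hash = record["hash"]
    return None
```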

agent-pty proof <name> prints the Markdown proof path by default. agent-pty proof <name> --html prints the browser-friendly HTML proof path, and agent-pty proof <name> --json returns the full structured bundle.

Policy Gate

The built-in policy requires approval for these high-risk command families before they reach the PTY:

rm -rf
git push --force
terraform apply
kubectl delete
vault write
gh pr merge

Policy denials and approvals become evidence events, so a run can prove that a dangerous mutation was avoided, explicitly approved, or blocked by policy.

Create a one-time approval for the exact command, then spend it on send:

cmd="printf 'approved\n' # vault write"
token="$(agent-pty approve codex-1 "$cmd" --rule "vault write" --ttl 10m)"
agent-pty send codex-1 "$cmd" --approval "$token"
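
The one-time, exact-command semantics can be modeled as a small token store. This is a conceptual sketch only: the class, its method names, and the choice to consume a token on any spend attempt are assumptions, not a description of how agent-pty stores approvals.

```python
import secrets
import time


class ApprovalStore:
    """Hypothetical in-memory model of one-time approval tokens."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = {}

    def approve(self, session, command, rule, ttl_s):
        """Issue a token bound to one session, one exact command, and a deadline."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (session, command, rule, self._clock() + ttl_s)
        return token

    def spend(self, token, session, command):
        """Consume the token; succeed only for the same session and command in time."""
        entry = self._pending.pop(token, None)  # removed even if the check fails
        if entry is None:
            return False
        approved_session, approved_command, _rule, expires_at = entry
        return (
            approved_session == session
            and approved_command == command
            and self._clock() < expires_at
        )
```

Binding the token to the exact command string means a changed argument needs a fresh approval rather than reusing an old one.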

Daemon surfaces can load JSON policy files with --policy:

agent-pty serve --policy ./agent-pty-policy.json
agent-pty serve-http --policy ./agent-pty-policy.json
agent-pty mcp-stdio --policy ./agent-pty-policy.json

Example policy:

{
  "builtin_rules": true,
  "rules": [
    {
      "label": "block secret writes",
      "pattern": "(?i)secret-token",
      "action": "deny"
    },
    {
      "label": "allow tmp cleanup",
      "pattern": "rm -rf /tmp/agent-pty-safe",
      "action": "allow"
    }
  ]
}

Rule actions are allow, require_approval, or deny. Custom rules are evaluated before built-ins, which lets a policy create tightly scoped exceptions.
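
The ordering described above can be sketched as a first-match evaluator. It assumes that patterns are regular expressions searched anywhere in the command, that the first matching custom rule wins, that built-in families match as plain substrings, that builtin_rules defaults to true, and that unmatched commands are allowed; the README does not spell out these details.

```python
import re

# Command families listed in this README as requiring approval.
BUILTIN_FAMILIES = [
    "rm -rf",
    "git push --force",
    "terraform apply",
    "kubectl delete",
    "vault write",
    "gh pr merge",
]


def evaluate(policy, command):
    """Return (action, matched_label) for a command under a policy dict."""
    # Custom rules run first, so a narrow allow can pre-empt a built-in.
    for rule in policy.get("rules", []):
        if re.search(rule["pattern"], command):
            return rule["action"], rule["label"]
    if policy.get("builtin_rules", True):
        for family in BUILTIN_FAMILIES:
            if family in command:
                return "require_approval", family
    return "allow", None
```

With the example policy, "rm -rf /tmp/agent-pty-safe" is allowed by the custom rule, while any other rm -rf still falls through to the built-in approval requirement.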

Local Verification

cargo fmt -- --check
cargo test
cargo build

Deep Tmux E2E

The deepest local check is an ignored black-box test that drives the compiled binary through real tmux panes and real daemon processes:

cargo test --test e2e_tmux -- --ignored --nocapture

The test invokes:

scripts/e2e-tmux.sh --ci --artifacts target/e2e-tmux/latest

It creates a temporary broken Rust repo from fixtures/broken-rust-project, starts a detached tmux session, and exercises:

Unix socket daemon and CLI commands
HTTP daemon with curl
MCP stdio JSON-RPC tools
interactive prompt handling with read
Python REPL interaction
long-running output and idle waits
ANSI/color and cursor-return output
blocked policy commands and audited denial evidence
git worktree forks for passing and failing repair attempts
proof bundles for both repair attempts
concurrent replay and proof clients hammering the same evidence log
daemon restart followed by replay from durable logs
tmux pane capture and script terminal transcript capture

Artifacts are written under target/e2e-tmux/latest:

report.md
tmux-driver-pane.txt
tmux-daemon-pane.txt
tmux-http-pane.txt
tmux-observer-pane.txt
driver.typescript
agent-pty-logs/*.jsonl
agent-pty-logs/traces.jsonl
proofs/*.proof.md
proofs/*.proof.html

This harness is intentionally slower and more operationally realistic than the normal Cargo suite. It is the check to run before claiming that terminal, transport, replay, proof, policy, and artifact behavior work end to end.
