Helm — take the wheel from your agent

Real-time oversight of LLM agents. Watch the agent execute, approve or edit every tool call, abort if it goes sideways. The third piece of a trilogy, alongside Sentinel and Recourse.

React 19 + TypeScript + Tailwind v4 · one canonical agent run (fix a session race condition) · simulated 9-step execution with one mandatory approval gate.

Live demo → · Sentinel · Recourse

The thesis

Agents today run away. They open files, run commands, push branches, hit APIs — and the operator either watches a spinner or reads a transcript after the fact. Both extremes miss the moment that actually matters: the moment a tool call is about to fire.

Helm puts the human back in the loop while the agent is acting. Every tool call is previewed: tool kind, target, intended effect, diff or command, cost notice. Reversible steps can be auto-allowed by policy. Irreversible steps (pushes to shared remotes, network calls, deletes) always pause for explicit approval and are never auto-allowed. The operator can allow, edit before allowing, reject, or abort the run.

This is not a transcript viewer. The agent is genuinely waiting for you.

The canonical run

The demo simulates an agent fixing a session race condition — a real engineering task with a real range of risks:

1. read   src/auth.ts                        trivial    auto-allow
2. read   src/middleware/session.ts          trivial    auto-allow
3. exec   npm test (baseline)                reversible
4. write  src/store/sessionStore.ts          reversible (diff shown)
5. write  src/auth.ts                        reversible (diff shown)
6. write  test/session.race.test.ts          reversible (diff shown)
7. exec   npm test (verify fix)              reversible
8. vcs    git checkout -b + commit           reversible
9. network git push + gh pr create           DANGER — explicit approval

Step 9 is the moment the run touches shared state. Helm pauses there, shows the exact git push and gh pr create commands it's about to run, surfaces a cost notice ("First action that affects shared state"), and waits. You can allow, edit (e.g. change the PR base branch), reject the step with a written rationale, or abort the whole run.
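One way to make the nine-step run concrete is as typed data. This is a minimal sketch, assuming string-literal unions for the tool kinds and reversibility bands; `ToolKind`, `Reversibility`, `step`, and `firstGatedStep` are hypothetical names, not taken from the Helm source.

```typescript
// Hypothetical types: the names echo the README's vocabulary (ToolBadge kinds,
// ReversibilityChip bands, the AgentStep/ToolCall units), not the repo's source.
type ToolKind = "read" | "search" | "write" | "exec" | "vcs" | "network" | "delete";
type Reversibility = "trivial" | "reversible" | "danger" | "destructive";

interface ToolCall {
  kind: ToolKind;
  target: string; // file path or command line
}

interface AgentStep {
  index: number;
  call: ToolCall;
  reversibility: Reversibility;
}

// Small constructor so the nine-step table stays readable.
function step(index: number, kind: ToolKind, target: string, reversibility: Reversibility): AgentStep {
  return { index, call: { kind, target }, reversibility };
}

// The canonical run, transcribed from the step list above.
const canonicalRun: AgentStep[] = [
  step(1, "read", "src/auth.ts", "trivial"),
  step(2, "read", "src/middleware/session.ts", "trivial"),
  step(3, "exec", "npm test", "reversible"),
  step(4, "write", "src/store/sessionStore.ts", "reversible"),
  step(5, "write", "src/auth.ts", "reversible"),
  step(6, "write", "test/session.race.test.ts", "reversible"),
  step(7, "exec", "npm test", "reversible"),
  step(8, "vcs", "git checkout -b + commit", "reversible"),
  step(9, "network", "git push + gh pr create", "danger"),
];

// First step that can never be auto-allowed; in this run that is the push,
// which is where the "First action that affects shared state" notice appears.
function firstGatedStep(run: readonly AgentStep[]): AgentStep | undefined {
  return run.find((s) => s.reversibility === "danger" || s.reversibility === "destructive");
}

console.log(firstGatedStep(canonicalRun)?.index); // 9
```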

Six load-bearing primitives

Primitive	What it does
`ToolBadge`	Names the tool the agent is using: `read` / `search` / `write` / `exec` / `vcs` / `network` / `delete`. Pattern + color so the category is legible at a glance.
`ReversibilityChip`	Names the recovery cost: `trivial` (read-only), `reversible` (undo via another step), `danger` (affects shared state — never auto-allowed), `destructive` (cannot be undone — extra confirm). Cross-hatched on danger/destructive — the same primitive Sentinel uses for hallucinations and Recourse for fabricated content.
`ConfidenceTag`	The agent's calibrated confidence in its own next move — same vocabulary as Sentinel and Recourse, applied to intentions instead of claims.
`DiffView`	Renders the exact diff before a write. The point is to make the change small enough to read at a glance, so approval is actually deliberate.
`ApprovalGate`	The load-bearing interaction. Renders intent, command/path, diff, cost notice; offers allow / edit / reject. Danger and destructive steps render with cross-hatch.
`AgentStream`	The right-rail thought stream. Every step rendered as a row, with status (queued / running / complete / rejected / aborted), animated transitions, the running step pulsing with a typing cursor.
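The five statuses `AgentStream` renders can be read as a small state machine. This sketch rests on an assumption the README doesn't spell out, namely which transitions are legal: here queued steps can start or be aborted, running steps can complete, be rejected, or be aborted, and the three end states are terminal. `canTransition` and `abortRun` are hypothetical helpers.

```typescript
type StepStatus = "queued" | "running" | "complete" | "rejected" | "aborted";

// Assumed legal moves; the three end states have none.
const transitions: Record<StepStatus, readonly StepStatus[]> = {
  queued: ["running", "aborted"],
  running: ["complete", "rejected", "aborted"],
  complete: [],
  rejected: [],
  aborted: [],
};

function canTransition(from: StepStatus, to: StepStatus): boolean {
  return transitions[from].includes(to);
}

// Aborting the run leaves finished steps alone and aborts everything in flight.
function abortRun(statuses: readonly StepStatus[]): StepStatus[] {
  return statuses.map((s): StepStatus => (canTransition(s, "aborted") ? "aborted" : s));
}

console.log(abortRun(["complete", "complete", "running", "queued"]));
// [ 'complete', 'complete', 'aborted', 'aborted' ]
```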

How the trilogy fits

	Sentinel	Recourse	Helm
Oversight target	AI output	Institution behavior	Agent behavior
Reader	Expert reviewer	End-user citizen	Operator / developer
Timing	After-the-fact	Reactive (responding to letters)	Real-time, in flight
Unit	`AIClaim`	`CaseClaim` + `Deadline`	`AgentStep` + `ToolCall`
Verdict surface	Accept / Edit / Reject per claim	Verify / Send	Allow / Edit / Reject per tool call
Audit object	`AuditRecord` of reviewed claims	`MailingRecord` + decision ledger	`DecisionRecord` per step
Failure mode named	Hallucination (cross-hatch)	Fabricated statute (cross-hatch)	Irreversible action (cross-hatch)

The cross-hatch is intentional across all three. It's the same visual primitive — "this thing is in a different category, you must respond differently" — applied to three different failure modes. A reviewer who's seen Sentinel will recognize Helm's danger pattern instantly, and vice versa.

Design moves

1. Reversibility is the policy axis, not safety

Many agent products treat "is this safe?" as a single binary. Helm splits it into four bands based on recovery cost, because that's what actually matters when you're deciding whether to gate a step. A file edit and a git push are both "writes," but they sit at opposite ends of the spectrum.
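Because the four bands are ordered by recovery cost, they model naturally as a ranked scale. This sketch assumes a default band per tool kind, chosen to agree with the canonical run (a `write` is reversible, the `network` push is danger); `BANDS`, `DEFAULT_BAND`, and `atLeast` are illustrative names, and the `search` and `delete` defaults are guesses from the `ReversibilityChip` descriptions.

```typescript
// The four bands, cheapest to recover from first.
const BANDS = ["trivial", "reversible", "danger", "destructive"] as const;
type Reversibility = (typeof BANDS)[number];

type ToolKind = "read" | "search" | "write" | "exec" | "vcs" | "network" | "delete";

// Assumed defaults only; a real policy layer would refine per target
// (an `exec` of a deploy script, for example, is not reversible).
const DEFAULT_BAND: Record<ToolKind, Reversibility> = {
  read: "trivial",
  search: "trivial",
  write: "reversible",
  exec: "reversible",
  vcs: "reversible",
  network: "danger",
  delete: "destructive",
};

// True when band `a` costs at least as much to recover from as band `b`.
function atLeast(a: Reversibility, b: Reversibility): boolean {
  return BANDS.indexOf(a) >= BANDS.indexOf(b);
}

console.log(DEFAULT_BAND.write, DEFAULT_BAND.network); // reversible danger
console.log(atLeast(DEFAULT_BAND.network, "danger")); // true
```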

2. Auto-allow has a fixed ceiling

The auto-allow toggle covers trivial and reversible steps only. Danger and destructive steps cannot be auto-allowed under any policy. This is a deliberate product posture: if auto-allow could cover the dangerous steps, the gate would be theater.
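The ceiling can be enforced structurally, so that no policy value can reach the dangerous bands. A minimal sketch, assuming a single boolean toggle (`Policy.autoAllow`) that both trivial and reversible steps follow; `decideGate` and `GateAction` are hypothetical names, and the extra confirm for destructive steps comes from the `ReversibilityChip` row.

```typescript
type Reversibility = "trivial" | "reversible" | "danger" | "destructive";

// What the gate does with a proposed step (hypothetical labels).
type GateAction = "auto-allow" | "pause" | "pause-and-confirm";

interface Policy {
  autoAllow: boolean; // the operator's auto-allow toggle
}

function decideGate(band: Reversibility, policy: Policy): GateAction {
  switch (band) {
    case "trivial":
    case "reversible":
      return policy.autoAllow ? "auto-allow" : "pause";
    case "danger":
      // The policy is never consulted here, so no setting can auto-allow it.
      return "pause";
    case "destructive":
      // Cannot be undone: pause, then require an extra confirmation.
      return "pause-and-confirm";
  }
}

console.log(decideGate("danger", { autoAllow: true })); // pause
```

Because the danger and destructive branches never read `policy`, adding fields to `Policy` later cannot accidentally lift the ceiling.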

3. The diff is part of the gate, not a separate view

The user doesn't have to click into a tab to see what's about to be written. The diff renders in the gate itself, scoped to the lines that change. If the change is too big to read, it's too big to approve.
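The "too big to read, too big to approve" rule can be checked mechanically before the gate renders. A minimal sketch, assuming git-style unified diffs (each file introduced by a `diff` line) and an arbitrary 40-line budget; `changedLineCount`, `isReviewable`, and `REVIEW_BUDGET` are hypothetical, and the sample diff is invented for illustration.

```typescript
// Assumed budget, not a value from the README.
const REVIEW_BUDGET = 40;

// Counts added and removed lines inside hunks, skipping file headers
// (`---` / `+++`) that precede the first `@@` of each file.
function changedLineCount(unifiedDiff: string): number {
  let inHunk = false;
  let count = 0;
  for (const line of unifiedDiff.split("\n")) {
    if (line.startsWith("@@")) {
      inHunk = true; // hunk header: body lines follow
    } else if (line.startsWith("diff ")) {
      inHunk = false; // next file's header in a multi-file git diff
    } else if (inHunk && (line.startsWith("+") || line.startsWith("-"))) {
      count += 1;
    }
  }
  return count;
}

function isReviewable(unifiedDiff: string, budget = REVIEW_BUDGET): boolean {
  return changedLineCount(unifiedDiff) <= budget;
}

const diff = [
  "--- a/src/auth.ts",
  "+++ b/src/auth.ts",
  "@@ -10,2 +10,3 @@",
  " const session = await store.get(id);",
  "-session.touch();",
  "+await store.lock(id);",
  "+session.touch();",
].join("\n");

console.log(changedLineCount(diff), isReviewable(diff)); // 3 true
```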

4. Edit-before-allow

The operator can rewrite the intent or the command before approving. Common case: the agent picked the right action but the wrong target ("push to develop, not main"). The edit is logged in the decision ledger alongside the original.
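The ledger can only show the edit "alongside the original" if editing never overwrites the agent's proposal. A minimal sketch of an edit record, loosely modeled on the `DecisionRecord` the trilogy table names; the fields (`original`, `edited`, `decidedAt`) and `editAndAllow` are assumptions, and the `gh pr create --base` commands are illustrative stand-ins for step 9.

```typescript
interface ToolCall {
  kind: string;
  command: string;
}

// Hypothetical shape of a ledger entry for an edit-before-allow.
interface EditDecision {
  verdict: "edit";
  stepIndex: number;
  original: ToolCall; // the agent's proposal, kept verbatim
  edited: ToolCall; // what actually runs
  decidedAt: string; // ISO timestamp
}

function editAndAllow(stepIndex: number, original: ToolCall, patch: Partial<ToolCall>): EditDecision {
  // Build a new object instead of mutating, so `original` stays intact.
  const edited: ToolCall = { ...original, ...patch };
  return { verdict: "edit", stepIndex, original, edited, decidedAt: new Date().toISOString() };
}

const proposed: ToolCall = { kind: "network", command: "gh pr create --base main" };
const record = editAndAllow(9, proposed, { command: "gh pr create --base develop" });
console.log(record.original.command); // gh pr create --base main
console.log(record.edited.command); // gh pr create --base develop
```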

5. Reject collects a rationale

A reject isn't just a click: the operator is asked why. That note becomes part of the run's audit trail and, in a production version, would become training signal for the next iteration of the policy.
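Requiring the rationale is easiest to guarantee at the point where the ledger is written. A minimal sketch, assuming an in-memory ledger; `DecisionLedger`, `Decision`, and their methods are illustrative names, not the repo's API.

```typescript
type Decision =
  | { verdict: "allow"; stepIndex: number }
  | { verdict: "reject"; stepIndex: number; rationale: string };

class DecisionLedger {
  private readonly entries: Decision[] = [];

  allow(stepIndex: number): void {
    this.entries.push({ verdict: "allow", stepIndex });
  }

  // A reject without a written reason is refused rather than recorded.
  reject(stepIndex: number, rationale: string): void {
    const trimmed = rationale.trim();
    if (trimmed.length === 0) {
      throw new Error(`Step ${stepIndex}: a reject needs a written rationale`);
    }
    this.entries.push({ verdict: "reject", stepIndex, rationale: trimmed });
  }

  // The collected reasons: the raw material that could later inform the policy.
  rationales(): string[] {
    const out: string[] = [];
    for (const e of this.entries) {
      if (e.verdict === "reject") out.push(e.rationale);
    }
    return out;
  }
}

const ledger = new DecisionLedger();
ledger.allow(8);
ledger.reject(9, "  PR should target develop, not main ");
console.log(ledger.rationales()); // [ 'PR should target develop, not main' ]
```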

6. The agent stream is sticky

You don't lose the workspace pane when you scroll the thought stream. The "files the agent is touching" view stays visible — accountability is glanceable, not buried.

Get it running

npm install
npm run dev

Open the printed URL.

npm run build      # tsc -b + vite build → dist/
npm run preview    # serve the built dist/

Stack

Vite 6 + React 19 + TypeScript (strict)
Tailwind v4 with @theme design tokens (OKLCH, shifted to steel-cyan to distinguish from Sentinel's info-blue and Recourse's ember)
Radix UI for the audit drawer
Lucide icons

What this prototype isn't trying to be

Not a real agent runner. The demo simulates a 9-step run with pre-canned outputs. A production version would wrap an actual agent (Anthropic Computer Use, OpenAI Operator, Cursor's agent mode, a custom LangGraph agent) and intercept its tool-call requests via a policy layer.
Not a transcript viewer. Transcript viewers show you what already happened. Helm pauses the agent and waits.
Not a chat interface. Chat hides structure. The two-pane "workspace + stream" surface shows you what the agent is touching and what it's planning, side by side.
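The policy layer described in the first bullet above could be sketched as a wrapper around the agent's tool executor: auto-allowed calls run immediately, and everything else awaits a human verdict first, which is what makes the agent genuinely wait. This assumes the agent exposes tool execution as an async function; `withGate`, `Executor`, `Approver`, and `Verdict` are hypothetical names, not APIs of any of the agents named above.

```typescript
type Reversibility = "trivial" | "reversible" | "danger" | "destructive";

interface ToolCall {
  command: string;
  reversibility: Reversibility;
}

type Verdict =
  | { kind: "allow" }
  | { kind: "edit"; call: ToolCall }
  | { kind: "reject"; rationale: string };

type Executor = (call: ToolCall) => Promise<string>; // runs the tool, returns its output
type Approver = (call: ToolCall) => Promise<Verdict>; // resolves when the operator decides

function withGate(execute: Executor, approve: Approver, autoAllow: boolean): Executor {
  return async (call) => {
    const autoOk =
      autoAllow && (call.reversibility === "trivial" || call.reversibility === "reversible");
    if (autoOk) return execute(call);

    // The agent's tool call is suspended here until the operator responds.
    const verdict = await approve(call);
    if (verdict.kind === "reject") {
      throw new Error(`Rejected: ${verdict.rationale}`);
    }
    return execute(verdict.kind === "edit" ? verdict.call : call);
  };
}
```

This sketch runs an edited call as-is; a stricter layer might re-classify the edited call and gate it again, so an edit cannot quietly change which band a step falls into.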

Where this could go

Wrap a real agent (Claude Computer Use, OpenAI Operator) with a policy layer that intercepts tool calls and routes them through this UI.
Multi-operator runs — multiple humans can be required to approve different categories of step (engineer for code, SRE for prod deploy, security for cred access).
Integration with Sentinel — when an agent step generates AI output (a commit message, a PR body, a customer message), pass it through Sentinel's review primitives before it commits.
Recording mode — capture decisions across many runs to learn what gates the operator actually approves vs. rejects, and propose new auto-allow policies grounded in real data.
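Recording mode could start as a simple aggregation over logged decisions. This sketch assumes each record carries the tool kind, the band, and whether the operator allowed the step unedited; `proposeAutoAllow` and both thresholds (20 samples, 95% approval) are hypothetical. Danger and destructive records are skipped, so a proposal can never cross the fixed auto-allow ceiling.

```typescript
type Reversibility = "trivial" | "reversible" | "danger" | "destructive";

interface RecordedDecision {
  toolKind: string;
  reversibility: Reversibility;
  approved: boolean; // true only for an unedited allow
}

// Returns "kind/band" keys that clear both thresholds, in first-seen order.
function proposeAutoAllow(
  history: readonly RecordedDecision[],
  minSamples = 20,
  minApprovalRate = 0.95,
): string[] {
  const stats = new Map<string, { seen: number; approved: number }>();
  for (const d of history) {
    // Never propose auto-allow above the ceiling, whatever the data says.
    if (d.reversibility === "danger" || d.reversibility === "destructive") continue;
    const key = `${d.toolKind}/${d.reversibility}`;
    const s = stats.get(key) ?? { seen: 0, approved: 0 };
    s.seen += 1;
    if (d.approved) s.approved += 1;
    stats.set(key, s);
  }
  return Array.from(stats.entries())
    .filter(([, s]) => s.seen >= minSamples && s.approved / s.seen >= minApprovalRate)
    .map(([key]) => key);
}
```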
