Container auto-fix: red container → configurable agent session, with safeguards by falkoro · Pull Request #35 · falkoro/shelldeck

falkoro · 2026-06-03T14:49:09Z

What

A red container (unhealthy / restarting / crashed) gets a Fix button (🔧). Clicking it opens a danger-gated modal; on confirm, ShellDeck spawns a tmux session running a configurable agent, seeded with the container's read-only diagnostics, to attempt a safe remediation while you watch it live.

Configurable "fix agents" registry — {label, command}, seeded with Claude/Grok/Codex, managed in-app (a "Fix agents" editor; GET/POST /api/fix-agents). {prompt} / {promptfile} placeholders are substituted, else the prompt is appended.
Per-host protected flag → propose-only. On a protected host the agent is told to inspect read-only and print recommended commands, run nothing. Unprotected hosts attempt a safe fix. (Checkbox in the remote-hosts editor.)
Danger gate. Modal shows host/container, a red "potentially dangerous" warning (or blue propose-only banner), an agent picker, and a required acknowledgement checkbox — Confirm is disabled until it's checked.
Read-only diagnostic (ps -a / inspect / logs --tail 200, zero mutating verbs). Prompt is written to a file and passed quoted. A destructive-verb ban (rm/rmi/compose down -v/volume prune/system prune/kill/reboot…) is baked into the agent's instructions, scoped to only the target container.
Fix sessions are dynamic sessions — they appear as normal cards and close for real via POST /api/sessions/remove (kill + drop).
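
The placeholder rule above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `build_agent_command` and the `"$(cat …)"` form used on the append branch are assumptions, and `shell_word` here is a stand-in for the helper of the same name mentioned in the security review. The template is scanned in a single pass so that substituted text is never re-expanded.

```rust
/// Quote `s` as one POSIX shell word: wrap it in single quotes and rewrite
/// every embedded `'` as `'\''`. (Stand-in for the PR's `shell_word`.)
fn shell_word(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Hypothetical helper: expand a fix-agent command template.
/// `{prompt}` becomes the quoted prompt text, `{promptfile}` the quoted path.
/// If neither placeholder occurs, the prompt is appended by reading the file
/// (assumed form), so the prompt text itself is never inlined.
fn build_agent_command(template: &str, prompt: &str, prompt_file: &str) -> String {
    let mut out = String::new();
    let mut rest = template;
    let mut substituted = false;
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        let tail = &rest[start..];
        if let Some(after) = tail.strip_prefix("{promptfile}") {
            out.push_str(&shell_word(prompt_file));
            rest = after;
            substituted = true;
        } else if let Some(after) = tail.strip_prefix("{prompt}") {
            out.push_str(&shell_word(prompt));
            rest = after;
            substituted = true;
        } else {
            // Not a known placeholder: keep the brace literally.
            out.push('{');
            rest = &tail[1..];
        }
    }
    out.push_str(rest);
    if !substituted {
        out.push_str(" \"$(cat ");
        out.push_str(&shell_word(prompt_file));
        out.push_str(")\"");
    }
    out
}
```

With a hypothetical template `my-agent {prompt}` and the prompt `it's down`, the result is `my-agent 'it'\''s down'`, so an apostrophe in the prompt cannot close the quoting.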

Safety model (read this)

ShellDeck cannot hard-sandbox an interactive agent session on a host — so protection is layered, not absolute:

1. danger-ack gate before spawn
2. read-only diagnostic
3. protected→propose-only prompt mode
4. destructive-verb ban in the prompt

Two residual risks, both LOW/by-design (from the security review):

propose-only is advisory (prompt-level; not server-enforced). A protected host gives no technical guarantee against mutation if the agent ignores the prompt.
a hostile container's logs are embedded in the prompt → prompt-injection risk against the fixer agent (out of the shell-injection threat model, but real).
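
To make the "advisory" point concrete, here is a rough sketch of how layers 3 and 4 might reduce to plain text in the prompt. The function name, wording, and exact verb list are assumptions for illustration; only the two modes and the banned verbs named in the description come from the PR. Nothing in it can stop an agent that ignores the text.

```rust
/// Destructive verbs named in the PR description (the list there ends
/// with "…", so this is not necessarily complete).
const BANNED_VERBS: [&str; 7] = [
    "rm", "rmi", "compose down -v", "volume prune", "system prune", "kill", "reboot",
];

/// Hypothetical helper: build the instruction header for the fix agent.
/// `propose_only` is true for protected hosts.
fn build_instructions(container: &str, propose_only: bool) -> String {
    let mode = if propose_only {
        "PROPOSE-ONLY: inspect read-only and print recommended commands; run nothing."
    } else {
        "SAFE-FIX: attempt a safe remediation."
    };
    format!(
        "{mode}\nScope: only the container `{container}`.\nNever run: {}.",
        BANNED_VERBS.join(", ")
    )
}
```

Because the mode is only a sentence in the prompt, the server cannot verify compliance, which is why the description labels the residual risk as by-design.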

Security review

Independent security-reviewer pass: no CRITICAL/HIGH/MED findings. No command-injection path — name/engine/target are charset-validated to a metacharacter-free set, the attacker-influenced prompt/diagnostics are shell_word single-quote-escaped (or never inlined on the default branch), and the quoting composes correctly through tmux → sh -c → zsh -lic. All write endpoints are login+unlock+CSRF-gated; GET /api/fix-agents is unlock-gated; write_prompt can't escape its dir.
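
A charset check of the kind described for name/engine/target could look like the sketch below. The allowed character set, the length cap, and the function name are assumptions; the PR only states that the set is free of shell metacharacters.

```rust
/// Hypothetical validator: accept only identifiers built from a small,
/// metacharacter-free set. Leading '-' is rejected so the value cannot be
/// parsed as a command-line option by ssh, docker, or tmux.
fn is_safe_identifier(s: &str) -> bool {
    !s.is_empty()
        && s.len() <= 128
        && !s.starts_with('-')
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-' | '@' | ':'))
}
```

Values that pass this check contain no quotes, spaces, `$`, `;`, or backticks, so they can be interpolated into a command line without further escaping. Anything that cannot be constrained this way, such as the prompt, has to go through quoting instead.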

⚠️ Overlap with #34

This branch adds its own src/dynamic_sessions.rs (master lacks it). Its DynamicSession is {name,label,family,alias,badge,start}, which diverges from #34's {name,label,start,cwd}. Whichever lands second needs a small struct-unification merge. Recommend merging #34 first, then I'll reconcile this on top (one dynamic_sessions module).
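
One possible shape for the unified struct, as a sketch only: the shared fields stay required and the fields unique to each branch become optional. The field types are assumptions (the PR lists names only), and a real version would presumably carry serde derives for the on-disk JSON.

```rust
/// Hypothetical merged `DynamicSession`: union of this branch's
/// {name,label,family,alias,badge,start} and #34's {name,label,start,cwd}.
#[derive(Debug, Clone, PartialEq, Default)]
struct DynamicSession {
    name: String,
    label: String,
    start: String,
    /// Only set by #34-style sessions.
    cwd: Option<String>,
    /// Only set by fix sessions from this branch.
    family: Option<String>,
    alias: Option<String>,
    badge: Option<String>,
}
```

Optional fields would let entries written by either branch load without a migration step, at the cost of callers having to handle `None`.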

Verification

40 Rust tests pass (incl. new fix.rs guards: quoting can't be broken by attacker prompt, diagnostic is read-only, session names stay in charset).
Browser-verified on an isolated instance against a live (unhealthy) container: Fix button only on red cards; modal agent picker + ack-gates-Confirm + danger banner; fix-agents editor; protected checkbox; end-to-end spawn captured live diagnostics + correct SAFE-FIX instructions, and close killed the tmux session + cleared dynamic-sessions.
No cargo fmt churn; compact style preserved.
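
The "diagnostic is read-only" guard might be tested along these lines. This is a sketch under assumptions: the function names and the mutating-verb list are illustrative, and only the three subcommands (`ps -a`, `inspect`, `logs --tail 200`) come from the description.

```rust
/// Hypothetical builder for the read-only diagnostic commands.
/// `engine` is expected to be `docker` or `podman`; both inputs are assumed
/// to have passed charset validation already.
fn diagnostic_commands(engine: &str, container: &str) -> Vec<String> {
    vec![
        format!("{engine} ps -a"),
        format!("{engine} inspect {container}"),
        format!("{engine} logs --tail 200 {container}"),
    ]
}

/// Illustrative list of verbs that change state.
const MUTATING: [&str; 9] = [
    "rm", "rmi", "kill", "stop", "start", "restart", "prune", "down", "exec",
];

/// True when no whitespace-separated word of `cmd` is a mutating verb.
/// Whole-word matching is deliberately strict: a container literally named
/// `rm` would also be flagged.
fn is_read_only(cmd: &str) -> bool {
    !cmd.split_whitespace().any(|word| MUTATING.contains(&word))
}
```

A unit test can then assert that every generated command passes `is_read_only`, so a later change that adds a mutating subcommand fails the test suite.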

Note for the productization track

The spawn shell is hardcoded zsh (fine for this box). For the open-source/enterprise build it should honor $SHELL/a configured default-command — follow-up, not a blocker.
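
The follow-up could be as small as the sketch below. The names `pick_shell` and `spawn_shell`, the precedence order (configured value, then `$SHELL`, then `/bin/sh`), and the fallback itself are assumptions rather than anything decided in this PR.

```rust
/// Hypothetical selection rule: a configured shell wins, then the
/// environment's shell, then a portable fallback. Blank values are ignored.
fn pick_shell(configured: Option<&str>, env_shell: Option<&str>) -> String {
    configured
        .filter(|s| !s.trim().is_empty())
        .or(env_shell.filter(|s| !s.trim().is_empty()))
        .unwrap_or("/bin/sh")
        .to_string()
}

/// Convenience wrapper that reads `$SHELL` from the process environment.
fn spawn_shell(configured: Option<&str>) -> String {
    let env_shell = std::env::var("SHELL").ok();
    pick_shell(configured, env_shell.as_deref())
}
```

Keeping the environment lookup out of `pick_shell` makes the precedence rule testable without mutating process state.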

🤖 Generated with Claude Code

A red (unhealthy/restarting/crashed) container gets a Fix button that spawns a tmux session running a configurable agent, seeded with the container's read-only diagnostics, to attempt a safe remediation while the operator watches.

- Configurable "fix agents" registry (label + command), seeded with Claude/Grok/Codex, editable in-app via /api/fix-agents.
- Per-host `protected` flag → propose-only mode (agent inspects + prints recommended commands, runs nothing). Advisory (prompt-level).
- Danger modal: agent picker + required acknowledgement gate before spawn.
- Read-only diagnostic (ps/inspect/logs, no mutating verbs); prompt written to a file and passed quoted; destructive-verb ban baked into the agent prompt.
- Fix sessions are dynamic sessions: appear as cards, real close via /api/sessions/remove (kill + drop from dashboard).
- All write endpoints login+unlock+CSRF gated; GET /api/fix-agents unlock-gated.

Security-reviewed: no injection (charset-validated name/engine/target + shell_word escaping holds through tmux→sh -c→zsh -lic). 40 tests pass; UI + spawn/close browser-verified against a live unhealthy container.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…cord

Commit security review caught a protected-host safety-gate bypass: the SSH target was taken from the client request while `protected` was recomputed by re-matching that target, so addressing a protected host via an alternate target string (IP vs hostname, different user@) failed the string match, propose became false, and the agent would run in full mutating mode against a host the operator marked protected.

resolve_host() now derives target + protected + label from ONE record: a named host (id/label) is authoritative and the client-sent target is ignored; a free-form target to an unconfigured host fails safe to propose-only; a matching record honours its own flag. Covered by unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
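
The three rules in this commit message can be sketched as below. This is not the PR's `resolve_host` (its signature is truncated in the diff): the parameter list and the `RemoteHost` struct are assumptions, and the handling of a named host that matches no record is a guess that simply falls through to the remaining rules.

```rust
/// Assumed shape of a configured remote host record.
#[derive(Debug, Clone)]
struct RemoteHost {
    id: String,
    label: String,
    target: String,
    protected: bool,
}

/// Returns (target, propose_only, host_label), all taken from ONE record
/// whenever a record applies.
fn resolve_host(
    hosts: &[RemoteHost],
    named_host: Option<&str>,
    client_target: &str,
) -> (String, bool, String) {
    // Rule 1: a named host (id or label) is authoritative; the client-sent
    // target is ignored entirely.
    if let Some(name) = named_host {
        if let Some(h) = hosts.iter().find(|h| h.id == name || h.label == name) {
            return (h.target.clone(), h.protected, h.label.clone());
        }
    }
    // Rule 2: a free-form target that matches a record honours that
    // record's own flag.
    if let Some(h) = hosts.iter().find(|h| h.target == client_target) {
        return (h.target.clone(), h.protected, h.label.clone());
    }
    // Rule 3: an unconfigured host fails safe to propose-only.
    (client_target.to_string(), true, client_target.to_string())
}
```

With a protected record `prod` whose target is `ops@prod.example`, a request naming `prod` but sending `ops@10.0.0.5` as the target still resolves to `ops@prod.example` with propose-only set, which is the bypass the commit closes.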

Copilot

Pull request overview

This PR adds a guarded “Fix” workflow for unhealthy/restarting/crashed containers that spawns a tmux-backed “fix agent” session seeded with read-only diagnostics, plus new runtime configuration for fix agents, protected remote hosts (propose-only mode), and persisted dynamic sessions.

Changes:

Add backend support for /api/fix and /api/fix-agents (load/save), including prompt-file generation and tmux session spawning.
Introduce persisted “dynamic sessions” and merge them into the session list; add a new “remove” endpoint for sessions.
Update the UI to show a Fix (🔧) button on “red” containers, a danger-gated modal, a “Fix agents” editor, and a protected flag in the remote-host editor.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 8 comments.


File	Description
src/tmux.rs	Adds dynamic sessions into the session model and introduces `create_dynamic_session`.
src/routes.rs	Adds `/api/fix`, `/api/fix-agents`, and `/api/sessions/remove`; adjusts stop behavior to remove dynamic entries.
src/remote_hosts.rs	Persists/normalizes new `protected` flag for remote hosts and updates tests.
src/main.rs	Wires new modules (`dynamic_sessions`, `fix`, `fix_agents`).
src/fix.rs	Implements fix-session creation: diagnostics, prompt generation, agent command construction, and dynamic session spawn.
src/fix_agents.rs	Implements fix-agent registry load/save with normalization and seeding.
src/dynamic_sessions.rs	Implements persistence layer for dynamic sessions (load/save/remove).
src/config.rs	Adds config paths for fix agents and dynamic sessions, plus `protected` on remote hosts.
public/metrics.js	Adds Fix button + danger-gated modal + `/api/fix` call in served JS.
public/events.js	Adds “Fix agents” button to the Configure area.
public/actions.js	Reworks remote-host editor into row UI with `protected`; adds fix-agents editor.
frontend/metrics.ts	TypeScript source for Fix button + modal + `/api/fix` call.
frontend/events.ts	TypeScript source for “Fix agents” button insertion.
frontend/actions.ts	TypeScript source for remote-host editor rows + `protected`, plus fix-agents editor.
.gitignore	Ignores fix-agents.json, dynamic-sessions.json, fix-prompts/, and local Playwright scratch.


+
+// Resolve the SSH target, propose-only flag, and display label from ONE authoritative source.
+// The `protected` (propose-only) gate must be decided by the same record that supplies the
+// target, so a client can't dodge a protected host by sending an alternate target string that
+// fails to string-match the configured record. Returns (target, propose_only, host_label).
+fn resolve_host(


+pub async fn create_dynamic_session(config: Arc<Config>, spec: KnownSession) -> Result<String, String> {
+    dynamic_sessions::save_entry(config.clone(), &spec).await?;
+    match launch_spec(&spec, "created").await {
+        Ok(message) => Ok(message),
+        Err(error) => {


+async fn api_session_remove(
+    State(state): State<AppState>,
+    headers: HeaderMap,
+    connect: ConnectInfo<SocketAddr>,
+    axum::Json(body): axum::Json<NameBody>,
+) -> Response {
+    if let Some(response) = guard(&state, &headers, &connect) {
+        return response;
+    }
+    if let Err(response) = require_unlock(&state, &headers).and_then(|_| require_action(&headers)) {
+        return response;
+    }
+    stop_session_result(state.config.clone(), &body.name).await
 }


+async fn stop_session_result(config: std::sync::Arc<config::Config>, name: &str) -> Response {
+    match tmux::stop_session(name).await {
+        Ok(message) => {
+            let _ = dynamic_sessions::remove(config, name).await;
+            webutil::json_response(
+                StatusCode::OK,
+                &serde_json::json!({ "ok": true, "message": message }),
+            )
+        }
+        Err(error) => webutil::json_response(
+            StatusCode::BAD_REQUEST,
+            &serde_json::json!({ "error": error }),
+        ),
+    }
+}


+function readRemoteHostRows(root: HTMLElement): RemoteHostEntry[] {
+  return Array.from(root.querySelectorAll<HTMLElement>('[data-remote-host-row]'))
+    .map((row) => {
+      const value = (field: string): string => row.querySelector<HTMLInputElement>(`[data-host-field="${field}"]`)?.value.trim() || '';
+      const label = value('label');
+      const id = value('id') || idFromLabel(label);
+      return {
+        id,
+        label,
+        target: value('target'),
+        protected: Boolean(row.querySelector<HTMLInputElement>('[data-host-field="protected"]')?.checked),
+      };
    })
    .filter((host) => host.id && host.label && host.target);
 }


+function readRemoteHostRows(root) {
+    return Array.from(root.querySelectorAll('[data-remote-host-row]'))
+        .map((row) => {
+        const value = (field) => row.querySelector(`[data-host-field="${field}"]`)?.value.trim() || '';
+        const label = value('label');
+        const id = value('id') || idFromLabel(label);
+        return {
+            id,
+            label,
+            target: value('target'),
+            protected: Boolean(row.querySelector('[data-host-field="protected"]')?.checked),
+        };
    })
        .filter((host) => host.id && host.label && host.target);
 }


+  await Promise.all([loadFixAgents(), loadRemoteHostConfig()]);
+  const remote = remoteHostConfig.find((item) => item.id === host || item.label === host);


+    await Promise.all([loadFixAgents(), loadRemoteHostConfig()]);
+    const remote = remoteHostConfig.find((item) => item.id === host || item.label === host);


falkoro · 2026-06-07T09:09:07Z

🧠 Grok Composer review (grok-composer-2.5)

Review (diff only)

Scope note: The diff is truncated in the middle (container-fix handler, fix_agents / dynamic_sessions modules, and related routes). Findings below are from the visible hunks only.

1. Brand consistency

Clean. No header/footer logos, favicons, og:image/schema, hero mockups, or marketing copy in this diff.

2. Spot Suite architecture

Not applicable / clean for suite rules. This is a local ops dashboard (tmux, SSH, Docker/Podman, JSON config on disk)—not tenant-scoped Workers/D1/OAuth/SECRETS KV.

Positive:

.gitignore:12-16 — fix-agents.json, dynamic-sessions.json, fix-prompts/, .playwright-mcp/ kept out of git.
main.rs (fix-agents save handler) — require_unlock + require_action before persisting agents.

No flags for: client-supplied tenant_id, unscoped D1, committed secrets, missing OAuth/cron/migration, or marketing Pages deploy.

3. Correctness + security

Medium — confirm server enforces protected (not visible in diff).
frontend/actions.ts:490 adds protected?: boolean and frontend/actions.ts:507 exposes it in the remote-host editor; frontend/metrics.ts:248 shows a Fix action for unhealthy containers. If the container-fix API only checks unlock/action headers and does not read protected from persisted host config, production hosts can be auto-fixed via direct API call. Needs server-side check in the truncated fix handler.

Medium — arbitrary shell execution surface (likely intentional, still worth hardening).
frontend/actions.ts:576 stores free-form command for fix agents; src/tmux.rs:391 runs spec.start via tmux new-session. Anyone who can unlock the dashboard and save agents / spawn dynamic sessions effectively has shell access. Acceptable for a personal ops tool; risky if this endpoint is ever network-exposed beyond localhost/trusted operators. Consider allowlisting agent IDs/commands or validating against configured agents only at execution time.

Low — dynamic-session cleanup looks correct.
src/tmux.rs:399-406 rolls back dynamic_sessions on failed launch; main.rs:1030-1042 stop_session_result removes dynamic session on stop.

Low — UI escaping is consistent.
frontend/actions.ts:507, frontend/actions.ts:576, frontend/metrics.ts:248 use escapeHtml for injected attribute values.

Low — frontend/events.ts:217-224 duplicates editRemoteHostsBtn lookup to inject the Fix agents button. Works, but brittle if markup already includes #editFixAgentsBtn.

Clean: No customer-facing “workspace” copy; protected-host label says “production,” not suite tenant terminology.

Verdict: Solid ops-dashboard extension with sensible gitignore and auth on the visible fix-agents API; verify truncated server code enforces protected on fix and treats fix-agent commands as privileged shell, not just validated form input.
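
The "validate against configured agents only at execution time" suggestion could take roughly this form. It is a sketch, not code from the PR: `FixAgent` mirrors the {label, command} registry shape from the description, while `command_for` and its error text are assumptions. The client sends only a label, and the command always comes from the persisted registry.

```rust
/// Registry entry, mirroring the {label, command} shape in the PR description.
#[derive(Debug, Clone)]
struct FixAgent {
    label: String,
    command: String,
}

/// Hypothetical lookup used at spawn time: the request carries a label and
/// the command template is read from the server-side registry, so a request
/// cannot smuggle in a command of its own.
fn command_for<'a>(registry: &'a [FixAgent], requested_label: &str) -> Result<&'a str, String> {
    registry
        .iter()
        .find(|agent| agent.label == requested_label)
        .map(|agent| agent.command.as_str())
        .ok_or_else(|| format!("unknown fix agent: {requested_label}"))
}
```

This narrows what the spawn request can do, but saving agents remains a privileged operation: whoever can edit the registry still controls what runs.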

Copilot AI review requested due to automatic review settings June 3, 2026 14:49

Copilot started reviewing on behalf of falkoro June 3, 2026 14:49

Copilot AI reviewed Jun 3, 2026

falkoro wants to merge 2 commits into master from feat/container-autofix
