feat: v4 redesign — 13 fat verb-tools, token-efficient output, adoption layer by hunchom · Pull Request #11 · hunchom/claude-code-ssh

hunchom · 2026-05-17T08:40:12Z

Summary

Tool consolidation — collapses the prior 51-tool surface into 13 fat verb-tools (ssh_run, ssh_file, ssh_find, ssh_logs, ssh_service, ssh_health, ssh_db, ssh_backup, ssh_session, ssh_net, ssh_docker, ssh_fleet, ssh_plan), each routing an action enum to the existing handlers via per-domain dispatchers. The schema stays small enough (~5k tokens) to remain un-deferred.
Token-efficient output — render primitives (renderHeader / renderRows / renderKV / indentBody), a compact default format, indentation instead of code fences, and ls / ps command-output compressors ahead of truncation.
New capabilities — ssh_find (timeout-bounded, match-capped remote search), ssh_run script / detach / job-status / job-kill (chained commands plus detached background jobs), pooled-connection reuse without a per-call liveness probe.
Adoption layer — selling tool descriptions that name the raw ssh / scp / rsync each tool replaces, a project rule file entry, and a fail-open PreToolUse Bash-nudge hook.

Hardening pass

A strict by-the-book review of the consolidated surface — treating the pre-existing handler code as untrusted — found and fixed, before merge:

Critical — a ssh_run action: fleet call OOM-crashed the whole server on the production group-object shape (infinite loop in pMap); ssh_alert_setup action: check never fired because the threshold evaluator read field names the health-check producer never emits.
High — unstructured crash on an unknown fleet group name; ssh_file action: edit treated literal old_text as a regex (silent wrong-replacement on a write path); the ssh_plan approval-token gate was bypassable by a downward risk override.
Medium / Low — a known_hosts host match using substring matching; a leaked follow-session map; dead/misleading args; a dead, command-injection-prone deploy-helper.js module (deleted); stale comments.

Every finding was independently re-verified closed with no regression.

Test Plan

npm test — 1070 passing, 0 failing across 59 suites
./scripts/validate.sh — JS syntax, MCP server startup, no .env leak
node --check src/index.js — 13-tool registration valid
Smoke-test the 13 tools against a live configured server
Confirm the 13-tool schema stays un-deferred (no tool-search fallback)

Consolidate the 51-tool MCP surface into 13 fat verb-tools with a token-efficient plain-text output model and an adoption strategy. Incorporates findings from a 4-agent design review covering tokens, speed, UX, and architecture.

Plan 1 of 5: pre-build schema gate plus four additive render primitives (renderHeader, indentBody, renderKV, renderRows). Additive-only, zero regression risk.

Plan 2 of 6: rewrite defaultRender and renderMarkdown onto the Plan 1 primitives, replace fences with indentation, make compact the default format.

Plan 3 of 6: command-output compressor module (ls, ps) wired into formatExecResult ahead of truncation, with a raw:true bypass.

…r regexes

Seven plan files: the 12-tool dispatcher facade (3), new capabilities — remote search, script/detach jobs, connection reuse (3), and adoption (1).

- ssh_docker description dropped a `compose` action absent from both the schema enum and the handler; ssh_fleet description was missing the `command_alias` action the enum declares - remove an orphaned `3. Custom` list item the TOOL_MANAGEMENT.md overview rewrite left behind - ssh-bash-nudge.mjs resolves the .env path via fileURLToPath rather than URL.pathname, which is not percent-decoded

- ssh_run schema marks server optional -- it was required, which made the fleet action (group-targeted, no server) uninvokable through MCP - ssh_db list ships a renderDbList renderer -- without it the databases array collapsed to one JSON line via defaultRender - de-stale README tool/test counts to the v4 13-tool surface - correct the ssh-docker.js header comment (compose is unadvertised)

+          result = deps.addServersToGroup(name, members);
+          output = `[ok] Group '${name}' members: ${result.servers.join(', ')}`;
+        } else {
+          result = deps.updateGroup(name, { description });


buildTcpCommand's /dev/tcp fallback interpolated the model-supplied target_host raw into a bash -c line -- a hostile value such as `x/$(cmd)/y` reached a remote shell. Add a SAFE_HOST_RE allowlist (hostname / IPv4 / IPv6 chars only): handleSshPortTest rejects a bad host before connecting, and buildTcpCommand throws on one. Bump the README test count to 1031.

…atches

Every ssh_db handler hard-fails without a valid db_type, but the dispatcher REQUIRED map omitted it -- a missing db_type produced the handler's vaguer "unsupported db_type: undefined" instead of a clean dispatcher-level "action requires: db_type". Add db_type to all four action entries.

Tighten SAFE_HOST_RE so target_host cannot begin with '-'. A real hostname or IP never does, and a dash-prefixed value risks being read as a flag by nc/getent. Defense-in-depth on top of the existing shell-metacharacter allowlist.

stepRisk() honored any caller-supplied risk value, including one below the action table risk. A model-authored step {action:'exec_sudo', risk:'low'} classified as low, so highestRisk() never saw high and the approve-token gate was skipped -- a privileged step ran with no token. Compute the table risk first; return the override only when it ranks strictly higher. Downward overrides are ignored.

Three defects in SSHSessionV2: - _drainWaiters resolved only the head waiter per stream event. Two sentinels arriving in one data chunk left the second waiter hanging to timeout. Now loops, draining every head waiter already buffered. - onStderr fed a SECOND stream into the SAME StringDecoder as stdout; a multibyte char split across an interleaved boundary corrupted. client.shell() folds stderr into the pty stream -- the stderr listener was dead code, now removed. - renderSessionSend wrapped stdout/stderr in ```text fences, which break when the payload contains a ``` line. Switched to indentBody, consistent with every other v4 renderer.

getDefaultConfig() still emitted the seven pre-v4 groups (core, sessions, monitoring, backup, database, advanced, gamechanger). The real v4 groups are core/ops/advanced. In custom mode, isToolEnabled resolved a tool to ops/advanced, found no such key in the stale config, and fell through to return true -- silently re-enabling tools a user had disabled after upgrade. Default groups are now the three real v4 keys.

isHostKnown / getCurrentHostKey tested line.includes(hostEntry), so example.com matched a notexample.com line, an example.com.evil.net line, or a coincidental hit inside the base64 key body -- and never matched hashed |1| entries. Both feed the live connect() host-key verifier, so a mismatch silently fell through to TOFU re-acceptance. Now parse each line and compare host (and port) exactly, reusing parseKnownHostLine from key-tools.js. Hashed entries never match.

handleSshExecuteGroup treated resolveGroup's result as an array, but the production DEPS.resolveGroup returns { name, servers:[...] } (or null). pMap then read .length on a non-array -> undefined -> the worker guard i>=n was never true -> infinite loop -> heap OOM -> the whole MCP server process crashed instead of returning a structured error. - handleSshExecuteGroup: normalize resolveGroup result, accepting both a bare array and the { name, servers } object; empty/null -> structured fail. - pMap: non-array items degrades to an empty result instead of looping. - DEPS.resolveGroup: wrap getGroup in try/catch; an unknown group name (getGroup throws "Group 'X' not found") now returns null, which the handler renders as a clean "group has no servers" fail, not a raw crash. Tests exercise the real production object shape and the null/unknown-group path; the prior fleet tests stubbed resolveGroup as a bare array and so missed both crashes entirely.

The ssh_file schema describes old_text as plain "Text to replace", so a caller supplies literal text. But the edit action mapped it straight into a patch { find } and applyPatches compiled find with new RegExp(). Regex metacharacters in literal text (. ( [ * ? etc.) were interpreted as regex: a literal-looking edit silently patched the wrong span, or threw "invalid patch regex" on an unbalanced bracket -- corrupting remote files. - applyPatches: honor a per-rule literal:true flag. Under it, find is regex-escaped before compiling and $-sequences in the replacement stay literal (no accidental capture-group backreferences). - ssh_file edit: build the patch rule with literal:true so old_text / new_text are matched and substituted verbatim. Direct patch[] callers (ssh_edit with explicit regex rules) are unchanged -- literal defaults off. Tests cover metachar substrings, an unbalanced bracket that no longer throws, and literal $-sequences in the replacement.

The deploy dispatch forwarded permissions and owner to handleSshDeploy, but that handler never destructures them and they are absent from the ssh_file schema -- pure dead forwarding. Meanwhile handleSshDeploy does read rollback_hook (run after a rollback to restore service) yet it was not in the schema, so a caller had no way to set a real feature. - ssh_file deploy dispatch: stop forwarding the dead permissions/owner. - ssh_file inputSchema: add rollback_hook (optional string).

The header claimed those branches exec "like handleSshExecute", but they call streamExecCommand with raw:true, skipping the OS timeout-wrapper that handleSshExecute's non-raw path applies. Comment-only.

compressLs always drops the single `total N` header line, so the shared footer's "1 line compressed" wrongly implied a content row was hidden. compressLs now supplies its own footer note ("ls total-line dropped"); the dispatcher honors a compressor-supplied note over the generic line-count footer.

buildDeploymentStrategy / detectDeploymentNeeds interpolated sudoPassword and remotePath raw into shell command strings -- command injection plus a credential on the pipeline. The module was imported by index.js but none of its exports were ever called; the live deploy path (handleSshDeploy) does not use it. Removing the file and its dead import block.

evaluateThresholds read metrics.cpu.usage_percent, memory.used_percent and disk[].used_percent -- field names handleSshHealthCheck never emits. Every read yielded undefined -> NaN -> Number.isFinite false, so the threshold branch was skipped and check reported "ok" on a server at 99% load. The real producer (monitoring-tools parseTopCpu/parseFreeMem/parseDf): - cpu has no aggregate usage field, only { user_pct, system_pct, idle_pct, iowait_pct } -> derive usage as 100 - idle_pct - memory exposes used_pct (not used_percent) - disk rows expose used_pct + device + mount (not filesystem/used_percent) Rewrite the three reads to those names; keep the Number.isFinite guard so a genuinely-missing metric still skips cleanly. Correct the stale comments that cited the wrong contract. Tests: the existing evaluateThresholds cases fed fabricated usage_percent/used_percent shapes that passed only because they matched the buggy reader. Re-point them at the real field names and add two end-to-end cases that pipe a genuine handleSshHealthCheck result (busy host / quiet host) through evaluateThresholds, asserting alerts fire / stay silent -- this guards against producer/consumer field drift.

The overview path JSON.parse(hc.content[0].text)'d the delegated health_check payload with no try/catch. It cannot throw today -- health_check is always invoked with format:'json' -- but a future shape change would surface as an unhandled SyntaxError out of handleSshMonitor instead of a structured failure. Wrap the parse; on failure return a structured ssh_monitor fail ("overview unavailable: ... unparseable payload") so the tool degrades gracefully. Add an injectable _healthCheck delegate (defaults to the real aggregator, production never overrides it) so the parse guard is genuinely testable. Tests: a happy overview path, a connection failure, and an injected garbage-payload delegate that asserts the tool degrades to a structured failure rather than throwing -- this last one fails without the try/catch.

The module-level sessions Map dropped an entry only on an explicit ssh_tail_stop. A follow-start whose stream closed (remote ended, connection dropped) but was never explicitly stopped left its state in the Map for the server's lifetime. The 1 MB ring buffer bounds per-session memory, but closed sessions were never reaped. Track readOffset (highest delivered offset) and closedAt per session. reapClosedSessions() drops a closed session once its buffer is fully read, or once it has been closed past a grace period even if unread (the stream is dead). It runs on every follow-start and follow-read. handleSshTailRead's opening sweep excludes the session it is about to serve so the caller still gets one final closed:true read; that session is then reaped at the end of the call. Tests: a closed + fully-read session is reaped on the next read with no explicit stop (fails without the fix), a closed session past the grace window is reaped even if never read, and an open session survives any sweep.

creds() in the ssh_db dispatcher forwarded host and port to all four db handlers, and the ssh_db inputSchema declared them -- but no db-tools handler or command builder reads either. buildMongoConnectionUri is the only place with host/port parameters and its sole caller never passes them, so every db op always hits the local socket. A caller passing host/port for a remote DB silently got the local one. Remove host/port from creds() and from the ssh_db inputSchema, and drop the host/port assertions from the dispatcher test (user/password forwarding still covered). Not wired through -- that is a feature, out of scope. Re-add only alongside a handler that honors them.

hunchom added 30 commits May 16, 2026 22:38

docs: add ssh-mcp v4 redesign design spec

1ff9189

Consolidate the 51-tool MCP surface into 13 fat verb-tools with a token-efficient plain-text output model and an adoption strategy. Incorporates findings from a 4-agent design review covering tokens, speed, UX, and architecture.

docs: add ssh-mcp v4 foundation implementation plan

88b8180

Plan 1 of 5: pre-build schema gate plus four additive render primitives (renderHeader, indentBody, renderKV, renderRows). Additive-only, zero regression risk.

build: add v4 schema-cost pre-build gate

a17ce6c

feat: add renderHeader v4 render primitive

c75dc4c

feat: add indentBody v4 render primitive

97f62c9

feat: add renderKV v4 render primitive

3a69fb5

feat: add renderRows v4 render primitive

609cabc

fix: harden render primitives against malformed rows

bac9fbf

docs: add ssh-mcp v4 output-rewrite implementation plan

7da1820

Plan 2 of 6: rewrite defaultRender and renderMarkdown onto the Plan 1 primitives, replace fences with indentation, make compact the default format.

refactor: rewrite defaultRender onto v4 render primitives

a280608

refactor: rewrite renderMarkdown onto v4 render primitives

7234ec6

feat: make compact the default v4 output format

a67ee1f

fix: restore value-level test assertions and harden kvRows

db77399

docs: add ssh-mcp v4 compressors implementation plan

e408e51

Plan 3 of 6: command-output compressor module (ls, ps) wired into formatExecResult ahead of truncation, with a raw:true bypass.

feat: add command-output compressor module with ls compressor

50c7540

feat: add ps command-output compressor

87f324d

feat: run command-output compression in formatExecResult

4982cf6

fix: correct compressPs trailing-newline handling and unify compresso…

8edfff6

…r regexes

docs: add ssh-mcp v4 plans 4-6 (facade, capabilities, adoption)

20b8d7a

Seven plan files: the 12-tool dispatcher facade (3), new capabilities — remote search, script/detach jobs, connection reuse (3), and adoption (1).

feat: add per-action arg validation helper for v4 dispatchers

792e9f0

feat: add ctx-factory helper for v4 dispatchers

31dad56

feat: add ssh_run v4 dispatcher (exec, sudo, fleet)

2462207

feat: add ssh_file v4 dispatcher

8c0da4e

fix: correct timeout remap and forward dropped args in v4 dispatchers

6a601ad

feat: add ssh_logs v4 dispatcher

ecd9874

feat: add ssh_service v4 dispatcher

20af91f

feat: add ssh_health v4 dispatcher

8d06e4a

feat: add ssh_db v4 dispatcher

22e8830

feat: add ssh_backup v4 dispatcher

8154a0c

feat: add ssh_session v4 dispatcher

127a7b9

hunchom added 8 commits May 17, 2026 03:34

chore: sync gitnexus index symbol counts

c37d5cf

feat: selling v4 tool descriptions that name the bash they replace

49454fa

docs: add CLAUDE.md rule to prefer ssh_* MCP tools over raw ssh

747de27

feat: add PreToolUse Bash-nudge detector for raw ssh invocations

fb3907c

feat: register PreToolUse Bash-nudge hook in .claude/settings.json

8ed2dab

docs: correct stale tool and test counts to the v4 surface

b753885

github-advanced-security AI found potential problems May 17, 2026

View reviewed changes

Comment thread src/fleet-adapters.js

result = deps.addServersToGroup(name, members);

output = `[ok] Group '${name}' members: ${result.servers.join(', ')}`;

} else {

result = deps.updateGroup(name, { description });

hunchom added 20 commits May 17, 2026 04:51

chore: sync package-lock.json with package.json node engine and dep p…

297407c

…atches

fix: reject leading-dash hosts in ssh_net port-test

4ca37d7

Tighten SAFE_HOST_RE so target_host cannot begin with '-'. A real hostname or IP never does, and a dash-prefixed value risks being read as a flag by nc/getent. Defense-in-depth on top of the existing shell-metacharacter allowlist.

docs: correct ssh-run header -- script/detach/job-* use raw exec

c02f164

The header claimed those branches exec "like handleSshExecute", but they call streamExecCommand with raw:true, skipping the OS timeout-wrapper that handleSshExecute's non-raw path applies. Comment-only.

test(session): model stderr PTY-folding faithfully in the shell fake

0effbe2

docs: reword wiring-plan check to drop literal attribution markers

8380db2

hunchom merged commit b333600 into main May 20, 2026
13 of 14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: v4 redesign — 13 fat verb-tools, token-efficient output, adoption layer#11

feat: v4 redesign — 13 fat verb-tools, token-efficient output, adoption layer#11
hunchom merged 91 commits into
mainfrom
ssh-mcp-v4-redesign

hunchom commented May 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hunchom commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Hardening pass

Test Plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hunchom commented May 17, 2026 •

edited

Loading