Skip to content

feat: v4 redesign — 13 fat verb-tools, token-efficient output, adoption layer#11

Merged
hunchom merged 91 commits into
mainfrom
ssh-mcp-v4-redesign
May 20, 2026
Merged

feat: v4 redesign — 13 fat verb-tools, token-efficient output, adoption layer#11
hunchom merged 91 commits into
mainfrom
ssh-mcp-v4-redesign

Conversation

@hunchom
Copy link
Copy Markdown
Owner

@hunchom hunchom commented May 17, 2026

Summary

  • Tool consolidation — collapses the prior 51-tool surface into 13 fat verb-tools (ssh_run, ssh_file, ssh_find, ssh_logs, ssh_service, ssh_health, ssh_db, ssh_backup, ssh_session, ssh_net, ssh_docker, ssh_fleet, ssh_plan), each routing an action enum to the existing handlers via per-domain dispatchers. The schema stays small enough (~5k tokens) to remain un-deferred.
  • Token-efficient output — render primitives (renderHeader / renderRows / renderKV / indentBody), a compact default format, indentation instead of code fences, and ls / ps command-output compressors ahead of truncation.
  • New capabilitiesssh_find (timeout-bounded, match-capped remote search), ssh_run script / detach / job-status / job-kill (chained commands plus detached background jobs), pooled-connection reuse without a per-call liveness probe.
  • Adoption layer — selling tool descriptions that name the raw ssh / scp / rsync each tool replaces, a project rule file entry, and a fail-open PreToolUse Bash-nudge hook.

Hardening pass

A strict by-the-book review of the consolidated surface — treating the pre-existing handler code as untrusted — found and fixed, before merge:

  • Critical — a ssh_run action: fleet call OOM-crashed the whole server on the production group-object shape (infinite loop in pMap); ssh_alert_setup action: check never fired because the threshold evaluator read field names the health-check producer never emits.
  • High — unstructured crash on an unknown fleet group name; ssh_file action: edit treated literal old_text as a regex (silent wrong-replacement on a write path); the ssh_plan approval-token gate was bypassable by a downward risk override.
  • Medium / Low — a known_hosts host match using substring matching; a leaked follow-session map; dead/misleading args; a dead, command-injection-prone deploy-helper.js module (deleted); stale comments.

Every finding was independently re-verified closed with no regression.

Test Plan

  • npm test — 1070 passing, 0 failing across 59 suites
  • ./scripts/validate.sh — JS syntax, MCP server startup, no .env leak
  • node --check src/index.js — 13-tool registration valid
  • Smoke-test the 13 tools against a live configured server
  • Confirm the 13-tool schema stays un-deferred (no tool-search fallback)

hunchom added 30 commits May 16, 2026 22:38
Consolidate the 51-tool MCP surface into 13 fat verb-tools with a token-efficient plain-text output model and an adoption strategy. Incorporates findings from a 4-agent design review covering tokens, speed, UX, and architecture.
Plan 1 of 5: pre-build schema gate plus four additive render primitives (renderHeader, indentBody, renderKV, renderRows). Additive-only, zero regression risk.
Plan 2 of 6: rewrite defaultRender and renderMarkdown onto the Plan 1 primitives, replace fences with indentation, make compact the default format.
Plan 3 of 6: command-output compressor module (ls, ps) wired into formatExecResult ahead of truncation, with a raw:true bypass.
Seven plan files: the 12-tool dispatcher facade (3), new capabilities — remote search, script/detach jobs, connection reuse (3), and adoption (1).
hunchom added 8 commits May 17, 2026 03:34
- ssh_docker description dropped a `compose` action absent from both the
  schema enum and the handler; ssh_fleet description was missing the
  `command_alias` action the enum declares
- remove an orphaned `3. Custom` list item the TOOL_MANAGEMENT.md
  overview rewrite left behind
- ssh-bash-nudge.mjs resolves the .env path via fileURLToPath rather
  than URL.pathname, which is not percent-decoded
- ssh_run schema marks server optional -- it was required, which made
  the fleet action (group-targeted, no server) uninvokable through MCP
- ssh_db list ships a renderDbList renderer -- without it the databases
  array collapsed to one JSON line via defaultRender
- de-stale README tool/test counts to the v4 13-tool surface
- correct the ssh-docker.js header comment (compose is unadvertised)
Comment thread src/fleet-adapters.js
result = deps.addServersToGroup(name, members);
output = `[ok] Group '${name}' members: ${result.servers.join(', ')}`;
} else {
result = deps.updateGroup(name, { description });
hunchom added 20 commits May 17, 2026 04:51
buildTcpCommand's /dev/tcp fallback interpolated the model-supplied
target_host raw into a bash -c line -- a hostile value such as
`x/$(cmd)/y` reached a remote shell. Add a SAFE_HOST_RE allowlist
(hostname / IPv4 / IPv6 chars only): handleSshPortTest rejects a bad
host before connecting, and buildTcpCommand throws on one. Bump the
README test count to 1031.
Every ssh_db handler hard-fails without a valid db_type, but the
dispatcher REQUIRED map omitted it -- a missing db_type produced the
handler's vaguer "unsupported db_type: undefined" instead of a clean
dispatcher-level "action requires: db_type". Add db_type to all four
action entries.
Tighten SAFE_HOST_RE so target_host cannot begin with '-'. A real
hostname or IP never does, and a dash-prefixed value risks being read
as a flag by nc/getent. Defense-in-depth on top of the existing
shell-metacharacter allowlist.
stepRisk() honored any caller-supplied risk value, including one below
the action table risk. A model-authored step {action:'exec_sudo',
risk:'low'} classified as low, so highestRisk() never saw high and the
approve-token gate was skipped -- a privileged step ran with no token.

Compute the table risk first; return the override only when it ranks
strictly higher. Downward overrides are ignored.
Three defects in SSHSessionV2:

- _drainWaiters resolved only the head waiter per stream event. Two
  sentinels arriving in one data chunk left the second waiter hanging
  to timeout. Now loops, draining every head waiter already buffered.

- onStderr fed a SECOND stream into the SAME StringDecoder as stdout;
  a multibyte char split across an interleaved boundary corrupted.
  client.shell() folds stderr into the pty stream -- the stderr
  listener was dead code, now removed.

- renderSessionSend wrapped stdout/stderr in ```text fences, which
  break when the payload contains a ``` line. Switched to indentBody,
  consistent with every other v4 renderer.
getDefaultConfig() still emitted the seven pre-v4 groups (core,
sessions, monitoring, backup, database, advanced, gamechanger). The
real v4 groups are core/ops/advanced. In custom mode, isToolEnabled
resolved a tool to ops/advanced, found no such key in the stale
config, and fell through to return true -- silently re-enabling tools
a user had disabled after upgrade. Default groups are now the three
real v4 keys.
isHostKnown / getCurrentHostKey tested line.includes(hostEntry), so
example.com matched a notexample.com line, an example.com.evil.net
line, or a coincidental hit inside the base64 key body -- and never
matched hashed |1| entries. Both feed the live connect() host-key
verifier, so a mismatch silently fell through to TOFU re-acceptance.

Now parse each line and compare host (and port) exactly, reusing
parseKnownHostLine from key-tools.js. Hashed entries never match.
handleSshExecuteGroup treated resolveGroup's result as an array, but the
production DEPS.resolveGroup returns { name, servers:[...] } (or null).
pMap then read .length on a non-array -> undefined -> the worker guard
i>=n was never true -> infinite loop -> heap OOM -> the whole MCP server
process crashed instead of returning a structured error.

- handleSshExecuteGroup: normalize resolveGroup result, accepting both a
  bare array and the { name, servers } object; empty/null -> structured fail.
- pMap: non-array items degrades to an empty result instead of looping.
- DEPS.resolveGroup: wrap getGroup in try/catch; an unknown group name
  (getGroup throws "Group 'X' not found") now returns null, which the
  handler renders as a clean "group has no servers" fail, not a raw crash.

Tests exercise the real production object shape and the null/unknown-group
path; the prior fleet tests stubbed resolveGroup as a bare array and so
missed both crashes entirely.
The ssh_file schema describes old_text as plain "Text to replace", so a
caller supplies literal text. But the edit action mapped it straight into
a patch { find } and applyPatches compiled find with new RegExp(). Regex
metacharacters in literal text (. ( [ * ? etc.) were interpreted as regex:
a literal-looking edit silently patched the wrong span, or threw
"invalid patch regex" on an unbalanced bracket -- corrupting remote files.

- applyPatches: honor a per-rule literal:true flag. Under it, find is
  regex-escaped before compiling and $-sequences in the replacement stay
  literal (no accidental capture-group backreferences).
- ssh_file edit: build the patch rule with literal:true so old_text /
  new_text are matched and substituted verbatim.

Direct patch[] callers (ssh_edit with explicit regex rules) are unchanged
-- literal defaults off. Tests cover metachar substrings, an unbalanced
bracket that no longer throws, and literal $-sequences in the replacement.
The deploy dispatch forwarded permissions and owner to handleSshDeploy,
but that handler never destructures them and they are absent from the
ssh_file schema -- pure dead forwarding. Meanwhile handleSshDeploy does
read rollback_hook (run after a rollback to restore service) yet it was
not in the schema, so a caller had no way to set a real feature.

- ssh_file deploy dispatch: stop forwarding the dead permissions/owner.
- ssh_file inputSchema: add rollback_hook (optional string).
The header claimed those branches exec "like handleSshExecute", but they
call streamExecCommand with raw:true, skipping the OS timeout-wrapper
that handleSshExecute's non-raw path applies. Comment-only.
compressLs always drops the single `total N` header line, so the
shared footer's "1 line compressed" wrongly implied a content row was
hidden. compressLs now supplies its own footer note ("ls total-line
dropped"); the dispatcher honors a compressor-supplied note over the
generic line-count footer.
buildDeploymentStrategy / detectDeploymentNeeds interpolated sudoPassword
and remotePath raw into shell command strings -- command injection plus a
credential on the pipeline. The module was imported by index.js but none
of its exports were ever called; the live deploy path (handleSshDeploy)
does not use it. Removing the file and its dead import block.
evaluateThresholds read metrics.cpu.usage_percent, memory.used_percent
and disk[].used_percent -- field names handleSshHealthCheck never emits.
Every read yielded undefined -> NaN -> Number.isFinite false, so the
threshold branch was skipped and check reported "ok" on a server at
99% load.

The real producer (monitoring-tools parseTopCpu/parseFreeMem/parseDf):
- cpu has no aggregate usage field, only { user_pct, system_pct,
  idle_pct, iowait_pct } -> derive usage as 100 - idle_pct
- memory exposes used_pct (not used_percent)
- disk rows expose used_pct + device + mount (not filesystem/used_percent)

Rewrite the three reads to those names; keep the Number.isFinite guard
so a genuinely-missing metric still skips cleanly. Correct the stale
comments that cited the wrong contract.

Tests: the existing evaluateThresholds cases fed fabricated
usage_percent/used_percent shapes that passed only because they matched
the buggy reader. Re-point them at the real field names and add two
end-to-end cases that pipe a genuine handleSshHealthCheck result
(busy host / quiet host) through evaluateThresholds, asserting alerts
fire / stay silent -- this guards against producer/consumer field drift.
The overview path JSON.parse(hc.content[0].text)'d the delegated
health_check payload with no try/catch. It cannot throw today --
health_check is always invoked with format:'json' -- but a future
shape change would surface as an unhandled SyntaxError out of
handleSshMonitor instead of a structured failure.

Wrap the parse; on failure return a structured ssh_monitor fail
("overview unavailable: ... unparseable payload") so the tool
degrades gracefully.

Add an injectable _healthCheck delegate (defaults to the real
aggregator, production never overrides it) so the parse guard is
genuinely testable. Tests: a happy overview path, a connection
failure, and an injected garbage-payload delegate that asserts the
tool degrades to a structured failure rather than throwing -- this
last one fails without the try/catch.
The module-level sessions Map dropped an entry only on an explicit
ssh_tail_stop. A follow-start whose stream closed (remote ended,
connection dropped) but was never explicitly stopped left its state
in the Map for the server's lifetime. The 1 MB ring buffer bounds
per-session memory, but closed sessions were never reaped.

Track readOffset (highest delivered offset) and closedAt per session.
reapClosedSessions() drops a closed session once its buffer is fully
read, or once it has been closed past a grace period even if unread
(the stream is dead). It runs on every follow-start and follow-read.

handleSshTailRead's opening sweep excludes the session it is about to
serve so the caller still gets one final closed:true read; that
session is then reaped at the end of the call.

Tests: a closed + fully-read session is reaped on the next read with
no explicit stop (fails without the fix), a closed session past the
grace window is reaped even if never read, and an open session
survives any sweep.
creds() in the ssh_db dispatcher forwarded host and port to all four
db handlers, and the ssh_db inputSchema declared them -- but no
db-tools handler or command builder reads either. buildMongoConnectionUri
is the only place with host/port parameters and its sole caller never
passes them, so every db op always hits the local socket. A caller
passing host/port for a remote DB silently got the local one.

Remove host/port from creds() and from the ssh_db inputSchema, and
drop the host/port assertions from the dispatcher test (user/password
forwarding still covered). Not wired through -- that is a feature,
out of scope. Re-add only alongside a handler that honors them.
@hunchom hunchom merged commit b333600 into main May 20, 2026
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants