Skip to content

feat(workspace-agent): add Hono service for sandboxed git operations#674

Merged
marcusrbrown merged 1 commit into
mainfrom
feat/gateway-workspace-agent
May 24, 2026
Merged

feat(workspace-agent): add Hono service for sandboxed git operations#674
marcusrbrown merged 1 commit into
mainfrom
feat/gateway-workspace-agent

Conversation

@marcusrbrown
Copy link
Copy Markdown
Collaborator

Why

The gateway daemon needs to drive git operations (clone, fetch, checkout) inside a sandboxed workspace container without ever holding raw GitHub tokens itself, and without mounting /var/run/docker.sock to exec into the workspace. The clean solution is a tiny HTTP service inside the workspace that the gateway calls across the internal compose network.

This is PR C of the four-PR series implementing /add-project (channel-repo binding). PR A (bindings store) and PR B (GitHub App auth) already landed.

What

A new apps/workspace-agent/ sub-app: a Hono service that runs inside the workspace container and exposes POST /clone plus GET /healthz. The contract is deliberately narrow:

POST /clone {owner, repo, token} -> {ok, path, commit}
GET  /healthz                    -> {ok: true}

Security guards

Every guard is enforced by a test. The cluster is what makes this PR worth treating as security-sensitive even though the surface is small:

  • Token via GIT_ASKPASS temp script — never in argv or URL (ps can't see it)
  • GIT_TRACE=0, GIT_TRACE_PACKET=0, GIT_TRACE_PERFORMANCE=0, GIT_CURL_VERBOSE=0 set in subprocess env
  • execFile (promisified); no shell interpolation
  • URL hardcoded to https://github.com/{owner}/{repo}.git — caller cannot redirect git anywhere else
  • -c credential.helper= (empty) defeats any operator-side credential helper that might intercept and persist the token
  • repoPath is NOT in the request body — agent derives it internally as /workspace/repos/{owner}/{repo} after sanitizing
  • owner/repo sanitized against [A-Za-z0-9._-]+, rejecting .., /, \
  • Askpass tempfile is chmod 0600 and deleted in finally
  • Credential scrubbing strips x-access-token:* from any error message

Layout

apps/workspace-agent/
├── package.json, tsconfig.json, tsdown.config.ts
├── AGENTS.md, README.md
└── src/
    ├── main.ts         (entry: starts @hono/node-server on port 9100)
    ├── server.ts       (Hono app factory — pure, for testability)
    ├── server.test.ts  (17 tests)
    ├── clone.ts        (execFile + askpass orchestration)
    ├── clone.test.ts   (18 tests)
    ├── sanitize.ts     (sanitizeOwner, sanitizeRepo, validateTokenShape)
    ├── sanitize.test.ts (17 tests)
    └── types.ts        (request/response shapes)

52 tests pass. Lint and types clean. Build produces dist/.

Out of scope

Explicitly NOT in this PR (PR D scope):

  • Gateway-side workspace-api/ client that calls this service
  • Compose stack integration (the agent image, network, port-bind, internal hostname)
  • fetch and checkout endpoints — PR C is the smallest useful clone-only surface
  • Repo-exists cleanup contract — returns 409 for now; PR D defines what callers do with it

Idempotency

If /workspace/repos/{owner}/{repo} already exists, the handler returns 409 with error: 'repo-exists'. PR D will own the decision of whether to delete-and-reclone vs fetch-and-reset based on the operator flow.

Plan

docs/plans/2026-05-23-001-feat-gateway-unit-5-add-project-plan.md PR C section.

@marcusrbrown marcusrbrown requested a review from fro-bot as a code owner May 24, 2026 04:35
@marcusrbrown marcusrbrown force-pushed the feat/gateway-workspace-agent branch from c377ee8 to 8e2f1eb Compare May 24, 2026 05:15
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: CONDITIONAL

Can merge after addressing the blocking SIGTERM handler conflict.


Blocking issues

1. Double SIGTERM handler causes graceful drain bypass (, )

registers its own / handlers at module load time. registers a second set. Node.js fires ALL listeners for a signal. When SIGTERM arrives with no in-flight clones, resolves immediately and calls — racing against (and winning over) 's 25-second drain window. In-flight HTTP requests get killed without completing.

Fix: remove the module-level / handlers from . Export and so can call them inside its own shutdown function before draining.

The handler in (line 124-126) is fine to keep since it's synchronous and process-exit cleanup.


Non-blocking concerns

1. unnecessarily in env ()

(including ) is passed to both and 8e2f1eb. The rev-parse call is local and needs no auth. Not exploitable in this context, but unnecessary token exposure in subprocess env.

2. Wildcard case in ()

The arm returns the token for any unknown prompt. With and this is unreachable in practice, but if git ever adds a new prompt type, the token silently leaks. An explicit fallthrough with would be tighter.

3. returns HTTP 500 ()

A timeout imposed by this service is semantically a 408 or 504, not 500 (which implies an internal server error). Non-breaking since PR D will handle this error code by name, not status code — but worth aligning before the client is written.

4. check uses string prefix comparison ()

/

This is correct as written (the suffix prevents from matching ). Just flagging it's load-bearing; any future refactor of should preserve the trailing slash.


Missing tests

  • No test for the SIGTERM/SIGINT handler behavior (the blocking issue above)
  • No test for the double-entry guard in () — while trivial, main.ts is untested entirely
  • No test covering not being passed as the wildcard arm of askpass ( case returns token — untestable from the outside, but at least a negative test that non-Username/Password prompts don't appear in stderr)
  • No integration test for the correctly serializing concurrent same-repo requests (thundering herd scenario under the semaphore)

These are all low-priority given the comprehensive coverage of the critical paths.


Risk assessment (MED):

The security surface is well-controlled — token never in argv, scrubbing in place, path confinement correct, atomic clone correct. The one MED-risk item is the SIGTERM race: in production this means a or could kill the agent mid-clone, leaving a partial in . The block handles cleanup of on normal exit, but if fires from the clone.ts handler while a clone is executing, the block does NOT run (process exits immediately). This could leave orphaned temp dirs on disk — low blast radius (disk space, no security impact) but worth fixing before this runs in production.

Overall the implementation is thorough and the security invariants are sound. One structural fix required before merge.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26352705103
Cache hit
Session ses_1a795c602ffeicuVVGmCSizZjQ

Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: CONDITIONAL

Can merge after addressing the blocking SIGTERM handler conflict.


Blocking issues

1. Double SIGTERM/SIGINT handlers cause graceful drain bypass (clone.ts lines 114-123, main.ts lines 48-49)

clone.ts registers its own SIGTERM and SIGINT handlers at module load time. main.ts registers a second set. Node.js fires ALL listeners for a signal. When SIGTERM arrives with no in-flight clones, asyncCleanupAllAskpassDirs() resolves immediately and calls process.exit(0) — racing against (and winning over) main.ts's 25-second drain window. In-flight HTTP requests get killed without completing.

More critically: if process.exit(0) fires from the clone.ts handler while a clone is mid-execution, the finally block in executeCloneInner does not run. Orphaned .tmp-{repo}-{suffix} directories survive under /workspace/repos/{owner}/.

Fix: remove the module-level SIGTERM/SIGINT handlers from clone.ts. Export asyncCleanupAllAskpassDirs and syncCleanupAskpassDirs so main.ts can call them inside its own shutdown() before calling server.close(). The process.on('exit', ...) sync handler in clone.ts is fine to keep.


Non-blocking concerns

1. GITHUB_TOKEN unnecessarily in rev-parse subprocess env (clone.ts line 388)

spawnEnv (including GITHUB_TOKEN) is passed to both git clone and git rev-parse HEAD. The rev-parse call is purely local and requires no authentication. Not exploitable here, but unnecessary token exposure in subprocess env.

2. Wildcard arm in askpass script returns token for unknown prompts (clone.ts line 295)

The catch-all arm returns GITHUB_TOKEN for any prompt string not explicitly matched. With GIT_TERMINAL_PROMPT=0 and credential.helper= this is unreachable in practice, but an explicit exit 1 fallthrough would be tighter.

3. clone-timeout returns HTTP 500 (clone.ts line 335)

A timeout imposed by this service is semantically a 408 or 504, not 500 (internal server error). Non-breaking since PR D routes on the error string, but worth aligning before the client is written.


Missing tests

  • No test covering SIGTERM handler interaction between clone.ts and main.ts (the blocking issue above)
  • main.ts is entirely untested (bootstrapping and drain logic)
  • No test for concurrent same-repo requests queueing through the per-repo lock (only semaphore overload path is exercised)

Risk assessment (MED): likelihood of regression, security exposure, or blast radius

Security invariants are sound: token never in argv, credential scrubbing in place, path confinement correct, atomic clone pattern correct. The MED risk is the SIGTERM race: a docker stop or compose down bypasses the drain window and may leave orphaned temp dirs. Blast radius is disk space only (no security impact), but should be fixed before production deployment.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26352705103
Cache hit
Session ses_1a795c602ffeicuVVGmCSizZjQ

@fro-bot
Copy link
Copy Markdown
Owner

fro-bot commented May 24, 2026

Note

Review already submitted in Run 26352705103 (CHANGES_REQUESTED). This run (26352875100) is a duplicate trigger on the same PR event — no new findings.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26352875100
Cache hit
Session ses_1a795c602ffeicuVVGmCSizZjQ

Add the workspace-agent app that runs inside the workspace container and
exposes a POST /clone endpoint for sandboxed git clones. The gateway
daemon calls it across the internal compose network so credentials never
cross the docker socket and the workspace never receives raw tokens via
argv or URL.

The contract:
- POST /clone {owner, repo, token} -> {ok, path, commit}
- GET  /healthz                    -> {ok: true}

Security and reliability guards (each enforced by a test):
- Askpass tempfile created via mkdtemp + fs.open with O_EXCL (race-free,
  0700 dir, 0600 file), never a predictable /tmp path
- Token passed to askpass via env var GITHUB_TOKEN; script body contains
  no secret literal; case-statement returns x-access-token for username
  prompts and the IAT for password prompts
- 60s execFile timeout via AbortController; clone-timeout error code
- Atomic clone via .tmp-{repo}-{random} subdir + fs.rename on success;
  partial-clone debris removed in finally
- Per-repo lock serializes concurrent clones for the same owner/repo
- Global concurrency cap (5 active, 50 queued) returns 503 overloaded
- Body-size limit 4 KB before c.req.json(); rejects >4096 or missing
  Content-Length with 413 body-too-large
- HEAD SHA failure returns head-resolution-failed (never ok: true with
  commit: unknown)
- Sanitize rejects "." and ".." explicitly in addition to the safe-segment
  regex; .github and repo.git remain valid
- Filesystem failures classified as disk-full / permission-denied /
  too-many-files distinct from clone-failed
- GIT_TRACE / GIT_TRACE_PACKET / GIT_TRACE_PERFORMANCE / GIT_CURL_VERBOSE
  forced to 0 in subprocess env
- execFile (not exec); URL hardcoded to https://github.com/{owner}/{repo}.git
- credential.helper= empty defeats operator-side credential helpers
- repoPath not in request body; agent derives from /workspace/repos/{owner}/{repo}
- Active controllers tracked for graceful shutdown via SIGTERM/SIGINT

Tests: 81 across sanitize, clone, server.

@hono/node-server pinned to 1.19.14 to clear GHSA-wc8c-qw6v-h7f6 (high)
and GHSA-92pp-h63x-v22m (moderate) which affected the unused serveStatic
middleware.
@marcusrbrown marcusrbrown force-pushed the feat/gateway-workspace-agent branch from 49b98d4 to 2d51e68 Compare May 24, 2026 05:36
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: PASS

All blocking issues from the previous review have been addressed in commit 2d51e68.

Blocking issues

None — all four items resolved:

  1. SIGTERM handler conflictclone.ts no longer registers SIGTERM/SIGINT at module load. asyncCleanupAllAskpassDirs is exported and called from main.ts's shutdown() before server.close(). The process.on('exit', ...) sync handler remains correctly in clone.ts.

  2. GITHUB_TOKEN in rev-parse env — destructured out via const { GITHUB_TOKEN: _GITHUB_TOKEN, ...localGitEnv } = spawnEnv before the rev-parse call (clone.ts line 381).

  3. Wildcard askpass arm — changed from printf GITHUB_TOKEN to *) exit 1 ;;, failing closed on unknown prompts.

  4. clone-timeout status code — corrected to 504 (gateway timeout).

Non-blocking concerns

None.

Missing tests

None — all four fixes have corresponding regression tests:

  • clone.ts — no module-level SIGTERM/SIGINT handlers verifies zero signal listeners from the module
  • executeClone — rev-parse env omits GITHUB_TOKEN asserts the key is absent in the rev-parse call's env
  • executeClone — askpass wildcard arm fails closed asserts no printf/GITHUB_TOKEN in the wildcard line
  • POST /clone — clone-timeout returns 504 verifies the HTTP status
  • executeClone — per-repo lock serialization now exercises the concurrent same-repo queuing path

Risk assessment (LOW): likelihood of regression, security exposure, or blast radius

Implementation is clean. Security invariants verified end-to-end: token never in argv, scrubbing in place, path confinement correct, drain lifecycle correct. Ready to merge.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26353075716
Cache hit
Session ses_1a795c602ffeicuVVGmCSizZjQ

@marcusrbrown marcusrbrown merged commit 201b294 into main May 24, 2026
10 checks passed
@marcusrbrown marcusrbrown deleted the feat/gateway-workspace-agent branch May 24, 2026 16:23
@fro-bot fro-bot mentioned this pull request May 24, 2026
48 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants