feat(workspace-agent): add Hono service for sandboxed git operations#674
Conversation
c377ee8 to
8e2f1eb
Compare
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: CONDITIONAL
Can merge after addressing the blocking SIGTERM handler conflict.
Blocking issues
1. Double SIGTERM handler causes graceful drain bypass (, )
registers its own / handlers at module load time. registers a second set. Node.js fires ALL listeners for a signal. When SIGTERM arrives with no in-flight clones, resolves immediately and calls — racing against (and winning over) 's 25-second drain window. In-flight HTTP requests get killed without completing.
Fix: remove the module-level / handlers from . Export and so can call them inside its own shutdown function before draining.
The handler in (line 124-126) is fine to keep since it's synchronous and process-exit cleanup.
Non-blocking concerns
1. unnecessarily in env ()
(including ) is passed to both and 8e2f1eb. The rev-parse call is local and needs no auth. Not exploitable in this context, but unnecessary token exposure in subprocess env.
2. Wildcard case in ()
The arm returns the token for any unknown prompt. With and this is unreachable in practice, but if git ever adds a new prompt type, the token silently leaks. An explicit fallthrough with would be tighter.
3. returns HTTP 500 ()
A timeout imposed by this service is semantically a 408 or 504, not 500 (which implies an internal server error). Non-breaking since PR D will handle this error code by name, not status code — but worth aligning before the client is written.
4. check uses string prefix comparison ()
/
This is correct as written (the suffix prevents from matching ). Just flagging it's load-bearing; any future refactor of should preserve the trailing slash.
Missing tests
- No test for the SIGTERM/SIGINT handler behavior (the blocking issue above)
- No test for the double-entry guard in () — while trivial, main.ts is untested entirely
- No test covering not being passed as the wildcard arm of askpass ( case returns token — untestable from the outside, but at least a negative test that non-Username/Password prompts don't appear in stderr)
- No integration test for the correctly serializing concurrent same-repo requests (thundering herd scenario under the semaphore)
These are all low-priority given the comprehensive coverage of the critical paths.
Risk assessment (MED):
The security surface is well-controlled — token never in argv, scrubbing in place, path confinement correct, atomic clone correct. The one MED-risk item is the SIGTERM race: in production this means a or could kill the agent mid-clone, leaving a partial in . The block handles cleanup of on normal exit, but if fires from the clone.ts handler while a clone is executing, the block does NOT run (process exits immediately). This could leave orphaned temp dirs on disk — low blast radius (disk space, no security impact) but worth fixing before this runs in production.
Overall the implementation is thorough and the security invariants are sound. One structural fix required before merge.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26352705103 |
| Cache | hit |
| Session | ses_1a795c602ffeicuVVGmCSizZjQ |
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: CONDITIONAL
Can merge after addressing the blocking SIGTERM handler conflict.
Blocking issues
1. Double SIGTERM/SIGINT handlers cause graceful drain bypass (clone.ts lines 114-123, main.ts lines 48-49)
clone.ts registers its own SIGTERM and SIGINT handlers at module load time. main.ts registers a second set. Node.js fires ALL listeners for a signal. When SIGTERM arrives with no in-flight clones, asyncCleanupAllAskpassDirs() resolves immediately and calls process.exit(0) — racing against (and winning over) main.ts's 25-second drain window. In-flight HTTP requests get killed without completing.
More critically: if process.exit(0) fires from the clone.ts handler while a clone is mid-execution, the finally block in executeCloneInner does not run. Orphaned .tmp-{repo}-{suffix} directories survive under /workspace/repos/{owner}/.
Fix: remove the module-level SIGTERM/SIGINT handlers from clone.ts. Export asyncCleanupAllAskpassDirs and syncCleanupAskpassDirs so main.ts can call them inside its own shutdown() before calling server.close(). The process.on('exit', ...) sync handler in clone.ts is fine to keep.
Non-blocking concerns
1. GITHUB_TOKEN unnecessarily in rev-parse subprocess env (clone.ts line 388)
spawnEnv (including GITHUB_TOKEN) is passed to both git clone and git rev-parse HEAD. The rev-parse call is purely local and requires no authentication. Not exploitable here, but unnecessary token exposure in subprocess env.
2. Wildcard arm in askpass script returns token for unknown prompts (clone.ts line 295)
The catch-all arm returns GITHUB_TOKEN for any prompt string not explicitly matched. With GIT_TERMINAL_PROMPT=0 and credential.helper= this is unreachable in practice, but an explicit exit 1 fallthrough would be tighter.
3. clone-timeout returns HTTP 500 (clone.ts line 335)
A timeout imposed by this service is semantically a 408 or 504, not 500 (internal server error). Non-breaking since PR D routes on the error string, but worth aligning before the client is written.
Missing tests
- No test covering SIGTERM handler interaction between
clone.tsandmain.ts(the blocking issue above) main.tsis entirely untested (bootstrapping and drain logic)- No test for concurrent same-repo requests queueing through the per-repo lock (only semaphore overload path is exercised)
Risk assessment (MED): likelihood of regression, security exposure, or blast radius
Security invariants are sound: token never in argv, credential scrubbing in place, path confinement correct, atomic clone pattern correct. The MED risk is the SIGTERM race: a docker stop or compose down bypasses the drain window and may leave orphaned temp dirs. Blast radius is disk space only (no security impact), but should be fixed before production deployment.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26352705103 |
| Cache | hit |
| Session | ses_1a795c602ffeicuVVGmCSizZjQ |
8e2f1eb to
49b98d4
Compare
|
Note Review already submitted in Run 26352705103 (CHANGES_REQUESTED). This run (26352875100) is a duplicate trigger on the same PR event — no new findings. Run Summary
|
Add the workspace-agent app that runs inside the workspace container and
exposes a POST /clone endpoint for sandboxed git clones. The gateway
daemon calls it across the internal compose network so credentials never
cross the docker socket and the workspace never receives raw tokens via
argv or URL.
The contract:
- POST /clone {owner, repo, token} -> {ok, path, commit}
- GET /healthz -> {ok: true}
Security and reliability guards (each enforced by a test):
- Askpass tempfile created via mkdtemp + fs.open with O_EXCL (race-free,
0700 dir, 0600 file), never a predictable /tmp path
- Token passed to askpass via env var GITHUB_TOKEN; script body contains
no secret literal; case-statement returns x-access-token for username
prompts and the IAT for password prompts
- 60s execFile timeout via AbortController; clone-timeout error code
- Atomic clone via .tmp-{repo}-{random} subdir + fs.rename on success;
partial-clone debris removed in finally
- Per-repo lock serializes concurrent clones for the same owner/repo
- Global concurrency cap (5 active, 50 queued) returns 503 overloaded
- Body-size limit 4 KB before c.req.json(); rejects >4096 or missing
Content-Length with 413 body-too-large
- HEAD SHA failure returns head-resolution-failed (never ok: true with
commit: unknown)
- Sanitize rejects "." and ".." explicitly in addition to the safe-segment
regex; .github and repo.git remain valid
- Filesystem failures classified as disk-full / permission-denied /
too-many-files distinct from clone-failed
- GIT_TRACE / GIT_TRACE_PACKET / GIT_TRACE_PERFORMANCE / GIT_CURL_VERBOSE
forced to 0 in subprocess env
- execFile (not exec); URL hardcoded to https://github.com/{owner}/{repo}.git
- credential.helper= empty defeats operator-side credential helpers
- repoPath not in request body; agent derives from /workspace/repos/{owner}/{repo}
- Active controllers tracked for graceful shutdown via SIGTERM/SIGINT
Tests: 81 across sanitize, clone, server.
@hono/node-server pinned to 1.19.14 to clear GHSA-wc8c-qw6v-h7f6 (high)
and GHSA-92pp-h63x-v22m (moderate) which affected the unused serveStatic
middleware.
49b98d4 to
2d51e68
Compare
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: PASS
All blocking issues from the previous review have been addressed in commit 2d51e68.
Blocking issues
None — all four items resolved:
-
SIGTERM handler conflict —
clone.tsno longer registersSIGTERM/SIGINTat module load.asyncCleanupAllAskpassDirsis exported and called frommain.ts'sshutdown()beforeserver.close(). Theprocess.on('exit', ...)sync handler remains correctly inclone.ts. -
GITHUB_TOKENin rev-parse env — destructured out viaconst { GITHUB_TOKEN: _GITHUB_TOKEN, ...localGitEnv } = spawnEnvbefore the rev-parse call (clone.tsline 381). -
Wildcard askpass arm — changed from
printf GITHUB_TOKENto*) exit 1 ;;, failing closed on unknown prompts. -
clone-timeoutstatus code — corrected to 504 (gateway timeout).
Non-blocking concerns
None.
Missing tests
None — all four fixes have corresponding regression tests:
clone.ts — no module-level SIGTERM/SIGINT handlersverifies zero signal listeners from the moduleexecuteClone — rev-parse env omits GITHUB_TOKENasserts the key is absent in the rev-parse call's envexecuteClone — askpass wildcard arm fails closedasserts noprintf/GITHUB_TOKENin the wildcard linePOST /clone — clone-timeout returns 504verifies the HTTP statusexecuteClone — per-repo lock serializationnow exercises the concurrent same-repo queuing path
Risk assessment (LOW): likelihood of regression, security exposure, or blast radius
Implementation is clean. Security invariants verified end-to-end: token never in argv, scrubbing in place, path confinement correct, drain lifecycle correct. Ready to merge.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26353075716 |
| Cache | hit |
| Session | ses_1a795c602ffeicuVVGmCSizZjQ |
Why
The gateway daemon needs to drive git operations (clone, fetch, checkout) inside a sandboxed workspace container without ever holding raw GitHub tokens itself, and without mounting
/var/run/docker.sockto exec into the workspace. The clean solution is a tiny HTTP service inside the workspace that the gateway calls across the internal compose network.This is PR C of the four-PR series implementing
/add-project(channel-repo binding). PR A (bindings store) and PR B (GitHub App auth) already landed.What
A new
apps/workspace-agent/sub-app: a Hono service that runs inside the workspace container and exposesPOST /cloneplusGET /healthz. The contract is deliberately narrow:Security guards
Every guard is enforced by a test. The cluster is what makes this PR worth treating as security-sensitive even though the surface is small:
GIT_ASKPASStemp script — never in argv or URL (pscan't see it)GIT_TRACE=0,GIT_TRACE_PACKET=0,GIT_TRACE_PERFORMANCE=0,GIT_CURL_VERBOSE=0set in subprocess envexecFile(promisified); no shell interpolationhttps://github.com/{owner}/{repo}.git— caller cannot redirect git anywhere else-c credential.helper=(empty) defeats any operator-side credential helper that might intercept and persist the tokenrepoPathis NOT in the request body — agent derives it internally as/workspace/repos/{owner}/{repo}after sanitizingowner/reposanitized against[A-Za-z0-9._-]+, rejecting..,/,\finallyx-access-token:*from any error messageLayout
52 tests pass. Lint and types clean. Build produces
dist/.Out of scope
Explicitly NOT in this PR (PR D scope):
workspace-api/client that calls this servicefetchandcheckoutendpoints — PR C is the smallest useful clone-only surfaceIdempotency
If
/workspace/repos/{owner}/{repo}already exists, the handler returns 409 witherror: 'repo-exists'. PR D will own the decision of whether to delete-and-reclone vs fetch-and-reset based on the operator flow.Plan
docs/plans/2026-05-23-001-feat-gateway-unit-5-add-project-plan.mdPR C section.