feat(harness): web::fetch worker with SSRF-hardened HTTP client #202
A standalone harness worker exposing `web::fetch` so agents fetch URLs through a structured, guarded envelope instead of reaching for `shell::exec` + curl.

- SSRF guard (`ssrf.ts`): resolve-once / validate-all / pin-to-IP; blocks private, loopback (configurable), link-local, and cloud-metadata ranges, including `::ffff:`-mapped IPv4 in both dotted and hex forms; each redirect hop is re-validated against the resolved IP.
- Transport (`fetch.ts`): `node:http`/`https` with a pinned DNS lookup + `servername` so the validated IP is the one actually dialed (no DNS-rebind window); strips `Authorization`/`Cookie` on cross-host redirects and https->http downgrades; byte-capped, timeout-bounded response reader.
- Schemas (`schemas.ts`): zod ingress + JSON-schema export; case-insensitive method; `json` payload auto-stringify; text/base64/json response formats.
- Standalone worker: `main.ts` + `iii.worker.yaml` + `register.ts` (registers `web::fetch`); not yet wired into the combined `index.ts`.

Tests: 62 pass (ssrf unit, fetch guard/helper surface, handler, and a loopback `http.createServer` integration suite covering pinning, per-hop re-validation, byte cap, timeout, POST). `tsc -b` clean; biome (2.4.10) clean.
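To make the mapped-IPv4 point concrete, here is a minimal sketch of an address classifier in the spirit of `ssrf.ts`, applied to addresses after DNS resolution. It is not the PR's code: `blockedReason`, the `allowLoopback` option name, and the range list (an illustrative subset of unspecified, private, shared, loopback, link-local/metadata, and unique-local ranges) are assumptions. The key idea is that `::ffff:127.0.0.1` and `::ffff:7f00:1` expand to the same eight 16-bit groups, so the embedded IPv4 is re-checked regardless of spelling.

```typescript
// Sketch of an SSRF address classifier for already-resolved IPs.
// Names and the range list are illustrative, not the PR's ssrf.ts API.
type BlockOptions = { allowLoopback?: boolean };

// Parse dotted-quad IPv4 into an unsigned 32-bit value, or null if malformed.
function parseIPv4(s: string): number | null {
  const parts = s.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    n = n * 256 + Number(p); // arithmetic, not bitwise ops, so the result stays unsigned
  }
  return n;
}

// Expand an IPv6 literal (optionally ending in dotted IPv4) into eight 16-bit groups.
function parseIPv6(s: string): number[] | null {
  let str = s.toLowerCase().split("%")[0]; // drop a zone id such as %eth0
  const lastColon = str.lastIndexOf(":");
  if (lastColon === -1) return null;
  const tail = str.slice(lastColon + 1);
  if (tail.includes(".")) {
    // Rewrite ::ffff:127.0.0.1 as ::ffff:7f00:1 so both spellings parse identically.
    const v4 = parseIPv4(tail);
    if (v4 === null) return null;
    const high = Math.floor(v4 / 65536).toString(16);
    const low = (v4 % 65536).toString(16);
    str = `${str.slice(0, lastColon + 1)}${high}:${low}`;
  }
  const halves = str.split("::");
  if (halves.length > 2) return null;
  const groupsOf = (h: string): string[] => (h === "" ? [] : h.split(":"));
  const head = groupsOf(halves[0]);
  const rest = halves.length === 2 ? groupsOf(halves[1]) : [];
  const missing = 8 - head.length - rest.length;
  if (halves.length === 1 ? missing !== 0 : missing < 1) return null;
  const out: number[] = [];
  for (const g of [...head, ...new Array<string>(missing).fill("0"), ...rest]) {
    if (!/^[0-9a-f]{1,4}$/.test(g)) return null;
    out.push(parseInt(g, 16));
  }
  return out;
}

// Illustrative IPv4 deny list: [network, prefix length, reason].
const V4_BLOCKS: Array<[string, number, string]> = [
  ["0.0.0.0", 8, "unspecified"],
  ["10.0.0.0", 8, "private"],
  ["100.64.0.0", 10, "shared"],
  ["127.0.0.0", 8, "loopback"],
  ["169.254.0.0", 16, "link-local/metadata"], // includes 169.254.169.254
  ["172.16.0.0", 12, "private"],
  ["192.168.0.0", 16, "private"],
];

function inCidr(n: number, base: number, bits: number): boolean {
  const size = 2 ** (32 - bits);
  return Math.floor(n / size) === Math.floor(base / size);
}

function v4Reason(n: number, allowLoopback: boolean): string | null {
  for (const [base, bits, reason] of V4_BLOCKS) {
    if (inCidr(n, parseIPv4(base) as number, bits)) {
      return reason === "loopback" && allowLoopback ? null : reason;
    }
  }
  return null;
}

function zeroRun(g: number[], from: number, to: number): boolean {
  return g.slice(from, to).every((x) => x === 0);
}

// Why an already-resolved IP must not be dialed, or null if it is allowed.
function blockedReason(ip: string, opts: BlockOptions = {}): string | null {
  const allowLoopback = opts.allowLoopback ?? false;
  const v4 = parseIPv4(ip);
  if (v4 !== null) return v4Reason(v4, allowLoopback);
  const g = parseIPv6(ip);
  if (g === null) return "unparseable"; // fail closed on anything we cannot parse
  if (zeroRun(g, 0, 5) && g[5] === 0xffff) {
    return v4Reason(g[6] * 65536 + g[7], allowLoopback); // ::ffff:0:0/96, dotted or hex
  }
  if (zeroRun(g, 0, 8)) return "unspecified";
  if (zeroRun(g, 0, 7) && g[7] === 1) return allowLoopback ? null : "loopback";
  if ((g[0] & 0xffc0) === 0xfe80) return "link-local";
  if ((g[0] & 0xfe00) === 0xfc00) return "unique-local"; // fc00::/7, e.g. fd00:ec2::254
  return null;
}
```

Failing closed on unparseable input matters more than the exact range list: anything the parser can't understand is treated as blocked rather than dialed.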
skill-check — worker0 verified, 13 skipped (no docs/).
Note: 17 stale rendered artifact(s) detected on main, unrelated to this PR. This PR is fine; the drift was already there. A maintainer should open a chore PR to re-render these.
Single self-contained `index.md` skill for the web worker, mirroring the sandbox skill format: callable id, when-to-use table, live-schema pointer, the request/response envelope, the json-vs-body and `response_format` rules, truncation-vs-error semantics, and the SSRF guard (blocked ranges incl. `::ffff:`-mapped IPv4, pin-to-IP, per-hop redirect re-check, cross-origin auth stripping). Lives at `harness/src/web/skills/` alongside the worker source, same convention as coder/database/shell.
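The pin-to-IP item in that guard list can be sketched with Node's `lookup` hook on request options: once the guard has approved an address, a custom lookup hands that same address back to the socket, while `host`/`servername` keep the original hostname for the Host header, SNI, and certificate checks. This is a minimal sketch, not the PR's `fetch.ts`; `pinnedLookup` and `pinnedRequestOptions` are hypothetical names. It answers both the single-address and the `all: true` forms of the lookup callback, since recent Node versions may request the latter when auto-selecting address families.

```typescript
import type { RequestOptions } from "node:https";
import type { LookupFunction } from "node:net";

// Returns a lookup hook that skips DNS and always answers with the pre-validated IP.
function pinnedLookup(validatedIp: string): LookupFunction {
  const family = validatedIp.includes(":") ? 6 : 4;
  return (_hostname, options, callback) => {
    if (options.all) {
      callback(null, [{ address: validatedIp, family }]);
    } else {
      callback(null, validatedIp, family);
    }
  };
}

// Options that dial validatedIp but keep the URL's hostname for Host, SNI and cert checks.
function pinnedRequestOptions(url: URL, validatedIp: string): RequestOptions {
  return {
    protocol: url.protocol,
    host: url.hostname,
    port: url.port || undefined,
    path: `${url.pathname}${url.search}`,
    servername: url.hostname, // TLS identity is checked against the name, not the IP
    lookup: pinnedLookup(validatedIp),
  };
}
```

Passing the result to `https.request` (or `http.request` for `http:` URLs) means no second DNS query happens between validation and connect. IP-literal URLs never invoke `lookup` at all and should not set `servername` to an IP; this sketch doesn't special-case them.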
Force-pushed from `cce3a64` to `47533f5`.
Rewrite for the actual consumer (an agent calling `web::fetch` on the first try), applying DX principles to the doc:

- Lead with the minimal call (`url` is the only required field).
- Add the ok-vs-status rule: HTTP 4xx/5xx are `ok:true` (a completed fetch); `ok:false` means only a fetch-level failure. Fixes the prior misleading `status:502` in the `ok:false` example (`executeFetch` never sets `status` on errors).
- Add an error -> cause -> fix table for every `ok:false` code.
- Decision table up top; request fields as a table with defaults + gotchas.
- Document verified behaviors: response header keys are lower-cased, `set-cookie` joined with `', '`; GET/HEAD ignore `body`/`json`; `response_format: json` doesn't parse a truncated body.
- Tighten prose for token budget (system-prompt injection).
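A consumer-side sketch of two rules from that list: the ok-vs-status split and the response-header normalization. The type shapes are simplified assumptions drawn from the envelope in the PR summary (the real result has more fields, elided there as `…`), and `describeResult` / `normalizeHeaders` are hypothetical helpers, not functions from the PR.

```typescript
// Simplified result envelope (assumed shape; extra fields omitted).
type FetchOk = {
  ok: true;
  status: number; // any HTTP status, including 4xx/5xx
  headers: Record<string, string>;
  body: string;
};
type FetchErr = {
  ok: false;
  error: string; // fetch-level failure code; there is no `status` here
  message: string;
};
type FetchResult = FetchOk | FetchErr;

// ok:false means the fetch itself failed; an HTTP 404 or 500 is still ok:true.
function describeResult(r: FetchResult): string {
  if (!r.ok) return `fetch failed (${r.error}): ${r.message}`;
  return r.status >= 400 ? `server answered with HTTP ${r.status}` : `HTTP ${r.status}`;
}

// Lower-case header names; join multi-valued headers (e.g. set-cookie) with ", ".
function normalizeHeaders(raw: Record<string, string | string[] | undefined>): Record<string, string> {
  const out: Record<string, string> = Object.create(null); // no inherited keys like "constructor"
  for (const [name, value] of Object.entries(raw)) {
    if (value === undefined) continue;
    const key = name.toLowerCase();
    const joined = Array.isArray(value) ? value.join(", ") : value;
    out[key] = key in out ? `${out[key]}, ${joined}` : joined;
  }
  return out;
}
```

One caveat worth keeping in mind when reading `set-cookie`: joining with `", "` is lossy, because a cookie's `Expires` attribute itself contains a comma (e.g. `Wed, 21 Oct 2015 07:28:00 GMT`).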
The `web::fetch` worker (#202) shipped its own files and standalone `main.ts` but was never wired into `src/index.ts`, so `pnpm dev:all` and `start:all` (which run the composite all-in-one process) never started it. Add the registration so the worker boots with the rest. Also backfill the missing `dev:web` / `iii-web` wiring in `package.json`, plus the same gap for `provider-llamacpp`, and add a regression test asserting every runnable worker folder (`src/*/main.ts`) is wired into the composite manifest, dev scripts, and bin entries.
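The regression test's core check can be sketched as a pure function over already-collected inputs. The conventions here are inferred from this commit message and are assumptions: a worker folder `<name>` (one containing `src/<name>/main.ts`) should be registered by `src/index.ts`, have a `dev:<name>` script, and have an `iii-<name>` bin entry. How the real test discovers folders and reads the composite manifest isn't shown, so those inputs are passed in, and `findUnwiredWorkers` is a hypothetical name.

```typescript
interface PackageJsonLike {
  scripts?: Record<string, string>;
  bin?: Record<string, string>;
}

// Returns one message per missing piece of wiring; an empty array means all workers are wired.
function findUnwiredWorkers(
  workerFolders: readonly string[], // names of src/*/ folders that contain a main.ts
  compositeRegistered: ReadonlySet<string>, // worker names registered in src/index.ts
  pkg: PackageJsonLike,
): string[] {
  const problems: string[] = [];
  for (const name of workerFolders) {
    if (!compositeRegistered.has(name)) problems.push(`${name}: not registered in src/index.ts`);
    if (!pkg.scripts?.[`dev:${name}`]) problems.push(`${name}: missing "dev:${name}" script`);
    if (!pkg.bin?.[`iii-${name}`]) problems.push(`${name}: missing "iii-${name}" bin entry`);
  }
  return problems;
}
```

Returning a list of messages rather than a boolean keeps a failing assertion (for example vitest's `expect(problems).toEqual([])`) self-explanatory about which worker is missing what.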
Summary
Adds a standalone harness worker exposing `web::fetch` so the agent can fetch URLs through a structured, server-guarded envelope instead of reaching for `shell::exec` + curl. Scoped entirely to `harness/src/web/` + `harness/tests/web/`; no changes to existing workers.

What it does
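`web::fetch` takes `{ url, method?, headers?, body?, json?, timeout_ms?, max_bytes?, follow_redirects?, response_format? }` and returns `{ ok, status, headers, body, … }` (or `{ ok:false, error, message }`), with size/timeout caps and SSRF protection enforced server-side.

Security model (`ssrf.ts` + `fetch.ts`)

- Resolve once, validate all, pin to IP: the validated IP is the one actually dialed (pinned `lookup` + TLS `servername`), so there's no DNS-rebinding window between check and connect.
- Blocks private, loopback (configurable), link-local, and cloud-metadata ranges, including `::ffff:`-mapped IPv4 in both dotted and hex forms.
- Every redirect hop is re-validated; `Authorization`/`Cookie` are stripped on cross-host redirects and on any `https → http` downgrade.
- Byte-capped response (`bytes_truncated` flag) and per-request timeout.

Shape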
- `schemas.ts`: zod ingress + `zodToJsonSchema` export; case-insensitive method; `json` payload auto-stringify + content-type; `text`/`base64`/`json` response formats.
- `main.ts` + `iii.worker.yaml` + `register.ts`: standalone deployable worker registering `web::fetch`.

Test plan
- `tsc -b` clean
- `vitest run tests/web/` passes 62/62: `ssrf` unit (mapped-IPv4 hex/dotted, all ranges), `fetch` guard + helper surface (`stripCrossOriginAuth`, `readIncomingCapped`), handler, and a real `http.createServer` loopback integration suite (IP pinning, per-hop re-validation, byte cap, timeout, POST/JSON round-trip)
- `biome check` (2.4.10) clean
- `https://` smoke (the servername/cert-identity path is correct by construction and integration-tested over plaintext loopback; a trusted-cert HTTPS server isn't feasible in unit tests)

Notes
Not yet wired into `harness/src/index.ts`; it's a standalone worker (own `main.ts` + manifest), runnable via `node dist/web/main.js`. Wiring it into the bundled harness can be a follow-up if desired.
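The redirect rule from the Security model section (drop `Authorization`/`Cookie` on cross-host hops and on `https → http` downgrades) can be sketched as follows. This is an illustration, not the PR's `stripCrossOriginAuth`, whose signature isn't shown; `stripAuthOnRedirect` is a hypothetical name, and treating a port change as cross-host (by comparing `URL.host`) is an assumption chosen to err on the strict side. Relative `Location` values are resolved against the current URL first.

```typescript
// Hypothetical helper: decide which request headers survive one redirect hop.
function stripAuthOnRedirect(
  headers: Record<string, string>,
  fromUrl: string,
  location: string,
): Record<string, string> {
  const from = new URL(fromUrl);
  const to = new URL(location, from); // a relative Location resolves against the current URL
  const crossHost = from.host !== to.host; // URL.host includes any non-default port
  const downgrade = from.protocol === "https:" && to.protocol === "http:";
  if (!crossHost && !downgrade) return { ...headers };
  const kept: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (lower === "authorization" || lower === "cookie") continue; // credentials don't follow
    kept[name] = value;
  }
  return kept;
}
```

Header stripping is independent of the SSRF check: the new hop's host still has to be resolved and validated before it is dialed.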