Security: add URL allowlist for web_search and web_fetch #19042

Closed
smartprogrammer93 wants to merge 13 commits into openclaw:main from
smartprogrammer93:feat/web-tools-url-allowlist

Conversation

@smartprogrammer93 (Contributor) commented Feb 17, 2026

Summary

  • Problem: web_search and web_fetch had no mechanism to restrict which external domains the agent could reach, making it impossible to run a network-isolated or domain-scoped agent.
  • Why it matters: Operators deploying OpenClaw in restricted environments (corporate proxies, research sandboxes, prompt-injection hardening) need a declarative allowlist enforced at the tool layer — not just at the network layer.
  • What changed: Added an optional tools.web.urlAllowlist config field. When set, both web_search and web_fetch enforce it. When unset, all URLs are allowed (fully backwards compatible).
  • What did NOT change: Fetch/search behaviour, caching semantics, provider selection, or any auth flow.

URL matching rules

| Pattern | Matches |
| --- | --- |
| `example.com` | exact domain only |
| `*.github.com` | all subdomains (including deeply nested) |
| `*` / `*.` | rejected at config parse time |
| `localhost`, `*.localhost`, `*.local`, `*.internal` | rejected at config parse time (the SSRF guard blocks these unconditionally) |
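The matching rules above can be sketched as follows. The PR exports a real matcher (`matchesHostnameAllowlist` in ssrf.ts) whose exact signature is not shown in this description, so `matchesAllowlist` here is a hypothetical stand-in:

```typescript
// Sketch of the matching rules: exact match for bare domains, suffix match
// for "*." wildcards (any subdomain depth, but never the bare apex domain).
function matchesAllowlist(hostname: string, patterns: string[]): boolean {
  const host = hostname.toLowerCase();
  return patterns.some((pattern) => {
    const p = pattern.toLowerCase();
    if (p.startsWith("*.")) {
      const suffix = p.slice(1); // "*.github.com" -> ".github.com"
      // "a.b.github.com" matches; bare "github.com" does not.
      return host.endsWith(suffix) && host.length > suffix.length;
    }
    return host === p; // exact domain only
  });
}
```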

Example config

tools:
  web:
    urlAllowlist:
      - "example.com"
      - "*.github.com"
      - "docs.openclaw.ai"

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

User-visible / Behavior Changes

  • New optional config field: tools.web.urlAllowlist: string[]
  • When configured: web_fetch blocks requests (and redirect targets) not matching the allowlist, returning a structured { error: "allowlist_blocked" } tool result. web_search filters results, citations, and inlineCitations from all providers (Brave, Perplexity, Grok, Kimi, Gemini) post-cache so unfiltered data is stored and re-filtered on every read.
  • When not configured: no change in behaviour.
  • SSRF-blocked hostnames (localhost, localhost.localdomain, metadata.google.internal) and wildcard patterns like *.localhost, *.local, *.internal are rejected at config parse time with a clear error message, since the SSRF guard blocks them unconditionally at the network level.
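The parse-time rejection described above can be sketched as a plain validator. The real checks live in the Zod schema in src/config/zod-schema.agent-runtime.ts; the function name and error strings here are illustrative only:

```typescript
// Hypothetical sketch of the config-parse-time validation for allowlist
// entries. Returns an error message, or null when the pattern is acceptable.
const SSRF_BLOCKED = new Set([
  "localhost",
  "localhost.localdomain",
  "metadata.google.internal",
]);
const BLOCKED_WILDCARDS = new Set(["*.localhost", "*.local", "*.internal"]);

function validateAllowlistEntry(pattern: string): string | null {
  if (pattern.includes("://") || pattern.includes("/")) {
    return `"${pattern}": use a bare domain like example.com, not a URL with protocol or path`;
  }
  if (pattern === "*" || pattern === "*.") {
    return `"${pattern}": wildcard-only patterns are not allowed`;
  }
  if (SSRF_BLOCKED.has(pattern) || BLOCKED_WILDCARDS.has(pattern)) {
    return `"${pattern}": the SSRF guard blocks this hostname unconditionally, so allowlisting it would never take effect`;
  }
  return null;
}
```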

Security Impact (required)

  • New permissions/capabilities? No — this is a restriction mechanism, not a capability expansion.
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? Yes. urlAllowlist narrows the set of URLs reachable by web_fetch and visible via web_search. Risk: none (additive restriction). Mitigation: config is opt-in and defaults to unrestricted.

Implementation details

web_fetch:

  • Allowlist check before fetch (returns structured error immediately).
  • Second check on finalUrl after redirect resolution — throws AllowlistBlockedError (new typed error exported from ssrf.ts) which is caught in execute and returned as { error: "allowlist_blocked" }. The redirect check is post-connection by design: the SSRF guard already validates each redirect hop's DNS at the network level, so the allowlist check is a content-policy gate only.
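The two-stage check can be sketched as below. `AllowlistBlockedError` mirrors the typed error the PR exports from ssrf.ts; `fetchPage` and `isAllowed` are hypothetical stand-ins for the real fetch pipeline and matcher, since their signatures are not shown in this description:

```typescript
// Sketch of web_fetch's two-stage allowlist check: a pre-fetch check on the
// requested URL, then a second check on the resolved finalUrl after redirects.
class AllowlistBlockedError extends Error {
  constructor(public readonly url: string) {
    super(`URL blocked by urlAllowlist: ${url}`);
    this.name = "AllowlistBlockedError";
  }
}

async function guardedFetch(
  url: string,
  allowlist: string[] | undefined,
  fetchPage: (url: string) => Promise<{ finalUrl: string; body: string }>,
  isAllowed: (url: string, allowlist: string[]) => boolean,
): Promise<{ body: string } | { error: "allowlist_blocked" }> {
  // Stage 1: block before any network activity.
  if (allowlist && !isAllowed(url, allowlist)) {
    return { error: "allowlist_blocked" };
  }
  try {
    const page = await fetchPage(url);
    // Stage 2: redirects may land on a different host; re-check finalUrl.
    if (allowlist && !isAllowed(page.finalUrl, allowlist)) {
      throw new AllowlistBlockedError(page.finalUrl);
    }
    return { body: page.body };
  } catch (err) {
    // The typed error is caught in execute and surfaced as a structured
    // tool result, giving both stages the same error shape.
    if (err instanceof AllowlistBlockedError) {
      return { error: "allowlist_blocked" };
    }
    throw err;
  }
}
```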

web_search:

  • applyUrlAllowlistToPayload filters all three citation shapes returned by LLM-based search providers:
    • results: Array<{url}> — Brave, Perplexity-sonar
    • citations: string[] — Perplexity-chat, Grok, Kimi, Gemini
    • inlineCitations: Array<{url}> — Grok inline citations
  • Cache stores unfiltered results; allowlist applied on read so config changes take effect without cache invalidation.
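The payload filtering across the three citation shapes can be sketched as follows. The real helper is `applyUrlAllowlistToPayload`; its exact signature is not shown here, so `filterPayload` and the `SearchPayload` shape are assumptions. Note that `citations` and `inlineCitations` are mapped to a placeholder rather than filtered, so positional [N] references in the response text stay aligned:

```typescript
// Sketch of post-read allowlist filtering over the three citation shapes
// listed above. results is filtered; citations/inlineCitations are replaced
// with a placeholder so [N] indices in the content remain valid.
interface SearchPayload {
  results?: Array<{ url: string }>;        // Brave, Perplexity-sonar
  citations?: string[];                    // Perplexity-chat, Grok, Kimi, Gemini
  inlineCitations?: Array<{ url: string }>; // Grok inline citations
}

const PLACEHOLDER = "[blocked by urlAllowlist]";

function filterPayload(
  payload: SearchPayload,
  isAllowed: (url: string) => boolean,
): SearchPayload {
  return {
    results: payload.results?.filter((r) => isAllowed(r.url)),
    citations: payload.citations?.map((u) => (isAllowed(u) ? u : PLACEHOLDER)),
    inlineCitations: payload.inlineCitations?.map((c) =>
      isAllowed(c.url) ? c : { url: PLACEHOLDER },
    ),
  };
}
```

Because this runs on every read, a cache hit is re-filtered against the current config, which is why the cache can safely store unfiltered results.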

Repro + Verification

Environment

  • OS: Linux (Ubuntu)
  • Runtime: Node 22
  • Model/provider: any
  • Relevant config: tools.web.urlAllowlist: ["example.com"]

Steps

  1. Set tools.web.urlAllowlist: ["example.com"] in config
  2. Ask agent to fetch https://evil.com via web_fetch
  3. Ask agent to search for something and observe filtered results

Expected

  • Step 2: fetch blocked with allowlist error
  • Step 3: only example.com URLs in results/citations

Actual

  • Matches expected

Evidence

  • 228-line test suite (web-tools.url-allowlist.test.ts) covering all path combinations, importing directly from production exports
  • AllowlistBlockedError smoke test
  • Citations filtering smoke test for LLM-provider payloads

Human Verification (required)

  • Verified scenarios: allowlist enforced on direct fetch, redirect targets, search results (results array), and LLM-provider citation arrays.
  • Edge cases checked: empty allowlist (no-op), wildcard patterns, invalid URLs in results, redirect to non-allowlisted domain.
  • What I did not verify: live Firecrawl fallback bypass (tested via code path analysis and unit tests).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes — field is optional, defaults to unrestricted.
  • Config/env changes? No — new optional field only.
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: remove or leave tools.web.urlAllowlist unset in config.
  • Files/config to restore: tools.web.urlAllowlist key in openclaw.json / openclaw.yaml.
  • Known bad symptoms: agent reports URL blocked unexpectedly → check allowlist patterns.

Risks and Mitigations

  • Risk: allowlist silently misconfigured (e.g. https://example.com instead of example.com).
    • Mitigation: Zod schema rejects URLs with protocol/path at parse time with a descriptive error. Patterns like * and *. are also rejected.
  • Risk: SSRF-blocked hostnames in allowlist causing confusing errors.
    • Mitigation: Schema rejects localhost, localhost.localdomain, metadata.google.internal, and wildcard *.localhost/*.local/*.internal patterns with a clear error message explaining why.

@openclaw-barnacle Bot added labels agents (Agent runtime and tooling) and size: M on Feb 17, 2026
@alaindimabuyo

@greptileai please review


@greptile-apps greptile-apps Bot left a comment


10 files reviewed, 7 comments


Comment thread src/agents/tools/web-fetch.ts Outdated
@greptile-apps Bot commented Feb 17, 2026

Additional Comments (1)

src/agents/tools/web-search.ts
Allowlist filtering bypassed by module-level cache

The SEARCH_CACHE (line 36) is a module-level Map, but the cache key (lines 609–614) does not incorporate the urlAllowlist. Filtered results are written to the cache at line 748. If the same process hosts multiple tool instances with different allowlist configs (or if the allowlist is changed across tool re-creations without restarting), a cache hit at line 617 will return previously-filtered (or unfiltered) results, bypassing the current allowlist.

In practice this is unlikely since the config is typically stable for the lifetime of a process, but it could be surprising in multi-agent setups with per-agent configs. Consider either:

  • Including a hash of the allowlist in the cache key, or
  • Applying filterResultsByAllowlist to cache hits as well
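The first option can be sketched as below. The names (`SEARCH_CACHE`, `makeCacheKey`) follow the review comment, not the actual source, and the PR ultimately addressed this differently, by re-filtering on every cache read:

```typescript
import { createHash } from "node:crypto";

// Sketch of folding the allowlist into the cache key so tool instances with
// different allowlist configs in one process never share cache entries.
const SEARCH_CACHE = new Map<string, unknown>();

function makeCacheKey(
  query: string,
  provider: string,
  urlAllowlist?: string[],
): string {
  // Sort before hashing so pattern order doesn't fragment the cache.
  const allowlistHash = urlAllowlist
    ? createHash("sha256")
        .update([...urlAllowlist].sort().join("\n"))
        .digest("hex")
        .slice(0, 12)
    : "none";
  return `${provider}:${allowlistHash}:${query}`;
}
```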

@smartprogrammer93 force-pushed the feat/web-tools-url-allowlist branch from af37732 to 1eaf383 on February 17, 2026 08:37
@smartprogrammer93
Contributor Author

@greptileai please review


@greptile-apps greptile-apps Bot left a comment


8 files reviewed, 2 comments


Comment thread src/agents/tools/web-search.ts Outdated
Comment thread src/agents/tools/web-search.ts Outdated
@smartprogrammer93
Contributor Author

@greptileai please review


@greptile-apps greptile-apps Bot left a comment


8 files reviewed, 1 comment


Comment thread src/agents/tools/web-fetch.ts Outdated
@smartprogrammer93
Contributor Author

@greptileai please review

1 similar comment
@smartprogrammer93
Contributor Author

@greptileai please review


@greptile-apps greptile-apps Bot left a comment


9 files reviewed, 1 comment


Comment thread src/config/zod-schema.agent-runtime.ts Outdated
@smartprogrammer93
Contributor Author

@greptileai please review

@smartprogrammer93
Contributor Author

@greptileai please review

@smartprogrammer93
Contributor Author

ready for review and merge

@smartprogrammer93 changed the title from "feat(tools): add URL allowlist for web_search and web_fetch" to "Security: add URL allowlist for web_search and web_fetch" on Feb 19, 2026
@smartprogrammer93
Contributor Author

smartprogrammer93 commented Feb 22, 2026

Hey,
@steipete @sebslight
This PR replaces #18584 that was reverted. Please consider.

Apologies for the ping; I don't usually ping maintainers, but this PR has been ignored for 3 releases and was reverted, so I felt it has some special circumstances.

@smartprogrammer93 force-pushed the feat/web-tools-url-allowlist branch 2 times, most recently from dcd5714 to ebd1c23 on February 28, 2026 00:04
@smartprogrammer93
Contributor Author

@greptileai please review

@greptile-apps

greptile-apps Bot commented Feb 28, 2026

Greptile Summary

This PR adds an optional tools.web.urlAllowlist config field that restricts which external domains web_fetch and web_search can reach. When unset, all existing behaviour is unchanged (fully backwards-compatible). The implementation is well-structured and all major concerns from prior review rounds have been addressed in this revision.

Key changes:

  • web_fetch blocks non-allowlisted URLs before the fetch and before following each redirect (pre-connection, via a new onRedirectUrl hook in fetchWithSsrFGuard). AllowlistBlockedError is re-thrown in the Firecrawl-fallback catch block, preventing the bypass that existed in an earlier revision.
  • Both the initial-URL block and the redirect block now return the same { error: "allowlist_blocked" } structured tool result, giving the agent a consistent error surface.
  • web_search applies applyUrlAllowlistToPayload post-execute (after resolved.definition.execute(args)), so cache hits are re-filtered on every read rather than storing pre-filtered results — the previously flagged caching bug is resolved.
  • All three citation shapes are covered: results (Brave/Perplexity-sonar) is filtered; citations and inlineCitations (Grok/Kimi/Gemini/Perplexity-chat) use map() + "[blocked by urlAllowlist]" placeholder to preserve [N] positional index alignment in content.
  • Zod schema rejects SSRF-blocked patterns (*.localhost, *.local, *.internal, bare localhost, IP literals) at config parse time with clear error messages.
  • Duplicate resolver functions consolidated into a single resolveUrlAllowlist in web-shared.ts.
  • Tests now import directly from production exports, eliminating the prior drift risk.
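The pre-connection redirect check noted above can be sketched as below. The real hook is `onRedirectUrl` in `fetchWithSsrFGuard`; `followRedirects` and the `head` callback are hypothetical stand-ins showing where the hook fires relative to the network call:

```typescript
// Sketch of a pre-connection redirect hook: the callback runs on each
// redirect target BEFORE the next request is issued, so a throwing hook
// (e.g. the allowlist check) blocks the hop without ever connecting.
async function followRedirects(
  startUrl: string,
  head: (url: string) => Promise<{ status: number; location?: string }>,
  onRedirectUrl: (nextUrl: string) => void, // throws to abort the hop
  maxHops = 5,
): Promise<string> {
  let url = startUrl;
  for (let hop = 0; hop < maxHops; hop++) {
    const res = await head(url);
    if (res.status < 300 || res.status >= 400 || !res.location) {
      return url; // not a redirect: this is the final URL
    }
    const next = new URL(res.location, url).toString();
    onRedirectUrl(next); // policy check fires here, pre-connection
    url = next;
  }
  throw new Error("too many redirects");
}
```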

Confidence Score: 5/5

  • This PR is safe to merge — it is a backward-compatible, opt-in restriction mechanism with no capability expansion.
  • All critical issues identified in prior review rounds (Firecrawl bypass, redirect-target bypass, LLM-provider citation leakage, pre-filtered caching, inconsistent error surface, import placement, test drift) have been addressed. The implementation correctly handles the pre-connection redirect check, citation index alignment, and schema validation edge cases. No new bugs were found during this review pass.
  • No files require special attention.

Last reviewed commit: "fix: pre-connection ..."

Comment thread src/config/schema.help.ts Outdated
Comment thread src/config/schema.labels.ts Outdated
Comment thread src/agents/tools/web-fetch.ts Outdated
Comment thread src/config/zod-schema.agent-runtime.ts Outdated
@smartprogrammer93
Contributor Author

@greptileai please review

1 similar comment
@smartprogrammer93
Copy link
Copy Markdown
Contributor Author

@greptileai please review

smartprogrammer93 and others added 12 commits March 22, 2026 11:12
Add optional urlAllowlist config at tools.web level that restricts which
URLs can be accessed by web tools:

- Config types: Add urlAllowlist?: string[] to tools.web
- Zod schema: Add urlAllowlist field with domain pattern validation
- Schema help: Add help text for new config fields
- web_search: Filter Brave search results by allowlist
- web_fetch: Block URLs not matching allowlist before fetching and on redirects
- ssrf.ts: Export normalizeHostnameAllowlist and matchesHostnameAllowlist
- web-shared.ts: Export resolveUrlAllowlist shared utility

URL matching supports exact domain match and wildcard patterns (*.github.com).
Single-label domains like localhost are also supported.
When urlAllowlist is not configured, all URLs are allowed (backwards compatible).

Tests: Add web-tools.url-allowlist.test.ts with 32 tests
…ing as blocked"

This reverts commit eebeb98e3378843f5b9dc5e4b742e3a168132e80.
@smartprogrammer93 force-pushed the feat/web-tools-url-allowlist branch from 558c9e5 to 4b159d2 on March 22, 2026 08:13
@smartprogrammer93
Contributor Author

@greptileai please review

@smartprogrammer93
Contributor Author

@greptileai please review

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle

Closing due to inactivity.
If you believe this PR should be revived, post in #pr-thunderdome-dangerzone on Discord to talk to a maintainer.
That channel is the escape hatch for high-quality PRs that get auto-closed.

@dennisvanderpool

I solved it for myself using Pipelock
https://github.com/luckyPipewrench/pipelock


Labels

agents (Agent runtime and tooling), docs (Improvements or additions to documentation), size: L, stale (Marked as stale due to inactivity)
