
fix: add SSRF guard to Anthropic/Gemini PDF providers and move Gemini API key to header #46377

Open
cdxiaodong wants to merge 1 commit into openclaw:main from cdxiaodong:fix/pdf-provider-ssrf-guard

Conversation

@cdxiaodong

Summary

  • Both anthropicAnalyzePdf() and geminiAnalyzePdf() in src/agents/tools/pdf-native-providers.ts used raw fetch() with a user-controlled baseUrl parameter. An attacker could set baseUrl to an internal/private IP address, causing the server to make requests to internal services (SSRF).
  • The Anthropic function leaked the x-api-key header to any attacker-controlled destination.
  • The Gemini function passed the API key as a URL query parameter (?key=...), exposing it in server logs, proxy logs, and HTTP Referer headers (CWE-598).

Changes

  • Replace raw fetch() with fetchWithSsrFGuard(withStrictGuardedFetchMode(...)) in both functions, which validates the resolved hostname/IP against the SSRF blocklist before connecting.
  • For Gemini, move the API key from URL query parameter to the x-goog-api-key HTTP header to prevent credential leakage.
  • Add proper release() cleanup in finally blocks for both functions.
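
As a rough illustration of the Gemini change, the sketch below builds the same request in the old and new style. The helper names, the `RequestSpec` shape, and the `/models/<id>:generateContent` path layout are assumptions made for the example; this is not the code in pdf-native-providers.ts.

```typescript
// Hypothetical request description, used only for this illustration.
interface RequestSpec {
  url: string;
  headers: Record<string, string>;
}

// Old style: the key is part of the URL, so it ends up wherever URLs are logged.
function buildLegacyGeminiRequest(baseUrl: string, modelId: string, apiKey: string): RequestSpec {
  return {
    url: `${baseUrl}/models/${modelId}:generateContent?key=${encodeURIComponent(apiKey)}`,
    headers: { "Content-Type": "application/json" },
  };
}

// New style: the key travels in the x-goog-api-key header and the URL stays clean.
function buildHeaderAuthGeminiRequest(baseUrl: string, modelId: string, apiKey: string): RequestSpec {
  return {
    url: `${baseUrl}/models/${modelId}:generateContent`,
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
  };
}
```

With the header variant, anything that records only the request URL (access logs, proxy logs, Referer headers) no longer sees the credential.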

Test plan

  • Verify Anthropic PDF analysis still works with default https://api.anthropic.com base URL
  • Verify Gemini PDF analysis still works with default https://generativelanguage.googleapis.com base URL
  • Confirm that setting baseUrl to a private/internal IP (e.g. http://169.254.169.254) is blocked by the SSRF guard
  • Confirm Gemini API key no longer appears in URL query parameters
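
The third check can be reasoned about offline with a small range test. The following is a minimal sketch of the kind of IPv4 check an SSRF blocklist performs; `parseIPv4` and `isBlockedIPv4` are hypothetical names, the list of ranges is deliberately short, and the repository's actual guard (which also resolves hostnames before connecting) may behave differently.

```typescript
type Octets = [number, number, number, number];

// Parse a dotted-quad IPv4 literal into four octets, or return null if it is not one.
function parseIPv4(host: string): Octets | null {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return null;
  const octets = match.slice(1, 5).map(Number);
  if (octets.some((o) => o > 255)) return null;
  return octets as Octets;
}

// True for a few well-known non-public ranges (illustrative, not exhaustive).
function isBlockedIPv4(host: string): boolean {
  const octets = parseIPv4(host);
  if (!octets) return false; // not an IPv4 literal; a real guard would resolve DNS first
  const [a, b] = octets;
  return (
    a === 0 || // "this network"
    a === 10 || // private 10.0.0.0/8
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local, includes cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // private 172.16.0.0/12
    (a === 192 && b === 168) // private 192.168.0.0/16
  );
}
```

Under these assumptions, `isBlockedIPv4("169.254.169.254")` is true, while a hostname such as `api.anthropic.com` falls through to whatever DNS-based check the real guard applies.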

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Mar 14, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Mar 14, 2026

Greptile Summary

This PR fixes two concrete security vulnerabilities — SSRF exposure and API credential leakage — in anthropicAnalyzePdf and geminiAnalyzePdf by replacing raw fetch() calls with fetchWithSsrFGuard(withStrictGuardedFetchMode(...)), moving the Gemini API key from a URL query parameter to a request header, and adding proper release() cleanup in finally blocks.

  • SSRF guard applied — both functions now use fetchWithSsrFGuard with STRICT mode, which resolves the target hostname before connecting and rejects private/internal IPs, preventing a user-controlled baseUrl from reaching internal services.
  • Gemini credential moved to header — the API key is sent via x-goog-api-key instead of appending it to the URL, preventing exposure in server logs, proxy logs, and HTTP Referer headers (CWE-598).
  • Credentials stripped on cross-origin redirects: fetchWithSsrFGuard already strips non-safe headers on cross-origin redirects, so sensitive headers are not forwarded to unintended hosts.
  • release() cleanup added — both functions now call release() in a finally block, ensuring the pinned dispatcher is closed after every request outcome.
  • Style note: Neither call provides an auditContext, so blocked SSRF attempts will appear in logs as the generic "url-fetch" context rather than "anthropic-pdf" / "gemini-pdf", reducing observability. The try blocks also open after the await fetchWithSsrFGuard(...) assignment rather than wrapping the full acquisition, which is functionally safe but unconventional.
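
The redirect behavior mentioned in the list above can be pictured with a small helper. This is a sketch of the general technique only: `headersForRedirect` and the allowlist are assumptions for illustration, and the thread does not show which headers fetchWithSsrFGuard actually treats as safe.

```typescript
// Headers assumed safe to forward to a different origin (illustrative allowlist).
const SAFE_CROSS_ORIGIN_HEADERS = new Set(["accept", "accept-language", "content-type", "user-agent"]);

// Return the headers that may follow a redirect from fromUrl to toUrl.
function headersForRedirect(
  fromUrl: string,
  toUrl: string,
  headers: Record<string, string>,
): Record<string, string> {
  // Same origin (scheme + host + port): keep everything.
  if (new URL(fromUrl).origin === new URL(toUrl).origin) {
    return { ...headers };
  }
  // Different origin: keep only allowlisted headers, dropping credentials.
  const kept: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (SAFE_CROSS_ORIGIN_HEADERS.has(name.toLowerCase())) {
      kept[name] = value;
    }
  }
  return kept;
}
```

An allowlist is used here rather than a denylist so that an unfamiliar credential header is dropped by default.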

Confidence Score: 4/5

  • This PR is safe to merge — the security fixes are correct and well-implemented, with only minor style improvements remaining.
  • The SSRF guard is applied correctly with strict mode, the Gemini API key leakage via query parameter is fixed, and release() cleanup is handled in finally blocks. The fetchWithSsrFGuard implementation also ensures credentials are stripped on cross-origin redirects. Minor issues: no auditContext set (reduces security log observability) and the try block doesn't cover the resource acquisition scope (safe today but unconventional).
  • No files require special attention beyond the two style suggestions noted above.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/tools/pdf-native-providers.ts
Line: 68-86

Comment:
**Consider adding `auditContext` for better SSRF log attribution**

Neither provider specifies `auditContext` in the `fetchWithSsrFGuard` call. When an SSRF attempt is blocked, the guard logs a warning using the value of `auditContext ?? "url-fetch"`, so all blocked attempts from these functions will appear as the generic `"url-fetch"` context rather than something like `"anthropic-pdf"` or `"gemini-pdf"`. Adding it would make security incident investigations substantially easier.

For `anthropicAnalyzePdf`:
```suggestion
  const { response: res, release } = await fetchWithSsrFGuard(
    withStrictGuardedFetchMode({
      url: fetchUrl,
      auditContext: "anthropic-pdf",
      init: {
```

A similar change for `geminiAnalyzePdf` (line ~159–173) would set `auditContext: "gemini-pdf"`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/tools/pdf-native-providers.ts
Line: 87-118

Comment:
**`try` block doesn't cover resource acquisition**

The `try/finally` block opens *after* the `await fetchWithSsrFGuard(...)` call, meaning the `release` returned by a successful call is not guarded against an exception thrown between the awaited result assignment and the `try`. In practice this is safe today because `fetchWithSsrFGuard` calls its internal cleanup before re-throwing any error, making the returned `release` a no-op if it were skipped. However the pattern is fragile and may confuse future maintainers into thinking the `try` covers the full resource lifecycle.

A more conventional pattern that makes the intent explicit:

```typescript
let release: (() => Promise<void>) | undefined;
try {
  const result = await fetchWithSsrFGuard(withStrictGuardedFetchMode({ ... }));
  const res = result.response;
  release = result.release;

  // ... response handling
  return text.trim();
} finally {
  await release?.();
}
```

The same applies to the `geminiAnalyzePdf` function (lines 174–202).

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 970a85d

Comment on lines +68 to +86
const { response: res, release } = await fetchWithSsrFGuard(
  withStrictGuardedFetchMode({
    url: fetchUrl,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "pdfs-2024-09-25",
      },
      body: JSON.stringify({
        model: params.modelId,
        max_tokens: params.maxTokens ?? 4096,
        messages: [{ role: "user", content }],
      }),
    },
  }),
);

Consider adding auditContext for better SSRF log attribution

Neither provider specifies auditContext in the fetchWithSsrFGuard call. When an SSRF attempt is blocked, the guard logs a warning using the value of auditContext ?? "url-fetch", so all blocked attempts from these functions will appear as the generic "url-fetch" context rather than something like "anthropic-pdf" or "gemini-pdf". Adding it would make security incident investigations substantially easier.

For anthropicAnalyzePdf:

Suggested change
const { response: res, release } = await fetchWithSsrFGuard(
  withStrictGuardedFetchMode({
    url: fetchUrl,
    auditContext: "anthropic-pdf",
    init: {

A similar change for geminiAnalyzePdf (line ~159–173) would set auditContext: "gemini-pdf".
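
To make the fallback concrete, here is a minimal sketch of how an optional audit label might default to "url-fetch". The `GuardedFetchOptions` interface and `describeBlockedFetch` formatter are hypothetical; only the `auditContext ?? "url-fetch"` expression comes from the comment above, and the guard's real log format is not shown in this thread.

```typescript
// Hypothetical subset of the guarded-fetch options, for illustration only.
interface GuardedFetchOptions {
  url: string;
  auditContext?: string; // optional label naming the caller
}

// Hypothetical formatter showing how the fallback label would surface in a log line.
function describeBlockedFetch(opts: GuardedFetchOptions): string {
  const context = opts.auditContext ?? "url-fetch";
  return `[${context}] blocked request to ${opts.url}`;
}
```

Without a label, a blocked Anthropic call and a blocked Gemini call would produce lines that differ only by URL, which is the observability gap the comment describes.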


Comment on lines +87 to +118
try {
  if (!res.ok) {
    const body = await res.text().catch(() => "");
    throw new Error(
      `Anthropic PDF request failed (${res.status} ${res.statusText})${body ? `: ${body.slice(0, 400)}` : ""}`,
    );
  }

  const json = (await res.json().catch(() => null)) as unknown;
  if (!isRecord(json)) {
    throw new Error("Anthropic PDF response was not JSON.");
  }

  const responseContent = json.content as AnthropicResponseContent | undefined;
  if (!Array.isArray(responseContent)) {
    throw new Error("Anthropic PDF response missing content array.");
  }

  const text = responseContent
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text!)
    .join("");

  if (!text.trim()) {
    throw new Error("Anthropic PDF returned no text.");
  }

  return text.trim();
} finally {
  await release();
}

try block doesn't cover resource acquisition

The try/finally block opens after the await fetchWithSsrFGuard(...) call, meaning the release returned by a successful call is not guarded against an exception thrown between the awaited result assignment and the try. In practice this is safe today because fetchWithSsrFGuard calls its internal cleanup before re-throwing any error, making the returned release a no-op if it were skipped. However the pattern is fragile and may confuse future maintainers into thinking the try covers the full resource lifecycle.

A more conventional pattern that makes the intent explicit:

let release: (() => Promise<void>) | undefined;
try {
  const result = await fetchWithSsrFGuard(withStrictGuardedFetchMode({ ... }));
  const res = result.response;
  release = result.release;

  // ... response handling
  return text.trim();
} finally {
  await release?.();
}

The same applies to the geminiAnalyzePdf function (lines 174–202).



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 970a85dea9


messages: [{ role: "user", content }],
const fetchUrl = `${baseUrl}/v1/messages`;
const { response: res, release } = await fetchWithSsrFGuard(
  withStrictGuardedFetchMode({

P1 Badge Keep guarded PDF fetches compatible with env proxy setups

Using withStrictGuardedFetchMode here forces fetchWithSsrFGuard down strict mode, which creates a pinned direct dispatcher and bypasses the global env-proxy dispatcher. In deployments that require HTTP(S)_PROXY for outbound access (no direct egress), native PDF calls will now fail for Anthropic/Gemini while the rest of model traffic can still work through the global proxy path, so this change introduces a production regression for proxy-only environments.
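
The concern can be sketched as a mode decision made before the request is sent. The names below (`EgressMode`, `selectEgressMode`) are hypothetical and only the proxy environment variable names follow the usual convention; the repository's own provider transport logic is not shown in this thread and may decide differently.

```typescript
type EgressMode = "strict-direct" | "env-proxy";

// Conventional proxy variables, checked in both upper and lower case.
const PROXY_VARS = ["HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"];

// Pick the egress path: go through the env proxy when one is configured,
// otherwise connect directly with a pinned, SSRF-checked address.
function selectEgressMode(env: Record<string, string | undefined>): EgressMode {
  const hasProxy = PROXY_VARS.some((name) => (env[name] ?? "").trim() !== "");
  return hasProxy ? "env-proxy" : "strict-direct";
}
```

A call site that always picks the direct path, as the comment says this change does, would fail in a deployment where only the proxy has outbound access.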


@clawsweeper
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR wraps native Anthropic/Gemini PDF provider requests in strict SSRF-guarded fetches, moves Gemini auth to a header, and adds guarded-fetch release cleanup.

Reproducibility: yes, at source level. Current main passes model.baseUrl into the native PDF helpers, and those helpers still perform raw provider fetches against derived URLs; I did not run live provider credentials.

PR rating
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Summary: The PR targets a real security issue, but missing real behavior proof and a provider-transport regression make it not quality-ready.

Rank-up moves:

  • Rework native PDF requests through a provider-aware guarded fetch path that preserves env-proxy/dispatcher policy.
  • Add redacted real behavior proof for default Anthropic and Gemini native PDF calls plus a blocked private/internal baseUrl.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature, rarity, or ASCII portrait is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

Real behavior proof
Needs real behavior proof before merge: The PR body only has unchecked test-plan items; the contributor needs redacted live terminal/log/screenshot/recording proof for successful provider calls and a blocked private baseUrl, then should update the PR body for re-review.

Risk before merge
Why this matters:

  • Merging as-is can break Anthropic/Gemini native PDF calls in proxy-only deployments because the PR bypasses the provider transport path that selects trusted env-proxy mode.
  • Direct strict guarded fetch can route provider traffic outside the same controlled egress policy used by other model requests.
  • The branch is conflicting with current main, where Gemini header auth and provider base URL normalization have already changed.
  • No redacted live proof shows successful default Anthropic/Gemini native PDF calls or blocked private/internal baseUrl behavior after the patch.

Maintainer options:

  1. Rework Through Provider Transport (recommended)
    Route native PDF fetches through a provider-aware guarded model fetch helper so SSRF blocking and provider proxy/dispatcher policy are both preserved.
  2. Pause Until Security Proof Exists
    Keep the PR open but unmerged until a maintainer reviews the egress-policy shape and the contributor posts redacted real provider proof.

Next step before merge
Needs security owner review, conflict resolution, and contributor-supplied real behavior proof; automation cannot supply the contributor's live provider proof for this external PR.

Security
Needs attention: The patch attempts useful SSRF hardening but introduces a provider egress-policy regression by bypassing provider transport routing.

Review findings

  • [P1] Preserve provider transport routing — src/agents/tools/pdf-native-providers.ts:68-69
Review details

Best possible solution:

Introduce a provider-transport-aware guarded fetch path for native PDF analysis that preserves SSRF blocking, provider proxy/dispatcher policy, native PDF timeouts, Gemini header auth, audit context, and focused regression coverage.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main passes model.baseUrl into the native PDF helpers, and those helpers still perform raw provider fetches against derived URLs; I did not run live provider credentials.

Is this the best way to solve the issue?

No. The security direction is right, but direct strict guarded fetch is not the best fix because native PDF provider traffic needs to preserve the same provider transport policy as other model traffic.

Label justifications:

  • P1: The PR concerns SSRF and provider credential handling in an agent tool, but the proposed fix can regress real provider egress setups.
  • merge-risk: 🚨 compatibility: Strict guarded fetch can break existing proxy-only deployments that rely on provider transport env-proxy routing.
  • merge-risk: 🚨 auth-provider: The diff changes provider request routing and credential placement for native PDF provider calls.
  • merge-risk: 🚨 security-boundary: Bypassing provider transport can route model-provider traffic outside the intended controlled egress boundary.

Full review comments:

  • [P1] Preserve provider transport routing — src/agents/tools/pdf-native-providers.ts:68-69
    Using withStrictGuardedFetchMode directly sends native PDF POSTs outside buildGuardedModelFetch, where provider calls switch to trusted env-proxy mode when needed. This can break or bypass controlled egress in proxy-only deployments; the same issue applies to the Gemini call later in the file.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.91

Security concerns:

  • [medium] Strict guard bypasses provider proxy policy — src/agents/tools/pdf-native-providers.ts:68
    The new native PDF calls use strict guarded fetch directly, which can fall outside provider transport's trusted env-proxy behavior and break or bypass expected controlled egress for provider requests.
    Confidence: 0.88

Acceptance criteria:

  • node scripts/run-vitest.mjs src/agents/tools/pdf-native-providers.test.ts
  • node scripts/run-vitest.mjs src/agents/provider-transport-fetch.test.ts
  • Redacted live proof for default Anthropic and Gemini native PDF calls plus a blocked private/internal baseUrl

What I checked:

Likely related people:

  • tyler6204: Authored and merged the PDF analysis tool with native provider support, including the native PDF provider helper. (role: introduced behavior; confidence: high; commits: d0ac1b019517; files: src/agents/tools/pdf-native-providers.ts, src/agents/tools/pdf-tool.ts)
  • steipete: Recent commits changed provider policy hooks, Google Generative AI normalization, guarded fetch modes, and the release snapshot containing the current native PDF helper behavior. (role: recent area contributor; confidence: high; commits: c973b053a5e2, 5cdb50abe6c5, d042192c7c9c; files: src/agents/tools/pdf-native-providers.ts, src/agents/provider-transport-fetch.ts, src/infra/net/fetch-guard.ts)
  • 0xsline: Authored the merged Gemini PDF URL normalization fix, which is adjacent to the Gemini provider URL surface touched here. (role: adjacent bug-fix contributor; confidence: medium; commits: bfeea5d23fc6; files: src/agents/tools/pdf-native-providers.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 57028585538c.

@clawsweeper clawsweeper Bot added the P1 High-priority user-facing bug, regression, or broken workflow. label May 16, 2026
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context. labels May 16, 2026
@clawsweeper clawsweeper Bot added impact:security Security boundary, credential, authz, sandbox, or sensitive-data risk. impact:auth-provider Auth, provider routing, model choice, or SecretRef resolution may break. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. and removed impact:security Security boundary, credential, authz, sandbox, or sensitive-data risk. impact:auth-provider Auth, provider routing, model choice, or SecretRef resolution may break. labels May 17, 2026

Labels

agents Agent runtime and tooling merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context.
