fix: add SSRF guard to Anthropic/Gemini PDF providers and move Gemini API key to header#46377
fix: add SSRF guard to Anthropic/Gemini PDF providers and move Gemini API key to header#46377cdxiaodong wants to merge 1 commit into
Conversation
… API key to header
Greptile SummaryThis PR fixes two concrete security vulnerabilities — SSRF exposure and API credential leakage — in
Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/tools/pdf-native-providers.ts
Line: 68-86
Comment:
**Consider adding `auditContext` for better SSRF log attribution**
Neither provider specifies `auditContext` in the `fetchWithSsrFGuard` call. When an SSRF attempt is blocked, the guard logs a warning using the value of `auditContext ?? "url-fetch"`, so all blocked attempts from these functions will appear as the generic `"url-fetch"` context rather than something like `"anthropic-pdf"` or `"gemini-pdf"`. Adding it would make security incident investigations substantially easier.
For `anthropicAnalyzePdf`:
```suggestion
const { response: res, release } = await fetchWithSsrFGuard(
withStrictGuardedFetchMode({
url: fetchUrl,
auditContext: "anthropic-pdf",
init: {
```
A similar change for `geminiAnalyzePdf` (line ~159–173) would set `auditContext: "gemini-pdf"`.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/tools/pdf-native-providers.ts
Line: 87-118
Comment:
**`try` block doesn't cover resource acquisition**
The `try/finally` block opens *after* the `await fetchWithSsrFGuard(...)` call, meaning the `release` returned by a successful call is not guarded against an exception thrown between the awaited result assignment and the `try`. In practice this is safe today because `fetchWithSsrFGuard` calls its internal cleanup before re-throwing any error, making the returned `release` a no-op if it were skipped. However the pattern is fragile and may confuse future maintainers into thinking the `try` covers the full resource lifecycle.
A more conventional pattern that makes the intent explicit:
```typescript
let release: (() => Promise<void>) | undefined;
try {
const result = await fetchWithSsrFGuard(withStrictGuardedFetchMode({ ... }));
const res = result.response;
release = result.release;
// ... response handling
return text.trim();
} finally {
await release?.();
}
```
The same applies to the `geminiAnalyzePdf` function (lines 174–202).
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 970a85d |
| const { response: res, release } = await fetchWithSsrFGuard( | ||
| withStrictGuardedFetchMode({ | ||
| url: fetchUrl, | ||
| init: { | ||
| method: "POST", | ||
| headers: { | ||
| "Content-Type": "application/json", | ||
| "x-api-key": apiKey, | ||
| "anthropic-version": "2023-06-01", | ||
| "anthropic-beta": "pdfs-2024-09-25", | ||
| }, | ||
| body: JSON.stringify({ | ||
| model: params.modelId, | ||
| max_tokens: params.maxTokens ?? 4096, | ||
| messages: [{ role: "user", content }], | ||
| }), | ||
| }, | ||
| }), | ||
| }); | ||
| ); |
There was a problem hiding this comment.
Consider adding auditContext for better SSRF log attribution
Neither provider specifies auditContext in the fetchWithSsrFGuard call. When an SSRF attempt is blocked, the guard logs a warning using the value of auditContext ?? "url-fetch", so all blocked attempts from these functions will appear as the generic "url-fetch" context rather than something like "anthropic-pdf" or "gemini-pdf". Adding it would make security incident investigations substantially easier.
For anthropicAnalyzePdf:
| const { response: res, release } = await fetchWithSsrFGuard( | |
| withStrictGuardedFetchMode({ | |
| url: fetchUrl, | |
| init: { | |
| method: "POST", | |
| headers: { | |
| "Content-Type": "application/json", | |
| "x-api-key": apiKey, | |
| "anthropic-version": "2023-06-01", | |
| "anthropic-beta": "pdfs-2024-09-25", | |
| }, | |
| body: JSON.stringify({ | |
| model: params.modelId, | |
| max_tokens: params.maxTokens ?? 4096, | |
| messages: [{ role: "user", content }], | |
| }), | |
| }, | |
| }), | |
| }); | |
| ); | |
| const { response: res, release } = await fetchWithSsrFGuard( | |
| withStrictGuardedFetchMode({ | |
| url: fetchUrl, | |
| auditContext: "anthropic-pdf", | |
| init: { |
A similar change for geminiAnalyzePdf (line ~159–173) would set auditContext: "gemini-pdf".
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/tools/pdf-native-providers.ts
Line: 68-86
Comment:
**Consider adding `auditContext` for better SSRF log attribution**
Neither provider specifies `auditContext` in the `fetchWithSsrFGuard` call. When an SSRF attempt is blocked, the guard logs a warning using the value of `auditContext ?? "url-fetch"`, so all blocked attempts from these functions will appear as the generic `"url-fetch"` context rather than something like `"anthropic-pdf"` or `"gemini-pdf"`. Adding it would make security incident investigations substantially easier.
For `anthropicAnalyzePdf`:
```suggestion
const { response: res, release } = await fetchWithSsrFGuard(
withStrictGuardedFetchMode({
url: fetchUrl,
auditContext: "anthropic-pdf",
init: {
```
A similar change for `geminiAnalyzePdf` (line ~159–173) would set `auditContext: "gemini-pdf"`.
How can I resolve this? If you propose a fix, please make it concise.| try { | ||
|
|
||
| if (!res.ok) { | ||
| const body = await res.text().catch(() => ""); | ||
| throw new Error( | ||
| `Anthropic PDF request failed (${res.status} ${res.statusText})${body ? `: ${body.slice(0, 400)}` : ""}`, | ||
| ); | ||
| } | ||
|
|
||
| const json = (await res.json().catch(() => null)) as unknown; | ||
| if (!isRecord(json)) { | ||
| throw new Error("Anthropic PDF response was not JSON."); | ||
| } | ||
|
|
||
| const responseContent = json.content as AnthropicResponseContent | undefined; | ||
| if (!Array.isArray(responseContent)) { | ||
| throw new Error("Anthropic PDF response missing content array."); | ||
| } | ||
|
|
||
| const text = responseContent | ||
| .filter((block) => block.type === "text" && typeof block.text === "string") | ||
| .map((block) => block.text!) | ||
| .join(""); | ||
|
|
||
| if (!text.trim()) { | ||
| throw new Error("Anthropic PDF returned no text."); | ||
| } | ||
|
|
||
| return text.trim(); | ||
| } finally { | ||
| await release(); | ||
| } |
There was a problem hiding this comment.
try block doesn't cover resource acquisition
The try/finally block opens after the await fetchWithSsrFGuard(...) call, meaning the release returned by a successful call is not guarded against an exception thrown between the awaited result assignment and the try. In practice this is safe today because fetchWithSsrFGuard calls its internal cleanup before re-throwing any error, making the returned release a no-op if it were skipped. However the pattern is fragile and may confuse future maintainers into thinking the try covers the full resource lifecycle.
A more conventional pattern that makes the intent explicit:
let release: (() => Promise<void>) | undefined;
try {
const result = await fetchWithSsrFGuard(withStrictGuardedFetchMode({ ... }));
const res = result.response;
release = result.release;
// ... response handling
return text.trim();
} finally {
await release?.();
}The same applies to the geminiAnalyzePdf function (lines 174–202).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/tools/pdf-native-providers.ts
Line: 87-118
Comment:
**`try` block doesn't cover resource acquisition**
The `try/finally` block opens *after* the `await fetchWithSsrFGuard(...)` call, meaning the `release` returned by a successful call is not guarded against an exception thrown between the awaited result assignment and the `try`. In practice this is safe today because `fetchWithSsrFGuard` calls its internal cleanup before re-throwing any error, making the returned `release` a no-op if it were skipped. However the pattern is fragile and may confuse future maintainers into thinking the `try` covers the full resource lifecycle.
A more conventional pattern that makes the intent explicit:
```typescript
let release: (() => Promise<void>) | undefined;
try {
const result = await fetchWithSsrFGuard(withStrictGuardedFetchMode({ ... }));
const res = result.response;
release = result.release;
// ... response handling
return text.trim();
} finally {
await release?.();
}
```
The same applies to the `geminiAnalyzePdf` function (lines 174–202).
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 970a85dea9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| messages: [{ role: "user", content }], | ||
| const fetchUrl = `${baseUrl}/v1/messages`; | ||
| const { response: res, release } = await fetchWithSsrFGuard( | ||
| withStrictGuardedFetchMode({ |
There was a problem hiding this comment.
Keep guarded PDF fetches compatible with env proxy setups
Using withStrictGuardedFetchMode here forces fetchWithSsrFGuard down strict mode, which creates a pinned direct dispatcher and bypasses the global env-proxy dispatcher. In deployments that require HTTP(S)_PROXY for outbound access (no direct egress), native PDF calls will now fail for Anthropic/Gemini while the rest of model traffic can still work through the global proxy path, so this change introduces a production regression for proxy-only environments.
Useful? React with 👍 / 👎.
|
Codex review: needs real behavior proof before merge. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. at source level. Current main passes model.baseUrl into the native PDF helpers, and those helpers still perform raw provider fetches against derived URLs; I did not run live provider credentials. PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. PR egg Where did the egg go?
Real behavior proof Risk before merge
Maintainer options:
Next step before merge Security Review findings
Review detailsBest possible solution: Introduce a provider-transport-aware guarded fetch path for native PDF analysis that preserves SSRF blocking, provider proxy/dispatcher policy, native PDF timeouts, Gemini header auth, audit context, and focused regression coverage. Do we have a high-confidence way to reproduce the issue? Yes, at source level. Current main passes model.baseUrl into the native PDF helpers, and those helpers still perform raw provider fetches against derived URLs; I did not run live provider credentials. Is this the best way to solve the issue? No. The security direction is right, but direct strict guarded fetch is not the best fix because native PDF provider traffic needs to preserve the same provider transport policy as other model traffic. Label justifications:
Full review comments:
Overall correctness: patch is incorrect Security concerns:
Acceptance criteria:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 57028585538c. |
Summary
anthropicAnalyzePdf()andgeminiAnalyzePdf()insrc/agents/tools/pdf-native-providers.tsused rawfetch()with a user-controlledbaseUrlparameter. An attacker could setbaseUrlto an internal/private IP address, causing the server to make requests to internal services (SSRF).x-api-keyheader to any attacker-controlled destination.?key=...), exposing it in server logs, proxy logs, and HTTP Referer headers (CWE-598).Changes
fetch()withfetchWithSsrFGuard(withStrictGuardedFetchMode(...))in both functions, which validates the resolved hostname/IP against the SSRF blocklist before connecting.x-goog-api-keyHTTP header to prevent credential leakage.release()cleanup infinallyblocks for both functions.Test plan
https://api.anthropic.combase URLhttps://generativelanguage.googleapis.combase URLbaseUrlto a private/internal IP (e.g.http://169.254.169.254) is blocked by the SSRF guard