Skip to content

feat(agents-api): Add PDF URL attachments and Vercel-compatible file parts in chat#2804

Open
mike-inkeep wants to merge 2 commits intomainfrom
feat/pdf_urls
Open

feat(agents-api): Add PDF URL attachments and Vercel-compatible file parts in chat#2804
mike-inkeep wants to merge 2 commits intomainfrom
feat/pdf_urls

Conversation

@mike-inkeep
Copy link
Contributor

Chat endpoints can accept PDFs referenced by https:// URLs (not only inline base64). The data-stream API’s file parts now follow the Vercel-style shape (url + mediaType, optional filename) instead of embedding payload in text. Failed PDF URL ingestion surfaces as a 400 with a clear error instead of failing opaquely.

@vercel
Copy link

vercel bot commented Mar 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Mar 24, 2026 1:18am
agents-docs Ready Ready Preview, Comment Mar 24, 2026 1:18am
agents-manage-ui Ready Ready Preview, Comment Mar 24, 2026 1:18am

Request Review

@changeset-bot
Copy link

changeset-bot bot commented Mar 23, 2026

🦋 Changeset detected

Latest commit: 7e2dfb9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-api Major
@inkeep/agents-manage-ui Major
@inkeep/agents-cli Major
@inkeep/agents-core Major
@inkeep/agents-email Major
@inkeep/agents-mcp Major
@inkeep/agents-sdk Major
@inkeep/agents-work-apps Major
@inkeep/ai-sdk-provider Major
@inkeep/create-agents Major

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@mike-inkeep mike-inkeep requested a review from amikofalvy March 23, 2026 20:29
@pullfrog
Copy link
Contributor

pullfrog bot commented Mar 23, 2026

TL;DR — Lifts the restriction that blocked external PDF URLs in chat attachments. PDF URLs (both OpenAI file_data and Vercel url fields) are now downloaded server-side, validated with %PDF- magic-byte sniffing, and inlined as base64 before execution. The Vercel file part schema is also simplified to align with the current FileUIPart shape (url + required mediaType, dropping text/mimeType).

Key changes

  • PDF URL ingestion pipeline — New inlineExternalPdfUrlParts function fetches remote PDF URLs, converts them to base64 bytes parts, and attaches sanitized sourceUrl metadata (query/hash stripped).
  • Vercel file part schema alignmentVercelFilePartSchema now uses url (was text) and a required mediaType (was optional mediaType | mimeType union with refinement).
  • OpenAI file_data accepts HTTP URLsFileContentItemSchema.file.file_data is widened from PdfDataUriSchema to PdfDataOrUrlSchema (data: URI or https:// URL).
  • Security: byte-level PDF validationresolveDownloadedFileMimeType gains an expectedMimeType parameter; when set to application/pdf it validates %PDF- header bytes instead of image-sniffing.
  • Explicit PdfUrlIngestionError — Replaces the old BlockedExternalPdfUrlNotSupportedError; caught in both chat routes and surfaced as a 400 bad_request.
  • DNS lookup refactor — Extracts forwardLookupResult for testability; now checks all resolved addresses for private IPs, not just the first.

Summary | 21 files | 1 commit | base: mainfeat/pdf_urls


PDF URL ingestion pipeline

Before: External PDF URLs in file parts were rejected with BlockedExternalPdfUrlNotSupportedError; only base64 data: URIs were accepted.
After: Remote PDF URLs are downloaded via downloadExternalFile with SSRF protections, validated for %PDF- magic bytes, converted to inline base64, and passed to the execution handler.

The new inlineExternalPdfUrlParts runs after Zod parsing but before execution in both /chat/completions and /run/api/chat routes. On failure it throws PdfUrlIngestionError (with a sanitized sourceUrl), which both routes catch and convert to a 400 response.

file-upload-helpers.ts · file-content-security.ts · file-security-errors.ts · chat.ts · chatDataStream.ts


Vercel file part schema and OpenAI file_data widening

Before: Vercel file parts used text for the URI and accepted either mediaType or mimeType via a runtime refinement. OpenAI file_data only accepted data:application/pdf;base64,....
After: Vercel file parts use url with a required mediaType (no mimeType fallback). OpenAI file_data accepts either a PDF data URI or an HTTP URL via PdfDataOrUrlSchema.

Field Old New
Vercel file URI text: string url: string
Vercel MIME mediaType? | mimeType? + refinement mediaType: string (required)
OpenAI file_data PdfDataUriSchema only PdfDataOrUrlSchema (data URI | HTTP URL)

chat.ts (types) · message-parts.ts · openapi.json


Security hardening and DNS lookup refactor

Before: The DNS lookup callback only checked the first resolved address for private IPs, and sourceUrl metadata was not persisted.
After: forwardLookupResult checks all resolved addresses, makeSanitizedSourceUrl strips query strings and fragments before persisting, and file-upload.ts attaches sourceUrl metadata for remote HTTP attachments.

external-file-downloader.ts · file-url-security.ts · file-upload.ts


Tests and documentation

New and updated tests cover PDF download with byte validation, PdfUrlIngestionError 400 responses in both chat routes, forwardLookupResult with mixed public/private DNS records, makeSanitizedSourceUrl stripping, and Vercel FileUIPart shape acceptance. Documentation in chat-api.mdx is updated to reflect that PDF file inputs now accept both remote URLs and data URIs, and the asymmetric-restriction note is removed.

Pullfrog  | View workflow run | Triggered by Pullfrogpullfrog.com𝕏

@pullfrog
Copy link
Contributor

pullfrog bot commented Mar 23, 2026

TL;DR — Lifts the restriction that blocked external PDF URLs in chat attachments. PDF URLs (both OpenAI file_data and Vercel url fields) are now downloaded server-side, validated with %PDF- magic-byte sniffing, and inlined as base64 before execution. The Vercel file part schema is also simplified to align with the current FileUIPart shape (url + required mediaType, dropping text/mimeType) — this is a breaking change for /run/api/chat consumers.

Key changes

  • PDF URL ingestion pipeline — New inlineExternalPdfUrlParts function fetches remote PDF URLs, converts them to base64 bytes parts, and attaches sanitized sourceUrl metadata (query/hash stripped). Data URIs are explicitly skipped via an isRemoteHttpOrHttpsUrl guard.
  • BREAKING: Vercel file part schema alignmentVercelFilePartSchema now uses url (was text) and a required mediaType (was optional mediaType | mimeType union with refinement). Changeset bumped to minor.
  • OpenAI file_data accepts HTTP URLsFileContentItemSchema.file.file_data is widened from PdfDataUriSchema to PdfDataOrUrlSchema (data: URI or https:// URL).
  • Security: byte-level PDF validationresolveDownloadedFileMimeType gains an expectedMimeType parameter; when set to application/pdf it validates %PDF- header bytes instead of image-sniffing.
  • Explicit PdfUrlIngestionError — Replaces the old BlockedExternalPdfUrlNotSupportedError; caught in both chat routes and surfaced as a 400 bad_request.
  • DNS lookup refactor — Extracts forwardLookupResult for testability; now checks all resolved addresses for private IPs, not just the first.

Summary | 21 files | 2 commits | base: mainfeat/pdf_urls


PDF URL ingestion pipeline

Before: External PDF URLs in file parts were rejected with BlockedExternalPdfUrlNotSupportedError; only base64 data: URIs were accepted.
After: Remote PDF URLs are downloaded via downloadExternalFile with SSRF protections, validated for %PDF- magic bytes, converted to inline base64, and passed to the execution handler. Data URIs are left untouched — the new isRemoteHttpOrHttpsUrl helper ensures only http(s):// URIs trigger a download.

The new inlineExternalPdfUrlParts runs after Zod parsing but before execution in both /chat/completions and /run/api/chat routes. On failure it throws PdfUrlIngestionError (with a sanitized sourceUrl), which both routes catch and convert to a 400 response.

file-upload-helpers.ts · file-content-security.ts · file-security-errors.ts · chat.ts · chatDataStream.ts


Vercel file part schema and OpenAI file_data widening

Before: Vercel file parts used text for the URI and accepted either mediaType or mimeType via a runtime refinement. OpenAI file_data only accepted data:application/pdf;base64,....
After: Vercel file parts use url with a required mediaType (no mimeType fallback). OpenAI file_data accepts either a PDF data URI or an HTTP URL via PdfDataOrUrlSchema. This is a breaking change — the changeset is bumped to minor.

Field Old New
Vercel file URI text: string url: string
Vercel MIME mediaType? | mimeType? + refinement mediaType: string (required)
OpenAI file_data PdfDataUriSchema only PdfDataOrUrlSchema (data URI | HTTP URL)

chat.ts (types) · message-parts.ts · openapi.json


Security hardening and DNS lookup refactor

Before: The DNS lookup callback only checked the first resolved address for private IPs, and sourceUrl metadata was not persisted.
After: forwardLookupResult checks all resolved addresses, makeSanitizedSourceUrl strips query strings and fragments before persisting, and file-upload.ts attaches sourceUrl metadata for remote HTTP attachments.

external-file-downloader.ts · file-url-security.ts · file-upload.ts


Tests and documentation

New and updated tests cover PDF download with byte validation, PdfUrlIngestionError 400 responses in both chat routes, forwardLookupResult with mixed public/private DNS records, makeSanitizedSourceUrl stripping, data URI pass-through (no download), and Vercel FileUIPart shape acceptance. Documentation in chat-api.mdx is updated with a remote PDF URL example and notes on error behavior for unreachable hosts, non-PDF responses, and oversized bodies.

Pullfrog  | View workflow run | Triggered by Pullfrogpullfrog.com𝕏

Copy link
Contributor

@pullfrog pullfrog bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid feature addition. The SSRF protections are properly extended to the new PDF URL path (same downloadExternalFile pipeline with validateExternalFileUrl, validateUrlResolvesToPublicIp, private-IP checks, and byte-level PDF signature validation). Test coverage is thorough. A few issues below — one is a medium-severity breaking change concern, the rest are low-severity cleanups.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow runpullfrog.com𝕏

}

const file = part.file;
if (!('uri' in file) || !file.uri || file.mimeType?.toLowerCase() !== 'application/pdf') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inlineExternalPdfUrlParts matches on 'uri' in file && file.uri but does not filter to HTTP(S) URIs. If an internal code path ever produces a file part with { uri: 'data:application/pdf;base64,...', mimeType: 'application/pdf' }, this would attempt to pass a data URI into downloadExternalFile, which would then fail at validateExternalFileUrl (scheme check). That said, buildFilePart in message-parts.ts correctly parses data URIs into { bytes } parts so this can't happen from user input today — but the guard is implicit rather than explicit. Consider adding && !file.uri.startsWith('data:') or checking for http(s) to make the contract self-documenting and resilient to future refactors.

mediaType: z.string().optional(),
mimeType: z.string().optional(),
url: z.string(),
mediaType: z.string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url: z.string() accepts any string. The old text field was also z.string(), but in practice it was always a data URI. Now that this field explicitly supports HTTP URLs alongside data URIs, consider tightening validation — e.g. require the value to be either a valid data URI or an HTTP(S) URL. Without this, a malformed url reaches buildFilePart and only fails with a generic Invalid file URI error, losing the opportunity for a clear Zod validation message at the schema boundary.

return;
}

callback(null, address);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavioral change: the old code called callback(null, selected.address, selected.family) (single string + family), the new code calls callback(null, address) (full LookupAddress[] array). This is correct for undici's lookup callback (which accepts both overloads), but worth a brief comment or test confirming multi-record responses work end-to-end through the dispatcher, since this changes which IP undici actually connects to (previously always address[0], now undici picks from the full list).


const logger = getLogger('file-security');

export function makeSanitizedSourceUrl(rawUrl: string): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makeSanitizedSourceUrl is functionally identical to the private toSanitizedUrl in external-file-downloader.ts:256. Consider consolidating — either re-export this function from file-url-security.ts and use it in the downloader, or keep one and delete the other.

Suggested change
export function makeSanitizedSourceUrl(rawUrl: string): string {

Comment on lines +83 to +92
...(part.metadata || remoteAttachmentSourceUrl
? {
metadata: {
...(part.metadata || {}),
...(remoteAttachmentSourceUrl
? { sourceUrl: makeSanitizedSourceUrl(remoteAttachmentSourceUrl) }
: {}),
},
}
: {}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ternary is hard to scan. The semantics are: "merge part.metadata and optionally add sourceUrl". A flattened expression is clearer and avoids the double-spread of {}.

Suggested change
...(part.metadata || remoteAttachmentSourceUrl
? {
metadata: {
...(part.metadata || {}),
...(remoteAttachmentSourceUrl
? { sourceUrl: makeSanitizedSourceUrl(remoteAttachmentSourceUrl) }
: {}),
},
}
: {}),
metadata: {
...(part.metadata || {}),
...(remoteAttachmentSourceUrl
? { sourceUrl: makeSanitizedSourceUrl(remoteAttachmentSourceUrl) }
: {}),
},

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(5) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

Inline Comments:

  • 🔴 Critical: types/chat.ts:78-83 Breaking API contract: VercelFilePartSchema changes texturl, removes mimeType, makes mediaType required

🟠⚠️ Major (2) 🟠⚠️

Inline Comments:

  • 🟠 Major: .changeset/informal-green-porpoise.md:1-5 Version bump should be minor for breaking schema changes
  • 🟠 Major: chat-api.mdx:82-86 Missing PDF URL example in documentation

🟡 Minor (2) 🟡

🟡 1) file-security-errors.ts:139 Error message lacks actionable guidance

Issue: PdfUrlIngestionError says "Failed to ingest PDF URL: {url}" without explaining what went wrong or how to fix it.
Why: Users won't know if the issue is: URL unreachable, file too large, invalid PDF, or timeout. No guidance on resolution.
Fix: Consider surfacing selective cause information: "Failed to ingest PDF URL: the file does not appear to be a valid PDF" or "...the server returned an error".
Refs: file-security-errors.ts:135-142

🟡 2) file-url-security.test.ts Incomplete edge case coverage for makeSanitizedSourceUrl

Issue: Only tested with URL containing both query and hash. Missing: query-only, hash-only, already-clean URLs.
Why: Regression risk if URL sanitization logic is modified.
Fix: Add edge case tests for partial URL components.
Refs: file-url-security.test.ts:4-7

💭 Consider (1) 💭

Inline Comments:

  • 💭 Consider: chat.ts:555-560 Add logging before converting PdfUrlIngestionError to API error

🧹 While You're Here (2) 🧹

🧹 1) file-upload-helpers.ts:85-93 Silent fallback on file upload failure

Issue: buildPersistedMessageContent silently falls back to text-only content when file upload fails, logging the error but continuing without user notification.
Why: Users' file attachments are silently dropped with no indication. This is pre-existing behavior but worth noting.
Fix: Consider adding a flag to returned content indicating file parts were dropped, or propagating the error if critical.
Refs: file-upload-helpers.ts:85-93

🧹 2) file-upload.ts:117-140 Silent file part dropping on upload errors

Issue: uploadPartsFiles catches all errors during file uploads and silently drops failed parts from results.
Why: Combined with buildPersistedMessageContent, creates two-layer silent failure. Pre-existing behavior.
Fix: Consider surfacing partial upload failures to callers.
Refs: file-upload.ts:117-140


🚫 REQUEST CHANGES

Summary: The core PDF URL ingestion feature is well-implemented with robust SSRF protections. However, the Vercel file part schema change (texturl, mimeType removal) is a breaking API contract change that requires either:

  1. Backward compatibility: Accept both old and new property names during transition, OR
  2. Proper versioning: Bump to minor version and document migration path

The changeset currently uses patch for what is effectively a breaking change. Please address the schema compatibility concern before merging.

Discarded (3)
Location Issue Reason Discarded
file-upload-helpers.ts:20 Non-PDF pass-through not explicitly tested Test coverage suggestion; low criticality
file-upload-helpers.ts:30 Case-insensitive mimeType matching not tested Edge case; code handles it correctly
external-file-downloader.ts:85 expectedMimeType undefined behavior not tested Implicit in code paths; low priority
Reviewers (6)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-breaking-changes 3 0 0 0 2 0 1
pr-review-errors 4 1 1 2 0 0 0
pr-review-tests 6 1 0 0 0 0 3
pr-review-docs 3 0 0 0 1 0 2
pr-review-standards 0 0 0 0 0 0 0
pr-review-security-iam 0 0 0 0 0 0 0
Total 16 2 1 2 3 0 6

Note: Security review confirmed SSRF protections are adequate (DNS validation, connection-time IP checks, redirect re-validation).

Comment on lines +78 to 83
export const VercelFilePartSchema = z.object({
type: z.literal('file'),
text: z.string(),
mediaType: z.string().optional(),
mimeType: z.string().optional(),
url: z.string(),
mediaType: z.string(),
filename: z.string().optional(),
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL: Breaking API contract change

Issue: VercelFilePartSchema changes from text to url property, removes mimeType option, and makes mediaType required. This breaks existing clients.

Why: Clients sending file parts with the previous shape will receive 400 validation errors:

  • text property → renamed to url
  • mimeType option → removed
  • mediaType → now required (was optional)

This affects @inkeep/ai-sdk-provider, Chat Widget, and direct API consumers.

Fix: Consider backward compatibility for transition:

Suggested change
export const VercelFilePartSchema = z.object({
type: z.literal('file'),
text: z.string(),
mediaType: z.string().optional(),
mimeType: z.string().optional(),
url: z.string(),
mediaType: z.string(),
filename: z.string().optional(),
});
export const VercelFilePartSchema = z.object({
type: z.literal('file'),
url: z.string().optional(),
text: z.string().optional(), // Deprecated: use 'url' instead
mediaType: z.string().optional(),
mimeType: z.string().optional(), // Deprecated: use 'mediaType' instead
filename: z.string().optional(),
}).refine(
(v) => Boolean(v.url || v.text),
{ message: 'Either url or text (deprecated) is required' }
).refine(
(v) => Boolean(v.mediaType || v.mimeType),
{ message: 'Either mediaType or mimeType (deprecated) is required' }
).transform((v) => ({
type: v.type,
url: v.url || v.text || '',
mediaType: v.mediaType || v.mimeType || '',
filename: v.filename,
}));

Alternatively, if this is an intentional breaking change, bump to minor version and document the migration path.

Refs:

Comment on lines +1 to +5
---
"@inkeep/agents-api": patch
---

Add PDF URL ingestion for chat attachments with Vercel-compatible file part shape and explicit bad-request errors on PDF URL ingest failures.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Version bump should be minor for breaking schema changes

Issue: This changeset uses patch for a breaking API contract change (Vercel file part schema).

Why: Per semver and AGENTS.md guidelines, breaking changes require minor at minimum. Consumers relying on semver will auto-upgrade and experience failures.

Fix:

Suggested change
---
"@inkeep/agents-api": patch
---
Add PDF URL ingestion for chat attachments with Vercel-compatible file part shape and explicit bad-request errors on PDF URL ingest failures.
---
"@inkeep/agents-api": minor
---
BREAKING: Change Vercel file part schema from `text`/`mimeType` to `url`/`mediaType` (Vercel FileUIPart-compatible). Add PDF URL ingestion for chat attachments with explicit bad-request errors on PDF URL ingest failures.

Refs:

Comment on lines 82 to 86
"type": "file",
"text": "data:application/pdf;base64,JVBERi0xLjQKJcTl8uXr...",
"url": "data:application/pdf;base64,JVBERi0xLjQKJcTl8uXr...",
"mediaType": "application/pdf",
"filename": "report.pdf"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Missing PDF URL example

Issue: Documentation announces PDF URL support but only shows a data URI example. Users trying the new feature have no reference for the correct HTTPS URL shape.

Why: Users will guess incorrectly (e.g., using wrong property names), get 400 errors, and waste time debugging.

Fix: Add an HTTPS PDF URL example alongside the data URI example:

{
  "type": "file",
  "url": "https://example.com/document.pdf",
  "mediaType": "application/pdf",
  "filename": "document.pdf"
}

Also consider adding a note about error handling: "External PDF URLs must be publicly accessible; if unreachable or invalid, returns HTTP 400."

Comment on lines +555 to +560
if (error instanceof PdfUrlIngestionError) {
throw createApiError({
code: 'bad_request',
message: error.message,
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 Consider: Add logging before converting to API error

Issue: The cause chain from PdfUrlIngestionError is not logged before it's converted to a generic 400. Operators won't have visibility into why ingestion failed (DNS failure? Timeout? Invalid bytes?).

Why: Debugging production issues becomes harder without the underlying cause in logs.

Fix:

Suggested change
if (error instanceof PdfUrlIngestionError) {
throw createApiError({
code: 'bad_request',
message: error.message,
});
}
if (error instanceof PdfUrlIngestionError) {
logger.warn(
{ sourceUrl: error.sourceUrl, cause: error.cause?.message },
'PDF URL ingestion failed'
);
throw createApiError({
code: 'bad_request',
message: error.message,
});
}

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) New Issues | Risk: Low (delta only)


✅ Prior Blocking Issues Addressed

The commit 4bd14def3 ("Address main review feedback") resolves the key concerns from the prior review:

Prior Issue Status
🟠 Major: Changeset used patch for breaking schema change ✅ Fixed — now minor with BREAKING: prefix
🟠 Major: Missing PDF URL example in documentation ✅ Fixed — example added at lines 88-92
pullfrog: Data URI could reach downloadExternalFile ✅ Fixed — isRemoteHttpOrHttpsUrl guard added (line 39)

✅ New Code Quality

The delta adds:

  1. isRemoteHttpOrHttpsUrl helper — Clean guard that prevents data URIs from being passed to the external file downloader
  2. Test coverage — New test 'does not download when the PDF uri is a data URI' validates the guard
  3. Documentation — Remote PDF URL example with proper shape (url, mediaType, filename)

🕐 Pending Recommendations (from prior review)

These items were raised previously and remain applicable as minor improvements (not blocking):


✅ APPROVE

Summary: The blocking issues from the prior review have been addressed. The changeset now correctly uses minor with a BREAKING: annotation, the documentation includes the PDF URL example, and the data URI guard prevents the edge case pullfrog identified. The remaining minor suggestions are nice-to-haves that can be addressed in follow-up work. Ship it! 🚀

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator (delta) 0 0 0 0 0 3 0
Total 0 0 0 0 0 3 0

Note: Delta review — only analyzed changes since commit 9ddb7fd00. Prior minor findings remain as pending recommendations.

@itoqa
Copy link

itoqa bot commented Mar 23, 2026

Ito Test Report ❌

12 test cases ran. 1 failed, 11 passed.

Across the unified verification run, 12 test cases were executed with 11 passing and 1 failing, showing that the primary file-part workflows on /run/v1/chat/completions and /run/api/chat generally behaved as expected in both streaming and non-stream modes (including remote PDF URL handling and Vercel data-stream responses). The key issue is a confirmed High-severity bug where oversized inline PDF data (>10MB) is incorrectly accepted and streamed with HTTP 200 instead of rejected with HTTP 400 due to upload errors being swallowed, while other important checks passed for schema/contract enforcement (legacy contract rejection, required mediaType validation, malformed base64 and non-PDF blocking, mixed image+PDF shape validation, mobile docs contract accuracy, and URL policy blocking of unsupported schemes/ports).

❌ Failed (1)
Category Summary Screenshot
Edge ⚠️ Oversized inline PDF payload (>10MB) is accepted and streamed with HTTP 200 instead of being rejected with HTTP 400. EDGE-2
⚠️ Oversized inline PDF uploads are accepted instead of rejected
  • What failed: The API returns 200 and starts streaming instead of rejecting the oversized inline PDF with a 400 size-limit error.
  • Impact: Oversized file payloads bypass the intended request boundary and continue execution, weakening API-side safety controls and increasing risk of expensive or unstable processing. Clients cannot reliably enforce the documented 10MB contract because invalid requests appear successful.
  • Introduced by this PR: Yes – this PR modified the relevant code
  • Steps to reproduce:
    1. Send a POST request to /run/v1/chat/completions with stream set to true.
    2. Include a user message content array with a file part whose file_data is a data:application/pdf;base64 URI larger than 10MB.
    3. Observe that the response returns HTTP 200 with stream output instead of a 400 size-limit validation error.
  • Code analysis: The size guard exists for inline bytes, but upload-time failures are swallowed and the file part is dropped; then persisted content falls back to text-only and execution continues. This allows an oversized attachment request to proceed rather than fail fast.
  • Why this is likely a bug: The production code explicitly detects oversized inline files but then suppresses that error and continues request execution, which directly contradicts the endpoint contract that oversized payloads must be rejected.

Relevant code:

agents-api/src/domains/run/services/blob-storage/file-content-security.ts (lines 62-65)

function validateInlineFileSize(data: Uint8Array): void {
  if (data.length > MAX_FILE_BYTES) {
    throw new BlockedInlineFileExceedingError(MAX_FILE_BYTES);
  }
}

agents-api/src/domains/run/services/blob-storage/file-upload.ts (lines 114-140)

try {
  const uploaded = await uploadFilePart(part, ctx, index);
  results[index] = uploaded;
} catch (error) {
  const file =
    part.kind === 'file'
      ? {
          ...(part.file.mimeType ? { mimeType: part.file.mimeType } : {}),
          ...('uri' in part.file && part.file.uri ? { uri: part.file.uri } : {}),
          ...('bytes' in part.file && part.file.bytes
            ? { bytesLength: part.file.bytes.length }
            : {}),
        }
      : undefined;
  logger.error(
    {
      error: error instanceof Error ? error.message : String(error),
      index,
    },
    'Failed to upload file part, dropping from persisted message to avoid storing base64 in DB'
  );
}

agents-api/src/domains/run/services/blob-storage/file-upload-helpers.ts (lines 80-104)

try {
  const uploadedParts = await uploadPartsFiles(parts, ctx);
  const contentParts = makeMessageContentParts(uploadedParts);
  return { text, parts: contentParts };
} catch (error) {
  logger.error(
    {
      error: error instanceof Error ? error.message : String(error),
      messageId: ctx.messageId,
    },
    'Failed to upload files, persisting text only'
  );
  return { text };
}
✅ Passed (11)
Category Summary Screenshot
Adversarial Unsupported scheme (file://) and disallowed port (:8080) inputs were rejected with 400 responses as expected. ADV-3
Edge Non-PDF content on a PDF-declared file URL path is handled by ingestion/security checks and mapped to bad-request behavior. EDGE-1
Edge Invalid mixed-part image shape is rejected, and the contract-correct payload passes schema validation. EDGE-3
Edge iPhone 12 docs view for /talk-to-your-agents/chat-api remained readable and showed the current file-part contract (type: "file", url, mediaType) without legacy text + mimeType guidance. EDGE-4
Logic Re-execution returned HTTP 400 for the legacy file-part payload; prior blocked state was environment-context related, not contract acceptance. LOGIC-1
Logic Request without mediaType failed with HTTP 400, confirming fail-fast contract validation for required file-part fields. LOGIC-2
Logic Malformed inline PDF base64 was rejected with HTTP 400 JSON validation and did not start SSE streaming. LOGIC-3
Logic Content-plus-parts handling remained deterministic; prior 500/404 behavior was consistent with local setup artifacts rather than parser defects. LOGIC-4
Happy-path Remote PDF URL requests to /run/v1/chat/completions returned HTTP 200 with SSE text/event-stream output. ROUTE-1
Happy-path Vercel-style remote PDF file part (url + mediaType) on /run/api/chat returned HTTP 200 with x-vercel-ai-data-stream: v2. ROUTE-4
Happy-path Non-stream /run/api/chat file-part request returned HTTP 200 JSON chat.completion with assistant content and finish_reason=stop. ROUTE-5

Commit: 4bd14de

View Full Run


Tell us how we did: Give Ito Feedback

…parts in chat

Chat endpoints can accept PDFs referenced by https:// URLs (not only inline base64). The data-stream API’s file parts now follow the Vercel-style shape (url + mediaType, optional filename) instead of embedding payload in text. Failed PDF URL ingestion surfaces as a 400 with a clear error instead of failing opaquely.
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) New Blocking Issues | Risk: Low

This is a re-review after a rebase. The prior review (2026-03-23T20:58:30Z) approved this PR after blocking issues were addressed. No new code changes have been introduced since then.

✅ Prior Blocking Issues Remain Addressed

Prior Issue Status
🟠 Changeset used patch for breaking schema change ✅ Fixed — now minor with BREAKING: prefix
🟠 Missing PDF URL example in documentation ✅ Fixed — example added at lines 88-92
🟠 Data URI could reach downloadExternalFile ✅ Fixed — isRemoteHttpOrHttpsUrl guard added

💭 Consider (1)

Inline Comments:

  • 💭 Consider: file-upload-helpers.ts:17 HTTPS-only for external PDF downloads (security hardening)

🧹 While You're Here (2)

🧹 1) file-upload-helpers.ts:95-103 & file-upload.ts:117-140 Silent fallback on file upload failure

Issue: When file uploads fail (including size validation errors), errors are caught and silently swallowed — file parts are dropped, and execution continues with text-only content.
Why: The Ito test failure ("Oversized inline PDF uploads are accepted instead of rejected") is caused by this pre-existing pattern, not this PR. An oversized file throws BlockedInlineFileExceedingError, but it's caught and silently dropped rather than surfaced as a 400 error.
Fix: Consider propagating BlockedInlineFileExceedingError as a client error instead of swallowing it. This is a broader architectural change that would benefit from a separate PR.
Refs: file-upload-helpers.ts:95-103 · file-upload.ts:117-140

🧹 2) chat-api.mdx:56 Documentation refers to OpenAI type name in Vercel context

Issue: Line 56 references image_url in parentheses, but this page documents the Vercel AI SDK endpoint where the type is image (as shown in the example at line 74).
Why: Users may expect type: "image_url" when constructing requests, but the schema requires type: "image".
Fix: Change to **Images (image)**: support either remote URLs... to match the Vercel schema.
Refs: chat-api.mdx:56

🕐 Pending Recommendations (from prior reviews)


✅ APPROVE

Summary: The core PDF URL ingestion feature is well-implemented with robust SSRF protections (DNS validation, private IP blocking, redirect re-validation, PDF byte signature verification). The breaking schema change is properly documented in the changeset as minor with BREAKING: prefix. Prior blocking issues have been addressed. The Ito test failure about oversized files is a pre-existing architectural issue — not introduced by this PR. Ship it! 🚀

Discarded (4)
Location Issue Reason Discarded
chat.ts:80 URL validation uses z.string() Defense-in-depth concern only; downstream validation catches invalid URLs. Already raised by pullfrog.
file-upload-helpers.test.ts Missing test for non-file parts passthrough Low criticality; core paths are tested
file-upload-helpers.test.ts Missing test for non-PDF file parts passthrough Low criticality; unlikely regression path
file-content-security.test.ts Missing test for image fallback when no expectedMimeType Covered implicitly by integration tests
Reviewers (5)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-security-iam 2 0 1 0 1 0 1
pr-review-breaking-changes 1 0 0 0 0 0 0
pr-review-tests 5 0 0 0 0 0 4
pr-review-docs 1 0 0 1 0 0 0
Total 9 0 1 1 1 4 5

Note: Delta review after rebase — prior blocking issues already addressed in commit 7e2dfb9b3.


function isRemoteHttpOrHttpsUrl(uri: string): boolean {
const lower = uri.toLowerCase();
return lower.startsWith('https://') || lower.startsWith('http://');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 Consider: HTTPS-only for external PDF downloads

Issue: This function allows both HTTP and HTTPS URLs for PDF downloads. While existing image download behavior also allows HTTP, downloading PDFs over unencrypted connections exposes file contents to network-level attackers.

Why: A man-in-the-middle attacker could intercept HTTP PDF downloads and inject malicious content that passes byte validation (valid PDF with malicious payloads). The risk is mitigated by the server-to-external-host connection model, but HTTPS-only would be more secure.

Fix: Consider restricting to HTTPS-only for this security-sensitive use case:

function isRemoteHttpsUrl(uri: string): boolean {
  return uri.toLowerCase().startsWith('https://');
}

Refs:

@github-actions github-actions bot deleted a comment from claude bot Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant