fix(media): validate Content-Type and magic bytes before sending to model #793
howie wants to merge 4 commits into
Fixes openabdev#776.

When a Slack bot token lacks the `files:read` OAuth scope, Slack serves the workspace login HTML page (~55 KB) at HTTP 200 with a `text/html` Content-Type instead of the requested file binary. `download_and_encode_image` previously accepted this response because:

1. It never inspected the HTTP response `Content-Type` header.
2. On `resize_and_compress` failure for a body ≤ 1 MB, it fell back to forwarding the raw bytes under the Slack-reported MIME (`image/png`), bypassing any format check.

The result: a `ContentBlock::Image { media_type: "image/png", data: <base64 HTML> }` flowed through to Anthropic, which 400'd with "Could not process image". Because claude-agent-acp persists the user message into the session JSONL before the API reply, the bad block replayed on every subsequent turn in that Slack thread until an operator manually deleted the JSONL inside the pod.

Changes:

- Add `MediaFetchError` enum to `src/media.rs` so callers can distinguish "not an image, skip silently" (`NotAnImage`) from "claimed image, got unexpected bytes" (`UnsupportedResponseType`, `InvalidImageBody`).
- Add `validate_image_response(content_type, body)` pure helper that:
  - Rejects any HTTP response whose Content-Type (stripped of params, lowercased) is not in `{image/png, image/jpeg, image/gif, image/webp}`.
  - Sniffs magic bytes via `image::ImageReader::with_guessed_format()` (no new dependencies) and rejects anything that doesn't decode as one of the four supported formats.
- Change `download_and_encode_image` signature from `-> Option<ContentBlock>` to `-> Result<ContentBlock, MediaFetchError>`, capturing the Content-Type header before consuming the response with `.bytes()`.
- Remove the ≤ 1 MB resize-error fallback that was the direct bug path.
- Update `src/slack.rs` call site: on validation failure, collect filenames and post one aggregated user-visible warning to the Slack thread: ":warning: I couldn't access the file(s) you shared (`<name>`). This often means the bot is missing the `files:read` OAuth scope. Please ask an admin to reinstall the app with that scope."
- Update `src/discord.rs` call site: `warn!` log on failure (Discord URLs are signed-public, so the Slack scope hint does not apply there). Preserve the existing `is_video_file` fallback for `Err(NotAnImage)`.
- Add 12 unit tests for `validate_image_response`, including the exact bug repro case (HTML body labeled `image/png`, first 8 bytes `3c21444f43545950`).

Out of scope / follow-up issues:

- Secondary defense: deferring claude-agent-acp JSONL persistence until after the model returns 200 (requires changes in the claude-agent-acp Node project).
- Startup preflight calling Slack `auth.test` to warn loudly on missing scopes.
- Same Content-Type/magic-byte hardening for `download_and_transcribe` and `download_and_read_text_file`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
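For readers skimming this commit, here is a minimal Rust sketch of the error split it introduces and how a Slack call site might route each variant. Assumptions: the variant payloads, the `SlackAction` enum, and the `slack_action` / `aggregated_warning` helpers are hypothetical; only the variant names come from the PR description. Routing `SizeExceeded` to the user warning follows the later "Add SizeExceeded to Slack user notification" commit, and the warning text is abbreviated.

```rust
use std::fmt;

/// Variant names from the PR description; the payload types are assumptions.
#[derive(Debug)]
pub enum MediaFetchError {
    NotAnImage,
    UnsupportedResponseType(String),
    InvalidImageBody(String),
    SizeExceeded(u64),
    Network(String),
    HttpStatus(u16),
}

impl fmt::Display for MediaFetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MediaFetchError::NotAnImage => write!(f, "attachment is not an image"),
            MediaFetchError::UnsupportedResponseType(ct) => write!(f, "unexpected Content-Type: {ct}"),
            MediaFetchError::InvalidImageBody(prefix) => write!(f, "body is not a supported image (first bytes: {prefix})"),
            MediaFetchError::SizeExceeded(n) => write!(f, "file too large ({n} bytes)"),
            MediaFetchError::Network(msg) => write!(f, "network error: {msg}"),
            MediaFetchError::HttpStatus(code) => write!(f, "HTTP status {code}"),
        }
    }
}

impl std::error::Error for MediaFetchError {}

/// Hypothetical: what the Slack call site does with each failure.
#[derive(Debug, PartialEq)]
pub enum SlackAction {
    SkipSilently, // not an image at all
    LogAndSkip,   // transient: warn! log only
    WarnUser,     // claimed image, got something else: collect filename
}

pub fn slack_action(err: &MediaFetchError) -> SlackAction {
    match err {
        MediaFetchError::NotAnImage => SlackAction::SkipSilently,
        MediaFetchError::Network(_) | MediaFetchError::HttpStatus(_) => SlackAction::LogAndSkip,
        MediaFetchError::UnsupportedResponseType(_)
        | MediaFetchError::InvalidImageBody(_)
        | MediaFetchError::SizeExceeded(_) => SlackAction::WarnUser,
    }
}

/// Hypothetical: one aggregated warning for all collected filenames, or none.
pub fn aggregated_warning(failed: &[String]) -> Option<String> {
    if failed.is_empty() {
        return None;
    }
    let names: Vec<String> = failed.iter().map(|n| format!("`{n}`")).collect();
    Some(format!(
        ":warning: I couldn't access the file(s) you shared ({}).",
        names.join(", ")
    ))
}
```

Per the description, the Discord call site would only log these errors and, for `Err(NotAnImage)`, fall back to `is_video_file`; that branch is left out here.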
All PRs must reference a prior Discord discussion to ensure community alignment before implementation. Please edit the PR description to include a link to that discussion. This PR will be automatically closed in 3 days if the link is not added.
- Remove dead `hinted` field from `UnsupportedResponseType` (always `None`)
- Eliminate double `reader.format()` call with `fmt @` binding
- Deduplicate `hex_prefix()` in resize error path (compute once, reuse)
- Promote `strip_mime_params` to `media::strip_mime_params` (`pub(crate)`); `slack.rs` delegates to it, giving a single source of truth for MIME stripping

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
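A minimal sketch of what a shared `strip_mime_params` could look like. The PR does not show its body, so this signature (taking `&str`, returning an owned, lowercased `String`) is an assumption; it keeps only the MIME essence before the first `;`.

```rust
/// Hypothetical shape of the shared helper:
/// "Image/PNG; charset=binary" -> "image/png".
pub(crate) fn strip_mime_params(content_type: &str) -> String {
    content_type
        .split(';')      // "type/subtype" comes before any parameters
        .next()          // split always yields at least one piece
        .unwrap_or("")
        .trim()          // tolerate stray whitespace around the essence
        .to_ascii_lowercase() // MIME types compare case-insensitively
}
```

Keeping this in `media` and having `slack.rs` delegate means both the download path and the Slack metadata path normalise MIME strings identically.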
Critical: change Content-Type check from allow-list to block-list (Codex finding). The allow-list rejected `application/octet-stream` before the magic-byte check ran, silently dropping valid images served by CDNs. Only `text/*` is now rejected early; everything else falls through to magic-byte verification.

Also:

- Soften Slack warning message: it no longer attributes all failures to the `files:read` scope and now mentions format support as a second cause
- Add `SizeExceeded` to Slack user notification (was silent)
- Log failures from `send_message()` instead of discarding them with `let _ =`
- Log discarded `io::Error` from `with_guessed_format`
- Fix doc comments: `download_and_encode_image` (`SizeExceeded` fires pre-HTTP), `validate_image_response` (Content-Type check short-circuits, not sequential)
- Replace inline "Validate Content-Type..." comment with a WHY explanation
- Restore doc comment on `strip_mime_params` wrapper in `slack.rs`
- Add tests: octet-stream acceptance (Codex regression fix), JSON body rejection by magic bytes, missing Content-Type + invalid body

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
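The allow-list to block-list change is easiest to see as code. This is a std-only sketch, not the PR's implementation: the real helper sniffs with `image::ImageReader::with_guessed_format()`, whereas the hypothetical `sniff_image_mime` below hand-checks the well-known PNG, JPEG, GIF, and WebP signatures, and `Rejection` stands in for `MediaFetchError`. A signature match alone does not prove the body decodes (compare the corrupt-GIF finding in the Codex report).

```rust
/// Std-only stand-in for magic-byte sniffing: match leading signature bytes.
fn sniff_image_mime(body: &[u8]) -> Option<&'static str> {
    if body.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some("image/png")
    } else if body.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if body.starts_with(b"GIF87a") || body.starts_with(b"GIF89a") {
        Some("image/gif")
    } else if body.len() >= 12 && &body[0..4] == b"RIFF" && &body[8..12] == b"WEBP" {
        Some("image/webp")
    } else {
        None
    }
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum Rejection {
    TextContentType(String),
    UnrecognisedBody,
}

/// Block-list: only `text/*` is rejected on the header alone. Anything else,
/// including `application/octet-stream` or no header at all, must pass the sniff.
fn validate_image_response(content_type: Option<&str>, body: &[u8]) -> Result<&'static str, Rejection> {
    if let Some(ct) = content_type {
        let essence = ct.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
        if essence.starts_with("text/") {
            return Err(Rejection::TextContentType(essence)); // short-circuit
        }
    }
    sniff_image_mime(body).ok_or(Rejection::UnrecognisedBody)
}
```

With this ordering, a PNG served by a CDN as `application/octet-stream` is accepted on its bytes, while the Slack login page is rejected either by its `text/html` header or, if mislabeled, by its `<!DOCTYPE` prefix.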
Codex adversarial review found that user-controlled filenames embedded in the mrkdwn warning message could inject Slack markup (backtick break-out, `<!here>` mentions, `<@uid>` pings). Replace backticks and angle brackets with safe ASCII equivalents before embedding filenames in the message.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
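A minimal sketch of the sanitizing step. The commit only says backticks and angle brackets become "safe ASCII equivalents"; the specific replacements below (`'`, `(`, `)`) and the name `sanitize_for_mrkdwn` are assumptions.

```rust
/// Hypothetical sanitizer: a backtick can no longer close the inline-code span,
/// and without `<`/`>` a filename cannot form `<!here>` or `<@U123>` sequences.
fn sanitize_for_mrkdwn(name: &str) -> String {
    name.chars()
        .map(|c| match c {
            '`' => '\'',
            '<' => '(',
            '>' => ')',
            other => other,
        })
        .collect()
}
```

Mapping character by character keeps the filename recognisable to the user while removing every character that Slack's mrkdwn treats as a control sequence opener in this context.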
Codex Challenge Report: Adversarial Review

Finding 1: Slack filename mrkdwn injection [FIXED]
Filenames embedded in the Slack warning message were user-controlled. A filename containing backticks, …

Finding 2: Corrupt GIF bodies pass magic-byte check [Known Issue, not in scope]
GIF format is detected by magic bytes (…
This is pre-existing behavior from before this PR. Fixing it would require decoding GIF frames for validation, which risks breaking animated GIF support. Filed as a known limitation; a follow-up PR should add frame-count validation for GIFs.

Finding 3: `failed_image_files` Vec is unbounded per event [Acceptable]
The Vec is bounded by Slack's own message attachment limit (~20 files). Not a persistent leak. Acceptable for now.

No TOCTOU between Content-Type capture and body read
Headers and body come from the same immutable …

`hex_prefix` cannot panic
Uses …

Mixed success: one valid PNG + one HTML file in same message
Valid PNG → pushed to …

Generated by /pr-review-cycle-codex Step 8: Codex adversarial challenge
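The "`hex_prefix` cannot panic" property holds for any implementation that never slices past the end of the body. A sketch using `take(n)`; the name matches the PR, but the signature is an assumption.

```rust
/// Hex-encode at most `n` leading bytes for log messages.
/// `take(n)` stops early on short bodies, so there is no out-of-bounds slice.
fn hex_prefix(body: &[u8], n: usize) -> String {
    body.iter().take(n).map(|b| format!("{b:02x}")).collect()
}
```

Applied to the Slack login page, the first 8 bytes render as `3c21444f43545950`, the prefix quoted in the PR; an empty body simply yields an empty string.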
Discord Discussion URL: https://discord.com/channels/1491295327620169908/1491969620754567270/1503586535088590868
Fixes #776.
Root cause
When a Slack bot token lacks the `files:read` OAuth scope, Slack serves the workspace login HTML page (~55 KB) at HTTP 200 with `Content-Type: text/html` instead of the requested file binary. `download_and_encode_image` accepted this response because:

1. It never inspected the HTTP response `Content-Type` header.
2. On `resize_and_compress` failure for a body <= 1 MB, it fell back to forwarding raw bytes under the Slack-reported MIME (`image/png`), bypassing any format check.

The result: a `ContentBlock::Image { media_type: "image/png", data: <base64 of HTML> }` flowed to Anthropic, which 400'd with `Could not process image`. Because claude-agent-acp persists the user message into the session JSONL before the API reply, the bad block replayed on every subsequent turn until an operator manually deleted the JSONL inside the pod.

Changes
`src/media.rs` (primary change)

- `MediaFetchError` enum: `NotAnImage` (silent skip), `UnsupportedResponseType`, `InvalidImageBody`, `SizeExceeded`, `Network`, `HttpStatus`.
- `validate_image_response(content_type, body)` pure helper that:
  - rejects any response whose Content-Type is not in `{image/png, image/jpeg, image/gif, image/webp}` (strips params, case-insensitive);
  - sniffs magic bytes via `image::ImageReader::with_guessed_format()` (zero new dependencies) and rejects anything that doesn't decode as one of the four supported formats.
- Changed `download_and_encode_image` signature from `-> Option<ContentBlock>` to `-> Result<ContentBlock, MediaFetchError>`, capturing the `Content-Type` header before consuming the response with `.bytes()`.

`src/slack.rs` (call site)

On validation failure, collect filenames and post one aggregated user-facing warning after the file loop: …

Transient errors (`Network`, `HttpStatus`) log at `warn!` and skip silently.

`src/discord.rs` (call site)

Same `Result` pattern, but log-only on failure (Discord URLs are signed-public; the Slack scope hint is not applicable). Preserves the existing `is_video_file` fallback for `Err(NotAnImage)`.

Tests
12 new unit tests in `src/media.rs::tests` for `validate_image_response`, including the exact bug reproduction (an HTML body labeled `image/png` whose first 8 bytes are `3c21444f43545950`).

All 319 tests pass.
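To illustrate the shape of the bug-reproduction test, here is a hedged sketch: `is_supported_image` is a simplified, hypothetical stand-in for `validate_image_response` (it checks only PNG and JPEG signatures), and the test body is illustrative rather than copied from `src/media.rs::tests`.

```rust
/// Simplified stand-in: reject `text/*` Content-Types and accept only
/// bodies that start with a PNG or JPEG signature.
fn is_supported_image(content_type: Option<&str>, body: &[u8]) -> bool {
    let text_ct = content_type
        .map(|ct| ct.trim().to_ascii_lowercase().starts_with("text/"))
        .unwrap_or(false);
    let png = body.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]);
    let jpeg = body.starts_with(&[0xFF, 0xD8, 0xFF]);
    !text_ct && (png || jpeg)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn html_login_page_labelled_png_is_rejected() {
        // Start of the login page Slack serves when `files:read` is missing.
        let body: &[u8] = b"<!DOCTYPE html><html lang=\"en\">";
        // First 8 bytes match the prefix quoted in the PR: 3c21444f43545950.
        let expected: [u8; 8] = [0x3c, 0x21, 0x44, 0x4f, 0x43, 0x54, 0x59, 0x50];
        assert_eq!(&body[..8], &expected[..]);
        // Slack's file metadata claimed image/png; the body must still be rejected.
        assert!(!is_supported_image(Some("image/png"), body));
    }
}
```

Pinning the exact 8-byte prefix in the test documents why the case matters: the body is unmistakably HTML even though every piece of metadata claims PNG.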
Manual test plan (post-deploy)
- … `files:read`. Confirm via `x-oauth-scopes` from `auth.test`.
- … `files:read`, rotate token, redeploy. Upload an image in the same thread.

Out of scope / follow-ups
- Startup preflight: `auth.test` + `apps.permissions.info` at boot to warn on missing `files:read` (useful early warning, separate concern).
- `download_and_transcribe` / `download_and_read_text_file`: analogous hardening for audio/text-file paths (lower priority, separate PR).
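For the startup-preflight follow-up, the parsing half is small. This sketch assumes the `x-oauth-scopes` header value returned with `auth.test` (the one the manual test plan checks) is a comma-separated scope list; the HTTP call itself is omitted, and the `missing_scopes` name is hypothetical.

```rust
use std::collections::HashSet;

/// Return every required scope absent from an `x-oauth-scopes` header value.
fn missing_scopes(x_oauth_scopes: &str, required: &[&str]) -> Vec<String> {
    // Assumed format: "chat:write,files:read,..." with optional spaces.
    let granted: HashSet<&str> = x_oauth_scopes
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect();
    required
        .iter()
        .filter(|r| !granted.contains(*r))
        .map(|r| r.to_string())
        .collect()
}
```

A boot-time check could then log loudly (or post to an ops channel) whenever `missing_scopes(header, &["files:read"])` is non-empty, surfacing the misconfiguration before any user uploads a file.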