
fix(tools): web_fetch saves binary content with correct extension and raw bytes #461

Merged
Aaronontheweb merged 1 commit into dev from fix/web-fetch-binary-content on Mar 27, 2026

@Aaronontheweb (Collaborator)

Summary

Fixes #386. web_fetch was reading every HTTP response as a UTF-8 string and saving all non-HTML content with a .txt extension. Decoding arbitrary bytes as UTF-8 is lossy, so binary files (images, PDFs) were corrupted on disk, which broke the attach_file workflow for Slack delivery.
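To illustrate why a UTF-8 text path cannot preserve binary data, here is a small Python sketch (not the tool's code; the PR does not show which decoder settings web_fetch used). It decodes the standard 8-byte PNG signature with invalid bytes replaced, then re-encodes it. The leading 0x89 byte is not a valid UTF-8 lead byte, so it becomes U+FFFD and the bytes written back no longer match the original.

```python
# PNG files begin with this 8-byte signature; 0x89 is not a valid UTF-8 lead byte.
png_signature = b"\x89PNG\r\n\x1a\n"

# Decode as UTF-8 (replacing invalid bytes) and re-encode, mimicking a text-only save path.
as_text = png_signature.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

print(round_tripped == png_signature)          # False
print(len(png_signature), len(round_tripped))  # 8 10
```

The replacement character U+FFFD encodes to three bytes (EF BF BD), so the "saved" file is both longer and wrong in its first byte, which is enough to make image viewers reject it.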

  • Add binary content detection via Content-Type header (image/*, audio/*, video/*, application/pdf, etc.) with raw byte[] save path
  • Use URL path extension first (.png, .sh, .json), fall back to Content-Type mapping only when URL has no extension
  • Filter out numeric-only "extensions" like .25414 (arxiv version IDs)
  • Extract shared BuildFilePath helper to deduplicate save methods
  • Have ReadTextWithLimitAsync delegate to ReadBytesWithLimitAsync
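The extension rules above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the merged implementation: the function names mirror the tested IsBinaryContentType, GetExtensionFromUrl and GetFallbackExtension, but their bodies, the exact binary type set, the fallback table, and the `.txt` default for unknown text types are all assumptions, since the PR lists only examples ("image/*, audio/*, video/*, application/pdf, etc.").

```python
import posixpath
from urllib.parse import urlparse

# Assumed binary set: the PR names image/*, audio/*, video/*, application/pdf "etc."
BINARY_PREFIXES = ("image/", "audio/", "video/")
BINARY_EXACT = {"application/pdf", "application/zip", "application/octet-stream"}

# Assumed Content-Type -> extension table, consulted only when the URL has no extension.
FALLBACK_EXTENSIONS = {
    "text/html": ".html",
    "application/json": ".json",
    "application/pdf": ".pdf",
    "image/png": ".png",
    "image/jpeg": ".jpg",
}


def media_type(content_type):
    """Drop parameters such as '; charset=utf-8' and normalize case."""
    return (content_type or "").split(";", 1)[0].strip().lower()


def is_binary_content_type(content_type):
    mt = media_type(content_type)
    return mt.startswith(BINARY_PREFIXES) or mt in BINARY_EXACT


def get_extension_from_url(url):
    """Return the URL path's extension, or None if missing or numeric-only."""
    ext = posixpath.splitext(urlparse(url).path)[1]  # '' when there is no extension
    if len(ext) < 2:
        return None
    if ext[1:].isdigit():  # e.g. '.25414' from an arxiv-style ID such as 2403.25414
        return None
    return ext.lower()


def get_fallback_extension(content_type):
    mt = media_type(content_type)
    if mt in FALLBACK_EXTENSIONS:
        return FALLBACK_EXTENSIONS[mt]
    # Unknown types: '.bin' for binary, '.txt' (assumed default) for everything else.
    return ".bin" if is_binary_content_type(mt) else ".txt"


def choose_extension(url, content_type):
    """URL extension first; Content-Type mapping only when the URL has none."""
    return get_extension_from_url(url) or get_fallback_extension(content_type)
```

For example, `choose_extension("https://example.com/install.sh", "text/plain")` keeps `.sh`, while `choose_extension("https://example.com/files/report", "application/pdf")` falls back to `.pdf`. Preferring the URL extension means a server that labels a shell script or JSON file as `text/plain` still produces a usefully named file; the numeric-only filter stops identifiers like `2403.25414` from being mistaken for extensions.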

Test plan

  • Binary image round-trip: PNG bytes saved with .png extension, byte-perfect verification
  • PDF with no URL extension: falls back to .pdf from Content-Type
  • JSON/shell script: URL extension (.json, .sh) preserved instead of .txt
  • Unknown binary with no URL extension: saves as .bin
  • Theory tests for IsBinaryContentType, GetExtensionFromUrl, GetFallbackExtension
  • All 59 existing + new tests pass
  • No new slopwatch violations
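The byte-perfect round-trip case in the test plan depends on binary content never passing through a text decode. A rough Python sketch of that shape follows; `read_bytes_with_limit`, `read_text_with_limit`, `build_file_path`, `save_bytes` and `round_trip_is_byte_perfect` are hypothetical stand-ins loosely modeled on ReadBytesWithLimitAsync, ReadTextWithLimitAsync and BuildFilePath, not their actual signatures. The text reader reuses the byte reader and decodes once at the end, so there is a single size-limited read path.

```python
import io
import os
import tempfile


def read_bytes_with_limit(stream, max_bytes, chunk_size=8192):
    """Read at most max_bytes raw bytes from a binary stream (hypothetical helper)."""
    buf = bytearray()
    while len(buf) < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - len(buf)))
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)


def read_text_with_limit(stream, max_bytes, encoding="utf-8"):
    """Text path delegates to the byte path, then decodes once.

    A multi-byte character cut off at the limit becomes U+FFFD.
    """
    data = read_bytes_with_limit(stream, max_bytes)
    return data.decode(encoding, errors="replace")


def build_file_path(directory, stem, extension):
    """Single place that turns a name and extension into a save path (hypothetical)."""
    return os.path.join(directory, stem + extension)


def save_bytes(directory, stem, extension, data):
    """Write raw bytes unchanged; no text encoding is involved."""
    path = build_file_path(directory, stem, extension)
    with open(path, "wb") as f:
        f.write(data)
    return path


def round_trip_is_byte_perfect(payload, extension=".png"):
    """Read via the byte path, save, re-read from disk, and compare."""
    with tempfile.TemporaryDirectory() as tmp:
        data = read_bytes_with_limit(io.BytesIO(payload), max_bytes=len(payload) + 1)
        path = save_bytes(tmp, "fetched", extension, data)
        with open(path, "rb") as f:
            return f.read() == payload
```

Usage: `round_trip_is_byte_perfect(b"\x89PNG\r\n\x1a\n" + bytes(range(256)))` returns `True`, because every byte value from 0 to 255 survives the byte path.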

Aaronontheweb merged commit 3d969f0 into dev on Mar 27, 2026
3 checks passed
Aaronontheweb deleted the fix/web-fetch-binary-content branch on March 27, 2026 at 03:47


Development

Linked issue (closed by merging this pull request): fix(tools): web_fetch saves binary content (images, PDFs) as .txt files