Skip to content

fix(files): never dedup external URL fetches by path filename#4733

Merged
waleedlatif1 merged 2 commits into
stagingfrom
waleedlatif1/slack-file-cache-key-bug
May 23, 2026
Merged

fix(files): never dedup external URL fetches by path filename#4733
waleedlatif1 merged 2 commits into
stagingfrom
waleedlatif1/slack-file-cache-key-bug

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

Summary

  • External URL fetches in the file parse route were keyed on the path filename, so two distinct URLs ending in image.png (e.g. every Slack clipboard paste) collided in the workspace cache and returned stale bytes from a prior fetch
  • Extracted fetch + workspace save into fetchExternalUrlToWorkspace helper so the broken dedup pattern can't be reintroduced
  • Helper always downloads; uploadWorkspaceFile already disambiguates same-name collisions on save (image.png -> image (1).png -> ...)
  • Audited the rest of the codebase for the same anti-pattern — none found, isolated to this route

Type of Change

  • Bug fix

Testing

  • New helper unit tests covering URL-vs-filename independence, SSRF rejection, permission gating, save-error handling, and header forwarding
  • Updated route test that previously asserted the buggy cache-reuse behavior — now asserts both URLs fetch fresh
  • Full `apps/sim/lib/uploads` + `apps/sim/app/api/files` suite green (158 tests)
  • `bun run check:api-validation` passes

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

External URL fetches in the file parse route were keyed on the path
filename, so two distinct URLs whose paths ended in `image.png` (e.g.
every Slack clipboard paste) collided in the workspace cache and
returned stale bytes from a prior fetch.

Extracts the fetch + save flow into `fetchExternalUrlToWorkspace` so
the broken dedup pattern can't be reintroduced. The helper always
downloads; `uploadWorkspaceFile` handles name collisions on save by
suffixing (`image.png` -> `image (1).png` -> ...).
@vercel
Copy link
Copy Markdown

vercel Bot commented May 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 23, 2026 7:42pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 23, 2026

PR Summary

Medium Risk
Changes external-URL download and workspace-save behavior in the file parse API, including SSRF-validation error handling, so regressions could affect file ingestion and security-sensitive fetching paths.

Overview
Fixes a bug where external URL parsing could reuse stale bytes when different URLs shared the same path filename (e.g. image.png) by removing filename-based workspace dedup and always fetching each URL independently.

Extracts external URL validation/fetch (+ optional workspace save) into a new fetchExternalUrlToWorkspace helper, updates the parse route to use it (and to map SSRF/DNS failures via ExternalUrlValidationError), and updates/adds tests to assert two same-filename URLs both download and to cover permission-gated workspace saves, header forwarding, and save-error resilience.

Reviewed by Cursor Bugbot for commit ab766f5. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 23, 2026

Greptile Summary

This PR fixes a cache-collision bug where two external URLs sharing the same path tail (e.g. every Slack clipboard paste is download/image.png) would key off the filename and reuse a stale workspace file from a prior fetch. The fix extracts URL fetching into a new fetchExternalUrlToWorkspace helper that always downloads fresh and delegates collision-free naming to uploadWorkspaceFile.

  • fetch-external-url.ts — new helper that validates, fetches, reads to buffer, then optionally saves to workspace with a fresh permission check; SSRF/DNS validation is retained and surfaced as ExternalUrlValidationError.
  • route.tshandleExternalUrl is simplified to call the helper; execution-file URLs still skip the workspace save; the ExternalUrlValidationError catch path is added to the existing error handler.
  • Tests — the route test that previously asserted buggy cache-reuse behavior is replaced with one asserting two network fetches; the helper ships its own unit tests covering dedup, SSRF rejection, permission gating, save-error tolerance, and header forwarding.

Confidence Score: 5/5

Safe to merge — the change is a focused bug fix that removes a filename-based cache lookup and replaces it with an always-fresh download helper, without altering any parse or auth logic downstream.

The root-cause fix is correct: the collision between distinct URLs sharing the same path tail is eliminated by removing the filename-based dedup path entirely. SSRF protection, permission checks, and parse logic are all preserved. The new helper is well-isolated, all previous thread concerns were addressed in the same commit, and the test suite directly pins the behavior that was broken.

No files require special attention.

Important Files Changed

Filename Overview
apps/sim/lib/uploads/contexts/workspace/fetch-external-url.ts New helper that isolates URL fetch + workspace save logic; SSRF validation, permission checks, and save-error swallowing are all correctly handled.
apps/sim/app/api/files/parse/route.ts handleExternalUrl simplified to delegate to the new helper; execution-file detection logic is preserved; ExternalUrlValidationError catch is correctly ordered before the generic fallback.
apps/sim/lib/uploads/contexts/workspace/fetch-external-url.test.ts Comprehensive unit tests covering dedup independence, SSRF rejection, 4xx errors, permission variants (null, read, write), save-error tolerance, header forwarding, and mimeType passthrough.
apps/sim/app/api/files/parse/route.test.ts Route test updated to assert two independent network fetches for same-filename URLs; stale mock helpers removed cleanly.
apps/sim/lib/uploads/contexts/workspace/index.ts Re-exports the new helper from the barrel file; no logic changes.

Reviews (2): Last reviewed commit: "fix(files): distinguish non-member from ..." | Re-trigger Greptile

Comment thread apps/sim/lib/uploads/contexts/workspace/fetch-external-url.ts
Comment thread apps/sim/lib/uploads/contexts/workspace/fetch-external-url.ts
…kip log

Splits the workspace-save skip branch so non-members (permission === null)
log 'user is not a workspace member' instead of the misleading 'lacks
write permission'. Adds a test covering the null case so the relaxed
behavior (vs. the prior route's hard 'File not found') is explicit.
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit ab766f5. Configure here.

@waleedlatif1 waleedlatif1 merged commit 6b1210b into staging May 23, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/slack-file-cache-key-bug branch May 23, 2026 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant