Skip to content

improvement(vfs): update custom glob impl to use micromatch, fix vfs filename regex#3680

Merged
icecrasher321 merged 6 commits intostagingfrom
improvement/vfs-recog
Mar 19, 2026
Merged

improvement(vfs): update custom glob impl to use micromatch, fix vfs filename regex#3680
icecrasher321 merged 6 commits intostagingfrom
improvement/vfs-recog

Conversation

@icecrasher321
Copy link
Collaborator

Summary

  • Update custom glob implementation to use the micromatch lib
  • Regex fix to identify attachements with breaking names

Type of Change

  • Bug fix

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link

vercel bot commented Mar 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 19, 2026 8:38pm

Request Review

@cursor
Copy link

cursor bot commented Mar 19, 2026

PR Summary

Medium Risk
Changes core VFS glob/grep behavior and file name resolution logic, which could subtly alter what paths match or how uploads/files are found. Also increases file read size limits, which may impact performance/memory when reading large files/images.

Overview
Switches VFS path matching to micromatch by replacing the in-house glob-to-regex implementation and improving grep scoping semantics (prefix vs glob scope, CRLF-safe line splitting, and multiline matching only in files_with_matches).

Adds shared normalizeVfsSegment and uses it across VFS sanitization and file lookup paths to better handle Unicode/whitespace edge cases (notably macOS screenshot names), including a new resolver for chat uploads and updated workspace file record matching.

Hardens open_resource for file resources to require workspace context and a canonical UUID (and uses the real file name as the resource title), updates mothership resource navigation to encodeURIComponent(fileId), expands inline file read limits (5MB text, 20MB images), and introduces storageContext so downloads can fetch from mothership vs workspace storage.

Written by Cursor Bugbot for commit 0fb04fb. Configure here.

@icecrasher321
Copy link
Collaborator Author

bugbot run

@icecrasher321
Copy link
Collaborator Author

@greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR replaces the hand-rolled globToRegExp implementation with the micromatch library and extracts a shared normalizeVfsSegment utility that collapses Unicode whitespace (including U+202F from macOS screenshot filenames) to a regular ASCII space. The normalization is then applied consistently to DB lookups in materialize-file.ts, upload-file-reader.ts, and workspace-file-manager.ts so that a filename passed by the LLM with unusual spacing still resolves to the correct record.

Key changes:

  • normalize-segment.ts — new shared helper (NFC + trim + strip control chars + /- + whitespace collapse); eliminates previously duplicated inline logic in workspace-vfs.ts and workspace-file-manager.ts
  • operations.ts — replaces custom globToRegExp with micromatch.isMatch; adds splitLinesForGrep for CRLF handling and pathWithinGrepScope to fix prefix-boundary false positives (e.g. workflows/ vs workflowsManual/)
  • materialize-file.ts / upload-file-reader.ts — switch from exact-name DB query to a normalised in-memory match over all chat uploads
  • workspace-file-manager.ts — adds optional storageContext field so mothership uploads are downloaded with the correct storage key

Issues found:

  • materialize-file.ts lines 74–77: the success output uses the raw caller-supplied fileName (potentially containing U+202F) for both the human-readable message and the path field, while the VFS will expose the file under files/${updated.originalName}. Any LLM follow-up using the returned path would fail to resolve.
  • upload-file-reader.ts / materialize-file.ts: every single-file read now issues a full-table scan over the chat's uploads rather than a targeted single-row query; a fast-path exact-match with a normalised fallback would recover the old performance characteristic.
  • operations.ts looksLikeGlob: includes { and extglob markers that are already disabled by nobrace: true / noext: true, creating dead detection branches.

Confidence Score: 3/5

  • Mostly safe but contains a logic bug where the materialized-file success response reports a path that may not match the actual VFS location.
  • The glob migration to micromatch is well-tested and the normalize-segment extraction is clean. The storageContext threading is correct. However, the materialize-file.ts success response uses the raw fileName for the path field instead of updated.originalName, which can produce a stale/incorrect VFS path when the normalisation changes the name. Additionally, the full-scan pattern introduced in both readChatUpload and findUploadRecord is a performance regression, though not a correctness issue.
  • apps/sim/lib/copilot/orchestrator/tool-executor/materialize-file.ts (logic bug on success path)

Important Files Changed

Filename Overview
apps/sim/lib/copilot/orchestrator/tool-executor/materialize-file.ts Switches from exact-name DB query to normalised in-memory match; contains a logic bug where the success response reports files/${fileName} (user-supplied, potentially with U+202F) instead of files/${updated.originalName} (actual stored DB name), which would give the LLM a non-existent VFS path.
apps/sim/lib/copilot/vfs/operations.ts Replaces hand-rolled globToRegExp with micromatch; adds splitLinesForGrep for CRLF handling and pathWithinGrepScope for correct directory-boundary filtering. Well-tested. Minor style issue: looksLikeGlob checks for { / extglob markers that are disabled by VFS_GLOB_OPTIONS.
apps/sim/lib/copilot/vfs/normalize-segment.ts New shared utility that extracts the existing inline normalisation logic (NFC, trim, strip control chars, /-, collapse Unicode whitespace). Clean extraction with no functional changes.
apps/sim/lib/copilot/vfs/operations.test.ts New test suite covering glob (single star, double star, virtual dir, nobrace) and grep (content, CRLF, count, files_with_matches, scope isolation, invalid regex, ignoreCase). Good coverage of the changed behaviour.
apps/sim/lib/copilot/orchestrator/tool-executor/upload-file-reader.ts Refactors readChatUpload to use listChatUploads + in-memory normalizeVfsSegment matching; trades one targeted DB query for a full-scan per read, which is a performance regression for high-upload chats.
apps/sim/lib/uploads/contexts/workspace/workspace-file-manager.ts Adds optional storageContext field to WorkspaceFileRecord, threads it through downloadWorkspaceFile (defaulting to 'workspace'), and consolidates findWorkspaceFileRecord matching to normalizeVfsSegment. Clean and correct.
apps/sim/lib/copilot/vfs/workspace-vfs.ts sanitizeName now delegates to normalizeVfsSegment — removes duplication and keeps DB lookups and VFS path keys in sync.

Sequence Diagram

sequenceDiagram
    participant LLM as LLM / Copilot
    participant ME as materialize-file.ts
    participant UFR as upload-file-reader.ts
    participant NS as normalizeVfsSegment
    participant DB as Database (workspaceFiles)
    participant VFS as Workspace VFS

    LLM->>ME: executeMaterializeFile({ fileName })
    ME->>DB: SELECT * WHERE chatId & context='mothership'
    DB-->>ME: rows[]
    ME->>NS: normalizeVfsSegment(fileName)
    loop each row
        ME->>NS: normalizeVfsSegment(row.originalName)
    end
    ME-->>ME: row = first match
    ME->>DB: UPDATE SET context='workspace' WHERE id=row.id
    DB-->>ME: updated { id, originalName }
    ME-->>LLM: { path: "files/${updated.originalName}" }

    LLM->>UFR: readChatUpload(filename, chatId)
    UFR->>DB: SELECT * WHERE chatId & context='mothership'
    DB-->>UFR: all uploads[]
    UFR->>NS: normalizeVfsSegment(filename)
    loop each upload
        UFR->>NS: normalizeVfsSegment(upload.name)
    end
    UFR-->>LLM: FileReadResult

    LLM->>VFS: glob("files/*")
    VFS->>VFS: micromatch.isMatch(path, pattern, VFS_GLOB_OPTIONS)
    VFS-->>LLM: matching paths[]
Loading

Comments Outside Diff (1)

  1. apps/sim/lib/copilot/orchestrator/tool-executor/materialize-file.ts, line 74-77 (link)

    P1 Stale fileName used in success path response

    After the findUploadRecord / normalizeVfsSegment fuzzy match is introduced, the file found in the DB may have a different originalName than the caller-supplied fileName (e.g. fileName may carry U+202F while row.originalName stores a regular ASCII space). The success output currently uses the raw fileName for both the human-readable message and the path field:

    message: `…at files/${fileName}…`,
    path: `files/${fileName}`,

    But the VFS will expose the file under files/${updated.originalName}, so a follow-up operation using the returned path would resolve to a path that doesn't exist in the VFS.

    Use the actual stored name instead:

Last reviewed commit: "add tests"

@icecrasher321
Copy link
Collaborator Author

bugbot run

@icecrasher321
Copy link
Collaborator Author

bugbot run

@icecrasher321
Copy link
Collaborator Author

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@icecrasher321 icecrasher321 merged commit 30b7192 into staging Mar 19, 2026
12 checks passed
@icecrasher321 icecrasher321 deleted the improvement/vfs-recog branch March 19, 2026 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant