Skip to content

fix(files): streaming preview invariant + OOXML style extraction#4335

Merged
waleedlatif1 merged 9 commits intostagingfrom
fix/docxs
Apr 29, 2026
Merged

fix(files): streaming preview invariant + OOXML style extraction#4335
waleedlatif1 merged 9 commits intostagingfrom
fix/docxs

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

@waleedlatif1 waleedlatif1 commented Apr 29, 2026

Summary

Two bugs fixed and quality improvements across the binary document pipeline (.docx, .pptx, .pdf).

1. Streaming preview no longer flashes transient JS errors

Root cause: While Mothership streams partial JavaScript that compiles into a binary document, the preview pane would briefly show "Failed to preview document — Unexpected token '}'" mid-stream. The model hasn't finished writing balanced JS yet; the user just wants the preview to hold its skeleton or last good render.

Fix: Replaced a fragile string-allowlist suppression approach with a structural invariant:

  • streamingContent !== undefined → the file is mid-write → any render error is transient → never surface it
  • Streaming useEffect catch blocks in docx-preview, pptx-preview, and pdf-viewer (IframePreview) now only log at info level, never call setRenderError
  • error display is gated: streamingContent !== undefined ? null : resolvePreviewError(fetchError, renderError)
  • shouldSuppressStreamingDocumentError allowlist deleted entirely

2. Style extraction for existing binary documents

New: GET /api/workspaces/[id]/files/[fileId]/style — extracts a compact JSON style summary (theme colors, font pair, named styles) from uploaded .docx or .pptx files by parsing their OOXML theme XML via JSZip.

Also exposed as the VFS path files/by-id/{id}/style so the file agent can call read("files/by-id/{id}/style") before editing an uploaded document to match its visual identity.

3. Compile check for generated binary documents

New: GET /api/workspaces/[id]/files/[fileId]/compiled-check — reads the saved JavaScript source of a .docx/.pptx/.pdf file, compiles it through the shared isolated-VM sandbox, and returns { ok: true } or { ok: false, error, errorName }.

Also exposed as the VFS path files/by-id/{id}/compiled-check so the file agent can self-verify after each edit_content call before reporting the task as done. Catches user code errors (syntax errors, undefined references) without surfacing them to the user.

4. Component consistency fixes

  • Wrapped PptxPreview in React.memo (matching DocxPreview and IframePreview)
  • Fixed invalid --text-body CSS variable to --text-primary in PreviewError, MermaidDiagram, and UnsupportedPreview
  • Added key={file.id} to PptxPreview mount for correct remounting on file switch

Files Changed

  • apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/docx-preview.tsx — structural streaming gate
  • apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/pptx-preview.tsx — structural streaming gate + memo wrap
  • apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/file-viewer.tsx — structural gate for PDF, token fix, key prop
  • apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/preview-panel.tsx — token fix
  • apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/preview-shared.tsx — deleted dead shouldSuppressStreamingDocumentError
  • apps/sim/app/api/workspaces/[id]/files/[fileId]/style/route.ts — new style extraction endpoint
  • apps/sim/app/api/workspaces/[id]/files/[fileId]/compiled-check/route.ts — new compile-check endpoint
  • apps/sim/lib/copilot/vfs/workspace-vfs.ts — VFS handlers for both new paths
  • apps/sim/lib/copilot/vfs/document-style.ts — shared OOXML style extraction utility

Testing

Tested manually:

  • Stream a .docx/.pptx — preview shows skeleton during stream, never flashes an error
  • Once stream completes — final render appears
  • Open a corrupted file (not Mothership-generated) — static error path still surfaces via PreviewError
  • GET .../style on an uploaded .docx returns theme colors and fonts
  • GET .../compiled-check on a valid JS-source .docx returns { ok: true }
  • GET .../compiled-check on a broken JS-source file returns { ok: false, error: "..." }

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 29, 2026 3:48pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 29, 2026

PR Summary

Medium Risk
Introduces new authenticated file-processing endpoints and VFS dynamic reads that download files and invoke sandbox compilation/ZIP parsing, so correctness and resource limits (size, time, sandbox isolation) matter despite being scoped to workspace members.

Overview
Prevents transient streaming preview errors from flashing in the UI for .docx, .pptx, and .pdf by treating any failure during streamingContent renders as suppressed (no PreviewError state), keeping skeleton/last-good output instead and removing the old string-based suppression helper.

Adds two new workspace file APIs and matching VFS dynamic reads: GET .../style extracts OOXML theme/font (and selected DOCX named styles) from uploaded/compiled .docx/.pptx binaries via new extractDocumentStyle, and GET .../compiled-check compiles saved JS-source .docx/.pptx/.pdf files in the sandbox and returns { ok, error?, errorName? } with size limits enforced.

Also includes small viewer consistency tweaks (memoizing PptxPreview, ensuring remount on file switch via key, and fixing --text-body to --text-primary in several preview error/empty states).

Reviewed by Cursor Bugbot for commit 11c6d7d. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR fixes streaming preview flicker by replacing a string-allowlist suppression approach with a structural invariant (streamingContent !== undefined → suppress all render errors), adds two new endpoints and VFS paths for OOXML style extraction and JS-source compile checking, and applies consistency fixes across preview components.

  • The style/route.ts endpoint loads the full file buffer with no size guard, unlike the sibling compiled-check endpoint which enforces a 1 MB limit — a large OOXML file could exhaust memory or time out the handler.
  • The VFS compiled-check handler omits the ownerKey option passed to runSandboxTask, while the REST endpoint includes it. Sandbox calls initiated by the file agent therefore run outside any per-user quota context.

Confidence Score: 4/5

Safe to merge; the two P2 gaps (missing size guard in style route, missing ownerKey in VFS sandbox call) are low-severity and do not affect correctness on the happy path.

Both findings are P2 — no incorrect data, no security boundary violation, no broken user-facing behavior. The streaming preview fix is structurally sound and the new API surface is well-guarded with auth and membership checks.

apps/sim/app/api/workspaces/[id]/files/[fileId]/style/route.ts (missing size guard) and apps/sim/lib/copilot/vfs/workspace-vfs.ts (missing ownerKey in sandbox call)

Important Files Changed

Filename Overview
apps/sim/app/api/workspaces/[id]/files/[fileId]/compiled-check/route.ts New endpoint: auth, membership check, extension guard, 1 MB size limit, and correct SandboxUserCodeError vs infra-error distinction all look correct.
apps/sim/app/api/workspaces/[id]/files/[fileId]/style/route.ts New endpoint: auth and membership checks are solid, but no size guard before downloadWorkspaceFile, unlike the sibling compiled-check endpoint.
apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/file-viewer.tsx IframePreview removes renderError state (only set from streaming catches), adds key prop to PptxPreview, and fixes CSS token from --text-body to --text-primary.
apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/pptx-preview.tsx Wrapped in React.memo, streaming gate applied, skeleton shown when streamingContent is active with no slides yet — consistent with DocxPreview pattern.
apps/sim/lib/copilot/vfs/document-style.ts New OOXML style extraction utility; ZIP magic-byte guard, JSZip dynamic import, regex-based XML parsing for theme colors/fonts/named styles — logic looks correct for typical OOXML structure.
apps/sim/lib/copilot/vfs/workspace-vfs.ts New VFS handlers for style and compiled-check paths; compiled-check VFS call omits ownerKey that the REST endpoint passes, creating an inconsistency in sandbox execution context.
apps/sim/lib/execution/constants.ts New BINARY_DOC_TASKS map centralises the extension-to-task-ID mapping; clean and well-typed.

Sequence Diagram

sequenceDiagram
    participant Agent as File Agent (VFS)
    participant REST as REST Client
    participant VFS as WorkspaceVFS
    participant StyleRoute as GET /style
    participant CheckRoute as GET /compiled-check
    participant JSZip as JSZip (OOXML)
    participant Sandbox as Isolated-VM Sandbox

    Note over Agent,Sandbox: Style extraction flow
    Agent->>VFS: read files/by-id/id/style
    VFS->>JSZip: extractDocumentStyle(buffer, ext)
    JSZip-->>VFS: DocumentStyleSummary
    VFS-->>Agent: JSON string

    REST->>StyleRoute: GET /api/workspaces/id/files/fileId/style
    StyleRoute->>JSZip: extractDocumentStyle(buffer, ext)
    JSZip-->>StyleRoute: DocumentStyleSummary
    StyleRoute-->>REST: 200 JSON

    Note over Agent,Sandbox: Compiled-check flow
    Agent->>VFS: read files/by-id/id/compiled-check
    VFS->>Sandbox: runSandboxTask(taskId, payload)
    alt SandboxUserCodeError
        Sandbox-->>VFS: ok false
    else ok
        Sandbox-->>VFS: ok true
    end
    VFS-->>Agent: JSON string

    REST->>CheckRoute: GET /api/workspaces/id/files/fileId/compiled-check
    CheckRoute->>Sandbox: runSandboxTask(taskId, payload, ownerKey)
    alt SandboxUserCodeError
        Sandbox-->>CheckRoute: ok false
    else ok
        Sandbox-->>CheckRoute: ok true
    end
    CheckRoute-->>REST: 200 JSON
Loading

Reviews (4): Last reviewed commit: "fix(files): deduplicate BINARY_DOC_TASKS..." | Re-trigger Greptile

New GET /api/workspaces/[id]/files/[fileId]/style endpoint + VFS read
path files/by-id/{id}/style that returns a compact JSON style summary
from an uploaded binary .docx or .pptx: theme name, 12-slot color
palette, major/minor font pair, and key named styles (Normal, H1-H3,
Title). Logic lives in a shared lib/copilot/vfs/document-style.ts so
both the REST API and the VFS read handler reuse the same parsing code.
… guard

Explicit 'docx' | 'pptx' type annotation after the extension guard in
both route.ts and workspace-vfs.ts so TypeScript sees the narrowed type
rather than string. Only set summary.styles when the parsed array is
non-empty so the JSON response doesn't include "styles": []. Remove
redundant inline WHAT-comments from parseColorSlot.
…ency

- Apply structural invariant to PDF streaming path: never surface errors
  while streamingContent is defined; only log at info level
- Remove redundant setRenderError(null) from DOCX streaming effect — the
  gate at the display layer already suppresses errors during streaming
- Wrap PptxPreview in memo for consistency with DocxPreview
- Add key={file.id} to PptxPreview mount site (was missing, DocxPreview
  had it) so the component resets when the viewed file changes
- Fix --text-body → --text-primary across PreviewError, UnsupportedPreview,
  and MermaidDiagram error label; --text-body is not a valid EMCN token
@waleedlatif1 waleedlatif1 changed the title fix(files): suppress transient streaming preview errors for docx and pptx fix(files): streaming preview invariant + OOXML style extraction Apr 29, 2026
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

Thanks for the review @greptile-apps.

P2 — PPTX partial slide state: Acknowledged. The PPTX streaming path naturally preserves the last partial render — setSlides is called progressively via callbacks as each slide renders, so state is incrementally built up rather than cleared and rebuilt. Unlike DOCX which calls container.innerHTML = '' before re-rendering (requiring an explicit previousHtml restore on failure), PPTX never destructively clears slides. On error, whatever slides were rendered up to the failure point remain visible, which is the best possible outcome for a transient mid-stream compile error.

P2 — Indefinite skeleton during API unavailability: Intentional. If the streaming preview API is persistently unavailable, the skeleton holds until streamingContent becomes undefined (stream end), at which point the static reader takes over and surfaces a real error if the compiled file is also unavailable. Surfacing a mid-stream error while the model is still writing would be confusing and incorrect — the user just asked for a document, the model is generating it, and the preview is a bonus.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 29, 2026

Both explanations are clear and the reasoning holds.

PPTX partial slide state: Agreed — the incremental onSlide callback pattern is materially different from DOCX's destructive innerHTML = '' approach. Since slides are never cleared before a streaming re-render, there's no regression to restore from. The analogy to DOCX was the wrong frame. No snapshot needed.

Indefinite skeleton: The UX rationale makes sense — surfacing an error while the model is actively generating would be a false negative. One small ask: the showSkeleton logic line is non-obvious to future readers who might wonder why there's no timeout. A short inline comment like:

// Skeleton held for entire stream duration — mid-stream errors are transient by definition;
// static reader surfaces real errors after streamingContent becomes undefined.
const showSkeleton = !hasRenderedPreview && (streamingContent !== undefined || isLoading || rendering)

...would make the intent self-documenting without changing behavior. Up to you whether that's worth the diff noise.

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/lib/copilot/vfs/workspace-vfs.ts
Comment thread apps/sim/lib/copilot/vfs/workspace-vfs.ts Outdated
Comment thread apps/sim/lib/copilot/vfs/workspace-vfs.ts
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 11c6d7d. Configure here.

@waleedlatif1 waleedlatif1 merged commit 94f5411 into staging Apr 29, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/docxs branch April 29, 2026 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant