feat(converter): render EMF+ images via embedded bitmaps (SD-2503) by gpardhivvarma · Pull Request #3214 · superdoc-dev/superdoc

gpardhivvarma · 2026-05-09T07:45:28Z

Summary

Renders EMF+ images that embed a compressed bitmap (PNG/JPEG/GIF) instead of falling back to the placeholder SVG. Most real-world EMF+ files generated by Office — cover slides, charts, illustrations — wrap a complete PNG or JPEG inside an EmfPlusObject(Image) record. Walking the EMF+ stream and pulling that bitmap out gives pixel-perfect rendering without implementing a GDI+ rasterizer.

Closes #3172.

What changed

packages/super-editor/src/editors/v1/core/super-converter/v3/handlers/wp/helpers/metafile-converter.js

New extractBitmapFromEmfPlus(buffer): walks EMR_COMMENT records carrying EMF+ payloads, scans inner EMF+ records for EmfPlusObject(Image) entries, reassembles continuation series via the TotalObjectSize prefix on the first chunk per MS-EMFPLUS § 2.3.5.1, and parses the resulting EmfPlusImage / EmfPlusBitmap to extract the encoded image bytes.
New parseEmfPlusImageObject(bytes): validates Image.Type=Bitmap and Bitmap.Type=Compressed, then returns the embedded PNG/JPEG/GIF as a data URI.
New detectCompressedImageFormat(bytes) helper using PNG/JPEG/GIF magic bytes.
Wired the extractor into convertEmfToSvg between the existing classic EMR_STRETCHDIBITS path and the EMF+ placeholder, so the placeholder remains the final fallback for pure-vector EMF+.
Pulled the literal 70 for EMR_COMMENT into a named constant shared with the existing isEmfPlus detector.
Updated the module-level docstring to reflect the layered strategy.
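For readers unfamiliar with the format-sniffing step, here is a minimal sketch of magic-byte detection followed by data-URI encoding. The names `sniffImageMime` and `toDataUri` are hypothetical and are not taken from the diff; the PR's `detectCompressedImageFormat` may check more or fewer bytes. `btoa` is assumed to be available as a global (browsers and recent Node versions).

```javascript
// Hypothetical sketch, not the PR's code: identify PNG/JPEG/GIF by leading bytes.
const IMAGE_SIGNATURES = [
  { mime: 'image/png', sig: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', sig: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif', sig: [0x47, 0x49, 0x46, 0x38] }, // ASCII "GIF8"
];

// Returns the MIME type for a recognised signature, or null otherwise.
function sniffImageMime(bytes) {
  for (const { mime, sig } of IMAGE_SIGNATURES) {
    if (bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b)) {
      return mime;
    }
  }
  return null;
}

// Encodes the bytes as a base64 data URI, or returns null for unknown formats.
function toDataUri(bytes) {
  const mime = sniffImageMime(bytes);
  if (!mime) return null;
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return `data:${mime};base64,${btoa(binary)}`;
}
```

Returning `null` for unknown signatures lets a caller fall through to the next layer of the strategy (ultimately the placeholder) rather than emitting a data URI with a guessed MIME type.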

Spec correctness

Per MS-EMFPLUS § 2.3.5.1:

First chunk: ContinueBit=1, ObjectData = TotalObjectSize | first slice.
Middle chunks: ContinueBit=1, raw appended bytes.
Final chunk: ContinueBit=0 — the parser keys off this to flush.
Defensive fallback: if an off-spec encoder leaves ContinueBit=1 on the last record, the parser flushes early once TotalObjectSize bytes are accumulated.
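The flush rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the chunk shape `{ continueBit, totalObjectSize, data }` and the function name `reassembleObjects` are hypothetical, and the chunks are assumed to have been parsed out of the record stream already. Trimming to the declared size also covers writers that overshoot it.

```javascript
// Hypothetical sketch: join continuation chunks into complete objects.
// Each chunk is assumed to look like:
//   { continueBit: boolean, totalObjectSize?: number, data: Uint8Array }
function reassembleObjects(chunks) {
  const objects = [];
  let parts = [];
  let accumulated = 0;
  let expected = null; // declared TotalObjectSize for the series in progress

  const flush = () => {
    const joined = new Uint8Array(accumulated);
    let offset = 0;
    for (const part of parts) {
      joined.set(part, offset);
      offset += part.length;
    }
    // Trim any overshoot so trailing bytes never reach the consumer.
    objects.push(expected !== null ? joined.subarray(0, expected) : joined);
    parts = [];
    accumulated = 0;
    expected = null;
  };

  for (const chunk of chunks) {
    if (expected === null && chunk.continueBit && Number.isInteger(chunk.totalObjectSize)) {
      expected = chunk.totalObjectSize;
    }
    parts.push(chunk.data);
    accumulated += chunk.data.length;

    const seriesDone = !chunk.continueBit; // spec-compliant terminator
    const sizeReached = expected !== null && accumulated >= expected; // lenient flush
    if (seriesDone || sizeReached) flush();
  }
  // A series that never terminates and never reaches its size is dropped.
  return objects;
}
```

A single record with ContinueBit=0 passes through unchanged, so the same function handles both continued and non-continued objects.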

What this does NOT cover

Pure-vector EMF+ (logos drawn entirely with GDI+ paths) — those still hit the placeholder. Implementing a full GDI+ renderer is out of scope.
Pixel-format (uncompressed) EmfPlusBitmap — also still hits the placeholder; rasterizing raw pixel buffers requires the same infrastructure.

Acceptance criteria

EMF+ images with embedded compressed bitmaps render actual document content instead of the placeholder SVG.
Existing classic EMF/WMF rendering behavior is preserved (293 tests in the helpers directory still pass).
DOCX round-trip export continues to preserve the original metafile asset (the import path stores originalSrc / originalExtension when a metafile is converted; not changed).
Targeted test coverage: 6 new tests using synthetic in-memory EMF+ buffers cover PNG/JPEG extraction, spec-compliant continuation reassembly, off-spec lenient continuation flush, fallback for non-Image objects, and rejection of pixel-format bitmaps.

Test plan

pnpm exec vitest run src/editors/v1/core/super-converter/v3/handlers/wp/helpers/metafile-converter.test.js — 14/14 pass.
pnpm exec vitest run src/editors/v1/core/super-converter/v3/handlers/wp/helpers/ — 293/293 pass (no regressions in adjacent helpers).
pnpm exec prettier --check on both modified files — clean.
Open the reproducer (m3 proposal.docx) in the editor and verify the cover image renders.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7900e7f5eb

codecov-commenter · 2026-05-09T07:54:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

gpardhivvarma · 2026-05-09T08:10:03Z

@caio-pizzol please review this

caio-pizzol

hey @gpardhivvarma! PNG, JPEG, and GIF images all render correctly with this PR. but the file linked from #3172 stores its image as raw pixels, which this PR skips, so opening it still shows the placeholder.

needs work. one thing to flag plus two small nits, all inline.

Address review feedback on superdoc-dev#3214:

1. Raw-pixel EmfPlusBitmap support. The m3 proposal.docx reproducer from superdoc-dev#3172 stores its image as raw pixels (BitmapDataType=Pixel), which the prior extractor rejected. Now decode 24bppRGB / 32bppRGB / 32bppARGB / 32bppPARGB pixel data, draw onto a canvas, and export as PNG (mirroring the tiff-converter pattern). Indexed formats and missing-canvas environments still fall back to the placeholder.

   EMF+ pixel formats store channels in DWORD-little-endian order (B,G,R[,A]); the converter swaps to canvas-native R,G,B,A. PARGB un-premultiplies alpha so straight-alpha consumers render correctly. Negative height = top-down rows; positive height = bottom-up (classic Windows DIB), reversed before write. A MAX_PIXEL_BITMAP_PIXELS guard bounds canvas allocation at 100M pixels (~400 MB RGBA), matching tiff-converter.

2. Slice reassembled chunks to TotalObjectSize so an off-spec writer that overshoots its declared size doesn't tack trailing bytes onto the data URI.

3. Tighten EMR_COMMENT recordSize check to >= 20 to match isEmfPlus's existing minimum.

Tests: 6 new pixel-bitmap tests using a vi.spyOn canvas mock cover the core 32bppARGB path, bottom-up row flipping, 24bppRGB byte order, 32bppPARGB un-premultiplication, the no-canvas fallback, and the indexed-format fallback. 20/20 in this file, 300/300 across the helpers directory.
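The channel swap and alpha handling described in this commit can be sketched as below. This is a hedged illustration, not the code from the diff: `bgraToRgba`, its option names, and `MAX_PIXELS` are hypothetical stand-ins, and the rounding used when un-premultiplying is an assumption. Rows are read in storage order (row 0 first), which is the convention the later follow-up commit in this PR settles on.

```javascript
// Hypothetical sketch: convert EMF+ style B,G,R[,A] pixel rows to canvas R,G,B,A.
const MAX_PIXELS = 100_000_000; // stand-in for the PR's MAX_PIXEL_BITMAP_PIXELS guard

function bgraToRgba(src, { width, height, stride, bytesPerPixel, hasAlpha, premultiplied }) {
  if (width <= 0 || height <= 0 || width * height > MAX_PIXELS) return null;
  // The last row only needs width * bytesPerPixel bytes, not a full stride.
  if (src.length < (height - 1) * stride + width * bytesPerPixel) return null;

  const out = new Uint8ClampedArray(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const s = y * stride + x * bytesPerPixel;
      const d = (y * width + x) * 4;
      let b = src[s];
      let g = src[s + 1];
      let r = src[s + 2];
      // 24bppRGB and 32bppRGB carry no usable alpha, so treat them as opaque.
      const a = hasAlpha ? src[s + 3] : 255;
      if (premultiplied && a > 0 && a < 255) {
        // Undo premultiplication so straight-alpha consumers see the true colour.
        r = Math.min(255, Math.round((r * 255) / a));
        g = Math.min(255, Math.round((g * 255) / a));
        b = Math.min(255, Math.round((b * 255) / a));
      }
      out[d] = r;
      out[d + 1] = g;
      out[d + 2] = b;
      out[d + 3] = a;
    }
  }
  return out;
}
```

In a browser, the returned array could be wrapped with `new ImageData(rgba, width, height)` and drawn via `putImageData` before exporting a PNG; that step is omitted here because it depends on a canvas being available.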

gpardhivvarma · 2026-05-12T12:03:52Z

@caio-pizzol addressed the changes please take a look at it

EMF+ payloads use GDI+ drawing records that rtf.js doesn't implement, so prior to this change every EMF+ image rendered as the "Unable to render EMF+ image" placeholder. Most real-world EMF+ files generated by Office (cover slides, charts, illustrations) embed a complete PNG/JPEG inside an EmfPlusObject(Image) record with BitmapDataType=Compressed.

Walk the EMR_COMMENT records in the EMF stream, parse the inner EMF+ records, reassemble continuation series, and return the embedded image directly.

Per MS-EMFPLUS § 2.3.5.1 the EmfPlusObject header layout depends on the ContinueBit:

  ContinueBit=1: Type(2) Flags(2) Size(4) TotalObjectSize(4) DataSize(4) ObjectData
  ContinueBit=0: Type(2) Flags(2) Size(4) DataSize(4) ObjectData

TotalObjectSize is present on every continued record (not only the first). The strict spec terminates a continued series with a ContinueBit=0 record; the parser also flushes early once TotalObjectSize bytes have been accumulated as a defense against off-spec encoders that leave ContinueBit=1 on the final record.

Pure-vector and pixel-format EMF+ images still fall back to the placeholder; a full GDI+ rasterizer is out of scope here.

Tests use synthetic in-memory EMF+ buffers and cover PNG/JPEG extraction, spec-compliant 2-record and 3-record continuation reassembly, off-spec early flush, the non-Image fallback path, and rejection of pixel-format bitmaps.

Closes superdoc-dev#3172
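As a rough illustration of the walk this commit message describes, the sketch below scans EMF records for EMR_COMMENT entries carrying the "EMF+" signature and reads EmfPlusObject headers using the field order quoted above. Everything here is an assumption for illustration: the function names are hypothetical, the numeric constants (record type 0x4008, Image object type 5, flag bit positions) are written from memory of MS-EMF / MS-EMFPLUS, and the offsets follow this commit message's layout, which the PR description words slightly differently (TotalObjectSize as a prefix of ObjectData). Verify against the spec and real files before relying on it.

```javascript
// Hypothetical sketch of the record walk; not the PR's implementation.
const EMR_COMMENT = 70;
const EMFPLUS_SIGNATURE = 0x2b464d45; // bytes "EMF+" read as a little-endian u32
const EMFPLUS_OBJECT = 0x4008; // assumed EmfPlusObject record type
const OBJECT_TYPE_IMAGE = 5; // assumed ObjectType value for Image

// Reads EMF+ records in [start, end) and appends EmfPlusObject chunks.
function readEmfPlusRecords(view, start, end, chunks) {
  let p = start;
  while (p + 12 <= end) {
    const type = view.getUint16(p, true);
    const flags = view.getUint16(p + 2, true);
    const size = view.getUint32(p + 4, true);
    if (size < 12 || p + size > end) break; // malformed record, stop scanning
    const continueBit = (flags & 0x8000) !== 0;
    const headerSize = continueBit ? 16 : 12; // extra TotalObjectSize field
    if (type === EMFPLUS_OBJECT && size >= headerSize) {
      const dataSize = view.getUint32(p + headerSize - 4, true);
      const dataStart = p + headerSize;
      const dataEnd = Math.min(dataStart + dataSize, p + size);
      chunks.push({
        objectType: (flags >> 8) & 0x7f,
        objectId: flags & 0xff,
        continueBit,
        totalObjectSize: continueBit ? view.getUint32(p + 8, true) : undefined,
        data: new Uint8Array(view.buffer, view.byteOffset + dataStart, dataEnd - dataStart),
      });
    }
    p += size;
  }
}

// Walks top-level EMF records and collects chunks from EMF+ comment payloads.
function collectEmfPlusObjectChunks(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const chunks = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const type = view.getUint32(offset, true);
    const size = view.getUint32(offset + 4, true);
    if (size < 8 || offset + size > view.byteLength) break;
    if (type === EMR_COMMENT && size >= 20 &&
        view.getUint32(offset + 12, true) === EMFPLUS_SIGNATURE) {
      const dataSize = view.getUint32(offset + 8, true);
      const end = Math.min(offset + 12 + dataSize, offset + size);
      readEmfPlusRecords(view, offset + 16, end, chunks);
    }
    offset += size;
  }
  return chunks;
}
```

A caller would typically keep only chunks whose `objectType` equals `OBJECT_TYPE_IMAGE`, group them by `objectId`, and hand each series to a reassembly step before inspecting the image header.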

…t sign

MS-EMFPLUS § 2.2.2.2 is silent on what Height/Stride sign means for storage direction. The earlier reading "positive Height = bottom-up" borrowed from the classic Windows DIB convention, but every GDI+ producer (which means every Office-generated EMF+) lays pixel memory out top-down regardless of Height sign. Rendering the SD-2503 reproducer with the bottom-up assumption produced an upside-down cover image.

Drop the row-reversal entirely; storage row 0 is the visual top in all cases. Update the corresponding test and JSDoc to reflect the empirical convention.

caio-pizzol

hey @gpardhivvarma! thanks for addressing last round.

i pushed a follow-up commit (a6e13bf) fixing the row-order bug codex flagged - emf+ stores raw pixel rows top-down regardless of the height sign, not bottom-up like windows dib, so the flip was inverting customer documents. updated the unit test and added 4 spec fixtures to our visual corpus.

lgtm.

superdoc-bot · 2026-05-12T20:18:48Z

🎉 This PR is included in @superdoc-dev/mcp v0.3.0-next.80

The release is available on GitHub release

superdoc-bot · 2026-05-12T20:18:55Z

🎉 This PR is included in @superdoc-dev/react v1.2.0-next.123

The release is available on GitHub release

superdoc-bot · 2026-05-12T20:19:07Z

🎉 This PR is included in vscode-ext v2.3.0-next.125

superdoc-bot · 2026-05-12T20:20:40Z

🎉 This PR is included in superdoc-cli v0.8.0-next.97

The release is available on GitHub release

superdoc-bot · 2026-05-12T20:22:48Z

🎉 This PR is included in superdoc-sdk v1.8.0-next.79

superdoc-bot · 2026-05-12T20:23:49Z

🎉 This PR is included in superdoc v1.30.0-next.79

The release is available on GitHub release

gpardhivvarma requested a review from a team as a code owner May 9, 2026 07:45

superdoc-bot Bot added review: thorough community labels May 9, 2026

chatgpt-codex-connector Bot reviewed May 9, 2026

Comment thread ...uper-editor/src/editors/v1/core/super-converter/v3/handlers/wp/helpers/metafile-converter.js Outdated

gpardhivvarma force-pushed the feat/emf-plus-embedded-bitmap-sd-2503 branch 2 times, most recently from d29aa7b to 6b26f2a Compare May 11, 2026 10:21

caio-pizzol reviewed May 11, 2026

gpardhivvarma added 2 commits May 12, 2026 17:35

gpardhivvarma force-pushed the feat/emf-plus-embedded-bitmap-sd-2503 branch from 34006b0 to 0ca6497 Compare May 12, 2026 12:05

gpardhivvarma requested a review from caio-pizzol May 12, 2026 12:06

caio-pizzol approved these changes May 12, 2026

caio-pizzol enabled auto-merge (squash) May 12, 2026 20:13

caio-pizzol disabled auto-merge May 12, 2026 20:16

caio-pizzol merged commit 71bf8d7 into superdoc-dev:main May 12, 2026
63 checks passed

caio-pizzol self-assigned this May 12, 2026

superdoc-bot Bot added the released on @next label May 12, 2026
