Skip to content

feat(converter): render EMF+ images via embedded bitmaps (SD-2503)#3214

Merged
caio-pizzol merged 3 commits into
superdoc-dev:mainfrom
gpardhivvarma:feat/emf-plus-embedded-bitmap-sd-2503
May 12, 2026
Merged

feat(converter): render EMF+ images via embedded bitmaps (SD-2503)#3214
caio-pizzol merged 3 commits into
superdoc-dev:mainfrom
gpardhivvarma:feat/emf-plus-embedded-bitmap-sd-2503

Conversation

@gpardhivvarma
Copy link
Copy Markdown
Contributor

Summary

Renders EMF+ images that embed a compressed bitmap (PNG/JPEG/GIF) instead of falling back to the placeholder SVG. Most real-world EMF+ files generated by Office — cover slides, charts, illustrations — wrap a complete PNG or JPEG inside an EmfPlusObject(Image) record. Walking the EMF+ stream and pulling that bitmap out gives pixel-perfect rendering without implementing a GDI+ rasterizer.

Closes #3172.

What changed

packages/super-editor/src/editors/v1/core/super-converter/v3/handlers/wp/helpers/metafile-converter.js

  • New extractBitmapFromEmfPlus(buffer): walks EMR_COMMENT records carrying EMF+ payloads, scans inner EMF+ records for EmfPlusObject(Image) entries, reassembles continuation series via the TotalObjectSize prefix on the first chunk per MS-EMFPLUS § 2.3.5.1, and parses the resulting EmfPlusImage / EmfPlusBitmap to extract the encoded image bytes.
  • New parseEmfPlusImageObject(bytes): validates Image.Type=Bitmap and Bitmap.Type=Compressed, then returns the embedded PNG/JPEG/GIF as a data URI.
  • New detectCompressedImageFormat(bytes) helper using PNG/JPEG/GIF magic bytes.
  • Wired the extractor into convertEmfToSvg between the existing classic EMR_STRETCHDIBITS path and the EMF+ placeholder, so the placeholder remains the final fallback for pure-vector EMF+.
  • Pulled the literal 70 for EMR_COMMENT into a named constant shared with the existing isEmfPlus detector.
  • Updated the module-level docstring to reflect the layered strategy.

Spec correctness

Per MS-EMFPLUS § 2.3.5.1:

  • First chunk: ContinueBit=1, ObjectData = TotalObjectSize | first slice.
  • Middle chunks: ContinueBit=1, raw appended bytes.
  • Final chunk: ContinueBit=0 — the parser keys off this to flush.
  • Defensive fallback: if an off-spec encoder leaves ContinueBit=1 on the last record, the parser flushes early once TotalObjectSize bytes are accumulated.

What this does NOT cover

  • Pure-vector EMF+ (logos drawn entirely with GDI+ paths) — those still hit the placeholder. Implementing a full GDI+ renderer is out of scope.
  • Pixel-format (uncompressed) EmfPlusBitmap — also still hits the placeholder; rasterizing raw pixel buffers requires the same infrastructure.

Acceptance criteria

  • EMF+ images with embedded compressed bitmaps render actual document content instead of the placeholder SVG.
  • Existing classic EMF/WMF rendering behavior is preserved (293 tests in the helpers directory still pass).
  • DOCX round-trip export continues to preserve the original metafile asset (the import path stores originalSrc / originalExtension when a metafile is converted; not changed).
  • Targeted test coverage: 6 new tests using synthetic in-memory EMF+ buffers cover PNG/JPEG extraction, spec-compliant continuation reassembly, off-spec lenient continuation flush, fallback for non-Image objects, and rejection of pixel-format bitmaps.

Test plan

  • pnpm exec vitest run src/editors/v1/core/super-converter/v3/handlers/wp/helpers/metafile-converter.test.js — 14/14 pass.
  • pnpm exec vitest run src/editors/v1/core/super-converter/v3/handlers/wp/helpers/ — 293/293 pass (no regressions in adjacent helpers).
  • pnpm exec prettier --check on both modified files — clean.
  • Open the reproducer (m3 proposal.docx) in the editor and verify the cover image renders.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7900e7f5eb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@gpardhivvarma
Copy link
Copy Markdown
Contributor Author

@caio-pizzol please review this

@gpardhivvarma gpardhivvarma force-pushed the feat/emf-plus-embedded-bitmap-sd-2503 branch 2 times, most recently from d29aa7b to 6b26f2a Compare May 11, 2026 10:21
Copy link
Copy Markdown
Contributor

@caio-pizzol caio-pizzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @gpardhivvarma! PNG, JPEG, and GIF images all render correctly with this PR. but the file linked from #3172 stores its image as raw pixels, which this PR skips, so opening it still shows the placeholder.

needs work. one thing to flag plus two small nits, all inline.

gpardhivvarma added a commit to gpardhivvarma/superdoc that referenced this pull request May 11, 2026
Address review feedback on superdoc-dev#3214:

1. Raw-pixel EmfPlusBitmap support — the m3 proposal.docx reproducer
   from superdoc-dev#3172 stores its image as raw pixels (BitmapDataType=Pixel),
   which the prior extractor rejected. Now decode 24bppRGB / 32bppRGB /
   32bppARGB / 32bppPARGB pixel data, draw onto a canvas, and export as
   PNG (mirroring the tiff-converter pattern). Indexed formats and
   missing-canvas environments still fall back to the placeholder.

   EMF+ pixel formats store channels in DWORD-little-endian order
   (B,G,R[,A]); the converter swaps to canvas-native R,G,B,A. PARGB
   un-premultiplies alpha so straight-alpha consumers render correctly.
   Negative height = top-down rows; positive height = bottom-up
   (classic Windows DIB), reversed before write.

   A MAX_PIXEL_BITMAP_PIXELS guard bounds canvas allocation at 100M
   pixels (~400 MB RGBA), matching tiff-converter.

2. Slice reassembled chunks to TotalObjectSize so an off-spec writer
   that overshoots its declared size doesn't tack trailing bytes onto
   the data URI.

3. Tighten EMR_COMMENT recordSize check to >= 20 to match isEmfPlus's
   existing minimum.

Tests: 6 new pixel-bitmap tests using a vi.spyOn canvas mock cover the
core 32bppARGB path, bottom-up row flipping, 24bppRGB byte order,
32bppPARGB un-premultiplication, the no-canvas fallback, and the
indexed-format fallback. 20/20 in this file, 300/300 across the
helpers directory.
@gpardhivvarma
Copy link
Copy Markdown
Contributor Author

@caio-pizzol addressed the changes please take a look at it

EMF+ payloads use GDI+ drawing records that rtf.js doesn't implement, so
prior to this change every EMF+ image rendered as the "Unable to render
EMF+ image" placeholder.

Most real-world EMF+ files generated by Office (cover slides, charts,
illustrations) embed a complete PNG/JPEG inside an EmfPlusObject(Image)
record with BitmapDataType=Compressed. Walk the EMR_COMMENT records in
the EMF stream, parse the inner EMF+ records, reassemble continuation
series, and return the embedded image directly.

Per MS-EMFPLUS § 2.3.5.1 the EmfPlusObject header layout depends on the
ContinueBit:
  ContinueBit=1: Type(2) Flags(2) Size(4) TotalObjectSize(4) DataSize(4) ObjectData
  ContinueBit=0: Type(2) Flags(2) Size(4)                    DataSize(4) ObjectData

TotalObjectSize is present on every continued record (not only the
first). The strict spec terminates a continued series with a
ContinueBit=0 record; the parser also flushes early once TotalObjectSize
bytes have been accumulated as a defense against off-spec encoders that
leave ContinueBit=1 on the final record.

Pure-vector and pixel-format EMF+ images still fall back to the
placeholder — a full GDI+ rasterizer is out of scope here.

Tests use synthetic in-memory EMF+ buffers and cover PNG/JPEG
extraction, spec-compliant 2-record and 3-record continuation
reassembly, off-spec early flush, the non-Image fallback path, and
rejection of pixel-format bitmaps.

Closes superdoc-dev#3172
Address review feedback on superdoc-dev#3214:

1. Raw-pixel EmfPlusBitmap support — the m3 proposal.docx reproducer
   from superdoc-dev#3172 stores its image as raw pixels (BitmapDataType=Pixel),
   which the prior extractor rejected. Now decode 24bppRGB / 32bppRGB /
   32bppARGB / 32bppPARGB pixel data, draw onto a canvas, and export as
   PNG (mirroring the tiff-converter pattern). Indexed formats and
   missing-canvas environments still fall back to the placeholder.

   EMF+ pixel formats store channels in DWORD-little-endian order
   (B,G,R[,A]); the converter swaps to canvas-native R,G,B,A. PARGB
   un-premultiplies alpha so straight-alpha consumers render correctly.
   Negative height = top-down rows; positive height = bottom-up
   (classic Windows DIB), reversed before write.

   A MAX_PIXEL_BITMAP_PIXELS guard bounds canvas allocation at 100M
   pixels (~400 MB RGBA), matching tiff-converter.

2. Slice reassembled chunks to TotalObjectSize so an off-spec writer
   that overshoots its declared size doesn't tack trailing bytes onto
   the data URI.

3. Tighten EMR_COMMENT recordSize check to >= 20 to match isEmfPlus's
   existing minimum.

Tests: 6 new pixel-bitmap tests using a vi.spyOn canvas mock cover the
core 32bppARGB path, bottom-up row flipping, 24bppRGB byte order,
32bppPARGB un-premultiplication, the no-canvas fallback, and the
indexed-format fallback. 20/20 in this file, 300/300 across the
helpers directory.
@gpardhivvarma gpardhivvarma force-pushed the feat/emf-plus-embedded-bitmap-sd-2503 branch from 34006b0 to 0ca6497 Compare May 12, 2026 12:05
@gpardhivvarma gpardhivvarma requested a review from caio-pizzol May 12, 2026 12:06
…t sign

MS-EMFPLUS § 2.2.2.2 is silent on what Height/Stride sign means for
storage direction. The earlier reading "positive Height = bottom-up"
borrowed from the classic Windows DIB convention, but every GDI+
producer (which means every Office-generated EMF+) lays pixel memory
out top-down regardless of Height sign. Rendering the SD-2503
reproducer with the bottom-up assumption produced an upside-down
cover image.

Drop the row-reversal entirely; storage row 0 is the visual top in
all cases. Update the corresponding test and JSDoc to reflect the
empirical convention.
Copy link
Copy Markdown
Contributor

@caio-pizzol caio-pizzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @gpardhivvarma! thanks for addressing last round.

i pushed a follow-up commit (a6e13bf) fixing the row-order bug codex flagged - emf+ stores raw pixel rows top-down regardless of the height sign, not bottom-up like windows dib, so the flip was inverting customer documents. updated the unit test and added 4 spec fixtures to our visual corpus.

lgtm.

@caio-pizzol caio-pizzol enabled auto-merge (squash) May 12, 2026 20:13
@caio-pizzol caio-pizzol disabled auto-merge May 12, 2026 20:16
@caio-pizzol caio-pizzol merged commit 71bf8d7 into superdoc-dev:main May 12, 2026
63 checks passed
@caio-pizzol caio-pizzol self-assigned this May 12, 2026
@superdoc-bot
Copy link
Copy Markdown
Contributor

superdoc-bot Bot commented May 12, 2026

🎉 This PR is included in @superdoc-dev/mcp v0.3.0-next.80

The release is available on GitHub release

@superdoc-bot
Copy link
Copy Markdown
Contributor

superdoc-bot Bot commented May 12, 2026

🎉 This PR is included in @superdoc-dev/react v1.2.0-next.123

The release is available on GitHub release

@superdoc-bot
Copy link
Copy Markdown
Contributor

superdoc-bot Bot commented May 12, 2026

🎉 This PR is included in vscode-ext v2.3.0-next.125

@superdoc-bot
Copy link
Copy Markdown
Contributor

superdoc-bot Bot commented May 12, 2026

🎉 This PR is included in superdoc-cli v0.8.0-next.97

The release is available on GitHub release

@superdoc-bot
Copy link
Copy Markdown
Contributor

superdoc-bot Bot commented May 12, 2026

🎉 This PR is included in superdoc-sdk v1.8.0-next.79

@superdoc-bot
Copy link
Copy Markdown
Contributor

superdoc-bot Bot commented May 12, 2026

🎉 This PR is included in superdoc v1.30.0-next.79

The release is available on GitHub release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: render EMF+ metafile images

3 participants