Skip to content

Fix nativeRaster to be row-major so bitmaps serialize correctly#54

Merged
billdenney merged 2 commits into
mainfrom
fix-nativeraster-rowmajor-render
Jul 3, 2026
Merged

Fix nativeRaster to be row-major so bitmaps serialize correctly#54
billdenney merged 2 commits into
mainfrom
fix-nativeraster-rowmajor-render

Conversation

@billdenney

Copy link
Copy Markdown
Member

pdf_render_page(), pdf_image_bitmap(), pdf_image_rendered() and pdf_text_obj_rendered_bitmap() return a pdfium_bitmap that inherits from nativeRaster. The C++ writers filled the backing integer buffer column-major (the R matrix default), but a nativeRaster's buffer must be row-major: element k is the pixel at (row = k / width, col = k % width).

Any consumer that trusts the nativeRaster class and reads the buffer directly -- png::writePNG(), grid::grid.raster(), R's graphics engine -- therefore read every row from column-major storage and sheared it sideways ("stride-streak" garble). Downstream packages worked around this by re-decoding via as.array() or by reshaping inflated image streams by hand. The pixels PDFium produced were correct all along; only the R-side buffer layout was non-conformant.

Fix: emit the buffer row-major so the object is a genuine, conformant nativeRaster. png::writePNG(bmp) and grid::grid.raster(bmp) now render correctly with no reshape, matching png::readPNG() / magick::image_read() in dims, RGBA channel order and 0..1 range.

  • Add src/native_raster.h with shared row-major fill helpers (fill_bgra_rowmajor for the always-BGRA render path; fill_bitmap_rowmajor for the Gray/BGR/BGRx/BGRA image-object path), honoring FPDFBitmap_GetStride() per row.
  • render.cpp, images.cpp, tier3_extras.cpp: use the helpers (removes three near-duplicate copies of the format switch; the tier3 writer's prior "row-major" comment was wrong -- it used m(y, x), which is column-major).
  • as.array.pdfium_bitmap() / as.raster.pdfium_bitmap(): unpack the row-major buffer with matching byrow reshaping.
  • Tests: add nativeRaster-conformance regression tests for pdf_render_page() and pdf_image_bitmap() asserting writePNG(bmp) == writePNG(as.array(bmp)) byte-for-byte, plus a quadrant round-trip; update the hand-built column-major plot fixture to build a conformant row-major nativeRaster.

Verified: page 13 of a corpus scan (previously OCR'd to nothing through png::writePNG(bmp)) now OCRs its value axis correctly and is pixel-equal to the as.array() path and to magick/gs; full test suite passes (0 failures, 3021 assertions); a normal text page and a vector page show no regression.

billdenney and others added 2 commits July 3, 2026 14:56
pdf_render_page(), pdf_image_bitmap(), pdf_image_rendered() and
pdf_text_obj_rendered_bitmap() return a pdfium_bitmap that inherits from
`nativeRaster`. The C++ writers filled the backing integer buffer
column-major (the R matrix default), but a nativeRaster's buffer must be
row-major: element k is the pixel at (row = k / width, col = k % width).

Any consumer that trusts the `nativeRaster` class and reads the buffer
directly -- png::writePNG(), grid::grid.raster(), R's graphics engine --
therefore read every row from column-major storage and sheared it
sideways ("stride-streak" garble). Downstream packages worked around
this by re-decoding via as.array() or by reshaping inflated image
streams by hand. The pixels PDFium produced were correct all along; only
the R-side buffer layout was non-conformant.

Fix: emit the buffer row-major so the object is a genuine, conformant
nativeRaster. png::writePNG(bmp) and grid::grid.raster(bmp) now render
correctly with no reshape, matching png::readPNG() / magick::image_read()
in dims, RGBA channel order and 0..1 range.

- Add src/native_raster.h with shared row-major fill helpers
  (fill_bgra_rowmajor for the always-BGRA render path;
  fill_bitmap_rowmajor for the Gray/BGR/BGRx/BGRA image-object path),
  honoring FPDFBitmap_GetStride() per row.
- render.cpp, images.cpp, tier3_extras.cpp: use the helpers (removes
  three near-duplicate copies of the format switch; the tier3 writer's
  prior "row-major" comment was wrong -- it used m(y, x), which is
  column-major).
- as.array.pdfium_bitmap() / as.raster.pdfium_bitmap(): unpack the
  row-major buffer with matching byrow reshaping.
- Tests: add nativeRaster-conformance regression tests for
  pdf_render_page() and pdf_image_bitmap() asserting
  writePNG(bmp) == writePNG(as.array(bmp)) byte-for-byte, plus a
  quadrant round-trip; update the hand-built column-major plot fixture
  to build a conformant row-major nativeRaster.

Verified: page 13 of a corpus scan (previously OCR'd to nothing through
png::writePNG(bmp)) now OCRs its value axis correctly and is pixel-equal
to the as.array() path and to magick/gs; full test suite passes
(0 failures, 3021 assertions); a normal text page and a vector page show
no regression.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
R CMD check passed on every platform, but three checks were red:

- coverage / test-coverage: dropping bitmap_to_native_raster from
  images.cpp orphaned a `// # nocov end` (its `# nocov start` went with
  the removed block), so covr's parse_exclusions aborted with
  "7 range starts but 8 range ends". Remove the stray marker (images.cpp
  nocov is 7/7 again); the R-coverage gate then computes and passes at
  100% (verified locally with the workflow's exact R-narrowing logic).
- lint: the new nativeRaster-conformance test description was 104 chars,
  over the 100-char line_length_linter limit (CI sets
  LINTR_ERROR_ON_LINT=true). Shorten it.

Comment/label-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@billdenney billdenney merged commit 155a9fc into main Jul 3, 2026
13 checks passed
@billdenney billdenney deleted the fix-nativeraster-rowmajor-render branch July 3, 2026 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant