Release v0.6.9: feat: extract bounding boxes from PyMuPDF using block-level dict API · Mihailorama/docfold

v0.6.9
10504b1
Mihailorama tagged this 04 Mar 10:23

PyMuPDFEngine now uses page.get_text("dict") to extract block-level
bounding boxes (type, coordinates, page number, text content) for
every processed PDF. This enables bbox overlay in the UI for all
digital PDFs, not just OCR-processed ones via Marker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
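
The block-level extraction described above could look roughly like the
sketch below. It is an illustration only, not the actual PyMuPDFEngine
code: the function names blocks_from_page_dict and extract_blocks and the
output record layout are hypothetical. It relies on the documented shape
of PyMuPDF's page.get_text("dict") result, where each entry in "blocks"
carries a "type" (0 for text, 1 for image) and a "bbox", and text blocks
nest "lines" and "spans".

```python
from typing import Any


def blocks_from_page_dict(page_dict: dict[str, Any], page_number: int) -> list[dict[str, Any]]:
    """Flatten one page's get_text("dict") result into simple block records.

    Hypothetical helper: the record keys (type, bbox, page, text) mirror the
    fields named in the release note, not a confirmed docfold schema.
    """
    records = []
    for block in page_dict.get("blocks", []):
        is_text = block.get("type") == 0  # PyMuPDF: 0 = text block, 1 = image block
        text = ""
        if is_text:
            # A text block holds lines; each line holds spans that carry the text.
            text = "\n".join(
                "".join(span.get("text", "") for span in line.get("spans", []))
                for line in block.get("lines", [])
            )
        records.append(
            {
                "type": "text" if is_text else "image",
                "bbox": tuple(block["bbox"]),  # (x0, y0, x1, y1) in page coordinates
                "page": page_number,
                "text": text,
            }
        )
    return records


def extract_blocks(pdf_path: str) -> list[dict[str, Any]]:
    """Collect block records for every page of a PDF (assumes PyMuPDF is installed)."""
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    records = []
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            records.extend(blocks_from_page_dict(page.get_text("dict"), page_number))
    return records
```

Keeping the dict-flattening step separate from file handling means the
overlay data can be produced from any page dict, which also makes the
logic easy to check without a real PDF.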
