v0.28.2

@manykarim manykarim released this 29 Oct 14:41
· 33 commits to main since this release

Set sort=True for PDF text retrieval.
This ensures that text is retrieved in natural Western reading order (starting from the top-left).

page_obj.pdf_text_data = page.get_text("text", sort=True)
page_obj.pdf_text_dict = page.get_text("dict", sort=True)
page_obj.pdf_text_words = page.get_text("words", sort=True)
page_obj.pdf_text_blocks = page.get_text("blocks", sort=True)
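
To illustrate what "reading order" means for the calls above, here is a minimal sketch in plain Python. It does not call PyMuPDF. It assumes word tuples in the layout that page.get_text("words") returns, (x0, y0, x1, y1, text, block_no, line_no, word_no), and it assumes that ordering by bottom edge and then left edge is a fair approximation of what sort=True does. The sample coordinates and the helper name sort_reading_order are hypothetical.

```python
# Hypothetical word tuples, deliberately out of reading order.
# Layout assumed: (x0, y0, x1, y1, text, block_no, line_no, word_no)
words = [
    (50.0, 100.0, 90.0, 112.0, "second", 1, 0, 0),
    (100.0, 100.0, 130.0, 112.0, "line", 1, 0, 1),
    (50.0, 50.0, 80.0, 62.0, "First", 0, 0, 0),
    (90.0, 50.0, 120.0, 62.0, "line,", 0, 0, 1),
]


def sort_reading_order(items):
    """Return items ordered top-to-bottom, then left-to-right.

    Assumption: the key is (bottom edge y1, left edge x0).
    """
    return sorted(items, key=lambda w: (w[3], w[0]))


sorted_text = " ".join(w[4] for w in sort_reading_order(words))
print(sorted_text)  # First line, second line
```

Without sorting, the same words would be joined in extraction order ("second line First line,"), which is the kind of mismatch this release is meant to avoid when comparing text.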