Use pypdfium2's new range-based text extractor #5

Merged

MartinThoma merged 1 commit into py-pdf:main from mara004:patch-1

Oct 11, 2022

Commits on Oct 7, 2022

Use pypdfium2's new range-based text extractor

get_text() is boundary-based, which makes it a poor fit for the use case of simply extracting all text of a page.
I believe the new get_text_range() function may both yield better results and perform better.
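
A minimal sketch of the switch described above, not the actual diff of this PR. The helper name `extract_page_text` is hypothetical, and the sketch assumes that `get_text_range()` can be called without arguments to cover the whole page and that the older `get_text()` can be called without bounds. It takes any text page object (for example, one returned by `page.get_textpage()`) and prefers the range-based extractor when it is available:

```python
def extract_page_text(textpage):
    """Return all text of a page from a pypdfium2-style text page object.

    Hypothetical helper: prefers the range-based extractor and falls back
    to the boundary-based one on versions that lack it.
    """
    # Assumption: get_text_range() with no arguments covers the whole page.
    range_extractor = getattr(textpage, "get_text_range", None)
    if callable(range_extractor):
        return range_extractor()
    # Assumption: get_text() with no bounds covers the whole page area.
    return textpage.get_text()
```

The fallback is only relevant if older pypdfium2 versions must remain supported; if the dependency is pinned to 3.3 or later, calling `get_text_range()` directly is simpler.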

This can be merged once pypdfium2 3.3 is released.

mara004 committed Oct 7, 2022