Use pypdfium2's new range-based text extractor #5

Merged 1 commit on Oct 11, 2022

Commits on Oct 7, 2022

  1. Use pypdfium2's new range-based text extractor

    get_text() was boundary-based, which is not well suited to simply extracting all text from a page.
    I believe the new get_text_range() function might both yield better results and perform better.
    
    This can be merged once pypdfium2 3.3 is released.
    mara004 committed Oct 7, 2022
    8b194a0
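
To illustrate the distinction the commit message draws, here is a minimal toy sketch of boundary-based versus range-based extraction. It does not use pypdfium2: the `Char` class, both helper functions, their parameters, and the sample coordinates are all hypothetical stand-ins chosen for illustration, and the real library's signatures may differ.

```python
from dataclasses import dataclass


@dataclass
class Char:
    """Hypothetical stand-in for one character on a text page."""
    text: str
    x: float
    y: float


def get_text_bounded(chars, left, bottom, right, top):
    """Boundary-based (toy): keep characters positioned inside the rectangle."""
    return "".join(
        c.text for c in chars
        if left <= c.x <= right and bottom <= c.y <= top
    )


def get_text_range(chars, index=0, count=-1):
    """Range-based (toy): take `count` characters from `index`; -1 means all remaining."""
    end = len(chars) if count < 0 else index + count
    return "".join(c.text for c in chars[index:end])


# Assumed sample data: "H" and "i" lie inside a 100x100 page box,
# while "!" is positioned just outside it.
page_chars = [Char("H", 10, 50), Char("i", 20, 50), Char("!", 105, 50)]

bounded = get_text_bounded(page_chars, 0, 0, 100, 100)  # depends on the rectangle
ranged = get_text_range(page_chars)                     # depends only on char indices
```

In this toy model, the boundary-based call needs a correct rectangle and silently drops anything outside it, whereas the range-based call returns every character without any geometry. That is one plausible reason a range-based extractor suits the "all text of a page" use case better.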