
v0.3.0

@jsvine released this 07 Mar 01:10

A ton of improvements and new features:

  • Shifts to a lazy-loading paradigm, so that you don't have to process an entire PDF just to access one page.
  • Strips out pandas requirement and usage.
    • Results in a roughly 3x speedup for within_bbox and similar methods, thanks to short-circuiting AND operators.
  • Moves from floats to Decimals to improve accuracy of equality comparisons.
  • Moves to a more modular architecture, adding Container, Page, and CroppedPage classes.
  • Adds Page.crop(...).
  • Adds Page.extract_table(...) for Tabula-like functionality.
  • Adds PDF.metadata property.
  • Adds derived properties Container.rect_edges and Container.edges, which decompose each rectangle into its constituent lines.
  • Renames collate_chars(...) to get_text(...) (while retaining a reference to the former).
  • Enriches the command-line tool's JSON output to include PDF metadata and page dimensions. [https://github.com//issues/3]
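To illustrate the lazy-loading idea, here is a minimal sketch of a page sequence that parses each page only on first access and caches the result. LazyPages and fake_parse are hypothetical names chosen for illustration; they are not pdfplumber's actual classes or functions.

```python
from typing import Callable, Dict, List


class LazyPages:
    """Hypothetical sketch: parse a page only the first time it is requested."""

    def __init__(self, page_count: int, parse_page: Callable[[int], str]):
        self._count = page_count
        self._parse = parse_page
        self._cache: Dict[int, str] = {}

    def __len__(self) -> int:
        return self._count

    def __getitem__(self, i: int) -> str:
        if not 0 <= i < self._count:
            raise IndexError(i)
        if i not in self._cache:
            # The expensive work happens here, on demand, once per page.
            self._cache[i] = self._parse(i)
        return self._cache[i]


parsed: List[int] = []  # records which pages were actually parsed


def fake_parse(i: int) -> str:
    """Hypothetical stand-in for real page parsing."""
    parsed.append(i)
    return f"page {i}"


pages = LazyPages(100, fake_parse)
print(pages[3])  # page 3
print(parsed)    # [3]
```

Accessing pages[3] triggers exactly one parse; the other 99 pages are never touched.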
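The within_bbox speedup can be illustrated with a hypothetical sketch over plain dicts. It assumes (a reading of the note, not something these release notes confirm) that the short-circuiting comes from chaining Python's and, which stops evaluating an object's comparisons as soon as one fails, whereas pandas' elementwise & first builds a full boolean Series for every comparison. The (x0, top, x1, bottom) bbox order and the dict keys are also assumptions.

```python
from typing import Dict, Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # assumed order: (x0, top, x1, bottom)


def within_bbox(objs: Iterable[Dict[str, float]], bbox: BBox) -> List[Dict[str, float]]:
    """Hypothetical sketch: keep objects lying entirely inside bbox."""
    x0, top, x1, bottom = bbox
    # `and` short-circuits: once one comparison fails for an object,
    # the remaining comparisons for that object are skipped.
    return [
        obj for obj in objs
        if obj["x0"] >= x0
        and obj["x1"] <= x1
        and obj["top"] >= top
        and obj["bottom"] <= bottom
    ]


chars = [
    {"x0": 10, "x1": 15, "top": 10, "bottom": 20},
    {"x0": 200, "x1": 205, "top": 10, "bottom": 20},
]
print(len(within_bbox(chars, (0, 0, 100, 100))))  # 1
```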
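The move from floats to Decimals matters because a binary float cannot represent most decimal fractions exactly, so arithmetic on coordinates can drift and break equality checks. This standard-library example assumes values are built from their decimal string form; Decimal(0.1) constructed from a float would inherit the float's representation error.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Decimals built from strings hold the exact decimal values.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```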
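Decomposing a rectangle into its constituent lines can be sketched as follows. Rect, Edge, and rect_edges are hypothetical names; the top-down y axis (top < bottom), the field names, and the "h"/"v" orientation labels are assumptions, not pdfplumber's data format.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    x0: float
    top: float
    x1: float
    bottom: float


class Edge(NamedTuple):
    orientation: str  # "h" for horizontal, "v" for vertical
    x0: float
    top: float
    x1: float
    bottom: float


def rect_edges(rect: Rect) -> List[Edge]:
    """Hypothetical sketch: split a rectangle into its four sides."""
    return [
        Edge("h", rect.x0, rect.top, rect.x1, rect.top),        # top side
        Edge("h", rect.x0, rect.bottom, rect.x1, rect.bottom),  # bottom side
        Edge("v", rect.x0, rect.top, rect.x0, rect.bottom),     # left side
        Edge("v", rect.x1, rect.top, rect.x1, rect.bottom),     # right side
    ]


edges = rect_edges(Rect(x0=10, top=20, x1=110, bottom=70))
print(len(edges))  # 4
```

Each horizontal edge has zero height and each vertical edge zero width, which makes the sides easy to compare against ordinary line objects.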
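Renaming a function while retaining a reference to the former name can be as simple as binding a second name to the same function object. This is a hypothetical sketch: this get_text only concatenates characters in order, whereas real text extraction also has to group characters into lines, and whether the original alias lives at module level or on a class is not stated here.

```python
from typing import Dict, List


def get_text(chars: List[Dict[str, str]]) -> str:
    """Hypothetical sketch: join each char's text in order."""
    return "".join(char["text"] for char in chars)


# Retain the former name as a reference to the new function,
# so code that still calls collate_chars(...) keeps working.
collate_chars = get_text

chars = [{"text": "h"}, {"text": "i"}]
print(get_text(chars), collate_chars(chars))  # hi hi
```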