Version 0.2.0 has been released #317
JorjMcKie announced in Announcements
This version offers a significantly improved way of extracting data from documents, using the package PyMuPDF-Layout.
After installing that package and adding the line `import pymupdf.layout` to your script before importing pymupdf4llm, the layout feature is used automatically.
Please consider trying it out. Among the many improvements are:
- `.to_markdown()`: as before, but with some new parameters that allow page header and footer suppression. Other parameters are now obsolete or not yet adapted to the layout feature.
- `.to_text()`: output of plain text. Table content is shown using the package tabulate.
- `.to_json()`: output of the document's metadata and the selected pages in JSON format.
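The import order and the three output methods described above could be wired together roughly as follows. This is a minimal sketch, not an official recipe: the helper name `extract`, the `fmt` argument and the file name `input.pdf` are made up for illustration, and it assumes that `to_text()` and `to_json()` accept a file path as their first argument in the same way `to_markdown()` does.

```python
def extract(path, fmt="markdown"):
    """Convert a document with pymupdf4llm (hypothetical helper).

    fmt selects the output method: "markdown", "text" or "json".
    """
    if fmt not in ("markdown", "text", "json"):
        raise ValueError(f"unsupported output format: {fmt!r}")

    # Per the announcement, pymupdf.layout must be imported before
    # pymupdf4llm so that the layout feature is picked up.
    import pymupdf.layout  # noqa: F401
    import pymupdf4llm

    converters = {
        "markdown": pymupdf4llm.to_markdown,
        "text": pymupdf4llm.to_text,   # assumed to take a path like to_markdown
        "json": pymupdf4llm.to_json,   # assumed to take a path like to_markdown
    }
    return converters[fmt](path)


# Example call (requires both packages and a real file):
# md = extract("input.pdf", fmt="markdown")
```

The imports sit inside the function only so that the format check runs before anything is loaded; in a normal script the two imports can simply go at the top, in the same order.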