DOC: Add comparison with pdfplumber #1837
Added my take on the `pdfplumber` library compared to PyPDF.
I like it! I have two suggestions. What do you think about those?
Yeah, I think it would work also, since
Co-authored-by: Martin Thoma <info@martin-thoma.de>
New Features (ENH)
- Simplify metadata input (Document Information Dictionary) (#1851)
- Extend cmap compatibility to GBK_EUC_H/V (#1812)

Bug Fixes (BUG)
- Prevent infinite loop when no character follows after a comment (#1828)
- get_contents does not return ContentStream (#1847)
- Accept XYZ destination with zoom missing (default to zoom=0.0) (#1844)
- Cope with 1 Bit images (#1815)

Robustness (ROB)
- Handle missing /Type entry in Page tree (#1845)

Documentation (DOC)
- Expand file size explanations (#1835)
- Add comparison with pdfplumber (#1837)
- Clarify that PyPDF2 is dead (#1827)
- Add Hunter King as Contributor for #1806

Maintenance (MAINT)
- Refactor internal Encryption class (#1821)
- Add R parameter to generate_values (#1820)
- Make encryption_key parameter of write_to_stream optional (#1819)
- Prepare for adding AES encryption support (#1818)

Code Style (STY)
- Iterate directly over the list instead of using range (#1839)
- Minor refactorings in _encryption.py (#1822)

[Full Changelog](3.8.1...3.9.0)
[`pdfminer.six`](https://pypi.org/project/pdfminer.six/) is capable of
extracting the [font size](https://stackoverflow.com/a/69962459/562769)
/ font weight (bold-ness). It has no capabilities for writing PDF files.

## pdfrw / pdfminer / pdfplumber
[`pdfplumber`](https://pypi.org/project/pdfplumber/) is a library focused on extracting data from PDF documents. Since `pdfplumber` is built on top of `pdfminer.six`, it has **no capabilities for exporting or modifying a PDF file** (see [#440 (discussions)](https://github.com/jsvine/pdfplumber/discussions/440#discussioncomment-803880)). However, `pdfplumber` is capable of converting a PDF file into an image, [drawing lines and rectangles on the image](https://github.com/jsvine/pdfplumber#drawing-methods), and saving it as an image file.
is capable of converting a PDF file into an image
From skimming the Readme, it looks like `pdfplumber` calls `Wand` for pdf rendering, which is a binding to `ImageMagick`, which in turn uses `ghostscript`, IIRC.
So this phrase is kinda misleading as pdfplumber is not an actual pdf rendering library (as opposed to mupdf/poppler/pdfium), but merely a rendering "wrapper-wrapper-wrapper".
Yes, I agree! It is not a PDF rendering library; there's just one function to convert the PDF into an image with the tools you mentioned. I'm not experienced with `Wand`, `ImageMagick`, and `ghostscript`, so if you're an expert there, feel free to elaborate more on my changes.
@RitchieP You could rephrase

> However, `pdfplumber` is capable of converting a PDF file into an image

to

> However, `pdfplumber` is capable of converting a PDF file into an image via ImageMagick
Definitely! I'll make a PR in a bit.
Thanks!