My hack has been to change display.py so that ImageMagick creates the image as an 8-bit PNG using convert("png8"), which Pillow can then cope with. This "works for me".
--- a/pdfplumber/display.py
+++ b/pdfplumber/display.py
@@ -41,7 +41,7 @@ def get_page_image(stream, page_no, resolution):
         if img.alpha_channel:
             img.background_color = wand.image.Color("white")
             img.alpha_channel = "remove"
-        with img.convert("png") as png:
+        with img.convert("png8") as png:
             im = PIL.Image.open(BytesIO(png.make_blob()))
             return im.convert("RGB")
Environment
pdfplumber version: 0.5.28
ImageMagick version: 6.9.11.27
Wand version: 0.6.6
Pillow version: 8.2.0
Python version: 3.7.9
OS: Linux
Hi @linuxsoftware, and thanks for flagging this! Since the default seems to work well for most PDFs, I'd lean toward an approach that allows the user to specify the conversion mode via an argument passed to get_page_image(...) and Page.to_image(...). I'll put this on my todo list, though you're also welcome to submit a PR.
I was thinking about this and realized it is already possible to pass a user-created original image into to_image, so perhaps the code does not need to change at all.
e.g.
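from io import BytesIO

import PIL.Image
import wand.image

def my_page_image(page):
    # Assumes the PDF was opened from a file on disk, so stream.name exists
    stream = page.pdf.stream
    page_no = page.page_number - 1  # ImageMagick page indices are zero-based
    with wand.image.Image(resolution=150,
                          filename=f"{stream.name}[{page_no}]") as img:
        # Force an 8-bit PNG so Pillow can convert it to RGB correctly
        with img.convert("png8") as png:
            im = PIL.Image.open(BytesIO(png.make_blob()))
            return im.convert("RGB")

pi = page.to_image(original=my_page_image(page))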
The main thing is for the user to be aware of Pillow's 8-bit limitation when converting images. Perhaps it is enough that this conversation will now show up in searches, or perhaps it's worth a note in the Visual Debugging documentation?
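For anyone who would rather fix this on the Pillow side than switch ImageMagick to png8, here is a minimal sketch of the rescaling approach from the linked Stack Overflow thread. The helper name to_rgb_safe is hypothetical (it is not part of pdfplumber or Pillow), and it assumes the 16-bit greyscale page arrives in Pillow as mode "I" or one of the "I;16" modes.

```python
from PIL import Image


def to_rgb_safe(im):
    """Convert to RGB, rescaling 16-bit greyscale to 8 bits first.

    Hypothetical helper: assumes 16-bit greyscale images are loaded
    in mode "I" or an "I;16" variant.
    """
    if im.mode.startswith("I;16"):
        # Normalise to 32-bit integer mode so the steps below are uniform
        im = im.convert("I")
    if im.mode == "I":
        # Map 0..65535 onto 0..255 before the 8-bit conversion,
        # instead of letting values above 255 saturate
        im = im.point(lambda v: v * (1 / 256)).convert("L")
    return im.convert("RGB")


# A dark grey 16-bit sample stays dark grey after conversion
dark_grey = Image.new("I", (2, 2), 0x2000)
print(to_rgb_safe(dark_grey).getpixel((0, 0)))
```

The result could then be passed to to_image(original=...) in the same way as the png8 version above.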
I believe that the latest version(s) of pdfplumber, which make some more generalized improvements/changes, now convert your PDF to an acceptable image.
Thank you for this extremely useful library.
I had a problem with visual debugging of a PDF that was mostly grey. All the text turned white so it could not be seen.
Here is an example PDF.
The problem is that ImageMagick creates the image of the page as a 16-bit greyscale PNG, but Pillow has a documented issue with converting that to RGB. (See https://stackoverflow.com/questions/19892919/pil-converting-an-image-with-mode-i-to-rgb-results-in-a-fully-white-image and python-pillow/Pillow#3011)
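To illustrate why the page washes out, here is a rough sketch of the arithmetic only, not Pillow's actual code. It assumes, as the linked reports describe, that the conversion saturates each 16-bit sample at 255 rather than rescaling it. The function names are made up for this example.

```python
def clip_to_8bit(sample):
    """Saturate a 16-bit sample at 255 (the behaviour assumed above)."""
    return max(0, min(sample, 255))


def scale_to_8bit(sample):
    """Rescale a 16-bit sample (0..65535) onto 0..255 by dropping the low byte."""
    return sample >> 8


dark_grey, mid_grey = 0x2000, 0x8000

# Clipping: both greys end up at the 8-bit maximum, i.e. white
print([clip_to_8bit(v) for v in (dark_grey, mid_grey)])

# Rescaling: the two greys stay distinct
print([scale_to_8bit(v) for v in (dark_grey, mid_grey)])
```

Under that assumption, almost every non-black 16-bit grey collapses to white, which matches the symptom.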