Extract some transparent PNG images, discoloration occurs #670

mc373906408 · 2020-09-28T10:39:38Z

def recoveImage(xref, smake):
    def getimage(pix):
        if pix.colorspace.n != 4:
            return pix
        tpix = fitz.Pixmap(fitz.csRGB, pix)
        return tpix

    pix1 = fitz.Pixmap(self.mu_Document, xref)
    pix2 = fitz.Pixmap(self.mu_Document, smake)

    pix = fitz.Pixmap(pix1)
    pix.setAlpha(pix2.samples)
    pix1 = pix2 = None

    return getimage(pix)

The original image:

Extracted image:


JorjMcKie · 2020-09-28T12:09:43Z

The logic to recover the alpha channel is known to be incomplete.
I am still investigating which cases need to be distinguished, so I will remove the bug label and label this as an enhancement.
In your special case, this should help:

def recoveImage(xref, smake):
    def getimage(pix):
        if pix.colorspace.n != 4:
            return pix
        tpix = fitz.Pixmap(fitz.csRGB, pix)
        return tpix

    pix1 = fitz.Pixmap(self.mu_Document, xref)
    pix2 = fitz.Pixmap(self.mu_Document, smake)
    pix = fitz.Pixmap(pix1, 1)  # add alpha channel
    ba = bytearray(pix2.samples)
    for i in range(len(ba)):
        if ba[i] > 0:
            ba[i] = 255
    pix.setAlpha(ba)
    pix1 = pix2 = None
    return getimage(pix)
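
The loop above forces every nonzero soft-mask value to full opacity before the mask is passed to setAlpha. A minimal sketch of just that thresholding step on plain bytes, without PyMuPDF, is shown here. The function name `binarize_mask` is hypothetical and not part of any library; it assumes the mask is one byte per pixel, as in a grayscale pixmap without alpha.

```python
def binarize_mask(samples: bytes) -> bytearray:
    """Map every nonzero mask byte to 255 and leave zero bytes at 0."""
    # Translation table: index 0 stays 0, indices 1..255 all become 255.
    table = bytes([0]) + bytes([255]) * 255
    return bytearray(samples.translate(table))


if __name__ == "__main__":
    # Hypothetical 4-pixel mask: transparent, faint, half, opaque.
    print(list(binarize_mask(bytes([0, 1, 128, 255]))))
```

Note that this discards partial transparency, which is presumably why it is offered only for this special case.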

mc373906408 · 2020-09-28T12:11:38Z

For now I will use PIL to restore the transparency. Looking forward to an improvement.

JorjMcKie · 2020-09-28T12:14:51Z

Ah, ok.
Can you please let me know which PIL feature was a help / solved the problem?

mc373906408 · 2020-09-28T12:16:15Z

from io import BytesIO

from PIL import Image

    ...

    pix1 = fitz.Pixmap(self.mu_Document, xref)   # base image
    pix2 = fitz.Pixmap(self.mu_Document, smake)  # soft mask (grayscale)

    mode = "RGBA" if pix1.alpha else "RGB"
    pix = Image.frombytes(mode, (pix1.width, pix1.height), pix1.samples)
    mask = Image.frombytes("L", (pix2.width, pix2.height), pix2.samples)
    tpix = Image.new("RGBA", pix.size)
    tpix.paste(pix, None, mask)  # the soft mask controls transparency
    bf = BytesIO()
    tpix.save(bf, "png")

    return bf.getvalue()
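
For readers without Pillow at hand, the underlying idea (one alpha byte per pixel, taken from the soft mask, alongside the RGB samples) can be sketched on raw bytes. This is not what `paste` does internally, since `paste` blends the source against the destination using the mask; the sketch only shows the RGBA byte layout. The function name `attach_alpha` is hypothetical, and it assumes the image has no alpha channel yet and that the mask has exactly one byte per image pixel, which is not guaranteed for PDF soft masks.

```python
def attach_alpha(rgb: bytes, mask: bytes) -> bytes:
    """Interleave RGB samples with a one-byte-per-pixel mask into RGBA."""
    if len(rgb) != 3 * len(mask):
        raise ValueError("mask must have exactly one byte per RGB pixel")
    out = bytearray(4 * len(mask))
    out[0::4] = rgb[0::3]  # red
    out[1::4] = rgb[1::3]  # green
    out[2::4] = rgb[2::3]  # blue
    out[3::4] = mask       # alpha taken from the soft mask
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical 2-pixel image: first pixel transparent, second opaque.
    rgba = attach_alpha(bytes([10, 20, 30, 40, 50, 60]), bytes([0, 255]))
    print(list(rgba))
```

If the mask and image dimensions differ, the mask would have to be resampled first; that step is left out here.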

JorjMcKie · 2020-09-28T12:42:48Z

Great, thanks.
I regard this whole business as being outside the scope of PyMuPDF. The base C library, MuPDF, also offers no solution here.
The image extraction scripts in the PyMuPDF-Utilities repo are examples, not solutions for which I can take responsibility.
Nevertheless, I will test your solution a bit more and then modify the script accordingly.
Why reinvent the wheel when we have Pillow?

mc373906408 · 2020-09-28T12:43:50Z

OK

JorjMcKie · 2020-09-30T08:46:33Z

I have changed the example scripts extract-imga.py and extract-imgb.py to make use of PIL/Pillow in case of transparent images.

mc373906408 added the bug label Sep 28, 2020

mc373906408 assigned JorjMcKie Sep 28, 2020

JorjMcKie added enhancement and removed bug labels Sep 28, 2020

JorjMcKie closed this as completed Sep 30, 2020

This was referenced Oct 4, 2020

Question / Comment: extract transparent image with EMPTY color-space and soft-mask #677

Closed

Can't reproduce transparent PNG images correctly ArtifexSoftware/pdf2docx#52

Closed
