apply_redactions moves some unredacted text #3278

indigoviolet · 2024-03-19T00:54:46Z

Description of the bug

Before:

After:

Notes:

A similar issue was previously reported in "apply_redactions() moving text" (#2957).
This is being reported on behalf of an Enterprise support customer (I can email details separately if needed).
We have noticed that some issues like this are resolved by using Preview.app to (re-)print the PDF with "scale to fit" turned on.

How to reproduce the bug

import fitz

source_doc = fitz.open('source.pdf')
dst_doc = fitz.open()
dst_doc.insert_pdf(source_doc, 0, 0)

redaction_coords = [(507.3806485, 35.06467372292002, 552.9736259804001, 24.838430185399943),
                    (212.5271666879999, 137.43544772292012, 258.12014416840003, 127.20920418540004),
                    (212.527166688, 129.4908031440001, 258.1201441684001, 139.2412260440001)]

page = dst_doc.load_page(0)

for coords in redaction_coords:
    redaction = page.add_redact_annot(
        coords,
        text="  ",
        fill=(0.5, 0.5, 0.5),  # grey
    )
    redaction.update()

fitz.utils.apply_redactions(page)

dst_doc.save("result.pdf")
dst_doc.close()

PyMuPDF version

1.23.26

Operating system

MacOS

Python version

3.11

JorjMcKie · 2024-03-19T10:40:19Z

Sorry - I was wrong: I got confused by the first two coords rectangles, which are empty and thus invalid.
Re-opened this issue.
It is an error in the base library, and I will submit an issue there.

The error has nothing to do with redactions. The page's /Contents stream is destroyed even when "cleaning" the PDF (part of the internal redaction process).
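
To illustrate the remark about the first two rectangles: in each of them y0 is larger than y1, so they count as empty. The following is a minimal pure-Python sketch. The helpers `is_empty` and `normalize` are hypothetical stand-ins written for this illustration, on the assumption that they mirror what `fitz.Rect.is_empty` and `fitz.Rect.normalize()` do; they are not PyMuPDF code.

```python
# Hypothetical stand-ins for fitz.Rect.is_empty / fitz.Rect.normalize(),
# assuming a rect (x0, y0, x1, y1) is empty when x0 >= x1 or y0 >= y1.

def is_empty(rect):
    """Return True if the rectangle has zero or negative width or height."""
    x0, y0, x1, y1 = rect
    return x0 >= x1 or y0 >= y1


def normalize(rect):
    """Swap coordinates so that x0 <= x1 and y0 <= y1."""
    x0, y0, x1, y1 = rect
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


# Coordinates taken from the report above.
redaction_coords = [
    (507.3806485, 35.06467372292002, 552.9736259804001, 24.838430185399943),
    (212.5271666879999, 137.43544772292012, 258.12014416840003, 127.20920418540004),
    (212.527166688, 129.4908031440001, 258.1201441684001, 139.2412260440001),
]

for coords in redaction_coords:
    # Report whether each rect is empty and what its normalized form would be.
    print(is_empty(coords), normalize(coords))
```

Only the third rectangle is non-empty as given; the first two become usable once their y-coordinates are swapped.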

JorjMcKie · 2024-03-19T10:44:45Z

In this case, the PDF resulting from a PDF-to-PDF conversion will not cause a problem. So you can use this approach if you need an immediate circumvention:

import fitz

source_doc = fitz.open("source.pdf")
pdfdata = source_doc.convert_to_pdf()  # make a PDF/PDF conversion
# this will issue some error messages: obviously, some fonts contain errors.
source_doc = fitz.open("pdf", pdfdata)  # re-open source after conversion
dst_doc = fitz.open()
dst_doc.insert_pdf(source_doc, 0, 0)

redaction_coords = [
    (507.3806485, 35.06467372292002, 552.9736259804001, 24.838430185399943),
    (212.5271666879999, 137.43544772292012, 258.12014416840003, 127.20920418540004),
    (212.527166688, 129.4908031440001, 258.1201441684001, 139.2412260440001),
]

page = dst_doc.load_page(0)

for coords in redaction_coords:
    if fitz.Rect(coords).is_empty:
        print(f"normalizing empty rect {coords=}")
        coords = tuple(fitz.Rect(coords).normalize())
    redaction = page.add_redact_annot(
        coords,
        text="  ",
        fill=(0.5, 0.5, 0.5),  # grey
    )
    page.draw_rect(coords, color=(1, 0, 0))

page._apply_redactions(0, 0)

dst_doc.save("result.pdf")
dst_doc.close()

JorjMcKie · 2024-03-19T10:51:01Z

Issue number in MuPDF's issue system: https://bugs.ghostscript.com/show_bug.cgi?id=707673.

manjuadditive · 2024-03-19T19:38:23Z

@JorjMcKie
We are trying this solution, but we are getting too many "cannot create ToUnicode mapping" warnings, which are making redactions harder. In such cases convert_to_pdf probably (can you confirm?) converts the underlying text to images. When we redact the area with the PDF_REDACT_IMAGE_REMOVE option, the images don't seem to get removed.

UPDATE: I don't think convert_to_pdf is creating images when "cannot create ToUnicode mapping" is encountered. But I am lost on what the replacement object is. And it is certainly not getting redacted, although it overlaps the redaction bounds.

JorjMcKie · 2024-03-19T21:46:53Z

Document.convert_to_pdf() does not convert page content to images. However, as the documentation explains, only a subset of fonts is eligible for such a conversion.
And of course: internally all text and other content is rewritten to the new PDF, therefore any errors in that area are detected and complained about.

manjuadditive · 2024-03-19T21:53:57Z

What happens if the conversion fails?

JorjMcKie · 2024-03-19T21:57:11Z

An exception is raised. There also is the risk of incomplete / distorted output.
As I wrote: a possible circumvention for the time being.
Not meant as a permanent solution.
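
A minimal sketch of guarding the conversion step, given that a failed conversion raises an exception. The helper name `open_for_redaction` is hypothetical. Catching the broad `Exception` is an assumption, since the thread does not say which exception type is raised. Falling back to the unconverted document means the original problem can still occur.

```python
def open_for_redaction(path):
    """Open `path`, preferring a PDF-to-PDF converted copy (hypothetical helper).

    Falls back to the unconverted document if the conversion raises.
    """
    import fitz  # PyMuPDF; imported here so the helper can be defined without it

    source_doc = fitz.open(path)
    try:
        # Same conversion call as in the workaround above.
        pdfdata = source_doc.convert_to_pdf()
    except Exception as exc:  # assumption: exact exception type is not known
        print(f"conversion failed, using the original document: {exc}")
        return source_doc

    source_doc.close()
    # Re-open the converted data as a PDF, as in the workaround above.
    return fitz.open("pdf", pdfdata)
```

In the workaround script, `source_doc = open_for_redaction("source.pdf")` could stand in for the open / convert / re-open lines. This guard only catches outright failures; incomplete or distorted output still has to be checked by inspecting the result.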

indigoviolet · 2024-03-22T18:46:05Z

@JorjMcKie do you need our customer details to fill in on the MuPDF issue so it gets the corresponding priority?

JorjMcKie · 2024-03-23T09:16:15Z

Yes, please provide it if available.

JorjMcKie · 2024-03-26T15:22:41Z

Received the customer ID - thank you!

julian-smith-artifex-com · 2024-04-02T20:29:12Z

Fixed in 1.24.1.
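
For scripts that must run on both older and newer installations, a small version check can decide whether the conversion workaround is still needed. This is a sketch under assumptions: "1.24.1" is taken to be the PyMuPDF release number, the helper names are hypothetical, and the version string is assumed to be purely numeric (for example the value of `fitz.VersionBind`).

```python
FIXED_IN = (1, 24, 1)  # assumption: the PyMuPDF release named in the comment above


def parse_version(text):
    """Turn a numeric version string such as '1.23.26' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def needs_workaround(installed):
    """Return True if `installed` is older than the release containing the fix."""
    return parse_version(installed) < FIXED_IN


print(needs_workaround("1.23.26"))  # version from this report -> True
print(needs_workaround("1.24.1"))   # release containing the fix -> False
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "1.9.0" sorting after "1.24.1".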

JorjMcKie added the "not a bug" label (not a bug / user error / unable to reproduce) Mar 19, 2024

JorjMcKie closed this as completed Mar 19, 2024

JorjMcKie reopened this Mar 19, 2024

JorjMcKie added the "upstream bug" label (bug outside this package) and removed the "not a bug" label Mar 19, 2024

julian-smith-artifex-com closed this as completed Apr 2, 2024

deeplow mentioned this issue Apr 2, 2024

Evaluate Dangerzone's Potential as a Redaction Tool (and add redaction capabilities) freedomofpress/dangerzone#763

Open
