Redaction Annotation Fill Not Matching Up With Redacted Section #3575

lyon-tonic · 2024-06-13T14:20:31Z

Description of the bug

I am trying to redact words from a PDF, based on OCR-generated rectangles.

PyMuPDF has worked well for us, but I have run into an odd situation with a specific file that has some unusual properties (I've attached the file). The pages in this file are an abnormal size (8.5 x 6.5 in), and some of them are rotated.

I would like the rectangle coordinates to be relative to the top-left corner, but even before getting to that, I noticed that the redacted area is not in the same place as the fill.

If this is not a bug, I would like to understand why these appear to be drawn in separate coordinate systems, and how to reconcile them.
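PyMuPDF page coordinates already have their origin at the top-left corner, with y increasing downward. If the OCR boxes come from an image of the page rendered at a known resolution (an assumption; the issue doesn't say how the OCR input was produced), converting them to points is just a scale by 72/dpi. The helper `ocr_box_to_points` below is hypothetical, not part of PyMuPDF:

```python
# Hypothetical helper: convert an OCR bounding box measured in pixels on an
# image rendered at `dpi` into PDF points (72 points per inch). Both systems
# are assumed to use a top-left origin with y growing downward, so only
# scaling is needed and the y axis is not flipped.


def ocr_box_to_points(box, dpi):
    """box is (x0, y0, x1, y1) in pixels; returns the same box in points."""
    x0, y0, x1, y1 = box
    # multiply first, then divide, so whole-number results stay exact
    return (x0 * 72 / dpi, y0 * 72 / dpi, x1 * 72 / dpi, y1 * 72 / dpi)


if __name__ == "__main__":
    # a 975 x 975 pixel box at 300 dpi is a 234 x 234 point square
    print(ocr_box_to_points((0, 0, 975, 975), dpi=300))
```

The result is still expressed relative to the page as displayed; on a rotated page it would still need the rotation handling discussed in the replies before being passed to `add_redact_annot`.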

How to reproduce the bug

This is a simple script that shows the problem in the files below:

Input:
input.pdf

Output:
output.pdf

import fitz  # PyMuPDF

def process_pdf(input_pdf_path, output_pdf_path):
    # Open the input PDF file
    document = fitz.open(input_pdf_path)
    
    # Iterate through each page
    for page_num in range(len(document)):
        page = document.load_page(page_num)  # load page
        
        # 234 is half of the width of the page
        rect = fitz.Rect(0, 0, 234, 234)

        redact_annot = page.add_redact_annot(rect)
        redact_annot.update(fill_color=(0, 0, 0))  # set fill color to black
        page.apply_redactions()
        page.insert_textbox(rect, f"Page {page_num + 1}", fontsize=12, fontname="helv", color=(1, 0, 0))


    document.save(output_pdf_path)

if __name__ == "__main__":
    input_pdf_path = "input.pdf"  # Replace with the path to your input PDF
    output_pdf_path = "output.pdf"  # Replace with the path to your output PDF
    
    process_pdf(input_pdf_path, output_pdf_path)
    print(f"Processed PDF saved to {output_pdf_path}")
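For reference, PDF coordinates are measured in points (72 per inch), so the page size mentioned above works out as shown below. 234 pt is half of the 6.5 in side (468 pt), so whether the 234 pt square covers half of the page's width or half of its height depends on the page's orientation:

```python
# PDF user space: 1 point = 1/72 inch (with the default UserUnit of 1).
POINTS_PER_INCH = 72


def inches_to_points(inches):
    return inches * POINTS_PER_INCH


long_side = inches_to_points(8.5)   # 612.0
short_side = inches_to_points(6.5)  # 468.0

print(long_side, short_side)
print(short_side / 2)  # half of the 6.5 in side -> 234.0
print(long_side / 2)   # half of the 8.5 in side -> 306.0
```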

PyMuPDF version

1.24.5

Operating system

Windows

Python version

3.11


JorjMcKie · 2024-06-14T12:34:22Z

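Inserting or adding content on rotated pages can be confusing. For most methods in PyMuPDF, points and rectangles taken from the visible (rotated) page must first be transformed into the page's unrotated coordinate system, e.g. by multiplying them with page.derotation_matrix, to land in the right place.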
I think this script does what you want:

import pymupdf as fitz  # PyMuPDF

RED = fitz.pdfcolor["red"]


def process_pdf(input_pdf_path, output_pdf_path):
    # Open the input PDF file
    document = fitz.open(input_pdf_path)

    # Iterate through each page
    for page in document:
        # 234 is half of the width of the page
        rect = fitz.Rect(0, 0, 234, 234)
        rot_rect = rect * page.derotation_matrix
        redact_annot = page.add_redact_annot(
            rot_rect, text=f"{page.number=}", text_color=RED
        )
        page.apply_redactions()

    document.ez_save(output_pdf_path)


if __name__ == "__main__":
    input_pdf_path = "input.pdf"  # Replace with the path to your input PDF
    output_pdf_path = "output.pdf"  # Replace with the path to your output PDF

    process_pdf(input_pdf_path, output_pdf_path)
    print(f"Processed PDF saved to {output_pdf_path}")
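To see why the rectangle has to be transformed, here is a pure-Python model of the 90° case. It assumes the PDF convention that /Rotate turns the page clockwise for display, together with PyMuPDF's top-left origin. Under those assumptions the rotation matrix maps an unrotated point (x, y) to (H - y, x), where H is the unrotated page height, and the derotation matrix is its inverse. The page size is hypothetical, and PyMuPDF's real matrices also cover 180° and 270° as well as mediaboxes that are not anchored at the origin:

```python
# Pure-Python model of `rect * page.derotation_matrix` for a page with
# /Rotate 90. Assumption: the unrotated page has its top-left corner at
# (0, 0) and height `unrotated_height` (a hypothetical value below).


def derotate_point_90(u, v, unrotated_height):
    """Map a point on the visible (rotated 90 degrees clockwise) page back to
    the unrotated page; inverse of (x, y) -> (H - y, x)."""
    return (v, unrotated_height - u)


def derotate_rect_90(rect, unrotated_height):
    """Transform all four corners and return their bounding box, mirroring
    how a rectangle multiplied by a matrix is computed."""
    x0, y0, x1, y1 = rect
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    pts = [derotate_point_90(u, v, unrotated_height) for u, v in corners]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    H = 612  # hypothetical unrotated height in points (8.5 in)
    visible = (0, 0, 234, 234)  # top-left square as the viewer sees it
    print(derotate_rect_90(visible, H))  # (0, 378, 234, 612)
```

In this model, the square the viewer sees in the top-left corner ends up at the bottom-left of the unrotated page, which corresponds to the kind of rectangle the script above passes to add_redact_annot.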

lyon-tonic · 2024-06-14T14:15:04Z

Thanks for responding!

This solves part of the problem, but not the issue with the redact_annot fill. The fill rectangle appears to be rendered separately from the redaction annotation, and I'm not sure why.

In the script below, the black fill rectangle does not show up at all.

import pymupdf as fitz  # PyMuPDF

RED = fitz.pdfcolor["red"]


def process_pdf(input_pdf_path, output_pdf_path):
    # Open the input PDF file
    document = fitz.open(input_pdf_path)

    # Iterate through each page
    for page in document:
        # 234 is half of the width of the page
        rect = fitz.Rect(0, 0, 234, 234)
        rot_rect = rect * page.derotation_matrix
        redact_annot = page.add_redact_annot(
            rot_rect, text=f"{page.number=}", text_color=RED
        )
        redact_annot.update(fill_color=(0, 0, 0))  # set fill color to black
        page.apply_redactions()

    document.ez_save(output_pdf_path)


if __name__ == "__main__":
    input_pdf_path = "input.pdf"  # Replace with the path to your input PDF
    output_pdf_path = "output.pdf"  # Replace with the path to your output PDF

    process_pdf(input_pdf_path, output_pdf_path)
    print(f"Processed PDF saved to {output_pdf_path}")

JorjMcKie · 2024-06-14T15:10:04Z

This file indeed does a few unexpected things!
Here is a complete solution that removes the page rotations.

import pymupdf as fitz  # PyMuPDF

RED = fitz.pdfcolor["red"]
BLACK = fitz.pdfcolor["black"]


def process_pdf(input_pdf_path, output_pdf_path):
    rect = fitz.Rect(0, 0, 234, 234)

    # Open the input PDF file
    src = fitz.open(input_pdf_path)
    doc = fitz.open()  # output file

    # Iterate through each page
    for src_page in src:
        # the output PDF will contain pages with rotation 0
        src_rect = src_page.rect
        w, h = src_rect.br
        src_rot = src_page.rotation
        src_page.set_rotation(0)
        # make an output page with the visible dimensions of the input page
        page = doc.new_page(width=w, height=h)
        page.show_pdf_page(  # insert source page
            page.rect,
            src,
            src_page.number,
            rotate=-src_rot,  # reversed original rotation
        )
        
        # now we can redact in a worry-free manner
        redact_annot = page.add_redact_annot(
            rect, text=f"{page.number=}", text_color=RED, fill=BLACK
        )
        page.apply_redactions()

    doc.ez_save(output_pdf_path)


if __name__ == "__main__":
    input_pdf_path = "input.pdf"  # Replace with the path to your input PDF
    output_pdf_path = "output.pdf"  # Replace with the path to your output PDF

    process_pdf(input_pdf_path, output_pdf_path)
    print(f"Processed PDF saved to {output_pdf_path}")
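The script reads src_page.rect before calling set_rotation(0), and that order matters. Page.rect describes the page as displayed, so for a page rotated by 90 or 270 degrees its width and height are the unrotated ones swapped. Below is a minimal pure-Python sketch of that relationship. visible_size is a hypothetical helper, not a PyMuPDF function, and the page sizes are illustrative:

```python
# Sketch of the size bookkeeping behind `w, h = src_rect.br` above.
# Hypothetical helper: for /Rotate 90 or 270 the displayed width and height
# are the unrotated height and width.


def visible_size(width, height, rotation):
    """Return (width, height) of the page as displayed."""
    rotation %= 360  # normalizes e.g. -90 to 270
    if rotation not in (0, 90, 180, 270):
        raise ValueError("PDF page rotation must be a multiple of 90")
    if rotation in (90, 270):
        return (height, width)
    return (width, height)


if __name__ == "__main__":
    print(visible_size(468, 612, 90))   # (612, 468)
    print(visible_size(468, 612, 180))  # (468, 612)
    print(visible_size(468, 612, -90))  # (612, 468)
```

Because the new page is created with the visible size and the source page is shown with rotate=-src_rot, the output page has rotation 0. Its coordinates therefore match what a viewer displays, and the redaction rectangle can be used without any transformation.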

JorjMcKie · 2024-06-19T09:11:06Z

Closing this issue for lack of response.

JorjMcKie closed this as completed Jun 19, 2024
