BUG: Support for Adding and Viewing Ink Annotations in Mac's Preview app #2332

Closed
themarisolhernandez opened this issue Dec 7, 2023 · 9 comments
Labels
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-annotation: Everything about annotating PDF files

Comments

@themarisolhernandez

themarisolhernandez commented Dec 7, 2023

I am having problems adding Ink annotations back to a PDF using PdfWriter.add_annotation(). I think the problem is related to the PDF viewer. When I open the file in Mac's Preview app after adding the Ink annotation, the annotation is transparent. The file and annotation look fine in Adobe Reader.

Any ideas why?

This isn't an issue when using PyPDF2, but I would prefer to use pypdf. Are there plans to support this?

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-13.5.2-x86_64-i386-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.1, crypt_provider=('cryptography', '41.0.5'), PIL=9.4.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader, PdfWriter
from io import BytesIO


def add_annots(file_bytes: bytes,
               annots: list) -> bytes:
    writer = PdfWriter()

    with BytesIO(file_bytes) as input_stream, BytesIO() as output_stream:
        reader = PdfReader(input_stream)
        writer.append_pages_from_reader(reader)

        for page_num, annot in annots:
            writer.add_annotation(page_number=page_num, annotation=annot)

        # Add original metadata
        writer.add_metadata(reader.metadata)

        writer.write(output_stream)
        output_stream.seek(0)
        pdf_file = output_stream.read()

    return pdf_file

The input file is attached under the filename input.pdf. The output file is attached under the filename output.pdf. An image of the output file is also attached to show that the Ink annotation is transparent.

The annots input looks like:

[[0, {'/AP': {'/N': IndirectObject(78, 0, 5043561152)}, '/C': [0.898041, 0.133331, 0.215683], '/CreationDate': "D:20231122191551-08'00'", '/F': 4, '/M': "D:20231122191551-08'00'", '/NM': 'dc92223d-3b4c-47d4-a2a9-73c263b95c84', '/P': IndirectObject(48, 0, 5043561152), '/Popup': IndirectObject(76, 0, 5043561152), '/QuadPoints': [199.111, 614.634, 446.841, 614.634, 199.111, 602.86, 446.841, 602.86, 85.1462, 603.952, 135.067, 603.952, 85.1462, 590.839, 135.067, 590.839, 199.564, 603.952, 328.095, 603.952, 199.564, 590.839, 328.095, 590.839], '/Rect': [81.2359, 590.019, 450.393, 615.411], '/Subj': 'Cross-Out', '/Subtype': '/StrikeOut', '/T': 'Yoed', '/Type': '/Annot'}], [0, {'/F': 28, '/Open': False, '/Parent': IndirectObject(77, 0, 5043561152), '/Rect': [609.12, 522.634, 793.12, 614.634], '/Subtype': '/Popup', '/Type': '/Annot'}], [0, {'/AP': {'/N': IndirectObject(73, 0, 5043561152)}, '/C': [0.988235, 0.956863, 0.521576], '/CA': 0.399994, '/CreationDate': "D:20231122191602-08'00'", '/F': 4, '/M': "D:20231122191602-08'00'", '/NM': '1a628f96-03a9-452e-a596-fb484dbb7563', '/P': IndirectObject(48, 0, 5043561152), '/Popup': IndirectObject(71, 0, 5043561152), '/QuadPoints': [168.149, 545.59, 312.648, 545.59, 168.149, 534.318, 312.648, 534.318], '/Rect': [165.14, 533.966, 315.657, 545.942], '/Subj': 'Highlight', '/Subtype': '/Highlight', '/T': 'Yoed', '/Type': '/Annot'}], [0, {'/F': 28, '/Open': False, '/Parent': IndirectObject(72, 0, 5043561152), '/Rect': [609.12, 453.59, 793.12, 545.59], '/Subtype': '/Popup', '/Type': '/Annot'}], [0, {'/AP': {'/N': IndirectObject(70, 0, 5043561152), '/R': IndirectObject(69, 0, 5043561152)}, '/C': [1, 0.819611, 0], '/Contents': 'Testing', '/CreationDate': "D:20231122191647-08'00'", '/F': 28, '/M': "D:20231122191658-08'00'", '/NM': '315e7788-b990-4d64-807e-b210c23bd4ee', '/Name': '/Comment', '/P': IndirectObject(48, 0, 5043561152), '/Popup': IndirectObject(67, 0, 5043561152), '/RC': '<?xml version="1.0"?><body 
xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:23.6.0" xfa:spec="2.0.2" ><p dir="ltr"><span dir="ltr" style="font-size:10.2pt;text-align:left;font-weight:normal;font-style:normal">Testing</span></p></body>', '/Rect': [252.351, 502.874, 276.351, 526.874], '/Subj': 'Sticky Note', '/Subtype': '/Text', '/T': 'Yoed', '/Type': '/Annot'}], [0, {'/F': 28, '/Open': False, '/Parent': IndirectObject(68, 0, 5043561152), '/Rect': [609.12, 434.874, 793.12, 526.874], '/Subtype': '/Popup', '/Type': '/Annot'}], [0, {'/AP': {'/N': IndirectObject(64, 0, 5043561152)}, '/C': [1, 1, 1], '/Contents': 'Test text here..........', '/CreationDate': "D:20231122191835-08'00'", '/DA': '0.898 0.1333 0.2157 rg /Helv 12 Tf', '/DS': 'font: Helvetica,sans-serif 12.0pt; text-align:left; color:#E52237 ', '/F': 4, '/M': "D:20231122191859-08'00'", '/NM': '85f8f2dd-0a22-4eed-81d0-230a7f66037f', '/P': IndirectObject(48, 0, 5043561152), '/RC': '<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:23.6.0" xfa:spec="2.0.2"  style="font-size:12.0pt;text-align:left;color:#FF0000;font-weight:normal;font-style:normal;font-family:Helvetica,sans-serif;font-stretch:normal"><p dir="ltr"><span style="font-family:Helvetica">Test text here..........</span></p></body>', '/Rect': [297.461, 479.137, 418.337, 497.006], '/Subj': 'Text Box', '/Subtype': '/FreeText', '/T': 'Yoed', '/Type': '/Annot'}], [0, {'/AP': {'/N': IndirectObject(62, 0, 5043561152)}, '/BS': IndirectObject(61, 0, 5043561152), '/C': [0.898041, 0.133331, 0.215683], '/CreationDate': "D:20231122191912-08'00'", '/F': 4, '/InkList': [[230.191, 417.648, 229.139, 418.699, 227.563, 421.852, 227.037, 423.429, 225.986, 425.531, 225.461, 428.684, 223.884, 433.94, 223.884, 434.465, 223.358, 438.144, 223.358, 439.195, 223.358, 441.823, 223.358, 446.027, 223.358, 446.553, 223.358, 449.706, 223.358, 450.757, 223.358, 
451.808, 223.884, 452.334, 224.935, 453.911, 225.461, 454.962, 228.088, 457.589, 230.716, 459.692, 236.497, 463.371, 238.599, 464.422, 244.38, 465.998, 245.432, 465.998, 245.957, 466.524, 246.483, 466.524, 247.008, 466.524, 249.636, 466.524, 250.687, 466.524, 256.994, 466.524, 260.672, 466.524, 262.249, 466.524, 263.826, 466.524, 264.877, 465.998, 270.658, 463.371, 271.709, 462.845, 277.49, 459.692, 279.067, 458.641, 282.22, 456.538, 287.475, 452.334, 288.001, 451.283, 290.103, 449.181, 292.731, 444.976, 293.257, 443.925, 294.833, 439.721, 295.884, 438.144, 296.41, 436.042, 296.935, 433.414, 297.986, 429.735, 297.986, 428.159, 297.986, 426.057, 298.512, 422.903, 299.038, 421.327, 299.038, 419.75, 299.038, 417.122, 297.461, 409.239, 296.935, 408.188, 295.884, 403.983, 293.782, 398.728, 293.257, 397.677, 291.154, 393.998, 288.527, 390.319, 287.475, 389.794, 285.373, 387.166, 282.22, 385.064, 277.49, 382.961, 268.03, 380.859, 260.672, 379.283, 254.366, 378.232, 243.329, 378.232, 240.702, 378.232, 236.497, 378.232, 228.614, 378.232, 227.563, 379.283, 221.256, 381.91, 219.154, 383.487, 209.169, 389.794, 206.541, 391.37, 197.607, 403.458, 193.402, 410.816, 192.351, 415.02, 189.198, 423.429, 188.147, 426.582, 187.096, 432.363, 187.096, 434.465, 187.096, 439.195, 187.096, 441.297, 187.096, 444.976, 187.621, 446.027, 187.621, 447.078]], '/M': "D:20231122191912-08'00'", '/NM': '51684b6f-eb0c-43d3-b464-2b1443354af7', '/P': IndirectObject(48, 0, 5043561152), '/Popup': IndirectObject(46, 0, 5043561152), '/Rect': [182.443, 372.721, 305.745, 472.808], '/Subj': 'Pencil', '/Subtype': '/Ink', '/T': 'Yoed', '/Type': '/Annot'}], [0, {'/F': 28, '/Open': False, '/Parent': IndirectObject(47, 0, 5043561152), '/Rect': [609.12, 325.648, 793.12, 417.648], '/Subtype': '/Popup', '/Type': '/Annot'}]]

input.pdf

output.pdf

Screenshot 2023-12-07 at 3 52 34 PM

@pubpub-zz
Collaborator

To complete the analysis, this is the result in Acrobat Reader (Windows) and PDF-XChange (Windows):
image

With PDF.js (Firefox):
image

With Chrome:
image

@themarisolhernandez, I would expect the same results with the same software on Mac. Can you confirm?

@pubpub-zz
Collaborator

@themarisolhernandez

Your code cannot be run as it is. Can you complete it? Thanks.

@MartinThoma
Member

Comment by themarisolhernandez:

[Adding an Ink annotation] isn't an issue when using PyPDF2

That is interesting and unexpected.

@pubpub-zz
Collaborator

Comment by themarisolhernandez:

[Adding an Ink annotation] isn't an issue when using PyPDF2

That is interesting and unexpected.

@themarisolhernandez, can you also provide the output when using PyPDF2?

@themarisolhernandez
Author

themarisolhernandez commented Dec 11, 2023

@pubpub-zz @MartinThoma Here is the complete code using pypdf:

from pypdf.generic import NullObject, IndirectObject, ArrayObject, DictionaryObject
from pypdf import PdfReader, PdfWriter
from typing import Any
from io import BytesIO


def extract_annots_recursively(extracted_annots: list,
                               page_num: int,
                               annots: Any) -> None:
    if annots is None or isinstance(annots, NullObject):
        # Skip NullObjects
        return
    elif isinstance(annots, IndirectObject):
        obj = annots.get_object()

        extract_annots_recursively(extracted_annots=extracted_annots,
                                   page_num=page_num,
                                   annots=obj)
    elif isinstance(annots, list) or isinstance(annots, ArrayObject):
        for obj in annots:
            extract_annots_recursively(extracted_annots=extracted_annots,
                                       page_num=page_num,
                                       annots=obj)
    elif isinstance(annots, dict) or isinstance(annots, DictionaryObject):
        extracted_annots.append([page_num, annots])

def extract_annots(file_bytes: bytes) -> tuple[bytes, list]:
    writer = PdfWriter()
    extracted_annots = []

    with BytesIO(file_bytes) as input_stream, BytesIO() as output_stream:
        reader = PdfReader(input_stream)

        for page_num, page in enumerate(reader.pages):
            page_annots = page.get("/Annots", [])

            extract_annots_recursively(extracted_annots=extracted_annots,
                                       page_num=page_num,
                                       annots=page_annots)

            writer.add_page(page)

        # Remove annots from the PdfWriter
        writer.remove_annotations(subtypes=None)

        writer.write(output_stream)
        output_stream.seek(0)
        pdf_file = output_stream.read()

    return pdf_file, extracted_annots


def add_annots(file_bytes: bytes,
               annots: list) -> bytes:
    writer = PdfWriter()

    with BytesIO(file_bytes) as input_stream, BytesIO() as output_stream:
        reader = PdfReader(input_stream)
        writer.append_pages_from_reader(reader)

        for page_num, annot in annots:
            writer.add_annotation(page_number=page_num, annotation=annot)

        # Add original metadata
        writer.add_metadata(reader.metadata)

        writer.write(output_stream)
        output_stream.seek(0)
        pdf_file = output_stream.read()

    return pdf_file


if __name__ == "__main__":
    print("--- Extract Annots ---")
    with open("extract_annots__input_file.pdf", "rb") as f:
        input_file = f.read()

    output_file, annots = extract_annots(file_bytes=input_file)

    print("\n--- Add Annots ---")
    output_file = add_annots(file_bytes=output_file,
                             annots=annots)

    with open("add_annots__output_file.pdf", "wb") as f:
        f.write(output_file)

The input and output files are attached:
extract_annots__input_file.pdf
add_annots__output_file.pdf

Again, the Ink annotation is visible when opening the output file in a PDF viewer like Adobe Acrobat Reader, but it is not visible when opening the output file in Mac's Preview app. As seen in the screenshot, the Ink annotation is there but it is transparent for some reason.

Screenshot 2023-12-11 at 10 07 15 AM

I will send another response with the output of PyPDF2.

@themarisolhernandez
Author

themarisolhernandez commented Dec 11, 2023

Here are the results of using PyPDF2 instead:

from PyPDF2.generic import NullObject, IndirectObject, ArrayObject, DictionaryObject
from PyPDF2 import PdfReader, PdfWriter
from typing import Any
from io import BytesIO


def extract_annots_recursively(extracted_annots: list,
                               page_num: int,
                               annots: Any) -> None:
    if annots is None or isinstance(annots, NullObject):
        # Skip NullObjects
        return
    elif isinstance(annots, IndirectObject):
        obj = annots.get_object()

        extract_annots_recursively(extracted_annots=extracted_annots,
                                   page_num=page_num,
                                   annots=obj)
    elif isinstance(annots, list) or isinstance(annots, ArrayObject):
        for obj in annots:
            extract_annots_recursively(extracted_annots=extracted_annots,
                                       page_num=page_num,
                                       annots=obj)
    elif isinstance(annots, dict) or isinstance(annots, DictionaryObject):
        extracted_annots.append([page_num, annots])


def extract_annots(file_bytes: bytes) -> tuple[bytes, list]:
    writer = PdfWriter()
    extracted_annots = []

    # Note: input_stream is not closed explicitly because it leads to an I/O error for IndirectObjects
    input_stream = BytesIO(file_bytes)
    reader = PdfReader(input_stream)

    for page_num, page in enumerate(reader.pages):
        page_annots = page.get("/Annots", [])

        extract_annots_recursively(extracted_annots=extracted_annots,
                                   page_num=page_num,
                                   annots=page_annots)

        writer.add_page(page)

    # Remove annots from the PdfWriter
    writer.remove_links()

    with BytesIO() as output_stream:
        writer.write(output_stream)
        output_stream.seek(0)
        cleaned_pdf = output_stream.read()

    return cleaned_pdf, extracted_annots


def add_annots(file_bytes: bytes,
               annots: list) -> bytes:
    writer = PdfWriter()

    with BytesIO(file_bytes) as input_stream, BytesIO() as output_stream:
        reader = PdfReader(input_stream)
        writer.append_pages_from_reader(reader)

        for page_num, annot in annots:
            writer.add_annotation(page_number=page_num, annotation=annot)

        # Add original metadata
        writer.add_metadata(reader.metadata)

        writer.write(output_stream)
        output_stream.seek(0)
        pdf_file = output_stream.read()

    return pdf_file


if __name__ == "__main__":
    print("--- Extract Annots ---")
    with open("extract_annots__input_file.pdf", "rb") as f:
        input_file = f.read()

    output_file, annots = extract_annots(file_bytes=input_file)

    print("\n--- Add Annots ---")
    output_file = add_annots(file_bytes=output_file,
                             annots=annots)

    with open("add_annots__output_file_pypdf2.pdf", "wb") as f:
        f.write(output_file)

The input and output files are attached:
extract_annots__input_file.pdf
add_annots__output_file_pypdf2.pdf

The following is a screenshot of the output file opened in Mac's Preview app. Here you can clearly see the Ink annotation.
Screenshot 2023-12-11 at 10 15 26 AM

@pubpub-zz
Collaborator

@themarisolhernandez
Comparing the two files, I found that "/BS" refers to an object that does not exist. The easiest fix would be to clone the annotation while ignoring "/P". "/P" is declared as optional, so it may work without referencing the new page.
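
One way to spot a reference like that before re-adding an annotation is to walk the extracted dictionary and try to resolve every indirect reference in it. The following is a minimal sketch under stated assumptions, not pypdf API: `is_reference`, `find_dangling_refs` and `FakeRef` are hypothetical names introduced here. The walk is duck-typed: it treats any object with an `idnum` attribute and a `get_object()` method as an indirect reference, on the assumption that pypdf's IndirectObject looks like this and that a missing target resolves to None, to a NullObject, or raises. `FakeRef` is only a stand-in so the sketch runs without a PDF.

```python
from typing import Any, Iterator, Sequence, Tuple


def is_reference(obj: Any) -> bool:
    """Duck-typed check for an indirect reference (assumed shape: idnum + get_object())."""
    return hasattr(obj, "idnum") and callable(getattr(obj, "get_object", None))


def find_dangling_refs(obj: Any,
                       path: str = "",
                       skip_keys: Sequence[str] = ("/P", "/Parent"),
                       _seen: Any = None) -> Iterator[Tuple[str, Any]]:
    """Yield (path, reference) for every reference that resolves to nothing.

    Keys in skip_keys are not followed, so the walk does not wander into
    the page tree through "/P" or "/Parent".
    """
    if _seen is None:
        _seen = set()
    if is_reference(obj):
        key = (obj.idnum, getattr(obj, "generation", 0))
        if key in _seen:  # already visited: avoids loops between objects
            return
        _seen.add(key)
        try:
            target = obj.get_object()
        except Exception:  # a missing object may also surface as an error
            target = None
        if target is None or type(target).__name__ == "NullObject":
            yield path, obj
            return
        yield from find_dangling_refs(target, path, skip_keys, _seen)
    elif isinstance(obj, dict):
        for name, value in obj.items():
            if name in skip_keys:
                continue
            yield from find_dangling_refs(value, f"{path}{name}", skip_keys, _seen)
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_dangling_refs(value, f"{path}[{index}]", skip_keys, _seen)


class FakeRef:
    """Hypothetical stand-in for an indirect reference, backed by a plain dict."""

    def __init__(self, idnum: int, table: dict) -> None:
        self.idnum = idnum
        self.generation = 0
        self._table = table

    def get_object(self) -> Any:
        return self._table.get(self.idnum)


# Object 62 exists in the "file", object 61 (the border style) does not.
objects = {62: {"/Type": "/XObject"}}
annot = {
    "/Subtype": "/Ink",
    "/AP": {"/N": FakeRef(62, objects)},
    "/BS": FakeRef(61, objects),
    "/P": FakeRef(48, objects),
}
dangling = [path for path, _ in find_dangling_refs(annot)]
print(dangling)  # ['/BS']
```

Whether the same walk works unchanged on the annotations extracted above depends on how the installed pypdf version reports a missing object, so treat the result as a hint rather than a verdict.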

@MartinThoma added the workflow-annotation label Dec 24, 2023
@MartinThoma removed their assignment Dec 24, 2023
@py-pdf deleted a comment from pubpub-zz Dec 24, 2023
@py-pdf deleted a comment from themarisolhernandez Dec 24, 2023
@MartinThoma changed the title from "Support for Adding and Viewing Ink Annotations in Mac's Preview app" to "BUG: Support for Adding and Viewing Ink Annotations in Mac's Preview app" Dec 24, 2023
@MartinThoma added the is-bug label Dec 24, 2023
@pubpub-zz
Collaborator

@themarisolhernandez Comparing the two files, I found that "/BS" refers to an object that does not exist. The easiest fix would be to clone the annotation while ignoring "/P". "/P" is declared as optional, so it may work without referencing the new page.

Any feedback on this?

@pubpub-zz
Collaborator

I am closing this issue as there has been no news. Feel free to post an update if you want to reopen it.
