Filled form fields are gone when merging - in some PDF viewer, sometimes #506

Closed
wmoskal opened this issue Jul 5, 2019 · 6 comments
Labels
is-bug (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF); PdfMerger (The PdfMerger component is affected); workflow-forms (From a user's perspective, forms is the affected feature/workflow)

Comments


wmoskal commented Jul 5, 2019

I am using a fillable PDF that has a number of fields as a template, and there is an unknown number of individual PDFs. When I merge the PDFs with the entered form data, the merged PDF shows different content depending on which viewer opens it. If I view the file in Okular (Debian Linux PDF viewer), it shows no form fields, basically just a flattened PDF with no data. If I view the PDF on Windows in Chrome, it displays the proper field data for the first page, but all the subsequent pages just contain duplicates of the form data found on the first page. If I view it on Windows in Microsoft Edge, it displays the correct result for all pages.

The source code of the merged PDF seems to contain all the correct data, regardless of where it is viewed. I am not sure if this is an issue with PyPDF2, with the PDF itself, or with the browsers/viewers. Any help would be greatly appreciated.

@Redjumpman

Hey wmoskal. I just spent the last few days pulling my hair out with the exact same problem. I found this issue hoping there was a solution, but was devastated to see you never got a reply.

HOWEVER. I was able to determine the solution. It's over a year late for you, but maybe the next poor soul that comes across this will be saved some heartache.

from PyPDF2 import PdfFileWriter
from PyPDF2.generic import (BooleanObject, IndirectObject, NameObject,
                            NumberObject, createStringObject)


def set_need_appearances_writer(writer: PdfFileWriter):
    # Set /NeedAppearances so viewers regenerate the field appearances.
    try:
        catalog = writer._root_object
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)
            })

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

# myfile, writer, fields and data are set up elsewhere (not shown in this snippet).
for idx, row in enumerate(data, 1): # In my case, data was pulled from SQLAlchemy
    first_page = myfile.getPage(0) # First page in first pdf
    writer.addPage(first_page)
    set_need_appearances_writer(writer)
    writer.updatePageFormFieldValues(first_page, fields=fields)

    for j in range(0, len(first_page['/Annots'])):
        writer_annot = first_page['/Annots'][j].getObject()
        for field in fields:
            if writer_annot.get('/T') == field:
                # Change the field name so it is unique per document
                writer_annot.update({
                    NameObject("/T"): createStringObject(writer_annot.get('/T') + f'#{idx}')})
                writer_annot.update({
                    NameObject("/Ff"): NumberObject(1)  # make field Read Only
                })

    with open("path", "wb") as new:
        writer.write(new)

This takes a PDF that I use as a template, fills out the form, then saves the PDF. You have to rename the fields because once merged, all the PDFs in the final version would otherwise share the same field names, and we want to avoid that. I then set the /Ff field flag to 1 to make each field read-only (this locks the values rather than truly flattening them). Once you have your PDFs filled out and saved, THEN you can merge them normally. Also, set_need_appearances_writer is necessary to make the field values visible.
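
A note on the read-only step: /Ff is a bit field, and in the PDF specification the ReadOnly flag is bit 1 (value 1). Assigning NumberObject(1) replaces whatever flags the field already had, for example Multiline (bit 13). Below is a minimal sketch in plain Python, with no PyPDF2 calls, of setting the bit with OR instead. The helper name `with_read_only` and the constant names are illustrative only and are not part of PyPDF2.

```python
# Field flag values derived from the PDF specification's 1-based bit positions.
READ_ONLY = 1 << 0   # bit 1
MULTILINE = 1 << 12  # bit 13, applies to text fields


def with_read_only(flags: int) -> int:
    """Return flags with the ReadOnly bit set, leaving all other bits untouched."""
    return flags | READ_ONLY


existing = MULTILINE               # a multiline text field that is still editable
locked = with_read_only(existing)
print(locked)                      # 4097
print(bool(locked & MULTILINE))    # True: the multiline flag is preserved
```

In the loop from the recipe, this would look something like `NumberObject(with_read_only(int(writer_annot.get('/Ff', 0))))`, assuming the flag is stored on the annotation itself rather than inherited from a parent field.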

@paulzuradzki

@Redjumpman - I am the "next poor soul". Thank you very much for sharing.

Before encountering your recipe, I had encountered the set_appearances() snippet and the method of setting the form field flag to 1 for read-only. I still experienced merge errors because the documents shared the same field names. Your post saved a lot of trouble (after much research) on how to update those field names to be unique per document. I am curious how pdftk gets around this. When using the flatten command-line option, the resulting PDF can be merged. Anyway, it is great to know we can do this sort of data-driven form-filling and merging of templates in PyPDF2.
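
To illustrate why shared field names cause trouble after a merge, here is a toy model in plain Python. It is not how PyPDF2 or any viewer is implemented: it only assumes that a merged form is keyed by field name, and the first-value-wins rule is picked to match the Chrome behavior reported in the original post. The names `unique_field_name` and `merged_values` are made up for this sketch.

```python
def unique_field_name(name: str, doc_index: int) -> str:
    """Append a per-document suffix, mirroring the '#{idx}' renaming in the recipe."""
    return f"{name}#{doc_index}"


def merged_values(docs, rename=False):
    """Toy model: collect field values from several documents into one form keyed by name."""
    merged = {}
    for idx, fields in enumerate(docs, 1):
        for name, value in fields.items():
            key = unique_field_name(name, idx) if rename else name
            merged.setdefault(key, value)  # on a name clash, the first value wins
    return merged


docs = [{"customer": "Alice"}, {"customer": "Bob"}]
print(merged_values(docs))               # {'customer': 'Alice'}
print(merged_values(docs, rename=True))  # {'customer#1': 'Alice', 'customer#2': 'Bob'}
```

Without renaming, the second document's value is lost to the clash; with the per-document suffix, both values survive.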

MartinThoma added the PdfMerger label (The PdfMerger component is affected) on Apr 7, 2022
@MartinThoma
Member

Do you have a PDF that shows this issue?

@MartinThoma
Member

This might have the same root cause as #355

MartinThoma changed the title from "Issues with merging fillable pdfs" to "Filled form fields are gone when merging - in some PDF viewer, sometimes" on Apr 23, 2022
MartinThoma added the workflow-forms label (From a user's perspective, forms is the affected feature/workflow) on Apr 23, 2022
MartinThoma added the is-bug label (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) on Jun 26, 2022
@MartinThoma
Member

I'm closing this as a duplicate of #355

@nantaphop-kkp

@Redjumpman I'm a poor soul that you saved today 😭
