Python code is filling out PDF locally but not on AWS Lambda #1948

Closed
momo-ai opened this issue Jul 6, 2023 · 1 comment

Comments

momo-ai commented Jul 6, 2023

I am developing a script to automatically fill out some paperwork for my day job. I'm trying to scale it up and share it with my coworkers by running it as an AWS Lambda function backed by S3 storage. The issue is that although the code fills out the PDF correctly when run locally, it fills out only one field on AWS.

Code + PDF

Local code, which works

from pypdf import PdfReader, PdfWriter

def write_file(state, pdf, input_dict):
    # start from a copy of the state's template (FORM_TEMPLATE is defined
    # elsewhere) and overwrite the keys supplied in input_dict
    temp = dict(FORM_TEMPLATE[state])
    for key in temp:
        if key in input_dict:
            temp[key] = input_dict[key]

    reader = PdfReader(pdf)
    writer = PdfWriter()

    # copy every page of the source PDF into the writer
    writer.append(reader)
    for page_number in range(len(reader.pages)):
        writer.update_page_form_field_values(
            writer.pages[page_number], temp
        )

    # write the filled-in form to filled-out.pdf
    with open("filled-out.pdf", "wb") as output_stream:
        writer.write(output_stream)

AWS code, which does not seem to work

def write_file(state, input_dict):
    # load the blank form for this state from Lambda's /tmp directory
    state_no_spaces = state.replace(" ", "")
    file_path = '/tmp/' + state_no_spaces + '.pdf'
    reader = PdfReader(file_path)
    writer = PdfWriter()

    # keep only the inputs that match a text field in the form
    fields = reader.get_form_text_fields()
    temp = {}
    for field in fields:
        if field in input_dict:
            temp[field] = input_dict[field]
    print(fields)
    print(temp)

    for pagenum in range(len(reader.pages)):
        page = reader.pages[pagenum]
        writer.add_page(page)
        try:
            writer.update_page_form_field_values(writer.pages[pagenum], temp)
        except KeyError as e:
            # raised as KeyError: '/DA' for some fields (see traceback below)
            print(f"Error in updating form field: {e}")

    filled_out_file_path = "/tmp/filled-out-" + state_no_spaces + ".pdf"
    with open(filled_out_file_path, "wb") as output_stream:
        writer.write(output_stream)
    return filled_out_file_path

I wrap the call to update_page_form_field_values in a try-except block because otherwise I run into the following error:

{
  "errorMessage": "'/DA'",
  "errorType": "KeyError",
  "requestId": "fa672967-1642-4805-8754-19520721d2b7",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 45, in lambda_handler\n    body += \"lien is located in \" + data_json[\"state\"] + \": \" + process_state(data_json[\"state\"])\n",
    "  File \"/var/task/lambda_function.py\", line 22, in process_state\n    return action()\n",
    "  File \"/var/task/lambda_function.py\", line 13, in <lambda>\n    'NM': lambda: fill_pdf(\"New Mexico\"), # TODO: add parameters to this call\n",
    "  File \"/var/task/fill_pdf.py\", line 105, in fill_pdf\n    write_file(state, pdf, data)\n",
    "  File \"/var/task/fill_pdf.py\", line 84, in write_file\n    writer.update_page_form_field_values(\n",
    "  File \"/opt/python/pypdf/_writer.py\", line 973, in update_page_form_field_values\n    self._update_text_field(writer_annot)\n",
    "  File \"/opt/python/pypdf/_writer.py\", line 840, in _update_text_field\n    cast(str, field[AA.DA]).replace(\"\\n\", \" \").replace(\"\\r\", \" \").split(\" \")\n",
    "  File \"/opt/python/pypdf/generic/_data_structures.py\", line 309, in __getitem__\n    return dict.__getitem__(self, key).get_object()\n"
  ]
}

With the try-except block in place, my console prints "Error in updating form field: '/DA'" and only one field of the form is filled out. When I print the fields JSON, there is no '/DA' entry, even for the one field that is filled out successfully. I'm not entirely sure what the bug is, but how can the code work locally and not as a Lambda?
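One way to narrow down which fields trigger the KeyError is to inspect the widget annotations on each page directly. The sketch below is a minimal diagnostic, assuming each form field is a single widget annotation that carries its own /T (name) entry. `widgets_missing_da` and `_resolve` are hypothetical helpers, not part of pypdf, and the check ignores any /DA inherited from a /Parent field or from the form-wide /AcroForm dictionary.

```python
from typing import Any, Iterable, List, Mapping


def _resolve(obj: Any) -> Any:
    """Return the underlying object for pypdf indirect references.

    Plain dicts and lists (handy for quick experiments) pass through unchanged.
    """
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def widgets_missing_da(pages: Iterable[Mapping[str, Any]]) -> List[str]:
    """List the names (/T) of widget annotations that have no /DA entry."""
    missing = []
    for page in pages:
        annots = _resolve(page.get("/Annots")) or []
        for ref in annots:
            annot = _resolve(ref)
            if annot.get("/Subtype") != "/Widget":
                continue  # links, comments, etc. are not form fields
            if "/DA" not in annot:
                missing.append(str(annot.get("/T", "<unnamed>")))
    return missing


# Usage with pypdf (the path is only an example):
#   from pypdf import PdfReader
#   print(widgets_missing_da(PdfReader("/tmp/NewMexico.pdf").pages))
```

Running this against the same PDF locally and inside the Lambda would show whether both environments see the same set of fields without a /DA entry.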

momo-ai commented Jul 6, 2023

Resolved: it turns out that if a field's font size is set to "auto", pypdf will not recognize the field.
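For context, a text field's font and size are normally stored in its default appearance string (/DA), for example `/Helv 12 Tf 0 g`, and the PDF specification treats a size of 0 as "auto". The helper below is a small sketch for spotting such fields. `da_font_size` and `is_auto_sized` are hypothetical names, and whether the affected fields in this form had a zero-size /DA or no /DA at all is not something this thread shows.

```python
from typing import Optional


def da_font_size(da: str) -> Optional[float]:
    """Return the font size from a /DA string, or None if it sets no font.

    A /DA string looks like "/Helv 12 Tf 0 g": the Tf operator is preceded
    by the font name and the font size. A size of 0 means "auto".
    """
    tokens = da.split()
    for i, token in enumerate(tokens):
        if token == "Tf" and i >= 2:
            try:
                return float(tokens[i - 1])
            except ValueError:
                return None  # malformed size operand
    return None


def is_auto_sized(da: str) -> bool:
    """True when the /DA string requests automatic font sizing (size 0)."""
    return da_font_size(da) == 0
```

Giving the affected fields an explicit font size in a PDF editor, as described above, would make `is_auto_sized` return False for them.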

momo-ai closed this as completed Jul 6, 2023