Updating field's default value doesn't update rendered text #84

Open
JanChec opened this issue Apr 4, 2017 · 55 comments

Comments

@JanChec

JanChec commented Apr 4, 2017

I have form fields in my PDF that make it interactive: you can fill them in and print the document with your data. I want to programmatically fill those fields based on their names (template.Root.Pages.Kids[x].Annots[y], with the name in 'T' and the default value in 'V'). The problem is that when I do so, the value is updated in the metadata, but the old value is displayed until I edit the PDF in a desktop editor (there I can see the new default value, and it starts to be displayed as soon as I make any change to the field). I'd love for the rendered text to be updated as well.

Example:

template = pdfrw.PdfReader('template.pdf')
template.Root.Pages.Kids[0].Annots[3].update(pdfrw.PdfDict(V='(test)'))
pdfrw.PdfWriter().write('test.pdf', template)
@pmaupin
Owner

pmaupin commented Apr 5, 2017

Yeah, I'd like to have code for that, too. I haven't really looked at how that works yet.

@praveen049

Hi @pmaupin, if you can give me some introduction on how this could be implemented, I can try to implement it. Thanks.

@Sousaplex

I don't know if there's been any movement on this, but this would be fantastic. I'll see if I can find any relevant information in the spec. FYI this is the one I'm looking at: https://wwwimages2.adobe.com/content/dam/acom/en/devnet/pdf/PDF32000_2008.pdf

@davidmacneil

As a workaround, I was able to have the fields show up by setting the appearance dictionary (AP) to an empty string:

form = pdfrw.PdfReader(fname)
annotations = form.pages[0]['/Annots']
for annotation in annotations:
    # ... validate / update fields here
    annotation.update(pdfrw.PdfDict(AP=''))

The fields are then visible in Preview (Mac OS 10.13.4), but not Acrobat Reader DC. I suspect that Preview detects the invalid appearance dictionary and sets it to a default value.

@tbbooher

I have the same problem. Same experience with preview and the appearance dictionary.

@sevetseh28

Did anyone find a solution for this?

@bartmika

+1

@jancoow

jancoow commented Nov 12, 2018

I still have this problem. I'm trying to populate a few annotate fields. Some readers display the new annotate values correctly, however Adobe Reader leaves them blank.

@Eddiedigits

I have two documents that are copies of each other. One is a blank form (A). In the other (B), I filled in the first field with a number and saved it in Acrobat Reader. When I open B again, the number shows in the field.
If I open both documents in the Python interpreter, I can see that B.Root.Pages.Kids[0].Annots[0].V has the value.
If I copy the value of the first annotation from B to A and write it out with PdfWriter, it is only visible when the field has focus.
If I copy the whole annotation from B to A and write it out with PdfWriter, the value is visible, as we all want.
I have compared the two versions of the annotation, and the only difference I have found is that Annot.AP.N.BBox is a bit different, but copying this over to A doesn't help.
The only thing I haven't carefully compared is Annot.P, because it seems to be just circular references to the page information.
The bottom line is that I don't think pdfrw is the problem. There is something else in the PDF which needs to be programmatically updated to make this work.

@Eddiedigits

If I then open B (with a value added and saved in Acrobat Reader) in the interpreter, change the value of the field, and output the PDF, then when I open it in Acrobat Reader the original value is still shown, but when I click on the field the NEW value is shown.
I can't find the original value in the Python interpreter, but it seems that changing the .V attribute alone is not correct.
Something I don't understand: when I access the value saved in Acrobat from the interpreter, it prints with round brackets.

>>> field = doc.Root.AcroForm.Fields[0]
>>> field.V
'(777)'
>>> field.update(pdfrw.PdfDict(V=pdfrw.PdfString('444')))
>>> field.V
'444'

When I change the value, making sure to use the pdfrw.PdfString object, there are no round brackets. If I try to add the round brackets when creating the value, they are escaped and included in the field.

Does someone who knows more about pdfrw than me know what these brackets mean?

@PeterSlezak

Characters enclosed in parentheses denote a literal string (a type of PDF object).
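
To make the parentheses concrete, here is a minimal sketch, independent of pdfrw, of how a Python string could map to a PDF literal string token and back. The helper names `to_pdf_literal` and `from_pdf_literal` are made up for this illustration, and only the backslash and parenthesis escapes are handled (octal escapes and line continuations are ignored).

```python
import re


def to_pdf_literal(text):
    """Wrap text in parentheses, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def from_pdf_literal(token):
    """Strip the surrounding parentheses and undo the escapes above."""
    if not (token.startswith("(") and token.endswith(")")):
        raise ValueError("not a PDF literal string: %r" % token)
    # A backslash followed by '(', ')' or '\' stands for that character.
    return re.sub(r"\\([()\\])", r"\1", token[1:-1])


print(to_pdf_literal("777"))
print(from_pdf_literal("(777)"))
```

This would explain why the raw value read from the file looks like '(777)' while a freshly created string shows no brackets until it is encoded.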

@Eddiedigits

Thanks Peter. If I use pdfrw.PdfString.encode() then I get the brackets. Unfortunately, this still doesn't make the value visible.
My best guess at the moment is that Acrobat Reader moves or copies the value into the PDF text on defocus. This may be why I can't find the value, as pdfrw doesn't really give access to the PDF text.
I'm going to try to dump the text from document B with another library and see if I can find a way forward.
Unless someone knows how to decode the string of bytes (not byte string) that comes out of the content.stream?

@jancoow

jancoow commented Dec 4, 2018

@Eddiedigits I'm having the exact same issue. PDFs created and filled with pdfrw cannot be opened correctly in Adobe Reader, while other PDF readers show them fine. The fields only appear when they have focus. See my other issue, #158. Even if I just read a PDF file and write it directly to a new file, without editing anything, all the annotate keys are added recursively. So I believe there is something wrong with the writing process of pdfrw.

@DrLou

DrLou commented Dec 4, 2018

@Eddiedigits As am I. Opening the written file in Acrobat, I can only see the written fields - they are there - when focus is placed on them with mouse. Also, this only works for 2 of the 3 fields written. The 3rd, an email address, is apparently not written at all. Weird!
Reader/Form Editor is Acrobat Pro 11.0.3 on macOS.

@PeterSlezak

PeterSlezak commented Dec 5, 2018

You need to modify /V and also the appearance stream (the indirect object referenced by /AP). /V contains the value of the field, and /AP specifies how to present it.

PDF reference 1.7 page 692

The field’s text is held in a text string (or, beginning with PDF 1.5, a stream) in the V (value) entry of the field dictionary. The contents of this text string or stream are used to construct an appearance stream for displaying the field, as described under “Variable Text” on page 677.

See the "Tj" lines in example 8.18: they contain the text that will be displayed by default when you open the PDF document (since the /AP dictionary contains /N, the annotation's normal appearance).

I don't have time right now to investigate whether it is possible to easily update the appearance stream XObject using pdfrw.

@PeterSlezak

I used the example PDF from #132. The code below adds "im field_1 value" to the first text field. Please note that it's just a proof of concept rather than anything else:

from pdfrw import PdfWriter, PdfReader

INVOICE_TEMPLATE_PATH = 'sample-template.pdf'
INVOICE_OUTPUT_PATH = 'sample-output.pdf'

field1value = 'im field_1 value'

template_pdf = PdfReader(INVOICE_TEMPLATE_PATH)
# update the first field; it's assumed to be a text field
template_pdf.Root.AcroForm.Fields[0].V = field1value
# add an appearance stream to display it
template_pdf.Root.AcroForm.Fields[0].AP.N.stream = '''/Tx BMC
BT
 /Helvetica 8.0 Tf
 1.0 5.0 Td
 0 g
 (''' + field1value + ''') Tj
ET EMC'''

PdfWriter().write(INVOICE_OUTPUT_PATH, template_pdf)

See section 5 of the PDF reference manual for more text formatting/painting options.
When I open sample-output.pdf I can see the field 1 text in Foxit Reader, Adobe Acrobat 11, and Chrome. Tested on Windows 10.
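
One caveat with concatenating the value straight into the stream is that a value containing parentheses or backslashes would break the literal string. A minimal sketch of a builder that escapes the value first is shown here; the function name `text_field_stream` and its parameters are hypothetical, and the defaults simply mirror the proof of concept above (font name, size, and text position would need to match the actual form).

```python
def text_field_stream(value, font="/Helvetica", size=8.0, x=1.0, y=5.0):
    """Return appearance-stream content that paints `value` as one line of text."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    lines = [
        "/Tx BMC",                        # begin marked content for a text field
        "BT",                             # begin text object
        " {} {} Tf".format(font, size),   # select font and size
        " {} {} Td".format(x, y),         # move to the text position
        " 0 g",                           # black fill color
        " ({}) Tj".format(escaped),       # show the text
        "ET EMC",                         # end text object and marked content
    ]
    return "\n".join(lines)


print(text_field_stream("im field_1 value"))
```

The returned string could then be assigned to the `stream` attribute in place of the hand-built triple-quoted string.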

@jancoow

jancoow commented Dec 7, 2018

I'm trying to update the appearance stream with your code. However, I get an error:
" AttributeError: 'NoneType' object has no attribute 'N' ". I assume that there is no appearance stream available in my field, so I tried creating it with:

            annotation.AP = pdfrw.PdfDict(N=pdfrw.PdfDict(stream='''/Tx BMC
                    BT
                     /Helvetica 8.0 Tf
                     3.0 5.0 Td
                     0 g
                     (''' + value + ''') Tj
                    ET EMC'''))

However, this results in the fields disappearing in all PDF readers...

@PeterSlezak

PeterSlezak commented Dec 7, 2018

You're correct: the error occurs because no appearance stream is associated with the field, but you've created it the wrong way. You've just assigned a stream to the AP dictionary. What you need to do is assign an indirect XObject to /N in the /AP dictionary, and you need to create the XObject from scratch.
The code should be something like the following, but I haven't tested it, as I don't have any such PDF file with me right now and no time to create one. You can post an example PDF:

from pdfrw import PdfWriter, PdfReader, IndirectPdfDict, PdfName, PdfDict

INVOICE_TEMPLATE_PATH = 'untitled.pdf'
INVOICE_OUTPUT_PATH = 'untitled-output.pdf'

field1value = 'im field_1 value'

template_pdf = PdfReader(INVOICE_TEMPLATE_PATH)
template_pdf.Root.AcroForm.Fields[0].V = field1value

# this depends on page orientation
rct = template_pdf.Root.AcroForm.Fields[0].Rect
height = round(float(rct[3]) - float(rct[1]), 2)
width = round(float(rct[2]) - float(rct[0]), 2)

# create the XObject
xobj = IndirectPdfDict(
            BBox = [0, 0, width, height],
            FormType = 1,
            Resources = PdfDict(ProcSet = [PdfName.PDF, PdfName.Text]),
            Subtype = PdfName.Form,
            Type = PdfName.XObject
            )

#assign a stream to it
xobj.stream = '''/Tx BMC
BT
 /Helvetica 8.0 Tf
 1.0 5.0 Td
 0 g
 (''' + field1value + ''') Tj
ET EMC'''

#put all together
template_pdf.Root.AcroForm.Fields[0].AP = PdfDict(N = xobj)

#output to new file
PdfWriter().write(INVOICE_OUTPUT_PATH, template_pdf)

FYI: /Type, /FormType, and /Resources are optional (/Resources is strongly recommended).
I'm not going to explain the code, but if anything is unclear, just ask or check the PDF Reference (all the info is there :))
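
The BBox computation above can be isolated into a small helper, sketched here without pdfrw. The name `bbox_from_rect` is hypothetical; it assumes /Rect holds four numbers (or numeric strings) describing two diagonally opposite corners, and takes absolute differences in case the corners are not listed lower-left first.

```python
def bbox_from_rect(rect):
    """Turn a widget /Rect into a BBox anchored at the origin."""
    x0, y0, x1, y1 = (float(v) for v in rect)
    width = round(abs(x1 - x0), 2)
    height = round(abs(y1 - y0), 2)
    return [0, 0, width, height]


# Values as they might be read from a PDF (parsed numbers often arrive as strings).
print(bbox_from_rect(["100", "700", "250.5", "720"]))
```

Text placed with `Td` inside the XObject is positioned relative to this box, which is why a `Td` offset that suits one field height can clip the text in a shorter field.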

@Eddiedigits

@PeterSlezak This works for me. I just changed the font to /TiRo because that's what is already used in my PDF and changed the stream to 1.0 1.0 Td because the number was appearing too high in the Form Field and cutting off the top half of the number.
Thank you very much!!!

@jancoow

jancoow commented Jan 9, 2019

@PeterSlezak
Hi. I tried your solution. However, Adobe Acrobat Reader is crashing directly after opening the PDF. Also in any other PDF viewer the values aren't displayed anymore. I've tried exactly your code but it seems not to be working unfortunately.

@PeterSlezak

Hi @jancoow,
Share your code and pdf file if possible. Otherwise I cannot help you.

@Keenpachi

All data for testing is available at this link:
https://bostata.com/post/how_to_populate_fillable_pdfs_with_python/

Which field in the PDF array needs to be changed to get the updated value to appear in the new PDF?

@Efk3

Efk3 commented Jan 29, 2019

Hi @PeterSlezak,
thank you for your script, it works great. I have only one problem: I need to write latin-2 characters into the input. I attached a font into the pdf which supports characters like Ő and Ű and I used this font for render but I don't know how to write the value into the input.

I got this error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 308-311: ordinal not in range(256)

I'm using this:

xobj.stream = '''/Tx BMC
BT
 /LiberationSerif 12.0 Tf
 1.0 5.0 Td
 0 g
 (''' + value + ''') Tj
ET EMC'''

with value = "ÍŐŰ"

@gpontesss

gpontesss commented Jan 30, 2019

Hi @PeterSlezak,
thank you for your script, it works great. I have only one problem: I need to write latin-2 characters into the input. I attached a font into the pdf which supports characters like Ő and Ű and I used this font for render but I don't know how to write the value into the input.

I got this error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 308-311: ordinal not in range(256)

I'm using this:

xobj.stream = '''/Tx BMC
BT
 /LiberationSerif 12.0 Tf
 1.0 5.0 Td
 0 g
 (''' + value + ''') Tj
ET EMC'''

with value = "ÍŐŰ"

I've got the same problem. I think the pdfrw library only handles Latin-1 characters, judging by the message "ordinal not in range(256)". It probably can't write the value as Unicode, even though it's possible by typing it in manually. A solution for now may be to use reportlab. If someone has something better using pdfrw, it would be much appreciated.

I see that you're not using a unicode string either. Try using the following:

xobj.stream = u'''/Tx BMC
BT
 /LiberationSerif 12.0 Tf
 1.0 5.0 Td
 0 g
 ({}) Tj
ET EMC'''.format(value)

@PeterSlezak

All data for testing is available at this link:
https://bostata.com/post/how_to_populate_fillable_pdfs_with_python/

Which field in the PDF array needs to be changed to get the updated value to appear in the new PDF?

@ZarakiiKenpachi
I don't know why it doesn't work on your PDF. I can populate and display the value on a few fields, but not all.

@PeterSlezak

Hi @Efk3,
I've never needed non-ASCII characters, but my suggestion would be to use the \ddd sequence in a literal string, where ddd is the octal character code; or you can try to use a hexadecimal string instead of a literal string.
The original xobj.stream code snippet would change to:

xobj.stream = '''/Tx BMC
BT
 /Helv 8.0 Tf
 1.0 5.0 Td
 0 g
 <696D206669656C645f312076616C7565> Tj
ET EMC'''

It should display "im field_1 value"
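
A minimal sketch of how such a hexadecimal string could be generated rather than typed by hand. The helper name `to_pdf_hex_string` is made up for this example. The `encoding` argument is an assumption: the bytes only render as the intended characters if they match the encoding the selected font actually uses, which this sketch does not check.

```python
def to_pdf_hex_string(text, encoding="latin-1"):
    """Encode text to bytes and format them as a PDF hexadecimal string."""
    return "<" + text.encode(encoding).hex().upper() + ">"


# ASCII text: the same bytes as the hand-written example above.
print(to_pdf_hex_string("im field_1 value"))

# Latin-2 text, assuming the font maps ISO 8859-2 byte values to glyphs.
print(to_pdf_hex_string("ÍŐŰ", encoding="iso8859_2"))
```

Because the result contains only ASCII hex digits, it avoids the 'latin-1' UnicodeEncodeError when the stream text is written out.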

@tlk3

tlk3 commented Feb 14, 2019

Or you could use:

pdf_template = pdfrw.PdfReader(infile)
pdf_template.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
pdfrw.PdfWriter().write(outfile, pdf_template)

This adds the NeedAppearances key/value to the AcroForm dict, if I'm understanding your problem correctly.

  • Updated 07/15/2020 to fix the formatting. I'm glad this has helped so many.

@vincentaudoire

@tlk3 Worked for me, thanks!

@dwasyl

dwasyl commented Mar 8, 2019

@tlk3 That did the trick for me having the same problem with Adobe not showing the fields.

@tonimarie

@tlk3 that totally saved my day. Thank you!

@tbbooher

tbbooher commented May 9, 2019

@tlk3 boom! works great

@RuellePaul

@tlk3 It works for me too, thank you so much !!

@fiapps

fiapps commented May 17, 2019

@tlk3 it works with Adobe Reader, but not with Preview. To get field values to appear in Preview, use the solution above of setting the appearance dictionary for each modified field to an empty string.

@vasmedvedev

vasmedvedev commented May 27, 2019

@tlk3 your solution helps very much, thanks! It also works for PyPDF2 in a similar way. However in my case I still have some fields (date field and checkboxes) that remain empty (not rendered). It seems to be a general PDF problem, not pdfrw one.

@Pikafu

Pikafu commented Jun 13, 2019

@tlk3 It works! Thank you!

@l47y

l47y commented Jul 19, 2019

@tlk3
this also saved my day :-) Thanks a lot

@rau

rau commented Aug 25, 2019

@tlk3 Thank you! Any clue why, in a big dict of items, some of the filled fields show up, while every tenth or so just randomly doesn't appear?

@chdsbd

chdsbd commented Sep 1, 2019

TLK3's solution works with Acrobat and macOS Preview, but it doesn't work with PDFjs. If I open a file created this way with Acrobat and save it from there, it will then show the field values in PDFjs.

@alexgarciaguilera

alexgarciaguilera commented Jan 14, 2020

Putting all your help into a simple script; this works for me on Windows 10:

#!/usr/bin/env python

import pdfrw

def writeFillablePDF(input_pdf_path, output_pdf_path, data_dict):
    # Read Input PDF
    template_pdf = pdfrw.PdfReader(input_pdf_path)

    # Set NeedAppearances (makes the text fields visible)
    template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))

    # Loop all Annotations
    for annotation in template_pdf.pages[0]['/Annots']:
        # Only annotations that are Widgets Text
        if annotation['/Subtype'] == '/Widget' and annotation['/T']: 
            key = annotation['/T'][1:-1] # Remove parentheses
            if key in data_dict.keys():
                annotation.update( pdfrw.PdfDict(V=f'{data_dict[key]}') )
                #print(f'={key}={data_dict[key]}=')
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)

if __name__ == '__main__':

    TEMPLATE_PATH = 'C:/tmp/OrigDoc.pdf'
    OUTPUT_PATH = 'C:/tmp/FilledDoc.pdf'

    # Assuming you know the text field names in the document,
    # build a dictionary of names and values
    data_dict = {
        'CustomerName': 'Big Company Name',
        'PartNumber': 'PN12345',
        'Revision': '333',
    }

    writeFillablePDF(TEMPLATE_PATH, OUTPUT_PATH, data_dict)
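
When some values fail to show up, it can help to know which keys never matched a widget at all. The sketch below isolates the matching loop and returns the unmatched keys. The function name `fill_widgets` is hypothetical, and plain dicts stand in for the annotation objects so that the example runs without pdfrw; using it on real pdfrw annotations assumes they support the same `get` and item-assignment operations, which is not verified here.

```python
def fill_widgets(annotations, data):
    """Set /V on widgets whose /T matches a key in data; return unmatched keys."""
    remaining = set(data)
    for annot in annotations:
        if annot.get("/Subtype") != "/Widget":
            continue  # not a form widget
        raw_name = annot.get("/T")
        if not raw_name:
            continue  # no field name on this annotation
        key = raw_name[1:-1]  # strip the literal-string parentheses
        if key in data:
            annot["/V"] = str(data[key])
            remaining.discard(key)
    return sorted(remaining)


# Stand-in annotations (plain dicts, not pdfrw objects).
annots = [
    {"/Subtype": "/Widget", "/T": "(CustomerName)"},
    {"/Subtype": "/Link"},
    {"/Subtype": "/Widget", "/T": "(Revision)"},
]
missing = fill_widgets(annots, {"CustomerName": "Big Company Name", "PartNumber": "PN12345"})
print(missing)
```

An unmatched key points at a naming mismatch, while a matched field that still renders blank points back at the appearance problem discussed in this thread.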

@pmilano1

Below is something that I threw together quickly. I was able to iterate through and produce individual PDFs just fine, and the fields seemed visible (with slightly different code).

When I added the merge code in order to produce a multi-page PDF containing results of objects in data, it seems to no longer work. Can someone take a quick look to see if I'm handling the merge and setting the appearance workaround properly, based on your experience? It's down low in __main__

Many thanks.

import csv  # needed by build_datadict below

import pdfrw

IN_FILE = "awards.csv"
TEMPLATE_FILE = "template.pdf"
ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'
FIELDS = ["Certificate Category", "Certificate Rank"]
N = 1


# Updates single instance of template pdf, increment form field suffix
def modify_form(input_pdf_path, data_dict):
    global N  # need to get rid of this
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    annotations = template_pdf.pages[0][ANNOT_KEY]
    for annotation in annotations:
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
            if annotation[ANNOT_FIELD_KEY]:
                key = annotation[ANNOT_FIELD_KEY][1:-1]
                if key in data_dict.keys():
                    annotation.update(
                        pdfrw.PdfDict(T="{}".format(key + str(N)))
                    )
                    annotation.update(
                        pdfrw.PdfDict(V="{}".format(data_dict[key]))
                    )
                    annotation.update(pdfrw.PdfDict(Ff=1))
    N += 1
    return template_pdf


def build_datadict(in_file):
    o = []
    with open(in_file) as file:
        reader = csv.DictReader(file, delimiter=',')
        for row in reader:
            m = {}
            for f in FIELDS:
                if row[f] and not row[f].isspace() and not row[f] is None:
                    m[f] = row[f]
            if m:
                m['Date'] = "January 25th, 2020"
                o.append(m)
    return o


if __name__ == '__main__':
    data = build_datadict(IN_FILE)
    writer = pdfrw.PdfWriter()
    writer.trailer.Info = pdfrw.IndirectPdfDict(
        Title='Combined PDF'
    )
    # Iterate array of 'data_dict's
    for d in data:
        this_pages = modify_form(TEMPLATE_FILE, d)  # fill the form
        this_pages.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))  # maintain appearances
        writer.addpages(this_pages.pages)  # merge into single pdf
    writer.write(IN_FILE.split(".")[0] + ".pdf")

@dementedhedgehog

This is the second time I've bounced off pdfrw because of this issue :( The fixes above don't work for me. I've had to go back to pdftk.

@2Nipun

2Nipun commented Feb 19, 2020

Seeing the same issue. The PDF form has the values, but it doesn't display them until I click on each field in a viewer. The moment I click out, the value goes away. I'm viewing the PDF on a Mac in both Preview and Acrobat Reader & Pro. In Pro, the form field still shows as unfilled (i.e. it has the blue color indicator of an unfilled form field).

So I guess I need to look at pdftk or some other solution beyond pdfrw?

@starlabs007

@pmilano1 Yours is a slightly different issue (see here: #171), and it concerns merging PDFs.

For anyone else reading this and finding that setting AcroForm / NeedAppearances doesn't work in Acrobat, verify that you're not merging PDF files. It seems the AcroForm node is lost during the merging process when the concatenated PDF is written out. The issue linked above points to a Stack Overflow answer with working code that addresses this.

@cemoga

cemoga commented Apr 27, 2020

@tlk3 you are the best. It worked for Acrobat

@cemoga

cemoga commented Apr 27, 2020

@davidmacneil Your solution works perfectly for preview in Mac. Thank you!

@sazedulhaque

@tlk3 Thank you buddy

@DimitrisAthanasiadis

I had a rendering problem with my fields and I've been trying for a lot of hours to solve it. I used your help from here and the holy Stack Overflow but the problem remained. I decided to leave the AP as blank (AP='') when it was not present in the file just to see what happens. I also used Foxit Reader to open the file and everything was perfect. Even printed the pages on paper and it was correct. The same with the browser PDF reader. BUT the Adobe Acrobat did not render the text until I clicked the field and when I previewed the pages for printing, the fields were blank. Does anyone know what doesn't work well with Acrobat? Is something special needed to work properly with Adobe?

@cemoga

cemoga commented Aug 4, 2020 via email

@TyrGo

TyrGo commented Jan 20, 2021

I'm having the same problem as others here. Everything appears fine in Preview. But Adobe doesn't display the fields till clicked. The solutions above don't seem to me to fix that. Anyone solved that yet?

@summerswallow-whi

I found this blog: (https://medium.com/@vivsvaan/filling-editable-pdf-in-python-76712c3ce99) and corresponding repo (https://github.com/vivsvaan/filling_editable_pdf_python). It seems to work at least on Reader DC and chrome. I did notice that some fields don't appear filled in on preview, but that could just be me. I only tried the code an hour ago.

@vijeshkpaei

vijeshkpaei commented Aug 31, 2021

Please make sure the input PDF is flattened.

@misokol-earthlink

I have a PDF form filler using pdfrw, and it almost works. I can fill out the form from my custom dictionary to replace blank text fields in the template form, which happens to be IRS F941. But the saved form does not display the saved entries, even though I have used the code to update NeedAppearances suggested by many. My script concludes by reopening the saved file and dumping out the values, which were saved but are invisible, so the substitution code worked. Further, when I open the form with a PDF editor, in addition to not being able to see any of the field values, when I click on any field I get a message saying I cannot make any changes and resave the file, which I was intending to do to handle the check boxes, which I have not yet coded.
