PdfFileWriter addBookmark DictionaryObject issue #264

LightningMan711 · 2016-05-23T17:18:57Z

I have a bookmark level swapping program that I wrote using PdfFileMerger, but I would like to use PdfFileWriter instead, since it allows me to target bookmarks to a specific area on the page, which is how the original, unswapped bookmarks are targeted.

I tried using the revision of cloneDocumentFromReader at this link to retain the original tree (which I need for my purposes). That worked fine (no empty pages), but addBookmark throws an error even when given only a dummy bookmark rather than all the data I really want to pass.

The code:

from PyPDF2 import PdfFileReader, PdfFileWriter

# Reading the original file
original = PdfFileReader(open("[redacted].pdf", "rb"))

#Creating a dictionary of page number meanings
decode = MapPDFPageNum(original)

#Getting the bookmarks
stacker = unravel(original.getOutlines(),decode)

#Creating the By Domain set
newBk = swap(stacker,2)


#I would like to use PdfFileWriter so as to aim destinations but I cannot
#figure out the syntax.
#--------
new = PdfFileWriter()
#the code I would use to add the bookmarks
#for bk in newBk:
    #print bk[1]
    #new.addBookmark("test",0)
    #if len(bk[7])<1:
        #print bk[1] + " " + str(bk[4]) + bk[2]
        #new.addBookmark(bk[1],bk[4])
    #else:
        #new.addBookmark(bk[1],bk[4],bk[7])
new.cloneDocumentFromReader(original)
#The test bookmark
new.addBookmark("test",0)
new.setPageMode("/UseOutlines")
outputStream = open("Clone.pdf", "wb")
new.write(outputStream)
outputStream.close()

The error I get is this:

Traceback (most recent call last):
  File "C:\Users\[redacted]\BMTester2.py", line 117, in <module>
    new.addBookmark("test",0)
  File "C:\Python27\lib\site-packages\PyPDF2\pdf.py", line 848, in addBookmark
    parent.addChild(bookmarkRef, self)
AttributeError: 'DictionaryObject' object has no attribute 'addChild'

It appears that the addBookmark code is expecting a TreeObject, but parent is a DictionaryObject.

The addBookmark method works fine when there are no existing bookmarks. It's only when there is an existing tree that this is an issue, and, as you recall, that is the point, to retain the original bookmarks.

Any help would be appreciated.

rwirth · 2017-08-16T18:06:20Z

The support for cloning and editing seems to be rudimentary at best. The error you get is because the PDF file contains an Outlines tree, which is represented as a DictionaryObject, not a TreeObject. During reading, one cannot decide whether a Dictionary is a Tree or not because all attributes are optional for the leaves. I've tried to fix that by walking the outlines tree and changing the class of each node to a Tree in PdfFileWriter.getOutlineRoot.

if not isinstance(outline, TreeObject):
    def _walk(node):
        node.__class__ = TreeObject
        for child in node.children():
            _walk(child)
    _walk(outline)
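
To illustrate the class-swapping idea from the snippet above in isolation, here is a minimal sketch that does not use PyPDF2 at all. DictNode, TreeNode, and promote are hypothetical stand-ins, and the /First, /Next, and /Last keys only mimic a linked outline; the sketch assumes the tree class is a subclass of the dictionary class, which is what makes reassigning __class__ possible.

```python
class DictNode(dict):
    """Hypothetical stand-in for a plain dictionary node (not a PyPDF2 class)."""


class TreeNode(DictNode):
    """Hypothetical stand-in for a node that can iterate and add children."""

    def children(self):
        # Walk the linked list of children: /First, then each /Next.
        child = self.get("/First")
        while child is not None:
            yield child
            child = child.get("/Next")

    def add_child(self, child):
        # Append the child at the end of the linked list.
        if "/First" not in self:
            self["/First"] = child
        else:
            self["/Last"]["/Next"] = child
        self["/Last"] = child
        self["/Count"] = self.get("/Count", 0) + 1


def promote(node):
    """Recursively turn a DictNode tree into a TreeNode tree, in place."""
    node.__class__ = TreeNode
    for child in node.children():
        promote(child)
    return node
```

After promote(root), every node reachable through /First and /Next has the tree methods, so calls such as add_child no longer fail with an AttributeError. The data stored in each node is untouched; only the class, and therefore the set of available methods, changes.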

Worse, all indirect references that link the tree nodes together are still pointing to the reader that was cloned from. The objects themselves are not part of the writer's _objects and one cannot obtain references to them, which prevents the creation of new tree nodes. But before writing, all objects are copied and the indirect references rewritten so that they point to the new objects. As an ugly workaround, we can thus call write for its side effects:

from io import BytesIO

new.write(BytesIO())  # throwaway write, called only for its side effects
new.addBookmark("test", 0)  # works now
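
The workaround above, spelled out as a full sequence, might look like the sketch below. The helper name clone_then_bookmark and the file name input.pdf are hypothetical; the method calls are the ones already used in this thread, and the sketch assumes the getOutlineRoot change described above has been applied as well.

```python
from io import BytesIO


def clone_then_bookmark(writer, reader, title, page_index, out_stream):
    """Clone reader into writer, add one bookmark, and write the result.

    writer and reader are expected to behave like PdfFileWriter and
    PdfFileReader; this helper itself is hypothetical.
    """
    writer.cloneDocumentFromReader(reader)
    # Throwaway write: as explained above, writing copies the cloned objects
    # into the writer and rewrites their indirect references.
    writer.write(BytesIO())
    bookmark = writer.addBookmark(title, page_index)
    writer.setPageMode("/UseOutlines")
    # The real write, now including the new bookmark.
    writer.write(out_stream)
    return bookmark


# Possible usage with PyPDF2 1.x (not executed here; input.pdf is a placeholder):
#
# from PyPDF2 import PdfFileReader, PdfFileWriter
# with open("input.pdf", "rb") as src, open("Clone.pdf", "wb") as dst:
#     clone_then_bookmark(PdfFileWriter(), PdfFileReader(src), "test", 0, dst)
```

Keeping the source file open until the final write matters in the usage shown, because the reader is passed an open stream and may still read from it while the writer copies objects.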

LightningMan711 · 2017-08-29T20:02:43Z

So write to BytesIO, add the bookmark, then write to outputStream, correct?

LightningMan711 · 2017-08-30T05:17:09Z

Okay, this almost worked, so let me tell you what I did to make it work. First, I put your code into getOutlineRoot, below everything else in that method except the return:

        # start here
        if not isinstance(outline, TreeObject):
            def _walk(node):
                node.__class__ = TreeObject
                for child in node.children():
                    _walk(child)
            _walk(outline)
        # end here

I then added the new.write(BytesIO()) code to my script (after importing BytesIO from io):

new = PdfFileWriter()
new.cloneDocumentFromReader(original)
new.write(BytesIO())
#new.addBookmark("test",0)

It wrote the test bookmark fine. But when I un-commented the bookmark swapping code from the original post to create the swapped, nested bookmarks, the parent.addChild call failed, since parent was being passed as a unicode string and not as the bookmark object itself. So back in pdf.py I altered the addBookmark code in two ways. A root-level bookmark is still written as originally coded; for anything else, the code drills down the outline until it finds the parent bookmark it is looking for and then adds the child bookmark there.

The new recursive function (defined inside the addBookmark definition, just below the parameter explanation):

        # New function to drill recursively for bookmarks
        def drillDown(dictObj, daddy, tuck):
            huntObj = dictObj
            for kidObj in huntObj:
                if daddy in kidObj.itervalues():
                    kidObj.addChild(tuck, self)
                else:
                    drillDown(kidObj, daddy, tuck)

And the rewritten parent section at the bottom of the addBookmark definition:

        # Added by me
        if parent != outlineRef:
            drillDown(outlineRef, parent, bookmarkRef)
        else:
            parent = parent.getObject()
            parent.addChild(bookmarkRef, self)

        return bookmarkRef
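
The drill-down idea can also be sketched independently of PyPDF2. In the sketch below, OutlineNode and add_under are hypothetical names, the tree is a plain Python structure rather than a PDF outline, and, unlike the patch above, the search stops at the first node whose title matches and reports whether anything was attached.

```python
class OutlineNode:
    """Hypothetical stand-in for one bookmark in an outline tree."""

    def __init__(self, title):
        self.title = title
        self.children = []


def add_under(root, parent_title, new_node):
    """Depth-first search below root for a node titled parent_title.

    Attaches new_node to the first match and returns True; returns False
    when no node with that title exists below root.
    """
    for child in root.children:
        if child.title == parent_title:
            child.children.append(new_node)
            return True
        if add_under(child, parent_title, new_node):
            return True
    return False
```

Returning a boolean makes a missing parent visible to the caller, which could then fall back to adding the bookmark at the root level or raise an error. Matching by title alone is ambiguous when two bookmarks share a title, so a real implementation would probably need a more specific key.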

If you have any questions, please ask.

MartinThoma · 2022-07-09T14:31:25Z

@LightningMan711 Do you know if there is anything left to do for this one? What exactly is the issue?

MartinThoma · 2023-03-23T05:46:34Z

I assume the issue was solved. Please comment if it still exists with recent pypdf versions :-)

Firestar-Reimu · 2023-10-01T15:46:52Z

Still have this issue with pypdf 3.15.5

https://pastebin.com/AgnYe8zu Line 51

AttributeError: 'DictionaryObject' object has no attribute 'insert_child'

stefan6419846 · 2023-10-01T15:57:28Z

Please open a new issue with example code and a reproducing PDF file as well as the full traceback.

LightningMan711 mentioned this issue Mar 9, 2018

Rebooting PyPDF2 Maintenance #385

Closed

MartinThoma added the PdfWriter (The PdfWriter component is affected) and is-bug (From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF) labels Apr 7, 2022

MartinThoma added the workflow-bookmarks (From a users perspective, bookmarks is the affected feature/workflow) label Apr 22, 2022

MartinThoma closed this as completed Mar 23, 2023
