PdfFileWriter addBookmark DictionaryObject issue #264

Closed
LightningMan711 opened this issue May 23, 2016 · 7 comments
Labels
- is-bug: From a user's perspective, this is a bug (a violation of the expected behavior with a compliant PDF)
- PdfWriter: The PdfWriter component is affected
- workflow-bookmarks: From a user's perspective, bookmarks is the affected feature/workflow

Comments

@LightningMan711

LightningMan711 commented May 23, 2016

I have a bookmark-level-swapping program that I wrote using PdfFileMerger. I would like to use PdfFileWriter instead, because it lets me target bookmarks to a specific area on the page, which is how the original, unswapped bookmarks are targeted.

I tried using the revision of cloneDocumentFromReader at this link to retain the original tree, which I need for my purposes. That worked fine (no empty pages). However, addBookmark throws an error, even when I give it a dummy bookmark instead of the data I actually want to pass.

The code:

from PyPDF2 import PdfFileReader, PdfFileWriter

# Reading the original file
original = PdfFileReader(open("[redacted].pdf", "rb"))

# Creating a dictionary of page number meanings
# (MapPDFPageNum, unravel and swap are user-defined helpers, not shown in this issue)
decode = MapPDFPageNum(original)

# Getting the bookmarks
stacker = unravel(original.getOutlines(), decode)

# Creating the By Domain set
newBk = swap(stacker, 2)

# I would like to use PdfFileWriter so as to aim destinations, but I cannot
# figure out the syntax.
#--------
new = PdfFileWriter()
# The code I would use to add the bookmarks:
#for bk in newBk:
#    print bk[1]
#    new.addBookmark("test", 0)
#    if len(bk[7]) < 1:
#        print bk[1] + " " + str(bk[4]) + bk[2]
#        new.addBookmark(bk[1], bk[4])
#    else:
#        new.addBookmark(bk[1], bk[4], bk[7])
new.cloneDocumentFromReader(original)
# The test bookmark
new.addBookmark("test", 0)
new.setPageMode("/UseOutlines")
with open("Clone.pdf", "wb") as outputStream:
    new.write(outputStream)

The error I get is this:

Traceback (most recent call last):
  File "C:\Users\[redacted]\BMTester2.py", line 117, in <module>
    new.addBookmark("test",0)
  File "C:\Python27\lib\site-packages\PyPDF2\pdf.py", line 848, in addBookmark
    parent.addChild(bookmarkRef, self)
AttributeError: 'DictionaryObject' object has no attribute 'addChild'

It appears that addBookmark expects parent to be a TreeObject, but here it is a DictionaryObject.

The addBookmark method works fine when there are no existing bookmarks. The problem appears only when an outline tree already exists, and keeping the original bookmarks is the whole point.

Any help would be appreciated.

@rwirth

rwirth commented Aug 16, 2017

The support for cloning and editing seems to be rudimentary at best. You get the error because the PDF file already contains an outline tree, and the reader represents it as a DictionaryObject, not a TreeObject. During reading, one cannot decide whether a dictionary is a tree node, because all of its attributes are optional for the leaves. I've tried to fix that in PdfFileWriter.getOutlineRoot. The fix walks the outline tree and changes the class of each node to TreeObject:

if not isinstance(outline, TreeObject):
    def _walk(node):
        node.__class__ = TreeObject
        for child in node.children():
            _walk(child)
    _walk(outline)
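The toy model below shows why the class swap is enough. DictionaryObject and TreeObject here are hypothetical, heavily simplified stand-ins, not PyPDF2's real classes, and they ignore indirect references entirely. A node read from a file is a plain dictionary without addChild, which is exactly the AttributeError above. Re-classing every node reachable through /First and /Next restores the tree methods without copying any data.

```python
# Hypothetical stand-ins, NOT PyPDF2's classes: only enough to show the idea.

class DictionaryObject(dict):
    """A plain PDF dictionary, as a reader would produce it."""


class TreeObject(DictionaryObject):
    """A dictionary that also knows how to manage outline children."""

    def children(self):
        # Follow the /First -> /Next chain of direct children.
        child = self.get("/First")
        while child is not None:
            yield child
            child = child.get("/Next")

    def addChild(self, child):
        # Append child at the end of the /First ... /Last chain.
        child["/Parent"] = self
        if "/First" not in self:
            self["/First"] = child
        else:
            last = self["/Last"]
            last["/Next"] = child
            child["/Prev"] = last
        self["/Last"] = child


def promote_to_tree(node):
    """Recursively re-class node and all its descendants as TreeObject."""
    node.__class__ = TreeObject
    for child in list(node.children()):
        promote_to_tree(child)


# An outline as the reader hands it back: plain dictionaries only.
chapter = DictionaryObject({"/Title": "Chapter 1"})
root = DictionaryObject({"/First": chapter, "/Last": chapter})

print(hasattr(root, "addChild"))  # False: the AttributeError from the report
promote_to_tree(root)
root.addChild(DictionaryObject({"/Title": "test"}))
print([c["/Title"] for c in root.children()])  # ['Chapter 1', 'test']
```

Assigning `__class__` works here because both classes are plain dict subclasses with the same memory layout. The swap changes behavior, not contents.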

Worse, all indirect references that link the tree nodes together still point to the reader the document was cloned from. The objects themselves are not part of the writer's _objects, and one cannot obtain references to them. That prevents the creation of new tree nodes. However, before writing, all objects are copied and the indirect references are rewritten to point to the new objects. As an ugly workaround, we can therefore call write for its side effects:

new.write(BytesIO())
new.addBookmark("test", 0) # works now
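The side effect this workaround relies on can be shown with another toy model. Ref, Doc and ToyWriter are hypothetical stand-ins, not PyPDF2's IndirectObject or PdfFileWriter. In this model, a reference carries the document its object number belongs to. write() copies every object reachable through a foreign reference into the writer and rewrites the reference. After that, new nodes can be linked to the cloned ones.

```python
# Toy model (hypothetical, heavily simplified) of "copy and re-point on write".

class Ref:
    def __init__(self, idnum, owner):
        self.idnum = idnum  # object number inside its owner
        self.owner = owner  # the document that number belongs to

    def get_object(self):
        return self.owner.objects[self.idnum]


class Doc:
    def __init__(self):
        self.objects = {}

    def add_object(self, obj):
        idnum = len(self.objects) + 1
        self.objects[idnum] = obj
        return Ref(idnum, self)


class ToyWriter(Doc):
    def _adopt(self, value, seen):
        """Return value with every foreign Ref replaced by a Ref into self."""
        if isinstance(value, Ref):
            if value.owner is self:
                return value
            key = (id(value.owner), value.idnum)
            if key not in seen:
                # Reserve a number first so shared or cyclic refs map to one copy.
                seen[key] = self.add_object(None)
                self.objects[seen[key].idnum] = self._adopt(value.get_object(), seen)
            return seen[key]
        if isinstance(value, dict):
            return {k: self._adopt(v, seen) for k, v in value.items()}
        if isinstance(value, list):
            return [self._adopt(v, seen) for v in value]
        return value

    def write(self):
        # The side effect the workaround relies on: re-point every reference.
        seen = {}
        for idnum in list(self.objects):
            self.objects[idnum] = self._adopt(self.objects[idnum], seen)


reader = Doc()
bookmark = reader.add_object({"/Title": "Chapter 1"})
outline = reader.add_object({"/First": bookmark, "/Last": bookmark})

writer = ToyWriter()
root = writer.add_object({"/Outlines": outline})  # "cloned": still points into reader
print(writer.objects[root.idnum]["/Outlines"].owner is reader)  # True
writer.write()
print(writer.objects[root.idnum]["/Outlines"].owner is writer)  # True
```

In the thread's sequence, this is why write(BytesIO()) comes before addBookmark. The real output file still has to be written with a second write call afterwards.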

@LightningMan711
Author

So write to BytesIO, add the bookmark, then write to outputStream, correct?

@LightningMan711
Author

Okay, this almost worked, so let me tell you what I did to make it work. First, I inserted your code into getOutlineRoot, after everything except the return statement:

        # start here
        if not isinstance(outline, TreeObject):
            def _walk(node):
                node.__class__ = TreeObject
                for child in node.children():
                    _walk(child)
            _walk(outline)
        # end here

I then added the new.write(BytesIO()) code to my script (after importing BytesIO from io):

from io import BytesIO

new = PdfFileWriter()
new.cloneDocumentFromReader(original)
new.write(BytesIO())
#new.addBookmark("test",0)

It wrote the test bookmark fine. Then I uncommented the bookmark-swapping code from my original post to create the swapped, nested bookmarks. At that point the parent.addChild call failed, because it was being passed a unicode string and not the bookmark object itself. So, back in pdf.py, I changed addBookmark in two ways. A root-level bookmark is still written as originally. Otherwise, the code drills down the outline until it finds the parent bookmark it is looking for, and then adds the child bookmark there.

The new recursive function (defined inside addBookmark, just below the docstring that explains the parameters):

        # New function to drill recursively for bookmarks: walk the outline
        # and add `tuck` as a child of any node whose values include `daddy`
        def drillDown(dictObj, daddy, tuck):
            for kidObj in dictObj:
                if daddy in kidObj.itervalues():
                    kidObj.addChild(tuck, self)
                else:
                    drillDown(kidObj, daddy, tuck)
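The sketch below shows a different design for the same search, not the code above. It compares only the /Title entry instead of every value. It also returns the first match, so the new child is added exactly once and the caller can decide what to do when nothing is found. To keep the example self-contained, outline nodes are modelled as plain dicts linked through /First and /Next (an assumption). iter_children and find_by_title are hypothetical names.

```python
def iter_children(node):
    """Yield the direct children of an outline node via /First -> /Next."""
    child = node.get("/First")
    while child is not None:
        yield child
        child = child.get("/Next")


def find_by_title(node, title):
    """Depth-first search: return the first descendant titled `title`, else None."""
    for child in iter_children(node):
        if child.get("/Title") == title:
            return child
        found = find_by_title(child, title)
        if found is not None:
            return found
    return None


# A tiny outline: "1 Intro" (with child "1.1 Scope") followed by "2 Methods".
sec = {"/Title": "1.1 Scope"}
ch1 = {"/Title": "1 Intro", "/First": sec, "/Last": sec}
ch2 = {"/Title": "2 Methods"}
ch1["/Next"] = ch2
root = {"/First": ch1, "/Last": ch2}

print(find_by_title(root, "1.1 Scope") is sec)  # True
print(find_by_title(root, "3 Results"))         # None
```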

And the rewritten parent section at the bottom of the addBookmark definition:

        # Added by me
        if parent != outlineRef:
            drillDown(outlineRef, parent, bookmarkRef)
        else:
            parent = parent.getObject()
            parent.addChild(bookmarkRef, self)

        return bookmarkRef

If you have any questions, please ask.
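Another way to avoid the search entirely is sketched below, under some assumptions. addBookmark returns bookmarkRef (see the return statement above) and accepts a parent. A script can therefore remember the reference returned for each title and pass that reference, rather than the title string, when adding a child. build_outline is a hypothetical helper. It assumes titles are unique and that parents appear before their children. Like the bk[7] check in the original post, it treats an empty parent title as top level.

```python
def build_outline(add_bookmark, rows):
    """Add bookmarks from (title, page, parent_title) rows, in order.

    add_bookmark(title, page, parent) must return a handle for the new
    bookmark; parent is None for a top-level entry. An empty or None
    parent_title means top level. Assumes unique titles and that every
    parent row comes before its children (otherwise KeyError).
    """
    refs = {}  # title -> handle returned when that bookmark was added
    for title, page, parent_title in rows:
        parent = refs[parent_title] if parent_title else None
        refs[title] = add_bookmark(title, page, parent)
    return refs


# Usage with a stand-in for writer.addBookmark that just records its calls.
calls = []

def fake_add_bookmark(title, page, parent):
    calls.append((title, page, parent))
    return "ref:" + title

build_outline(fake_add_bookmark, [("Domain A", 0, ""), ("Item 1", 3, "Domain A")])
print(calls)  # [('Domain A', 0, None), ('Item 1', 3, 'ref:Domain A')]
```

With PyPDF2, new.addBookmark could presumably be passed as add_bookmark, since its third positional parameter is the parent. The outline-root and write(BytesIO()) workarounds above would still be needed for a cloned file that already has bookmarks.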

@MartinThoma added the PdfWriter and is-bug labels Apr 7, 2022
@MartinThoma added the workflow-bookmarks label Apr 22, 2022
@MartinThoma
Member

@LightningMan711 Do you know if there is anything left to do for this one? What exactly is the issue?

@MartinThoma
Member

MartinThoma commented Mar 23, 2023

I assume the issue was solved. Please comment if it still exists with recent pypdf versions :-)

@Firestar-Reimu

Firestar-Reimu commented Oct 1, 2023

Still having this issue with pypdf 3.15.5:

https://pastebin.com/AgnYe8zu (line 51)

AttributeError: 'DictionaryObject' object has no attribute 'insert_child'

@stefan6419846
Collaborator

Please open a new issue with example code and a reproducing PDF file as well as the full traceback.
