
How to Maintain PDF Links

Jorj X. McKie edited this page Apr 3, 2017 · 3 revisions

This page shows the new Page methods that maintain the set of links on a PDF page.

Background

Links point to other places, for example:

  1. other locations in the same document
  2. locations in another PDF file
  3. other general files (opened or executed when the hot area is clicked)
  4. internet addresses (opened when the hot area is clicked)
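The four destination categories above correspond to the `kind` values that appear in the link dictionaries of the example session (1 for a goto, 2 for an internet address, 3 for a launch). The sketch below classifies a link dictionary into those categories. It assumes that kind 5 denotes a location in another PDF file, mirroring PyMuPDF's `LINK_GOTOR` constant; the function name `describe_link` is hypothetical.

```python
# Kind values 1, 2 and 3 as seen in the example session; 5 is an assumption
# mirroring PyMuPDF's LINK_GOTOR constant (location in another PDF file).
LINK_GOTO = 1
LINK_URI = 2
LINK_LAUNCH = 3
LINK_GOTOR = 5


def describe_link(link):
    """Return a short description of where a link dictionary points (hypothetical helper)."""
    kind = link.get("kind")
    if kind == LINK_GOTO:
        return "same document, page %d" % link["page"]
    if kind == LINK_GOTOR:
        return "%s, page %d" % (link["file"], link["page"])
    if kind == LINK_LAUNCH:
        return "file %s (opened on click)" % link["file"]
    if kind == LINK_URI:
        return "internet address %s" % link["uri"]
    return "unsupported link kind %r" % (kind,)


print(describe_link({"kind": 2, "uri": "https://en.wikipedia.org/wiki/MuPDF"}))
# prints: internet address https://en.wikipedia.org/wiki/MuPDF
```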

The links on a page can usually be recognized by a change of the cursor from an arrow to a pointing hand. The spot where this happens is often called the "hot area": a rectangle surrounding, for example, some text or an image.
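Because a hot area is just a rectangle, finding the link under a given point is a simple containment test. The sketch below assumes each link's `from` entry is a plain `(x0, y0, x1, y1)` tuple rather than a `fitz.Rect`, so it runs without PyMuPDF; the helper name `link_at` is hypothetical. The coordinates are those of the first two links in the example session.

```python
def link_at(links, x, y):
    """Return the first link whose hot area contains point (x, y), else None.

    Each link is a dict whose "from" entry is an (x0, y0, x1, y1) tuple
    (an assumption: PyMuPDF itself stores a fitz.Rect there).
    """
    for link in links:
        x0, y0, x1, y1 = link["from"]
        if x0 <= x <= x1 and y0 <= y <= y1:
            return link
    return None


links = [
    {"kind": 2, "from": (249.714, 142.312, 295.942, 154.063),
     "uri": "https://github.com/rk700/PyMuPDF"},
    {"kind": 2, "from": (255.626, 257.481, 301.854, 269.231),
     "uri": "https://github.com/JorjMcKie/PyMuPDF-optional-material"},
]

hit = link_at(links, 270, 150)
print(hit["uri"] if hit else "no link here")  # prints https://github.com/rk700/PyMuPDF
```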

Implementation in PyMuPDF / MuPDF

As always when dealing with document changes, these methods are restricted to PDF files.

MuPDF handles link annotations differently from other annotation types. Because of the technical implications, we decided to let PyMuPDF handle links differently as well. The implementation works as follows:

  • there are three methods dealing with links: insertLink(), deleteLink() and updateLink().
  • each method accepts one parameter, which is a dictionary of values describing the link destination.
  • the dictionary has the same format as the entries of getLinks() and as the fourth component of the entries of getToC(simple = False). It contains almost the same information as the linkDest object, only in a more readable and disambiguated format.
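Judging from the dictionaries printed in the example session, every link carries `kind` and `from`, plus destination keys that depend on the kind: `page` and `to` for a goto, `uri` for an internet address, `file` for a launch. The sketch below checks a dictionary for those keys before it is handed to insertLink() or updateLink(). The required-key table is an assumption inferred from that output (the entry for kind 5, a location in another PDF, is a further assumption), not PyMuPDF's own validation, and `missing_keys` is a hypothetical name.

```python
# Required keys per link kind, inferred from the example session output.
# Kind 5 (location in another PDF file) is an assumption.
REQUIRED_KEYS = {
    1: {"page", "to"},          # goto: location in the same document
    2: {"uri"},                 # uri: internet address
    3: {"file"},                # launch: open a general file
    5: {"file", "page", "to"},  # goto remote: location in another PDF
}


def missing_keys(link):
    """Return the sorted list of keys this link dict lacks for its kind."""
    needed = {"kind", "from"} | REQUIRED_KEYS.get(link.get("kind"), set())
    return sorted(needed - set(link))


new_link = {"kind": 1, "from": (100, 100, 200, 120), "page": 1}
print(missing_keys(new_link))  # prints ['to']
```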

Example Session

We read a PDF page and all of its links. We then delete the last link, change the first link, and finally insert a new link. For a fully functional GUI supporting this, have a look at this example; it presents an interface like this.

>>> import fitz
>>> doc = fitz.open("pymupdf.pdf")
>>> page = doc[6]                   # this page contains 4 links
>>> lnks = page.getLinks()
>>> for l in lnks:
...     print(l)
...
{'kind': 2, 'xref': 864, 'from': fitz.Rect(249.714, 142.312,
295.942, 154.063), 'type': 'uri', 'uri': 'https://github.com/rk700/PyMuPDF'}
{'kind': 2, 'xref': 1090, 'from': fitz.Rect(255.626, 257.481,
301.854, 269.231), 'type': 'uri',
'uri': 'https://github.com/JorjMcKie/PyMuPDF-optional-material'}
{'kind': 2, 'xref': 978, 'from': fitz.Rect(183.562, 325.227,
206.773, 336.977), 'type': 'uri',
'uri': 'https://pypi.python.org/pypi?:action=display&name=PyMuPDF&version=1.10.0'}
{'kind': 2, 'xref': 1059, 'from': fitz.Rect(383.579, 526.211,
430.582, 537.961), 'type': 'uri', 'uri': 'https://en.wikipedia.org/wiki/MuPDF'}
>>> 
>>> #-------------------------------------------------------------------------------------------------
>>> # delete last link on page
>>> #-------------------------------------------------------------------------------------------------
>>> l = lnks[-1]
>>> page.deleteLink(l)
>>> # retrieve all links again to show it is gone
>>> for l in page.getLinks():
...     print(l)
...
{'kind': 2, 'xref': 864, 'from': fitz.Rect(249.714, 142.312,
295.942, 154.063), 'type': 'uri', 'uri': 'https://github.com/rk700/PyMuPDF'}
{'kind': 2, 'xref': 1090, 'from': fitz.Rect(255.626, 257.481,
301.854, 269.231), 'type': 'uri',
'uri': 'https://github.com/JorjMcKie/PyMuPDF-optional-material'}
{'kind': 2, 'xref': 978, 'from': fitz.Rect(183.562, 325.227,
206.773, 336.977), 'type': 'uri',
'uri': 'https://pypi.python.org/pypi?:action=display&name=PyMuPDF&version=1.10.0'}
>>>
>>> #-------------------------------------------------------------------------------------------------
>>> # now change the first link to point to a location on page 1 of the same file
>>> #-------------------------------------------------------------------------------------------------
>>> l = lnks[0]
>>> l["kind"] = fitz.LINK_GOTO
>>> l["page"] = 1
>>> l["to"] = fitz.Point(100, 200)
>>> page.updateLink(l)
>>> # again demonstrate what happened
>>> for l in page.getLinks():
...     print(l)
...
{'kind': 1, 'xref': 864, 'from': fitz.Rect(249.714, 142.312,
295.942, 154.063), 'type': 'goto', 'page': 1, 'to': fitz.Point(100.0, 200.0), 'zoom': 0.0}
{'kind': 2, 'xref': 1090, 'from': fitz.Rect(255.626, 257.481,
301.854, 269.231), 'type': 'uri',
'uri': 'https://github.com/JorjMcKie/PyMuPDF-optional-material'}
{'kind': 2, 'xref': 978, 'from': fitz.Rect(183.562, 325.227,
206.773, 336.977), 'type': 'uri', 
'uri': 'https://pypi.python.org/pypi?:action=display&name=PyMuPDF&version=1.10.0'}
>>>
>>> #-------------------------------------------------------------------------------------------------
>>> # now reuse the deleted link's hot area for a link that opens another file
>>> #-------------------------------------------------------------------------------------------------
>>> l = lnks[3]
>>> l["kind"] = fitz.LINK_LAUNCH
>>> l["file"] = "some.file"
>>> page.insertLink(l)
>>> for l in page.getLinks():
...     print(l)
...
{'kind': 1, 'xref': 864, 'from': fitz.Rect(249.714, 142.312,
295.942, 154.063), 'type': 'goto', 'page': 1, 'to': fitz.Point(100.0, 200.0), 'zoom': 0.0}
{'kind': 2, 'xref': 1090, 'from': fitz.Rect(255.626, 257.481,
301.854, 269.231), 'type': 'uri',
'uri': 'https://github.com/JorjMcKie/PyMuPDF-optional-material'}
{'kind': 2, 'xref': 978, 'from': fitz.Rect(183.562, 325.227,
206.773, 336.977), 'type': 'uri',
'uri': 'https://pypi.python.org/pypi?:action=display&name=PyMuPDF&version=1.10.0'}
{'kind': 3, 'xref': 1251, 'from': fitz.Rect(383.579, 526.211,
430.582, 537.961), 'type': 'launch', 'file': 'some.file'}
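The session reuses the dictionaries in `lnks` and edits them in place, so the dictionary passed to insertLink() still carries the old `type`, `uri` and `xref` entries (the inserted link nevertheless received a new xref, 1251). A cleaner pattern is to build a fresh dictionary that keeps only the hot area. The sketch below does this in plain Python; `retarget` is a hypothetical helper, and treating everything except `from` as disposable is an assumption based on the printed dictionaries.

```python
def retarget(link, kind, **destination):
    """Return a new link dict with the same hot area but a new destination.

    Only "from" is carried over; "xref", "type" and any old destination keys
    (uri, file, page, to, zoom) are dropped. The original dict is not modified.
    """
    new_link = {"kind": kind, "from": link["from"]}
    new_link.update(destination)
    return new_link


old = {"kind": 2, "xref": 1059, "from": (383.579, 526.211, 430.582, 537.961),
       "type": "uri", "uri": "https://en.wikipedia.org/wiki/MuPDF"}
new = retarget(old, 3, file="some.file")  # 3 = fitz.LINK_LAUNCH in the session
print(new)
# prints {'kind': 3, 'from': (383.579, 526.211, 430.582, 537.961), 'file': 'some.file'}
```

Applied to a dictionary returned by getLinks(), the `from` entry would remain the original `fitz.Rect`, so the result should be usable with insertLink() in the same way as in the session.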