
Inserting Pages from other PDFs

Jorj X. McKie edited this page Nov 19, 2019 · 10 revisions

The recipe below the line is no longer required since v1.16.8: PyMuPDF can now be executed as a module and thus supports execution via the command line. To join an arbitrary number of pages from several PDFs, execute the following command:

python -m fitz join -o output.pdf -password <of output> input1 input2 ...

Specify each input file like so: filename[,password[,pages]].

To join a complete input file, just specify the filename, followed by a comma and the password if one is required. If you only want specific pages, specify them 1-based, either as single integers or as ranges "m-n", separated by commas. To specify the last page, use the symbolic name "N" (capital N). Numbers and ranges can appear in any sequence and may repeat or overlap. If m > n for a range "m-n", the pages are copied in reversed sequence.

For example, to join the files

  1. file1.pdf: all pages, but back to front, no password
  2. file2.pdf: last page, first page, password: "secret"
  3. file3.pdf: pages 5 to last, no password

specify this and forget the rest of this Wiki:

python -m fitz join -o output.pdf file1.pdf,,N-1 file2.pdf,secret,N,1 file3.pdf,,5-N
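
To see how such page specifications expand, here is a minimal sketch in plain Python. The function name expand_page_spec and the page counts used below are assumptions made for illustration. This is not the parser PyMuPDF itself uses, only one reading of the rules described above.

```python
def expand_page_spec(spec, page_count):
    """Expand a 1-based page spec such as "N-1" or "N,1" into page numbers.

    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    pages = []
    for item in spec.replace(" ", "").split(","):
        # "N" is the symbolic name for the last page
        bounds = [page_count if part == "N" else int(part)
                  for part in item.split("-")]
        if len(bounds) == 1:          # single page: treat as range m-m
            bounds = bounds * 2
        if len(bounds) != 2 or not all(1 <= b <= page_count for b in bounds):
            raise ValueError("bad page specification: %r" % item)
        first, last = bounds
        step = 1 if first <= last else -1   # m > n means reversed order
        pages.extend(range(first, last + step, step))
    return pages


# Assumed page counts: 4 pages for the first two files, 8 for the third.
print(expand_page_spec("N-1", 4))   # [4, 3, 2, 1]
print(expand_page_spec("N,1", 4))   # [4, 1]
print(expand_page_spec("5-N", 8))   # [5, 6, 7, 8]
```

The three calls mirror the page parts of the command above: all pages back to front, last page then first page, and page 5 to the last page.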

Method insertPDF()

Method fitz.Document.insertPDF() allows you to insert page ranges from another PDF document. Usage looks like this:

import fitz                    # PyMuPDF

doc1 = fitz.open("file1.pdf")  # must be a PDF
doc2 = fitz.open("file2.pdf")  # must be a PDF
doc1.insertPDF(
    doc2,           # cannot be the same object as doc1
    from_page=n,    # first page to copy, default: 0
    to_page=m,      # last page to copy, default: last page
    start_at=k,     # target location in doc1, default: at end
    rotate=deg,     # rotate copied pages
    links=True,     # also copy links
    annots=True,    # also copy annotations
)

All parameters except doc2 are optional.

Remarks

This makes the functionality of the MuPDF CLI tool mutool merge available in Python. In technical PDF terms, for every page object, /Contents, /Resources, /MediaBox, /CropBox, /BleedBox, /TrimBox, /ArtBox, /Rotate, /UserUnit, /Annots are copied.

Bookmarks / outlines of doc2 are not copied, but the TOC structure of doc1 remains intact after the copy operation.

In PyMuPDF we have extended the copy scope in the following way:

  1. Links are copied if they point to pages in the copy range, or to some outside resource.
  2. Copied pages can optionally be rotated.
  3. doc1 and doc2 must not be the same object, but may be the same file (opened twice under different objects).

Obviously, from_page may equal to_page - then only one page is copied.

Less obvious: if you specify from_page > to_page (!), then the range is copied back to front.
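
The two cases above can be illustrated with a small sketch in plain Python. The helper copied_page_numbers is hypothetical and only models the page order described here, using 0-based page numbers. It does not call PyMuPDF.

```python
def copied_page_numbers(from_page, to_page):
    """Return the 0-based source page numbers in the order they are copied.

    Hypothetical helper that models the documented behavior only.
    """
    step = 1 if from_page <= to_page else -1   # from_page > to_page: reversed
    return list(range(from_page, to_page + step, step))


print(copied_page_numbers(3, 3))   # [3]           (a single page)
print(copied_page_numbers(2, 5))   # [2, 3, 4, 5]  (front to back)
print(copied_page_numbers(5, 2))   # [5, 4, 3, 2]  (back to front)
```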

It is quite easy to create a joined table of contents (TOC) when concatenating complete files - see below. For a more sophisticated solution look at this example. It can join arbitrary ranges of PDF files together with their respective TOC pieces.

Examples

This will concatenate two PDFs, also joining their tables of contents:

len1 = len(doc1)                      # number of doc1 pages
toc1 = doc1.getToC(False)             # full TOC of doc1
toc2 = doc2.getToC(False)             # full TOC of doc2
for bm in toc2:                       # bookmarks of doc2 ...
    bm[2] += len1                     # ... now point behind the doc1 pages
toc = toc1 + toc2                     # concatenate both full TOCs
doc1.insertPDF(doc2)                  # concatenate PDFs
doc1.setToC(toc)                      # new TOC
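
To make the page-number shift visible without opening any files, the sketch below applies the same idea to hand-written TOC lists. The helper join_tocs and the sample entries are invented for illustration. It assumes each entry starts with level, title and a 1-based page number, and it passes any further items through unchanged.

```python
def join_tocs(toc1, toc2, len1):
    """Return toc1 followed by toc2, with toc2 page numbers shifted by len1.

    Hypothetical helper; len1 is the page count of the first document.
    """
    shifted = [
        [level, title, page + len1, *rest]
        for level, title, page, *rest in toc2
    ]
    return toc1 + shifted


# Made-up sample data: the first document has 3 pages.
toc1 = [[1, "Intro", 1], [2, "Details", 2]]
toc2 = [[1, "Appendix", 1], [1, "Index", 2]]

joined = join_tocs(toc1, toc2, len1=3)
print(joined)
# [[1, 'Intro', 1], [2, 'Details', 2], [1, 'Appendix', 4], [1, 'Index', 5]]
```

Unlike the in-place loop above, this variant builds new entries and leaves toc2 unmodified.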

Copy pages 10 to 20 (0-based page numbers) from doc2, rotated, in reversed order, and in front of the doc1 pages:

doc1.insertPDF(
    doc2,
    from_page=20,
    to_page=10,
    start_at=0,
    rotate=-90,
)

This snippet creates a new PDF from the last pages of several input files. Note in particular how the last page is specified:

>>> import fitz
>>> flist = ("1.pdf", "2.pdf", "3.pdf", "4.pdf")
>>> doc = fitz.open()                 # new, empty PDF
>>> for f in flist:
...     infile = fitz.open(f)
...     lastPage = len(infile) - 1    # page numbers are 0-based
...     doc.insertPDF(infile, from_page=lastPage, to_page=lastPage, rotate=90)
...     infile.close()
...
>>> doc.save("out.pdf", deflate=True, garbage=3)