obeattie edited this page Sep 13, 2010 · 4 revisions

Contains objects for interpreting a PDF document.

function process_pdf(rsrc, device, fp, pagenos=None, maxpages=0, password='')

A helper function, abstracting away the need to interact directly with PDFDocument, PDFParser, and PDFPageInterpreter.

rsrc should be a PDFResourceManager instance
device should be an instance of pdfdevice.PDFDevice (or a subclass such as converter.TagExtractor)
fp should be an open file object for the input PDF
pagenos should be an iterable of the zero-indexed page numbers to process
maxpages specifies the maximum number of pages to process (with the default of 0, all pages are processed)
password should be the password string protecting the PDF, if one is required (defaults to an empty string)
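
The effect of pagenos and maxpages can be illustrated with a small stand-alone sketch. This is not pdfminer's code: select_pages is a hypothetical helper written for this page, it assumes that pagenos=None means "no filter", and the way the two options interact when both are given is likewise an assumption.

```python
def select_pages(total_pages, pagenos=None, maxpages=0):
    """Return the zero-indexed page numbers that would be processed.

    Hypothetical helper for illustration only; not part of pdfminer.
    """
    selected = []
    for pageno in range(total_pages):
        # Assumption: pagenos=None means every page is eligible.
        if pagenos is not None and pageno not in pagenos:
            continue
        selected.append(pageno)
        # maxpages=0 means no limit; otherwise stop once the limit is reached.
        if maxpages and len(selected) >= maxpages:
            break
    return selected


all_pages = select_pages(5)                   # no filter, no limit
some_pages = select_pages(5, pagenos={0, 2})  # only the first and third pages
first_three = select_pages(5, maxpages=3)     # stop after three pages
print(all_pages, some_pages, first_three)
```

Note that page numbers are zero-indexed, so pagenos={0, 2} selects the first and third pages of the document.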

class PDFResourceManager

Facilitates reuse of shared resources such as fonts and images, so that large objects are not allocated more than once.
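
The general idea of allocating a shared resource only once can be sketched as a cache keyed by an identifier. The ResourceCache class, its get method, and the load_font function below are hypothetical names used for illustration; they are not PDFResourceManager's actual interface.

```python
class ResourceCache:
    """Hypothetical cache that builds each resource once, keyed by an identifier."""

    def __init__(self):
        self._cache = {}

    def get(self, key, factory):
        # Build the resource only on the first request; reuse it afterwards.
        if key not in self._cache:
            self._cache[key] = factory()
        return self._cache[key]


calls = []


def load_font():
    # Stand-in for an expensive allocation such as parsing a font.
    calls.append("load")
    return {"name": "Helvetica"}


cache = ResourceCache()
first = cache.get("font-1", load_font)
second = cache.get("font-1", load_font)
print(first is second, len(calls))
```

Both lookups return the same object, and the factory runs a single time.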