preprocess.py extractpdf does not work with current version of pdfminer #370

BikeMikeAU · 2016-02-19T02:02:56Z

The pdfminer API has changed in newer versions, so extractpdf in preprocess.py needs a small change:

Change the imports: process_pdf is gone; import PDFPageInterpreter and PDFPage instead.
Remove the process_pdf call and replace it with new code (3 lines).

https://github.com/euske/pdfminer#api-changes

diff of preprocess.py

@@ -323 +323,2 @@
-    from pdfminer.pdfinterp import PDFResourceManager, process_pdf
+    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
+    from pdfminer.pdfpage import PDFPage
@@ -402 +403,3 @@
-        process_pdf(rsrcmgr, device, pdf_stream, pagenos=set(), password=password, caching=True, check_extractable=True)
+        interpreter = PDFPageInterpreter(rsrcmgr, device)
+        for page in PDFPage.get_pages(pdf_stream, password=password, caching=True, check_extractable=True):
+            interpreter.process_page(page)
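
For anyone who wants to try the new call sequence outside preprocess.py, here is a minimal standalone sketch of the same three lines in context. It assumes Python 3 with the pdfminer.six fork installed; the function name extract_pdf_text and the choice of TextConverter writing to an in-memory buffer are illustrative assumptions, not necessarily what preprocess.py itself does.

```python
import io


def extract_pdf_text(pdf_stream, password=""):
    """Return the text of every page in an open binary PDF stream.

    Hypothetical helper for illustration; not part of preprocess.py.
    """
    # Imported lazily so this module still loads where pdfminer is absent.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager(caching=True)
    out = io.StringIO()
    # Assumption: plain-text output collected in memory.
    device = TextConverter(rsrcmgr, out, laparams=LAParams())
    try:
        # This loop is the replacement for the removed process_pdf() call.
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        for page in PDFPage.get_pages(pdf_stream, password=password,
                                      caching=True, check_extractable=True):
            interpreter.process_page(page)
    finally:
        device.close()
    return out.getvalue()
```

It would be called with a file opened in binary mode, e.g. `with open("invoice.pdf", "rb") as fp: text = extract_pdf_text(fp)` (the file name is made up). The old call passed pagenos=set(); leaving pagenos out of get_pages should likewise process all pages.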


eppye-bots closed this as completed Oct 9, 2020
