start-ocr

Applies pdfplumber, opencv, and pytesseract to extract content and metadata from formal PDF files.

Note: pdfplumber's page.extract_text_lines() is experimental, so it may or may not work depending on the PDF file.

See documentation.

Installation

just start
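Because page.extract_text_lines() is experimental, one defensive way to use it is to fall back to page.extract_text() when it raises or returns nothing. The sketch below is illustrative only and is not taken from start-ocr: the helper names page_lines and pdf_lines are hypothetical, and it assumes a pdfplumber version that provides extract_text_lines() returning dictionaries with a "text" key.

```python
from __future__ import annotations

from typing import Any


def page_lines(page: Any) -> list[str]:
    """Return the text lines of one pdfplumber page (hypothetical helper).

    Tries the experimental extract_text_lines() first, then falls back to
    splitting the output of extract_text() when that fails or is empty.
    """
    try:
        # Each item is assumed to be a dict with a "text" key.
        lines = [line["text"] for line in page.extract_text_lines()]
    except Exception:
        # The experimental call can fail on some PDFs; treat it as "no lines".
        lines = []
    if lines:
        return lines
    # extract_text() may return None or an empty string for image-only pages.
    text = page.extract_text() or ""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def pdf_lines(path: str) -> list[list[str]]:
    """Collect text lines per page for the PDF at `path` (hypothetical helper)."""
    import pdfplumber  # imported lazily so page_lines works without it

    with pdfplumber.open(path) as pdf:
        return [page_lines(page) for page in pdf.pages]
```

A page for which both calls return nothing is likely a scanned image, which is where an OCR pass (for example with pytesseract) would come in; that step is not shown here.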