Applying pdfplumber + opencv + pytesseract to extract content and metadata from formal PDF files.

justmars/start-ocr

start-ocr

  1. Applying pdfplumber + opencv + pytesseract to extract content and metadata from formal PDF files.
  2. pdfplumber's page.extract_text_lines() is experimental, so it may or may not work depending on the PDF file.
  3. See documentation.
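
Because page.extract_text_lines() can come back empty on some files, one way to combine the three libraries is to try the PDF text layer first and fall back to OCR only when it yields nothing. The following is a minimal sketch under that assumption, not start-ocr's own API: `ocr_page`, `page_lines`, and `pdf_lines` are hypothetical helper names, and the 300 dpi resolution and Otsu thresholding are illustrative choices.

```python
from typing import Callable, List


def ocr_page(page, resolution: int = 300) -> List[str]:
    """OCR fallback (hypothetical helper): rasterize, binarize, then read."""
    # Imported lazily so the text-layer path works without the OCR stack.
    import cv2
    import numpy as np
    import pytesseract

    # pdfplumber renders the page; `.original` is the underlying PIL image.
    pil_image = page.to_image(resolution=resolution).original
    gray = cv2.cvtColor(np.array(pil_image.convert("RGB")), cv2.COLOR_RGB2GRAY)
    # Otsu thresholding picks the black/white cutoff automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(binary)
    return [line.strip() for line in text.splitlines() if line.strip()]


def page_lines(page, ocr: Callable = ocr_page) -> List[str]:
    """Prefer the PDF text layer; fall back to OCR if it yields nothing."""
    try:
        lines = [row["text"].strip() for row in page.extract_text_lines()]
    except Exception:
        # The method is experimental, so treat any failure as "no text layer".
        lines = []
    lines = [line for line in lines if line]
    return lines or ocr(page)


def pdf_lines(path: str) -> List[List[str]]:
    """Return the lines of every page in the PDF at `path`."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return [page_lines(page) for page in pdf.pages]
```

Calling `pdf_lines("sample.pdf")` (a placeholder path) would return one list of lines per page. The OCR path also needs the Tesseract binary installed on the system, since pytesseract is only a wrapper around it.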

Installation

just start
