jasonlimcp/document_ocr

OCR on documents using OpenCV and PyTesseract

Last Update: Mar 2021

About

This project simulates a common use case in modern organizations: digitizing large volumes of documents, specifically ID documents such as PDF scans of passports or driver's licenses. For images of reasonable quality and resolution, Tesseract's OCR engine can parse segments of the document into tabular output.
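As a rough illustration of the idea above, the following sketch binarizes an image with OpenCV, runs it through PyTesseract, and collects the recognized words as table rows. It is not the code from this repo. The function names `rows_from_ocr_data` and `ocr_image_to_rows`, the `min_conf` cutoff of 60, and the Otsu thresholding step are assumptions chosen for illustration, and the input is assumed to be an image file that OpenCV can read (a PDF would need converting to an image first).

```python
from typing import Dict, List


def rows_from_ocr_data(data: Dict[str, list], min_conf: float = 0.0) -> List[dict]:
    """Convert the dict returned by pytesseract.image_to_data into table rows.

    Hypothetical helper: keeps only non-empty words whose confidence is at
    least `min_conf`.
    """
    rows = []
    for text, conf, left, top, width, height in zip(
        data["text"], data["conf"], data["left"],
        data["top"], data["width"], data["height"],
    ):
        text = str(text).strip()
        conf = float(conf)  # some pytesseract versions return strings here
        # Tesseract reports a confidence of -1 for layout boxes without a word.
        if text and conf >= min_conf:
            rows.append({
                "text": text,
                "conf": conf,
                "left": int(left),
                "top": int(top),
                "width": int(width),
                "height": int(height),
            })
    return rows


def ocr_image_to_rows(image_path: str, min_conf: float = 60.0) -> List[dict]:
    """Read an image, binarize it, and return the OCR result as table rows."""
    # Imported lazily so rows_from_ocr_data works without OpenCV or Tesseract.
    import cv2
    import pytesseract
    from pytesseract import Output

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the binarization threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    data = pytesseract.image_to_data(binary, output_type=Output.DICT)
    return rows_from_ocr_data(data, min_conf)
```

Calling `ocr_image_to_rows("passport.jpg")` (a hypothetical file name) would return a list of dictionaries, one per recognized word, which can be written to CSV or loaded into a data frame. Cropping the image to a known region before the OCR call is a common way to target a single field.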

The example in this repo, which processes a passport image, is described in more detail in this Medium article.

Package Requirements

pytesseract
opencv-python

PyTesseract is a wrapper around the Tesseract-OCR engine, which must be installed on the host system or server for the package to work. Documentation and downloads for the Tesseract-OCR project can be found here.
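Because the engine is a separate install, it can help to check that it is reachable before running any OCR. The sketch below is one possible way to do that and is not part of this repo. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical, and the sketch assumes the executable is named `tesseract` and is either on the `PATH` or given explicitly.

```python
import shutil
from typing import Optional


def find_tesseract(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the path of the tesseract executable, or None if it is not found.

    Hypothetical helper: `explicit_path` may be a full path to the binary;
    otherwise the system PATH is searched for "tesseract".
    """
    return shutil.which(explicit_path or "tesseract")


def configure_pytesseract(explicit_path: Optional[str] = None) -> str:
    """Point pytesseract at the engine and return the engine version string."""
    # Imported lazily so find_tesseract can be used without pytesseract.
    import pytesseract

    path = find_tesseract(explicit_path)
    if path is None:
        raise RuntimeError(
            "Tesseract-OCR was not found. Install it and make sure the "
            "executable is on PATH, or pass its full path."
        )
    # pytesseract reads the executable location from this module attribute.
    pytesseract.pytesseract.tesseract_cmd = path
    return str(pytesseract.get_tesseract_version())
```

On systems where the installer does not add Tesseract to the `PATH` (often the case on Windows), passing the full path to the executable to `configure_pytesseract` is the usual workaround.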
