OCR on documents using OpenCV and PyTesseract

Last Update: Mar 2021

About

This project simulates a common use case in modern organizations: digitizing large volumes of documents, specifically ID documents such as PDFs of passports or driver's licenses. Given images of reasonable quality and resolution, Tesseract's OCR engine can parse segments of the document into a tabular output.
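Here is a minimal sketch of that segment-to-table idea. It is not the repository's actual `ocr.py`. It assumes each document type has a fixed layout, so fields can be cropped at known positions. `FIELD_REGIONS`, its coordinates, and the helper names are all hypothetical. In the sketch:

- OpenCV loads and binarizes the image with grayscale conversion plus Otsu thresholding, a common but optional preprocessing step.
- `pytesseract.image_to_string` reads each crop.
- Python's `csv` module writes one row per document.

```python
"""Minimal sketch: OCR fixed regions of ID-document images into a CSV table.

Illustrative only, not the repo's ocr.py. FIELD_REGIONS is a hypothetical
layout and would need to be measured for a real document template.
"""
import csv
import io
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) as fractions

# Hypothetical passport-like layout; real coordinates depend on the template.
FIELD_REGIONS: Dict[str, Box] = {
    "surname": (0.30, 0.20, 0.90, 0.28),
    "given_names": (0.30, 0.30, 0.90, 0.38),
    "date_of_birth": (0.30, 0.50, 0.60, 0.58),
}


def region_to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a fractional box to integer pixel bounds (x0, y0, x1, y1)."""
    left, top, right, bottom = box
    return (int(left * width), int(top * height),
            int(right * width), int(bottom * height))


def ocr_document(image_path: str,
                 regions: Dict[str, Box] = FIELD_REGIONS,
                 ocr: Optional[Callable] = None) -> Dict[str, str]:
    """OCR each region of one document image and return {field: text}."""
    import cv2  # opencv-python
    if ocr is None:
        import pytesseract
        ocr = pytesseract.image_to_string  # accepts NumPy arrays
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # Grayscale + Otsu binarization often helps Tesseract on scanned documents.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    height, width = binary.shape[:2]
    row = {}
    for field, box in regions.items():
        x0, y0, x1, y1 = region_to_pixels(box, width, height)
        row[field] = ocr(binary[y0:y1, x0:x1]).strip()
    return row


def rows_to_csv(rows: List[Dict[str, str]], fields: List[str]) -> str:
    """Serialize one row per document into CSV text (the tabular output)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For a batch of scans, `rows_to_csv([ocr_document(p) for p in paths], list(FIELD_REGIONS))` would produce one CSV line per document. Here `paths` is your own list of image files. The `ocr` parameter lets you swap the OCR call, for example to pass a Tesseract `config` string, without touching the cropping logic.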

The example use case in this repo, a passport image, is described in more detail in this Medium article.

Package Requirements

pytesseract
opencv-python

PyTesseract is a wrapper around the Tesseract-OCR engine, which must be installed on the host system or server for the package to work. Documentation and downloads for the Tesseract-OCR project can be found here.
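PyTesseract calls the `tesseract` executable as a subprocess. Because of this, a missing engine shows up as a runtime error on the first OCR call rather than as an import error. A small startup check can fail early with a clearer message. The sketch below assumes the binary is either on `PATH` or that its full path is known, as is common on Windows installs. The helper names are hypothetical. `pytesseract.pytesseract.tesseract_cmd` and `pytesseract.get_tesseract_version()` are part of PyTesseract.

```python
"""Sketch: verify the Tesseract-OCR engine is reachable before running OCR.

Assumption: Tesseract is on PATH as `tesseract`, or its full path is passed
in explicitly. Helper names are hypothetical, not part of this repo.
"""
import shutil
from typing import Optional


def find_tesseract(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found."""
    if explicit_path:
        # shutil.which checks a path with a directory component directly;
        # a bare name is looked up on PATH instead.
        return shutil.which(explicit_path)
    return shutil.which("tesseract")


def configure_pytesseract(explicit_path: Optional[str] = None) -> str:
    """Point pytesseract at the engine and return its version string."""
    import pytesseract

    path = find_tesseract(explicit_path)
    if path is None:
        raise RuntimeError(
            "Tesseract-OCR engine not found; install it or pass its full path."
        )
    # pytesseract invokes this binary for every OCR request.
    pytesseract.pytesseract.tesseract_cmd = path
    return str(pytesseract.get_tesseract_version())
```

Calling `configure_pytesseract()` once at startup keeps the rest of the OCR code unchanged. On Windows you would typically pass the full path to `tesseract.exe` from your own installation.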

Repository: jasonlimcp/document_ocr