PyTesseract - Simple Python Optical Character Recognition

Prerequisites

Kindly ensure you have the following installed on your machine:

Python 3 or above, Tesseract, Regexp, OpenCV, Datefinder, and an IDE or editor of your choice

Simple OCR model for text extraction from an image file

In this project I use Pytesseract to extract text from an image. The extracted text is then passed to regular expressions written for predefined date formats.

The extracted text is compared against the regular expression date formats. If it matches a particular date format, the matched date is printed.
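The matching step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the patterns in DATE_PATTERNS and the function names extract_text and find_dates are assumptions, since this README does not list the date formats the project uses. The pytesseract import is deferred so that the date-matching part runs even without Tesseract installed.

```python
import re

# Hypothetical date patterns; the formats used by the project are not
# listed in this README, so these are illustrative assumptions.
DATE_PATTERNS = [
    # 12/05/2019 or 12-05-19
    re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    # 2019-06-01
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    # 5 March 2020 or 5 Mar 2020
    re.compile(
        r"\b\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{4}\b",
        re.IGNORECASE,
    ),
]


def extract_text(image_path):
    """Run Tesseract OCR on an image file and return the recognized text."""
    # Imported here so find_dates() can be used without these packages.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def find_dates(text):
    """Return every substring of text that matches one of DATE_PATTERNS."""
    matches = []
    for pattern in DATE_PATTERNS:
        matches.extend(m.group(0) for m in pattern.finditer(text))
    return matches
```

With Tesseract and pytesseract installed, a call such as find_dates(extract_text("receipt.png")) would chain the two steps; the file name here is only a placeholder.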

The result is passed to the front end using a simple HTML file and the Flask framework. The app is deployed on an Ubuntu server on AWS for remote access.
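A Flask front end of this kind might look like the sketch below. It is an assumption-laden outline rather than the contents of app.py: the names allowed_file, create_app, and find_dates_in_image are hypothetical, the accepted file extensions are a guess, and the page is rendered from an inline template string to keep the example self-contained, whereas the repository keeps its HTML in a templates folder.

```python
import os

# Assumed set of accepted upload types; not taken from the project.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}

# Inline page used only to keep this sketch self-contained.
PAGE = """
<form method="post" enctype="multipart/form-data">
  <input type="file" name="file">
  <input type="submit" value="Extract dates">
</form>
{% if error %}<p>{{ error }}</p>{% endif %}
{% if dates is not none %}
<ul>{% for d in dates %}<li>{{ d }}</li>{% endfor %}</ul>
{% endif %}
"""


def allowed_file(filename):
    """Return True if the file name has one of the accepted extensions."""
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS


def create_app(find_dates_in_image):
    """Build a Flask app around a callable that maps an image stream to dates.

    find_dates_in_image is a hypothetical hook for the OCR and
    regular expression steps.
    """
    # Imported here so allowed_file() can be used without Flask installed.
    from flask import Flask, render_template_string, request

    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def upload():
        dates = None
        error = None
        if request.method == "POST":
            file = request.files.get("file")
            if file is None or not allowed_file(file.filename or ""):
                error = "Please upload a PNG or JPEG image."
            else:
                dates = find_dates_in_image(file.stream)
        return render_template_string(PAGE, dates=dates, error=error)

    return app
```

Passing the OCR step in as an argument keeps the web layer separate from the text extraction, so either part can be changed or tested on its own.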

My observations

This can predict most date formats. It sometimes fails on blurry images, dark images, and handwritten images.

Image quality improvement

I tried to improve the images by converting them to black and white (using OpenCV), sharpening the image, and parsing only a particular cropped portion of the image.

This kind of image tuning increased accuracy by 6 to 8%. The accuracy level is now about 55 to 58%.
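The three preprocessing steps mentioned above (cropping, sharpening, and conversion to black and white) could be combined roughly as shown here. This is a sketch under assumptions: the function names crop_box and preprocess, the fractional region argument, the 3x3 sharpening kernel, and the use of Otsu thresholding are illustrative choices, not details confirmed by the project.

```python
def crop_box(width, height, region):
    """Convert a fractional region (left, top, right, bottom) to pixel bounds.

    Each value in region is a fraction of the image size between 0 and 1.
    """
    left, top, right, bottom = region
    if not (0 <= left < right <= 1 and 0 <= top < bottom <= 1):
        raise ValueError("region must satisfy 0 <= left < right <= 1 "
                         "and 0 <= top < bottom <= 1")
    return (int(left * width), int(top * height),
            int(right * width), int(bottom * height))


def preprocess(image_path, region=(0.0, 0.0, 1.0, 1.0)):
    """Crop, sharpen, and binarize an image before passing it to OCR."""
    # Imported here so crop_box() can be used without OpenCV installed.
    import cv2
    import numpy as np

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")

    # Keep only the portion of the image expected to contain the date.
    height, width = image.shape[:2]
    x0, y0, x1, y1 = crop_box(width, height, region)
    cropped = image[y0:y1, x0:x1]

    # Convert to grayscale, then sharpen with a simple 3x3 kernel.
    gray = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY)
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
    sharpened = cv2.filter2D(gray, -1, kernel)

    # Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(sharpened, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The returned array can be passed directly to pytesseract.image_to_string, which accepts NumPy image arrays as well as PIL images.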

Future changes that can be made

Accuracy can still be increased by using deep learning models for OCR. This requires a lot of images to train the model. The model learns the patterns by training on images again and again, so this is likely the best approach to get better accuracy.


Repository: sapthasv/Text_extraction_from_image
