jaelin215/wesplit-demo


Faded, Torn, Rotated Receipt OCR with Image Preprocessing

As part of the KaggleX BIPOC Mentorship Program, Cohort 2 (Dec 2022 - Mar 2023), I created a receipt OCR web app called WeSplit. The app uses the open-source Tesseract OCR engine together with custom preprocessing steps that I designed to improve the quality of the input image. A subset of the code is shared here to demonstrate the preprocessing steps and their positive impact on the OCR output.

  • Author: Jaelin Lee
  • Date: Mar 18, 2023
  • LinkedIn

Installation

  1. Download this repository to your local machine
  2. Open the project directory
  3. In Terminal, run the automated script to install the packages
  • chmod +x install_packages.sh
  • sh install_packages.sh
  4. Create folders under the project directory
  • raw
  • preprocessed
  • output
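
If you prefer to script the folder-creation step, here is a minimal Python sketch. The folder names come from the list above; the helper name `create_project_folders` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Folder names taken from the installation steps above.
FOLDERS = ("raw", "preprocessed", "output")


def create_project_folders(project_dir="."):
    """Create the working folders under project_dir (hypothetical helper).

    Existing folders are left untouched, so the call is safe to repeat.
    """
    root = Path(project_dir)
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project_folders()` from the project directory should leave you with the three folders in place.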

Input

  • Add a scanned receipt (e.g. .JPG, .PNG) to the raw folder

Run

  • python3 run.py

Output

  • Enhanced receipt (enhanced.jpg) is saved in the preprocessed folder
  • OCR text output (enhanced.txt) is saved in the output folder
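
To make the raw → preprocessed → output flow concrete, here is a rough Python sketch. It is not the code in run.py. The function names are hypothetical, and the grayscale + Otsu threshold step is a generic stand-in for the custom preprocessing in this repo. It assumes `opencv-python` and `pytesseract` are installed and the Tesseract binary is on your PATH.

```python
from pathlib import Path

# Accepted input types (assumption: suffixes are matched case-insensitively).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_receipts(raw_dir="raw"):
    """Return the image files found in raw_dir, sorted by name."""
    return sorted(
        p for p in Path(raw_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def enhance_and_ocr(image_path, preprocessed_dir="preprocessed", output_dir="output"):
    """Binarize one receipt, save it, and write the OCR text next to it."""
    # Imported here so find_receipts() works without the OCR dependencies.
    import cv2
    import pytesseract

    image = cv2.imread(str(image_path))
    if image is None:
        raise ValueError(f"could not read image: {image_path}")

    # Generic stand-in preprocessing: grayscale, then Otsu binarization.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # File names follow the Output section above.
    enhanced_path = Path(preprocessed_dir) / "enhanced.jpg"
    text_path = Path(output_dir) / "enhanced.txt"
    cv2.imwrite(str(enhanced_path), binary)
    text_path.write_text(pytesseract.image_to_string(binary), encoding="utf-8")
    return enhanced_path, text_path
```

With a receipt in the raw folder, something like `enhance_and_ocr(find_receipts()[0])` would write enhanced.jpg and enhanced.txt. Faded, rotated, or torn receipts will likely need more than a single threshold, which is what the preprocessing components in this repo address.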

Enjoy!

Faded Receipt:

Screenshot 2023-03-18 at 6 48 51 PM

Rotated / Crumpled Receipt:

Screenshot 2023-03-18 at 6 56 52 PM

Torn Receipt:

Screenshot 2023-03-18 at 6 54 17 PM

Preprocessing Components

Screenshot 2023-03-18 at 6 59 03 PM

License:

  • MIT License
  • I would appreciate it if you cite my name and link to this GitHub repo when you reference this code. Thanks!
