WebApp for Extracting Text from Images and applying threshold with the help of pytesseract and OpenCV.

rohankokkula/TEATH

TEATH - Text Extraction And THresholding

Watch demo

Demonstration on YouTube

A Streamlit web app, deployed on Heroku, for extracting text from images and applying various thresholding methods using pytesseract and OpenCV.
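As a rough illustration of the idea (not the code in app.py), the sketch below shows what a binary threshold does to grayscale pixel values, followed by how the same step might be paired with OCR. The function names `binary_threshold` and `extract_text` and the default threshold of 127 are assumptions made for this example. `extract_text` needs OpenCV, pytesseract and a working Tesseract installation, so its imports are kept inside the function.

```python
def binary_threshold(pixels, thresh=127, maxval=255):
    """Plain-Python version of a binary threshold on a 2-D list of
    grayscale values: pixels above `thresh` become `maxval`, the rest 0.
    This mirrors the rule used by OpenCV's THRESH_BINARY mode."""
    return [[maxval if p > thresh else 0 for p in row] for row in pixels]


def extract_text(image_path, thresh=127):
    """Hypothetical helper: threshold an image with OpenCV, then run OCR.
    Requires opencv-python, pytesseract and the Tesseract executable."""
    import cv2
    import pytesseract

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(image_path)
    # cv2.threshold returns (threshold_used, thresholded_image)
    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    return pytesseract.image_to_string(binary)


if __name__ == "__main__":
    print(binary_threshold([[10, 127, 128], [200, 50, 255]]))
```

Thresholding before OCR is a common preprocessing choice because it separates dark text from a lighter background. Which method and threshold value work best depends on the image.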

Ranked 113th out of 5,042 participants in HackerEarth's Pride Month Challenge.

Link to Competition

Try the app. It's live here: http://teath.herokuapp.com

Want to run it yourself and make changes?
Feel free to fork or clone!
Follow these steps.

  1. Clone the repository: git clone https://github.com/rohankokkula/teath.git
  2. For localhost:
    1. Install the Python dependencies: pip install -r requirements.txt
    2. Install the Tesseract executable from https://github.com/UB-Mannheim/tesseract/wiki
      and note the installation path.
    3. Open app.py and set tesseract_cmd to the path of the installed executable (e.g. C:\Users\rohan\AppData\Local\Tesseract-OCR\tesseract.exe).
    4. Open a command prompt in the project folder and run streamlit run app.py
    5. The app will usually be served at localhost:8501.
  3. For Heroku deployment:
    1. Create a Heroku account and install the Heroku CLI: https://devcenter.heroku.com/articles/heroku-cli#download-and-install
    2. Create an app from the Heroku dashboard.
    3. In your app's Settings tab on the Heroku website, add this buildpack URL:
      https://github.com/heroku/heroku-buildpack-apt
      Then click Reveal Config Vars and add:
      KEY: TESSDATA_PREFIX
      VALUE: ./.apt/usr/share/tesseract-ocr/4.00/tessdata
    4. Open a command prompt in the project folder and run heroku login to log in to your account.
    5. After a successful login, run these commands, replacing app-name with your app's name:
    git add .
    git commit -am "First commit"
    heroku git:remote -a app-name
    git push heroku master

    6. The app will be deployed at app-name.herokuapp.com
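
The steps above require editing tesseract_cmd by hand for each machine. One possible way to avoid that, sketched here under assumptions and not taken from app.py, is to resolve the executable at startup. The helper names and the `TESSERACT_CMD` environment variable are hypothetical. The Windows path is only the example from step 2.3 and will differ on your system.

```python
import os
import shutil

# Example path from the localhost steps; adjust to your own installation.
WINDOWS_EXAMPLE = r"C:\Users\rohan\AppData\Local\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(env=None, which=shutil.which):
    """Pick a Tesseract executable path.

    Order: a hypothetical TESSERACT_CMD override, then whatever
    `tesseract` is found on PATH, then the hard-coded example path.
    """
    env = os.environ if env is None else env
    override = env.get("TESSERACT_CMD")  # hypothetical variable name
    if override:
        return override
    found = which("tesseract")
    if found:
        return found
    return WINDOWS_EXAMPLE


def configure_pytesseract():
    """Point pytesseract at the resolved executable (requires pytesseract)."""
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()


if __name__ == "__main__":
    print(resolve_tesseract_cmd())
```

With this approach the same file could run locally and on a host where `tesseract` is already on PATH. Whether that holds for a given Heroku setup is an assumption you should verify.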

References:

  1. Streamlit: https://docs.streamlit.io/en/stable/api.html
  2. OpenCV Thresholding: https://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html
  3. Pytesseract: https://pypi.org/project/pytesseract/
  4. TextBlob: https://pypi.org/project/textblob/
  5. Tesseract deployment on Heroku using Flask: https://towardsdatascience.com/deploy-python-tesseract-ocr-on-heroku-bbcc39391a8d
