A Streamlit-Heroku WebApp for Extracting Text from Images and applying various thresholding methods using Pytesseract and OpenCV.
Try the app. It's live here: http://teath.herokuapp.com
Wanna try on your own and make changes?
Feel free to fork/clone!
Follow these steps.
- Clone the repository
git clone https://github.com/rohankokkula/teath.git
- For Localhost:
pip install -r requirements.txt
- Install the Tesseract executable from https://github.com/UB-Mannheim/tesseract/wiki and note the installation path.
- Open app.py and change the tesseract_cmd path to the installed executable (e.g. C:\Users\rohan\AppData\Local\Tesseract-OCR\tesseract.exe)
- Open a command prompt in the project folder and run
streamlit run app.py
- The app will usually be served at http://localhost:8501 (Streamlit's default port)
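The tesseract_cmd change in app.py can be sketched roughly as below. This is a minimal illustration, not the code in app.py: the helper name `set_tesseract_cmd` is hypothetical, and the only library detail relied on is pytesseract's documented `pytesseract.pytesseract.tesseract_cmd` attribute. Checking the path first means a wrong installation path fails at startup instead of on the first OCR call.

```python
import os


def set_tesseract_cmd(path):
    """Point pytesseract at the Tesseract executable found at `path`.

    `set_tesseract_cmd` is a hypothetical helper name used for illustration.
    Raises FileNotFoundError if no file exists at `path`.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No Tesseract executable at: {path}")
    # Imported lazily so the path check above runs even if pytesseract
    # has not been installed yet.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = path
    return path


# Usage (replace the example path with your own installation path):
# set_tesseract_cmd(r"C:\Users\rohan\AppData\Local\Tesseract-OCR\tesseract.exe")
```

If Tesseract is already on your PATH, pytesseract can normally find it without this setting, so the explicit path mainly matters for installs like the Windows one described above.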
- For Heroku Deployment:
- Create a Heroku account and install the Heroku CLI: https://devcenter.heroku.com/articles/heroku-cli#download-and-install
- Create an app from the Heroku Dashboard.
- Go to your app's Settings tab on the Heroku website and, in the buildpack URL field, enter
https://github.com/heroku/heroku-buildpack-apt
- Now click Reveal Config Vars and add
KEY: TESSDATA_PREFIX
VALUE: ./.apt/usr/share/tesseract-ocr/4.00/tessdata
- Open a command prompt in the project folder and run
heroku login
(this logs you into your Heroku account)
- After a successful login, run the following commands, replacing app-name with the name of your Heroku app:
git add .
git commit -am "First commit"
heroku git:remote -a app-name
git push heroku master
- The app will be deployed at app-name.herokuapp.com
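Because the local setup uses a Windows path for tesseract_cmd while Heroku installs Tesseract through the apt buildpack, the same app.py may need to choose the path at runtime. The sketch below is one possible way to do that, under clearly stated assumptions: the function name `choose_tesseract_cmd` is hypothetical, the binary location `/app/.apt/usr/bin/tesseract` is an assumption about where the apt buildpack places it, and the check relies on Heroku setting a `DYNO` environment variable on its dynos. Verify all three against your own deployment.

```python
import os

# Assumption: location of the tesseract binary installed by heroku-buildpack-apt.
HEROKU_TESSERACT_CMD = "/app/.apt/usr/bin/tesseract"


def choose_tesseract_cmd(local_path, env=None):
    """Return the tesseract path to use: Heroku's when on a dyno, else `local_path`.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    # Assumption: the DYNO variable is present only when running on Heroku.
    if "DYNO" in env:
        # The config var from the steps above tells Tesseract where its
        # language data lives; fail early if it was not added.
        if "TESSDATA_PREFIX" not in env:
            raise RuntimeError("TESSDATA_PREFIX config var is not set")
        return HEROKU_TESSERACT_CMD
    return local_path
```

The returned value could then be assigned to `pytesseract.pytesseract.tesseract_cmd` in app.py, so no path edits are needed between local runs and deployment.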
References:
- Streamlit: https://docs.streamlit.io/en/stable/api.html
- OpenCV Thresholding: https://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html
- Pytesseract: https://pypi.org/project/pytesseract/
- TextBlob: https://pypi.org/project/textblob/
- Tesseract deployment on Heroku using Flask: https://towardsdatascience.com/deploy-python-tesseract-ocr-on-heroku-bbcc39391a8d