sengkyaut/MyOCR

MyOCR

Some scripts that I copied from https://github.com/astutejoe/tesseract-tutorial and modified for Shan language training (a Myanmar sub-language).

Requirements / Sources

Installation (for training)

Clone langdata

git clone https://github.com/tesseract-ocr/langdata_lstm.git

Using install script

git clone https://github.com/sengkyaut/MyOCR.git
chmod +x MyOCR/install/install.sh
./MyOCR/install/install.sh

Using Docker

How to install Docker

curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

Run Docker (recommended)

git clone https://github.com/tesseract-ocr/langdata_lstm.git
cd langdata_lstm
sudo docker pull sengkyaut/t4cmp:latest
sudo docker run --rm -d -p 2222:22 -v ${PWD}:/langdata_lstm --name skt4cmp sengkyaut/t4cmp

or

git clone https://github.com/sengkyaut/MyOCR.git
cd MyOCR
docker-compose up
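
For readers who want to change the SSH port or the mounted directory, the docker run command from the first option can be wrapped in a small script. This is only a sketch: HOST_SSH_PORT, LANGDATA_DIR and DRY_RUN are illustrative variable names that do not come from the MyOCR repo, and it assumes the current user may run docker without sudo.

```shell
#!/bin/sh
# Sketch: the same container settings as the docker run command above.
# HOST_SSH_PORT, LANGDATA_DIR and DRY_RUN are illustrative, not part of MyOCR.
HOST_SSH_PORT="${HOST_SSH_PORT:-2222}"   # host port forwarded to sshd (port 22) in the container
LANGDATA_DIR="${LANGDATA_DIR:-$PWD}"     # host directory mounted at /langdata_lstm
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the command, 0 = start the container

# Print the docker command that would be executed.
build_run_cmd() {
  printf 'docker run --rm -d -p %s:22 -v %s:/langdata_lstm --name skt4cmp sengkyaut/t4cmp' \
    "$HOST_SSH_PORT" "$LANGDATA_DIR"
}

if [ "$DRY_RUN" = "1" ]; then
  build_run_cmd
  echo
else
  # --rm removes the container when it stops; -d runs it in the background.
  docker run --rm -d -p "$HOST_SSH_PORT":22 -v "$LANGDATA_DIR":/langdata_lstm \
    --name skt4cmp sengkyaut/t4cmp
fi
```

By default the script only prints the command. Run it with DRY_RUN=0 from inside the langdata_lstm directory to actually start the container.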

Log in to the Docker container

ssh root@localhost -p 2222

The password is toor.

  • The Tesseract source repo and the MyOCR repo are located in /root/workspace
  • The langdata_lstm repo is also mounted at /langdata_lstm
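
After logging in, a quick check like the following can confirm that both locations listed above are present. It is a minimal sketch; check_dirs is a hypothetical helper name, not something shipped in the image.

```shell
#!/bin/sh
# Report whether each expected directory exists.
# Returns non-zero if at least one directory is missing.
check_dirs() {
  status=0
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "ok: $dir"
    else
      echo "missing: $dir"
      status=1
    fi
  done
  return "$status"
}

# Paths taken from the list above; run this inside the container.
check_dirs /root/workspace /langdata_lstm || echo "container layout differs from the README"
```

If /langdata_lstm is reported missing, the container was most likely started without the -v mount.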

To do

  • Prepare training data
  • Retrain, or fine-tune (plus-minus) the Myanmar model, with the full Shan character set
  • Check the recognition percentage and fine-tune for other Shan fonts
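
As a rough illustration of the second and third items, the sketch below follows the "fine tune for ± a few characters" flow from the Tesseract training documentation. It is not the repository's own script. Every path, the mya and shn file names, the iteration count and the run helper are assumptions, and it presumes the training data (the .lstmf list files and a starter shn.traineddata with the full Shan unicharset) has already been prepared.

```shell
#!/bin/sh
# Sketch of a plus-minus fine-tune of the Myanmar model for Shan.
# All paths and names below are assumptions, not taken from MyOCR.
BASE_MODEL="${BASE_MODEL:-tessdata_best/mya.traineddata}"  # existing Myanmar model
WORK_DIR="${WORK_DIR:-train_shn}"                          # hypothetical working directory
DRY_RUN="${DRY_RUN:-1}"                                    # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# 1. Extract the LSTM network from the base model.
run combine_tessdata -e "$BASE_MODEL" "$WORK_DIR/mya.lstm"

# 2. Continue training against a starter traineddata with the Shan unicharset.
run lstmtraining \
  --model_output "$WORK_DIR/plusminus" \
  --continue_from "$WORK_DIR/mya.lstm" \
  --traineddata "$WORK_DIR/shn/shn.traineddata" \
  --old_traineddata "$BASE_MODEL" \
  --train_listfile "$WORK_DIR/shn.training_files.txt" \
  --max_iterations 3600

# 3. Evaluate the checkpoint on held-out data; lstmeval reports error rates.
run lstmeval \
  --model "$WORK_DIR/plusminus_checkpoint" \
  --traineddata "$WORK_DIR/shn/shn.traineddata" \
  --eval_listfile "$WORK_DIR/shn.eval_files.txt"
```

With the default DRY_RUN=1 the script only prints the three commands, which makes it easy to review the paths before running it for real with DRY_RUN=0.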

About

Tesseract OCR Train Script
