GitHub - qbeer/simple-ocr: Example optical character recognition baseline task for physdm course at ELTE

Simple OCR for synthetically generated text images

This repository contains code for training and evaluating a simple OCR system on synthetically generated text images. The code is written in Python 3 and uses PyTorch as the deep learning framework. The data is hosted on Kagge.

Example images

Image	Label
	`60`
	`789`
	`72103`

We assume that the data is stored in the data directory. The data can be downloaded from Kaggel. The data should be extracted into the data directory. The directory structure should look like this:

data
├── 2
│   ├── train
│   │   ├── 0.jpg
│   │   ├── 1.jpg
...
│   │   └── 9999.jpg
│   └── test
│       ├── 0.jpg
│       ├── 1.jpg
...
│       └── 9999.jpg
├── 3
...
├── 5
...
├── train.csv
└── test.csv

Task

The task is to train a model that can recognize the text in the images. The model should take an image as input and output the corresponding text. The text is always a sequence of digits and the length of the sequence is between 2 and 5. The model should be trained and validated then evaluated on the held-out test set. The evaluation metric is the word error rate (WER).

Resources

Environment setup

This project uses Conda to manage the environment. The following commands can be used to create and activate the environment.

conda create -n ocr python=3.9
conda activate ocr
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install lightning -c conda-forge
conda install matplotlib numpy pandas
conda install jupyter notebook
conda install scikit-learn scipy
conda install tensorboard

You can also use the environment.yml file to create the environment.

conda env create -f environment.yml

Training

Training is done using the train_baseline.ipbynb notebook. See the notebook for details.

Logging

The training logs are saved in the logs directory. The logs can be visualized using TensorBoard.

tensorboard --logdir logs

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
figures		figures
src		src
.gitignore		.gitignore
README.md		README.md
train_baseline.ipynb		train_baseline.ipynb
train_lstm.ipynb		train_lstm.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

figures

figures

src

src

.gitignore

.gitignore

README.md

README.md

train_baseline.ipynb

train_baseline.ipynb

train_lstm.ipynb

train_lstm.ipynb

Repository files navigation

Simple OCR for synthetically generated text images

Example images

Task

Resources

Environment setup

Training

Logging

About

Languages

qbeer/simple-ocr

Folders and files

Latest commit

History

Repository files navigation

Simple OCR for synthetically generated text images

Example images

Task

Resources

Environment setup

Training

Logging

About

Resources

Stars

Watchers

Forks

Languages