Optical character recognition software (EPITA S3)

OCR

Optical character recognition software written in C.

Requirements

  • gcc
  • GTK3
  • SDL2
  • SDL2_image

To train the neural network, you will also need:

  • python
  • xelatex
  • pdftoppm

Build

make

Usage

./ocr

Training

  1. Generate the custom dataset used to train the neural network
    cd dataset
    # This can take a while
    ./generate_dataset.sh
  2. (optional) Adjust training parameters in src/ocr_train.c
  3. Launch the OCR with the --train option:
    ./ocr --train
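The --train switch in step 3 could be handled along the following lines. This is a minimal sketch. The parse_mode helper and the ocr_mode values are hypothetical, and the project's real entry point is not described in this README. The sketch only distinguishes a plain ./ocr launch from ./ocr --train and rejects anything else.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical run modes: the README documents only plain ./ocr and --train. */
enum ocr_mode { OCR_MODE_DEFAULT, OCR_MODE_TRAIN, OCR_MODE_INVALID };

/* Sketch: select a run mode from the command-line arguments. */
enum ocr_mode parse_mode(int argc, char **argv)
{
    assert(argc >= 1 && argv != NULL);

    if (argc == 1)
        return OCR_MODE_DEFAULT;          /* ./ocr */
    if (argc == 2 && strcmp(argv[1], "--train") == 0)
        return OCR_MODE_TRAIN;            /* ./ocr --train */
    return OCR_MODE_INVALID;              /* unknown or extra arguments */
}
```

Rejecting unknown arguments outright, rather than ignoring them, makes a mistyped flag such as --trian fail loudly instead of silently starting the default mode.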

After each epoch, the trained network is saved to output/ocr_network_eX, where X is the epoch number.
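Here is one way the per-epoch file name could be built. This is a sketch that assumes X is the plain decimal epoch number. The checkpoint_path helper is hypothetical and does not come from the project's sources. It uses snprintf so that a buffer that is too small gets reported to the caller instead of overflowing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "output/ocr_network_e<epoch>" into buf.
 * Returns 0 on success, -1 on an encoding error or if buf is too small. */
int checkpoint_path(char *buf, size_t size, unsigned epoch)
{
    assert(buf != NULL);

    int n = snprintf(buf, size, "output/ocr_network_e%u", epoch);
    /* snprintf returns the length it wanted to write; >= size means truncation. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A training loop would call this once per epoch and then serialize the network to the returned path. Because each epoch gets its own file, an earlier checkpoint can be reloaded if a later epoch starts to overfit.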

To avoid recomputing the dataset before every training run, you can save the pre-computed dataset (see src/ocr_train.c for details).
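This README does not describe the cache format, and the real details are in src/ocr_train.c. To illustrate the idea only, the sketch below writes a flat array of pre-computed input values to a binary stream and reads it back, so the expensive preprocessing has to run just once. The function names and the layout (a uint64_t count followed by raw floats in native byte order) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: uint64_t count, then `count` raw floats (native byte
 * order, so the cache is only meant to be reused on the same machine). */
int dataset_cache_write(FILE *f, const float *data, size_t count)
{
    assert(f != NULL && (data != NULL || count == 0));

    uint64_t n = count;
    if (fwrite(&n, sizeof n, 1, f) != 1)
        return -1;
    if (count > 0 && fwrite(data, sizeof *data, count, f) != count)
        return -1;
    return 0;
}

/* Reads a cache produced by dataset_cache_write into data (capacity max).
 * Returns the number of floats read, or -1 on a read error or if the
 * stored count does not fit in the buffer. */
long dataset_cache_read(FILE *f, float *data, size_t max)
{
    assert(f != NULL);

    uint64_t n;
    if (fread(&n, sizeof n, 1, f) != 1 || n > max)
        return -1;
    if (n > 0 && fread(data, sizeof *data, (size_t)n, f) != n)
        return -1;
    return (long)n;
}
```

Taking a FILE * instead of a path leaves file naming and location to the caller. The stored count lets the reader reject a cache that does not match the expected buffer size instead of reading past its end.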