kraken-gaza-iliad

https://ryanfb.github.io/kraken-gaza-iliad/groundtruth/

This is a project for generating an edition-specific OCR training file for Kraken for Theodorus Gaza's Attic paraphrase of the Iliad. The facing pages of the Iliad in this edition are printed in the same font as the paraphrase, so we can use them to quickly generate ground truth, which can then (it is hoped) be used to train a model that accurately OCRs the Attic paraphrase.

See also: kraken-gaza-batrachomyomachia, kraken-voulgaris-aeneid

Data

The following Google Books volumes were used as source data:

Page images were extracted with pdfimages, Google logos were discarded, and the pages were automatically renamed. Images are available here: https://github.com/ryanfb/kraken-gaza-iliad/releases/download/v1.0.0/gazapng.zip

Training

Run make, or override defaults with e.g.

USE_DOCKER=false CUDA_DEVICE=cuda:0 make

Trained model

Two trained OCR models are provided:

gaza_best_nfd.mlmodel - trained using NFD normalization (Unicode canonical decomposition, i.e. base characters and their accents are treated as separate glyphs)
gaza_best_nfc.mlmodel - trained using NFC normalization (Unicode canonical composition, i.e. each accented character is treated as a single glyph)
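
To make the difference between the two normalization forms concrete, here is a minimal sketch using Python's standard unicodedata module. The sample word is chosen only for illustration and is not taken from the training data.

```python
import unicodedata

# Illustrative word only: "andra" written with a precomposed
# alpha-with-smooth-breathing-and-acute (U+1F04) as its first letter.
word_nfc = unicodedata.normalize("NFC", "\u1f04\u03bd\u03b4\u03c1\u03b1")
word_nfd = unicodedata.normalize("NFD", word_nfc)

# NFC keeps the accented alpha as one code point; NFD splits it into
# a base alpha plus two combining marks (breathing and accent).
print(len(word_nfc))  # 5
print(len(word_nfd))  # 7

# Both forms are canonically equivalent: recomposing NFD gives back NFC.
print(unicodedata.normalize("NFC", word_nfd) == word_nfc)  # True
```

An NFD model therefore has to recognize fewer distinct symbols, while an NFC model predicts one symbol per printed letter.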

Each of these normalization techniques has different accuracy tradeoffs for Ancient Greek. Ideally, the output of both could be combined for greater accuracy.
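
A possible first step toward combining the two outputs is to find where they disagree. The sketch below is not part of this repository: the function name disagreements is hypothetical, and it assumes you already have the text of the same line as recognized by each model. It brings both strings to NFC and uses the standard difflib module to list the differing spans.

```python
import difflib
import unicodedata


def disagreements(nfd_text, nfc_text):
    """Return (nfd_model_span, nfc_model_span) pairs where the two outputs differ.

    Hypothetical helper: both inputs are normalized to NFC first so that
    differences in normalization alone are not reported as disagreements.
    """
    a = unicodedata.normalize("NFC", nfd_text)
    b = unicodedata.normalize("NFC", nfc_text)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example strings: the two outputs differ only in the final accent.
nfd_output = unicodedata.normalize("NFD", "\u03b8\u03b5\u03ac")  # acute
nfc_output = "\u03b8\u03b5\u1f70"                                # grave
print(disagreements(nfd_output, nfc_output))
```

Deciding which reading to keep at each disagreement is the hard part and is left open here.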

OCR Results

OCR results are available in hOCR format in the hocr-nfd and hocr-nfc directories. You can also browse the results here:
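
For working with the hOCR files programmatically, the following is a minimal sketch that pulls the text of each line out of an hOCR document using only the Python standard library. It assumes lines are marked with the ocr_line class from the hOCR specification and contain only nested, properly closed elements. The class name HocrLineText and the sample markup are illustrative and are not taken from the files in this repository.

```python
from html.parser import HTMLParser


class HocrLineText(HTMLParser):
    """Collect the text content of every element with class "ocr_line"."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0  # nesting depth inside the current ocr_line; 0 = outside
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            # Already inside a line: track nested elements such as words.
            self._depth += 1
        elif "ocr_line" in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Line element closed: collapse whitespace and store the text.
                self.lines.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


# Invented sample markup in the general shape of an hOCR page.
sample = """<div class='ocr_page'>
<span class='ocr_line' title='bbox 10 10 200 40'><span class='ocrx_word'>μῆνιν</span> <span class='ocrx_word'>ἄειδε</span></span>
</div>"""

parser = HocrLineText()
parser.feed(sample)
for line in parser.lines:
    print(line)
```

To process a real file, read it as UTF-8 and pass its contents to feed() instead of the sample string.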
