TED (TED Enhances Digitization) - Software to facilitate OCR on incunables.
_ma-thesis
basics
boxes
edge_detection
erosion
filters
gui
nnkohonen
nnmlp
thresholding
.gitignore
LICENSE
README.md
main.cpp


TED (TED Enhances Digitization)

I developed TED for my Magister Artium thesis, submitted in 2008, "Zur Erweiterungsfähigkeit bestehender OCR Verfahren auf den Bereich extrem früher Drucke" (roughly: "On the extensibility of existing OCR methods to extremely early prints"). In that thesis I applied Optical Character Recognition (OCR) to digital images of incunables from the project "Verteilte Digitale Inkunabelbibliothek".

The character recognition process is based on a self-organizing map (SOM, also known as a Kohonen map). It operates on digital images that have been extensively preprocessed by the following operations:

  • Image conversion
  • Binarization (several algorithms, ranging from simple fixed-threshold binarization to Otsu's method)
  • Median and kFill filtering
  • Automatic cropping and deskewing of the image
  • Edge detection
  • Object / glyph isolation and recognition
  • Clustering of the isolated glyphs with the self-organizing map
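
As an illustration of the binarization step listed above, here is a minimal sketch of Otsu's method in C++. It is not taken from the TED sources: the function names `otsuThreshold` and `binarize`, the flat 8-bit grayscale buffer, and the convention that dark pixels map to 0 are all assumptions made for this example. The idea is to try every possible threshold and keep the one that maximizes the between-class variance of the two resulting pixel groups.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of TED): returns the Otsu threshold t for
// 8-bit grayscale pixels. Pixels <= t form one class, pixels > t the other.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    // Build the intensity histogram.
    std::array<std::size_t, 256> hist{};
    for (std::uint8_t p : pixels) ++hist[p];

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;  // sum of all intensities
    for (int i = 0; i < 256; ++i) sumAll += i * static_cast<double>(hist[i]);

    double weightLow = 0.0;  // number of pixels with value <= t
    double sumLow = 0.0;     // intensity sum of those pixels
    double bestVariance = -1.0;
    int bestT = 0;

    for (int t = 0; t < 256; ++t) {
        weightLow += static_cast<double>(hist[t]);
        sumLow += t * static_cast<double>(hist[t]);
        if (weightLow == 0.0) continue;        // lower class still empty
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;          // upper class is empty from here on

        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        // Between-class variance, up to a constant factor of 1 / total^2.
        const double variance = weightLow * weightHigh * diff * diff;
        if (variance > bestVariance) {
            bestVariance = variance;
            bestT = t;
        }
    }
    return bestT;
}

// Hypothetical helper: maps pixels <= t to 0 (ink) and all others to 255 (paper).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& pixels, int t) {
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) out.push_back(p <= t ? 0 : 255);
    return out;
}
```

Calling `binarize(pixels, otsuThreshold(pixels))` would turn a grayscale page scan into a two-level image. When several thresholds give the same variance, this sketch keeps the lowest one; other implementations may pick differently.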