Collection of OCR-related Python tools and wrappers from @OCR-D
Simple character-based language model using Keras
Run Tesseract through the tesserocr bindings using @OCR-D's interfaces
Website for OCR-D specs, formats, requirements
OCR-D guidelines for Ground Truth production
Test data for testing specs and software in @OCR-D
Workflows for OCR-D powered by Taverna.
Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
Train Tesseract 4 with make
Converters for various file formats used for representing OCR
Microservice for managing OCR-D data and its metadata. It supports reading, writing, and updating metadata (XML), registering XSDs, validating XML, and indexing metadata.
Python-based tools for document analysis and OCR
OCR-D CLI for ocropy
Wrapper for the kraken OCR engine
PAGE XML format collection for document image page content and more
Slides for the OCR-D talk at the Bibliotheca Baltica 2018 symposium in Rostock
Generating user docs for OCR-D from Markdown with DITA
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
A Python library to communicate with an instance of Phil Harvey's excellent ExifTool command-line application.
Slides for the OCR-D presentation at PhilTag 2018
Taverna workflow example for demonstration purposes
Slides for the OCR-D presentation at the Transkribus User Conference 2017 in Vienna
Abstract and slides for the OCR-D talk at the workshop “Geisteswissenschaftliche Forschungsdaten. Methoden zur digitalen Erfassung, Aufbereitung und Präsentation”