Train Tesseract 4 with make
Collection of OCR-related Python tools and wrappers from @OCR-D
OCR-D-compliant page segmentation
Run Tesseract via the tesserocr bindings through @OCR-D's interfaces
Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
Website for OCR-D specs, formats, requirements
Test data for testing specs and software in @OCR-D
Wrapper for the kraken OCR engine
Simple character-based language model using Keras
Converters for various file formats used for representing OCR
Create PAGE-XML Ground Truth from DTABf TEI
Workflows for OCR-D powered by Taverna.
OCR-D CLI wrapper for ocropy
PAGE XML format collection for document image page content and more
Microservice for managing OCR-D data and its metadata. It supports reading, writing, and updating metadata (XML), registering XSD schemas, validating XML, and indexing metadata.
A repository for online OCR-D training infrastructure.
OCR-D guidelines for Ground Truth production
Python-based tools for document analysis and OCR
Slides for the OCR-D talk at the Bibliotheca Baltica 2018 symposium in Rostock
Generating user docs for OCR-D from Markdown with DITA
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
A Python library to communicate with an instance of Phil Harvey's excellent ExifTool command-line application.
Slides for the OCR-D presentation at PhilTag 2018